Re: sliced find in Linq

by Frans Bouma [C# MVP] on 10/31/2007 7:35:00 PM Marc Gravell wrote:

> > take the 2nd page in a query on northwind employee filtered on order
> I went for customers, but same horse...
> The query below is deliberately not particularly well-done, so no
> jubes please! So basically - some things will work on SQL Server
> 2000, but you may wish to limit yourself to simple paged operations.
> I didn't want to start getting into workarounds for this specific
> query (perhaps a distinct field or view etc) - the more important
> point is "tread carefully" - or upgrade to SQL Server 2005 if you
> want to do anything non-trivial with LINQ.
>
> C#/LINQ:
> string country = "France";
> var query = from c in ctx.Customers
> join o in ctx.Orders on
> c.CustomerID equals o.CustomerID
> where o.ShipCountry == country
> orderby c.CompanyName
> select c;
> query = query.Skip(10).Take(10);
>
> Using SQL 2000:
> [Microsoft SQL Server 2000 - 8.00.760 (Intel X86)
> Standard Edition on Windows NT 5.0 (Build 2195; Service Pack 4)]
>
> Throws NotSupportedException:
> <q>This provider supports Skip() only over ordered queries returning
> entities or projections that contain all identity columns, where the
> query is a single-table (non-join) query, or is a Distinct, Except,
> Intersect, or Union (not Concat) operation.</q>

   Strange isn't it?

   Because all it takes is this:
1) create temptable with the same columns as the resultset + 1 identity
column, which is the field we'll be selecting on
2) use an INSERT INTO #temptable <normal select> query where you insert
a TOP (pagesize * (pagenumber+1) into the SELECT, so you only insert
the data you need at most, not the big resultset
3) issue a SELECT over the temptable which filters on the identity
column to select the page.
4) optional: drop the temptable.

   This isn't that slow as Microsoft wants you to believe, because if it
would, a lot of operations in the RDBMS would be 'terribly slow'
because a lot of operations use temptables under the hood.

   One could optimize this perhaps with an in-memory table type though
then support for sqlserver 7 will be out the window.

   The thing is that this is generic code, so you can create a routine
which you send a SQL query to and it can create the SQL statements for
you, all batched into 1 statement to send to the DB.

> Using SQL Express 2005:
> [Microsoft SQL Server 2005 - 9.00.3042.00 (Intel X86)
> Express Edition on Windows NT 5.1 (Build 2600: Service Pack 2)]
>
> SELECT TOP 10 [t2].[CustomerID], [t2].[CompanyName],
> [t2].[ContactName], [t2].[ContactTitle], [t2].[Address], [t2].[City],
> [t2].[Region], [t2].[PostalCode], [t2].[Country], [t2].[Phone],
> [t2].[Fax] FROM ( SELECT ROW_NUMBER() OVER (ORDER BY
> [t0].[CompanyName]) AS [ROW_NUMBER], [t0].[CustomerID],
> [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle],
> [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode],
> [t0].[Country], [t0].[Phone], [t0].[Fax] FROM [dbo].[Customers] AS
> [t0] INNER JOIN [dbo].[Orders] AS [t1] ON [t0].[CustomerID] =
> [t1].[CustomerID] WHERE [t1].[ShipCountry] = @p0 ) AS [t2]
> WHERE [t2].[ROW_NUMBER] > @p1 ORDER BY [t2].[CompanyName]
> ------------------------------- @p0 [String]: 'France' @p1 [Int32]: 10

   I find it interesting why they didn't go for WITH to create a CTE.

   Btw, I find the answer from the MS employee on your post in the Linq
forum not that great, it sounds like an excuse to not have to support
SQLSERVER 2000 that much, as that would hurt sales.

      FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

Re: sliced find in Linq

by Frans Bouma [C# MVP] on 10/31/2007 7:37:00 PM Nicholas Paldino [.NET/C# MVP] wrote:

> I've been thinking about this a lot, and I have to say, I really
> dislike the implementation of skip and take on the T-SQL side.
>
> The problem I have is that the T-SQL side, the ROW_NUMBER ranking
> function is being used to produce an ordered result, and that order
> is used to determine what is taken, and what is skipped.
>
> The reason I have a problem with this is that the ROW_NUMBER function
> requires an order, an order that is not expressed on the .NET side of
> things. It just orders by the fields as they occur in the database.
> While it's great that they are going to do this consistently, it's
> something that just shouldn't be allowed when there is not an order
> operation on the query itself. Without an order specified, in SQL
> (as it should be in LINQ, I believe) the result should be
> indeterminate, which would make Take and Skip worthless.

   ROW_NUMBER requires an ordering, though you can do:
WITH __actualSet AS
(
    SELECT *,
        ROW_NUMBER() OVER (ORDER BY CURRENT_TIMESTAMP) AS __rowcnt
    FROM
    (
        -- the query to page here
    ) AS _tmpSet
)
SELECT * FROM __actualSet
WHERE [__rowcnt] > @rownumberLastRowOfPreviousPage
    AND [__rowcnt] <= @rownumberLastRowOfPage
ORDER BY [__rowcnt] ASC

   (ref:
http://weblogs.asp.net/fbouma/archive/2007/06/05/sqlserver-2005-paging-t
here-is-a-generic-wrapper-query-possible.aspx

   why they didn't use this is beyond me.

      FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

Re: sliced find in Linq

by Frans Bouma [C# MVP] on 10/31/2007 7:41:00 PM Nicholas Paldino [.NET/C# MVP] wrote:

> I agree, it should be in the provider level. In this specific case,
> against a SQL Server database, where, if order is not defined, the
> order is, well, undefined.

   actually, by definition, paging over an unordered set is meaningless,
as it's undefined what's in page n+1.

   The lack of specifying an ordering in the main query comes from the
perception that a SELECT statement is an ordered set, but that's by
definition not true: SELECT has no defined order unless an ordering has
been specified, so if you specify SELECT without an order by, the row
order you'll receive is by definition undefined.

   Using TOP or a paging construct over such a set is then thus actually
bogus, because it's undefined what rows will be returned.

   In fact, it's that bad that it might be better to throw an exception
if there's no ordering specified when either skip or take has been
specified, because it's unclear what will be returned at runtime which
could lead to unclear bugs.

      FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

Re: sliced find in Linq

by Frans Bouma [C# MVP] on 10/31/2007 7:44:00 PM Carl Daniel [VC++ MVP] wrote:

> Frans Bouma [C# MVP] wrote:
> > Carl Daniel [VC++ MVP] wrote:
> > > The only undesirable thiing I see in the SQL above is that order
> > > by over all of the returned columns. If Customers is a large
> > > table, this will result in SQL server doing the sort in tempdb.
> > > Since the original query was unordered, I'd like the above SQL to
> > > simply order by the primary key, since that alone must define a
> > > unique ordering, which is all that's required for this to work.
> > > (Perhaps your table doesn't have a primary key though?)
> >
> > The ROWNUMBER OVER statement requires an ordering, so normally you'd
> > place either the ordering of the inner sql there or a new one (the
> > PK field(s)).
>
> Yes, of course it does. But in this case, the query as stated in C#
> didn't specify the ordering, so any ordering could have been used.
> It appears that LINQ chose to use the most expensive ordering
> possible, when it should use the least expensive when no ordering was
> specified in the original expression. Hence my question to Jon - did
> this table have a primary key? If it didn't (and no unique keys
> either), then this is the best LINQ could do, but if any unique keys
> exist, LINQ should use one of those - perhaps it does, but I can't
> tell from this example.

   As I explained in another post in a different branch in this thread
;), paging or using TOP over a query without an ordering is
meaningless, as SELECT has by definition no specified order, it's
undefined: it's not defined in which order the rows will be returned to
you. So using TOP or a paging construct over such a set which isn't
ordered by a specified ordering will return a rowset that's by
definition undefined as you don't know the order of the set you page
over / limit the resultset on. :)

   A lot of developers have the perception that SELECT has always the
same order, but that's not true if there's no ORDER BY clause: the
RDBMS can re-order rows because of mempage constraints or other
internal housekeeping.

      FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

Re: sliced find in Linq

by Marc Gravell on 10/31/2007 8:10:00 PM > actually, by definition, paging over an unordered set is meaningless,
> as it's undefined what's in page n+1.

I agree with you (see my other posts) - but /technically/ even then an
order clause (by itself) is not really enough to guarantee anything -
unless that order clause includes something unique, such as a primary
key. Of course there may-or-may-not be a primary key, and there may-or-
may-not be a clustering, and the primary key may-or-may-not be itself
clustered. Keeping it simple has advantages, but it would be tempting
to see if any sensible and simple rules (as in: more than all columns
in order) could be proposed for an implicit ordering... but in reality
in most cases it isn't going to matter much - so perhaps if it *does*
matter, the dev should remember to declare an order...

I think I've talked myself full-circle there, but that's life ;-p

 

Re: sliced find in Linq

by Marc Gravell on 10/31/2007 8:41:00 PM
> 2) use an INSERT INTO #temptable <normal select> query where you insert
> a TOP (pagesize * (pagenumber+1) into the SELECT, so you only insert
> the data you need at most, not the big resultset
A small note here is that this gets expensive when the user clicks on
the "last page" (page 11234 of 11234) button; I know that you know
this - I'm just saying...

> 4) optional: drop the temptable.
In a stored-procedure, yes - but isn't that mandatory in a flat
command text approach?

> perhaps with an in-memory table type
Generally, yes; but for larger volumns sometimes temp-tables are
preferable. Depends on the scenario, so very hard to write a generic
handler...

> I find it interesting why they didn't go for WITH to create a CTE
Yes; even when you can do the same thing either way, the WITH syntax
is often easier read than a nested query. Perhaps the SQL optimiser
prefers it? Dunno.

> Btw, I find the answer from the MS employee on your post in the Linq
> forum not that great
Well, maybe; but getting a statement of the differences is a step in
the right direction. But yes; the fact that it breaks (rather than
working sub-optimally via the #table approach) is a little surprising.

 

Re: sliced find in Linq

by Frans Bouma [C# MVP] on 11/1/2007 7:59:00 PM Marc Gravell wrote:
> > 2) use an INSERT INTO #temptable <normal select> query where you
> > insert a TOP (pagesize * (pagenumber+1) into the SELECT, so you
> > only insert the data you need at most, not the big resultset
> A small note here is that this gets expensive when the user clicks on
> the "last page" (page 11234 of 11234) button; I know that you know
> this - I'm just saying...

   Yes, that's a problem.
   However, a developer of software which offers thousands of pages of
result to a request should ask him/herself this: will the user traverse
tens of thousands or even millions of rows manually? No, in almost all
cases, the user won't. So offering thousands of pages is likely not
going to help much. ;)

   BUt I agree, it's worth mentioning and can bring down a server
unfortunately. However, as SqlServer is the only RDBMS system without
proper keywords in the SELECT statement for skip/take/paging, I think
it's more a problem of the vendor of the db system. E.g. picking oracle
or even mysql won't give you this problem ;) (ok, also not linq to sql
;)

> > 4) optional: drop the temptable.
> In a stored-procedure, yes - but isn't that mandatory in a flat
> command text approach?

   Temptables are dropped at the connection closing statement. This might
be right after the command, but it can also be that there will be other
commands over the same connection. This means that the temptable used
will occupy space in the tempdb catalog's data file. So dropping it
actively after the usage of the fetch could help performance :)

> > perhaps with an in-memory table type
> Generally, yes; but for larger volumns sometimes temp-tables are
> preferable. Depends on the scenario, so very hard to write a generic
> handler...

   yes, if you want truly the best performance for every scenario, you're
not going to cut it with temptables. They're the 'best' choice for a
generic scenario though, but not ideal...

   FB


--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------