The correct formulation of a task is important

For an example, let's look at simple task: to show the book with max number of copies in the library. It's a tricky exercise and it doesn't have the correct solution.

In reality, we have three different variants of the task:

  1. Show any book, which has a max number of copies;
  2. Show all books, which have a max number of copies;
  3. Show the book, which has more copies than any other book.

If there is just the one book with a max number of copies we get the same result for each SELECT query.

If there is just the one book with a max number of copies we get the same result for each SELECT query. But if there are two or more books, we get three different results:

  1. Name of any suitable book;
  2. List of all suitable books;
  3. Empty set.
Task 1. Find any book by rule

-- The first solution method: using TOP
SELECT TOP 1 [b_name],
             [b_quantity]
FROM [books]
ORDER BY [b_quantity] DESC

-- The second solution method: using FETCH NEXT
SELECT [b_name],
       [b_quantity]
FROM [books]
ORDER BY [b_quantity] DESC
OFFSET 0 ROWS
FETCH NEXT 1 ROWS ONLY

Task 2. Find all books by rule

-- The first solution method: using MAX
SELECT [b_name],
       [b_quantity]
FROM [books]
WHERE [b_quantity] = (SELECT MAX([b_quantity])
                      FROM [books])

-- The second solution method: using RANK
SELECT [b_name],
       [b_quantity]
FROM (SELECT [b_name],
             [b_quantity],
             RANK() OVER (ORDER BY [b_quantity] DESC) AS [rn]
      FROM [books]) AS [temporary_data]
WHERE [rn] = 1

MySQL and MS SQL Server require explicit naming of the subquery that is the data source, but Oracle does not.

The RANK function allows you to rank the rows of the sample (i.e., place them in the 1st, 2nd, 3rd, 4th, and so on places) according to the specified condition (in our case, in descending order of the number of book copies). Books with the same number of copies will take the same places, and the books with the maximum number of copies will be in the first place. It remains only to show these books, which took first place.

Task 3. Find unique book by rule

-- The first solution method: using ALL and subquery
SELECT [b_name],
       [b_quantity]
FROM [books] AS [ext]
WHERE [b_quantity] > ALL (SELECT [b_quantity]
                          FROM [books] AS [int]
                          WHERE [ext].[b_id] != [int].[b_id])

-- The second solution method: using table expression and RANK
WITH [ranked]
     AS (SELECT [b_name],
                [b_quantity],
                    RANK()
                        OVER (
                            ORDER BY [b_quantity] DESC) AS [rank]
         FROM [books]),
     [counted]
     AS (SELECT [rank],
                COUNT(*) AS [competitors]
         FROM [ranked]
         GROUP BY [rank])
SELECT [b_name],
       [b_quantity]
FROM [ranked]
     JOIN [counted]
        ON [ranked].[rank] = [counted].[rank]
WHERE [counted].[rank] = 1
      AND [counted].[competitors] = 1

The same books table appears in the query twice, under the name ext (for the outer part of the query) and int (for the inner part of the query). This is necessary so that the DBMS can apply the selection condition presented in line 6: for each row of the ext table, select the value of the b_quantity field from all rows of the int table, except for the row that is currently considered in the ext table.

The idea behind the second option is to avoid of correlating subqueries. To do this, in lines 1-7 data is prepared with the ranking of books by the number of their copies, in lines 8-12 in the general table expression counted the number of books that occupied the same place, and in the main part of the query in lines 13-19, the received data is combined with the filter "should be the first place, and there should be only one book in the first place".

Median times after 100 query executions are:

Anyway, for any of the DBMS we get faster execution of the query built on ranking:

SELECT * FROM [subscriptions]
    JOIN (SELECT [sb_id],
             ROW_NUMBER()
                OVER (
                    PARTITION BY [sb_subscriber]
                    ORDER BY [sb_start] ASC) AS [visit]
      FROM [subscriptions]) AS [prepared]
    ON [subscriptions].[sb_id] = [prepared].[sb_id]
WHERE [visit] = 1

Комментарии

Популярные сообщения из этого блога

MapBox-compatible file formats and limitations