The correct formulation of a task is important

For an example, let's look at simple task: to show the book with max number of copies in the library. It's a tricky exercise and it doesn't have the correct solution.

In reality, we have three different variants of the task:

Show any book, which has a max number of copies;
Show all books, which have a max number of copies;
Show the book, which has more copies than any other book.

If there is just the one book with a max number of copies we get the same result for each SELECT query.

If there is just the one book with a max number of copies we get the same result for each SELECT query. But if there are two or more books, we get three different results:

Name of any suitable book;
List of all suitable books;
Empty set.

Task 1. Find any book by rule


-- The first solution method: using TOP

SELECT TOP 1 [b_name],

             [b_quantity]

FROM [books]

ORDER BY [b_quantity] DESC
-- The second solution method: using FETCH NEXT

SELECT [b_name],

       [b_quantity]

FROM [books]

ORDER BY [b_quantity] DESC

OFFSET 0 ROWS

FETCH NEXT 1 ROWS ONLY

Task 2. Find all books by rule


-- The first solution method: using MAX

SELECT [b_name],

       [b_quantity]

FROM [books]

WHERE [b_quantity] = (SELECT MAX([b_quantity])

                      FROM [books])

-- The second solution method: using RANK

SELECT [b_name],

       [b_quantity]

FROM (SELECT [b_name],

             [b_quantity],

             RANK() OVER (ORDER BY [b_quantity] DESC) AS [rn]

      FROM [books]) AS [temporary_data]

WHERE [rn] = 1

MySQL and MS SQL Server require explicit naming of the subquery that is the data source, but Oracle does not.

The RANK function allows you to rank the rows of the sample (i.e., place them in the 1st, 2nd, 3rd, 4th, and so on places) according to the specified condition (in our case, in descending order of the number of book copies). Books with the same number of copies will take the same places, and the books with the maximum number of copies will be in the first place. It remains only to show these books, which took first place.

Task 3. Find unique book by rule


-- The first solution method: using ALL and subquery

SELECT [b_name],

       [b_quantity]

FROM [books] AS [ext]

WHERE [b_quantity] > ALL (SELECT [b_quantity]

                          FROM [books] AS [int]

                          WHERE [ext].[b_id] != [int].[b_id])

-- The second solution method: using table expression and RANK

WITH [ranked]

     AS (SELECT [b_name],

                [b_quantity],

                    RANK()

                        OVER (

                            ORDER BY [b_quantity] DESC) AS [rank]

         FROM [books]),

     [counted]

     AS (SELECT [rank],

                COUNT(*) AS [competitors]

         FROM [ranked]

         GROUP BY [rank])

SELECT [b_name],

       [b_quantity]

FROM [ranked]

     JOIN [counted]

        ON [ranked].[rank] = [counted].[rank]

WHERE [counted].[rank] = 1

      AND [counted].[competitors] = 1

The same books table appears in the query twice, under the name ext (for the outer part of the query) and int (for the inner part of the query). This is necessary so that the DBMS can apply the selection condition presented in line 6: for each row of the ext table, select the value of the b_quantity field from all rows of the int table, except for the row that is currently considered in the ext table.

The idea behind the second option is to avoid of correlating subqueries. To do this, in lines 1-7 data is prepared with the ranking of books by the number of their copies, in lines 8-12 in the general table expression counted the number of books that occupied the same place, and in the main part of the query in lines 13-19, the received data is combined with the filter "should be the first place, and there should be only one book in the first place".

Median times after 100 query executions are:

Anyway, for any of the DBMS we get faster execution of the query built on ranking:


SELECT * FROM [subscriptions]

    JOIN (SELECT [sb_id],

             ROW_NUMBER()

                OVER (

                    PARTITION BY [sb_subscriber]

                    ORDER BY [sb_start] ASC) AS [visit]

      FROM [subscriptions]) AS [prepared]

    ON [subscriptions].[sb_id] = [prepared].[sb_id]

WHERE [visit] = 1

Поиск по этому блогу

MEAL officer learning log

The correct formulation of a task is important

Комментарии

Отправить комментарий

Популярные сообщения из этого блога

Today's activity report #20

Data Table Filters within Charts in Power BI

Beneficiaries and Disiribution database