SQL COUNT function performance evaluation concerning different column types

The COUNT function can be used in the following five formats:

COUNT(*) - the classic version used to count the number of records;
COUNT(1) – alternative record of the classic version;
COUNT(primary_key) - alternative record of the classic version;
COUNT(field) — count of records, in the specified field of which there are no NULL-values;
COUNT(DISTINCT field) — counting without repeating records that do not have NULL values in the specified field.

The test_counts table contains the following fields:

id — auto-incrementing primary key (number);
fni — field without index (number or NULL);
fwi — field with index (number or NULL);
fni_nn — field without index and NULLs (number);
fwi_nn — field with index without NULLs (number).

The essence of the investigation is to sequentially add a thousand records to the table (from zero to ten million in increments of a thousand) and execute the following seven queries with COUNT after adding each tens of thousands of records:

-- Case 1: COUNT(*)
SELECT COUNT(*)
FROM [test_counts]

-- Case 2: COUNT(primary_key) SELECT COUNT([id]) FROM [test_counts]

-- Case 3: COUNT(1) SELECT COUNT(1) FROM [test_counts]

-- Case 4: COUNT(field_without_index) SELECT COUNT([fni]) FROM [test_counts]

-- Case 5: COUNT(field_with_index) SELECT COUNT([fwi]) FROM [test_counts]

-- Case 6: COUNT(DISTINCT field_without_index) SELECT COUNT(DISTINCT [fni]) FROM [test_counts]

-- Case 7: COUNT(DISTINCT field_with_index) SELECT COUNT(DISTINCT [fwi]) FROM [test_counts]

As can be seen from the graph, even with such a relatively small amount of data, all three DBMS showed a drop in the performance of the insert operation towards the end of the experiment. This effect was most pronounced in Oracle. MySQL showed the shortest time to complete the operation throughout the experiment (also MySQL results were the most stable).

MySQL

MS SQL

Oracle

For MySQL, the notion that COUNT(*) is the slowest is confirmed, but the COUNT(field_without_index) is unexpectedly the fastest. However, it is worth noting that on a smaller amount of data (up to a million records), the situation turns out to be completely different - COUNT (*) works the fastest.

MS SQL Server showed the expected results: COUNT(primary_key) was the fastest, COUNT(DISTINCT field_without) was the slowest. For smaller amounts of data, this result does not change.

Oracle results: slow work of COUNT(*) was confirmed, but all other queries showed an amazing result, very high stability of this result and complete independence from the amount of data being analyzed.

The general conclusion is to use COUNT(1) as, on average, one of the fastest options for different DBMS and different data volumes, because other options are sometimes faster, but sometimes slower.

Поиск по этому блогу

MEAL officer learning log

SQL COUNT function performance evaluation concerning different column types

Комментарии

Отправить комментарий

Популярные сообщения из этого блога

Today's activity report #20

Tips to increase the likelihood of correct geo-coding

Data Table Filters within Charts in Power BI