How to make SELECT COUNT(*) requests very fast

17 July 2020

When you run SELECT COUNT(*), how quickly the result comes back depends largely on the structure and settings of the database. Let's explore this using the Votes table in the Stack Overflow database (the 300 GB 2018-06 version), where the Votes table contains 150,784,380 rows and takes up about 5.3 GB of space.

I am going to make 3 measurements for each method:

  • How many pages it reads (with SET STATISTICS IO ON enabled).
  • How much CPU time it uses (with SET STATISTICS TIME ON).
  • How long it takes to run.
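
The three measurements can be collected in a single session roughly as follows. This is a minimal sketch: it assumes a client such as SSMS that shows the Messages output, which is where SQL Server prints logical reads, CPU time, and elapsed time.

```sql
-- Report page reads (logical reads) per table for each statement.
SET STATISTICS IO ON;
-- Report CPU time and elapsed time for each statement.
SET STATISTICS TIME ON;
GO

SELECT COUNT(*) FROM dbo.Votes;
GO

-- Turn the extra output back off when finished.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Running the query twice and reading the second set of numbers keeps the comparison on cached data, which matches how the results in this post are reported.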

Do not dwell on small differences between the numbers. I am writing this post to show the big-picture differences and how I think about comparing the different approaches. In your environment, with different hardware, versions, and so on, you may get different results, and that is fine.

There are also other dimensions to these methods, depending on your own performance requirements: memory grants, the ability to run without blocking, and even the accuracy of the results under concurrent queries. For the purposes of these tests, I will not talk about isolation levels or blocking.

I ran these tests on SQL Server 2019 (15.0.2070.41) on a virtual machine with 8 cores and 64 GB of RAM.

Plain COUNT(*) with only a clustered rowstore index, compatibility level 2017 and earlier

ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140;
GO
SELECT COUNT(*) FROM dbo.Votes;
GO

 

The Votes table is only about 5.3 GB in size, so my SQL Server can cache the whole thing in memory. Even after the query has run once and the data is cached in RAM, it is still not fast:

  • Pages read: 694389
  • CPU time: 14.5 seconds
  • Duration: 2 seconds

Compatibility level 2019 (batch mode on rowstore indexes)

ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;
GO
SELECT COUNT(*) FROM dbo.Votes;
GO

 

SQL Server 2019 introduces batch mode operations on rowstore indexes, something that was previously available only with columnstore indexes. The benefits here are quite large, even though we are still dealing only with rowstore indexes:

  • Pages read: 694379
  • CPU time: 5.2 seconds
  • Duration: 0.7 seconds

The sharp drop in CPU time is due to batch mode. This is not obvious in the execution plan until you hover over individual operators.


Batch mode is a great fit for reporting-style queries that aggregate large amounts of data.
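
To check that batch mode, rather than something else in compatibility level 150, accounts for the CPU savings, one option is to switch it off for a single query and compare the numbers. The following is a sketch under assumptions: it presumes SQL Server 2019 with the database already at compatibility level 150, and neither the DISALLOW_BATCH_MODE hint nor the BATCH_MODE_ON_ROWSTORE setting is part of the tests in this post.

```sql
-- Same count, but with batch mode disallowed for this one query only.
SELECT COUNT(*) FROM dbo.Votes
OPTION (USE HINT('DISALLOW_BATCH_MODE'));
GO

-- Alternatively, turn batch mode on rowstore off for the whole database
-- while staying at compatibility level 150...
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = OFF;
GO
SELECT COUNT(*) FROM dbo.Votes;
GO
-- ...and turn it back on afterwards (ON is the default at level 150).
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = ON;
GO
```

The query hint is the less intrusive of the two, since it affects only the statement it is attached to.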

Adding nonclustered rowstore indexes, but using compatibility level 2017 and earlier

I’m going to create an index on each of the 5 columns in the dbo.Votes table and then compare their sizes using sp_BlitzIndex:

CREATE INDEX IX_PostId ON dbo.Votes(PostId);
GO
CREATE INDEX IX_UserId ON dbo.Votes(UserId);
GO
CREATE INDEX IX_BountyAmount ON dbo.Votes(BountyAmount);
GO
CREATE INDEX IX_VoteTypeId ON dbo.Votes(VoteTypeId);
GO
CREATE INDEX IX_CreationDate ON dbo.Votes(CreationDate);
GO
sp_BlitzIndex @TableName = 'Votes';
GO

 

Compare the number of rows in each index with its size. When SQL Server needs to count the rows in a table, it is smart enough to work out which object is the smallest and then scan that one.

Indexes can have different sizes depending on the data types of the indexed columns, the size of each row's contents, the number of NULL values, and so on.
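
If sp_BlitzIndex is not installed, a similar size comparison can be sketched with the built-in catalog views. This is an illustrative alternative, not the method used in this post; the column aliases are arbitrary.

```sql
-- Row count and size of every index on dbo.Votes, smallest first.
SELECT
    i.name                                AS index_name,
    i.type_desc                           AS index_type,
    SUM(ps.row_count)                     AS row_count,
    SUM(ps.used_page_count)               AS used_pages,
    SUM(ps.used_page_count) * 8 / 1024.0  AS used_mb   -- pages are 8 KB each
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.Votes')
GROUP BY i.name, i.type_desc
ORDER BY used_pages;
GO
```

Every index should report the same row count, while the page counts differ; the index at the top of the list is the one a plain COUNT(*) is most likely to scan.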


I will go back to compatibility level 2017 (which removes batch mode on rowstore) and then run the count again:

ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140;
GO
SELECT COUNT(*) FROM dbo.Votes;
GO

 

SQL Server chooses the BountyAmount index, one of the smaller ones at about 2 GB.


We save by reading fewer pages, but we still read the same number of rows (about 150 million), so the CPU time and duration do not really change:

  • Pages read: 263322
  • CPU time: 14.8 seconds
  • Duration: 2 seconds
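
The page counts line up with the object sizes mentioned earlier. As a rough worked check, assuming each logical read corresponds to one 8 KB page touched by a single full scan:

```sql
-- Convert pages read into gigabytes (8 KB per page).
SELECT
    694389 * 8 / 1024.0 / 1024.0 AS clustered_index_gb,     -- pages read with only the clustered index
    263322 * 8 / 1024.0 / 1024.0 AS bountyamount_index_gb;  -- pages read with IX_BountyAmount
GO
```

That works out to about 5.3 GB for the clustered index scan and about 2.0 GB for the BountyAmount index scan, which is why the second query reads so much less while still touching every row.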

If you want to reduce the CPU time and duration, you really need a different approach to counting – and batch mode operation will help.

2019's batch mode with nonclustered rowstore indexes

So, let’s now test the batch mode operation with the available indexes:

ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;
GO
SELECT COUNT(*) FROM dbo.Votes;
GO

 

Here we still use the BountyAmount index and do the same number of reads as in the previous test with nonclustered indexes, but we get the lower CPU time and duration that batch mode gave us in the earlier compatibility level 2019 test:

  • Pages read: 263322
  • CPU time: 4.3 seconds
  • Duration: 0.6 seconds

So far, this is the winner. But remember that batch mode was originally introduced for columnstore indexes, which are excellent tools for reporting-style queries.

Nonclustered columnstore index with batch mode

I am intentionally running in 2017 compatibility mode here to make it clear where the improvement comes from:

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_BountyAmount ON dbo.Votes(BountyAmount);
GO
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140;
GO
SELECT COUNT(*) FROM dbo.Votes;
GO

 

The execution plan contains a scan operator for our new columnstore index, and all operators in the plan run in batch mode.


This is where I have to change the units of measurement:

  • Pages read: 73922
  • CPU time: 15 milliseconds
  • Duration: 21 milliseconds

To put that in perspective: I know some developers who try to use system tables to count rows quickly, but even they cannot match this kind of speed.
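
For reference, a system-table count typically looks something like the sketch below. The post does not say which system tables those developers use, so the choice of sys.partitions here is an assumption; note also that the rows column is documented as approximate.

```sql
-- Metadata-based row count: reads catalog data instead of scanning the table.
SELECT SUM(p.rows) AS approx_row_count
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.Votes')
  AND p.index_id IN (0, 1);   -- 0 = heap, 1 = clustered index (avoids double counting)
GO
```

Unlike SELECT COUNT(*), this form cannot take a WHERE clause on the table's own columns, so it only helps when a whole-table count is enough.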

What you need to do to get SELECT COUNT(*) queries done quickly

In descending order of preference and speed, the best results first:

  • Have SQL Server 2017 or newer, and build a columnstore index on the table.
  • Have any version that supports batch mode on columnstore indexes, and build a columnstore index on the table, although your experience will vary greatly depending on the type of query. To learn the specifics, read about columnstore indexes, especially Niko's articles that have the word “batch” in the title.
  • Have SQL Server 2019 or newer and set the compatibility level to 150 (2019), even with rowstore indexes. You can still cut CPU usage significantly thanks to batch mode on rowstore. This one is really easy to do, since it probably requires minimal changes to your application and database schema, although you will not get the amazingly fast millisecond responses that a columnstore index can provide.
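
Before choosing among these options, it can help to check what the server and database already have. A minimal sketch, assuming it is run in the database that holds dbo.Votes:

```sql
-- Server version: major version 14 is SQL Server 2017, 15 is SQL Server 2019.
SELECT
    SERVERPROPERTY('ProductVersion')      AS product_version,
    SERVERPROPERTY('ProductMajorVersion') AS major_version;

-- Compatibility level of the current database (150 enables batch mode on rowstore).
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Any columnstore indexes already on the table
-- (type 5 = clustered columnstore, 6 = nonclustered columnstore).
SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.Votes')
  AND i.type IN (5, 6);
GO
```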
 