Select n random rows from SQL Server table
For a rapid-fire n random records from a table, use the NEWID()
function in conjunction with ORDER BY
like so:
Inject n
with your desired row count and YourTable
with your actual table name to get a new, reordered result set each time the query executes. Simple and effective for small to mid-sized tables, it's an easy win when performance isn't a primary focus.
Performance nuances for larger datasets
When your database scales up, ORDER BY NEWID()
might stumble. SQL Server will have to allocate a new GUID for each row and then reorder them. This operation can be taxing both in terms of time and resources. In situations like these, you would be better off considering more performance-oriented alternatives.
Using the TABLESAMPLE clause with large volumes of data
For massive datasets that require random sampling, the TABLESAMPLE
clause is an appealing option:
The TABLESAMPLE
clause pulls a scientifically valid random sample by selecting pages instead of rows, which contributes to a significant boost in speed for larger tables. However, take note that TABLESAMPLE
is a bit rough around the edges, it may not always return precisely 10% of rows, especially when dealing with smaller datasets.
Results fine-tuning with advanced randomization techniques
Even distribution is key, especially when filtering by a certain percentage. You can leverage the ABS
, CAST
, and BINARY_CHECKSUM
functions for this:
This formula uses checksums to each row a random disposition based on their content, aided by RAND()
to ensure the distribution retains a uniform spread across the required percentage.
Bias & randomness concerns in TABLESAMPLE
TABLESAMPLE
might introduce an element of bias in row selection - its default behavior is to retrieve random pages of data, not rows. This means a completely uniform distribution of rows is not always guaranteed.
Getting your hands dirty with RAND()
Including a seed value in RAND()
will result in the same sequence of numbers generated for a given seed in a single session. To generate random float values, employ RAND((CAST(NEWID() AS binary(4))))
which blends the randomness of NEWID()
with the numerical output of RAND()
.
Battle testing your query performance
Use SQL Server's in-built tools to measure the efficiency of your query and how it stacks up against others:
This sequence will spit out runtime and I/O statistics, and help you make an informed decision on the most performant method. Did someone say data-driven?
Targeting an exact number of rows
If your requirement revolves around retrieving a certain number of rows precisely, merging TABLESAMPLE
with common table expressions (CTEs) or subquery approaches will be ideal. This is especially necessary when you need a proportional representation from groups within the table.
Hybrid approach with a temporary filter table
Construct a temporary table with randomly assigned NEWID()
values, limit this to your required count, and join it back to the original table.
This methodology ensures we select a static number of rows and takes advantage of the intentional randomness NEWID()
brings to the table, without experiencing the drag on performance that sorting the complete table brings.
Beware of potential pitfalls
Care for larger datasets
When running queries on large datasets, ORDER BY NEWID()
is not your friend. Using it with huge datasets will tank your performance and response time. Always benchmark your query's performance on a dataset that closely mirrors your production data to keep frustratingly long runtimes at bay.
Ensuring randomness
When you're after a truly random sample, sidestep any unintentional patterns. For instance, when working with BINARY_CHECKSUM
, include enough variance in the columns used for the expression to minimize repeatable patterns.
Scientific-grade randomness
If you need scientific-grade randomness and you're considering BINARY_CHECKSUM
and RAND
for a significant application, it's prudent to consult with a statistician to verify your methodology. Someone might just thank you for it!
Was this article helpful?