Getting random row through SQLAlchemy

python

performance

sqlalchemy

database-optimization

byAnton Shumikhin·Nov 27, 2024

Retrieve a random row with SQLAlchemy by ordering your query with func.random() (for PostgreSQL or SQLite) or func.rand() (for MySQL), and then selecting the first result.

Example for PostgreSQL:

# calling all PostgreSQL users, time for a (random) roll-call!
random_row = session.query(MyModel).order_by(func.random()).first()

Example for MySQL/SQLite:

# MySQL/SQLite users, let's roll the dice!
random_row = session.query(MyModel).order_by(func.rand()).first()

Simply replace MyModel with your table class, and session with your SQLAlchemy session. This will yield a single row, selected randomly from your table.

Navigating performance complications

Impact of the `random()` function on performance

order_by(func.random()) can potentially strain your server when dealing with large databases, as the database needs to attach a random value to each row before sorting and selecting. This performance challenge is particularly significant in PostgreSQL.

Reducing performance complications

To minimize this strain, consider filtering down the dataset before using random(), or use SQLA's index-based selection. You might also consider loading only necessary columns using load_only(), which can make the processes more efficient and faster.

Advanced selection options for PostgreSQL users

PostgreSQL provides the tablesample() method, based on SQL:2003 TABLESAMPLE, which can improve performance by sampling a fraction of the table. This feature is supported in SQLAlchemy from version 1.1.

# Optimum Performance Level: PostgreSQL
from sqlalchemy import func
random_row = session.query(MyModel).options(func.tablesample('BERNOULLI', 10)).limit(1).one()

tablesample() delivers a significant performance boost as it operates on a sampled subset of data. Remember, bigger isn't always better!

Database-independent selection strategy

func.count() and offset() provide a generic solution that works across databases and scales with the data.

# There's no place like home (home being your dataset)
row_count = session.query(func.count('*')).select_from(MyModel).scalar()
random_index = random.randrange(0, row_count)
random_row = session.query(MyModel).offset(random_index).limit(1).one()

When you should look beyond `func.random()`

Tailored approaches per database

While random functions are handy, consider exploring your specific database's optimizations for selecting random rows. Institutional knowledge is power!

Testing and optimization

Before settling on your final implementation, make sure to benchmark test your solution. This ensures that the chosen method is both efficient and delivers the level of randomness you desire. Because in code, as in life, balance is key.

explain-codes / Python / Getting random row through SQLAlchemy

Linked

Sqlite - ORDER BY RAND()



Select n random rows from SQL Server table



How to pull a random record using Django's ORM?



Finding and Deleting Duplicate Rows in PostgreSQL



Grouped LIMIT in PostgreSQL: Show the first N rows for each group?

