Explain Codes LogoExplain Codes Logo

Activerecord find_each combined with limit and order

ruby
batching
query-optimization
active-record
Alex KataevbyAlex Kataev·Oct 2, 2024
TLDR

Want to apply limit and specific order to batches with find_each? Chain where, reorder, and limit like so:

Model.where('created_at > ?', 2.days.ago).reorder('created_at DESC').limit(1000).find_each do |item| # the magic happens here # remember, Rome wasn't built in a day, but this record was fetched in a second end

Note: By default, find_each orders by primary key; reorder overwrites this within the set limit. Nifty, isn't it? But remember, order continuity outside the limit isn't guaranteed.

Keys to efficient query, chunking, ordering, and limiting

Write home about Rails 6.1

Rails 6.1 enhanced find_each for descending order. Take advantage of this if you're on Rails 6.1 or higher.

When order matters - DIY batching

For earlier Rails versions, or more complex ordering, craft a custom batching method that leans on plucking ID caches and processes batches in original order. pluck those IDs right out of memory.

Mindful mixins

Beware of ActiveRecord objectives mutating during batch operations. To circumvent this, maintain order in memory.

Dodging duplicates

Sorting on non-unique columns? Keep an eye out for duplicate records. Implement a strategy to dodge duplicates in your custom methods.

Query: to batch, to order, and to limit

Caching IDs: a speedy solution

For large datasets, prioritize speed by caching IDs and then processing records in batches. This can optimize both memory usage and query efficiency.

FIELD() if you're in the MySQL club

If you're working with MySQL, consider using the FIELD() function to maintain order. Combine this with an ID cache to drive sorting efficiency into high gear.

Plucking away

Use pluck to quickly retrieve specific attributes from the database. This can reduce your memory footprint and give your processing speed a caffeine kick.

When enough is enough

Inside a custom loop, utilize yield to process precise numbers of records. Keep your sorter hat on and ensure the loop makes a swift exit when it hits the limit.

Truth of trade-offs: Efficient Queries vs. Memory Management

Efficient offsetting

Consider using order and offset in a loop. This allows you to maintain control over the record sequence. It's like having your own personal record conveyor belt.

Memory trade-offs: the cost of caching

Caching all IDs may speed things up, but it'll cost you in memory usage. Weigh the trade-offs before adopting this approach.

Large datasets? Divide and conquer

When processing large amounts of data, divide your query into manageable chunks for processing. Keep your eye on sort order whilst being efficient with memory.