
How do I split a list into equally-sized chunks?

by Anton Shumikhin · Aug 27, 2024
TLDR

To chunk a list lst into equal parts of size n, use:

chunks = [lst[i:i + n] for i in range(0, len(lst), n)]

Replace n with your preferred chunk size and lst with your list. You'll get a list of smaller lists, one per chunk. If len(lst) is not a multiple of n, the last chunk is simply shorter.
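As a quick check of the one-liner, here is a small sketch with made-up sample values (a ten-item `lst` and `n = 3` are chosen purely for illustration). It shows the shorter final chunk when the length is not a multiple of `n`:

```python
lst = list(range(10))  # sample data, chosen only for illustration
n = 3                  # chunk size

chunks = [lst[i:i + n] for i in range(0, len(lst), n)]
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```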

Key advice when chunking lists

When you're splitting big lists, consider memory use: the list comprehension above builds every chunk up front. A generator yields one chunk at a time, so it is often a better fit:

def chunks(lst, n):
    """Yield Mister Anderson... sorry, yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
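To illustrate the on-demand behaviour, the sketch below repeats the generator so it runs on its own and pulls chunks one at a time with `next`. The sample data and the variable names `gen`, `first`, and `rest` are illustrative only:

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

data = list(range(10))  # illustrative sample data
gen = chunks(data, 4)   # nothing is sliced yet

first = next(gen)       # only now is the first chunk created
print(first)            # [0, 1, 2, 3]

rest = list(gen)        # consume the remaining chunks
print(rest)             # [[4, 5, 6, 7], [8, 9]]
```

Because each chunk is built only when requested, a loop such as `for chunk in chunks(data, 4): ...` keeps just one chunk alive at a time, in addition to the original list.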

Adapting to the Python version

Python 3's range is lazy, like Python 2's xrange. If you still have to support Python 2, use xrange there so the full list of indices isn't built in memory:

chunks = [lst[i:i + n] for i in range(0, len(lst), n)]   # Python 3
chunks = [lst[i:i + n] for i in xrange(0, len(lst), n)]  # Python 2

Avoid exceptions when splitting

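Ensure n > 0. With n = 0, range() raises ValueError because its step must not be zero, and a hand-rolled while loop that advances by n would never terminate. With a negative n, the range is empty and you silently get no chunks at all.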
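One way to enforce the `n > 0` rule is to validate the chunk size before slicing. The wrapper below is a minimal sketch; the name `safe_chunks` and the error message are hypothetical, not part of any library:

```python
def safe_chunks(lst, n):
    """Return a list of n-sized chunks, rejecting invalid chunk sizes early."""
    # Hypothetical helper: fail loudly instead of returning nothing for n < 0.
    if not isinstance(n, int) or n < 1:
        raise ValueError(f"chunk size must be a positive integer, got {n!r}")
    return [lst[i:i + n] for i in range(0, len(lst), n)]

print(safe_chunks([1, 2, 3], 2))  # [[1, 2], [3]]

try:
    safe_chunks([1, 2, 3], 0)
except ValueError as exc:
    print("rejected:", exc)
```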

Opting for numpy when needed

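If you're working with numerical data or large arrays, consider using NumPy. Note that np.array_split treats an integer second argument as the number of sections, not the chunk size. To get chunks of size n, pass the split indices instead:

import numpy as np
arr = np.array(lst)
array_chunks = np.array_split(arr, np.arange(n, len(arr), n))  # split at n, 2n, ...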
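The difference between splitting into a number of sections and splitting into chunks of a given size is easy to miss, so here is a small sketch on sample data. The names `by_count` and `by_size` are illustrative, and NumPy is assumed to be installed:

```python
import numpy as np

arr = np.arange(10)  # sample data: 0..9

# An integer means "this many sections", with sizes as even as possible.
by_count = np.array_split(arr, 3)
print([c.tolist() for c in by_count])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# A sequence of indices means "cut here", which gives chunks of size 3.
by_size = np.array_split(arr, np.arange(3, len(arr), 3))
print([c.tolist() for c in by_size])   # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```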

Unleashing itertools with Python 3.12 and above

For those using Python 3.12 or later, itertools.batched provides built-in chunking. It accepts any iterable and yields tuples rather than lists, with a shorter final tuple if the items don't divide evenly:

import itertools
chunks = list(itertools.batched(lst, n))  # "To batch, or not to batch, that is the question."
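If the same code has to run on interpreters older than 3.12, one option is to fall back to a small stand-in. The fallback below is a sketch in the spirit of the standard function, not the library's own implementation, and the variable name `result` is illustrative:

```python
from itertools import islice

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Fallback sketch: yield tuples of up to n items from iterable."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        # Keep pulling n items until islice returns nothing.
        while batch := tuple(islice(it, n)):
            yield batch

result = list(batched("ABCDEFG", 3))
print(result)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```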

If every chunk must have exactly n items, use itertools.zip_longest to pad the incomplete final chunk. Each chunk comes back as a tuple:

from itertools import zip_longest
chunks = list(zip_longest(*[iter(lst)] * n, fillvalue=None))  # Added fillvalue can be a lifesaver. Or not.
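The `zip_longest` idiom works because `[iter(lst)] * n` holds `n` references to the same iterator, so each output tuple consumes `n` consecutive items. A small sketch with illustrative sample values:

```python
from itertools import zip_longest

lst = [1, 2, 3, 4, 5, 6, 7]  # sample data
n = 3

# n references to ONE iterator: zip_longest pulls from it n times per tuple.
chunks = list(zip_longest(*[iter(lst)] * n, fillvalue=None))
print(chunks)  # [(1, 2, 3), (4, 5, 6), (7, None, None)]
```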

Crafting chunks from custom iterables

For more flexible chunking, you could combine iter and islice:

from itertools import islice

def gourmet_chunks(iterable, size):
    it = iter(iterable)  # Not to be confused with IT (😂).
    chunk = list(islice(it, size))
    while chunk:
        yield chunk
        chunk = list(islice(it, size))

This function handles any iterable, including generators and files, because it never calls len() or slices the input. It produces chunks on demand and holds only the current chunk in memory.
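To show the function working on input that has no length and cannot be sliced, the sketch below repeats it so the example runs on its own and feeds it a generator expression. The names `squares` and `result` are illustrative:

```python
from itertools import islice

def gourmet_chunks(iterable, size):
    it = iter(iterable)
    chunk = list(islice(it, size))
    while chunk:
        yield chunk
        chunk = list(islice(it, size))

squares = (x * x for x in range(7))  # a generator: no len(), no slicing
result = list(gourmet_chunks(squares, 3))
print(result)  # [[0, 1, 4], [9, 16, 25], [36]]
```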

Expert touches and pitfalls

Padded chunking with chain and repeat

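You could use chain and repeat from itertools to pad the last chunk to a consistent size. This version builds on gourmet_chunks above, because chunks needs len() and an endless repeat() has to be cut off with islice:

from itertools import chain, islice, repeat

def padded_chunks(iterable, size, padding):
    # Repeat after me: padding can be your best friend.
    for chunk in gourmet_chunks(iterable, size):
        # Top up a short final chunk to exactly `size` items.
        yield list(islice(chain(chunk, repeat(padding)), size))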

Watch out for padding value conflicts

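Keep an eye out for padding value conflicts. If your fill value can also appear in the real data (None is a common culprit), you can no longer tell padding from content. A unique sentinel object avoids the clash.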

Tailoring chunking to the data

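Understand your input data. Slicing needs a sequence with a known length, generators and files call for the islice approach, and NumPy works best with homogeneous numeric data, so the type and length of the input should drive your choice of technique.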