Explain Codes LogoExplain Codes Logo

Fastest way to list all primes below N

python
algorithm
optimization
numpy
Anton ShumikhinbyAnton Shumikhin·Dec 20, 2024
TLDR

The Sieve of Eratosthenes is your quickest bet. It uses a boolean array where non-primes are marked False. Primes are then appended to your list:

def get_primes(n): sieve, primes = [True] * n, [] for p in range(2, n): if sieve[p]: primes.append(p) for i in range(p*p, n, p): sieve[i] = False return primes

Call get_primes(N) to swiftly retrieve primes below N.

Breaking down the algorithm

The Sieve of Eratosthenes is a simple and efficient way to list primes up to N. It works by iteratively marking multiples of a number as non-prime (False), starting from 2. With each subsequent pass, we skip multiples of already discovered primes.

Cutting computational memory by half

A valid update is ignoring even numbers after 2. No other even number can be prime, hence this tweak reduces memory usage and computational load by roughly half:

def get_primes.optimized(n): sieve = [True if i%2 != 0 else False for i in range(n)] primes = [] for p in range(2, n): if sieve[p]: primes.append(p) for i in range(p*2, n, p): sieve[i] = False # Remove the "evil" non-primes. return primes

The numpy magic carpet

Numpy can speed up prime generation due to its superpower: optimised array operations. By using numpy arrays, we can achieve hardware acceleration for bulk operations, resulting in a significant speed improvement over Python's built-in lists:

import numpy def get_primes_numpy(n): sieve = numpy.ones(n//2, dtype=bool) for i in range(3, int(n**0.5)+1, 2): if sieve[i//2]: sieve[i*i//2::i] = False # "Avada Kedavra" to non-primes Voldemort loves. return numpy.r_[2, 2*numpy.nonzero(sieve)[0][1::]+1] # (2 included as a special case)

Other contenders for the throne

While the Sieve of Eratosthenes is a classic, other techniques like the Sieve of Atkin and Sundaram's Sieve offer their own benefits.

Sieve of Atkin: the big brother

More complex but optimizegable for extremely large N, the Sieve of Atkin uses complex formulations to skip irrelevant numbers. However, it can feel like you're reading Fourier's work instead of generating primes.

Sundaram's Sieve: the lit dark horse

Sundaram's Sieve is an elegant solution, which operates by depreciating numbers from the list that follow the form i+j+2*i*j for all i, j:

def sundaram3(max_n): numbers = list(range(3, max_n+1, 2)) half = (max_n)//2 initial = 4 for step in range(3, max_n+1, 2): for i in range(initial, half, step): numbers[i] = 0 initial += 2*(step+1) if initial > half: return [2]+[x for x in numbers if x]

Think of it as a horse race, where the algorithm you choose depends on the specific quirks of your track (use-case).

Making a choice

When deciding, consider the trade-offs. For smaller tasks, the simplicity of the Sieve of Eratosthenes or Sundaram's Sieve rule. For huge datasets, the Sieve of Atkin's complexity pays off. Always cross-verify the correctness of the generated primes, and remember - you're the one wearing the sorting hat here.