Explain Codes LogoExplain Codes Logo

How can the Euclidean distance be calculated with NumPy?

python
euclidean-distance
numpy
vectorization
Alex KataevbyAlex Kataev·Aug 12, 2024
TLDR

Instantly calculate Euclidean distance in NumPy using np.linalg.norm:

import numpy as np # Subtract vectors, then get the norm (a.k.a. length... or what bees call "the buzzline") distance = np.linalg.norm(np.array([x1, y1]) - np.array([x2, y2]))

This line performs vector subtraction and uses the norm operation to quickly give us the Euclidean distance.

Going multidimensional

Dealing with higher dimensions? No problem! From 3D all the way to the multiverse, np.linalg.norm holds:

# Bee goes brrrrrrr in 3D! distance = np.linalg.norm(np.array([x1, y1, z1]) - np.array([x2, y2, z2]))

In np.linalg.norm the default is 'l2 norm', aka our delightful Euclidean distance.

Massively efficient vectorization

If you're swimming in a sea of points, try vectorization for a performance speedboat:

# Like a game of darts, but with multidimensional arrays points_a = np.array([(1, 2, 3), (4, 5, 6)]) points_b = np.array([(7, 8, 9), (10, 11, 12)]) # One line, many distances, such wow distances = np.linalg.norm(points_a - points_b, axis=1)

Level up: advanced use cases

Optimize your buzz

Hive-scale performance needs some optimizing nectar. Try these:

  • Continuous buzz: Keep your data in contiguous memory arrays for faster fetching.
  • Square dance: Just comparing magnitudes? Avoid the square root in np.linalg.norm by using keepdims=True.
  • Einstein, no... Einsum!: For memory-efficient and speedier computations on complex problems, trust our bee-friend np.einsum.

Practical pollen source

Euclidean distance is king in pollen locations (K-Nearest Neighbors). Here are some honey-tips:

  • Preprocessing: Normalizing or scaling data for better bee-haviour in accuracy.
  • Batch buzzing: Calculate distances in batches to make better use of CPU and memory.
  • Benchmarking: Different flowers? Try perfplot to test various methods' performance with distinct datasets.

Bee aware of pitfalls

Escape from the spider webs that slay performance:

  • Casting np.sqrt and np.sum redundantly when np.linalg.norm gives you both.
  • Misinterpreting bee directions (axis parameter) when calculating mutual distances between flowers, leading bees off-track.