How to smooth a curve for a dataset

by Nikita Barsukov · Feb 21, 2025
TLDR

Use savgol_filter from scipy.signal to smooth your dataset. It fits a low-order polynomial inside a sliding window, so you pick two things: a window length (conventionally an odd number, and always larger than the polynomial order) and a polynomial order. A smaller window traces the noise closely, whereas a larger window gives a smoother curve but can flatten genuine peaks and dips.

```python
import numpy as np
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt

# Random dataset
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2  # Noisy sine wave

# Smooth it like butter
y_smooth = savgol_filter(y, 11, 3)  # Use window_length=11, polyorder=3

# Painting the original and Picasso'ed version
plt.plot(x, y, label='Original - the chaos')
plt.plot(x, y_smooth, color='red', label='Smoothed - the calm after the storm')
plt.legend()
plt.show()
```

Tweak 11 (the window length) and 3 (the polynomial order) to see the fight between preserving details and achieving smoothness.
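To put a rough number on that trade-off, here is a minimal sketch that smooths the same noisy sine with three window lengths and compares how jagged each result is. The roughness helper and the window values 5, 11 and 31 are illustrative choices, not part of SciPy or of the example above.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.random(100) * 0.2   # noisy sine wave

def roughness(values):
    """Hypothetical jaggedness score: spread of the point-to-point differences."""
    return np.std(np.diff(values))

# Same polynomial order, increasingly wide windows
results = {w: roughness(savgol_filter(y, w, 3)) for w in (5, 11, 31)}

print("raw data:", round(roughness(y), 4))
for window, score in results.items():
    print(f"window_length={window}:", round(score, 4))
```

A lower score means a calmer curve, but it says nothing about whether real features were flattened along the way, so it is best read next to a plot.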

Understanding more smoothing techniques

Smoothing data can sometimes be like choosing the perfect ice cream flavor: a lot of options, but only a few might "smooth" your taste buds. Besides the Savitzky-Golay filter, we have three more contestants in our Ice Cream Parlor of Smoothing Techniques: LOWESS, moving averages, and the Fourier transform.

Taming beasts with LOWESS

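LOWESS (Locally Weighted Scatterplot Smoothing) is your magic wand for data that follows no neat formula. It is a non-parametric regression method: for each point it fits a small weighted regression to the nearest neighbors only, giving closer points more weight, and then joins those local fits into a curve that captures the underlying trend. It's like playing connect-the-dots, but with localized subsets of the data.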
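The idea can be sketched in plain NumPy. This is a simplified illustration, assuming local linear fits with tricube weights and no robustness iterations; lowess_sketch and its frac parameter are hypothetical names for this example. For real work, a library implementation such as the lowess function in statsmodels is the safer choice.

```python
import numpy as np

def lowess_sketch(x, y, frac=0.2):
    """Simplified LOWESS: one weighted straight-line fit per point.

    frac is the share of the data used in each local fit (an assumed default).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(3, int(np.ceil(frac * n)))           # neighbors per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]               # k nearest neighbors of x[i]
        h = dist[idx].max()                      # local bandwidth
        w = (1 - (dist[idx] / h) ** 3) ** 3      # tricube weights, 1 at the center
        # np.polyfit applies its weights to the residuals before squaring,
        # hence the square root
        slope, intercept = np.polyfit(x[idx], y[idx], 1, w=np.sqrt(w))
        fitted[i] = slope * x[i] + intercept
    return fitted

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.1 * rng.standard_normal(100)   # noisy sine wave
y_lowess = lowess_sketch(x, y, frac=0.2)
```

A larger frac pulls in more neighbors per fit and gives a smoother, stiffer curve; a smaller one follows the data more closely.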

Mastering the art of Moving averages

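The moving average is the age-old, simple, yet elegant technique to smooth time-series data: each point is replaced by the mean of the points in a window around it. The two common NumPy recipes differ mainly in speed and in how they treat the ends of the data.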

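  • Cast np.cumsum for a rabbit-speed calculation, but beware: it only yields values where the window fully overlaps the data, so the output is shorter than the input unless you pad the ends yourself.
  • On the other hand, np.convolve with mode='same' preserves your output size, a vital ingredient when comparing it with the original. It does so by padding with zeros, so the first and last few values are pulled toward zero.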
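Both recipes can be sketched side by side. The function names below are hypothetical, and the tiny ramp is only there to make the edge behavior easy to read off.

```python
import numpy as np

def moving_average_cumsum(y, n):
    """Running mean via cumulative sums; returns len(y) - n + 1 values."""
    c = np.cumsum(np.insert(np.asarray(y, dtype=float), 0, 0.0))
    return (c[n:] - c[:-n]) / n

def moving_average_same(y, n):
    """Running mean via convolution; same length as y, zero-padded at the ends."""
    return np.convolve(y, np.ones(n) / n, mode='same')

y = np.arange(10, dtype=float)          # 0, 1, ..., 9
ma_valid = moving_average_cumsum(y, 3)  # 8 values: 1, 2, ..., 8
ma_same = moving_average_same(y, 3)     # 10 values, both end values biased low

print(ma_valid)
print(ma_same)
```

On this ramp the interior values agree exactly, while the last value from mode='same' is (8 + 9 + 0) / 3 instead of something near 9.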

Party with Fourier transform

Turn on the disco lights for periodic data, as the Fourier transform enters the party. It splits your signal into frequency components, so you can remove high-frequency noise and keep the main beat (the dominant low-frequency components) of your data.

Find your 'filter' mate

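Like finding a perfect partner, choose your filter based on your smoothing goals. A low-pass filter keeps the slow trend and discards fast jitter, which is what smoothing usually calls for; a high-pass filter does the opposite and suits removing slow drift. With the FFT you build the filter by scaling or zeroing frequency bins and then transforming back to the original domain.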
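A minimal low-pass sketch with NumPy's real FFT might look like this. The 3 Hz test signal, the 256 Hz sampling rate and the 10 Hz cutoff are all assumptions made up for the illustration; a hard cutoff like this is the bluntest possible filter.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(n) / n                              # one second at an assumed 256 Hz
clean = np.sin(2 * np.pi * 3 * t)                 # the 3 Hz "main beat"
y = clean + 0.3 * rng.standard_normal(n)          # add broadband noise

spectrum = np.fft.rfft(y)                         # frequency-domain view
freqs = np.fft.rfftfreq(n, d=t[1] - t[0])         # frequency of each bin in Hz

cutoff = 10.0                                     # hypothetical cutoff in Hz
spectrum[freqs > cutoff] = 0                      # low-pass: drop everything faster
y_lowpass = np.fft.irfft(spectrum, n=n)           # back to the time domain
```

The FFT treats the data as periodic, so a signal whose start and end do not line up can show ripples near the edges after filtering.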

Edge behavior and data visualization

Edge of glory

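The edge behavior is the unsung hero in data smoothing. Near the ends of your data the window runs out of neighbors, so a moving average or filter must either shorten its output or pad the data with something (zeros, repeated end values, a mirror image). Stay cautious: the wrong choice bends the first and last few points and creates artifacts that are not in the data.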
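The effect is easy to see on a constant signal, where any honest smoother should return the same constant. The sketch below compares the implicit zero padding of np.convolve with explicit padding through np.pad; moving_average_padded is a hypothetical helper and assumes an odd window length.

```python
import numpy as np

def moving_average_padded(y, n, pad_mode='edge'):
    """Centered moving average with explicit padding (n assumed odd)."""
    half = n // 2
    padded = np.pad(np.asarray(y, dtype=float), half, mode=pad_mode)
    return np.convolve(padded, np.ones(n) / n, mode='valid')

y = np.full(20, 5.0)                                       # flat signal at 5.0

zero_padded = np.convolve(y, np.ones(5) / 5, mode='same')  # implicit zeros
edge_padded = moving_average_padded(y, 5, pad_mode='edge') # repeat end values

print(zero_padded[:3])   # dips at the start
print(edge_padded[:3])   # stays flat
```

Other np.pad modes such as 'reflect' work the same way, and savgol_filter exposes a comparable choice through its mode argument.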

Visual validation

Treat your eyes with the visualization of smoothed vs original data. It is like watching a before-and-after home renovation show revealing whether the smoothing was a bit over the top or just not enough.

Balancing computation and quality

Running the relay of quickness and smoothness

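In the relay of computation speed vs smoothing quality, pick your champion wisely, especially when dealing with larger datasets. A quick sprinter like the running average costs almost nothing per point and may suffice for a first look, while methods that fit a model in every window, such as LOWESS, are slower but handle curved trends and outliers better. A sensible plan is to explore with the cheap method and switch to a more sophisticated one for the final result.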

Selecting the right pause

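In the symphony of moving averages, the right pause (delay) is crucial. A trailing window of N samples, which averages only the current and past values, lags the signal by about (N - 1) / 2 samples; a centered window removes that lag but needs values from both sides of each point. Play around with different window (box) sizes and alignments to form the perfect melody matching your data.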
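The lag can be checked on a simple ramp, where the moving average of a window equals the value at its midpoint. This is a small sketch with an assumed window size of 5; the variable names are illustrative.

```python
import numpy as np

n = 5                                        # assumed window (box) size
y = np.arange(20, dtype=float)               # a ramp: y[i] == i
means = np.convolve(y, np.ones(n) / n, mode='valid')  # mean of each full window

# Trailing alignment: each mean is reported at the LAST sample of its window
trailing_idx = np.arange(n - 1, len(y))
# Centered alignment: each mean is reported at the MIDDLE sample of its window
centered_idx = np.arange(n // 2, len(y) - n // 2)

lag_trailing = y[trailing_idx] - means       # how far the average trails the signal
lag_centered = y[centered_idx] - means

print(lag_trailing[0], lag_centered[0])
```

Because the ramp rises by one unit per sample, the difference reads directly as a delay in samples: (5 - 1) / 2 for the trailing window and zero for the centered one.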

Seek wisdom

In the journey of mastering curve smoothing, don't hesitate to seek guidance from wise data wizards or treasure books (relevant literature).