Fitting empirical distribution to theoretical ones with Scipy (Python)?
In just three steps, you can fit an empirical distribution to a theoretical one using the `stats` module from SciPy:
- Import `stats`: `from scipy import stats`.
- Select a theoretical distribution: for instance, `stats.norm` for Gaussian.
- Fit to the data: `params = stats.norm.fit(data)`.
Here's a Python snippet example:
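A minimal sketch of the three steps, assuming a synthetic normal sample generated with NumPy (the seed, sample size, and variable names are illustrative, not from a real dataset):

```python
import numpy as np
from scipy import stats

# Synthetic sample standing in for real data (assumed: mean 5, std 2).
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# Maximum-likelihood fit; for stats.norm this returns (loc, scale),
# i.e. the estimated mean and standard deviation.
loc, scale = stats.norm.fit(data)

print(f"fitted mean = {loc:.3f}, fitted std = {scale:.3f}")
```

For other distributions, `fit` returns any shape parameters first, followed by `loc` and `scale`.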
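Check the fit quality with a statistical test, such as Kolmogorov-Smirnov, or a plot, such as a histogram overlaid with the fitted PDF or a Q-Q plot, to make sure the model represents your data accurately.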
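One way to run such a check is sketched below, assuming a synthetic standard-normal sample. It uses `stats.kstest` for the Kolmogorov-Smirnov test and `stats.probplot` for the Q-Q correlation. Note the caveat in the comments: when the parameters are estimated from the same data, the KS p-value tends to be optimistic, so treat it as a rough guide.

```python
import numpy as np
from scipy import stats

# Synthetic sample (assumption for illustration only).
rng = np.random.default_rng(1)
data = rng.normal(loc=0.0, scale=1.0, size=500)

params = stats.norm.fit(data)

# Kolmogorov-Smirnov test against the fitted normal.
# Caveat: the parameters come from the same sample, so the p-value
# is biased upward; use it as a sanity check, not a strict test.
ks_stat, p_value = stats.kstest(data, "norm", args=params)

# Q-Q data without plotting: r is the correlation between the ordered
# sample and the theoretical quantiles (close to 1 for a good fit).
(theoretical_q, ordered_data), (slope, intercept, r) = stats.probplot(data, dist="norm")

print(f"KS statistic = {ks_stat:.4f}, p-value = {p_value:.3f}, Q-Q r = {r:.4f}")
```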
Exploring all available distributions
Now, let's go fishing in the pool of Scipy's distributions, looking for the one that fits your data best:
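A hedged sketch of such a search is shown below. It assumes a synthetic, right-skewed sample and a small hand-picked list of candidates rather than every distribution in `scipy.stats`; the sum of squared errors (SSE) between the histogram density and each fitted PDF is one possible ranking criterion, chosen here for illustration.

```python
import warnings

import numpy as np
from scipy import stats

# Synthetic right-skewed sample (assumption for illustration only).
rng = np.random.default_rng(2)
data = rng.gamma(shape=2.0, scale=1.5, size=2000)

# Histogram normalized to a density, evaluated at the bin centers.
density, edges = np.histogram(data, bins=50, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Hand-picked subset of candidates; extend the list as needed.
candidates = [stats.norm, stats.gamma, stats.lognorm, stats.expon]

results = []
for dist in candidates:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # some fits emit runtime warnings
        params = dist.fit(data)
    pdf = dist.pdf(centers, *params)
    sse = float(np.sum((density - pdf) ** 2))
    results.append((sse, dist.name, params))

# Smallest SSE first.
results.sort(key=lambda r: r[0])
best_sse, best_name, best_params = results[0]

for sse, name, _ in results:
    print(f"{name:10s} SSE = {sse:.5f}")
```

SSE depends on the binning, so it is worth repeating the comparison with a different number of bins before trusting the ranking.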
Smooth operator
When your fitted PDFs appear more like a porcupine than a snake, especially with integer data, consider smoothing:
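One possible approach, sketched here under assumptions, combines two ideas: give integer data exactly one histogram bin per integer so the bars are not spiky, and use a kernel density estimate (`stats.gaussian_kde`) for a smooth curve. The Poisson sample and the grid size are illustrative choices.

```python
import numpy as np
from scipy import stats

# Synthetic integer-valued sample (assumption for illustration only).
rng = np.random.default_rng(3)
data = rng.poisson(lam=12, size=1000)

# One bin per integer, centered on each value, avoids the empty and
# uneven bins that make integer histograms look jagged.
edges = np.arange(data.min() - 0.5, data.max() + 1.5)
density, _ = np.histogram(data, bins=edges, density=True)

# Kernel density estimate evaluated on a fine grid for a smooth curve.
kde = stats.gaussian_kde(data.astype(float))
grid = np.linspace(data.min(), data.max(), 200)
smooth_pdf = kde(grid)

print(f"histogram bins: {len(density)}, smooth curve points: {len(smooth_pdf)}")
```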
This way, you'll get visually appealing fits that could better capture your data distribution.
Understanding unusual data
Sometimes, data can surprise us and not fit well with standard distributions. In such cases:
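One option, offered here as an assumption rather than the only remedy, is to fall back on the empirical distribution itself. `stats.rv_histogram` wraps a histogram in an object with the usual `pdf`, `cdf`, `ppf`, and `rvs` methods. The bimodal sample below is synthetic and chosen because no single standard family fits it well.

```python
import numpy as np
from scipy import stats

# Synthetic bimodal sample (assumption for illustration only).
rng = np.random.default_rng(4)
data = np.concatenate([
    rng.normal(-3.0, 1.0, 600),
    rng.normal(4.0, 0.8, 400),
])

# Build a distribution object directly from the histogram.
hist = np.histogram(data, bins=60)
empirical = stats.rv_histogram(hist)

p_below_zero = empirical.cdf(0.0)   # estimated P(X <= 0)
median = empirical.ppf(0.5)         # estimated median
new_sample = empirical.rvs(size=5, random_state=0)

print(f"P(X <= 0) = {p_below_zero:.3f}, median = {median:.3f}")
```

The result depends on the bin count, so treat it as a flexible approximation rather than a model with interpretable parameters.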
Selecting optimal models
When torn between several models, compare them by log-likelihood, or by AIC or BIC, which also penalize extra parameters (lower AIC or BIC is better):
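A minimal sketch of that comparison, assuming a synthetic gamma sample and three illustrative candidates. The helper `information_criteria` is hypothetical (not part of SciPy) and uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k counts every fitted parameter including `loc` and `scale`.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed sample (assumption for illustration only).
rng = np.random.default_rng(5)
data = rng.gamma(shape=3.0, scale=2.0, size=1000)


def information_criteria(dist, sample):
    """Hypothetical helper: fit dist by MLE and return (log-lik, AIC, BIC)."""
    params = dist.fit(sample)
    log_lik = float(np.sum(dist.logpdf(sample, *params)))
    k = len(params)   # number of fitted parameters
    n = len(sample)
    aic = 2 * k - 2 * log_lik
    bic = k * np.log(n) - 2 * log_lik
    return log_lik, aic, bic


scores = {}
for dist in (stats.norm, stats.expon, stats.gamma):
    scores[dist.name] = information_criteria(dist, data)

for name, (log_lik, aic, bic) in scores.items():
    print(f"{name:6s} logL = {log_lik:9.2f}  AIC = {aic:9.2f}  BIC = {bic:9.2f}")
```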
Predicting the future
With a fitted model, you can evaluate the probability of new data occurrences:
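The sketch below assumes a synthetic normal sample and an illustrative 1% two-sided threshold. For a continuous distribution, the probability of any exact value is zero, so it reports the density (`pdf`) together with tail probabilities from `cdf` and `sf`.

```python
import numpy as np
from scipy import stats

# Synthetic training sample (assumption for illustration only).
rng = np.random.default_rng(6)
data = rng.normal(loc=100.0, scale=15.0, size=2000)

# Freeze the fitted distribution so the parameters travel with it.
fitted = stats.norm(*stats.norm.fit(data))

new_values = np.array([95.0, 130.0, 160.0])
density = fitted.pdf(new_values)     # density, not a probability
p_at_most = fitted.cdf(new_values)   # P(X <= x)
p_exceed = fitted.sf(new_values)     # P(X > x), i.e. 1 - cdf

# Two-sided tail probability; the 1% cutoff is an arbitrary choice.
tail_prob = 2 * np.minimum(p_at_most, p_exceed)
is_anomaly = tail_prob < 0.01

for x, t, flag in zip(new_values, tail_prob, is_anomaly):
    print(f"x = {x:6.1f}  two-sided tail = {t:.4f}  anomaly = {bool(flag)}")
```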
This could be useful for anomaly detection or hypothesis testing.
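Useful NumPy and scipy.stats functions
For discrete distributions, consider using these NumPy functions:
- `np.bincount`: It's like counting M&Ms in a box. It counts how often each non-negative integer occurs, which makes it useful for integer data.
- `np.cumsum`: Think of it as filling a jar with marbles, one by one. It turns counts into running totals, which gives an empirical CDF once normalized.
Also, check out resources like Wikipedia for some background on tail functions like the CCDF (complementary CDF, 1 - CDF), which scipy.stats distributions expose as the `sf` (survival function) method.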
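A short sketch of how these pieces can fit together, assuming a synthetic Poisson sample. It builds the empirical PMF, CDF, and CCDF from the counts and compares the CCDF with `stats.poisson.sf`, using the sample mean as the rate (the maximum-likelihood estimate for a Poisson distribution).

```python
import numpy as np
from scipy import stats

# Synthetic integer-valued sample (assumption for illustration only).
rng = np.random.default_rng(7)
data = rng.poisson(lam=4, size=5000)

counts = np.bincount(data)    # counts[k] = number of observations equal to k
pmf = counts / counts.sum()   # empirical P(X = k)
cdf = np.cumsum(pmf)          # empirical P(X <= k)
ccdf = 1.0 - cdf              # empirical P(X > k)

# Compare with a fitted Poisson model.
k = np.arange(len(counts))
lam_hat = data.mean()
model_ccdf = stats.poisson.sf(k, lam_hat)
max_gap = float(np.max(np.abs(ccdf - model_ccdf)))

print(f"estimated rate = {lam_hat:.3f}, largest CCDF gap = {max_gap:.4f}")
```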