Plot two histograms on single chart

by Nikita Barsukov · Aug 28, 2024
TLDR

To plot two histograms on one chart in Python with matplotlib, call plt.hist() once per dataset. The alpha parameter sets the opacity of the bars, which keeps both histograms visible where they overlap.

    import matplotlib.pyplot as plt

    # Placeholder data; replace these lists with your own values
    data1 = [1, 2, 2, 3, 3, 3, 4, 4, 5]
    data2 = [3, 4, 4, 5, 5, 5, 6, 6, 7]

    # Semi-transparent histograms keep overlapping regions visible
    plt.hist(data1, alpha=0.5, label='Dataset 1')
    plt.hist(data2, alpha=0.5, label='Dataset 2')

    # Add the legend and show the plot
    plt.legend()
    plt.show()

Setting alpha=0.5 makes both histograms semi-transparent, so overlaps stay visible. alpha ranges from 0 (fully transparent) to 1 (fully opaque); adjust it depending on how strongly you want the overlaps to show.

Planning your histograms

Good histograms require selecting the right bin sizes, using distinguishable colors, and properly normalizing your data, especially when you're comparing datasets.

Bin edge consistency

Using the same bin edges makes the comparison between the histograms clear and meaningful:

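    import numpy as np

    # Define shared bin edges: num_bins bins need num_bins + 1 edges.
    # start, end, and num_bins are placeholders for your data range and bin count.
    bins = np.linspace(start, end, num_bins + 1)

    plt.hist(data1, bins=bins, alpha=0.5, label='Dataset 1')
    plt.hist(data2, bins=bins, alpha=0.5, label='Dataset 2')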
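If you do not want to pick the range by hand, one option is to derive the shared edges from the combined data. The following is a minimal sketch, assuming NumPy is available; the sample data, seed, and bin count of 20 are illustrative choices, not part of the original example.

```python
import numpy as np

# Hypothetical sample data with different centers and spreads
rng = np.random.default_rng(seed=0)
data1 = rng.normal(loc=0.0, scale=1.0, size=500)
data2 = rng.normal(loc=2.0, scale=1.5, size=500)

# One set of edges spanning both datasets (20 bins give 21 edges)
combined = np.concatenate([data1, data2])
bins = np.histogram_bin_edges(combined, bins=20)

# Counts per bin for each dataset, measured against the same edges
counts1, _ = np.histogram(data1, bins=bins)
counts2, _ = np.histogram(data2, bins=bins)
```

The resulting bins array can be passed directly as bins=bins to both plt.hist() calls.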

Histogram normalization

When comparing datasets of different sizes, histograms should be normalized to compare their shapes:

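    plt.hist(data1, bins=bins, alpha=0.5, density=True, label='Dataset 1')
    plt.hist(data2, bins=bins, alpha=0.5, density=True, label='Dataset 2')

With density=True, the bar heights are scaled so that the total area of each histogram equals 1. The y-axis then shows probability density rather than counts, which makes samples of different sizes directly comparable.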
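To see what the normalization does numerically, the sketch below uses np.histogram with density=True, which applies the same kind of scaling, and checks the area under each histogram. The sample sizes, seed, and bin range are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
small = rng.normal(size=200)    # small sample
large = rng.normal(size=5000)   # much larger sample
bins = np.linspace(-5, 5, 41)   # 40 shared bins

# density=True returns bar heights as probability densities
dens_small, edges = np.histogram(small, bins=bins, density=True)
dens_large, _ = np.histogram(large, bins=bins, density=True)

# Area of a histogram = sum of (bar height * bar width)
widths = np.diff(edges)
area_small = float(np.sum(dens_small * widths))
area_large = float(np.sum(dens_large * widths))
```

Both areas come out as 1 (up to floating-point rounding) even though one sample is 25 times larger than the other.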

Dealing with different scales

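When the two datasets have very different counts, the smaller histogram can end up flattened against the x-axis. twinx() creates a secondary y-axis that shares the same x-axis, so each histogram gets its own vertical scale:

    fig, ax1 = plt.subplots()
    ax2 = ax1.twinx()  # Second y-axis sharing the x-axis of ax1

    ax1.hist(data1, alpha=0.5, label='Dataset 1')
    ax2.hist(data2, alpha=0.5, label='Dataset 2', color='red')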
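With two y-axes, readers need to know which scale belongs to which dataset, and a legend called on a single axes only lists that axes' own entries. The sketch below is one possible way to handle both, assuming made-up sample data and colors; it color-codes the axis labels and merges the legend entries by hand.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=2)
data1 = rng.normal(loc=0.0, scale=1.0, size=10_000)  # large sample
data2 = rng.normal(loc=1.0, scale=1.0, size=200)     # small sample
bins = np.linspace(-4, 5, 31)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.hist(data1, bins=bins, alpha=0.5, color='tab:blue', label='Dataset 1')
ax2.hist(data2, bins=bins, alpha=0.5, color='tab:red', label='Dataset 2')

# Color-code each y-axis label to match its histogram
ax1.set_ylabel('Count (Dataset 1)', color='tab:blue')
ax2.set_ylabel('Count (Dataset 2)', color='tab:red')

# Collect the legend entries from both axes and show them in one legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(handles1 + handles2, labels1 + labels2, loc='upper left')
```

Call plt.show() afterwards to display the figure. Keep in mind that bar heights on the two axes are no longer directly comparable, since each axis has its own scale.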

Using colors and labels

Giving each dataset its own color and label makes it clear which bars belong to which group:

    plt.hist(data1, bins=bins, color='skyblue', alpha=0.7, label='Data group 1')
    plt.hist(data2, bins=bins, color='salmon', alpha=0.7, label='Data group 2')

Labels are essential for overlaid histograms, but they only appear on the chart once you call plt.legend().

Advanced techniques and troubleshooting

Preventing data overlap

To keep one histogram from hiding the other:

  • Shift the bin edges.
  • Use transparent colors (alpha below 1).
  • Adjust the zorder parameter so the smaller histogram is drawn on top.
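As an illustration of the transparency and zorder options, here is a minimal sketch with made-up data in which a small histogram would otherwise sit behind a larger one. Artists with a higher zorder are drawn later, so they end up on top.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=3)
wide = rng.normal(loc=0.0, scale=2.0, size=2000)   # large, broad histogram
narrow = rng.normal(loc=0.0, scale=0.5, size=300)  # smaller histogram that could be hidden
bins = np.linspace(-8, 8, 41)

fig, ax = plt.subplots()

# Lower zorder: drawn first, ends up underneath
_, _, wide_bars = ax.hist(wide, bins=bins, alpha=0.5, zorder=1, label='Wide')

# Higher zorder: drawn last, ends up on top
_, _, narrow_bars = ax.hist(narrow, bins=bins, alpha=0.5, zorder=2, label='Narrow')

ax.legend()
```

Call plt.show() afterwards to display the figure. Without explicit zorder values, histograms on the same axes are drawn in the order they were added.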

Using weights in histograms

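The weights parameter takes one weight per data point; each point then adds its weight, rather than 1, to its bin. This can help balance samples of different sizes:

    # weight1 and weight2 are assumed to hold one weight per data point,
    # each with the same length as its dataset
    plt.hist(data1, weights=weight1, bins=bins, alpha=0.6, label='Weighted Dataset 1')
    plt.hist(data2, weights=weight2, bins=bins, alpha=0.6, label='Weighted Dataset 2')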
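One common way to build such weights, shown here as a sketch with assumed sample data, is to give every point a weight of 1/N. Each bar then shows the fraction of its sample that falls in the bin, so the bar heights of each histogram sum to 1.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
data1 = rng.normal(size=200)    # small sample
data2 = rng.normal(size=5000)   # large sample

# Shared edges covering both datasets, so no point falls outside the bins
bins = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=20)

# Each point contributes 1/N instead of 1
weight1 = np.full(len(data1), 1.0 / len(data1))
weight2 = np.full(len(data2), 1.0 / len(data2))

# Weighted bin totals: the fraction of each sample in each bin
frac1, _ = np.histogram(data1, bins=bins, weights=weight1)
frac2, _ = np.histogram(data2, bins=bins, weights=weight2)
```

The same weight1 and weight2 arrays can be passed to plt.hist() through its weights parameter. Unlike density=True, this scales the bar heights, not the bar areas, to sum to 1.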

Dynamic data for examples

For illustration, random.gauss() can be used to generate data:

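    import random

    # Placeholder parameters: mean and standard deviation of the distribution
    mu, sigma = 0, 1

    # Generate 1000 normally distributed values
    data_demo = [random.gauss(mu, sigma) for _ in range(1000)]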

Clearing axis for new plots

Clear the axis to avoid confusion with old plots:

    plt.cla()  # Clears the current axes like an over-enthusiastic housemaid
    plt.clf()  # Clears the entire figure like a spy wiping their existence