Plot two histograms on a single chart

by Nikita Barsukov · Aug 28, 2024

To plot two histograms together in Python with matplotlib, you should invoke plt.hist() for each dataset. The alpha parameter controls the opacity of the histograms, which helps when the histograms overlap.

import matplotlib.pyplot as plt

# Placeholder data; replace these lists with your own values
data1 = [1.2, 1.9, 2.3, 2.8, 3.1, 3.4, 3.9, 4.6]
data2 = [2.4, 3.0, 3.3, 3.7, 4.1, 4.5, 5.2, 5.8]

# Semi-transparent histograms allow us to see overlapping regions
plt.hist(data1, alpha=0.5, label='Dataset 1')
plt.hist(data2, alpha=0.5, label='Dataset 2')

# Add the legend and show the plot
plt.legend()
plt.show()

Setting alpha=0.5 draws each histogram at 50% opacity, so overlapping regions stay visible. Lower values make the bars fainter; values close to 1 let the histogram drawn last hide the one beneath it.

Planning your histograms

Good histograms require selecting the right bin sizes, using distinguishable colors, and properly normalizing your data, especially when you're comparing datasets.

Bin edge consistency

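By default, each plt.hist() call picks its own bins from the range of the data it receives, so two datasets can end up with different bin widths. Passing the same bin edges to both calls makes the comparison between the histograms clear and meaningful: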

import numpy as np

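# Shared edges spanning both datasets; num_bins bins need num_bins + 1 edges
num_bins = 20
start = min(min(data1), min(data2))
end = max(max(data1), max(data2))
bins = np.linspace(start, end, num_bins + 1)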
plt.hist(data1, bins=bins, alpha=0.5, label='Dataset 1')
plt.hist(data2, bins=bins, alpha=0.5, label='Dataset 2')
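
If you would rather not work out the range by hand, one option is to derive the shared edges from the pooled data. The sketch below assumes NumPy's np.histogram_bin_edges; the sample arrays and the choice of 20 bins are placeholders for illustration, not values from this article.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder samples (assumed for illustration): different centers and sizes
rng = np.random.default_rng(0)
data1 = rng.normal(loc=0.0, scale=1.0, size=500)
data2 = rng.normal(loc=2.0, scale=1.5, size=800)

# Compute one set of edges over the pooled data so both histograms share it
shared_edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=20)

fig, ax = plt.subplots()
# ax.hist returns (counts, edges, patches); keep the edges to compare them
_, edges1, _ = ax.hist(data1, bins=shared_edges, alpha=0.5, label='Dataset 1')
_, edges2, _ = ax.hist(data2, bins=shared_edges, alpha=0.5, label='Dataset 2')
ax.legend()
```

Call plt.show() afterwards to display the figure. Because both calls receive the same edges, every bar of one dataset lines up exactly with a bar of the other.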

Histogram normalization

When comparing datasets of different sizes, histograms should be normalized to compare their shapes:

plt.hist(data1, bins=bins, alpha=0.5, density=True, label='Dataset 1')
plt.hist(data2, bins=bins, alpha=0.5, density=True, label='Dataset 2')

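With density=True, each histogram is scaled so that the total area of its bars equals 1. The bar heights then represent probability density rather than counts, so the shapes can be compared regardless of sample size.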
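
To check the area claim numerically, here is a minimal sketch using np.histogram, which plt.hist() relies on for the binning. The sample sizes and bin range are assumptions chosen for illustration.

```python
import numpy as np

# Placeholder samples of very different sizes (assumed for illustration)
rng = np.random.default_rng(1)
small = rng.normal(size=200)
large = rng.normal(size=5000)

bins = np.linspace(-5, 5, 41)  # 41 edges, i.e. 40 shared bins

# density=True rescales the counts so that sum(height * width) is 1
dens_small, edges = np.histogram(small, bins=bins, density=True)
dens_large, _ = np.histogram(large, bins=bins, density=True)

widths = np.diff(edges)
area_small = np.sum(dens_small * widths)
area_large = np.sum(dens_large * widths)
```

Both areas come out as 1 (up to floating-point rounding), even though one sample is 25 times larger than the other.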

Dealing with different scales

When the two datasets have very different counts, twinx() creates a secondary y-axis that shares the same x-axis, so each histogram gets its own vertical scale:

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.hist(data1, alpha=0.5, label='Dataset 1')
ax2.hist(data2, alpha=0.5, label='Dataset 2', color='red')
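
One thing to watch with twinx() is that each axes keeps its own legend entries, so a single legend call shows only one dataset. A possible workaround, sketched here with placeholder data and assumed axis label text, is to collect the handles from both axes and pass them to one legend:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder samples (assumed for illustration): one small, one large
rng = np.random.default_rng(2)
data1 = rng.normal(size=100)
data2 = rng.normal(size=10000)
bins = np.linspace(-4, 4, 31)  # 30 shared bins

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax1.hist(data1, bins=bins, alpha=0.5, color='blue', label='Dataset 1')
ax2.hist(data2, bins=bins, alpha=0.5, color='red', label='Dataset 2')

# Color the axis labels so readers know which scale belongs to which data
ax1.set_ylabel('Count (Dataset 1)', color='blue')
ax2.set_ylabel('Count (Dataset 2)', color='red')

# Merge the legend entries from both axes into a single legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(handles1 + handles2, labels1 + labels2, loc='upper right')
```

Drawing the legend on ax2 keeps it above both histograms, since the twin axes is rendered on top of the original.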

Using colors and labels

Give each dataset its own color and label so readers can tell them apart:

plt.hist(data1, bins=bins, color='skyblue', alpha=0.7, label='Data group 1')
plt.hist(data2, bins=bins, color='salmon', alpha=0.7, label='Data group 2')

Labels are key when you're dealing with overlaid histograms, but they only appear on the chart once you call plt.legend().

Advanced techniques and troubleshooting

Preventing data overlap

Make sure neither histogram hides the other:

Shift the bin edges.
Use semi-transparent colors (alpha below 1).
Adjust the zorder parameter to control which histogram is drawn on top.
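
Two further options, not covered in the list above, avoid overlap altogether: passing both datasets to a single hist() call draws the bars side by side within each bin, and histtype='step' draws unfilled outlines. A minimal sketch with placeholder data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder samples (assumed for illustration)
rng = np.random.default_rng(3)
data1 = rng.normal(0.0, 1.0, size=400)
data2 = rng.normal(1.0, 1.0, size=400)
bins = np.linspace(-4, 5, 19)  # 19 edges, i.e. 18 shared bins

fig, (ax_side, ax_step) = plt.subplots(1, 2, figsize=(10, 4))

# A list of datasets is drawn as grouped bars, side by side in every bin
counts, _, _ = ax_side.hist([data1, data2], bins=bins,
                            label=['Dataset 1', 'Dataset 2'])
ax_side.legend()

# Unfilled step outlines cannot cover each other
ax_step.hist(data1, bins=bins, histtype='step', label='Dataset 1')
ax_step.hist(data2, bins=bins, histtype='step', label='Dataset 2')
ax_step.legend()
```

Grouped bars tend to read well with a small number of bins, while step outlines stay legible even when the bins are narrow.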

Using weights in histograms

The weights parameter takes one weight per data point, which lets you rescale samples of different sizes:

# weight1 and weight2 must be array-likes with the same length as data1 and data2
plt.hist(data1, weights=weight1, bins=bins, alpha=0.6, label='Weighted Dataset 1')
plt.hist(data2, weights=weight2, bins=bins, alpha=0.6, label='Weighted Dataset 2')
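
As one way to build such weights, the sketch below gives every point a weight of 1/N, so the bar heights show the fraction of each sample per bin and sum to 1. The data and the 1/N choice are assumptions for illustration; other weighting schemes are equally valid.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder samples of different sizes (assumed for illustration)
rng = np.random.default_rng(4)
data1 = rng.normal(0.0, 1.0, size=300)
data2 = rng.normal(0.5, 1.0, size=3000)

# Shared edges covering every point in both samples
low = min(data1.min(), data2.min())
high = max(data1.max(), data2.max())
bins = np.linspace(low, high, 23)  # 22 bins

# One weight per data point: each point contributes 1/N to its bin
weight1 = np.full(len(data1), 1.0 / len(data1))
weight2 = np.full(len(data2), 1.0 / len(data2))

fig, ax = plt.subplots()
heights1, _, _ = ax.hist(data1, weights=weight1, bins=bins, alpha=0.6,
                         label='Weighted Dataset 1')
heights2, _, _ = ax.hist(data2, weights=weight2, bins=bins, alpha=0.6,
                         label='Weighted Dataset 2')
ax.set_ylabel('Fraction of samples')
ax.legend()
```

Unlike density=True, this scaling does not depend on the bin width: each bar is simply the share of points that fell into that bin.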

Dynamic data for examples

For illustration, random.gauss() can be used to generate data:

import random

# Generate 1000 normally distributed values (mean mu, standard deviation sigma)
mu, sigma = 0, 1
data_demo = [random.gauss(mu, sigma) for _ in range(1000)]
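
To get two comparable demo datasets that come out the same on every run, you could seed the generator first. The means, standard deviations, and seed below are hypothetical values picked only for illustration.

```python
import random

random.seed(42)  # fixed seed so the demo data is reproducible

# Hypothetical parameters: (mean, standard deviation) for each dataset
mu1, sigma1 = 0.0, 1.0
mu2, sigma2 = 2.0, 0.5

data1 = [random.gauss(mu1, sigma1) for _ in range(1000)]
data2 = [random.gauss(mu2, sigma2) for _ in range(1000)]

# Quick sanity check: the sample means should land near mu1 and mu2
mean1 = sum(data1) / len(data1)
mean2 = sum(data2) / len(data2)
```

These lists can be passed straight to plt.hist() as data1 and data2.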

Clearing axis for new plots

Clear the axes or the whole figure before drawing a new plot so that old histograms do not carry over:

plt.cla()  # Clears the current axes like an over-enthusiastic housemaid
plt.clf()  # Clears the entire figure like a spy wiping their existence
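
The difference between the two calls can be seen by counting what is left after each one. This is a small sketch with made-up data; plt.close() is added here as an assumption about typical cleanup and is not part of the original snippet.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.hist([1, 2, 2, 3, 3, 3], bins=3)
bars_before = len(ax.patches)     # the histogram bars are patches on the axes

plt.cla()                         # empties the current axes; the axes itself stays
bars_after_cla = len(ax.patches)
axes_after_cla = len(fig.axes)

plt.clf()                         # empties the current figure, removing its axes too
axes_after_clf = len(fig.axes)

plt.close(fig)                    # releases the figure once it is no longer needed
```

After plt.cla() the figure still holds one empty axes ready for a new histogram; after plt.clf() you need to create new axes before plotting again.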
