
What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of TensorFlow?

python
deep-learning
tensorflow
max-pooling
by Alex Kataev · Sep 8, 2024
⚡TLDR

When using tf.nn.max_pool, 'SAME' padding appends zeros to the borders so the pooling window can cover every input element; with stride 1 the output size equals the input size, and in general it is ceil(input / stride). 'VALID' padding adds no zeros at all and only pools where the window fits entirely inside the input, typically resulting in a smaller output.

SAME example (2x2 filter, stride 2):

# Input:      Padded for 'SAME':      Output:
  1 2 3         1 2 3 0
  4 5 6    ->   4 5 6 0        ->      5 6
  7 8 9         7 8 9 0                8 9
                0 0 0 0
# Padding, not swimming... yet! 🌴🌊

VALID example (2x2 filter, stride 2):

# Input:       Output:
  1 2 3
  4 5 6   ->     5
  7 8 9
# No padded nonsense here, just strictly business!

With 'SAME' the pooling window still covers the whole input (a 2x2 output above), while 'VALID' simply drops whatever it can't fully cover (a lone 1x1 output above). With stride 1, 'SAME' preserves the input dimensions exactly.
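
If you'd rather have TensorFlow confirm this itself, here's a minimal runnable check (shapes only, using the same 3x3 input as above):

import tensorflow as tf

# 1x3x3x1 input: batch=1, height=3, width=3, channels=1
x = tf.reshape(tf.constant([[1., 2., 3.],
                            [4., 5., 6.],
                            [7., 8., 9.]]), [1, 3, 3, 1])

same = tf.nn.max_pool(x, ksize=2, strides=2, padding='SAME')
valid = tf.nn.max_pool(x, ksize=2, strides=2, padding='VALID')

print(same.shape)   # (1, 2, 2, 1) -> ceil(3 / 2) = 2 per spatial dim
print(valid.shape)  # (1, 1, 1, 1) -> floor((3 - 2) / 2) + 1 = 1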

A deep dive into padding

To design an optimised neural network, you need to understand how padding shapes the results of tf.nn.max_pool. Padding directly determines your output sizes, how much of each feature map survives, and how much computation you pay for.

Unraveling the Padding Enigma

  • 'SAME' padding guarantees that the output size mirrors the input size (assuming stride 1). It does this by padding zeros around the input edges so that every element experiences the joy of being a pooling center, crucial when you wish to keep the feature maps' dimensions constant through the layers.
  • 'VALID' padding performs pooling without any extra padding, so only window positions that fit entirely inside the input contribute to the output. This usually means downsampling, which keeps things efficient (see the snippet after this list).
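
To make the stride-1 case concrete, here's a quick sketch (a random 4x4 input, purely illustrative) showing 'SAME' preserving the feature map while 'VALID' trims it:

import tensorflow as tf

x = tf.random.normal([1, 4, 4, 1])  # batch, height, width, channels

# stride 1: 'SAME' keeps 4x4, 'VALID' trims to 3x3
print(tf.nn.max_pool(x, ksize=2, strides=1, padding='SAME').shape)   # (1, 4, 4, 1)
print(tf.nn.max_pool(x, ksize=2, strides=1, padding='VALID').shape)  # (1, 3, 3, 1)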

Stride: Not just a chewing gum brand

  • With a stride > 1, 'SAME' padding can still cause output size shrinkage. However, the beauty of 'SAME' is that it tries to spread the love and padding equally - although if you have an odd dimension, it might play favorites with either the bottom or right side.
  • With strides > 1 and 'VALID' padding, you're basically playing hopscotch, skipping along and pooling only the fully covered areas. This method cuts corners (literally), leading to greater downsampling; the snippet below runs both modes on an odd-sized input.
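
A quick stride-2 sketch on an odd-sized (5x5) input, with illustrative random values, shows both behaviours side by side:

import tensorflow as tf

x = tf.random.normal([1, 5, 5, 1])  # odd spatial dimensions on purpose

# 'SAME' pads the bottom/right so nothing is skipped: ceil(5 / 2) = 3
print(tf.nn.max_pool(x, ksize=2, strides=2, padding='SAME').shape)   # (1, 3, 3, 1)
# 'VALID' hops over what doesn't fit: floor((5 - 2) / 2) + 1 = 2
print(tf.nn.max_pool(x, ksize=2, strides=2, padding='VALID').shape)  # (1, 2, 2, 1)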

Handling the tricky edges

  • 'SAME' padding makes sure the padding spreads evenly around the input. In situations with odd input dimensions, it might add an additional zero row or column at the bottom or right side, ensuring everything fits snugly.
  • 'VALID' padding, on the other hand, sticks to the valid parts of the input without bothering with padding. This means the right-most or bottom-most sections might be left out in the cold if they're not a perfect fit, as the snippet after this list demonstrates.
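
Here's a small demonstration of that edge-dropping (the 99 at the end is a deliberately loud border value; everything else is filler):

import tensorflow as tf

# 1x1x5x1 input; the 99 lives in the last column
x = tf.reshape(tf.constant([1., 2., 3., 4., 99.]), [1, 1, 5, 1])

valid = tf.nn.max_pool(x, ksize=[1, 2], strides=[1, 2], padding='VALID')
same = tf.nn.max_pool(x, ksize=[1, 2], strides=[1, 2], padding='SAME')

print(tf.squeeze(valid).numpy())  # [2. 4.]       -> the 99 never gets pooled
print(tf.squeeze(same).numpy())   # [ 2.  4. 99.] -> zero padding keeps the edge in play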

Cracking the Numbers behind the Output

  • For 'SAME' padding, each output dimension is ceil(input / stride): if your input isn't a clean multiple of the stride, 'SAME' has got you covered by always rounding up.
  • For 'VALID' padding, each output dimension is floor((input - window) / stride) + 1: only window positions that fit completely inside the input count. The helper sketched below runs both formulas.
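
If you want to predict shapes before building the network, a tiny helper like this does the trick (pooled_size is just an illustrative name, not a TensorFlow API):

import math

def pooled_size(input_size, window, stride, padding):
    """Predict one spatial dimension of a tf.nn.max_pool output."""
    if padding == 'SAME':
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        return math.floor((input_size - window) / stride) + 1
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(pooled_size(5, 2, 2, 'SAME'))   # 3
print(pooled_size(5, 2, 2, 'VALID'))  # 2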

Red Flags, Tips & Tricks, and Everything In Between

When choosing between 'SAME' and 'VALID' padding, consider your network's structure and objectives.

Flying the 'SAME' flag:

  • If you don't want to risk losing border information.
  • When you need the same spatial dimensions in your output feature map.
  • If you're fashioning a U-Net-style network where mirroring upscaling and downscaling factors matters (a stacked-pooling sketch follows this list).
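
As a rough sketch of that shape-preserving behaviour (not a full U-Net, just stacked stride-1 pooling on an arbitrary 64x64 feature map):

import tensorflow as tf

x = tf.random.normal([1, 64, 64, 16])

# stride-1 'SAME' pooling can be stacked without eating spatial resolution
for _ in range(3):
    x = tf.nn.max_pool(x, ksize=3, strides=1, padding='SAME')

print(x.shape)  # (1, 64, 64, 16) -- still 64x64, friendly to skip connections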

Pledging allegiance to 'VALID':

  • If downsampling and reducing dimensions is your goal.
  • When the extraction of features matters more than maintaining spatial dimensions, especially deep down in your network structure.
  • If you're running a tight ship: 'VALID' padding often involves fewer calculations and saves computational resources (see the downsampling chain sketched below).
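
And the flip side, sketched with arbitrary shapes: each 'VALID' 2x2 / stride-2 pool halves the spatial dimensions cleanly.

import tensorflow as tf

x = tf.random.normal([1, 32, 32, 8])

for _ in range(3):
    x = tf.nn.max_pool(x, ksize=2, strides=2, padding='VALID')
    print(x.shape)  # (1, 16, 16, 8) -> (1, 8, 8, 8) -> (1, 4, 4, 8)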

A few nuts and bolts:

  • tf.nn.max_pool typically takes a 4-D [batch, height, width, channels] tensor, so you might need to reshape or add dimensions before pooling, depending on your padding strategy and the desired output dimensions (a quick sketch follows this list).
  • Always take the time to predict your output dimensions beforehand to ensure they align with your downstream processes.
  • If in doubt, visualize it! Visualizing the impacts of different padding strategies, particularly when you're stacking multiple convolutional layers, will save you a headache in the long run.
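
For the reshaping point above, a minimal sketch (28x28 is just a stand-in image size):

import tensorflow as tf

# a bare HxW image needs batch and channel dims before pooling
image = tf.random.normal([28, 28])
batched = tf.reshape(image, [1, 28, 28, 1])   # NHWC: batch, height, width, channels

pooled = tf.nn.max_pool(batched, ksize=2, strides=2, padding='VALID')
print(pooled.shape)  # (1, 14, 14, 1) -- matches floor((28 - 2) / 2) + 1 = 14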