What kind of datatype should one use to store hashes?

by Anton Shumikhin · Dec 20, 2024
TLDR

Go for BINARY(32) when storing SHA-256 hashes: it matches their 256-bit (32-byte) length exactly, so no space is wasted. Here's a typical SQL column definition:

CREATE TABLE user_hashes (
    -- Binary for Breakfast, Brisk and Brilliant!
    hash BINARY(32) NOT NULL
);

This fixed-width column stores every hash in exactly 32 bytes and lets the database compare raw bytes directly, which keeps both storage and lookups lean.

Hash type and its corresponding datatype

When selecting a datatype for hashes, start from the byte size of your hash function's output. Here's a guide for common hash algorithms:

  • MD5: Chews data into 128 bits, stored as BINARY(16)
  • SHA1: Crumbles data into 160 bits, stored as BINARY(20)
  • SHA2_256: Crunches data into 256 bits, stored as BINARY(32)
  • SHA2_512: Devours data into 512 bits, stored as BINARY(64)

Matching the column size to the byte size of your chosen hash algorithm avoids zero-padded or truncated values and wastes no storage.
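If you want to confirm these sizes on your own server before settling on a column width, a quick check along these lines should do it. This is a minimal sketch for SQL Server; the input string 'example' is an arbitrary placeholder, and MD5 and SHA1 are shown only for their sizes (SQL Server marks them as deprecated).

```sql
-- DATALENGTH returns the number of bytes in each digest.
SELECT
    DATALENGTH(HASHBYTES('MD5',      'example')) AS md5_bytes,       -- 16
    DATALENGTH(HASHBYTES('SHA1',     'example')) AS sha1_bytes,      -- 20
    DATALENGTH(HASHBYTES('SHA2_256', 'example')) AS sha2_256_bytes,  -- 32
    DATALENGTH(HASHBYTES('SHA2_512', 'example')) AS sha2_512_bytes;  -- 64
```

Each result is the `n` to use in BINARY(n) for that algorithm.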

Fixed vs Variable: A word of caution

While VARCHAR or VARBINARY might seem attractive for their flexibility, that flexibility buys nothing when every hash has the same length. In SQL Server, VARBINARY adds a 2-byte length prefix to each value, and a hex string in VARCHAR takes two characters per byte, doubling the size. BINARY's fixed length avoids both costs.
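To make the size difference concrete, the following sketch compares the same SHA-256 digest held as raw bytes and as a hex string. It assumes SQL Server; the variable names and the input 'example' are placeholders chosen for illustration.

```sql
-- One digest, two representations.
DECLARE @digest    BINARY(32)  = HASHBYTES('SHA2_256', 'example');
-- Style 2 renders the bytes as hex characters without the 0x prefix.
DECLARE @digest_hex VARCHAR(64) = CONVERT(VARCHAR(64), @digest, 2);

SELECT
    DATALENGTH(@digest)     AS binary_bytes,  -- 32
    DATALENGTH(@digest_hex) AS hex_bytes;     -- 64
```

DATALENGTH reports only the value's own bytes, so the extra 2-byte length prefix that variable-length columns carry on disk does not show up here.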

Optimizing storage and performance

When dealing with hashes for passwords or sensitive data, storage security and query performance are key. Here's what to remember:

  • Trust the hash functions provided by your database, like SQL Server's HASHBYTES, for consistency and security.
  • Keep storage uniform by choosing a binary size that accommodates the largest hash you'll store.
  • Note that larger binary sizes increase table, index, and backup size, and can slow queries. Size matters, so choose wisely!
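As a rough illustration of computing the hash inside the database, the sketch below stores and looks up a SHA-256 digest with HASHBYTES. The table `document_hashes` and its columns are hypothetical names invented for this example, not something from the original question.

```sql
-- Hypothetical table: one fixed-width SHA-256 digest per document.
CREATE TABLE document_hashes (
    document_id  INT        NOT NULL PRIMARY KEY,
    content_hash BINARY(32) NOT NULL
);

-- HASHBYTES returns VARBINARY; the 32-byte result fits BINARY(32) exactly.
INSERT INTO document_hashes (document_id, content_hash)
VALUES (1, HASHBYTES('SHA2_256', N'hello world'));

-- Look the row up by recomputing the hash from the same input.
SELECT document_id
FROM document_hashes
WHERE content_hash = HASHBYTES('SHA2_256', N'hello world');
```

One thing to watch: HASHBYTES hashes the bytes it is given, so `N'hello world'` (NVARCHAR) and `'hello world'` (VARCHAR) produce different digests. Hash the same string type on every write and every lookup.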

Handling gargantuan hashes

When it comes to secure applications needing big and complex hash keys from SHA3-512 or Whirlpool, you'll have to look beyond BINARY(32) and go all the way up to BINARY(64). Here's how:

  • For SHA3-512, go with BINARY(64): Sha3 sha-sha-shakes it down to 512 bits, which is 64 bytes.
  • Whirlpool also outputs a 512-bit hash? No worries: BINARY(64) fits like Cinderella’s glass slipper.

Know your hash algorithm output well before defining your table schema for best storage and performance.

Exception to every rule

While BINARY is typically your go-to choice, certain situations demand different datatypes. Keep these in mind:

  • When a hash is part of a bigger composite key, the hash column's storage efficiency might matter less than the design of the key as a whole.
  • If storing hashes temporarily or for one-time use (like in a staging table), VARBINARY might do the trick.
  • In systems where the past casts a long shadow (thanks to backward compatibility), you might stick to CHAR or VARCHAR.
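If a legacy system does keep hashes as hex text in CHAR or VARCHAR, you can still convert between the two forms when needed. The following is a minimal sketch for SQL Server using CONVERT with style 2 (hex without the 0x prefix); the variable names and the input 'example' are placeholders.

```sql
-- Legacy form: 64 hex characters in a CHAR(64).
DECLARE @legacy_hex CHAR(64) =
    CONVERT(CHAR(64), HASHBYTES('SHA2_256', 'example'), 2);

-- Convert the hex text back into 32 raw bytes.
DECLARE @as_binary BINARY(32) = CONVERT(BINARY(32), @legacy_hex, 2);

-- Confirm the round trip preserved the digest.
SELECT CASE
           WHEN @as_binary = HASHBYTES('SHA2_256', 'example') THEN 'match'
           ELSE 'mismatch'
       END AS round_trip;
```

The same CONVERT call can back a one-off migration from a hex column to a BINARY column once the old readers are retired.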