Explain Codes LogoExplain Codes Logo

What's a PostgreSQL "Cluster" and how do I create one?

sql
database-management
postgresql-configuration
cluster-management
Nikita BarsukovbyNikita Barsukov·Dec 21, 2024
TLDR

A PostgreSQL "Cluster" is a grouping of the server process and the databases it manages. Use initdb to prepare the necessary directory structure and then pg_ctl to fire up the cluster:

# Let's set up a new home for our data initdb -D /desired/path/for/data # Start the magic pg_ctl -D /desired/path/for/data start

Please remember that /desired/path/for/data should point to the directory holding your PostgreSQL data files.

Understanding PostgreSQL "cluster"

PostgreSQL "Cluster" != typical computing "cluster". In PostgreSQL, a cluster indicates the postmaster - a server process and a collection of databases under its control. This in-process organization of databases isn't equivalent to a computing cluster (a collection of machines pooling resources).

Each PostgreSQL cluster sports its own unique data directory and configuration files, leading to data isolation and the flexibility to run different versions and configurations on one machine.

PostgreSQL Cluster Creation 101

Let's dive deeper into the practicalities surrounding cluster creation and management:

Creating Multiple Clusters

Recreating The Matrix, huh? To create additional clusters, you need user permissions to write to the data directory. For staters, on Debian/Ubuntu systems, use the pg_createcluster command:

# The Matrix has you... pg_createcluster <version> <cluster_name>

And to list all realities (clusters, I mean) use pg_lsclusters.

Friends Don't Let Friends Use Default Ports

Each cluster needs its unique port—a digital party shouldn't have gate-crashers. Specify this during initialization or via the postgresql.conf:

# 5433 is the new black initdb -D /path/to/data -p 5433

This initializes a new cluster at port 5433. 🚀

High Availability (HA)

For HA scenarios, it's not "just create and vamoose". You'd need additional config for aspects like replication and load balancing.

Data Directory Location

By default, the cluster data lives at /var/lib/postgresql/<version>/<clusterName>. But, you're free to choose (/free/to/choose it is!). Just ensure the permissions are set right.

Advanced Cluster Management

The management gets tougher as the clusters grow—more like herding cats. Helpful utilities:

  • pg_ctlcluster for the regular start, stop, and reload actions.
  • pg_rewind to restore sanity after a data mishap.
  • repmgr to handle replication and failovers.

Database Management within a Cluster

To add a new database within a cluster, just say the magic words:

# Add yet another apartment to the city CREATE DATABASE newdb;

Each database can have its unique set of extensions and schemas. Thus, a cluster allows for customization without conflicting with others.