Backup on object stores v1

EDB Postgres Distributed for Kubernetes supports online/hot backup of PGD clusters through physical backup and WAL archiving on an object store. This means that the database is always up (no downtime required) and that point-in-time recovery (PITR) is available.

Common object stores

Multiple object stores are supported, such as AWS S3, Microsoft Azure Blob Storage, Google Cloud Storage, MinIO Gateway, or any S3-compatible provider. EDB Postgres Distributed for Kubernetes relies on EDB Postgres for Kubernetes to configure the connection to object stores. For more information, see the common object stores for backups section of the EDB Postgres for Kubernetes documentation.

Important

In the EDB Postgres for Kubernetes documentation, the cloud provider configuration appears under spec.backup.barmanObjectStore. In EDB Postgres Distributed for Kubernetes, the object store section is at a different path: spec.backup.configuration.barmanObjectStore.
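The difference between the two paths can be sketched side by side. This is an illustrative fragment only: the Cluster apiVersion is an assumption based on the EDB Postgres for Kubernetes API group, and all other required fields of both resources are omitted.

```yaml
# EDB Postgres for Kubernetes: object store directly under spec.backup
# (apiVersion assumed here for illustration)
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
spec:
  backup:
    barmanObjectStore:
      destinationPath: "<destination path here>"
---
# EDB Postgres Distributed for Kubernetes: one extra level, "configuration"
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
spec:
  backup:
    configuration:
      barmanObjectStore:
        destinationPath: "<destination path here>"
```

When you copy an object store snippet from the EDB Postgres for Kubernetes documentation, nest it under the configuration key of the PGD group.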

WAL archive

WAL archiving is the process that sends WAL files to the object store. It's essential for online/hot backups and PITR. In EDB Postgres Distributed for Kubernetes, each PGD node is set up to archive WAL files in the object store independently.

The WAL archive is defined in the PGD Group spec.backup.configuration.barmanObjectStore stanza, and is enabled as soon as a destination path and cloud credentials are set. You can choose to compress WAL files before they're uploaded and you can encrypt them. You can also enable parallel WAL archiving:

apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    configuration:
      barmanObjectStore:
        [...]
        wal:
          compression: gzip
          encryption: AES256
          maxParallel: 8

For more information, see the EDB Postgres for Kubernetes WAL archiving documentation.

Scheduled backups

Scheduled backups are the recommended way to configure your backup strategy in EDB Postgres Distributed for Kubernetes. When both the spec.backup.configuration.barmanObjectStore and spec.backup.schedulers[].schedule stanzas of the PGD group are configured, the operator selects one of the PGD data nodes as the elected backup node, for which it creates a ScheduledBackup resource.

The .spec.backup.schedulers[].method field allows you to define the scheduled backup method. Two backup methods are supported:

  • volumeSnapshot
  • barmanObjectStore (the default)

You can define more than one scheduler, but each method can be used by only one scheduler. That is, two schedulers aren't allowed to use the same method.
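As a sketch of this rule, the following fragment defines two schedulers, one per method. The schedule values are illustrative assumptions, and the configuration stanza (object store and volume snapshot settings) is omitted.

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
# metadata and the rest of the spec omitted
spec:
  backup:
    # configuration stanza omitted
    schedulers:
    # Object store backup, daily at midnight (illustrative schedule)
    - method: barmanObjectStore
      schedule: "0 0 0 * * *"
    # Volume snapshot backup, every six hours (illustrative schedule)
    - method: volumeSnapshot
      schedule: "0 0 */6 * * *"
```

Adding a third scheduler isn't possible under this rule, because both supported methods are already in use.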

For object store backups, with the default barmanObjectStore method, use the stanza spec.backup.configuration.barmanObjectStore to define the object store information for both backup and WAL archiving. For more information, see Backup on object stores in the EDB Postgres for Kubernetes documentation.

To perform volume snapshot backups, select the volumeSnapshot method. Use the stanza spec.backup.configuration.volumeSnapshot to define the volume snapshot configuration. For more information, see Backup on volume snapshots in the EDB Postgres for Kubernetes documentation.

This example shows how to use the volumeSnapshot method for backup. WAL files are still archived to the object store.

apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    configuration:
      volumeSnapshot:
        className: csi-hostpath-snapclass
      barmanObjectStore:
        destinationPath: "<destination path here>"
        s3Credentials:
          accessKeyId:
            name: backup-storage-creds
            key: ID
          secretAccessKey:
            name: backup-storage-creds
            key: KEY
        wal:
          compression: gzip
          encryption: AES256
          maxParallel: 8
    schedulers:
    - method: volumeSnapshot
      schedule: "0 0 0 * * *"
      backupOwnerReference: self
      suspend: false
      immediate: true

For a comparison of these two backup methods, see Object stores or volume snapshots in the EDB Postgres for Kubernetes documentation.

The .spec.backup.schedulers[].schedule field allows you to define a cron schedule, expressed in the Go cron package format:

apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    schedulers:
    - method: barmanObjectStore
      schedule: "0 0 0 * * *"
      backupOwnerReference: self
      suspend: false
      immediate: true
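The Go cron format used here has six fields, with seconds first, unlike the five-field format of a traditional crontab. The following annotated fragment is a sketch of how to read the expression; the alternative schedules in the comments are illustrative and not taken from the product documentation.

```yaml
spec:
  backup:
    schedulers:
    - method: barmanObjectStore
      # Field order: second minute hour day-of-month month day-of-week
      #
      #   "0 0 0 * * *"    every day at 00:00:00
      #   "0 30 2 * * 0"   every Sunday at 02:30:00
      #   "0 0 */6 * * *"  every six hours, on the hour
      schedule: "0 0 0 * * *"
```

Pasting a five-field crontab expression shifts every field by one position, so check that the expression has six fields before applying it.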

If necessary, you can suspend scheduled backups by setting .spec.backup.schedulers[].suspend to true. This setting prevents new backups from being scheduled.
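For instance, a scheduler paused for a maintenance window might look like the following sketch, which reuses the values from the example above and changes only suspend:

```yaml
spec:
  backup:
    schedulers:
    - method: barmanObjectStore
      schedule: "0 0 0 * * *"
      backupOwnerReference: self
      # No new backups are scheduled while this is true
      suspend: true
```

Setting suspend back to false resumes scheduling.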

If you want to execute a backup as soon as the ScheduledBackup resource is created, set .spec.backup.schedulers[].immediate to true.

The .spec.backup.schedulers[].backupOwnerReference field indicates the ownerReference to use in the created backup resources. The options are:

  • none: Doesn't set an owner reference for created backup objects.
  • self: Sets the ScheduledBackup object as owner of the backup.
  • cluster: Sets the cluster as owner of the backup.

Warning

The .spec.backup.cron field is deprecated. Use .spec.backup.schedulers instead. While you can still use .spec.backup.cron, you can't use it at the same time as .spec.backup.schedulers.

Note

The EDB Postgres for Kubernetes ScheduledBackup object contains the cluster option to specify the cluster to back up. This option currently isn't supported by EDB Postgres Distributed for Kubernetes and is ignored if specified.

If an elected backup node is deleted, the operator transparently elects a new backup node and reconciles the ScheduledBackup resource accordingly.

Retention policies

EDB Postgres Distributed for Kubernetes can manage the automated deletion of backup files from the backup object store using retention policies based on the recovery window. This process also takes care of removing unused WAL files and WALs associated with backups that are scheduled for deletion.

You can define your backups with a retention policy of 30 days:

apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    configuration:
      retentionPolicy: "30d"

For more information, see retention policies in the EDB Postgres for Kubernetes documentation.

Important

Currently, the retention policy is applied only to the backups and WAL files of the elected backup node. Because every other PGD node also archives its own WAL files independently, you're responsible for managing the lifecycle of those files, for example by using the object store's data retention policy. Also, if you set up an object store data retention policy on every PGD node directory, make sure it doesn't overlap or interfere with the retention policy managed by the operator.

Compression algorithms

Backups and WAL files are uncompressed by default. However, multiple compression algorithms are supported. For more information, see the EDB Postgres for Kubernetes compression algorithms documentation.
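As a hedged sketch, compression for base backups and WAL files might be enabled as follows. The wal.compression field appears in the examples on this page. The data.compression field is an assumption taken from the EDB Postgres for Kubernetes barmanObjectStore API, on the basis that the PGD group passes this stanza through.

```yaml
spec:
  backup:
    configuration:
      barmanObjectStore:
        # destination path and credentials omitted
        data:
          # Assumed field: compression for base backup files
          compression: gzip
        wal:
          # Compression for archived WAL files
          compression: gzip
```

Check the compression algorithms documentation for the values supported by your version before relying on a specific algorithm.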

Tagging of backup objects

You can specify tags as key-value pairs for the backup objects, namely base backups, WAL files, and history files. For more information, see the EDB Postgres for Kubernetes documentation about tagging of backup objects.
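A minimal sketch of tagging, assuming that the tags and historyTags fields of the EDB Postgres for Kubernetes barmanObjectStore API are available at the PGD group path. The tag key and values are hypothetical.

```yaml
spec:
  backup:
    configuration:
      barmanObjectStore:
        # destination path and credentials omitted
        # Assumed field: tags applied to base backups and WAL files
        tags:
          backupRetentionPolicy: "expire"
        # Assumed field: tags applied to history files
        historyTags:
          backupRetentionPolicy: "keep"
```

Tags like these can then drive lifecycle rules on the object store side.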

On-demand backups of a PGD node

A PGD node is represented as a single-instance EDB Postgres for Kubernetes Cluster object. As such, you can request an on-demand backup of a specific PGD node by creating an EDB Postgres for Kubernetes Backup resource. To do that, see on-demand backups in the EDB Postgres for Kubernetes documentation.
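A Backup resource for a single PGD node might look like the following sketch. The apiVersion and the spec.cluster.name field are assumptions based on the EDB Postgres for Kubernetes Backup API, and the resource, cluster, and namespace names are hypothetical.

```yaml
# Assumed apiVersion of the EDB Postgres for Kubernetes Backup resource
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Backup
metadata:
  name: my-pgd-node-backup      # hypothetical name
  namespace: my-namespace       # hypothetical namespace
spec:
  cluster:
    # Hypothetical name of the Cluster object that represents the PGD node
    name: my-pgd-group-1
```

The cluster name must match one of the Cluster objects that belong to the PGD group.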

Hint

You can retrieve the list of EDB Postgres for Kubernetes clusters that make up your PGD group by running kubectl get cluster -l k8s.pgd.enterprisedb.io/group=my-pgd-group -n my-namespace.