PostgreSQL Backup and Restore Best Practices

I am curious to know what other people are doing for a successful backup and restore of their Wappler PostgreSQL databases.

I’ve been doing some internal testing myself and can back up successfully with the pg_dump command using several formats, such as a .tar archive or a plain-text .sql file, but I struggle with the pg_restore or psql side of things. It either fails completely or, after the data has been restored successfully, complains about “duplicate key value violates unique constraint” when I try to enter new data. If I enter the data twice it works on the second attempt, but obviously I am doing something wrong.
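
For reference, this is roughly what I have been running; the user, host, database, and file names below are just placeholders:

```bash
# Backup as a .tar archive:
pg_dump -U myuser -h localhost -F t -f backup.tar mydb

# Backup as a plain-text .sql file:
pg_dump -U myuser -h localhost -F p -f backup.sql mydb

# Restore attempts:
pg_restore -U myuser -h localhost -d mydb backup.tar
psql -U myuser -h localhost -d mydb -f backup.sql
```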

I am aware of the built-in “Save Database Data and Structure” option, but it has never worked for me, and as far as I know it cannot be run on an automatic schedule anyway, so it is not the best option regardless.

Thanks for any insights

I see it’s been a month; are you still looking into this? I use PostgreSQL as well, and as useful as pg_dump is, it isn’t what most people think of as a “backup”. This surprised me when I learned it, because it is even called a backup in parts of the official documentation.

If you’re a podcast person, this podcast covers backups in quite a few episodes: https://postgres.fm

If you prefer YouTube, I liked this video; it is not a direct tutorial, but I learned a lot from it: https://www.youtube.com/watch?v=mdjf9KtW92M

My understanding is that pg_dump writes out a recipe for how another database server can rebuild the same schema with the same data. So it works fine as a backup a lot of the time… but it isn’t really a backup. It is more of a clean copy via recipe, so things unfortunately go wrong fairly regularly when the output of a default pg_dump is used to try to restore from a backup.

You can find videos on this by searching for “logical vs physical backups” and see which approach you think will work for your needs. After a few years of managing my own Postgres instances, I now use managed databases… keep in mind that backup/restore speed needs to be tested regularly, even if you do use a managed service…
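
As a rough illustration of the difference (the connection details below are placeholders, and pg_basebackup needs a user with replication privileges):

```bash
# Logical backup: pg_dump writes out instructions to recreate one database's schema and data.
pg_dump -h localhost -U myuser -F c -f mydb.dump mydb

# Physical backup: pg_basebackup copies the cluster's data files (the whole instance, not one database).
pg_basebackup -h localhost -U replication_user -D /backups/base -Ft -z -X stream -P
```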

What I ended up with is a cron job running on the Docker server with a script that runs the pg_dump. After much experimentation and testing, the script now logs into the container itself, runs the pg_dump there, transfers the file out once the backup is done instead of streaming it while the backup is running, and then removes the backup file from the container. I was running into backup corruption issues when I streamed the dump out through the container while it was backing up.
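
A stripped-down sketch of the idea; the container name, credentials, and paths are placeholders rather than my actual script:

```bash
#!/usr/bin/env bash
# Cron-driven backup sketch: dump inside the container, copy it out afterwards, then clean up.
set -euo pipefail

STAMP=$(date +%Y%m%d_%H%M%S)

# 1. Run pg_dump inside the container, writing the dump to a file in the container
#    (not streaming it out while the dump is still running).
docker exec my_postgres_container \
  pg_dump -U myuser -F t -f /tmp/backup_${STAMP}.tar mydb

# 2. Copy the finished dump out of the container onto the host.
docker cp my_postgres_container:/tmp/backup_${STAMP}.tar /backups/

# 3. Remove the dump from inside the container.
docker exec my_postgres_container rm /tmp/backup_${STAMP}.tar
```

The script is then called from a normal crontab entry, e.g. `0 2 * * * /opt/scripts/pg_backup.sh` (path is a placeholder).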

As you say, pg_dump can be particular about its parameters and setup. In my case, when I do a restore I can restore to the existing database or to a new database where the tables have already been defined by Wappler. With the parameters I use, it will not work if I try to restore into a database that does not yet have the tables defined; if I used different parameters for the backup and restore, that would be a different story. I do use pg_restore for the restore, again just via a script, simply specifying the filename to restore. This restore scenario works for me, but your instance may be different.
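
For what it’s worth, a restore of that shape can look something like the sketch below; these are not necessarily my exact flags, and the names are placeholders:

```bash
# Restore data into a database where the tables already exist (e.g. created by Wappler):
pg_restore -U myuser -h localhost -d mydb --data-only --disable-triggers backup.tar

# A different parameter combination that drops and recreates objects,
# so it does not need the tables to exist beforehand:
pg_dump -U myuser -F c -f backup.dump mydb
pg_restore -U myuser -h localhost -d mydb --clean --if-exists backup.dump
```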

I also have automation in place to copy the backups offsite and notifications for any failures, and I have tested multiple backups and restores without problems.
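
Conceptually the offsite copy and alerting are nothing fancy; something along these lines, where the remote host and webhook URL are hypothetical:

```bash
# Copy the local backup directory offsite; alert if the copy fails.
if rsync -az /backups/ backupuser@offsite.example.com:/srv/pg-backups/; then
  echo "offsite copy ok"
else
  # Hypothetical webhook endpoint for notifications.
  curl -fsS -X POST -d "PostgreSQL offsite backup copy failed on $(hostname)" \
    https://alerts.example.com/notify
fi
```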

So far I am happy with the setup for my particular use case, but everyone’s use case will be different. This particular site is very, very low-volume internal company usage; if it were much more critical, I might revisit this kind of setup in the future.

Glad you found a solution that works; I have definitely used similar setups for years with success.

I didn’t run into issues with restoring/rebuilding from pg_dump until the site grew and the database became complex. It may never be an issue for simpler/smaller datasets. Thanks for sharing your solution.