Learn more about log shipping replication, an advanced PostgreSQL database administration topic discussed in this article.
Desire to create a live backup of a postgresql 8.3 system.
For other versions of Pg, the setup is basically the same, however there may be some version specific caveats. Read the release notes for the version of postgres you have.
The live backup is to be used for failover and cannot be used for load balancing, querying, etc. If using tablespace, they must exist in the same location.
Both the master and the slave must be running exactly the same version of postgresql, particularly the datetime integer option must be the same, either floating point or integer. This is because we are duplicating the data on both machines. ALSO THE ARCHITECTURE MUST BE THE SAME!!! 64Bit to 64Bit, or 32Bit to 32Bit.
PostgreSQL's write ahead logs (WAL) enable this feature. Before a transaction is completed it is written to the WAL, then the transaction is written to the database. There is a facility to call a program before the WAL is written, this is referred to as the archive_command. What happens here is that for every WAL log written it is also handled by this command. HOT backup utilizes this to transport the WAL log to the slave machine.
On the slave machine we initiate continuous recovery mode where the database instance is continually reading the WAL logs as they appear and applying them to the replicated database. Essentially making a duplicate of the master database in real time. Please note that there is no PostgreSQL connection between the two servers. The slave does not 'know' when the master has failed. An evaluation needs to be made by the Database Administrator to determine whether or not to turn on the slave Postgres instance.
1. In order to copy files from one machine to the other we need to be able to ssh from the master to the slave as the postgres user without passwords. This is accomplished using ssh keys. A ssh key for the postgresql user needs to be generated on the master and put into the slave's authorized_keys file.
On the master, do
ssh-copy-id -i ~/.ssh/id_dsa.pub postgres@<slave ip>
2. On the slave we require a contrib module called pg_standby. This module does the work of reading the WAL files as they appear and making them available to the slave instance. This module is found in the contrib directory of postgresql and needs to be built.
3. It is desirable to transfer these files over the shortest network path as possible. To this end I recommend that a separate network card be placed in the master and slave and connected with as short a path as possible, ideally a crossover cable.
4. Using `visudo` or `EDITOR=nano visudo` add the following entry to /etc/sudoers on both. This allows the postgres user to use the init.d start script on the slave.
postgres ALL=NOPASSWD: /etc/init.d/postgresql
Preparing Postgres on the Master and Slave
1. Login to the Master as postgres user
2. Enable archive mode on the master by editing the postgresql.conf. Change the archive commands: (The change to archive_mode requires are restart to take effect)
archive_mode = on
archive_command = 'rsync -a %p email@example.com:/home/postgres/walfiles/%f'
# archive_command = '/bin/true' # Used for testing, and doesn't require a restart
3. Restart postgresql, and check that it is in archive mode. It will be copying over walfiles.
4. Login to the Slave as postgres user
5.Create a directory - /home/postgres/walfiles. This corresponds to the directory in the Master's config.
6. Create a file /pgsql/recovery.conf, with the following content:
restore_command = '/usr/local/pgsql/bin/pg_standby -l -d -s 2 -t /tmp/pgsql.trigger.5432 \ /home/postgres/walfiles %f %p %r 2>>standby.log'
7. Remove the contents of /usr/local/pgsql/data, this is getting replaced by the copy from the Master.
rm -rf /usr/local/pgsql/data/*
8. Log out of the Slave
9. Log in to the Master as postgres user
10. As user 'postgres', place the db into safe copy mode. This allows postgres to be able to copy from the filesystem:
psql -U postgres -c "SELECT pg_start_backup('dhreplication');"
11. As user 'postgres', rsync the entire database cluster from the master to the slave with:
rsync -avz /usr/local/pgsql/data/* 10.5.12.37:/usr/local/pgsql/data/
NOTE: during the rsync process, several warnings are normal
12. As user postgres, take pg out of safe copy mode with:
psql -U postgres -c "SELECT pg_stop_backup();"
13. You are done on the master, log out.
14. Login to the slave as the postgres user
15. Edit the postgresql.conf file and turn off archive_mode. This is now the config file from the MASTER that will be running on the slave, remember, we just rsync'd the whole data dir from master to the slave...
archive_mode = off
15. As user postgres, create a symbolic link to the recovery.conf file:
16. ln -s /usr/local/pgsql/recovery.conf /usr/local/pgsql/data/recovery.conf
17. Start postgresql
18. Monitor standby.log, and postgresql log file for errors
tail -f /usr/local/pgsql/data/standby.log /var/pgsql/pgsql.log
1. Login to the master as the postgres user
2. Change the archive_command parameter to return true
archive_command = /bin/true
3. Reload postgresql.conf
4. Log off master
5. Log onto the slave as the postgres user
6. Edit postgresql.conf and set:
archive_mode = off
7. Monitor your logs:
tail -f /usr/local/pgsql/data/standby.log /var/log/pgsql/pgsql.log
8. Then issue:
This creates the file /tmp/pgsql.trigger.5432, which takes postgres on the slave out of recovery mode, and ready for production.
9. Connect your clients to the server.
What the logging should show
Examples of expected behavior on a normally running slave replica
1. As the root user, start a tail on the Slave of the following log files
'tail -f serverlog standby.log recovery.conf'
2. As the root user, start a tail on the Slave of the disk free command 'df'
'watch -n.1 'df''
3. To get some activity transferring, from the MASTER, logged into psql, perform a REINDEX DATABASE <databasename>; You will start to see the bits fly with the command in #2 above. They should increase to a certain point, and then purge themselves.
If you see the words 'PANIC' ,'FATAL' or 'UNEXPECTED'. Please call for assistance. Chances are that if these are occurring in the logging on your slave the best thing to do, as long as the Master is functioning correctly, is to rebuild the Slave following these instructions. If there is truly an issue with the Master, and the Slave is producing errors on startup, call for help.
Examples of problems on the slave
==> /var/log/pgsql/pgsql.log <==
Jul 28 17:09:42 xtuplelive postgres: [11-1] 2009-07-28 17:09:42 BST PANIC: unexpected pageaddr 1/3AFF2000 in log file 1, segment 70, offset 16719872
Jul 28 17:09:42 xtuplelive postgres: [1-1] 2009-07-28 17:09:42 BST LOG: startup process (PID 6567) was terminated by signal 6: Aborted
Jul 28 17:09:42 xtuplelive postgres: [2-1] 2009-07-28 17:09:42 BST LOG: aborting startup due to startup process failure
==> /var/log/pgsql/pgsql.log <==
Jul 28 17:13:26 xtuplelive postgres: [9-1] 2009-07-28 17:13:26 BST FATAL: WAL ends before end time of backup dump
Jul 28 17:13:26 xtuplelive postgres: [1-1] 2009-07-28 17:13:26 BST LOG: startup process (PID 6863) exited with exit code 1
Jul 28 17:13:26 xtuplelive postgres: [2-1] 2009-07-28 17:13:26 BST LOG: aborting startup due to startup process failure
To reset the slave
Login to the slave as the root user and start from step # 6 in the instructions under 'Preparing Postgres on the Master and Slave'