Patroni
Watchdog support
Having multiple PostgreSQL servers running as master can result in transactions lost due to diverging timelines. This situation is also called a split-brain problem. To avoid split-brain Patroni needs to ensure PostgreSQL will not accept any transaction commits after leader key expires in the DCS. Under normal circumstances Patroni will try to achieve this by stopping PostgreSQL when leader lock update fails for any reason. However, this may fail to happen due to various reasons:
-
Patroni has crashed due to a bug, out-of-memory condition or by being accidentally killed by a system administrator.
-
Shutting down PostgreSQL is too slow.
-
Patroni does not get to run due to high load on the system, the VM being paused by the hypervisor, or other infrastructure issues.
To guarantee correct behavior under these conditions Patroni supports watchdog devices. Watchdog devices are software or hardware mechanisms that will reset the whole system when they do not get a keepalive heartbeat within a specified timeframe. This adds an additional layer of fail safe in case usual Patroni split-brain protection mechanisms fail.
Patroni will try to activate the watchdog before promoting PostgreSQL to
master. If watchdog activation fails and watchdog mode is required
then the node will refuse to become master. When deciding to participate
in leader election Patroni will also check that watchdog configuration
will allow it to become leader at all. After demoting PostgreSQL (for
example due to a manual failover) Patroni will disable the watchdog
again. Watchdog will also be disabled while Patroni is in paused state.
By default Patroni will set up the watchdog to expire 5 seconds before
TTL expires. With the default setup of loop_wait=10
and ttl=30
this gives HA loop at least 15 seconds (ttl
- safety_margin
-
loop_wait
) to complete before the system gets forcefully reset. By
default accessing DCS is configured to time out after 10 seconds. This
means that when DCS is unavailable, for example due to network issues,
Patroni and PostgreSQL will have at least 5 seconds (ttl
-
safety_margin
- loop_wait
- retry_timeout
) to come to a
state where all client connections are terminated.
Safety margin is the amount of time that Patroni reserves for time
between leader key update and watchdog keepalive. Patroni will try to
send a keepalive immediately after confirmation of leader key update. If
Patroni process is suspended for extended amount of time at exactly the
right moment the keepalive may be delayed for more than the safety
margin without triggering the watchdog. This results in a window of time
where watchdog will not trigger before leader key expiration,
invalidating the guarantee. To be absolutely sure that watchdog will
trigger under all circumstances set up the watchdog to expire after half
of TTL by setting safety_margin
to -1 to set watchdog timeout to
ttl // 2
. If you need this guarantee you probably should increase
ttl
and/or reduce loop_wait
and retry_timeout
.
Currently watchdogs are only supported using Linux watchdog device interface.
Setting up software watchdog on Linux
Default Patroni configuration will try to use /dev/watchdog
on Linux
if it is accessible to Patroni. For most use cases using software
watchdog built into the Linux kernel is secure enough.
To enable software watchdog issue the following commands as root before starting Patroni:
modprobe softdog
# Replace postgres with the user you will be running patroni under
chown postgres /dev/watchdog
For testing it may be helpful to disable rebooting by adding
soft_noboot=1
to the modprobe command line. In this case the
watchdog will just log a line in kernel ring buffer, visible via
dmesg.
Patroni will log information about the watchdog when it is successfully enabled.