patroni.ha module

class patroni.ha.Failsafe(dcs: AbstractDCS)

Bases: object

__init__(dcs: AbstractDCS) → None

_reset_state() → None

is_active() → bool

Reports through the REST API whether failsafe mode is currently active.
On the primary, self._last_update is set by the set_is_active() method, so the returned value is always correct.
On replicas, self._last_update is set at the moment the primary performs a POST /failsafe REST API call. As a side effect, replicas may show a failsafe_is_active value that differs from the one on the primary.
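The timestamp-based behaviour described above can be illustrated with a minimal sketch. It is not Patroni's implementation: the class name FailsafeSketch, the ttl argument, the record_primary_call() helper and the "last update plus TTL" comparison are all assumptions made for illustration.

```python
import time
from typing import Optional


class FailsafeSketch:
    """Hypothetical stand-in for Failsafe; not Patroni's actual code."""

    def __init__(self, ttl: float) -> None:
        self._ttl = ttl            # assumed: how long one update stays valid
        self._last_update = 0.0    # timestamp of the most recent update

    def set_is_active(self, value: float) -> None:
        # Primary side: the HA loop passes in the timestamp to record.
        self._last_update = value

    def record_primary_call(self, now: float) -> None:
        # Replica side (hypothetical helper): the primary has just
        # performed a POST /failsafe call against this node.
        self._last_update = now

    def is_active(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return self._last_update + self._ttl > now
```

Because a replica only refreshes its timestamp when the primary calls it, its answer can lag behind the primary's, which is the side effect the description mentions.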

property leader: Leader | None

set_is_active(value: float) → None

update(data: Dict[str, Any]) → None

update_cluster(cluster: Cluster) → Cluster

class patroni.ha.Ha(patroni: Patroni)

Bases: object

__init__(patroni: Patroni)

_delete_leader(last_lsn: int | None = None) → None

_do_reinitialize(cluster: Cluster) → bool | None

_failsafe_config() → Dict[str, str] | None

_get_failover_action_name() → str

Return the currently requested manual failover action name or the default failover.

Returns

str representing the manually requested action (manual failover if no leader is specified in the /failover key in DCS, switchover otherwise) or failover if /failover is empty.
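The naming rule can be written down as a small sketch. The function name failover_action_name and the representation of the /failover key as a plain mapping with an optional "leader" entry are assumptions for illustration, not Patroni's data model.

```python
from typing import Mapping, Optional


def failover_action_name(failover_key: Optional[Mapping[str, str]]) -> str:
    """Hypothetical helper mirroring the rule in the description."""
    if not failover_key:
        # /failover is empty: this is an ordinary automatic failover.
        return "failover"
    if not failover_key.get("leader"):
        # A manual request that names no leader.
        return "manual failover"
    # A manual request that names the leader to step down.
    return "switchover"
```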

_get_node_to_follow(cluster: Cluster) → Leader | Member | None

Determine the node to follow.

Parameters

cluster – the currently known cluster state from DCS.

Returns

the node which we should be replicating from.

_handle_crash_recovery() → str | None

_handle_dcs_error() → str

_handle_rewind_or_reinitialize() → str | None

_is_healthiest_node(members: Collection[Member], check_replication_lag: bool = True) → bool

This method tries to determine whether the current node is healthy enough to become a new leader candidate.

_run_cycle() → str

_sync_replication_slots(dcs_failed: bool) → List[str]

Handles replication slots.

Parameters

dcs_failed – bool, indicates that communication with DCS failed (get_cluster() or update_leader())

Returns

list[str], names of replication slots that should be copied from the primary

acquire_lock() → bool

bootstrap() → str

bootstrap_standby_leader() → bool | None

If the ‘standby’ key is found in the configuration, we need to bootstrap not a real primary but a ‘standby leader’, which will take a base backup from a remote member and start following it.

call_failsafe_member(data: Dict[str, Any], member: Member) → bool

cancel_initialization() → None

check_failsafe_topology() → bool

Check whether we could continue to run as a primary by calling all members from the failsafe topology.

Note

If the /failsafe key contains invalid data or if the name of our node is missing in the /failsafe key, we immediately give up and return False.
We send a JSON document in the POST request. Standby nodes use the information from its slots dict to advance the position of permanent replication slots while DCS is not accessible, in order to avoid indefinite growth of pg_wal.

Returns

True if all members from the /failsafe topology agree that this node could continue to run as a primary, or False if some of the standby nodes are not accessible or don’t agree.

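The decision rule for check_failsafe_topology() can be sketched as follows. This is only an illustration under stated assumptions: the /failsafe key is modelled as a dict mapping member names to API URLs, and call_member is a hypothetical callable standing in for the POST request to one member. Neither reflects Patroni's actual signatures.

```python
from typing import Any, Callable, Dict, Optional


def failsafe_topology_agrees(
    node_name: str,
    failsafe_key: Optional[Dict[str, str]],
    call_member: Callable[[str, Dict[str, Any]], bool],
    payload: Dict[str, Any],
) -> bool:
    """Hypothetical sketch of the 'everyone must agree' rule."""
    # Invalid /failsafe data or our own name missing: give up immediately.
    if not isinstance(failsafe_key, dict) or node_name not in failsafe_key:
        return False
    # Every other member must be reachable and must agree.
    others = [url for name, url in failsafe_key.items() if name != node_name]
    return all(call_member(url, payload) for url in others)
```

Note that all() stops at the first refusal. That is a simplification of this sketch, not a statement about how the members are actually contacted.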

check_timeline() → bool

Returns

True if we should check whether the timeline is the latest during the leader race.

clone(clone_member: Leader | Member | None = None, msg: str = '(without leader)') → bool | None

delete_future_restart() → bool

demote(mode: str) → bool | None

Demote PostgreSQL running as primary.

Parameters

mode – One of offline, graceful, immediate or immediate-nolock.
offline is used when the connection to DCS is not available.
graceful is used when failing over to another node due to a user request. May only be called running async.
immediate is used when we determine that we are not suitable for primary and want to fail over quickly without regard for data durability. May only be called synchronously.
immediate-nolock is used when we find out that we have lost the lock to be primary. PostgreSQL needs to be brought down as quickly as possible without regard for data durability. May only be called synchronously.
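The four modes can be summarised in a small lookup table. The structure below (DemoteModeInfo, DEMOTE_MODES, describe_demote_mode) is hypothetical and only encodes what the parameter description states; it is not how Patroni represents the modes internally.

```python
from typing import Dict, NamedTuple, Optional


class DemoteModeInfo(NamedTuple):
    used_when: str
    call_style: Optional[str]      # "async", "sync", or None if not stated
    disregards_durability: bool    # True only where the description says so


DEMOTE_MODES: Dict[str, DemoteModeInfo] = {
    "offline": DemoteModeInfo("connection to DCS is not available", None, False),
    "graceful": DemoteModeInfo("failing over due to a user request", "async", False),
    "immediate": DemoteModeInfo("node is not suitable for primary", "sync", True),
    "immediate-nolock": DemoteModeInfo("leader lock has been lost", "sync", True),
}


def describe_demote_mode(mode: str) -> DemoteModeInfo:
    """Look up a mode, rejecting anything outside the four documented values."""
    try:
        return DEMOTE_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown demote mode: {mode!r}") from None
```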

enforce_follow_remote_member(message: str) → str

enforce_primary_role(message: str, promote_message: str) → str

Ensure the node that has won the race for the leader key meets the criteria for promoting its PG server to the ‘primary’ role.

evaluate_scheduled_restart() → str | None

failsafe_is_active() → bool

fetch_node_status(member: Member) → _MemberStatus

Performs an HTTP GET request on member.api_url and fetches its status.

Returns

_MemberStatus object

fetch_nodes_statuses(members: List[Member]) → List[_MemberStatus]

follow(demote_reason: str, follow_reason: str, refresh: bool = True) → str

future_restart_scheduled() → Dict[str, Any]

get_effective_tags() → Dict[str, Any]

Return configuration tags merged with dynamically applied tags.

get_failover_candidates(exclude_failover_candidate: bool) → List[Member]

Return a list of candidates for either manual or automatic failover.

Exclude non-sync members when in synchronous mode, the current node (its checks are always performed earlier) and the candidate if required. If failover candidate exclusion is not requested and a candidate is specified in the /failover key, return the candidate only. The result is further evaluated in the caller Ha.is_failover_possible() to check whether any member is actually healthy enough and is allowed to promote.

Parameters

exclude_failover_candidate – if True, exclude failover.candidate from the candidates.

Returns

a list of Member objects or an empty list if there is no candidate available.
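The selection rules above can be sketched as a standalone function. MemberSketch and all parameter names here are hypothetical. The real method takes only exclude_failover_candidate, so the remaining inputs are passed explicitly in this sketch to keep it self-contained.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Optional


@dataclass(frozen=True)
class MemberSketch:
    """Hypothetical, minimal stand-in for a cluster member."""
    name: str


def failover_candidates(
    members: Iterable[MemberSketch],
    current_node: str,
    candidate: Optional[str],
    exclude_failover_candidate: bool,
    sync_mode: bool = False,
    sync_names: AbstractSet[str] = frozenset(),
) -> List[MemberSketch]:
    # A candidate named in /failover is returned alone unless excluded.
    if candidate and not exclude_failover_candidate:
        return [m for m in members if m.name == candidate]
    result = []
    for m in members:
        if m.name == current_node:
            continue  # the current node is checked earlier by the caller
        if exclude_failover_candidate and m.name == candidate:
            continue
        if sync_mode and m.name not in sync_names:
            continue  # non-sync members are not eligible in synchronous mode
        result.append(m)
    return result
```

An empty result simply means no member qualifies. Health checks are left to the caller, as the description notes.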

get_remote_member(member: Leader | Member | None = None) → RemoteMember

Get remote member node to stream from.
In case of a standby cluster this will tell us from which remote member to stream. Config can be either the Patroni config or cluster.config.data.

handle_long_action_in_progress() → str

Figure out what to do with the task AsyncExecutor is performing.

handle_starting_instance() → str | None

Starting up PostgreSQL may take a long time. In case we are the leader we may want to fail over to.

has_lock(info: bool = True) → bool

is_failover_possible(*, cluster_lsn: int = 0, exclude_failover_candidate: bool = False) → bool

Checks whether any of the cluster members is allowed to promote and is healthy enough for that.

Returns

True if there are members eligible to become the new leader.

is_failsafe_mode() → bool

Returns

True if failsafe_mode is enabled in global configuration.

is_healthiest_node() → bool

Performs a series of checks to determine that the current node is the best candidate.

If a manual failover/switchover is requested, it calls the manual_failover_process_no_leader() method.

Returns

True if the current node is among the best candidates to become the new leader.

is_lagging(wal_position: int) → bool

Returns whether an instance with the given WAL position should consider itself too unhealthy to be promoted because of replication lag.

Parameters

wal_position – current WAL position.

Returns

True when the node is lagging.
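A lag check of this kind usually compares the distance to the last known leader position against a configured limit. The sketch below assumes exactly that; the cluster_last_lsn and maximum_lag_on_failover parameters are assumptions of this example and do not appear in the signature above.

```python
def is_lagging_sketch(
    wal_position: int,
    cluster_last_lsn: int,         # assumed: last leader LSN known from DCS
    maximum_lag_on_failover: int,  # assumed: configured lag limit in bytes
) -> bool:
    """Hypothetical lag check; not Patroni's implementation."""
    lag = cluster_last_lsn - wal_position
    return lag > maximum_lag_on_failover
```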

is_leader() → bool

Returns

True if the current node is the leader, based on expiration set when it last held the key.

is_paused() → bool

Returns

True if in maintenance mode.

is_standby_cluster() → bool

Returns

True if global configuration has a valid “standby_cluster” section.

is_sync_standby(cluster: Cluster) → bool

Returns

True if the current node is a synchronous standby.

is_synchronous_mode() → bool

Returns

True if synchronous replication is requested.

load_cluster_from_dcs() → None

manual_failover_process_no_leader() → bool | None

Handles manual failover/switchover when the old leader already stepped down.

notify_mpp_coordinator(event: str) → None

Send an event to the MPP coordinator.

Parameters

event – the type of event for coordinator to parse.

post_bootstrap() → str

post_recover() → str | None

primary_stop_timeout() → int | None

Returns

“primary_stop_timeout” from the global configuration or None when not in synchronous mode.

process_healthy_cluster() → str

process_manual_failover_from_leader() → str | None

Checks if manual failover is requested and takes action if appropriate.

Cleans up failover key if failover conditions are not matched.

Returns

action message if demote was initiated, None if no action was taken

process_sync_replication() → None

Process synchronous standby behavior.
Synchronous standbys are registered in two places: postgresql.conf and DCS. They must be updated in the right order. The invariant to keep is that if a node is primary and sync_standby is set in DCS, then that node must have synchronous_standby set to that value. Put more simply: when adding, first set the value in postgresql.conf and then in DCS; when removing, first remove it from DCS and then from postgresql.conf. This way we only consider promoting standbys that were guaranteed to be replicating synchronously.
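The ordering rule can be made concrete with a toy model. SyncStateSketch is hypothetical: two in-memory sets stand in for postgresql.conf and DCS, and the invariant is that everything listed in DCS is also configured in postgresql.conf.

```python
from typing import List, Set


class SyncStateSketch:
    """Toy model of the two places a sync standby is registered."""

    def __init__(self) -> None:
        self.pg_conf: Set[str] = set()  # stands in for postgresql.conf
        self.dcs: Set[str] = set()      # stands in for the DCS record
        self.log: List[str] = []        # order in which changes were applied

    def add_sync_standby(self, name: str) -> None:
        # Adding: postgresql.conf first, DCS second.
        self.pg_conf.add(name)
        self.log.append(f"conf+{name}")
        self.dcs.add(name)
        self.log.append(f"dcs+{name}")

    def remove_sync_standby(self, name: str) -> None:
        # Removing: DCS first, postgresql.conf second.
        self.dcs.discard(name)
        self.log.append(f"dcs-{name}")
        self.pg_conf.discard(name)
        self.log.append(f"conf-{name}")

    def invariant_holds(self) -> bool:
        # Anything advertised in DCS must be configured in postgresql.conf.
        return self.dcs <= self.pg_conf
```

With this ordering, a failure between the two steps of either operation leaves DCS listing no standby that postgresql.conf lacks, so the invariant survives a partial update.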

process_unhealthy_cluster() → str

Cluster has no leader key.

recover() → str

Handle the case when postgres isn’t running.

Depending on the state of Patroni, DCS cluster view, and pg_controldata the following could happen:

pg_rewind is executed if it is necessary, or optionally, the data directory could be removed if it is allowed by configuration.
after crash recovery and/or pg_rewind are executed, postgres is started in recovery.

Returns

action message, describing what was performed.

reinitialize(force: bool = False) → str | None

release_leader_key_voluntarily(last_lsn: int | None = None) → None

restart(restart_data: Dict[str, Any], run_async: bool = False) → Tuple[bool, str]

Conditional and unconditional restart.

restart_matches(role: str | None, postgres_version: str | None, pending_restart: bool) → bool

restart_scheduled() → bool

run_cycle() → str

schedule_future_restart(restart_data: Dict[str, Any]) → bool

set_is_leader(value: bool) → None

Update the current node’s view of its own leadership status.

Will update the expiry timestamp to match the DCS TTL if setting leadership to true, otherwise will set the expiry to the past to immediately invalidate it.

Parameters

value – is the current node the leader.

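The expiry-based view of leadership can be sketched as below. LeaderFlagSketch, its ttl argument and the explicit now parameter are assumptions made so the example is deterministic; they are not Patroni's attributes.

```python
import time
from typing import Optional


class LeaderFlagSketch:
    """Hypothetical model of a leadership flag backed by an expiry time."""

    def __init__(self, ttl: float) -> None:
        self._ttl = ttl              # assumed to mirror the DCS TTL
        self._leader_expiry = 0.0    # a timestamp in the past: not leader

    def set_is_leader(self, value: bool, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        # True: valid for one TTL from now. False: expire immediately.
        self._leader_expiry = now + self._ttl if value else 0.0

    def is_leader(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return self._leader_expiry > now
```

Under this model the node stops regarding itself as leader once a TTL has passed since the flag was last set to true.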

set_start_timeout(value: int | None) → None

Sets the timeout for starting as primary before becoming eligible for failover.

Must be called when async_executor is busy or in the main thread.

should_run_scheduled_action(action_name: str, scheduled_at: datetime | None, cleanup_fn: Callable[[...], Any]) → bool

shutdown() → None

sync_mode_is_active() → bool

Check whether synchronous replication is requested and already active.

Returns

True if the primary already put its name into the /sync key in DCS.

static sysid_valid(sysid: str | None) → bool

touch_member() → bool

update_cluster_history() → None

update_failsafe(data: Dict[str, Any]) → str | None

update_lock(update_status: bool = False) → bool

Update the leader lock in DCS.

Note

After a successful update of the leader key, the AbstractDCS.update_leader() method could also optionally update the /status and /failsafe keys.
The /status key contains the last known LSN on the leader node and the last known state of permanent replication slots, including the permanent physical replication slot for the leader.
Last but not least, this method calls the Watchdog.keepalive() method after the leader key was successfully updated.

Parameters

update_status – True if we also need to update the /status key in DCS, otherwise False.

Returns

True if the leader key was successfully updated and we can continue to run postgres as a primary or as a standby_leader, otherwise False.

wakeup() → None

Trigger the next run of the HA loop if there is no “active” leader watch request in progress.

This usually happens on the leader or if the node is running an async action.

watch(timeout: float) → bool

while_not_sync_standby(func: Callable[[...], Any]) → Any

Runs the specified action while trying to make sure that the node is not assigned synchronous standby status.

Tags us as not allowed to be a sync standby because we are going to go away. If we currently are a sync standby, we wait for the leader to notice and pick an alternative one; if the leader changes or goes away, we are also free.

If the connection to DCS fails we run the action anyway, as this is only a hint.

There is a small race window where this function runs between a primary picking us as the sync standby and publishing it to the DCS. As the window is rather tiny and the consequence is holding up commits for one cycle period, we don’t worry about it here.

class patroni.ha._MemberStatus(member: Member, reachable: bool, in_recovery: bool | None, wal_position: int, data: Dict[str, Any])

Bases: Tags, _MemberStatus

Node status distilled from API response.

Consists of the following fields: member, reachable, in_recovery, wal_position, data.

_abc_impl = <_abc._abc_data object>

failover_limitation() → str | None

Returns the reason why this node can’t promote, or None if everything is ok.

classmethod from_api_response(member: Member, json: Dict[str, Any]) → _MemberStatus

Returns

_MemberStatus object

property tags: Dict[str, Any]

Dictionary with values of different tags (e.g. nofailover).

property timeline: int

Timeline value from JSON.

classmethod unknown(member: Member) → _MemberStatus

Create a new class instance with empty or null values.

property watchdog_failed: bool

Indicates that watchdog is required by configuration but not available or failed.


© Copyright 2015 Compose, Zalando SE. Revision 9d231aee.
