Available in PagerDuty Process Automation Commercial products.
Scheduled jobs are owned by the last cluster member that modified them. Jobs can also be controlled using Cluster Manager. If a cluster member goes down, all scheduled jobs on that cluster member must be moved to another cluster node. This process can be performed automatically using the heartbeat and Autotakeover features in Process Automation version 2.1.0 and later releases.
To support Autotakeover, you must first:
Configure the heartbeat by adding the following settings in
# heartbeat interval in seconds
# initial delay after startup to send heartbeat
# remote execute/abort message processing interval in seconds
# age in seconds since last heartbeat to consider another member inactive
# age in seconds since last heartbeat to consider another member dead
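The two age thresholds above can be read as a simple classification of each cluster member by the time elapsed since its last heartbeat. The following is a minimal Python sketch of that idea, not Rundeck's actual implementation. The function name, the `MemberState` enum, the threshold values in the usage line, and the choice of `>=` at the boundaries are all assumptions made for illustration.

```python
from enum import Enum


class MemberState(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    DEAD = "dead"


def classify_member(heartbeat_age_s: float,
                    consider_inactive_s: float,
                    consider_dead_s: float) -> MemberState:
    """Classify a member by the age (in seconds) of its last heartbeat.

    Hypothetical helper: the parameter names mirror the comments in the
    settings above, and the boundary handling (>=) is an assumption.
    """
    if heartbeat_age_s >= consider_dead_s:
        return MemberState.DEAD
    if heartbeat_age_s >= consider_inactive_s:
        return MemberState.INACTIVE
    return MemberState.ACTIVE


# Example with illustrative thresholds of 150 s (inactive) and 300 s (dead).
state = classify_member(200, consider_inactive_s=150, consider_dead_s=300)
```

Only members that reach the "dead" state are candidates for Autotakeover, so the dead threshold should be comfortably larger than the heartbeat interval to avoid reacting to a single missed heartbeat.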
Then configure Autotakeover by adding the following settings in
# enables autotakeover for members detected as "dead"
# policy indicates which nodes to take over. "Any": all dead nodes. "Static": only allowed uuids. "RemoteExecution": respect the remote execution policy defined on the cluster.
# delay in seconds to wait after sending autotakeover proposal
rundeck.clusterMode.autotakeover.delay = 60
# minimum sleep in seconds between autotakeover attempts for a particular destination
rundeck.clusterMode.autotakeover.sleep = 300
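To illustrate how the `sleep` setting spaces out repeated attempts, here is a hedged Python sketch of a per-destination throttle. It is not taken from Rundeck's code: the function names, the dictionary used to track attempts, and the interpretation of "destination" as a member UUID are assumptions. Only the value 300 comes from the setting above.

```python
from typing import Dict

# Mirrors rundeck.clusterMode.autotakeover.sleep from the settings above.
AUTOTAKEOVER_SLEEP_S = 300


def may_attempt_takeover(last_attempt_s: Dict[str, float],
                         destination_uuid: str,
                         now_s: float,
                         sleep_s: float = AUTOTAKEOVER_SLEEP_S) -> bool:
    """Return True if enough time has passed since the last attempt.

    Hypothetical helper: a destination with no recorded attempt is
    always eligible; otherwise at least sleep_s seconds must have elapsed.
    """
    last = last_attempt_s.get(destination_uuid)
    return last is None or (now_s - last) >= sleep_s


def record_attempt(last_attempt_s: Dict[str, float],
                   destination_uuid: str,
                   now_s: float) -> None:
    """Remember when the most recent attempt for this destination happened."""
    last_attempt_s[destination_uuid] = now_s
```

With the values shown above, a second attempt for the same destination would be held back until 300 seconds after the first, while attempts for other destinations are unaffected.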
Autotakeover Recover Executions
If your Rundeck instance goes offline while a job is running and that job is marked as incomplete, jobs configured with retry settings are taken over by another online instance using the recover execution policy. To use the recover execution policy, add the following to your
# enable auto cleanup of stale jobs on member death
# policy for members to accept as targets of auto cleanup. Can be 'Any' or 'Static'
# if static, config requires 'allowed' setting
# delay in seconds before proceeding with auto-retry proposal
# delay in seconds before doing another auto-retry of the same member
The options are:
- Any: The auto take-over process assigns the scheduled owner of a job to any active cluster member.
- Static: If using the static policy, you can configure a list of allowed member UUIDs to proceed with auto take-over if they are marked as dead. If a member is marked as dead and is not in this list, auto take-over will not be performed. For example:
- RemoteExecution: This policy is used when remote execution is enabled. The auto take-over process respects the remote execution policy defined on the cluster. See Remote Job Execution.
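The difference between the Any and Static options can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions, not the product's implementation: the function name, the `allowed_uuids` parameter, and the sample UUID strings are hypothetical. The RemoteExecution option is left out because it depends on the remote execution policy defined on the cluster.

```python
from typing import FrozenSet


def should_take_over(policy: str,
                     dead_member_uuid: str,
                     allowed_uuids: FrozenSet[str] = frozenset()) -> bool:
    """Decide whether a dead member's scheduled jobs may be taken over.

    Hypothetical helper covering only the "Any" and "Static" policies.
    """
    if policy == "Any":
        # Every member detected as dead is eligible.
        return True
    if policy == "Static":
        # Only members whose UUID is in the allowed list are eligible.
        return dead_member_uuid in allowed_uuids
    raise ValueError(f"policy not covered by this sketch: {policy}")


# Placeholder UUIDs for illustration only.
allowed = frozenset({"uuid-1", "uuid-2"})
take_listed = should_take_over("Static", "uuid-1", allowed)
take_unlisted = should_take_over("Static", "uuid-3", allowed)
```

In this sketch a dead member that is missing from the allowed list is simply skipped under the Static policy, which matches the behavior described above.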