# Health Checks
Health Checks allow you to check the Health Status of Nodes periodically and on demand. You can display the health status in the GUI and use it to filter out unhealthy nodes when running Jobs.
- Configure how the Health Status of your Nodes is determined, using a Command or Script.
- Capture the output of the command or script and add it to your nodes as attributes.
- Expose the status as Node Attributes using the Health Status Node Enhancer, and use the health check attributes in your node filters.
- Use the secondary node filter feature of Jobs to pre-filter unhealthy nodes, and see which nodes will be targeted and which will be filtered out before running a Job.
# Health Checks System
The Health Checks System operates across several parts of the Rundeck System:
- Project Nodes - as determined by your Nodes configuration.
- Health Checks configuration - the definition of which checks to run in your project configuration.
- Node Execution - Command and Script execution use the configured Node Executor for the nodes
- ACL Policies - Access control definition used by the Health Checks when performing Node Execution
- Health Checks System - periodically and asynchronously performs the Health Checks
- Health Checks Cache - a cache of the results of the health checks
- Node Enhancer - the "Health Status" Node enhancer layers additional attributes onto the Project Nodes by reading data from the Health Checks Cache
You enable Health Checks in your Project configuration.
You can configure multiple Health Check Plugins for each project, and each Health Check can apply to all nodes (default) or a select set of nodes using a Node Filter.
Each Health Check can have a "label", which identifies it within the generated Node Attributes.
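The following minimal Python sketch shows how checks, filters and labels fit together. `HealthCheckConfig` and `plan_checks` are hypothetical names, not Rundeck APIs. A Python predicate stands in for a real node filter, and numbering unlabeled checks from 1 is an assumption. A check whose filter does not match a node is reported as Skipped, which matches the status described under Status Results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A node is represented here as a plain dict of attributes.
Node = Dict[str, str]


@dataclass
class HealthCheckConfig:
    """Hypothetical stand-in for one Health Check plugin entry in a project."""

    label: Optional[str] = None
    # None means the check applies to all nodes (the default); otherwise this
    # predicate stands in for a Rundeck node filter.
    node_filter: Optional[Callable[[Node], bool]] = None

    def applies_to(self, node: Node) -> bool:
        return self.node_filter is None or self.node_filter(node)


def plan_checks(checks: List[HealthCheckConfig], node: Node) -> Dict[str, str]:
    """For one node, report whether each check will run or be Skipped."""
    plan: Dict[str, str] = {}
    for number, check in enumerate(checks, start=1):
        # Unlabeled checks are keyed by position; numbering from 1 is an assumption.
        key = check.label or str(number)
        plan[key] = "Run" if check.applies_to(node) else "Skipped"
    return plan


checks = [
    HealthCheckConfig(label="ping"),  # applies to all nodes
    HealthCheckConfig(node_filter=lambda n: "db" in n.get("tags", "").split(",")),
]
print(plan_checks(checks, {"nodename": "web1", "tags": "web"}))
# {'ping': 'Run', '2': 'Skipped'}
```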
# Running Health Checks
Health Checks run on demand and asynchronously, and their results are cached for a period of time. A run is triggered whenever Health Status information is requested from the Health Checks System. This happens, for example, when you open the Nodes page in Rundeck or when Nodes data is otherwise read, such as when preparing to run a Job. Each node initially has an "Unknown" status until its Health Checks complete.
After a period of time, the cached results expire, and the next request for Nodes data triggers a refresh.
You can also Refresh the results in the GUI directly, which will cause the checks to be run again for the nodes.
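The on-demand, cached behavior can be sketched as an expiring cache whose reads never block. This is a minimal sketch under stated assumptions, not Rundeck's implementation. `HealthCheckCache` and `run_pending` are hypothetical names, and an explicit `run_pending` call simulates the background worker. Returning the previous result while a refresh is pending is an assumed design choice.

```python
import time
from typing import Callable, Dict, Set, Tuple

UNKNOWN = "Unknown"


class HealthCheckCache:
    """Hypothetical on-demand cache of health statuses with expiry.

    Reads never block: a missing or expired entry queues the node for a
    check and returns the last known status (or "Unknown" if none).
    """

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # node -> (status, checked_at)
        self.pending: Set[str] = set()

    def get(self, node: str) -> str:
        entry = self._entries.get(node)
        if entry is None or self.clock() - entry[1] >= self.ttl:
            self.pending.add(node)  # request triggers an asynchronous refresh
        return entry[0] if entry else UNKNOWN

    def refresh(self, node: str) -> None:
        """Like the GUI Refresh action: force the node to be checked again."""
        self.pending.add(node)

    def run_pending(self, check: Callable[[str], str]) -> None:
        """Stand-in for the background worker that performs queued checks."""
        for node in sorted(self.pending):
            self._entries[node] = (check(node), self.clock())
        self.pending.clear()


now = [0.0]  # fake clock so expiry can be demonstrated deterministically
cache = HealthCheckCache(ttl_seconds=60, clock=lambda: now[0])
print(cache.get("web1"))                 # Unknown (first request queues a check)
cache.run_pending(lambda node: "Healthy")
print(cache.get("web1"))                 # Healthy (served from cache)
now[0] = 61.0
print(cache.get("web1"), cache.pending)  # Healthy {'web1'} (expired, refresh queued)
```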
# Status Results
Each Health Check will result in a Health Status:
- Healthy - the check reported Healthy results
- Unhealthy - the check reported Unhealthy results
- Error - a problem creating or executing the Health Check
- Skipped - the check was not applied to the node (e.g. it did not match the filter)
- Unknown - the check could not run or was not conclusive
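The sketch below uses Python's `subprocess` module as a rough local illustration of how a command outcome might map onto these statuses. The mapping is an assumption for illustration, not the documented behavior of the Command Health Check plugin: exit code 0 is Healthy, any other exit code is Unhealthy, a timeout is Unknown, and a command that cannot start is Error. `run_command_check` is a hypothetical name. Skipped never appears here because the node filter decides it before any command runs.

```python
import subprocess
import sys
from typing import List


def run_command_check(argv: List[str], timeout: float = 10.0) -> str:
    """Run a command locally and map the outcome to a Health Status.

    Assumption: exit code 0 means Healthy and any other exit code means
    Unhealthy. Rundeck itself runs the command on the node through the
    configured Node Executor, which this sketch does not model.
    """
    try:
        completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "Unknown"  # the check did not finish, so it was not conclusive
    except OSError:
        return "Error"  # the command could not be started at all
    return "Healthy" if completed.returncode == 0 else "Unhealthy"


# Use the current Python interpreter as a portable "command" for the demo.
print(run_command_check([sys.executable, "-c", "pass"]))                     # Healthy
print(run_command_check([sys.executable, "-c", "import sys; sys.exit(2)"]))  # Unhealthy
```

On a Unix-like machine, `run_command_check(["uname"])` runs the same default command as the Command Health Check, but locally rather than through a Node Executor.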
# Getting Started
Visit the "Project Settings... > Edit Nodes" page. Under the Configuration tab, check the "Health Checks Enabled" checkbox.
Alternatively, in the project configuration properties file, add the configuration:
Visit the "Health Checks" link in the sidebar.
Click the "Configure" tab and add a Health Check Plugin. Here we add the simple Command Health Check plugin and keep its default command, `uname`. Click "Save" and "Save" again.
Return to the Nodes Tab to see a list of nodes.
You may see the message "Unauthorized: cannot execute on node". In that case, add an ACL Policy that allows the Health Checks System to run commands and scripts on the target nodes. See the Access Control section.
Once Access Control is configured, you should see successful "healthy" checks.
Return to the "Project Settings... > Edit Nodes" page. Under "Enhancers" click "Add a new Node Enhancer" and choose "Health Status".
You can modify the settings or keep the defaults. Make sure "UI Status Attributes" is included so that the UI indicators are added. Then click "Save" and "Save" again.
Visit the "Nodes" link in the sidebar. You should now see health status indicators for the nodes.
# Job Filter
You can use the "Exclude Filter" in a Job definition to filter out unhealthy nodes while the UI still shows which nodes will be excluded. Make sure "Show Excluded Nodes" is set to "Yes". If some nodes are unhealthy, they are still listed but crossed out.
When you run the Job, the excluded nodes will be indicated and automatically deselected. Note: if "Show Excluded Nodes" is set to "No", the excluded nodes will not be shown at all.
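The effect of an exclude filter on health attributes can be sketched as a split of the node set into targeted and excluded nodes. In this hypothetical Python sketch, `apply_exclude_filter` is an illustrative name, not a Rundeck API. The attribute values (`Unhealthy`, `Error`, and so on) are assumed to match the status names listed under Status Results. Check the actual values in your Node Attributes before you write a real filter.

```python
from typing import Dict, List, Set, Tuple

Node = Dict[str, str]


def apply_exclude_filter(
    nodes: List[Node], attribute: str, excluded_values: Set[str]
) -> Tuple[List[Node], List[Node]]:
    """Split nodes into (targeted, excluded) by one attribute's value."""
    targeted: List[Node] = []
    excluded: List[Node] = []
    for node in nodes:
        if node.get(attribute) in excluded_values:
            excluded.append(node)  # shown crossed out when excluded nodes are shown
        else:
            targeted.append(node)
    return targeted, excluded


nodes = [
    {"nodename": "web1", "healthcheck:status": "Healthy"},
    {"nodename": "web2", "healthcheck:status": "Unhealthy"},
    {"nodename": "web3", "healthcheck:status": "Unknown"},  # checks not finished yet
]
targeted, excluded = apply_exclude_filter(nodes, "healthcheck:status", {"Unhealthy", "Error"})
print([n["nodename"] for n in targeted])  # ['web1', 'web3']
print([n["nodename"] for n in excluded])  # ['web2']
```

Excluding only known-bad statuses keeps nodes whose status is still Unknown in the target set. Including only Healthy nodes would drop them instead. Which behavior you want is a policy choice.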
# Access Control
To execute commands and scripts on Nodes, the Health Checks System adopts a username/role of "system/system".
You can control which nodes the Health Checks System is allowed to execute on by adding an appropriate ACL Policy.
Here is an example ACL Policy to allow access to all nodes within a project.
```yaml
by:
  group: system
for:
  node:
    - allow: run
description: Allow run on all nodes for system Health Checks
```
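Evaluating the policy by hand can clarify what it grants. The Python below is a toy evaluator for this one policy shape, not Rundeck's ACL engine. `is_allowed` is a hypothetical helper. It ignores deny rules and treats every rule as applying to all nodes, which mirrors the "all nodes" intent of the example.

```python
from typing import Any, Dict, List

# The ACL Policy above, as the dict a YAML parser would produce from it.
POLICY: Dict[str, Any] = {
    "by": {"group": "system"},
    "for": {"node": [{"allow": "run"}]},
    "description": "Allow run on all nodes for system Health Checks",
}


def is_allowed(policy: Dict[str, Any], user_groups: List[str], resource: str, action: str) -> bool:
    """Toy check for 'by: group' policies with unconditional allow rules.

    Not Rundeck's evaluator: deny rules and per-node matching clauses are
    out of scope, so every rule is treated as applying to all resources.
    """
    groups = policy["by"].get("group", [])
    if isinstance(groups, str):
        groups = [groups]
    if not any(group in user_groups for group in groups):
        return False
    for rule in policy["for"].get(resource, []):
        allowed = rule.get("allow", [])
        if isinstance(allowed, str):
            allowed = [allowed]
        if action in allowed:
            return True
    return False


print(is_allowed(POLICY, ["system"], "node", "run"))  # True
print(is_allowed(POLICY, ["users"], "node", "run"))   # False
```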
You can also change the Username and Role adopted by the Health Checks System with the following configuration in
# Health Status Node Attributes
When the "Health Status" Node Enhancer is applied, it will add Node Attributes to each node checked.
The attributes it adds contain the summary Health Status as well as the individual health check statuses. It can also add UI status attributes and cache information.
By default, the prefix for all health check attributes is `healthcheck:`, but this can be modified.
- `healthcheck:status` - the overall health status
- `healthcheck:duration` - the total time taken to perform the health checks, in milliseconds
- `healthcheck:lastCheckTime` - the timestamp of the last check, e.g. `Tue Nov 26 09:54:18 PST 2019`
Individual Health Check results: if a "label" is defined in the health check plugin configuration, the attribute prefix uses the label. Otherwise, it uses the health check's number. For example:
- `healthcheck:label:status` - the plugin's status result
- `healthcheck:label:message` - any message from the plugin
- `healthcheck:label:$ATTR` - any attributes captured by the plugin
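The naming scheme above can be sketched as a function that flattens check results into attributes. `health_attributes` and its input shape are hypothetical. The overall status is passed in rather than computed, because this page does not specify how individual results are combined. Numbering unlabeled checks from 1 is an assumption. The `prefix` parameter reflects the configurable `healthcheck:` prefix.

```python
from typing import Any, Dict, List


def health_attributes(
    overall_status: str,
    duration_ms: int,
    last_check_time: str,
    checks: List[Dict[str, Any]],
    prefix: str = "healthcheck:",
) -> Dict[str, str]:
    """Build Node Attributes following the health check naming scheme."""
    attrs = {
        prefix + "status": overall_status,
        prefix + "duration": str(duration_ms),
        prefix + "lastCheckTime": last_check_time,
    }
    for number, check in enumerate(checks, start=1):
        # Use the label when configured, else the check's position
        # (numbering from 1 is an assumption).
        base = f"{prefix}{check.get('label') or number}:"
        attrs[base + "status"] = check["status"]
        if check.get("message"):
            attrs[base + "message"] = check["message"]
        for name, value in check.get("attributes", {}).items():
            attrs[base + name] = value
    return attrs


attrs = health_attributes(
    "Healthy",
    42,  # made-up duration for illustration
    "Tue Nov 26 09:54:18 PST 2019",
    [
        {"label": "uname", "status": "Healthy", "attributes": {"os": "Linux"}},
        {"status": "Healthy", "message": "ok"},  # unlabeled, second check
    ],
)
for key in sorted(attrs):
    print(key, "=", attrs[key])
```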