# Health Checks (Enterprise)

Available in Rundeck Enterprise

# Overview

Health Checks let you check the health status of Nodes periodically and on demand. You can display the health status visually in the GUI, and use the status to filter out unhealthy nodes when running Jobs.

Health Checks

Configure how the health status of your Nodes is determined, using a Command or a Script.

Capture the output of the command or script and add it to your nodes as attributes.

Expose the status as Node Attributes using the Health Status Node Enhancer, and use the health check attributes inside your node filters.

Use the secondary node filter feature of Jobs to exclude unhealthy nodes in advance, and see which nodes will be targeted and which will be filtered out before running a Job.

# Health Checks System

The Health Checks System operates across several parts of the Rundeck System:

  • Project Nodes - as determined by your Nodes configuration.
  • Health Checks configuration - the definition of which checks to run in your project configuration.
  • Node Execution - Command and Script execution use the Node Executor configured for the nodes.
  • ACL Policies - Access control definitions used by the Health Checks when performing Node Execution.
  • Health Checks System - performs the Health Checks periodically and asynchronously.
    • Health Checks Cache - a cache of the results of the health checks.
  • Node Enhancer - the "Health Status" Node Enhancer layers additional attributes onto the Project Nodes by reading data from the Health Checks Cache.
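The last two components can be pictured as a read-only merge: the enhancer takes the Project Nodes as they are and layers cached health data on top. The sketch below is a hypothetical model, not Rundeck's implementation; `enhance_nodes` and the cache layout (nodename mapped to an attribute dict) are assumptions, and giving uncached nodes an `UNKNOWN` status mirrors the "Unknown until checks complete" behavior described under Running Health Checks.

```python
from typing import Dict, List

Node = Dict[str, str]  # hypothetical: a node as a dict of attributes


def enhance_nodes(nodes: List[Node], cache: Dict[str, Dict[str, str]]) -> List[Node]:
    """Return copies of nodes with cached health attributes layered on top.

    cache is a hypothetical structure mapping nodename -> attributes produced
    by earlier health checks. Nodes without a cache entry get UNKNOWN.
    """
    enhanced = []
    for node in nodes:
        extra = cache.get(node["nodename"], {"healthcheck:status": "UNKNOWN"})
        # Layering: keep every original attribute, then add the health data.
        # The input dicts are not modified.
        enhanced.append({**node, **extra})
    return enhanced


nodes = [{"nodename": "web1", "osFamily": "unix"}, {"nodename": "web2", "osFamily": "unix"}]
cache = {"web1": {"healthcheck:status": "HEALTHY"}}
for n in enhance_nodes(nodes, cache):
    print(n["nodename"], n["healthcheck:status"])
# web1 HEALTHY
# web2 UNKNOWN
```

Building new dicts instead of editing the originals reflects the idea that the enhancer adds a layer over your node source rather than rewriting it.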

# Usage

You enable Health Checks in your Project configuration.

You can configure multiple Health Check Plugins for each project, and each Health Check can apply to all nodes (default) or a select set of nodes using a Node Filter.

Each Health Check can have a "label", which identifies it within the generated Node Attributes.
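To make the filter and label rules concrete, here is a minimal Python model of a project's Health Check list. Everything in it is hypothetical: `HealthCheckConfig`, `applicable_checks`, the plugin type strings, and modelling a Node Filter as a Python predicate (real Rundeck node filters use their own filter syntax). A check without a filter applies to every node; the identifier falls back to the check's position, assumed here to be 1-based.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Node = Dict[str, object]  # hypothetical: a node as a dict of attributes


@dataclass
class HealthCheckConfig:
    """Hypothetical model of one configured Health Check plugin."""
    plugin_type: str
    label: Optional[str] = None
    # None models the default: the check applies to all nodes.
    node_filter: Optional[Callable[[Node], bool]] = None

    def applies_to(self, node: Node) -> bool:
        return self.node_filter is None or self.node_filter(node)


def applicable_checks(checks: List[HealthCheckConfig],
                      node: Node) -> List[Tuple[str, HealthCheckConfig]]:
    """Return (identifier, check) pairs for the checks that apply to node.

    The identifier is the check's label if set, otherwise its position in
    the list (assumed 1-based here).
    """
    return [
        (check.label or str(number), check)
        for number, check in enumerate(checks, start=1)
        if check.applies_to(node)
    ]


checks = [
    HealthCheckConfig("command"),  # no filter: applies to every node
    HealthCheckConfig("script", label="web",
                      node_filter=lambda n: "web" in n.get("tags", [])),
]
db1 = {"nodename": "db1", "tags": ["db"]}
web1 = {"nodename": "web1", "tags": ["web"]}
print([ident for ident, _ in applicable_checks(checks, db1)])   # ['1']
print([ident for ident, _ in applicable_checks(checks, web1)])  # ['1', 'web']
```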

# Running Health Checks

Health Checks run asynchronously on demand, and their results are cached for a period of time. A run is triggered when Health Status information is requested from the Health Checks System, for example when you open the Nodes page or when Nodes data is otherwise read, such as when preparing to run a Job. Until its Health Checks have completed, each node is given an "Unknown" status.

When the cached results expire, the next request for Nodes data triggers a refresh.

You can also refresh the results directly in the GUI, which runs the checks again for the nodes.
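The on-demand, cached behavior can be sketched as a small TTL cache that starts background checks. This is a hypothetical illustration, not the Rundeck implementation: `HealthCheckCache`, `run_check`, `ttl_seconds`, and `wait_for_pending` are invented names, and returning `UNKNOWN` for an *expired* entry while its refresh runs is an assumption (the real system may keep showing the previous result). An exception raised by a check is recorded as `ERROR`.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, Tuple

UNKNOWN = "UNKNOWN"


class HealthCheckCache:
    """Sketch: run checks on demand in the background, cache results for a TTL.

    run_check is a hypothetical callable that checks one node and returns a
    status string such as "HEALTHY" or "UNHEALTHY".
    """

    def __init__(self, run_check: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._run_check = run_check
        self._ttl = ttl_seconds
        self._clock = clock
        self._results: Dict[str, Tuple[str, float]] = {}  # node -> (status, checked_at)
        self._futures: Dict[str, Future] = {}             # node -> latest scheduled check
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=4)

    def status(self, node: str) -> str:
        """Called whenever Nodes data is read (Nodes page, preparing a Job)."""
        with self._lock:
            cached = self._results.get(node)
            if cached is not None and self._clock() - cached[1] < self._ttl:
                return cached[0]
            # Missing or expired: start a background check, report UNKNOWN for now.
            self._schedule(node)
            return UNKNOWN

    def refresh(self, node: str) -> None:
        """Like the GUI refresh: drop the cached result and run the check again."""
        with self._lock:
            self._results.pop(node, None)
            self._schedule(node)

    def wait_for_pending(self) -> None:
        """Block until every scheduled check has finished (for demos and tests)."""
        with self._lock:
            pending = list(self._futures.values())
        wait(pending)

    def _schedule(self, node: str) -> None:
        # Caller holds self._lock; avoid starting a second check for the same node.
        in_flight = self._futures.get(node)
        if in_flight is None or in_flight.done():
            self._futures[node] = self._executor.submit(self._run, node)

    def _run(self, node: str) -> None:
        try:
            status = self._run_check(node)
        except Exception:
            status = "ERROR"  # a problem executing the check
        with self._lock:
            self._results[node] = (status, self._clock())


now = [0.0]  # fake clock so the example is deterministic
cache = HealthCheckCache(lambda node: "HEALTHY", ttl_seconds=60, clock=lambda: now[0])
print(cache.status("node1"))  # UNKNOWN: the first request only starts the check
cache.wait_for_pending()
print(cache.status("node1"))  # HEALTHY: served from the cache
now[0] = 61.0                 # move past the TTL
print(cache.status("node1"))  # UNKNOWN: expired, so a refresh was started
```

Injecting the clock keeps the expiry logic testable without sleeping; in normal use the default `time.monotonic` is used.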

# Status Results

Each Health Check will result in a Health Status:

  • Healthy - the check reported Healthy results
  • Unhealthy - the check reported Unhealthy results
  • Error - a problem creating or executing the Health Check
  • Skipped - the check was not applied to the node (e.g. it did not match the filter)
  • Unknown - the check could not run or was not conclusive
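One way to read these five definitions is as a decision procedure for a single check on a single node. The sketch below is an interpretation, not Rundeck code: `HealthStatus`, `classify`, and the convention that `run` returns `True`/`False`/`None` (healthy, unhealthy, inconclusive) or raises on failure are all assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, Optional


class HealthStatus(Enum):
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"
    ERROR = "ERROR"
    SKIPPED = "SKIPPED"
    UNKNOWN = "UNKNOWN"


def classify(applies: bool, run: Callable[[], Optional[bool]]) -> HealthStatus:
    """Map one check attempt on one node to a HealthStatus.

    run is hypothetical: True = healthy, False = unhealthy,
    None = not conclusive; raising means the check itself failed.
    """
    if not applies:
        return HealthStatus.SKIPPED    # node did not match the check's filter
    try:
        outcome = run()
    except Exception:
        return HealthStatus.ERROR      # problem creating or executing the check
    if outcome is None:
        return HealthStatus.UNKNOWN    # ran, but the result was not conclusive
    return HealthStatus.HEALTHY if outcome else HealthStatus.UNHEALTHY


print(classify(True, lambda: True).value)    # HEALTHY
print(classify(False, lambda: True).value)   # SKIPPED
print(classify(True, lambda: 1 / 0).value)   # ERROR
```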

# Setup

  1. Visit the "Project Settings... > Edit Nodes" page. Under the Configuration tab, check the "Health Checks Enabled" checkbox:

    Health Checks Enabled

    Alternatively, add the following property to the project configuration properties file:

    project.healthcheck.enabled=true
    
  2. Click the "Health Checks" link in the sidebar:

    Sidebar - Health Checks Link
  3. Click the "Configure" tab and add a Health Check Plugin. This example adds the simple Command Health Check plugin and keeps its default command, uname. Click "Save", then "Save" again.

    Configure - Add Health Check Plugin
  4. Return to the Nodes Tab to see a list of nodes.

    You may see the message "Unauthorized: cannot execute on node". If so, add an ACL Policy that allows the Health Checks System to run commands and scripts on the target nodes. See Access Control.

    Health Checks - Unauthorized Warning
  5. Once you have configured Access Control, you should see successful "healthy" checks:

    Health Checks - Healthy checks
  6. Return to the "Project Settings... > Edit Nodes" page. Under "Enhancers" click "Add a new Node Enhancer" and choose "Health Status".

    Health Checks - Add Node Enhancer

    You can modify the settings or keep the defaults. Make sure "UI Status Attributes" is selected so that UI indicators are added, then click "Save" and "Save" again.

    Health Checks - Add Health Status Enhancer
  7. Visit the "Nodes" link in the Sidebar. You should now see the healthy status indicators for the nodes:

    Health Checks - Node Health Status UI

# Job Filter

You can use the "Exclude Filter" in a Job definition to filter out unhealthy nodes while still showing in the UI which nodes will be excluded. To do this, set "Show Excluded Nodes" to "Yes". Any unhealthy nodes are then still listed, but crossed out:

Health Checks - Job Definition - Exclude Unhealthy Nodes

When you run the Job, the excluded nodes will be indicated and automatically deselected. Note: if "Show Excluded Nodes" is set to "No", the excluded nodes will not be shown at all.

Health Checks - Run Job - Exclude Unhealthy Nodes
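The effect of the Exclude Filter together with "Show Excluded Nodes" can be modelled as splitting the matched nodes into a targeted list and a displayed-as-excluded list. This is a hypothetical sketch: `plan_targets` and `is_unhealthy` are invented names, and the exclude condition is written as a Python predicate over the `healthcheck:status` attribute rather than in Rundeck's node filter syntax.

```python
from typing import Callable, Dict, List, Tuple

Node = Dict[str, str]  # hypothetical: node attributes after the Health Status enhancer


def plan_targets(nodes: List[Node], exclude: Callable[[Node], bool],
                 show_excluded: bool) -> Tuple[List[str], List[str]]:
    """Split nodes into (targeted, excluded-as-displayed) node names."""
    targeted = [n["nodename"] for n in nodes if not exclude(n)]
    excluded = [n["nodename"] for n in nodes if exclude(n)]
    # "Show Excluded Nodes" = "Yes": excluded nodes are listed (crossed out)
    # but never targeted. "No": they are not listed at all.
    return targeted, (excluded if show_excluded else [])


def is_unhealthy(node: Node) -> bool:
    return node.get("healthcheck:status") == "UNHEALTHY"


nodes = [
    {"nodename": "web1", "healthcheck:status": "HEALTHY"},
    {"nodename": "web2", "healthcheck:status": "UNHEALTHY"},
    {"nodename": "web3", "healthcheck:status": "UNKNOWN"},
]
print(plan_targets(nodes, is_unhealthy, show_excluded=True))   # (['web1', 'web3'], ['web2'])
print(plan_targets(nodes, is_unhealthy, show_excluded=False))  # (['web1', 'web3'], [])
```

With this predicate, nodes still in the `UNKNOWN` state are targeted. A stricter condition such as "status is not HEALTHY" would also exclude nodes whose checks have not completed yet, which may or may not be what you want.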

# Access Control

To execute commands and scripts on Nodes, the Health Checks System uses the username "system" and the role "system".

You control which nodes the Health Checks System is allowed to execute on by adding an appropriate ACL Policy.

Here is an example ACL Policy to allow access to all nodes within a project.

    by:
      group: system
    for:
      node:
        - allow: run
    description: Allow run on all nodes for system Health Checks

You can also change the Username and Role adopted by the Health Checks System with the following configuration in rundeck-config.properties:

    rundeck.healthcheck.access.username=system
    rundeck.healthcheck.access.role=system

# Health Status Node Attributes

When the "Health Status" Node Enhancer is applied, it adds Node Attributes to each checked node.

These attributes contain the summary Health Status as well as the individual health check statuses. The enhancer can also add UI status attributes and cache information.

By default, all health check attributes use the prefix healthcheck:, but this can be changed.

  • healthcheck:status - the overall health status: one of HEALTHY, UNHEALTHY, UNKNOWN, ERROR, or SKIPPED

Cache information:

  • healthcheck:duration - the total time taken to perform the health checks, in milliseconds, e.g. 950
  • healthcheck:lastCheckTime - the timestamp of the last check, e.g. Tue Nov 26 09:54:18 PST 2019

Individual Health Check results: if a "label" is defined in the health check plugin configuration, the attribute prefix includes that label; otherwise it includes the health check's number. For example: healthcheck:mycheck: or healthcheck:1:.

  • healthcheck:label:status - the plugin's status result
  • healthcheck:label:type - the plugin type
  • healthcheck:label:message - any message from the plugin
  • healthcheck:label:$ATTR - any attributes captured by the plugin
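The naming scheme above can be expressed as a small attribute builder. This sketch only illustrates the documented names; `health_attributes` and the per-check dict keys (`label`, `status`, `type`, `message`, `attributes`) are hypothetical, check numbering is assumed to be 1-based, the cache attributes are always included here even though the enhancer adds them optionally, and the `"command"`/`"script"` type strings are placeholders rather than real plugin identifiers.

```python
from typing import Dict, List


def health_attributes(overall: str, duration_ms: int, last_check: str,
                      checks: List[dict], prefix: str = "healthcheck:") -> Dict[str, str]:
    """Build Health Status node attributes following the documented names.

    Each entry in checks is a hypothetical dict with keys "label" (optional),
    "status", "type", "message" (optional) and "attributes" (optional,
    values captured from the check's output).
    """
    attrs = {
        f"{prefix}status": overall,
        f"{prefix}duration": str(duration_ms),
        f"{prefix}lastCheckTime": last_check,
    }
    for number, check in enumerate(checks, start=1):  # numbering assumed 1-based
        # Use the label in the prefix when set, otherwise the check's number.
        check_prefix = f"{prefix}{check.get('label') or number}:"
        attrs[f"{check_prefix}status"] = check["status"]
        attrs[f"{check_prefix}type"] = check["type"]
        attrs[f"{check_prefix}message"] = check.get("message", "")
        # Captured attributes ($ATTR) share the same per-check prefix.
        for name, value in check.get("attributes", {}).items():
            attrs[f"{check_prefix}{name}"] = value
    return attrs


attrs = health_attributes(
    "HEALTHY", 950, "Tue Nov 26 09:54:18 PST 2019",
    [{"label": "mycheck", "status": "HEALTHY", "type": "command",
      "attributes": {"kernel": "Linux"}},
     {"status": "HEALTHY", "type": "script"}],
)
print(attrs["healthcheck:mycheck:kernel"])  # Linux
print(attrs["healthcheck:2:type"])          # script
```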