Automated Diagnostics - Kubernetes Logs


Solution Summary

This solution walks through an example of enriching PagerDuty incidents with diagnostic data retrieved from a single data source by a Rundeck Job. Once the solution is built, each incident created in PagerDuty presents users with a button that retrieves recent logs from Kubernetes pods and displays those logs within the PagerDuty user interface.

The design principles outlined in this solution are applicable to most other use-cases for retrieving diagnostic-data or invoking remediation.

Solution Prerequisites

For this guide, Process Automation or Rundeck Community must be installed and running. Instructions for both products are provided below.

A PagerDuty account with the Automation Actions add-on enabled is also required. Automation Actions is available as an add-on for Business and Digital Operations pricing plans. Please contact us if you would like to upgrade your plan or to trial Automation Actions.

This solution is meant to demonstrate design principles, and therefore the steps outlined in this Rundeck Job may not be applicable to your specific environment.

Configure Rundeck Job

  1. In PagerDuty, generate an API Access Key with (at minimum) Responder permissions. This can be either a General Access API Key or a User Token API Key.
  2. Copy the PagerDuty API Key into Rundeck's Key Storage as a Password, or into a secrets manager that is integrated with Rundeck.
  3. Download the sample Rundeck Job YAML from this link. (Right-click, select Save Link As..., and be sure to append .yaml to the file name.) Note that this Job definition will only work with Rundeck Enterprise. Click the Rundeck Community tab if you are using Rundeck Community.
  4. Upload the sample Job to your Process Automation instance by navigating to the Jobs tab, selecting Job Actions in the upper-right, then selecting Upload Definition. More detailed instructions for uploading a Job Definition are available here.

    Upload Job

  5. Edit the Job by clicking Edit This Job:

    Edit Job

  6. Click into the Workflow tab and then in the Options section, select the k8s_selector option, and modify the selector to determine which pods to pull logs from:

    Edit Selector

    Edit Selector2

  7. Click into Step 2 ("Post Logs to PagerDuty"). Click the Select button next to API Key to select your API Key from Key Storage. If you used a User Token API Key for PagerDuty, then be sure to modify the email-address as well:

    Edit Job2

  8. Click Save on the step as well as Save on the Job.
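The first step of the Job retrieves logs from the pods matched by the k8s_selector option. The sample Job's own step definition is not reproduced here, so the following is only a minimal sketch of an equivalent retrieval, assuming `kubectl` is available on the Rundeck host. The function name, the `namespace` default, and the `tail_lines` default are hypothetical and chosen for illustration.

```python
import shlex


def build_kubectl_logs_command(selector, namespace="default", tail_lines=50):
    """Build the kubectl argv that fetches recent logs for pods matching a label selector.

    `selector` plays the role of the Job's k8s_selector option (for example "app=web").
    `namespace` and `tail_lines` are illustrative assumptions, not options of the sample Job.
    """
    if not selector.strip():
        raise ValueError("a label selector such as 'app=web' is required")
    return [
        "kubectl", "logs",
        "--namespace", namespace,
        "--selector", selector,       # choose pods by label instead of by name
        "--tail", str(tail_lines),    # only the most recent lines from each container
        "--all-containers=true",      # include every container in each matching pod
        "--prefix",                   # prefix each line with its pod and container name
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works without a cluster.
    print(shlex.join(build_kubectl_logs_command("app=web")))
```

Returning an argument list rather than a shell string keeps the selector from being interpreted by a shell if the list is later passed to `subprocess.run`.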

Note

This Rundeck Job is meant to be invoked from PagerDuty, not through the Rundeck GUI. There is a hidden Job Option for the PagerDuty Incident ID. If you run the Job directly from the Rundeck Interface, the Job will fail on Step 2, as it is expecting to have the PagerDuty incident ID as an input parameter.
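To illustrate why Step 2 depends on the incident ID, the sketch below builds a request against the PagerDuty REST API endpoint for adding a note to an incident. Whether the sample Job posts logs through this endpoint is an assumption; the function name and the `MAX_NOTE_CHARS` cap are hypothetical, and the real note size limit should be checked against the PagerDuty documentation.

```python
import json
import urllib.request

PAGERDUTY_API = "https://api.pagerduty.com"
MAX_NOTE_CHARS = 20000  # illustrative cap only (assumption); verify the actual limit


def build_note_request(incident_id, log_text, api_key, from_email):
    """Build (but do not send) a request that attaches log text to an incident as a note."""
    if not incident_id:
        # Mirrors the failure described above: without an incident ID there is no target.
        raise ValueError("pd_incident_id is empty; invoke the Job from PagerDuty")
    # Keep the tail of the text so the most recent log lines survive truncation.
    body = json.dumps({"note": {"content": log_text[-MAX_NOTE_CHARS:]}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{PAGERDUTY_API}/incidents/{incident_id}/notes",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
            "From": from_email,  # the email address configured on the step
        },
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network.
    request = build_note_request("PXXXXXX", "example log line", "example-key", "user@example.com")
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Running the Job from the Rundeck interface corresponds to calling this function with an empty `incident_id`.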

Configure PagerDuty Automation Actions

PagerDuty Automation Actions Architecture

The PagerDuty Automation Actions Runner is installed in your environment and requires outbound-only access to the PagerDuty SaaS platform as well as bi-directional communication with your Rundeck instance. You do not need to allow for any inbound protocols from PagerDuty to your infrastructure.

  1. Create a Rundeck User API Token by navigating to User Icon -> Profile and clicking the + next to User API Tokens:

    RD Token
  2. Enter a Name for the API Token and choose a Role with sufficient permissions to invoke the uploaded Job.
  3. Follow the instructions outlined here to install and configure the PagerDuty Actions Runner. You can reuse the PagerDuty API Token generated earlier for the Rundeck Job or generate a new one; this token needs Read Only permissions.
  4. Use the API Token generated in Step 1 and the Rundeck URL to fill in the rundeck_token and rundeck_url fields in the pdrunner-creds configuration file.
  5. In Rundeck, copy the Job ID from the job invocation page:

    Job ID

  6. In PagerDuty, navigate to Automation -> Rundeck Actions -> Add Action:

    Add Action

  7. Fill in the Automation Action details with the desired Name and Description. Select rundeck as the type of action and Diagnostic as the category.
    Paste the job ID into the Job ID field and insert -pd_incident_id ${pagerduty.incidentId} into the Rundeck arguments field:

    Action Details

  8. Select the Runner that you installed in Step 3, then select the service associated with the Kubernetes selector that you set in Step 6 of configuring the Rundeck Job.
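The configuration above comes down to three inputs: the Rundeck URL, the Job ID, and an argument string containing the incident ID. The Runner's internals are not described here, so the following is only a sketch of an equivalent call to the Rundeck job-run API, with the `${pagerduty.incidentId}` placeholder filled in first. The function names and the `api_version` default are assumptions; match the version to your Rundeck instance.

```python
import json
import urllib.request


def render_arguments(template, incident_id):
    """Substitute the incident ID into the Rundeck arguments template."""
    return template.replace("${pagerduty.incidentId}", incident_id)


def build_run_job_request(rundeck_url, job_id, token, arg_string, api_version=41):
    """Build (but do not send) a request that runs a Rundeck Job with the given arguments.

    api_version is an assumed value; use one supported by your Rundeck instance.
    """
    url = f"{rundeck_url.rstrip('/')}/api/{api_version}/job/{job_id}/run"
    body = json.dumps({"argString": arg_string}).encode("utf-8")
    return urllib.request.Request(
        url=url,
        data=body,
        method="POST",
        headers={
            "X-Rundeck-Auth-Token": token,  # the Rundeck User API Token from Step 1
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network.
    args = render_arguments("-pd_incident_id ${pagerduty.incidentId}", "PXXXXXX")
    request = build_run_job_request("https://rundeck.example.com", "example-job-id", "example-token", args)
    print(request.get_method(), request.full_url, args)
```

The argument string uses the same `-option value` form entered in the Rundeck arguments field, which is how the hidden pd_incident_id Job Option receives its value.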

Run the Auto-Diagnostics Action from PagerDuty Incidents

When incidents are created on the Service associated with the Rundeck Action, there will now be an option in the Run Actions dropdown that will trigger the automation configured in the prior sections to retrieve Kubernetes Logs:

  1. Click on the Run Actions dropdown, and then click the automation-action configured in the prior section:

    Run Action

    This will post the diagnostic-data to the PagerDuty incident timeline:

    Timeline Output