# Automated Diagnostics

# What is PagerDuty's Automated Diagnostics Solution?

Automated Diagnostics is a solution that integrates PagerDuty's Incident Response and Runbook Automation products. By automating the retrieval of diagnostic data during incidents, you can shorten incidents, reduce the number of people paged to help with resolution, and gather evidence for fixing the root cause after the incident.

# Use Cases

The Automated Diagnostics solution supports several use cases. Here are a few of the most common:

  1. Improve Triage: surfacing diagnostic data can reduce both the time spent troubleshooting and the number of people pulled into incidents.
  2. Capture Environment State: by capturing the environment or application "state" during an incident, operations engineers and developers have evidence to help them fix code-level bugs and configuration errors, even if that work happens well after the incident has been resolved.
  3. Realtime Updates: by querying backend services in real time, an Incident Commander can more easily provide updates to stakeholders during an incident.

For more details on these use cases, see this section of the solution guide.

# Prebuilt Automation

PagerDuty provides a solution that helps users start automating diagnostics quickly. The solution consists of prebuilt Automation Jobs that retrieve data from common infrastructure and services for investigating, debugging, and diagnosing incidents:

Automated Diagnostics within PagerDuty

Verbose Diagnostics in Process Automation

As an example, if an incident is triggered for a service running in Kubernetes, PagerDuty Runbook Automation can retrieve information from logs, APIs, databases, and other sources that support this service. The retrieval can be triggered with the click of a button or through event-driven invocation.
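To make the Kubernetes example more concrete, the sketch below shows the kind of read-only commands a diagnostic step might run. It is an illustration only, not the content of PagerDuty's prebuilt jobs: the function names, the `payments` namespace, and the `checkout-7d9f` pod are hypothetical, and it assumes `kubectl` is installed and already configured for the cluster.

```python
import subprocess
from typing import Callable, Dict, List


def build_diagnostic_commands(namespace: str, pod: str, tail: int = 50) -> Dict[str, List[str]]:
    """Return a few read-only kubectl commands, keyed by a descriptive name."""
    return {
        "recent_pod_logs": ["kubectl", "logs", pod, "-n", namespace, f"--tail={tail}"],
        "recent_events": ["kubectl", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp"],
        "pod_status": ["kubectl", "get", "pod", pod, "-n", namespace, "-o", "wide"],
    }


def run_command(cmd: List[str]) -> str:
    """Run one command; return stdout on success, otherwise the error text."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run {cmd[0]}: {exc}"
    return result.stdout if result.returncode == 0 else result.stderr


def collect_diagnostics(
    namespace: str,
    pod: str,
    runner: Callable[[List[str]], str] = run_command,
) -> Dict[str, str]:
    """Run every diagnostic command and collect the outputs by name."""
    commands = build_diagnostic_commands(namespace, pod)
    return {name: runner(cmd) for name, cmd in commands.items()}


if __name__ == "__main__":
    # Hypothetical namespace and pod name, for illustration only.
    for name, output in collect_diagnostics("payments", "checkout-7d9f").items():
        print(f"== {name} ==\n{output}")
```

Keeping the command list separate from the runner makes it easy to review exactly what will be executed, and to swap in a different runner when testing.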

# Simplifying and Sharing Diagnostics

Diagnostics retrieved using Runbook Automation can be made available in multiple interfaces, such as PagerDuty's mobile app, Slack, and Microsoft Teams:

Diagnostics in Slack
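As a rough illustration of what "simplifying" diagnostics for a chat interface can involve, the sketch below condenses collected output into a single JSON payload with a `text` field, which is the basic shape accepted by Slack incoming webhooks. This is not how PagerDuty's own Slack integration is implemented; the function name, the `max_chars` limit, and the sample data are assumptions made for the example, and sending the payload is deliberately left out.

```python
import json
from typing import Dict


def build_slack_message(incident_title: str, diagnostics: Dict[str, str], max_chars: int = 500) -> str:
    """Format diagnostic outputs as a JSON payload with a single "text" field."""
    fence = "`" * 3  # Slack renders text between triple backticks as a code block
    lines = [f"*Diagnostics for:* {incident_title}"]
    for name, output in diagnostics.items():
        snippet = output.strip()
        # Long outputs are cut so the message stays readable on mobile.
        if len(snippet) > max_chars:
            snippet = snippet[:max_chars] + "\n[truncated]"
        lines.append(f"*{name}*\n{fence}{snippet}{fence}")
    return json.dumps({"text": "\n".join(lines)})


if __name__ == "__main__":
    # Hypothetical incident title and diagnostic output.
    sample = {"pod_status": "checkout-7d9f   0/1   CrashLoopBackOff"}
    print(build_slack_message("Checkout latency above threshold", sample))
```

Truncating each output keeps the summary short for responders, while the full, verbose output can remain available in the automation job's own log.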

# Examples & Templates

This guide includes a full section on Examples & Best Practices. A preview is shown here:

| Category | Examples |
| --- | --- |
| Amazon Web Services | Stopped ECS Task Errors, ELB Targets Health, CloudWatch Logs |
| Microsoft Azure | Function App Health, Troubleshoot Azure File Sync, Load Balancer Health Probes |
| Google Cloud Platform | Debug Load Balancer Health Checks, Troubleshooting Firewall Rules, GKE Cluster Connectivity |
| Linux OS | List Top CPU Consuming Processes, Retrieve Errors from Syslog, List Top Disk Consuming Files |
| Windows OS | Active Directory Replication Diagnostics, Retrieve IIS Web Server Logs, SMB Connection Failures |
| APIs | Check Internal API Response Body, Retrieve Diagnostics from SaaS Tools |
| Kubernetes | Retrieve Recent Pod Logs, Recent Kubernetes Events, Pod Status & Error Messages |
| Databases | Top Resource Consuming Queries, Blocking Locks, Missing Indexes |
| Network Devices | BGP Route Flapping, Spanning Tree Issues, Duplex Mismatch |
| Observability Integrations | Retrieve Application Logs, Surface Relevant Graphs, Capture Time Sensitive Diagnostics |