Skip to main content

Automated Diagnostics


Automated Diagnostics

Public Cloud Providers

For individuals responding to incidents in the public cloud, useful diagnostics can be retrieved from the public-cloud provider about the health of the platform services as well as from the infrastructure and applications running on the public cloud provider. This section outlines methods for retrieving diagnostics from the public cloud providers’ platform services. If you’re looking for details on retrieving diagnostics from the OS, applications and databases, refer to their respective sections.

AWS

For AWS users, some examples of diagnostics would be:

  • Check the health of all EC2 instances behind an Application Load Balancer (ALB) or Elastic Load Balancer (ELB).
  • Retrieve reasons for stopped ECS tasks.
  • Look in CloudWatch logs for operating-system and application errors.

There are multiple plugins that allow users to pull diagnostics from common AWS Services:

In addition to using the AWS Plugins, it is also possible to harness the AWS CLI within your Automation Instance:

AWS CLI in a Job Step
AWS CLI in a Job Step

If using Process Automation (on-premise), or a Runneropen in new window, then you can also execute scripts that leverage the AWS SDK, such as Boto3 for python.

These multiple methods of communication with AWS allow you to be flexible in your approach for retrieving Diagnostics or managing your AWS environments.

Azure

For users of Azure, the most common method of retrieving diagnostics from the public cloud platform is by “wrapping around” the Azure CLI using the Command Job Step plugin. As an example, you may want to retrieve the health of a Function App:

az monitor metrics list --resource myresource --resource-group myresourcegroup --resource-type "Microsoft.Web/sites" --metric "HealthCheckStatus" --interval 5m
Azure CLI checks health of Function App
Azure CLI checks health of Function App

Another example would be to check the health of an Azure container registry:

Azure CLI checks errors in Container Registry
Azure CLI checks errors in Container Registry

Point of Interest

Azure has a full article hereopen in new window dedicated to diagnosing Container Registry behavior.
A diagnostic runbook could incorporate the
az acr check-health command and translate the output using the error codes found in this articleopen in new window.

There are multiple plugins that allow users to pull diagnostics from Azure services:

Google Cloud Platform (GCP)

For users of Google Cloud Platform (GCP), the most common method of retrieving diagnostics from the public cloud platform is by “wrapping around” the gcloud CLI using the Command Job Step plugin. As an example, you can retrieve the current health status of instances in a backend service:

Gcloud CLI checks backend instances health
Gcloud CLI checks backend instances health

Point of Interest

This blog from Google Cloud on Debugging Health Checks in Load Balancingopen in new window on Google Compute Engine outlines a number of steps for diagnosing Health Check failures.
These steps can be treated as a runbook and “transposed” into your Automation instance using the Remote Command

There are multiple plugins that allow users to pull diagnostics from GCP services: