Automated Diagnostics
Automated Diagnostics
Public Cloud Providers
For individuals responding to incidents in the public cloud, useful diagnostics can be retrieved from the public-cloud provider about the health of the platform services as well as from the infrastructure and applications running on the public cloud provider. This section outlines methods for retrieving diagnostics from the public cloud providers’ platform services. If you’re looking for details on retrieving diagnostics from the OS, applications and databases, refer to their respective sections.
AWS
For AWS users, some examples of diagnostics would be:
- Check the health of all EC2 instances behind an Application Load Balancer (ALB) or Elastic Load Balancer (ELB).
- Retrieve reasons for stopped ECS tasks.
- Look in CloudWatch logs for operating-system and application errors.
There are multiple plugins that allow users to pull diagnostics from common AWS Services:
- Query CloudWatch Logs
- Query Athena tables
- Check ELB Targets Status
- Retrieve failed ECS container messages
- Invoke script via Lambda
In addition to using the AWS Plugins, it is also possible to harness the AWS CLI within your Automation Instance:
If using Runbook Automation Self-Hosted, or a Runner, then you can also execute scripts that leverage the AWS SDK, such as Boto3 for python.
These multiple methods of communication with AWS allow you to be flexible in your approach for retrieving Diagnostics or managing your AWS environments.
Azure
For users of Azure, the most common method of retrieving diagnostics from the public cloud platform is by “wrapping around” the Azure CLI using the Command Job Step plugin. As an example, you may want to retrieve the health of a Function App:
az monitor metrics list --resource myresource --resource-group myresourcegroup --resource-type "Microsoft.Web/sites" --metric "HealthCheckStatus" --interval 5m
Another example would be to check the health of an Azure container registry:
Point of Interest
Azure has a full article here dedicated to diagnosing Container Registry behavior.
A diagnostic runbook could incorporate the az acr check-health
command and translate the output using the error codes found in this article.
There are multiple plugins that allow users to pull diagnostics from Azure services:
Google Cloud Platform (GCP)
For users of Google Cloud Platform (GCP), the most common method of retrieving diagnostics from the public cloud platform is by “wrapping around” the gcloud CLI using the Command Job Step plugin. As an example, you can retrieve the current health status of instances in a backend service:
Point of Interest
This blog from Google Cloud on Debugging Health Checks in Load Balancing on Google Compute Engine outlines a number of steps for diagnosing Health Check failures.
These steps can be treated as a runbook and “transposed” into your Automation instance using the Remote Command
There are multiple plugins that allow users to pull diagnostics from GCP services: