Enable Diagnostics for RabbitMQ
When troubleshooting RabbitMQ servers or containers, gathering relevant logs, system-level metrics and environment information can provide valuable insights into the overall state of the node.
To streamline this process, the RabbitMQ team has developed an official support tools repository. This repository contains the "rabbitmq-collect-env.sh" shell script that collects RabbitMQ-specific logs, along with selected OS logs and system-level metrics, to aid in debugging and diagnosing issues.
In this how-to, we will explore how to effectively use the "rabbitmq-collect-env.sh" script and integrate it into a Rundeck job for debugging RabbitMQ servers/containers.
What is RabbitMQ?
RabbitMQ is an open-source message-broker software that allows applications to communicate with each other using messaging protocols. It is a message queuing system that enables different applications or services to asynchronously exchange messages or data in a decoupled manner.
RabbitMQ supports several messaging protocols, including Advanced Message Queuing Protocol (AMQP), Streaming Text Oriented Messaging Protocol (STOMP), Message Queuing Telemetry Transport (MQTT), and others. It also provides features such as message routing, reliable message delivery, message acknowledgments and message queuing, which make it a robust and scalable solution for building distributed systems.
In RabbitMQ, a producer publishes messages to an exchange, which routes them to one or more queues, and consumers then receive the messages from those queues. Consumers can subscribe to a specific queue or set of queues and receive messages as they arrive. RabbitMQ provides several exchange types, such as direct, topic, fanout, and headers, which allow for more complex message routing scenarios.
RabbitMQ can be used in various scenarios where different applications or services need to communicate with each other asynchronously. Some common use cases for RabbitMQ include:
- Microservices architecture: RabbitMQ is often used as a messaging layer in microservices architecture to enable communication between different microservices.
- Event-driven systems: RabbitMQ can be used in event-driven systems where different components need to react to events or changes in the system.
You can learn more about RabbitMQ in the RabbitMQ documentation listed under Resources.
Debugging RabbitMQ through Rundeck Job
This example uses the "rabbitmq-collect-env.sh" script, an open-source tool provided by the RabbitMQ team. It gathers RabbitMQ logs, selected OS logs, system-level metrics (such as iostat and kernel limits) and other environment information. While some of this data may not be directly related to RabbitMQ, it can offer additional insights into the overall state of the node and assist in troubleshooting.
Using this Rundeck workflow, you’ll work through the following steps:
- Run the rabbitmq-collect-env script on the RabbitMQ server: To debug a RabbitMQ server/container, the job executes the "rabbitmq-collect-env.sh" script on the target node. The script gathers various logs and environment information and creates a compressed archive file for analysis.
- Extract the archive: Once the "rabbitmq-collect-env" script completes, you will have a compressed archive file containing all the gathered logs and environment information. The job unpacks it on the node so that its contents can be read.
- Analyze the collected data: Analyze the contents using tools or techniques appropriate for your debugging requirements. Pay attention to RabbitMQ-specific logs, system metrics, and any anomalies that might point to a root cause.
- Send the compressed file to a file server: The compressed file is uploaded to an FTP service, and the data printed during the analysis step is sent to an external web service as a notification.
- Clean up the RabbitMQ server: The compressed file and the extracted files are deleted from the RabbitMQ server.
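The extract, analyze, and clean-up steps above can also be tried by hand on a node. The following is a minimal sketch, not the job's actual implementation. It assumes the archive unpacks into a single top-level directory containing `system/` and `rabbitmq/` subdirectories, as the paths in the job definition suggest. The function names `extract_dump`, `summarize_dump`, and `cleanup_dump` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: manual equivalent of the extract / analyze / clean-up steps.
# Assumption: the archive has one top-level directory with system/ and
# rabbitmq/ subdirectories.

# Unpack the archive next to itself and print its top-level directory name.
extract_dump() {
  local archive="$1" workdir
  workdir="$(dirname "$archive")"
  tar -xf "$archive" -C "$workdir"
  # First archive member, without a leading "./", cut to its first component.
  tar -tf "$archive" | head -n 1 | sed 's|^\./||' | cut -d/ -f1
}

# Print a few files from the unpacked dump, noting any that are missing.
summarize_dump() {
  local root="$1" file
  for file in system/hostname system/uname system/uptime rabbitmq/rabbitmqctl_status; do
    echo "== ${file} =="
    if [ -r "${root}/${file}" ]; then
      cat "${root}/${file}"
    else
      echo "(not present in this dump)"
    fi
    echo ""
  done
}

# Remove the archive and the unpacked directory.
cleanup_dump() {
  local archive="$1" root="$2"
  rm -f -- "$archive"
  # Guard against an empty path before a recursive delete.
  if [ -n "$root" ]; then
    rm -rf -- "$root"
  fi
}
```

To use it, call `extract_dump` with the archive path, pass the resulting directory (joined to the archive's parent directory) to `summarize_dump`, and finish with `cleanup_dump`. Missing files are reported rather than treated as errors, since the set of collected files may vary between systems.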
The following job definition implements this workflow:
- defaultTab: nodes
  description: |-
    A Job example that collects data from a RabbitMQ server and puts the dump
    file in a FTP server and notifies about the relevant information to a
    web service.
  executionEnabled: true
  id: 101f7d6f-a58a-4bfb-a548-7325978eefaf
  loglevel: INFO
  name: CollectRabbitMQData
  nodeFilterEditable: false
  nodefilters:
    dispatch:
      excludePrecedence: true
      keepgoing: false
      rankOrder: ascending
      successOnEmptyNodeFilter: false
      threadcount: '1'
    filter: 'docker:Config.Hostname: rmq'
  nodesSelectedByDefault: true
  notification:
    onsuccess:
      plugin:
        configuration:
          authentication: None
          body: |-
            Operating System Details:
            ${export.rmqos}
            RabbitMQ Data
            ${export.rmqse}
          contentType: application/json
          method: POST
          noSSLVerification: 'true'
          remoteUrl: ${option.webserviceurl}
          timeout: '30000'
        type: HttpNotification
  notifyAvgDurationThreshold: null
  options:
  - hidden: true
    label: FTP Service Hostname
    name: ftp_hostname
    required: true
    value: ftp
  - hidden: true
    label: FTP Service Password
    name: ftp_password
    required: true
    secure: true
    storagePath: keys/ftp_password
    valueExposed: true
  - hidden: true
    label: FTP Service User
    name: ftp_user
    required: true
    value: admin
  - label: Web Service URL
    name: webserviceurl
    required: true
    value: https://webhook.site/xxx-xxx-xxx-xxx-xxx
  plugins:
    ExecutionLifecycle: {}
  scheduleEnabled: true
  schedules: []
  sequence:
    commands:
    - fileExtension: .sh
      interpreterArgsQuoted: false
      plugins:
        LogFilter:
        - config:
            invalidKeyPattern: \s|\$|\{|\}|\\
            logData: 'true'
            name: output_archive
            regex: .*\'(.*)\'.*
            replaceFilteredResult: 'false'
          type: key-value-data
      scriptInterpreter: /bin/bash
      scripturl: https://raw.githubusercontent.com/rabbitmq/support-tools/main/scripts/rabbitmq-collect-env
    - configuration:
        cycles: '1'
        interval: '3'
        progress: 'true'
      description: Wait three seconds
      nodeStep: true
      type: nixy-waitfor-sleep-workflow-node-step
    - description: Uncompress the dump to print some values in the job
      exec: cd /var/log/rabbitmq; tar xvf ${data.output_archive}
    - description: Operating System related data
      plugins:
        LogFilter:
        - config:
            captureMultipleKeysValues: 'true'
            hideOutput: 'false'
            logData: 'true'
            name: os
            regex: (.*)
          type: key-value-data-multilines
      script: |
        echo "#############"
        echo "SERVER HEALTH"
        echo "#############"
        echo ""
        echo "Hostname:"
        cat /var/log/rabbitmq/rmq/system/hostname
        echo ""
        echo "Operating System:"
        cat /var/log/rabbitmq/rmq/system/uname
        echo ""
        echo "Uptime:"
        cat /var/log/rabbitmq/rmq/system/uptime
        echo ""
        echo "VMSTAT data:"
        cat /var/log/rabbitmq/rmq/system/vmstat
    - description: RabbitMQ specific data
      plugins:
        LogFilter:
        - config:
            captureMultipleKeysValues: 'true'
            hideOutput: 'false'
            logData: 'true'
            name: rmq
            regex: (.*)
          type: key-value-data-multilines
      script: |
        echo "#############"
        echo "SERVER HEALTH"
        echo "#############"
        echo ""
        echo "RMQ Environment:"
        cat /var/log/rabbitmq/rmq/rabbitmq/rabbitmqctl_environment
        echo ""
        echo "RMQ Status:"
        cat /var/log/rabbitmq/rmq/rabbitmq/rabbitmqctl_status
        echo ""
        echo "RMQ PID Limits:"
        cat /var/log/rabbitmq/rmq/rabbitmq/rabbitmq_pid_limits
    - description: 'Copies the dump file to a ftp server, this steps could be changed
        to another remote service'
      script: 'lftp -e "put -O / @data.output_archive@; bye" -u @option.ftp_user@,@option.ftp_password@
        @option.ftp_hostname@'
    - configuration:
        export: rmqos
        group: export
        value: ${data.os*}
      description: RabbitMQ Operating System data
      nodeStep: false
      type: export-var
    - configuration:
        export: rmqse
        group: export
        value: ${data.rmq*}
      description: RabbitMQ Service Exported Variable
      nodeStep: false
      type: export-var
    - description: 'Last step: Deletes the dump files from the RMQ Server'
      exec: rm ${data.output_archive}; rm -rf /var/log/rabbitmq/rmq; echo "All done!"
    keepgoing: false
    strategy: sequential
  uuid: 101f7d6f-a58a-4bfb-a548-7325978eefaf
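The first step of the job attaches a key-value-data log filter named output_archive with the regex `.*\'(.*)\'.*`, so whatever appears between single quotes in the script's output becomes available to later steps as `${data.output_archive}`. The sketch below reproduces that capture outside Rundeck with `sed`, which can help when checking what the filter would pick up. The sample output line and the function name `capture_quoted` are hypothetical, and the collector's real wording may differ.

```shell
#!/usr/bin/env bash
# Sketch only: mimic the job's key-value-data capture outside Rundeck.
# Reads text on stdin and prints the text found between single quotes on
# each line that contains a quoted section. Other lines are ignored.
capture_quoted() {
  sed -n "s/.*'\(.*\)'.*/\1/p"
}

# Hypothetical collector output, used only for illustration.
sample_output="Collecting data...
Output archive: '/var/log/rabbitmq/rmq-example.tar.xz'"

output_archive="$(printf '%s\n' "$sample_output" | capture_quoted)"
echo "output_archive=${output_archive}"
# prints: output_archive=/var/log/rabbitmq/rmq-example.tar.xz
```

If the script printed more than one quoted string, the pattern would capture more than one line, so it is worth confirming the real output on a test node before relying on the captured value in the extract and clean-up steps.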
The "rabbitmq-collect-env" script is a valuable tool for debugging RabbitMQ servers/containers. By incorporating it into your troubleshooting process, you can collect relevant logs, system-level metrics, and environment information, aiding in the identification and resolution of issues. Integrating the script with Rundeck further streamlines the debugging workflow, enabling automation and centralized management of RabbitMQ debugging tasks.
Resources
- RabbitMQ documentation.
- RabbitMQ support tools repository.