Operations Lead
Logging in a Docker Hosting World
Docker is reinventing the way we package and deploy our applications, and that brings new challenges to hosting. In this blog post I will provide a recipe for logging your Docker-packaged applications.
Goals
Going into this I had two major goals:
- Zero remote console logins - What is the number one reason a developer or sysadmin logs in to the console on a remote environment? To inspect the logs. If we expose our logs via an API, we immediately cut out the vast majority of our remote logins.
- Aggregation - I personally despise having to log in to multiple hosts to inspect the logs for a clustered service. It can leave you "grumpy" at the situation well before you start on the task you are meant to do.
If we have all our logs in one place, we don't have to worry about having to access multiple machines to analyse data.
Components
The following are the components which make up a standard logging pipeline.
Storage
I'll start with the core piece of the puzzle: storage.
This component is in charge of:
- Receiving the logs
- Reliably storing them for a retention period (e.g. 3 months)
- Exposing them via an interface (API / UI)
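To make those three responsibilities concrete, here is a toy in-memory sketch. It is not how any of the products below work, and it is certainly not "reliable" storage; the class name `LogStore` and its methods are hypothetical and exist only to illustrate receive, retain and expose.

```python
import time
from dataclasses import dataclass, field

# Roughly 3 months, matching the retention example above.
RETENTION_SECONDS = 90 * 24 * 60 * 60


@dataclass
class LogStore:
    """Toy store: receives entries, drops expired ones, answers queries."""

    retention: float = RETENTION_SECONDS
    entries: list = field(default_factory=list)

    def receive(self, source, message, timestamp=None):
        # Record when the line arrived unless the sender supplied a time.
        ts = time.time() if timestamp is None else timestamp
        self.entries.append((ts, source, message))

    def purge(self, now=None):
        # Keep only entries that are still inside the retention window.
        now = time.time() if now is None else now
        cutoff = now - self.retention
        self.entries = [e for e in self.entries if e[0] >= cutoff]

    def query(self, text="", source=None):
        # The "interface": filter by substring and, optionally, by source.
        return [
            e for e in self.entries
            if text in e[2] and (source is None or e[1] == source)
        ]
```

A real storage component adds durability, indexing and access control on top of this, which is why I would pick one of the options below rather than build my own.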
Some open source options:
- Logstash (more specifically the ELK stack: Elasticsearch, Logstash and Kibana)
- Graylog
Some services where you can get this right out of the box:
- Loggly
- AWS CloudWatch Logs
- Papertrail
None of these options require Docker-based hosting. You can run them right now on your existing infrastructure.
However, they do become a key component when hosting Docker-based infrastructure, because we are constantly rolling out new containers in place of the old ones, and logs kept inside a replaced container disappear with it.
Collector
This is an extremely simple service tasked with collecting all the logs and pushing them to the remote service.
Don't confuse simple with unimportant though. I highly recommend you set up monitoring for this component.
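As a rough illustration of both points (collect and push, plus something to monitor), here is a minimal sketch. The `Collector` class, its `send` callable and the `healthy` check are all hypothetical names of my own; this is not the implementation of any particular collector.

```python
import time


class Collector:
    """Buffers log lines and pushes them to a remote service in batches."""

    def __init__(self, send, batch_size=100, clock=time.time):
        self.send = send              # callable that ships a list of lines
        self.batch_size = batch_size
        self.clock = clock            # injectable so it can be tested
        self.buffer = []
        self.last_success = None      # time of the last successful push

    def collect(self, line):
        self.buffer.append(line)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # If send() raises, the buffer is left intact for a later retry.
        self.send(list(self.buffer))
        self.buffer.clear()
        self.last_success = self.clock()

    def healthy(self, max_age=300):
        # Monitoring hook: has a push succeeded within max_age seconds?
        if self.last_success is None:
            return False
        return self.clock() - self.last_success <= max_age
```

Wiring `healthy()` into whatever monitoring you already run gives you an alert when logs quietly stop shipping, which is the failure you are least likely to notice otherwise.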
Visualiser
In most cases the "storage" component provides an interface for interacting with the logged data.
In addition we can also write applications to consume the "storage" component's API and provide a command line experience.
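To give a feel for that command line experience, here is a minimal sketch that assumes AWS CloudWatch Logs as the storage component and a boto3 CloudWatch Logs client passed in as `client`. The function name `tail_logs` and the log group name in the usage note are hypothetical, and this is not how any existing CLI is implemented.

```python
import datetime


def tail_logs(client, group, pattern="", since_ms=0):
    """Yield one formatted line per matching event in a log group.

    `client` is assumed to behave like boto3.client("logs").
    """
    kwargs = {"logGroupName": group, "startTime": since_ms}
    if pattern:
        kwargs["filterPattern"] = pattern
    while True:
        page = client.filter_log_events(**kwargs)
        for event in page.get("events", []):
            # CloudWatch timestamps are milliseconds since the epoch.
            ts = datetime.datetime.fromtimestamp(
                event["timestamp"] / 1000, tz=datetime.timezone.utc
            )
            message = event["message"].rstrip()
            yield f"{ts.isoformat()} {event['logStreamName']} {message}"
        token = page.get("nextToken")
        if not token:
            break
        # Ask for the next page of results.
        kwargs["nextToken"] = token
```

With boto3 installed and credentials configured, something like `for line in tail_logs(boto3.client("logs"), "my-app"): print(line)` would print events from every stream in the group, which covers the aggregation goal from a single terminal.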
Implementation
So how do we implement these components in a Docker hosting world? The key to our implementation is the Docker API.
As a simple example, we can:
- Start a container with an "echo" command
- Query the Docker logs API via the Docker CLI application
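The two steps above can be sketched as a thin wrapper around the Docker CLI. This is my own illustration under a few assumptions: the `docker` binary is on the PATH, the `busybox` image is just a convenient choice, and the function names are hypothetical.

```python
import subprocess


def run_echo_container(name, message, run=subprocess.run):
    """Start a named container whose only job is to echo a message."""
    # Equivalent to: docker run --name NAME busybox echo MESSAGE
    cmd = ["docker", "run", "--name", name, "busybox", "echo", message]
    run(cmd, check=True, capture_output=True, text=True)


def container_logs(name, run=subprocess.run):
    """Return what the container wrote to STDOUT, via `docker logs`."""
    # Equivalent to: docker logs NAME
    result = run(["docker", "logs", name], check=True,
                 capture_output=True, text=True)
    return result.stdout
```

Calling `run_echo_container("logtest", "hello")` followed by `container_logs("logtest")` should hand back the echoed text, even though the container itself has already exited.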
What this means is that we can pick up all the logs for a service, provided the services inside the container print to STDOUT (or STDERR) instead of logging to a file.
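How you switch a service over to STDOUT depends on the service. Purely as an illustration, and assuming a Python service rather than the Apache / Drupal stack discussed in this post, a minimal sketch using the standard `logging` module looks like this (the function name is hypothetical):

```python
import logging
import sys


def configure_stdout_logging(level=logging.INFO):
    """Send all log records to STDOUT instead of a log file."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    root = logging.getLogger()
    # Replace any existing handlers (for example a file handler).
    root.handlers = [handler]
    root.setLevel(level)
    return root
```

Once this runs at startup, anything the service logs ends up in the container's output stream, where the Docker daemon can capture it.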
With this in mind, we developed the following logs pipeline, and open sourced some of the components:
- Expose service logs to the Docker daemon (Apache / Drupal Watchdog / Syslog)
- Collect the logs (https://github.com/previousnext/log)
- Visualise via the UI and CLI (https://github.com/nickschuch/cloudwatch-cli)
Conclusion
I feel like we have achieved a lot by doing this.
Here are some takeaways:
- The logs pipeline is generic and not Drupal-specific
- We didn't reinvent the wheel on how logs are shipped to remote services
- Some interesting projects were built along the way which can be used standalone