A log collection scheme for containerized applications

Source: Rancher Labs

Containerized application log collection challenges

Application log collection, analysis, and monitoring are an important part of daily operations work, and handling log collection properly is often a key issue when containerizing an application.

Docker handles logs by having the docker engine capture the STDOUT and STDERR of each container process; by setting different log drivers on a container, container logs can be collected in different ways. The default json-file log driver saves the container's STDOUT/STDERR output to disk, which the user can then query with `docker logs <container>`.
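For example, the log driver and its options can be set per container. The following is a minimal docker-compose sketch (the service name, image, and limits are illustrative, not from the original article) that keeps the default json-file driver but caps its on-disk footprint so `docker logs` stays usable without filling the host's disk:

```yaml
# Hypothetical compose snippet: pin the default json-file driver
# and limit how much disk its log files may consume.
version: "2"
services:
  web:
    image: nginx
    logging:
      driver: json-file
      options:
        max-size: "10m"   # rotate each log file at 10 MB
        max-file: "3"     # keep at most 3 rotated files
```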

When deploying a traditional application, the application usually writes its logs to a file, often (but not necessarily) under the /var/log directory. After containerization, the situation changes: logs are no longer in one unified location on the host system, but are scattered across the isolated environments of many different containers.

Collecting the log records that applications write inside containers poses the following challenges:

1) Resource consumption. If you run a log collection process such as logstash or fluentd inside each container, these tools will consume a large amount of system resources when the container density on the host is high. This approach is the simplest and most intuitive, but also the most resource-hungry.

2) Application intrusion. In some traditional applications, especially legacy systems, the logging mechanism (the log format, storage location, and so on) often cannot be configured or changed. The log collection mechanism should therefore avoid requiring modifications to the application.

3) Log source identification. With a unified application log collection scheme, logs are scattered across the isolated environments of many different containers, so the problem of identifying the source of each log record must be solved.

Using the Rancher platform's container_name naming rules, log source identification can tell you which container a log came from even if that container is rescheduled to another host at runtime.

Containerized application log collection scheme

The following is a unified log collection scheme with low resource consumption and no application intrusion that can clearly identify the log source; it has already been implemented successfully for Wise2C customers.

[Figure: overall architecture of the log collection scheme]

In this scheme, a wise2c-logger is deployed on each host. wise2c-logger listens to docker engine events: when a container is created or destroyed, it determines whether a local volume associated with logs was created or destroyed, and, based on the container's labels, dynamically reconfigures logstash's input, filter, and output to collect and distribute the application's logs.

1) How to configure the application container. The application container needs to mount a dedicated volume to which logs are written. To distinguish this volume from other data volume containers, we define the volume in a separate container and attach it to the application container via the volumes_from directive. Below is an example: the docker-compose file of a demo application.

[Figure: docker-compose file of the demo application]

The web-data container uses a local volume mounted at the /var/log directory (other directories are possible too). Several labels are defined on web-data: io.wise2c.logtype indicates that this container holds a log directory, and values such as elasticsearch or kafka indicate the log output target or filter conditions.
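Since the original screenshot is not reproduced here, the following is a hedged reconstruction of what such a compose file could look like. Only io.wise2c.logtype is named in the text; every other key, value, and image name below is an assumption for illustration, not the actual demo file:

```yaml
# Hypothetical docker-compose v1 sketch (volumes_from requires v1 syntax).
web-data:                                  # dedicated log volume container
  image: busybox
  volumes:
    - /var/log                             # local volume holding the application's logs
  labels:
    io.wise2c.logtype: "tomcat"            # marks this as a log container
    io.wise2c.logoutput: "elasticsearch"   # assumed key: output/filter hint
web:
  image: tomcat
  volumes_from:
    - web-data                             # application writes its logs into web-data's volume
```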

Now let's look at the general workflow of wise2c-logger:

[Figure: wise2c-logger workflow]

Listen for a new log container -> get the log container's type and local directory -> generate a new logstash configuration:

1) wise2c-logger intercepts docker events and checks whether a log container was created or destroyed;

2) when a log container is created (identified by its container label), it looks up the host path of the container's volume;

3) it rewrites the logstash configuration file built into wise2c-logger, setting new input, filter, and output rules.
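As an illustration, the regenerated logstash configuration for a single log container might look like the following sketch; the paths, field names, and output target are assumptions, not the actual file wise2c-logger writes:

```conf
# Hypothetical logstash pipeline for one log container.
input {
  file {
    # host path of the log container's local volume, discovered via the docker daemon
    path => "/var/lib/docker/volumes/<volume-id>/_data/*.log"
    # tag every event with its source container so it can be identified later
    add_field => { "container_name" => "demo_web-data_1" }
  }
}
output {
  if [container_name] == "demo_web-data_1" {
    elasticsearch { hosts => ["elasticsearch:9200"] }
  }
}
```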

[Figure: docker-compose.yml of the wise2c-logger Rancher catalog entry]

The figure above is a screenshot of the docker-compose.yml for the wise2c-logger catalog entry we built on the Rancher platform; you can match it against the workflow described above.


At present we are still optimizing wise2c-logger further:

1) Collecting the container's STDOUT/STDERR logs. In particular, for containers using the json-file driver, the container's STDOUT/STDERR logs can be collected by scanning the json-file directory on the container's host.
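Each line in a container's json-file log is a JSON record, so scanning those files means parsing entries of roughly this shape (the log content and timestamp below are illustrative):

```json
{"log":"GET /index.html 200\n","stream":"stdout","time":"2017-01-01T00:00:00.000000000Z"}
```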

2) More built-in log collectors. Currently logstash is used by default for log collection, filtering, and some simple transcoding logic. In the future wise2c-logger may support more lightweight log collectors such as fluentd and filebeat.

Q & A

Q: Have you done performance testing? The log throughput of my modules is relatively large. For example, based on the volume of log output, how many system resources should be reserved for the logger module to keep it working normally and stably?
A: We have not done heavy stress testing, but in normal use we have not hit a performance bottleneck. We do not put resource constraints on the logger; it can take 300-400 MB of memory, largely because of logstash.

Q: Does "generating a log container" mean that each application container gets a corresponding log container? Wouldn't that increase resource consumption? Is the performance overhead of one log container per application container higher than the Kubernetes style of log collection?
A: Yes, each application container corresponds to one log container. Although each application has a log container, the log container is started only once and does not consume runtime resources.

Q: What do you mean by "started only once"? I mean that when log volume is large, won't so many log containers consume a lot of I/O and raise CPU usage, affecting the CPU available to the application containers?
A: No, the log container is only created; it does not keep running.

Q: How do you monitor the local volume?
A: You can monitor the file directory, or query the docker daemon.

Q: Can using the syslog driver directly also be non-intrusive to the application?
A: Yes, you can pass the syslog driver parameters when starting the container; this consumes almost no additional resources.
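A minimal, hypothetical example of passing those parameters in a compose file (the image name and syslog address are illustrative):

```yaml
# Hypothetical compose snippet: send a container's STDOUT/STDERR to a
# remote syslog endpoint without touching the application itself.
version: "2"
services:
  app:
    image: myapp          # assumed image name
    logging:
      driver: syslog
      options:
        syslog-address: "udp://logs.example.com:514"
        tag: "{{.Name}}"  # identify the source container in syslog
```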

Q: Doesn't this scheme require the application container to output its logs to /var/log?
A: No, the directory can be freely defined, and logstash can also capture syslog.

Q: Can the syslog driver collect the log files inside the container? Can it distinguish the different log streams in the container?
A: Local log files in the container can be collected via syslog, and the streams can be distinguished as well. But personally I feel that keeping local log files inside the container goes against the idea of containerized, stateless applications.

Q: Finally, you said you reconfigure the logstash configuration file. So it seems you collect all the logs through the wiselog container and only dynamically configure logstash's parameters?
A: Yes, the collection itself is done by logstash. For simple file collection there are plenty of options; there is no need to reinvent the wheel.

Q: This scheme actually raises a question: why not do it the Kubernetes way, fixing a directory and collecting log files with regular expressions, instead of doing it dynamically? What's the benefit? Right now the two approaches look almost the same to me.
A: To reduce intrusion into the application. Many users' existing systems can no longer be modified, so this is done to minimize changes to the user's existing setup; "compatibility with what exists" matters most.

Q: Besides Kibana, is there any other visualization option?
A: For ES, there is no better alternative.

Q: If the log directory is mounted, logstash can collect it on the host, but do we still need other plug-ins?
A: The container lets you identify the application's business logic, so you can obtain the service name.

Q: Some applications output log files with the same name. Will there be conflicts? For example, if I start two containers on one host and both write to xx.log, will that be a problem?
A: No; giving each application container its own log volume container solves this. That was actually a hard problem when we designed the scheme. One benefit of this scheme is that each application can freely choose its log directory without worrying about conflicts with other applications, and two instances of the same application on the same host will not conflict either.

Q: Lastly, I've heard from others that you can just throw all logs to standard output. Is that reliable?
A: Some people have reported that with this approach, when log volume is large, the docker daemon can crash.
