Docker HEALTHCHECK instruction

In this post we will show you how to use the Docker HEALTHCHECK instruction when defining a Dockerfile.

HEALTHCHECK has two forms:

  • HEALTHCHECK [OPTIONS] CMD command (check container health by running a command inside the container)
  • HEALTHCHECK NONE (disable any healthcheck inherited from the base image)

The HEALTHCHECK instruction tells Docker how to test a container to check that it is still working. This can detect cases such as a web server that is stuck in an infinite loop and unable to handle new connections, even though the server process is still running. Health checks matter for high availability: once a problem with a container is detected, we (or the tooling that watches the health status) can react, for example by replacing the container or scaling the environment.

When a container has a healthcheck specified, it has a health status in addition to its normal status. This status is initially starting. Whenever a health check passes, it becomes healthy (whatever state it was previously in). After a certain number of consecutive failures, it becomes unhealthy.

The options that can appear before CMD are:

--interval=DURATION (default: 30s)
--timeout=DURATION (default: 30s)
--start-period=DURATION (default: 0s)
--retries=N (default: 3)

The health check will first run interval seconds after the container is started, and then again interval seconds after each previous check completes.

If a single run of the check takes longer than timeout seconds then the check is considered to have failed.

It takes retries consecutive failures of the health check for the container to be considered unhealthy.

start period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.
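
The rules above for retries and the start period can be traced with a small simulation. This is a minimal sketch in plain shell, not Docker's implementation: the function name health_after and its arguments are made up for illustration, and the start period is approximated as a number of probes rather than a duration.

```shell
# health_after RETRIES GRACE_PROBES RESULT...
# Hypothetical helper: prints the health status after a sequence of probe
# results ("ok" or "fail"). GRACE_PROBES approximates the start period as
# the number of initial probes that fall inside it (an assumption; Docker
# itself works with a duration).
health_after() {
  retries=$1
  grace_probes=$2
  shift 2
  status=starting
  streak=0     # consecutive failures that count towards RETRIES
  started=0    # becomes 1 after the first successful probe
  n=0
  for result in "$@"; do
    n=$((n + 1))
    if [ "$result" = ok ]; then
      status=healthy
      streak=0
      started=1
    elif [ "$started" -eq 0 ] && [ "$n" -le "$grace_probes" ]; then
      :  # failure inside the start period, before any success: not counted
    else
      streak=$((streak + 1))
      if [ "$streak" -ge "$retries" ]; then
        status=unhealthy
      fi
    fi
  done
  echo "$status"
}

health_after 3 0 ok fail fail         # healthy: only two consecutive failures
health_after 3 0 ok fail fail fail    # unhealthy: three consecutive failures
health_after 3 2 fail fail fail fail  # starting: the first two failures are ignored
```

Note that a success inside the start period ends the grace behaviour in this sketch, matching the description above: health_after 3 2 ok fail fail fail prints unhealthy.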

There can only be one HEALTHCHECK instruction in a Dockerfile. If you list more than one then only the last HEALTHCHECK will take effect.

The command after the CMD keyword can be either a shell command (e.g. HEALTHCHECK CMD /bin/check-running) or an exec array (as with other Dockerfile commands; see e.g. ENTRYPOINT for details).

The command’s exit status indicates the health status of the container. The possible values are:

0: success - the container is healthy and ready for use
1: unhealthy - the container is not working correctly
2: reserved - do not use this exit code

For example, to check every five minutes or so that a web-server is able to serve the site’s main page within three seconds:

HEALTHCHECK --interval=5m --timeout=3s \
  CMD curl -f http://localhost/ || exit 1
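
The trailing || exit 1 is there because curl has many exit codes of its own (for instance, 22 when -f is used and the server returns an HTTP error), while the health check command should only return 0 or 1. The same normalization can be sketched as a shell function; the name probe is hypothetical and only serves as an illustration.

```shell
# probe COMMAND [ARGS...]
# Hypothetical helper: runs a check command and collapses its exit status
# to the two values a HEALTHCHECK command should return.
probe() {
  if "$@"; then
    return 0   # success: healthy
  fi
  return 1     # any other exit status: unhealthy
}

# Example: behaves like `curl -f http://localhost/ || exit 1`.
# probe curl -f http://localhost/
```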

To help debug failing probes, any output text (UTF-8 encoded) that the command writes on stdout or stderr will be stored in the health status and can be queried with docker inspect. Such output should be kept short (only the first 4096 bytes are stored currently).
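
As a rough sketch of how that stored output can be retrieved, the function below prints the whole health record of a container as JSON. The function name health_record is an assumption made for this example; the docker inspect call and the .State.Health field are the same ones used later in this post.

```shell
# health_record CONTAINER
# Hypothetical helper: prints the full health record as JSON, including
# the current status, the failing streak and a log of recent probes with
# their exit codes and captured output.
health_record() {
  docker inspect --format '{{json .State.Health}}' "$1"
}

# Example (replace my_container with the name of your container):
# health_record my_container
```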

When the health status of a container changes, a health_status event is generated with the new status.
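
These events can be followed from the command line with docker events. The wrapper below is only a sketch: watch_health is a made-up name, and it assumes the events should be limited to a single container.

```shell
# watch_health CONTAINER
# Hypothetical helper: streams health_status events for one container
# until interrupted (Ctrl-C).
watch_health() {
  docker events --filter "container=$1" --filter event=health_status
}

# Example (replace my_container with the name of your container):
# watch_health my_container
```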

The HEALTHCHECK feature was added in Docker 1.12.

In order to test our healthcheck we are going to define a Dockerfile with an NGINX container and a HEALTHCHECK instruction that runs curl locally.

The Dockerfile will look like this:

# Version: 0.0.1
FROM ubuntu:16.04
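# Update the package index and install nginx and curl in a single layer;
# && stops the build if the update fails.
RUN apt-get update && apt-get install -y nginx curl
# Page fetched by the health check
RUN echo 'I am Healthy' > /var/www/html/health.html
# Main page of the site
RUN echo 'Hello World' > /var/www/html/index.html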
EXPOSE 80
HEALTHCHECK --interval=1m --timeout=3s CMD curl -f http://localhost/health.html || exit 1

The default ubuntu image doesn’t include the curl utility, so we install it using the RUN Dockerfile instruction. We then create a health.html file, which the health check fetches to confirm that everything is working.

With this Dockerfile we are going to simulate a health check on the backend of an application. Let’s say that the health.html file is generated once the whole stack is up and running. If there is an error in the application, the file should not be there, so we will manually delete it from within the container and observe whether the health check fails. That way we know the health check is working as expected.

The curl command includes the -f flag (short for --fail). This flag tells curl to return a non-zero exit code when the server responds with an HTTP error. So when we try to fetch health.html from the server and the file doesn’t exist, curl exits with a non-zero code and the Docker healthcheck picks that up.

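So let’s build the image from this Dockerfile, tagging it static_web (for example with docker build -t static_web . in the directory that contains the Dockerfile), and then run our container with the following command and see what happens: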

docker run -d -p 80:80 --name static_web static_web nginx -g "daemon off;"

Let’s wait a minute and check with docker ps and docker inspect if the health check is passing:

# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                   PORTS                NAMES
492ea4197c41        static_web          "nginx -g 'daemon of…"   2 minutes ago       Up 2 minutes (healthy)   0.0.0.0:80->80/tcp   static_web
# docker inspect --format='{{json .State.Health.Status}}' static_web
"healthy"

We can see that the status of the container is healthy. Now we are going to simulate a failure by removing the file from the running container:

# docker exec -i -t static_web /bin/bash
root@492ea4197c41:/# rm /var/www/html/health.html 

Now we wait about three minutes, giving the health check time to fail three times in a row (one check per minute, with the default of three retries), and then check whether the status has changed to unhealthy:

# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                     PORTS                NAMES
492ea4197c41        static_web          "nginx -g 'daemon of…"   8 minutes ago       Up 8 minutes (unhealthy)   0.0.0.0:80->80/tcp   static_web
# docker inspect --format='{{json .State.Health.Status}}' static_web
"unhealthy"

Great! Exactly as we expected. This shows how we can monitor the health of our containers based on the status of our web stack.
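
Instead of waiting a fixed amount of time and checking by hand, the status can also be polled from a script. The following is a minimal sketch under a few assumptions: the function name wait_for_health, its arguments and its return codes are invented for this example, and it relies only on the docker inspect call already shown in this post.

```shell
# wait_for_health CONTAINER WANTED [TRIES] [DELAY_SECONDS]
# Hypothetical helper: polls the health status until it equals WANTED.
# Returns 0 on a match, 1 if TRIES polls pass without one, and 2 if
# docker inspect itself fails (for example, unknown container).
wait_for_health() {
  name=$1
  wanted=$2
  tries=${3:-30}
  delay=${4:-10}
  while [ "$tries" -gt 0 ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name") || return 2
    if [ "$status" = "$wanted" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Example: wait up to five minutes for the container to turn unhealthy.
# wait_for_health static_web unhealthy 30 10 && echo "health check caught the failure"
```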