Tuesday, December 20, 2016

Building Docker Images with Packer and Ansible

I recently worked on a project where we were explicitly asked to combine these three technologies. While I'm not a hundred percent convinced it is the right pick for the customer, it may be useful in some cases, so I'll outline how to use them together.

Let's start with what they are.

Docker

Docker (https://www.docker.com) - build, ship, run. It is a tool made for running many small services. It replaces the now common paradigm of hardware running VMs with containers sharing the host operating system. It is useful when developing microservices or a generic service-oriented architecture, or when the system has a few dependencies and one wants to decrease the dev-prod disparity and have it all the same from the bottom up.

Packer

Packer (https://www.packer.io) is a tool for creating machine and container images for multiple platforms from a single source configuration. The configuration is expressed with a set of provisioners, which can be any combination of shell, Chef, Puppet, Ansible, Salt, you name it. The target platform is expressed with a builder. One can provide multiple builders at once, though I guess it won't be as straightforward - but we will get to that.

Ansible

Ansible (https://www.ansible.com) - automation for everyone. Dubbed the most developer-friendly automation tool, it sports an easy syntax, it is simple to translate from what devs are usually most acquainted with - Bash - and it runs masterless, so one does not need a special master node to handle updates as in the cases of Chef and Puppet. The learning curve is favorable.

Why Would One Combine Them?

In our case we have a few services (2-5) which need some other services (LDAP, RabbitMQ, a database) and need to be highly available (so everything needs to run at least twice and an ambassador pattern comes in handy). There is also a DMZ part with reverse proxies for SSL termination and load balancing. We have tens of environments, but they fall into roughly three categories and within a category differ only in secrets - databases, passwords, certificates. The rest can be configured using convention over configuration. We decided to bake the environment-specific configuration into our images at deployment time.

Now we need to build the Docker images we deploy. We already knew Ansible, and as it turns out one can leverage that knowledge (and Ansible tools like templates, the Vault plugin, and more) to configure and, to some extent, build Docker images when one employs Packer. Another reason is that the customer is not hard set on Docker and may turn to AWS, for instance. Then, so far theoretically, part of the job is done and we only need to configure another builder.

I would say it may also be handy for operations in bigger organizations that need to maintain base images used by several teams with different target platforms. Then one may be able to output different images on demand relatively smoothly. One may also leverage different provisioners (like all the main four - Chef, Puppet, Ansible, Salt) if, say, the security team likes one and the sysadmin team the other.

Anyway, for us it was a requirement, and this is how we made it work.

Putting Things Together

In our case we have a Packer configuration that contains an Ansible provisioner and a Docker builder. packer build first starts the base image in the local Docker engine, then runs the Ansible provisioner against it, and finally can tag the resulting image and push it to a Docker registry.

Packer

Packer is configured with a JSON file. The build is then started with packer build config.json.

In our case it looked roughly like the sketch below (the registry, image names, versions, and file names are placeholders, not our exact originals):
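  {
    "variables": {
      "env": "",
      "app_version": ""
    },
    "provisioners": [
      {
        "type": "ansible",
        "playbook_file": "./web-app.yml",
        "groups": ["web-app"],
        "extra_arguments": [
          "--extra-vars",
          "env={{user `env`}} app_version={{user `app_version`}} ansible_connection=docker ansible_user=root",
          "--vault-password-file=vault-pass.txt"
        ]
      }
    ],
    "builders": [
      {
        "type": "docker",
        "image": "my-registry/web-app:1.0.42",
        "run_command": ["-d", "-i", "-t", "--name", "default", "{{.Image}}", "/bin/bash"],
        "commit": true
      }
    ],
    "post-processors": [
      [
        {
          "type": "docker-tag",
          "repository": "my-registry/web-app",
          "tag": "{{user `app_version`}}"
        },
        {
          "type": "docker-push"
        }
      ]
    ]
  }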

Let's go over it section by section, since it was not at all evident how to figure it out.

Variables

  • One needs to declare every variable that will be used in the config, even if it is passed in as a parameter. The parameters are passed into the build like this:
    packer build -var "env=$ENV" configure-web-app.json
  • Variables are referenced in the Packer configuration with {{user `app_version`}} placeholders.

Provisioners

  • type ansible says we will use Ansible. Sounds obvious, but there is also ansible-local, which invokes Ansible on the image itself - then, however, one needs to install Ansible on the image.
  • playbook_file - references an Ansible playbook. The path is relative.
  • groups - needs to match the hosts in the playbook.
  • extra_arguments - here one passes variables that get into the playbook, as well as some Ansible-related configuration. The particularly hard ones were:
    • ansible_connection="docker", since the default is ssh and the documentation around Docker does not even mention another type.
    • ansible_user="root", since otherwise it throws some weird error and one finds the right answer only in some bug reports. Again, sadly, not much help in the documentation.

Yes, it is annoying to have to enumerate all the variables. We assume that ansible_connection and ansible_user would have to be overridden to something else for other builders.

Builders

We use only the Docker builder. We define:

  • image - the base image to start from.
  • run_command - it may look cryptic at first, but these are just the parameters docker run is invoked with. The result would look like: docker run -dit --name default my-registry/web-app:1.0.42 /bin/bash. Hence it will run detached (-d/--detach) so it does not stream standard output, it will run interactive (-i/--interactive) so it does not terminate immediately, it will allocate a pseudo-TTY (-t/--tty), the container will be named default so you know what to remove if you terminate Packer in the middle of a run, the base image will be what you've specified, and it will run Bash.
  • commit - means the image will be committed (Packer does not really say what that means), but it can later be tagged and pushed, so I assume it is committed to the Docker engine's local image store.

Post-processors

Post-processors allow you to say what should happen with the build result. We tag the image and push it to the local registry, where it is picked up by the deployment (docker-compose).

Ansible

We should mention that running Ansible implies Python has to be installed on every image built, which may bloat the images a bit.

An Ansible playbook can be something simple like the sketch below (the group name, file names, and Tomcat paths are illustrative, not our exact originals):
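  # web-app.yml - hosts must match the groups in the Packer config
  - hosts: web-app
    roles:
      - web-app

The role's tasks/main.yml can then contain something like the following:

  # include secured variables - the file is encrypted with ansible-vault
  # and kept in the role's files folder, one per environment
  - name: include secured variables
    include_vars: "{{ role_path }}/files/sec-vars-{{ env }}.yml"

  # drop the application WAR into Tomcat
  - name: deploy web application
    copy:
      src: "web-app-{{ app_version }}.war"
      dest: /usr/local/tomcat/webapps/web-app.war

  # render environment-specific configuration from templates
  - name: configure environment properties
    template:
      src: environment.properties.j2
      dest: /usr/local/tomcat/conf/environment.properties

  - name: configure logging
    template:
      src: logback.xml.j2
      dest: /usr/local/tomcat/conf/logback.xml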

Just a bit on what we are doing. We deploy a Java Spring web application running on Tomcat so the deployed image is based on a base image with Java, Tomcat and some helper scripts installed (like wait-for-it.sh). Then we add the application WAR file and configuration like environment properties or logback XML.

Since the configuration may differ a bit for different use cases, we support different profiles. It may also change as the application evolves, so it needs to be versioned. As I discussed here, I find it useful to have such configuration separated from the source code, so I used Spring Cloud Config Server backed by a Git repository (Spring Cloud Config example).

Secrets - Vault and ansible-vault

There are people who don't like secrets sitting in Git in plaintext. We have found two ways to deal with that so far:

  • ansible-vault - a built-in tool in Ansible that allows encrypting and decrypting files.
  • Vault (by Hashicorp) - an application that stores secrets securely and allows access to them via a remote API. It supports a host of features like access tokens, fine-grained access rights, auditing, revoking only the secrets that were compromised, certificate provisioning on demand, and more.

Ansible-vault

Any file can be encrypted with ansible-vault encrypt file. ansible-vault also supports the edit and decrypt operations.
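For illustration (the file and password-file names are placeholders):

  # encrypt a vars file in place - prompts for a password
  ansible-vault encrypt files/sec-vars-my-env-5.yml

  # edit or decrypt it later with the same password
  ansible-vault edit files/sec-vars-my-env-5.yml
  ansible-vault decrypt files/sec-vars-my-env-5.yml

  # avoid typing the password: keep it in a file (and out of version control)
  ansible-vault encrypt --vault-password-file vault-pass.txt files/sec-vars-my-env-5.yml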

If you look at the include secured variables task in our Ansible role, it references a file in the files folder with a name like sec-vars-my-env-5.yml, which is encrypted with ansible-vault. When Ansible encounters such a file, it decrypts it and then uses it. The password can be provided manually, but to avoid having it floating around in commands one can specify a password file. Have a look at the --vault-password-file option in the Packer configuration JSON.
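To make it concrete, such a vars file could look like this before encryption (the keys and values are made up for illustration):

  # files/sec-vars-my-env-5.yml - before encryption
  db_password: changeme
  ldap_bind_password: changeme

After ansible-vault encrypt, the file is replaced by a blob starting with a header like $ANSIBLE_VAULT;1.1;AES256 followed by the hex-encoded ciphertext - which is also why Git can only tell you that something changed.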

The downside of ansible-vault is that a single password secures all the files in one context - so the most fine-grained it gets is one password per environment, unless there is a generic encrypted file shared by all of them. Also, there is hardly any auditing of changes, since one can only tell that a file was changed. And one can only dream about revoking someone's access, etc.

So we used it only to showcase that we can secure some parts of the configuration; it also helped us identify everything that needs to be secured.

Vault

We have yet to probe in the Vault direction, so there may be another post about that. It is not really relevant to this post anyway.

Issues

Combining a few tools with different abstractions inevitably introduces some problems.

Building a Debian-based Image on CentOS

For instance, we work for a big company that provided us with VMs running the latest CentOS. Locally we run mostly Debian-based distros. Everything works fine - until at some point we need to build an image on such a host and want to install a package in the Ansible role. The base image is Debian. The build fails with a cryptic error.

The internet is silent. Digging did not help, so I had to resort to thinking and documentation :) It turns out Ansible's package module abstracts over the different package managers, but under the hood it uses modules like apt or yum, which then call the real apt-get or yum. There is a catch, though - the apt module requires python-apt and aptitude on the host that executes the module, in our case the CentOS host. So our builds run only if the underlying host has the matching package tooling. Well, we can (and did) revert to Dockerfiles for these kinds of tasks, as sketched below, but that breaks the whole abstraction. We could also probe the ansible-local Packer provisioner, since it runs on the very machine it configures.
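To illustrate (the package is just an example), a generic task like the first snippet is what trips over the missing python-apt on the CentOS host; the plain Dockerfile is the fallback we used:

  # a generic task - resolved via the apt module for a Debian base image
  - name: install helper packages
    package:
      name: curl
      state: present

  # Dockerfile - the fallback that bakes OS packages into the base image
  FROM debian:jessie
  RUN apt-get update && apt-get install -y curl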

The Documentation Could Be Better

Case in point: the whole setup of the magic Packer config. Once it is set up, it's done for good, but the lack of documentation on things like ansible_connection and ansible_user may repel some right at the very beginning.

Proxy

How does one run Packer with Docker behind a corporate proxy? Well, so far we don't know, and we resorted to running these tasks with plain Dockerfiles via docker build --build-arg http_proxy, which propagates the system proxy setting to Docker for the image build.
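Something like this (the image name is illustrative; http_proxy and https_proxy are build arguments Docker understands out of the box):

  docker build \
    --build-arg http_proxy="$http_proxy" \
    --build-arg https_proxy="$https_proxy" \
    -t my-registry/web-app:1.0.42 .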

