Containers are an incredibly powerful tool for deploying and scaling sites. For a long time, though, I felt as though I’d missed the boat on what they are and how to use them. Every post I could find was a tutorial that explained how to start one, but not what they are. This is the post I wish I had found first. It’s not a tutorial and it’s not about lessons learned. We’re going to start with the basics and work our way up to understanding containers as a concept.
The Short Answer
So, what is a container? The short answer: a container is a single running process that lives in an isolated file system. As with most things, though, the short answer isn’t very satisfying.
The Long Answer
At first, I was tempted to think of containers as just another term for a virtual machine: that “container” was just a word coined by the popular Docker project, the way Vagrant calls its virtual machines “boxes.” That was a dangerous assumption, albeit an easy one to make. A “container” just sounds like a Vagrant “box,” and many of their benefits overlap. If you stop reading here, take this one thing away: a container is not a virtual machine.
As long as we’re defining containers by what they’re not, I should make it clear that Docker is not a container; it’s a tool for managing containers. And there are many tools like it: rkt and LXC (the original container runtime) are the big ones. Nor are containers new – the technology has been around for more than a decade. Docker just made it mainstream.
Alright, so what is a container then? Earlier, I said that containers are processes with file system isolation. In order to know why that’s such a powerful concept, we need to know a little about how Linux works.
The Deep Dive
In the Linux world, nearly everything having to do with your computer can be represented as a file. Your hard drive, your USB ports, stdin and stdout: they all live in your file system as special files. To write to a USB stick, you can pretty much just write to the device file that represents it, something like /dev/sdb1 (directories are files too). The same goes for anything else on the system.
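You can see this for yourself on any Linux box. /dev/null is the classic example of a device that exists only as a file:

```shell
# The leading "c" in the listing marks /dev/null as a
# character device, not a regular file:
ls -l /dev/null

# Writing to a device is just writing to its file.
# /dev/null happens to be a device that discards everything:
echo "hello" > /dev/null
```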
Even processes are represented as files, under the /proc directory. Finally, commands like grep are really just files at the end of the day. They’re saved in an executable format containing instructions for your processor to execute, so we typically just call them “executables.”
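Both halves of that are easy to verify from a shell on a Linux system with procfs mounted:

```shell
# Every running process gets a directory under /proc;
# /proc/self points at whichever process is reading it:
head -3 /proc/self/status

# grep is nothing magical: it's an ordinary executable file
# sitting on disk (the exact path varies by distro):
ls -l "$(command -v grep)"
```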
It’s these executables, like grep, cat, vim, wget, or curl, that make Linux systems really usable. Even your shell – probably bash – is an executable. If you run /bin/bash, you’ll start a new shell session nested inside the current one, and when you kill it by typing exit you’ll end up back in your original session.
Differing bundles of these executable and configuration files are what define a Linux distribution. Put another way, a distribution is just a bunch of files shipped with a Linux kernel. One distribution might not even ship bash, defaulting to zsh or fish instead.
Once you’ve installed a particular distribution, you usually start adding your own files to it. You might install Nginx, Apache, or Node to run some web app. The versions of those files that you choose become your app’s dependencies.
Now, with those things in mind, we can answer the question, “why is a process with file system isolation so powerful?” By installing all those files to some directory and starting a process as if they are the only thing on the computer, we can create a whole new world for that process. Since a Linux distro is just a bundle of files, you can run a process as if it’s in an Ubuntu system on a CentOS machine. You can run 3 different versions of PHP in conjunction with 4 different versions of Node, and another 2 versions of Ruby – each of them living side by side in their respective containers. No NVM, no RVM, no need to run different servers just to run different apps and dependencies. If it’s Linux, it works. That’s the power of containers.
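If you have Docker installed, you can watch this happen directly. The image tags below are just examples:

```shell
# Each command starts an isolated process whose file system
# root comes from a different distro's bundle of files --
# all running on the same host kernel:
docker run --rm ubuntu:22.04 cat /etc/os-release
docker run --rm alpine:3.19 cat /etc/os-release

# Different language versions side by side, no version manager:
docker run --rm php:8.2-cli php -v
docker run --rm php:7.4-cli php -v
```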
At this point, it’d be fair to think, “soo... it’s like a VM then?” There’s a lot more to a virtual machine, though. A VM virtualizes the processor, the GPU, the memory space. Everything. You can even emulate a different processor architecture. When you block off 2GB of RAM for a VM, that memory is unavailable to the rest of your system for as long as the VM runs. And every VM boots its own full Linux kernel.
Containers are just isolated processes
In a container, you’re just starting a new process. It’s like running a long command from the shell: it runs until it exits, and the container lives exactly as long as that process does. As a side effect, your server can run things much more efficiently. When one container needs more memory, the Linux kernel just allocates it more; when the process is idle, its memory is freed up for other processes. The same goes for CPU time.
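A quick demonstration of that lifetime, assuming Docker and an example alpine image:

```shell
# The container is born when its process starts and dies
# the moment that process exits -- no init system, nothing
# lingering in the background:
docker run --rm alpine:3.19 echo "hello from a container"

# The echo has returned, so the container is already gone:
docker ps --filter "ancestor=alpine:3.19"
```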
This isolation brings up a few challenges. If this process is locked up inside some file system root, how does it talk to the outside world? Since we know everything’s a file, and even network sockets come down to file descriptors, how does an Apache or Nginx container actually serve requests?
Where Docker fits in
That’s where container management tools like Docker, rkt, LXC, and others come in – they let you define mappings from your host system into the container. Almost like symlinks that can cross the container barrier, these tools create a bridge between the host machine and the container. Apache can listen on port 80 inside the container while really taking requests from a randomly assigned port on the host machine. Run five different Apache containers all listening on port 80, and each will still be reachable on its own dedicated host port. File storage can be mapped into a container as well, by mounting a directory of the host machine at some location inside the container. Since we’re not in a VM, there’s little to no overhead either – no NFS, rsync, or other syncing tools. Your containerized process is writing and reading straight to the host.
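In Docker's syntax, both mappings look like this (the image name, ports, and paths here are illustrative):

```shell
# -p maps host port 8080 to port 80 inside the container;
# -v mounts the host's ./site directory over nginx's web
# root, read-only, with no copying or syncing involved.
docker run -d --name web \
  -p 8080:80 \
  -v "$PWD/site:/usr/share/nginx/html:ro" \
  nginx

# nginx still believes it's on port 80; the host sees 8080:
curl http://localhost:8080/

docker rm -f web
```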
Earlier, I said that Docker isn’t a container, just a tool for managing them. Starting a process under another file system root isn’t new; chroot has done that for decades. Mounting directories isn’t new either. All of this could be, and has been, done for years. What tools like Docker have done is provide abstractions on top of that technology to bring it to the masses.
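To make the chroot piece concrete, here’s a minimal sketch. It needs root, and it assumes a glibc-style system where ldd can list the shared libraries bash links against:

```shell
# Build a throwaway root containing only bash and its libraries.
root=$(mktemp -d)
mkdir -p "$root/bin"
cp /bin/bash "$root/bin/"

# Copy each shared library bash needs, preserving its path.
for lib in $(ldd /bin/bash | grep -o '/[^ ]*'); do
  mkdir -p "$root$(dirname "$lib")"
  cp "$lib" "$root$lib"
done

# Start bash with $root as its "/". From inside, the rest
# of the host file system simply doesn't exist:
sudo chroot "$root" /bin/bash -c 'echo "all I can see:" /*'
```

This is only the file system half of the story; real container runtimes add namespaces and cgroups on top for process, network, and resource isolation.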
Now that we’ve got the concepts down, hopefully you can better follow the many tutorials out there, play around, and find ways to take advantage of containerization for yourself.