Aidan Hobson Sayers

Author

Recent posts by Aidan Hobson Sayers

Kent O. Johnson wrote:Regarding database I/O performance, I had an issue where using a PostgreSQL database container with volumes from a data-only container from within a VM gave very unpredictable and erratic I/O behavior, which I am thinking is because of multiple layers of swapping between the physical machine and the virtual machine. I was using volumes so I am thinking the problem was not a container problem at all, just the configuration of the I/O driver for VMWare, the hypervisor I used.

Switching everything to physical machines outside of containers made things much faster and more consistent. I considered trying containers on the physical machine, though I haven't gotten to that yet. I am glad you mentioned that the different storage drivers can affect performance, though it sounds like they are only relevant for in-container I/O. Is that right? If I wanted to run a PostgreSQL database container, or any database container, without mounting volumes, how would I go about achieving performance similar to using the DBMS on bare metal?

Using databases inside of containers seems to unnecessarily complicate backups since there is no SSHing into a container, though I could see an automated backup running to a directory mounted as a volume to solve that problem. Would you solve the backup use case like that?

Regarding network I/O, do you find that the virtual network device layer used by the Docker Engine adds enough overhead to warrant bypassing it altogether with the "--net=host" option? How much of a performance gain have you seen that give? This is the first time I have seen that option and I would like to learn more about it and where it makes sense.

Kent



Hi Kent

As far as I've seen, databases (large, frequently changed binary files) are about the worst thing you can have inside a container. Aside from the storage driver issue, Docker layering also copes poorly with them (e.g. if you load your schema in a new layer on top of a large dbspace, the whole dbspace will be copied). I generally avoid doing I/O-heavy things inside a container if possible.

Storage drivers are a fundamental part of Docker, but generally you can get by without knowing anything about them. However, knowing how they work gives you some insight into edge cases (like databases). I'll briefly note how each one works below (and after the list, a quick way to check which driver you're using):
  • aufs (https://docs.docker.com/engine/userguide/storagedriver/aufs-driver/) - each layer of an image consists of a set of files, and the filesystem a running container sees is assembled from the files of all the layers in its image (files in layers higher up 'hide' files in layers lower down). When a container wants to write to a file, the file is copied up from the highest layer containing it and that copy becomes the container's own.
  • overlayfs (https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/) - basically the same as aufs
  • devicemapper (https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/) - layers are a bunch of pointers to the actual block contents of files, and file contents are stored in a big pool shared between layers. When a container wants to write to a file, it just needs to copy the appropriate block of the file and alter one of the block pointers.
  • btrfs (https://docs.docker.com/engine/userguide/storagedriver/btrfs-driver/) - approximately the same idea as devicemapper
  • zfs (https://docs.docker.com/engine/userguide/storagedriver/zfs-driver/) - again, approximately the same idea as devicemapper
  • vfs - a 'fake' layer driver that copies the entire contents of the parent layer on creation
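
If you're not sure which storage driver your daemon is currently using, a quick check (the exact output wording varies a little between Docker versions) is:

    docker info | grep 'Storage Driver'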


We can make some observations just from the designs outlined above:
  • aufs/overlayfs 'copy-file-on-write' means the first time you try and write to a large file (e.g. a database) it will copy the whole file, which will be slow, but once done should give fast access because you have your own copy
  • devicemapper/btrfs/zfs 'copy-block-on-write' means there isn't a huge penalty the first time you write to a file so you can get started quicker and the disk space reuse is better, but it may not be as fast as aufs or native access
  • vfs is astoundingly slow to start up, shares no disk space between layers...but should be as fast as raw filesystem access once created

My experience is mainly with aufs and devicemapper, so you should definitely (at minimum) benchmark the others before accepting my claims. It doesn't hurt to benchmark them all! You can also look at the table at the bottom of https://docs.docker.com/engine/userguide/storagedriver/selectadriver/ for what Docker Inc suggests are the important points about each driver.
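
As a very crude illustration of the kind of benchmark I mean (a sketch only - a real benchmark should use a proper tool and your actual workload, and the image and paths here are just examples):

    # write 1GB through the storage driver (the container's own filesystem)
    docker run --rm ubuntu dd if=/dev/zero of=/test.img bs=1M count=1024 conv=fdatasync

    # the same write against a volume, bypassing the storage driver
    docker run --rm -v /tmp/ddtest:/data ubuntu dd if=/dev/zero of=/data/test.img bs=1M count=1024 conv=fdatasync

Comparing the numbers for each storage driver you're considering gives a rough feel for the overhead, though bear in mind that real database access patterns (lots of small random writes) behave quite differently from a single sequential write.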

Unfortunately, there's more to consider than just performance. For example:
  • AUFS is considered a highly stable driver by Docker Inc, but for a particularly unusual I/O-heavy application I was using 6 months ago, I would see weekly kernel panics bringing down my system!
  • Devicemapper is currently the only commercially supported driver on CentOS/Red Hat
  • BTRFS/overlayfs/devicemapper (partition)/zfs can require special setup which may be nontrivial


Honestly, I look at heavy I/O in containers with great scepticism, just because I've lived through a lot of pain with unstable drivers. Of course, things are always improving! But volumes (and possibly volume drivers) feel like the most reliable approach to me for now.
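
For the PostgreSQL case you describe, a minimal sketch of the volume approach (the paths and database name are placeholders - adjust for your setup) might look like:

    # keep the dbspace and a backup directory on the host, outside the storage driver
    docker run -d --name pg \
        -v /srv/pg/data:/var/lib/postgresql/data \
        -v /srv/pg/backups:/backups \
        postgres

    # an automated backup can then dump straight into the mounted directory,
    # as you suggest - no SSHing into the container required
    docker exec pg sh -c 'pg_dump -U postgres mydb > /backups/mydb.sql'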
    4 years ago

    Kent O. Johnson wrote:Ian and Aiden,

    Does your book cover performance tuning or considerations of Docker Containers? I see production concerns in chapters 11 and 12 and am pleased to see a chapter on security. I have used the "docker stats" command to watch performance at the CLI and Shipyard. I have also used memory config props in the docker-compose.yml files. Do you go in depth on production performance tuning?



    Hi Kent

    Technique 91 (chapter 11) contains information on cAdvisor, a tool we felt would get people started with container monitoring. A few other techniques in chapter 11 deal with resource limits if you want to keep your containers under control, and technique 97 (chapter 12) talks about opting out of Docker containerisation if you want to harness the full power of your machine (the most common examples here being volumes, which are covered a fair amount throughout the book as well as here, and --net=host, which bypasses whatever method Docker is currently using to proxy network traffic).
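
    To give a flavour of the --net=host point (a sketch only - 'myapp' and the port are hypothetical):

        # default networking: Docker sets up a virtual network device and proxies/NATs traffic
        docker run -d -p 8080:8080 myapp

        # --net=host: the container shares the host's network stack directly,
        # so there's no proxying (and -p mappings are ignored)
        docker run -d --net=host myapp

    You give up network isolation between the container and the host in exchange for removing that layer.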

    Drilling down into the performance of Docker itself is tricky, because it's changing so rapidly. For example, I believe I recall a regression in some version after 1.3 where having more than 100 containers would bring Docker performance to a halt - obviously they fixed this very quickly! Note that the performance of the Docker daemon itself (usually!) doesn't affect individual containers, so needing to tune the daemon is rare. For individual applications, it's likely best to use whatever tools are most appropriate for your application by injecting them into your container with nsenter.
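
    For reference, the nsenter approach looks roughly like this (a sketch; 'mycontainer' is a placeholder and you need nsenter from util-linux on the host):

        # find the PID of the container's main process
        PID=$(docker inspect --format '{{.State.Pid}}' mycontainer)

        # enter the container's namespaces and run a tool from the host, e.g. top
        sudo nsenter --target "$PID" --mount --uts --ipc --net --pid top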

    One thing that is missing from the book is a discussion of the different storage drivers and how they can impact the I/O performance of your application - a skeleton outline of this technique is actually sitting on my computer! However, it would end up being a look at the detail of some Docker internals, which is perhaps not of huge interest to most readers - it's somewhat uncommon to do a lot of I/O in a container (a database being one of the few obvious examples), and the pain that can be caused by the storage drivers is why I tend to recommend using volumes for your database 'dbspace' (or 'tablespace') to get reliable performance.

    So I suppose the short answer to your question is "no", mainly because the precise details of tuning depends on what you're doing in the container! Let me know if you have any specific questions and I'll see if I can answer directly or if there's anything relevant in the book.

    Aidan
    4 years ago

    Mauricio Mena wrote:I read the contents of the book and found a very interesting section on container orchestration, about managing multiple docker containers. My question is whether this section provides the principles/basis on which deployment of docker containers in the cloud works, or whether there is any other specific section that covers cloud scenarios.

    Thanks



    Hi Mauricio

    Chapter 9 (which I believe is the one you're talking about) tries to give an overview of the available orchestration tools in the ecosystem. Having some method of orchestration is fairly crucial if you're planning to run multiple containers across multiple machines - doing it manually gets tiring very quickly! This isn't necessarily cloud-specific (the tools would work just as well on a bunch of spare desktop machines lying around at home), but putting your containers on a cloud provider is undeniably the most common use-case.

    We also cover a few miscellaneous bits relevant to the cloud elsewhere in the book - a technique in chapter 4 talks about using docker-machine to create a virtualbox VM containing Docker, and notes that you can follow a very similar process for creating machines on cloud providers with Docker on them. Chapters 11 and 12 are dedicated to talking about the realities of running Docker in production, and the techniques there apply wherever you decide to run Docker. The 'securing your docker api' technique in chapter 10 is worth a mention as well - it's a very bad idea to expose your Docker API port externally on the cloud without securing it with TLS!
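
    For reference, the docker-machine flow is much the same locally and on a cloud provider (a sketch - the machine names are arbitrary and the DigitalOcean example assumes an access token in $TOKEN):

        # a local virtualbox VM with Docker installed
        docker-machine create --driver virtualbox dev
        eval $(docker-machine env dev)

        # a very similar command provisions a cloud machine instead
        docker-machine create --driver digitalocean --digitalocean-access-token $TOKEN cloudbox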

    However, the process of actually starting your machines up, getting Docker on them, managing them and destroying them when done is left as an exercise for the reader (the official documentation has some examples for using docker-machine on DigitalOcean and AWS at https://docs.docker.com/machine/ which may be a useful start). There's such a variety of cloud providers that we tried not to get bogged down in the specifics - we could talk about using an Amazon-specific service (like DynamoDB) with Docker, but that wouldn't help you much if you moved to DigitalOcean! Instead the book aims to provide information that will apply regardless of what machines your containers are running on or how you create and destroy them.
    4 years ago

    John Wilson wrote:I see in the About the Book section a mention of deploying microservices, but do not see any mention of microservices in the table of contents. Does the book contain much discussion of microservices and the role and usage of docker when developing microservices? Or setting up containers to be used in a microservice architecture?



    Hi John

    (below I reference the TOC at https://manning.com/books/docker-in-practice - you may need to use the arrows at the end of chapter headings to see the contents of the chapters)

    First I'll mention some concrete points in the book relevant to microservices.

    We first talk about microservice containers in chapter 3, "3.1.3. TECHNIQUE n Splitting a system into microservice containers". The chapter is mostly about the value Docker can provide when used with a monolithic application, but this particular part gives some pointers on splitting up the Dockerfile of a web+app+db stack as a hint towards microservices. The technique concludes with a reference to the first technique of chapter 8, "8.1.1. TECHNIQUE n Simple Docker Compose Cluster". This and the subsequent technique introduce you to Docker Compose, a tool for starting up a number of interacting containers - it's extremely useful for prototyping microservices on a single machine! You can easily play around with your containers and make sure everything works before trying to deploy them for real.
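
    To give a taste of what Compose looks like, here's a minimal sketch of a web+db pair (not the book's exact example - the build context, ports and images are placeholders):

        web:
          build: .
          ports:
            - "80:5000"
          links:
            - db
        db:
          image: postgres

    Putting that in a docker-compose.yml and running 'docker-compose up' starts both containers and wires them together.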

    Crucial components of a microservices architecture are orchestration and service discovery (I suspect these are what you're referring to when you say "setting up containers to be used in a microservice architecture"). Chapter 9 is completely dedicated to covering this and surveys the possibilities available in the Docker ecosystem to give you the information you need to make a decision about what to use. There's very little better than being able to try out a tool for yourself!

    Chapter 5 deserves a mention as well as it contains a fair amount of information on ways to build small images. This has particular relevance for microservices - if your images are >1GB each, it may take quite a while to deploy 30 of them!
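
    As a hedged illustration of that point (not from the book - the base image and packages are just examples), building the same service on a slim base image rather than a full distribution image can shrink it from hundreds of megabytes to tens:

        FROM alpine:3.4
        # --no-cache keeps the package index out of the image
        RUN apk add --no-cache python
        COPY app.py /app.py
        CMD ["python", "/app.py"]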

    Hopefully this answers the practicalities of your second question, both from the point of view of building your images and for using these images as running containers.

    To touch on your first question - because so much of Docker is designed to be usable in a microservices context, microservices-related information tends to be woven throughout the book (for example, chapter 7 has "7.3.1. TECHNIQUE n Informing your containers with etcd", which can be very useful). It's presented in this way to try and give you the tools you need to solve the specific problems you face yourself. If your use-case is covered by the simple web+app+db example in chapter 3 then moving to a microservices architecture will be straightforward. However, more complex arrangements can require careful thought and a strong knowledge of the current system - general guidelines for developing complex microservices do exist (http://12factor.net/ being a well-known example) but bear in mind they're not unbreakable rules! My personal feeling is that by understanding the problem and knowing what options are available to you, you're more likely to make good decisions for your particular situation.
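
    As a tiny taster of the etcd idea (a sketch using the etcd v2 etcdctl commands; the key and value are placeholders):

        # store a piece of shared configuration in etcd
        etcdctl set /config/db_host db.internal.example.com

        # a container with etcdctl and access to the etcd cluster can pick the value up at startup
        DB_HOST=$(etcdctl get /config/db_host)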

    Let me know if anything is unclear or you'd like more detail.
    4 years ago

    paul nisset wrote:
    Hi,
    Is Docker useful for every deployment scenario or is it better for certain situations?

    Thank you,
    Paul



    Hi Paul

    Docker shines in many situations...but is completely inappropriate in others!

    First, a few examples of where you wouldn't use Docker:
  • You want to distribute a Windows application for users to download and run - when Windows support is released and stable, it will only be for Windows Server (for the foreseeable future)
  • You want to test device drivers or kernel modules for Linux - software requiring low level kernel integration or hardware access is possible to use in Docker (I believe!), but you lose a number of Docker benefits (e.g. container isolation is meaningless as all the changes are made to the host kernel) and you'll probably run into strange issues since this isn't an area with much testing.
  • You want to test your software on different kernel versions - Docker uses the kernel from the host, so this simply isn't possible.


    For many other situations, Docker works fantastically well. Here are a few ways I've used (and continue to use) Docker:
  • As a local development environment - I install a bunch of software for whatever project I'm currently working on and save it as an image. This means I don't need to remember how I installed a custom version of nodejs every time I want to upgrade; I just change my Dockerfile and rebuild (see the sketch after this list).
  • As a lightweight environment to test applications in - from simulating the interaction between 20 different servers to seeing how your application runs on old Linux distributions, Docker can help.
  • Moving normal applications around on servers - this is the most typical use of Docker, to take applications with a number of dependencies, package them up in an image and run them on a server. The Docker ecosystem then gives a helping hand with building things on top of this, like zero-downtime upgrade for web applications and service discovery.
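
    To make the first point concrete, a hypothetical development-environment Dockerfile might be as simple as this (the packages and the custom install script are placeholders for whatever your project needs):

        FROM ubuntu:14.04
        RUN apt-get update && apt-get install -y build-essential git curl
        # whatever fiddly, hard-to-remember install steps your project needs live here
        COPY install-custom-node.sh /tmp/
        RUN bash /tmp/install-custom-node.sh
        WORKDIR /workspace

    Rebuilding the image ('docker build -t mydevenv .') then recreates the environment in one step, and running it with your source mounted ('docker run -it -v $PWD:/workspace mydevenv bash') drops you into a shell inside it.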


    As with most things, though, the real answer is "it depends". For example, consider the last point about deploying applications onto servers - if you have a single application running on a single server and already have a reliable and dead-simple method for deploying the latest version, Docker may increase complexity with little immediate benefit! But if down the line you're trying to move to another server and struggling because you forgot how you set up the server in the first place (which packages you installed, which config files you changed, etc.), that's probably a great opportunity to use Docker and end up with a repeatable build process that gives you more flexibility in future.

    If you have a specific scenario in mind then I'd be happy to talk about it!
    4 years ago