WWW.DUMAIS.IO
ARTICLES
OVERLAY NETWORKS WITH MY SDN CONTROLLERSIMPLE LEARNING SWITCH WITH OPENFLOWINSTALLING KUBERNETES MANUALLYWRITING A HYPERVISOR WITH INTEL VT-X CREATING YOUR OWN LINUX CONTAINERSVIRTIO DRIVER IMPLEMENTATIONNETWORKING IN MY OSESP8266 BASED IRRIGATION CONTROLLERLED STRIP CONTROLLER USING ESP8266.OPENVSWITCH ON SLACKWARESHA256 ASSEMBLY IMPLEMENTATIONPROCESS CONTEXT ID AND THE TLBTHREAD MANAGEMENT IN MY HOBBY OSENABLING MULTI-PROCESSORS IN MY HOBBY OSNEW HOME AUTOMATION SYSTEMINSTALLING AND USING DOCKER ON SLACKWARESYSTEM ON A CHIP EMULATORUSING JSSIP AND ASTERISK TO MAKE A WEBPHONEC++ WEBSOCKET SERVERSIP ATTACK BANNINGBLOCK CACHING AND WRITEBACKBEAGLEBONE BLACK BARE METAL DEVELOPEMENTARM BARE METAL DEVELOPMENTUSING EPOLLMEMORY PAGINGIMPLEMENTING HTTP DIGEST AUTHENTICATIONSTACK FRAME AND THE RED ZONE (X86_64)AVX/SSE AND CONTEXT SWITCHINGHOW TO ANSWER A QUESTION THE SMART WAY.REALTEK 8139 NETWORK CARD DRIVERREST INTERFACE ENGINECISCO 1760 AS AN FXS GATEWAYHOME AUTOMATION SYSTEMEZFLORA IRRIGATION SYSTEMSUMP PUMP MONITORINGBUILDING A HOSTED MAILSERVER SERVICEI AM NOW HOSTING MY OWN DNS AND MAIL SERVERS ON AMAZON EC2DEPLOYING A LAYER3 SWITCH ON MY NETWORKACD SERVER WITH RESIPROCATEC++ JSON LIBRARYIMPLEMENTING YOUR OWN MUTEX WITH CMPXCHGWAKEUPCALL SERVER USING RESIPROCATEFFT ON AMD64CLONING A HARD DRIVECONFIGURING AND USING KVM-QEMUUSING COUCHDBINSTALLING COUCHDB ON SLACKWARENGW100 MY OS AND EDXS/LSENGW100 - MY OSASTERISK FILTER APPLICATIONCISCO ROUTER CONFIGURATIONAASTRA 411 XML APPLICATIONSPA941 PHONEBOOKSPEEDTOUCH 780 DOCUMENTATIONAASTRA CONTACT LIST XML APPLICATIONAVR32 OS FOR NGW100ASTERISK SOUND INJECTION APPLICATIONNGW100 - DIFFERENT PROBLEMS AND SOLUTIONSAASTRA PRIME RATE XML APPLICATIONSPEEDTOUCH 780 CONFIGURATIONUSING COUCHDB WITH PHPAVR32 ASSEMBLY TIPAP7000 AND NGW100 ARCHITECTUREAASTRA WEATHER XML APPLICATIONNGW100 - GETTING STARTEDAASTRA ALI XML APPLICATION

CREATING YOUR OWN LINUX CONTAINERS

2016-07-10

I'm not trying to say that this solution is better than LXC or docker. I'm just doing this because it is very simple to get a basic container created with chroot and cgroups. Of course, docker provides much more features than this, but this really is the basis. It's easy to make containers in linux, depending on the amount of features you need.

cgroups

cgroups are a way to run a process while limiting its resources such as cpu time and memory. The it works is by creating a group (a cgroup), define the limits for various resources (you can control more than just memory and CPU time) and then run a process under that cgroup. It is important to know that every child process of that process will also run under the same same cgroup. So if you start a bash shell under a cgroup, then any program that you manually launch from the shell will be constrained to the same cgroup than the one of the shell.

If you create a cgroup with a memory limit of 10mb, this 10mb limit will be applicable to all processes running in the cgroup. The 10mb is a constraint on the sum of memory usage by all process under the same cgroup.

Configuring

On slackware 14.2 RC2, I didn't have to install or setup anything and cgroups were available to use with the tools already installed. I had to manually enable cgroups in the kernel though since I compiled my own kernel. Without going into the details (because this is covered in a million other websites) you need to make sure that:

  • cgroup kernel modules are built-in are loaded as modules
  • cgroup tools are installed
  • cgroup filesystem is mounted (normally accessible through /sys/fs/cgroup/)

Here's how to run a process in a cgroup

cgcreate -g memory:testgroup # now the "/sys/fs/cgroup/memory/testgroup" exists and contains files that control the limits of the group # assign a limit of 4mb to that cgroup echo "4194304" > /sys/fs/cgroup/memory/testgroup/memory.limit_in_bytes # run bash in that cgroup cgexec -g memory:testgroup /bin/bash

Note that instead of using cgexec, you could also just write the current shell's PID into /sys/fs/cgroup///task. Then your shell, and whatever process you start from it, would execute in the cgroup.

Making your own containers

A container is nothing more than a chroot'd environment with processes confined in a cgroup. It's not difficult to write your own software to automate the environment setup. There is a "chroot" system call that already exist. For cgroups, I was wondering if there was any system calls available to create them. Using strace while running cgcreate, I found out that cgcreate only manipulates files in the cgroup file system. Then I got the rest of the information I needed from the documentation file located in the Documentation folder of the linux kernel source: Documentation/cgroups/cgroups.txt.

Creating a cgroup

To create a new cgroup, it is simply a matter of creating a new directory under the submodule folder that the cgroups needs to control. For example, to create a cgroup that controls memory and cpu usage, you just need to create a directory "AwesomeControlGroup" under /sys/fs/cgroup/memory and /sys/fs/cgroup/cpu. These directories will automatically be populated with the files needed to control the cgroup (cgroup fs is a vitual filesystem, so files do not exist on a physical medium).

Configuring a cgroup

To configure a cgroup, it is just a matter of writing parameters in the relevant file. For example: /sys/fs/cgroup/memory/testgroup/memory.limit_in_bytes

Running a process in a cgroup

To run a process in a cgroup, you need to launch the process, take its PID and write it under /sys/fs/cgroup///task My "container creator" application (let's call it the launcher) does it like this:

  • The launcher creates a cgroup and sets relevant parameters.
  • The launcher clones (instead of fork. now we have a parent and a child)
  • The parent waits for the child to die, after which it will destroy the cgroup.
  • The child writes its PID in the /sys/fs/cgroup///task file for all submodules (memory, cpu, etc)
  • At this point, the child runs as per the cgroup's constraints.
  • The child invokes execv with the application that the user wanted to have invoked in the container.

The reason I use clone() instead of fork, is that clone() can use the CLONE_NEWPID flag. This will create a new process namespace that will isolate the new process for the others that exist on the system. Indeed, when the cloned process queries its PID it will find that it is assigned PID 1. Doing a "ps" would not list other processes that run on the system since this new process is isolated.

Destroying a cgroup

To destroy a cgroup, just delete the /sys/fs/cgroup// directory

So interfacing with cgroups from userland is just a matter of manipulating files in the cgroup file system. It is really easy to do programmatically and also from the shell without any special tools or library.

My container application

My "container launcher" is a c++ application that chroots in a directory and run a process under a cgroup that it creates. To use it, I only need to type "./a.out container_path". The container_path is the path to a container directory that contains a "settings" files and a "chroot" directory. The "chroot" directory contains the environment of the container (a linux distribution maybe?) and the "settings" file contains settings about the cgroup configuration and the name of the process to be launched.

You can download my code: cgroups.tar

Example

I've extracted the slackware initrd image found in the "isolinux" folder of the dvd.

cd /tmp/slackware/chroot gunzip < /usr/src/slackware64-current/isolinux/initrd.img | cpio -i --make-directories

Extracting this in /tmp/slackware/chroot gives me a small linux environment that directory and I've created a settings file in /tmp/slackware. Call this folder a "container", it contains a whole linux environment under the chroot folder and a settings file to indicate under what user the container should run, what process it should start, how much ram max it can get etc. For this example, my settings file is like this:

user: 99 group: 98 memlimit: 4194304 cpupercent: 5 process:/bin/bash arg1:-i

And running the container gives me:

[12:49:34 root@pat]# ./a.out /tmp/slackware Mem limit: 4194304 bytes CPU shares: 51 (5%) Added PID 9337 in cgroup Dropping privileges to 99:98 Starting /bin/bash -i [17:26:10 nobody@pat:/]$ ps aux USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND nobody 1 0.0 0.0 12004 3180 ? S 17:26 0:00 /bin/bash -i nobody 2 0.0 0.0 11340 1840 ? R+ 17:26 0:00 ps aux [17:26:14 nobody@pat:/]$ ls a a.out bin boot cdrom dev etc floppy init lib lib64 lost+found mnt nfs proc root run sbin scripts sys tag tmp usr var [17:26:15 nobody@pat:/]$ exit exit Exiting container

Networking

When cloning with CLONE_NEWNET, the child process gets a separate netwrok namespace. It doesn't see the host's network interfaces anymore. So in order to get networking enabled in the container, we need to create a veth pair. I am doing all network interface manipulations with libnl (which was already installed on a stock slackware installation). The veth pair will act as a kind of tunnel between the host and the container. The host side interface will be added in a bridge so that it can be part of another lan. The container side interface will be assigned an IP and then the container will be able to communicate wil all peers that are on the same bridge. The bridge could be used to connect the container on the LAN or within a local network that consists of only other containers from a select group.

The launcher creates a "eth0" that appears in the container. The other end of the veth pair is added in a OVS bridge. An ip address is set on eth0 and the default route is set. I then have full networking functionality in my container.

Settings for networking

bridge: br0 ip: 192.168.1.228/24 gw: 192.168.1.1

Result

Mem limit: 4194304 bytes CPU shares: 51 (5%) Added PID 14230 in cgroup Dropping privileges to 0:0 Starting /bin/bash -i [21:59:33 root@container:/]# ping google.com PING google.com (172.217.2.142): 56 data bytes 64 bytes from 172.217.2.142: seq=0 ttl=52 time=22.238 ms ^C --- google.com ping statistics --- 1 packets transmitted, 1 packets received, 0% packet loss round-trip min/avg/max = 22.238/22.238/22.238 ms [21:59:36 root@container:/]# exit exit Exiting container