Mounting your home directory in LXD

As of LXD stable 2.0.8 and feature release 2.6, LXD has support for various UID and GID map related manipulaions. A common question is: "How do I bind-mount my home directory into a container?" and before the answer was "well, it's complicated but you can do it; it's slightly less complicated if you do it in privleged containers". However, with this feature, now you can do it very easily in unprivileged containers.

First, find out your uid on the host:

$ id
uid=1000(tycho) gid=1000(tycho) groups=1000(tycho),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),112(lpadmin),124(sambashare),129(libvirtd),149(lxd),150(sbuild)

On standard Ubuntu hosts, the uid of the first user is 1000. Now, we need to allow LXD to remap to remap this id; you'll need an additional entry for root to do this:

$ echo 'root:1000:1' | sudo tee -a /etc/subuid /etc/subgid

Now, create a container, and set the idmap up to map both uid and gid 1000 to uid and gid 1000 inside the container.

$ lxc init ubuntu-daily:z zesty
Creating zesty
$ lxc config set zesty raw.idmap 'both 1000 1000'

Finally, set up your home directory to be mounted in the container:

$ lxc config device add zesty homedir disk source=/home/tycho path=/home/ubuntu

And leave an insightful message for users of the container:

$ echo 'meshuggah rocks' >> message

Finally, start your container and read the message:

$ lxc start zesty
$ lxc exec zesty cat /home/ubuntu/message
meshuggah rocks

And enjoy the insighed offered to you by your home directory :)

LXD networking: lxdbr0 explained

Recently, LXD stopped depending on lxc, and thus moved to using its own bridge, called lxdbr0. lxdbr0 behaves significantly differently than lxcbr0: it is ipv6 link local only by default (i.e. there is no ipv4 or ipv6 subnet configured by default), and only HTTP traffic is proxied over the network. This means that e.g. you can't SSH to your LXD containers with the default configuration of lxdbr0.

The motivation for this change mostly to avoid picking subnets for users, because this can cause breakage, and have users pick their own subnets. Previously, the script that set up lxcbr0 looked around on the host's network, and picked the first 10.0.*.1 address for the bridge that was available. Of course, in some cases (e.g. networks which weren't visible at the time of bridge creation) this can break routing for users' networks.

So, if you want to have parity with lxcbr0, you'll need to configure the bridge yourself. There are a few ways to do this. For a step by step walkthrough of just configuring the bridge, simply do:

sudo dpkg-reconfigure -p medium lxd

And answer the questions however you like. If you've never configured LXD at all (and e.g. want to use a fancy filesystem like ZFS), try:

sudo lxd init

Which will configure all of LXD (both the filesystem and lxdbr0). Finally, you can edit the file /etc/default/lxd-bridge and then do a:

sudo service lxd-bridge stop && sudo service lxd restart

For feature parity with lxcbr0, you can use something like the following (note the 10.0.4.*, so as not to conflict with lxcbr0):

# Whether to setup a new bridge or use an existing one
USE_LXD_BRIDGE="true"

# Bridge name
# This is still used even if USE_LXD_BRIDGE is set to false
# set to an empty value to fully disable
LXD_BRIDGE="lxdbr0"

# Path to an extra dnsmasq configuration file
LXD_CONFILE=""

# DNS domain for the bridge
LXD_DOMAIN="lxd"

# IPv4
## IPv4 address (e.g. 10.0.4.1)
LXD_IPV4_ADDR="10.0.4.1"

## IPv4 netmask (e.g. 255.255.255.0)
LXD_IPV4_NETMASK="255.255.255.0"

## IPv4 network (e.g. 10.0.4.0/24)
LXD_IPV4_NETWORK="10.0.4.1/24"

## IPv4 DHCP range (e.g. 10.0.4.2,10.0.4.254)
LXD_IPV4_DHCP_RANGE="10.0.4.2,10.0.4.254"

## IPv4 DHCP number of hosts (e.g. 250)
LXD_IPV4_DHCP_MAX="253"

## NAT IPv4 traffic
LXD_IPV4_NAT="true"

# IPv6
## IPv6 address (e.g. 2001:470:b368:4242::1)
LXD_IPV6_ADDR=""

## IPv6 CIDR mask (e.g. 64)
LXD_IPV6_MASK=""

## IPv6 network (e.g. 2001:470:b368:4242::/64)
LXD_IPV6_NETWORK=""

## NAT IPv6 traffic
LXD_IPV6_NAT="false"

# Run a minimal HTTP PROXY server
LXD_IPV6_PROXY="false"

And that's it! That's all you need to do to configure lxdbr0.

Sometimes, though, you don't really want your containers to live on a separate network than the host because you want to ssh to them directly or something. There are a few ways to accomplish this, the simplest is with macvlan:

lxc profile device set default eth0 parent eth0
lxc profile device set default eth0 nictype macvlan

Another way to do this is by adding another bridge which is bridged onto your main NIC. You'll need to edit your /etc/network/interfaces.d/eth0.cfg to look like this:

# The primary network interface
auto eth0
iface eth0 inet manual # note the manual here

And then add a bridge by creating /etc/network/interfaces.d/containerbr.cfg with the contents:

auto containerbr
iface containerbr inet dhcp
  bridge_ports eth0

Finally, you'll need to change the default lxd profile to use your new bridge:

lxc profile device set default eth0 parent containerbr

Restart the networking service (which if you do it over ssh, may boot you :), and away you go. If you want some of your containers to be on one bridge, and some on the other, you can use different profiles to accomplish this.

linux.conf.au 2016 talk

Last week I did this ridiculous thing where I flew around the world in the easterly direction, giving talks at FOSDEM and linux.conf.au. The linux.conf.au staff always do a great job of making talk videos, and this year was no exception.

My talk was on LXD and live migration, a brief history of both as well as a status update and some discussion of future work on both. There were also lots of questions in this talk, so there's a lot of discussion of basic migration questions and inner workings.

Unforatunately, I can't embed it here, so I'll give you a link instead. Also, keep in mind at the time I was giving this talk I had been up for ~40 hours, so I forgot some English words here and there :)

https://www.youtube.com/watch?v=ol85OJxDaHc

Using the LXD API from Python

After our recent splash at ODS in Vancouver, it seems that there is a lot of interest in writing some python code to drive LXD to do various things. The first option is to use pylxd, a project maintained by a friend of mine at Canonical named Chuck Short. However, the primary client of this is OpenStack, and thus it is python2. We also don't want to add a lot of dependencies in this module, so we're using raw python urllib and friends, which as you know can sometimes be...painful :)

Another option would be to use python's awesome requests module, which is considerably more user friendly. However, since LXD uses client certificates, it can be a bit challenging to get the basic bits going. Here's a small program that just does some GETs to the API, to see how it might work:

import os.path

import requests

conf_dir = os.path.expanduser('~/.config/lxc')
crt = os.path.join(conf_dir, 'client.crt')
key = os.path.join(conf_dir, 'client.key')

print(requests.get('https://127.0.0.1:8443/1.0', verify=False, cert=(crt, key)).text)

which gives me (piped through jq for sanity):

$ python3 lxd.py | jq .
{
  "type": "sync",
  "status": "Success",
  "status_code": 200,
  "metadata": {
    "api_compat": 1,
    "auth": "trusted",
    "config": {
      "trust-password": true
    },
    "environment": {
      "backing_fs": "ext4",
      "driver": "lxc",
      "kernel_version": "3.19.0-15-generic",
      "lxc_version": "1.1.2",
      "lxd_version": "0.9"
    }
  }
}

It just piggy backs on the lxc client generated certificates for now, but it would be great to have some python code that could generate those as well!

Another bit I should point out for people is lxd's --debug flag, which prints out every request it receives and response that it sends. I found this useful while developing the default lxc client, and it will probably be useful to those of you out there who are developing your own clients.

Happy hacking!

Live Migration in LXD

There has been a lot of interest on the various mailing lists as well as internally at Canonical about the state of migration in LXD, so I thought I'd write a bit about the current state of affairs.

Migration in LXD today passes the "Doom demo" test, i.e. it works well enough to reproduce the LXD announcement demo under certain conditions, which I'll cover below. There is still a lot of ongoing work to make CRIU (the underlying migration technology) work with all these configurations, so support will eventually arrive for everything. For now, though, you'll need to use the configuration I describe below.

First, I should note that things currently won't work on a systemd host. Since systemd re-mounts the rootfs as MS_SHARED, lots of things automatically become shared mounts, which confuses CRIU. There are several mailing list threads about ongoing work with respect to shared mounts in CRIU and I expect something to be merged that will resolve the situation shortly, but for now your host machine needs to be a non-systemd host (i.e. trusty or utopic will work just fine, but not vivid).

You'll need to install the daily versions of liblxc and lxd from their respective PPAs on each host:

sudo apt-add-repository -y ppa:ubuntu-lxc/daily
sudo apt-add-repository -y ppa:ubuntu-lxc/lxd-git-master
sudo apt-get update
sudo apt-get install lxd

Also, you'll need to uninstall lxcfs on both hosts:

sudo apt-get remove lxcfs

liblxc currently doesn't support migrating the mount configuration that lxcfs uses, although there is some work on that as well. The overmounting issue has been fixed in lxcfs, so I expect to land some patches in liblxc soon that will make lxcfs work.

Next, you'll want to set a password for your new lxd instance:

lxc config set password foo

You need some images in lxd, which can be acquired easily enough by lxd-images (of course, this only needs to be done on the source host of the migration):

lxd-images import lxc ubuntu trusty amd64 --alias ubuntu

You'll also need to set a few configuration items in lxd. First, the container needs to be privileged, although there is yet more ongoing work to remove this restriction. There are also a few things that CRIU does not support, so we need to set our container config to respect those as well. You can do all of this using lxd's profiles mechanism, that is:

lxc config profile create migratable
lxc config profile edit migratable

And paste the following content in instead of what's there:

name: migratable
config:
  raw.lxc: |
    lxc.console = none
    lxc.cgroup.devices.deny = c 5:1 rwm
    lxc.start.auto =
    lxc.start.auto = proc:mixed sys:mixed
  security.privileged: "true"
devices:
  eth0:
    nictype: bridged
    parent: lxcbr0
    type: nic

Finally, launch your contianer:

lxc launch ubuntu migratee -p migratable

Finally, add both of your LXDs as non unix-socket remotes (required for now, but not forever):

lxc remote add lxd thishost:8443   # don't use localhost here
lxc remote add lxd2 otherhost:8443 # use a publicly addressable name

Profiles used by a particular container need to be present on both the source of the migration and the sink, so we should copy the profile to the sink as well:

lxc config profile copy migratable lxd2:

And now, you're ready for the magic!

lxc start migratee
lxc move lxd:migratee lxd2:migratee

With luck, you'll have migrated the container to lxd2. Of course, things don't always go right the first time. The full log file for the migration attempts should be available in /var/log/lxd/migratee/migration_{dump|restore}_<timestamp>.log, on the respective host where the dump or restore took place. If you aren't successful in migrating things (or parsing the dump/restore log), feel free to mail lxc-users, and I can help you debug what went wrong.

Happy hacking!

setproctitle() in Linux

While working on LXD, one of the things I occasionally do is submit patches to LXC (e.g. the migration work or other things). In particular, the name of the LXC monitor process (the process that's the parent of init) is fork()ed in the C API call, so whatever the name of the binary that ran the API call (in our case, LXD) is the name of the parent. This could be slightly confusing (especially in the case where LXD dies but a process that looks like it is named LXD lives on). Should be easy enough to fix, right? Lots of *nixes seem to have a setproctitle() function to correct this, so we'll just call that!

And lo, there is prctl() which has a PR_SET_NAME mode that we can use. Done! Except from one small caveat from the man page:

The name can be up to 16 bytes long, and should be null-terminated if it contains fewer bytes.

Yes, you read that, 16 bytes; not useful for a lot of process names, especially something which would be ideal for LXC:

[lxc monitor] /var/lib/lxc container-name

Ok, so how hard can it be to write our own? If you look around on the internet, a lot of people suggest something like strcpy(argv[0], "my-proc-name"). That works, but what happens if your process name is longer than the original? You smash the stack! Try cat /proc/<pid>/environ on the program below:

#include <string.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    char buf[1024];
    memset(buf, '0', sizeof(buf));
    buf[1023] = 0;
    strncpy(argv[0], buf, sizeof(buf));
    sleep(10000);
    return 0;
}

If your process name is longer than the original environment, you overwrite something else potentially more useful, which could cause all sorts of nastiness, especially as something that runs as root.

The thing is, the environment isn't necessarily all that useful; it doesn't indicate the current environment, just the initial environment. So we could use that space for the process name, as long as the kernel knew the environment wasn't valid any more. prctl() to the rescue again, we can pass it PR_SET_MM and PR_SET_MM_ENV_{START|END} to update these locations.

Problem solved! Except that we want to do this from liblxc.so, which has no concept of argv. prctl() has no PR_GET_MM calls, so we can't just go the other way with it. We could invent some ugly API where you have to pass it in, but that would require users to either set their argv pointers up front, or carry it around until they needed it, or something similarly ugly. Instead, we steal an idea from the CRIU codebase: we look in /proc/<pid>/stat. This file has (in columns 48-51, if your kernel is new enough) exactly the arguments you want from PR_GET_MM_*! Thus, we can use this file to find out inside of liblxc where is safe to put the new proctitle.

Putting it all together, liblxc now has an implementation of setproctitle() that will overwrite your initial environment (but is careful not to overwrite anything else), which can be used to set process titles longer than 16 bytes. Enjoy!

lxd and Doom migration demo

Last week at the Openstack Developer Summit I gave a live demo of migrating a linux container running doom, which generated quite a lot of excitement! Several people asked me for steps on reproducing the demo, which I have just posted.

I am one of Canonical's developers working on lxd, and I will be focused on bringing migration and other features into it. I'm very excited about the opportunity to work on this project! Stay tuned!

Live Migration of Linux Containers

Recently, I've been playing around with checkpoint and restore of Linux containers. One of the obvious applications is checkpointing on one host and restoring on another (i.e. live migration). Live migration has all sorts of interesting applications, so it is nice to know that at least a proof of concept of it works today.

Anyway, onto the interesting bits! The first thing I did was create two vms, and install criu's and lxc's development versions on both hosts:

sudo add-apt-repository ppa:ubuntu-lxc/daily
sudo apt-get update
sudo apt-get install lxc

sudo apt-get install build-essential protobuf-c-compiler
git clone https://github.com/xemul/criu && cd criu && sudo make install

Then, I created a container:

sudo lxc-create -t ubuntu -n u1 -- -r trusty -a amd64

Since the work on container checkpoint/restore is so young, not all container configurations are supported. In particular, I had to add the following to my config:

cat << EOF | sudo tee -a /var/lib/lxc/u1/config
# hax for criu
lxc.console = none
lxc.tty = 0
lxc.cgroup.devices.deny = c 5:1 rwm
EOF

Finally, although the lxc-checkpoint tool allows us to checkpoint and restore containers, there is no support for migration directly today. There are several tools in the works for this, but for now we can just use a cheesy shell script:

cat > migrate <<EOF
#!/bin/sh
set -e

usage() {
  echo $0 container user@host.to.migrate.to
  exit 1
}

if [ "$(id -u)" != "0" ]; then
  echo "ERROR: Must run as root."
  usage
fi

if [ "$#" != "2" ]; then
  echo "Bad number of args."
  usage
fi

name=$1
host=$2

checkpoint_dir=/tmp/checkpoint

do_rsync() {
  rsync -aAXHltzh --progress --numeric-ids --devices --rsync-path="sudo rsync" $1 $host:$1
}

# we assume the same lxcpath on both hosts, that is bad.
LXCPATH=$(lxc-config lxc.lxcpath)

lxc-checkpoint -n $name -D $checkpoint_dir -s -v

do_rsync $LXCPATH/$name/
do_rsync $checkpoint_dir/

ssh $host "sudo lxc-checkpoint -r -n $name -D $checkpoint_dir -v"
ssh $host "sudo lxc-wait -n u1 -s RUNNING"
EOF
chmod +x migrate

Now, for the magic show! I've set up the container I created above to be a web server running micro-httpd that serves an incredibly important message:

$ ssh ubuntu@$(sudo lxc-info -n u1 -H -i)
ubuntu@u1:~$ sudo apt-get install micro-httpd
ubuntu@u1:~$ echo "Meshuggah is the best metal band." | sudo tee /var/www/index.html
ubuntu@u1:~$ exit
$ curl -s $(sudo lxc-info -n u1 -H -i)
Meshuggah is the best metal band.

Let's migrate!

$ sudo ./migrate u1 ubuntu@criu2.local
  # lots of rsync output...
$ ssh ubuntu@criu2.local 'curl -s $(sudo lxc-info -n u1 -H -i)'
Meshuggah is the best metal band.

Of course, there are several caveats to this. You've got to add the lines above to your config, which means you can't dump containers with ttys. Since containers have the hosts's fusectl bind mounted and fuse mounts aren't supported by criu, containers or hosts using fuse can't be dumped. You can't migrate unprivileged containers yet. There are probably others that I'm forgetting, though list of troubleshoting steps is available at criu.org/LXC#Troubleshooting.

There is ongoing work in both CRIU and LXC to get rid of all the caveats above, so stay tuned!

Qtile's crazy 0.9.0 changes have landed

We have re-written a lot of the underlying code that powers qtile, in order to support python2/3, pypi, as well as getting rid of several memory leaks. This work is now done and on the development branch, see the mailing list announcement for more info.

Qtile 0.8.0 tagged!

I've just tagged version 0.8.0 of Qtile! See the changelog for full release details, and the release announcement for other detials. This release of Qtile also comes with a sleek new website, courtesy of Derek Payton.