setproctitle() in Linux

While working on LXD, one of the things I occasionally do is submit patches to LXC (e.g. the migration work or other things). In particular, the LXC monitor process (the process that's the parent of init) is fork()ed inside the C API call, so it inherits the name of whatever binary made the API call (in our case, LXD). This could be slightly confusing (especially in the case where LXD dies but a process that looks like it is named LXD lives on). Should be easy enough to fix, right? Lots of *nixes seem to have a setproctitle() function to correct this, so we'll just call that!

And lo, there is prctl() which has a PR_SET_NAME mode that we can use. Done! Except for one small caveat from the man page:

The name can be up to 16 bytes long, and should be null-terminated if it contains fewer bytes.

Yes, you read that right: 16 bytes. That is not useful for a lot of process names, especially something which would be ideal for LXC:

[lxc monitor] /var/lib/lxc container-name
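To make the limit concrete, here is a minimal Python sketch of what survives of that title. The helper names `visible_comm` and `set_comm` are made up for illustration; `PR_SET_NAME = 15` is the value from `<linux/prctl.h>`, and the 16-byte buffer includes the trailing NUL, so only 15 bytes are visible. `set_comm` assumes a Linux host with /proc mounted and is not called here.

```python
import ctypes

PR_SET_NAME = 15     # value from <linux/prctl.h>
TASK_COMM_LEN = 16   # kernel buffer size, including the trailing NUL


def visible_comm(title: str) -> bytes:
    """Return the part of `title` that survives prctl(PR_SET_NAME)."""
    return title.encode()[:TASK_COMM_LEN - 1]


def set_comm(title: str) -> bytes:
    """Linux only: set the name via prctl() and read it back from /proc."""
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.prctl(PR_SET_NAME, ctypes.c_char_p(title.encode()),
                    ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_NAME) failed")
    with open("/proc/self/comm", "rb") as f:
        return f.read().rstrip(b"\n")


wanted = "[lxc monitor] /var/lib/lxc container-name"
print(visible_comm(wanted))  # b'[lxc monitor] /'
```

On a Linux box, calling `set_comm(wanted)` should report the same truncated name, with the lxcpath and the container name cut off.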

Ok, so how hard can it be to write our own? If you look around on the internet, a lot of people suggest something like strcpy(argv[0], "my-proc-name"). That works, but what happens if your process name is longer than the original? You smash the stack! Try cat /proc/<pid>/environ on the program below:

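#include <string.h>
#include <stdio.h>
#include <unistd.h>   /* for sleep() */

int main(int argc, char* argv[]) {
    char buf[1024];
    /* Fill buf with '0' characters, keeping a trailing NUL. */
    memset(buf, '0', sizeof(buf));
    buf[1023] = 0;
    /* Deliberately write far past the end of argv[0] to show the overflow. */
    strncpy(argv[0], buf, sizeof(buf));
    sleep(10000);
    return 0;
}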

If your process name is longer than the original environment, you overwrite something else potentially more useful, which could cause all sorts of nastiness, especially in something that runs as root.

The thing is, the environment isn't necessarily all that useful; it doesn't indicate the current environment, just the initial environment. So we could use that space for the process name, as long as the kernel knew the environment wasn't valid any more. prctl() to the rescue again: we can pass it PR_SET_MM and PR_SET_MM_ENV_{START|END} to update these locations.

Problem solved! Except that we want to do this from liblxc.so, which has no concept of argv. prctl() has no PR_GET_MM calls, so we can't just go the other way with it. We could invent some ugly API where you have to pass it in, but that would require users to either set their argv pointers up front, or carry them around until they needed them, or something similarly ugly. Instead, we steal an idea from the CRIU codebase: we look in /proc/<pid>/stat. This file has (in columns 48-51, if your kernel is new enough) exactly the values you want from PR_GET_MM_*! Thus, we can use this file to find out inside of liblxc where it is safe to put the new proctitle.
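As a rough illustration of that lookup (this is not the liblxc code, which is C), here is a small Python sketch that pulls columns 48-51 out of a stat line. The function name `parse_mm_fields` is hypothetical, the field names follow proc(5), and the sample line is synthetic, so each value simply equals its column number.

```python
def parse_mm_fields(stat_line: str) -> dict:
    """Extract arg_start, arg_end, env_start, env_end (columns 48-51)."""
    # Column 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before tokenizing.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is column 3 (state), so column N lives at rest[N - 3].
    if len(rest) < 51 - 2:
        raise ValueError("kernel too old: stat has no arg/env columns")
    names = ("arg_start", "arg_end", "env_start", "env_end")
    return {name: int(rest[48 - 3 + i]) for i, name in enumerate(names)}


# Synthetic stat line: columns 4..52 hold their own column number.
sample = "1234 (my (odd) proc) S " + " ".join(str(n) for n in range(4, 53))
print(parse_mm_fields(sample))
```

On a real system the input would come from reading /proc/self/stat, and the resulting addresses bound the region a new title may safely occupy.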

Putting it all together, liblxc now has an implementation of setproctitle() that will overwrite your initial environment (but is careful not to overwrite anything else), which can be used to set process titles longer than 16 bytes. Enjoy!

lxd and Doom migration demo

Last week at the OpenStack Developer Summit I gave a live demo of migrating a Linux container running Doom, which generated quite a lot of excitement! Several people asked me for steps to reproduce the demo, which I have just posted.

I am one of Canonical's developers working on lxd, and I will be focused on bringing migration and other features into it. I'm very excited about the opportunity to work on this project! Stay tuned!

Live Migration of Linux Containers

Recently, I've been playing around with checkpoint and restore of Linux containers. One of the obvious applications is checkpointing on one host and restoring on another (i.e. live migration). Live migration has all sorts of interesting applications, so it is nice to know that at least a proof of concept of it works today.

Anyway, on to the interesting bits! The first thing I did was create two VMs, and install criu's and lxc's development versions on both hosts:

sudo add-apt-repository ppa:ubuntu-lxc/daily
sudo apt-get update
sudo apt-get install lxc

sudo apt-get install build-essential protobuf-c-compiler
git clone https://github.com/xemul/criu && cd criu && sudo make install

Then, I created a container:

sudo lxc-create -t ubuntu -n u1 -- -r trusty -a amd64

Since the work on container checkpoint/restore is so young, not all container configurations are supported. In particular, I had to add the following to my config:

cat << EOF | sudo tee -a /var/lib/lxc/u1/config
# hax for criu
lxc.console = none
lxc.tty = 0
lxc.cgroup.devices.deny = c 5:1 rwm
EOF

Finally, although the lxc-checkpoint tool allows us to checkpoint and restore containers, there is no support for migration directly today. There are several tools in the works for this, but for now we can just use a cheesy shell script:

cat > migrate <<'EOF'
#!/bin/sh
set -e

usage() {
  echo $0 container user@host.to.migrate.to
  exit 1
}

if [ "$(id -u)" != "0" ]; then
  echo "ERROR: Must run as root."
  usage
fi

if [ "$#" != "2" ]; then
  echo "Bad number of args."
  usage
fi

name=$1
host=$2

checkpoint_dir=/tmp/checkpoint

do_rsync() {
  rsync -aAXHltzh --progress --numeric-ids --devices --rsync-path="sudo rsync" $1 $host:$1
}

# we assume the same lxcpath on both hosts, that is bad.
LXCPATH=$(lxc-config lxc.lxcpath)

lxc-checkpoint -n $name -D $checkpoint_dir -s -v

do_rsync $LXCPATH/$name/
do_rsync $checkpoint_dir/

ssh $host "sudo lxc-checkpoint -r -n $name -D $checkpoint_dir -v"
ssh $host "sudo lxc-wait -n $name -s RUNNING"
EOF
chmod +x migrate

Now, for the magic show! I've set up the container I created above to be a web server running micro-httpd that serves an incredibly important message:

$ ssh ubuntu@$(sudo lxc-info -n u1 -H -i)
ubuntu@u1:~$ sudo apt-get install micro-httpd
ubuntu@u1:~$ echo "Meshuggah is the best metal band." | sudo tee /var/www/index.html
ubuntu@u1:~$ exit
$ curl -s $(sudo lxc-info -n u1 -H -i)
Meshuggah is the best metal band.

Let's migrate!

$ sudo ./migrate u1 ubuntu@criu2.local
  # lots of rsync output...
$ ssh ubuntu@criu2.local 'curl -s $(sudo lxc-info -n u1 -H -i)'
Meshuggah is the best metal band.

Of course, there are several caveats to this. You've got to add the lines above to your config, which means you can't dump containers with ttys. Since containers have the host's fusectl bind mounted and fuse mounts aren't supported by criu, containers or hosts using fuse can't be dumped. You can't migrate unprivileged containers yet. There are probably others that I'm forgetting, though a list of troubleshooting steps is available at criu.org/LXC#Troubleshooting.

There is ongoing work in both CRIU and LXC to get rid of all the caveats above, so stay tuned!

Qtile's crazy 0.9.0 changes have landed

We have rewritten a lot of the underlying code that powers qtile, in order to support python2/3, pypi, as well as getting rid of several memory leaks. This work is now done and on the development branch; see the mailing list announcement for more info.

Qtile 0.8.0 tagged!

I've just tagged version 0.8.0 of Qtile! See the changelog for full release details, and the release announcement for other details. This release of Qtile also comes with a sleek new website, courtesy of Derek Payton.

xcffib 0.1.0 released!

I'm excited to announce today that I've tagged the first release of xcffib, v0.1.0. The testing of xcffib with qtile has been mostly successful, and I'm comfortable now tagging a release. Special thanks to Sean Vig who has done a lot of work on the python3 side of xcffib. Happy hacking!

CFFI-based Qtile!

For the past while I've been working on a reimplementation of xpyb in cffi. There are several reasons to want to do this:

  1. xpyb has at least one more memory leak (but probably others)
  2. The xpyb upstream is inactive, and there is no sign of a python 3 port
  3. It would be uber cool to be able to run qtile in pypy.

Using cffi solves 2 and 3 pretty easily, and I've made sure that xcffib's test suite runs through valgrind with no definite leaks, hopefully mitigating 1. However, even if we have xcffib, there is still a lot of work that needed to happen to make qtile run on top of it. I'm writing this post to announce that some of that work is done, and late last night I was able to boot qtile running on top of xcffib! There are still lots of bugs, and lots of testing needs to be done, but we're most of the way there I think, and running qtile on python3 and pypy without memory leaks is no longer a pipe dream :-).

To install, you'll need:

  1. sudo apt-get install xcb-proto libpango1.0-dev libcairo2-dev (or whatever the equivalent packages are on your distro)
  2. Follow the installation instructions for xcffib.
  3. Install the xcffib branch of tych0/cairocffi
  4. Install the cffi branch of tych0/qtile

I have not tried to test qtile on python 3 yet, so there may be some work that needs to be done to successfully run things on python 3. However, both xcffib and cairocffi run their test suites on python 3, and so the only work that needs to be done is probably in qtile, if any. pypy is another story however, as xcffib does not currently pass its test suite on pypy. I plan to fix that at some point, but I'd like to get qtile running completely before that happens.

Finally, there are some bugs that manifest with qtile right now:

  1. The systray doesn't work. This is probably due to a bug in the way xcffib unpacks ClientMessage events.
  2. Most of the text-based widgets don't work. This is probably due to a bug in the pangocffi binding I wrote for qtile. I think it is just an incompleteness, and I will try to fix it either today or tomorrow. Basic text widgets like the clock or the volume widget work just fine.
  3. Lots of other things are probably broken :-). Bug reports welcome.

Happy hacking!

Ubuntu 14.04 Trusty Tahr packages for Qtile

Just posting to let everyone know that I've published packages for the latest Qtile release on Ubuntu 14.04. See the mailing list announcement for more details. Additionally, we will be doing a 0.7 release shortly, so please let me know if there are any release blocking bugs!

Ubuntu 13.10 Saucy Salamander packages for Qtile

Just posting to let everyone know that I've published packages for the latest Qtile release on Ubuntu 13.10. See the mailing list announcement for more details.

Manage passwords without state

A few years ago I had a problem: I had a bunch of accounts that I accessed once a year when tax time came around, and I kept forgetting the passwords. Often I'd try a few before locking myself out, and then I'd have to spend an hour on the phone with customer service getting my account unlocked, which meant if I was doing my taxes on the last weekend possible, I wouldn't be able to complete them until the next business day. The obvious solution to this problem is to store the passwords in some kind of password manager -- lots of them exist for all kinds of platforms: phone, computer, browser, etc.

The problem with password managers is that they typically require some kind of state file. They store the mapping between site and cleartext password in some file, and then they decrypt it with some secret from you when you want to access it. Thus, you have to 1. trust that whoever is doing the encrypting and decrypting is doing it correctly, so that when your laptop gets stolen your passwords aren't leaked, and 2. have access to the machine that the passwords are stored on when you want to use them. If you've left your laptop at home or you forgot to back up your password file when you got a new computer, you're SOL.

What's the solution? A password manager without state, of course! Since we're assuming the user can remember at least one pretty good password, we can use that as our "state", so we end up with the algorithm as follows:

hash = sha512(user_secret + "example.org")
base64encode(hash)[:10]
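For reference, a minimal Python sketch of those two lines. This is only an illustration and not necessarily identical to any published script: the function name `site_password` is hypothetical, and it assumes the inputs are UTF-8 encoded and that the raw digest (not its hex form) is what gets base64-encoded.

```python
import base64
import hashlib


def site_password(user_secret: str, domain: str, length: int = 10) -> str:
    """Derive a per-site password from one memorized secret."""
    # Concatenating the domain makes the result differ for every site.
    digest = hashlib.sha512((user_secret + domain).encode("utf-8")).digest()
    # Base64 turns the raw bytes into typeable characters; keep a prefix.
    return base64.b64encode(digest).decode("ascii")[:length]


print(site_password("correct horse battery staple", "example.org"))
```

The same secret and domain always give the same password, so nothing needs to be stored anywhere.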

Here, we're using the domain to salt the user secret so the generated passwords are different for each site. sha512 provides the randomness. Although we are only using the first 60 bits of the output here (10 base64 characters, each of which encodes six bits), each character carries significantly more entropy than a typical character of English text, making it a much stronger password. Further, the algorithm is very simple, and you could re-implement it on any computer that has your favorite programming language environment available. Thus, you can use it in a pinch, since all you need to remember are the algorithm and your user_secret. I've published a python script that implements this mechanism, so you don't even have to remember the algorithm