Status Update, March 2020

It’s been a while since my last status update! Let’s get straight to it.

Virtio Wayland

In the last update, I talked about crosvm, the virtual machine monitor (VMM) Spectrum will be using. Since then, I have hit some important milestones:

hyperfekt and I got a crosvm package into Nixpkgs.
A Nix-built Virtio Wayland stack ran a simple Wayland client in a VM, integrated with my host system’s window manager.
Frustrated at the lack of documentation, I published an overview of all the components required to use Virtio Wayland. When I checked recently on a couple of popular search engines, my blog came up top or near-top for “virtio wayland”. Hopefully this means the next person who needs Virtio Wayland documentation will be in a slightly better position than I was.
I can now run a full Wayland compositor inside a VM, as a Wayland client connected to another compositor on the host system.

The last item would probably benefit from some explanation:

In Spectrum, we want the host system to be running as little software as possible, to reduce the attack surface. This means that the Wayland compositor should itself run in a VM, displaying applications running in other VMs. The first step towards that was getting a Wayland compositor to run in a VM at all.

I chose Sway for this proof-of-concept because it supports running as a Wayland client, i.e., as a window in another compositor. Additionally, it’s straightforward to build, and I’m already familiar with it as a user and a little as a developer, having fixed a couple of bugs in it.

Some minor patching was required, because Sommelier doesn’t support a new enough version of the wl_compositor interface for wlroots to run out of the box. The patch is a quick hack to downgrade wlroots’ protocol requirements. It will undoubtedly have a negative performance impact, but it lets me keep moving for now.
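For readers unfamiliar with Wayland protocol versioning: when a client binds a global such as wl_compositor, it may ask for any version up to the one the server advertises, so a client that insists on a newer version than Sommelier offers simply cannot start. The small Rust sketch below models that rule. The function name is hypothetical, and any version numbers you plug into it are purely illustrative, not the actual versions Sommelier and wlroots use.

```rust
/// Pick the version to request when binding a Wayland global: the highest
/// version both sides support, or `None` if that is below the minimum the
/// client is willing to work with.
fn negotiate_bind_version(advertised: u32, client_max: u32, client_min: u32) -> Option<u32> {
    let version = advertised.min(client_max);
    if version >= client_min {
        Some(version)
    } else {
        None
    }
}
```

With made-up numbers, `negotiate_bind_version(3, 4, 4)` is `None` (the client gives up), while lowering the client's minimum to 3 yields `Some(3)`. Loosely, that is what downgrading the protocol requirements amounts to: the client agrees to work without whatever the newer version added.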

At some point, we will want to upgrade Sommelier to support the newer protocol version. I think it might be a good opportunity for a new contributor to get acquainted with Spectrum’s Virtio Wayland setup, and provide an extremely useful contribution that I won’t be able to get to myself for a long time. If that sounds interesting, please, please get in touch on the development mailing list or on IRC.

crosvm

At 36c3 in December, I spoke to a couple of interested potential Spectrum contributors. We discussed what they might be able to work on, and it became clear to me that, for basically any interesting feature development work, better VM-to-VM communication was going to be required much earlier in Spectrum’s development than I’d previously anticipated.

Some examples of where VM-to-VM communication will be important in Spectrum are:

Running a Wayland compositor in a VM, displaying applications running in other VMs.
Running a device driver in a VM, which can then expose that device to application VMs without exposing an entire physical USB controller.
Direct VM-to-VM networking.

Right now, the only form of VM-to-VM communication is IP, via the host kernel’s networking stack. It would probably have been possible to tunnel all of these things over IP somehow, but that work would end up being thrown away once we eventually got proper inter-guest communication. So avoiding it entirely by implementing better VM-to-VM communication earlier would be more efficient in the long run.

As scary as it was, it was time to bite the bullet and implement inter-guest communication, in crosvm, over virtio. I knew this was going to be a lot of work, because crosvm doesn’t really do any inter-guest communication at the moment. Everything is between host and guest.

The first step towards this was making it possible to run crosvm devices separately from the crosvm program that runs a VM. The hard part of this is initializing the devices. While crosvm runs devices in a sandbox, it doesn’t initialize them in one. This means that crosvm assumes it can pass arbitrarily complex data into a device before the sandbox is activated, and I needed to figure out how to do this over a socket instead.

Because devices do run in a sandbox, we can assume that the only shared mutable state between a device and crosvm is shared memory. Shared memory is accessed using a file descriptor, and can therefore be sent over a socket. So initializing a remote crosvm device can be reduced to serializing a complex data structure consisting of bytes and file descriptors into a socket, and then deserializing it at the other end. What exactly this data structure is varies by device, so we need a way to do it generically.
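To make that reduction a little more concrete, here is a minimal Rust sketch. The `Encode`/`Decode` traits, the `SharedMemory` wrapper and the `DeviceInit` type are all hypothetical illustrations, not crosvm’s actual API. Plain data is written to a byte buffer, while each file descriptor is pushed onto a separate list and only its index in that list is written into the bytes. Over a real Unix socket, that descriptor list would travel as ancillary data (SCM_RIGHTS) alongside the bytes; here a plain `i32` stands in for a descriptor.

```rust
/// Stand-in for a file descriptor (RawFd is an i32 on Unix).
type Fd = i32;

/// Hypothetical trait: write plain data to `bytes`, descriptors to `fds`.
trait Encode {
    fn encode(&self, bytes: &mut Vec<u8>, fds: &mut Vec<Fd>);
}

/// Hypothetical inverse: consume from the front of `bytes`, look up descriptors in `fds`.
trait Decode: Sized {
    fn decode(bytes: &mut &[u8], fds: &[Fd]) -> Option<Self>;
}

impl Encode for u64 {
    fn encode(&self, bytes: &mut Vec<u8>, _fds: &mut Vec<Fd>) {
        bytes.extend_from_slice(&self.to_le_bytes());
    }
}

impl Decode for u64 {
    fn decode(bytes: &mut &[u8], _fds: &[Fd]) -> Option<Self> {
        if bytes.len() < 8 {
            return None;
        }
        let (head, rest) = bytes.split_at(8);
        let mut buf = [0u8; 8];
        buf.copy_from_slice(head);
        *bytes = rest;
        Some(u64::from_le_bytes(buf))
    }
}

/// Hypothetical wrapper for a shared memory region's descriptor.
#[derive(Debug, PartialEq, Clone, Copy)]
struct SharedMemory(Fd);

impl Encode for SharedMemory {
    fn encode(&self, bytes: &mut Vec<u8>, fds: &mut Vec<Fd>) {
        // Only the descriptor's position in the fd list goes into the bytes.
        let index = fds.len() as u64;
        fds.push(self.0);
        index.encode(bytes, fds);
    }
}

impl Decode for SharedMemory {
    fn decode(bytes: &mut &[u8], fds: &[Fd]) -> Option<Self> {
        let index = u64::decode(bytes, fds)? as usize;
        // Fails if the message refers to a descriptor that was never received.
        fds.get(index).copied().map(SharedMemory)
    }
}

/// Made-up initialization data for some device: plain data plus one descriptor.
#[derive(Debug, PartialEq)]
struct DeviceInit {
    queue_size: u64,
    guest_memory: SharedMemory,
}

impl Encode for DeviceInit {
    fn encode(&self, bytes: &mut Vec<u8>, fds: &mut Vec<Fd>) {
        self.queue_size.encode(bytes, fds);
        self.guest_memory.encode(bytes, fds);
    }
}

impl Decode for DeviceInit {
    fn decode(bytes: &mut &[u8], fds: &[Fd]) -> Option<Self> {
        Some(DeviceInit {
            queue_size: u64::decode(bytes, fds)?,
            guest_memory: SharedMemory::decode(bytes, fds)?,
        })
    }
}
```

Because descriptors are referenced by index rather than by number, the byte stream never depends on the sender’s descriptor numbering: the receiver looks indices up in whatever list of descriptors the kernel handed it.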

crosvm has a nice internal library called msg_socket, which can serialize and deserialize Rust types consisting of data and file descriptors, and send them over a socket. Alas, it is not suitable for our use case, because it only supports statically-sized data. Critically, this means no dynamically-sized arrays, which are used fairly extensively in crosvm device initialization. Adding support for dynamically sized data didn’t look easy either.
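The limitation is easier to see in a sketch. Assume a hypothetical trait in which every message of a given type has a single size fixed at compile time (this illustrates the constraint only; it is not msg_socket’s real trait). A fixed-length array fits that model, but a `Vec` cannot, because its encoded length depends on the value rather than on the type.

```rust
/// Hypothetical fixed-size message trait: every value of the type encodes to
/// exactly `SIZE` bytes.
trait FixedSizeMsg: Sized {
    const SIZE: usize;
    fn write_to(&self, buf: &mut [u8]);
    fn read_from(buf: &[u8]) -> Self;
}

impl FixedSizeMsg for u32 {
    const SIZE: usize = 4;
    fn write_to(&self, buf: &mut [u8]) {
        buf[..4].copy_from_slice(&self.to_le_bytes());
    }
    fn read_from(buf: &[u8]) -> Self {
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(&buf[..4]);
        u32::from_le_bytes(bytes)
    }
}

/// A fixed-length array is fine: its size is known from the type alone.
impl FixedSizeMsg for [u32; 4] {
    const SIZE: usize = 4 * <u32 as FixedSizeMsg>::SIZE;
    fn write_to(&self, buf: &mut [u8]) {
        for (i, x) in self.iter().enumerate() {
            x.write_to(&mut buf[i * 4..]);
        }
    }
    fn read_from(buf: &[u8]) -> Self {
        let mut out = [0u32; 4];
        for (i, x) in out.iter_mut().enumerate() {
            *x = u32::read_from(&buf[i * 4..]);
        }
        out
    }
}

// There is no sensible `impl FixedSizeMsg for Vec<u32>`: `SIZE` would have to
// be one constant shared by every vector, whatever its length.

/// The buffer size comes from the type, never from the value being sent.
fn to_message<T: FixedSizeMsg>(value: &T) -> Vec<u8> {
    let mut buf = vec![0u8; T::SIZE];
    value.write_to(&mut buf);
    buf
}
```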

On the other hand, serde, the canonical solution for data serialization in Rust, assumes (reasonably) that its output will eventually be a bag of bytes. Since we also need to handle file descriptors, we can’t just use serde either.

So I wrote a new library, msg_socket2. It wraps the serde API to additionally provide methods that handle serializing and deserializing file descriptors, which can be sent alongside the data over a socket. I might end up extracting it and putting it on crates.io at some point in the future, once it’s a bit more polished (and has a better name; suggestions welcome!).

All of this means that Spectrum’s crosvm can now provide small programs that run a single kind of crosvm device in a completely separate process from an application crosvm instance, but which an application crosvm instance can connect to and use. The next step will be putting the device in a VM: the IPC handling will go into a new crosvm process, which will then run the device in its own VM.
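As a rough illustration of the shape of this split, the Rust sketch below runs a stand-in device on one end of a Unix socket pair and a stand-in VMM on the other. Several things here are assumptions made for the sake of a self-contained example: threads take the place of separate processes, the length-prefixed framing is made up, and the device simply echoes requests back. The real programs exchange serialized device state and file descriptors, not strings.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Hypothetical framing: a 4-byte little-endian length, then the payload.
fn send_frame(stream: &mut UnixStream, payload: &[u8]) -> io::Result<()> {
    stream.write_all(&(payload.len() as u32).to_le_bytes())?;
    stream.write_all(payload)
}

fn recv_frame(stream: &mut UnixStream) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 4];
    stream.read_exact(&mut len)?;
    let mut payload = vec![0u8; u32::from_le_bytes(len) as usize];
    stream.read_exact(&mut payload)?;
    Ok(payload)
}

/// Stand-in for a standalone device program: wait for an initialization
/// message, acknowledge it, then echo requests until the VMM hangs up.
fn run_device(mut conn: UnixStream) -> io::Result<()> {
    let init = recv_frame(&mut conn)?;
    let ready = format!("ready: {}", String::from_utf8_lossy(&init));
    send_frame(&mut conn, ready.as_bytes())?;
    loop {
        match recv_frame(&mut conn) {
            Ok(request) => send_frame(&mut conn, &request)?,
            // The VMM closed its end of the socket: shut the device down.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(()),
            Err(e) => return Err(e),
        }
    }
}

/// Stand-in for the VMM side: start the device, initialize it, make one request.
fn connect_to_device() -> io::Result<(String, Vec<u8>)> {
    let (mut vmm_end, device_end) = UnixStream::pair()?;
    let device = thread::spawn(move || run_device(device_end));
    send_frame(&mut vmm_end, b"virtio-example")?;
    let ready = String::from_utf8_lossy(&recv_frame(&mut vmm_end)?).into_owned();
    send_frame(&mut vmm_end, b"abc")?;
    let reply = recv_frame(&mut vmm_end)?;
    drop(vmm_end); // Hang up so the device's loop sees EOF and exits.
    device.join().expect("device thread panicked")?;
    Ok((ready, reply))
}
```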

As a result of all this, Spectrum’s crosvm has now diverged quite significantly from Google’s. Diffs between them report thousands of lines changed. We have quite firmly progressed from “patchset” to “soft fork” here. I am still merging in upstream changes, and don’t plan on getting to a point where we no longer do that.

Nixpkgs

I’ve been focusing very heavily on crosvm this year, because I want to get back on track hitting milestones as soon as possible, so my Nixpkgs work has been fairly limited so far.

I took part in a special NixOS Office Hours to discuss some concerns I had about the Flakes development approach, and one result of this was the revival of the Flakes shepherding team. We recently had the first meeting we’d had in months. I suspect the first Flakes RFC will be accepted fairly soon.

I also went on a bit of a merge spree, and merged more than 60 pull requests from other contributors this month, by my rough estimate. This got the number of open PRs below 1800 for a while, but it’s now right back where it started.

Spectrum community

At the time of my last status update, I had just finished setting up Spectrum’s cgit, mailing lists, etc. A lot of Nixpkgs work came out of this, and a lot of that is now upstream! I’m very pleased with this, especially because I worried that the Mailman module improvements would never make it in, since they changed the way the module worked quite a lot.

The cgit has been very useful. It lets me show other people Spectrum code, and it lets other people easily browse it. I know it’s used for both of these things. The mailing lists have been very quiet, but I’m not surprised by that. With just me actively working on the project, there’s not all that much to discuss in long form on a mailing list.

The IRC channel, on the other hand, has become pleasantly active. There’s a good core of people there who are keeping up with and discussing the project. There’s almost always somebody around to talk things through with, which is especially nice, and people have been very quick to offer their expertise on various things I’ve found myself working with or thinking about.

There have been very few external contributions so far. I’ve been thinking about what I can do better to attract contributions. Most of the work that currently needs done is rather specialised, so no matter what I do it’s not going to be easy to attract drive-by contributions, at least for a while, I think. But I’m sure there’s still stuff I can do.

One thing I thought about was occasionally streaming myself working on Spectrum. I think this would give people an opportunity to see what I’m working on, to spot things that could be improved or fixed, or to ask questions and have me demonstrate how a particular aspect of the system works. I got a positive reception on IRC when I first mooted this idea, and so I’ve spent some time figuring out how this would work and setting things up. I expect to do my first stream in the next few days.

The future of status updates

I’m thinking about changing the way I communicate what’s going on with Spectrum development. Status updates in their current format are nice, but they’re extremely time-consuming: this one is coming up on three thousand words, and by the time it is published will have taken me more than a day to write and edit. Knowing that this amount of work is coming puts me off writing updates, and then months go by where there’s no easy way for somebody to see what’s going on with the project.

Cole Mickens suggested on IRC that I try a “This Week In…” format. I have some concerns with this idea: there will undoubtedly be some weeks where nothing much happens that’s likely to be of interest to anybody but me. Overall, though, I think this might well work better than the occasional long status update, so I think I will try it.

If I do this, the updates will likely be posted to Spectrum’s discuss mailing list, rather than to this blog.

I don’t know whether I’ll continue writing this sort of status update if the TWI format takes off. I’d like to think I will, because I do think there’s a lot of value in occasional, long, detailed articles accessible to project outsiders. On the other hand, they require a very large time investment. I’d like to hear any thoughts you might have on this.

Summary

These are uncertain times for Spectrum. I’ve now taken about three months out of my planned work schedule—which is also three months of not working on funded milestones—to make sure we have solid foundations to build Spectrum on top of. With this comes a fairly substantial risk both to myself and to the project. I really hope it pays off.

It’s nice to see a core of interested people around Spectrum, even if external contributions up to this point have been limited. I hope some of the new stuff I plan on trying, like streaming and smaller, more frequent, more focused status updates, helps attract some external contribution. I also think that once the focus is less on VMM work and more on Nix work, the project might be easier for external contributors to get into, since most of Spectrum’s current community comes from Nix’s. If you’re interested in contributing to Spectrum, please get in touch with the project on the devel mailing list or on IRC, or contact me directly if that’s more comfortable for you. I’d be very happy to point you in the right direction and assist you.

Finally, thank you so much to the kind people who donate money to sustain my work. It is only because of this support that I can take risks like I have described in this post. Without the level of financial certainty I get from recurring donations, I would not have been able to justify all the work on crosvm I’ve been doing, and I think the project would be in a worse place because of it. If you’d like to financially support my work and enable me to take similarly bold decisions in the future, you can donate through GitHub Sponsors (which will match your donations until October 2020) or Liberapay.