Wednesday, September 02, 2009

Pedigree's "Services" Feature

Well, the Pedigree team is currently in bug-fixing mode as we prepare for our first release, codenamed "Foster", and as such there hasn't been a lot to talk about innovation-wise in the kernel. So I've decided instead to chat about a kernel feature that was implemented recently.

For our "live CDs" we realised we would need a read/write file system, in memory, with a set of applications and files to allow people to try Pedigree out without making changes to their computers. This meant that we needed a way to mount a disk image as a usable disk, which is a feature Pedigree lacked at that stage. So I spent a couple of hours whipping up a module to support such a feature. However, this module required a couple of tricky design decisions: caching and write support.

A cache is a must-have in such a module, as without it each read from the disk image results in a read from the underlying hardware - not so great if a slow CD drive needs to seek! However, the cache must also be big enough to hold at least one sector of disk data (preferably more). I eventually settled upon a 4096-byte buffer cache, which holds two full 2048-byte CD sectors or eight 512-byte hard disk sectors. Initial reads will miss the cache and read straight from the hardware into the cache. Further reads of the same region come from that buffer. For an image formatted as FAT, for example, this significantly improves read times for the file allocation table and root directory (high-use areas).
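
As a rough illustration of the idea (not the actual Pedigree module), here is a minimal sketch of a single-buffer read cache. The names BackingImage and CachedImage are made up for this example, and the backing "hardware" is just a byte vector that counts how often it is read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One 4096-byte buffer: two 2048-byte CD sectors or eight 512-byte sectors.
constexpr size_t kBlockSize = 4096;

// Hypothetical stand-in for the slow device holding the disk image.
// It counts reads so the effect of the cache can be observed.
struct BackingImage
{
    std::vector<uint8_t> data;
    size_t deviceReads = 0;

    // Copies 'len' bytes starting at 'offset'; bytes past the end read as 0.
    void read(size_t offset, uint8_t *out, size_t len)
    {
        ++deviceReads;
        size_t n = 0;
        if (offset < data.size())
        {
            n = std::min(len, data.size() - offset);
            std::memcpy(out, data.data() + offset, n);
        }
        std::memset(out + n, 0, len - n);
    }
};

// Assumed design: a single block-aligned buffer, refilled only on a miss.
class CachedImage
{
public:
    explicit CachedImage(BackingImage &image)
        : m_Image(image), m_Valid(false), m_Base(0) {}

    void read(size_t offset, uint8_t *out, size_t len)
    {
        while (len > 0)
        {
            size_t base = offset - (offset % kBlockSize);
            if (!m_Valid || base != m_Base)
            {
                // Cache miss: go to the hardware and fill the whole buffer.
                m_Image.read(base, m_Buffer, kBlockSize);
                m_Base = base;
                m_Valid = true;
            }
            // Serve as much as possible from the buffered block.
            size_t within = offset - base;
            size_t n = std::min(len, kBlockSize - within);
            std::memcpy(out, m_Buffer + within, n);
            out += n;
            offset += n;
            len -= n;
        }
    }

private:
    BackingImage &m_Image;
    uint8_t m_Buffer[kBlockSize];
    bool m_Valid;
    size_t m_Base;
};
```

With this sketch, reading sector after sector within the same 4096-byte region touches the device once; only a read outside that region triggers another device access.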

Writing to the virtual disk makes things slightly more complicated. I decided upon an implementation that lets the virtual disk be either write-through or held totally in RAM. These options need a bit of explanation. A write-through virtual disk will place writes into cache and also write them to the real disk image itself. This is an ideal setup for something like Linux-style loopback disks. However, consider the "live CD" situation: we can't write back to a CD! So the second option writes only into the cache without affecting the original file. This means all changes are kept only for as long as the system is running and do not persist across a reboot. Implementing these two write options generalised the module - a massive bonus.
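
To make the two options concrete, here is a small sketch under my own assumptions: the WriteMode and VirtualDisk names are hypothetical, the "image" is a plain byte vector, and the cache is kept per byte in a map purely to keep the example short (a real module would cache whole sectors or pages).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// The two write behaviours described above (names are illustrative).
enum class WriteMode { WriteThrough, RamOnly };

class VirtualDisk
{
public:
    VirtualDisk(std::vector<uint8_t> &image, WriteMode mode)
        : m_Image(image), m_Mode(mode) {}

    void write(size_t offset, uint8_t value)
    {
        // Every write lands in the cache first.
        m_Cache[offset] = value;

        // Only a write-through disk also updates the real image.
        if (m_Mode == WriteMode::WriteThrough && offset < m_Image.size())
            m_Image[offset] = value;
    }

    uint8_t read(size_t offset) const
    {
        // Cached (possibly modified) data wins over the original image.
        std::map<size_t, uint8_t>::const_iterator it = m_Cache.find(offset);
        if (it != m_Cache.end())
            return it->second;
        return offset < m_Image.size() ? m_Image[offset] : 0;
    }

private:
    std::vector<uint8_t> &m_Image;
    WriteMode m_Mode;
    std::map<size_t, uint8_t> m_Cache;
};
```

In RamOnly mode the image is never touched, so the changes vanish with the VirtualDisk object, which is the behaviour a live CD needs.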

However, all is not well at this stage. A module to provide a disk is only one part of the battle. At this stage, all we have is an abstraction of a disk - no partitions, no file systems - so it's useless for normal usage. At the time of implementation, Pedigree had no way to dynamically detect and mount such disks. This simply will not do for a modern operating system where storage devices are hardly static - USB mass storage devices come and go, hot-pluggable hard disks exist, and so on.

I didn't want to have my loopback disk module talk directly to the partition driver though. Exposing the internals of the partitioner to other modules makes changing the partitioner’s interface awfully complicated, and creates an explicit dependency on a specific module.

I decided instead to implement what I call Pedigree’s service manager. This kernel feature sits between different parts of the operating system and provides a standardised interface to other modules. Each service provides the following types of functionality (at the time of writing):

  • Write: Send data to the module
  • Read: Read data from the module
  • Touch: Inform the module of new state
  • Probe: Probe the module for a specific state or piece of information

Each service decides which features it provides, so it is possible for a service to provide only read as a function. The service manager takes these potential features and provides a generic interface for drivers and modules to talk to named services. In effect, this idea of services is a method of inter-process communication using named destinations. Therefore, with this new service manager, I was able to modify the partitioner to add support for the touch service. There is no need for a partitioner to support read, write, or probe, as the only notification to be sent to the partitioner is to inform it of a new disk.
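
The shape of this idea can be sketched in a few lines. This is a simplified illustration and not Pedigree's real interface: Feature, Features, Service, Registry, DiskListener and announce are all names invented for the example, and std::string stands in for the kernel's own string type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// The four operation types listed above, as bit flags.
enum Feature { Write = 1, Read = 2, Touch = 4, Probe = 8 };

// Records which operations a service has chosen to provide.
class Features
{
public:
    void add(Feature f) { m_Bits |= f; }
    bool provides(Feature f) const { return (m_Bits & f) != 0; }

private:
    unsigned m_Bits = 0;
};

// Generic entry point that every service implements.
class Service
{
public:
    virtual ~Service() {}
    virtual bool serve(Feature f, void *data, size_t len) = 0;
};

// Maps service names to providers, so callers never name a module directly.
class Registry
{
public:
    void add(const std::string &name, Service *s, const Features &f)
    {
        m_Entries[name] = Entry{s, f};
    }

    Service *get(const std::string &name) const
    {
        std::map<std::string, Entry>::const_iterator it = m_Entries.find(name);
        return it == m_Entries.end() ? nullptr : it->second.service;
    }

    const Features *features(const std::string &name) const
    {
        std::map<std::string, Entry>::const_iterator it = m_Entries.find(name);
        return it == m_Entries.end() ? nullptr : &it->second.features;
    }

private:
    struct Entry { Service *service; Features features; };
    std::map<std::string, Entry> m_Entries;
};

// A partitioner-like service that only understands Touch.
class DiskListener : public Service
{
public:
    size_t disksSeen = 0;

    bool serve(Feature f, void *data, size_t) override
    {
        if (f != Touch || !data)
            return false;
        ++disksSeen; // a real partitioner would probe the new disk here
        return true;
    }
};

// Caller side: announce a new object to a named service, if it accepts Touch.
bool announce(const Registry &r, const std::string &name, void *obj, size_t len)
{
    const Features *f = r.features(name);
    Service *s = r.get(name);
    if (!f || !s || !f->provides(Touch))
        return false;
    return s->serve(Touch, obj, len);
}
```

The caller only knows the string "partition" and the generic serve() call, so whatever is registered under that name can change without the caller being rebuilt.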

With a quick modification to the loopback disk code, I was able to inform the partitioner of the presence of the new disk, with error handling and without using any partitioner-specific functionality:

// Chat to the partition service and let it pick up that we're around now
ServiceFeatures *pFeatures = ServiceManager::instance().enumerateOperations(String("partition"));
Service         *pService  = ServiceManager::instance().getService(String("partition"));
NOTICE("Asking if the partition provider supports touch");
if(pFeatures->provides(ServiceFeatures::touch))
{
    NOTICE("It does, attempting to inform the partitioner of our presence...");
    if(pService)
    {
        // Hand the service a generic pointer to this disk, plus its size
        if(pService->serve(ServiceFeatures::touch, reinterpret_cast<void*>(this), sizeof(FileDisk)))
            NOTICE("Successful.");
        else
            ERROR("Failed.");
    }
    else
        ERROR("FileDisk: Couldn't tell the partition service about the new disk presence");
}
else
    ERROR("FileDisk: Partition service doesn't appear to support touch");



This feature has already been added to other areas of the kernel (mainly for talking to the partitioner), but it could even be extended to call applications that the user runs. It would then be theoretically possible to replace the partitioner at runtime, or to replace a component of the network stack to provide a different level of service. Pedigree can therefore be modular and flexible, even though it uses the conventionally rigid “monolithic kernel” design. Now that’s something to write home about!

NOTE: Blogger simply will not let that code sample work without wrapping it (it looks right in the preview and text editor). You should be able to get the idea that I'm trying to convey though.