The cardinal sin of Scratchbox, a backhanded stab at the Maemo SDK+, and the promise of a pot of gold
Sunday, 7. Dec 2008
Ok, this is a long one. This was floating around my brain for a while, and it is time to lay it to rest. Get a coffee and make yourself comfortable.
Scratchbox is a success, and life has adapted to its kinks. Our software has formed calluses around the places where Scratchbox used to cause pain.
If all goes according to plan, then the Maemo SDK will move from Scratchbox 1 to Scratchbox 2 in the near future. This might require our software to develop calluses in new places, but that is OK. The benefits of SDK+ will hopefully outweigh the cost of switching.
So I think this is as good an opportunity as any to take a look at what Scratchbox and the Maemo SDK get wrong, in my opinion.
The big sin of Scratchbox 1 is wanton redirection to places that are outside the target. The Maemo SDK inherited that sin without questioning it, and I am afraid the Maemo SDK+ will continue with it, even though Scratchbox 2 would allow us to repent.
By way of example, consider how Scratchbox and the Maemo SDKs handle “autoconf”.
The nice trick of Scratchbox 1 and 2 is to put you in an environment that looks and feels like a native ARM machine (say) by emulating the CPU, but processor-hungry tools like the compiler are still natively fast. This trick is achieved by provisioning alternative binaries of the processor-hungry tools. These alternative binaries are functionally equivalent to the real ones, but they use the instruction set of the native CPU, not the emulated one. Scratchbox magically executes them instead of the real binaries and thus avoids the emulation overhead.
So, how would you expect Scratchbox to handle autoconf, which is a shell script?
I would expect Scratchbox to do nothing about the /usr/bin/autoconf script itself. The desired speedup comes from having an alternative binary for /bin/sh, the interpreter.
Still, Scratchbox has autoconf in /scratchbox/tools/bin/ and accesses to /usr/bin/autoconf get redirected to /scratchbox/tools/bin/autoconf. Why?
The only reason can be bootstrapping or more generally, supporting incomplete systems. This is another basic trick of Scratchbox: you have a running autoconf even when your /usr/bin is still empty. This can be a great help when breaking the cyclic dependencies so typical of bootstrapping.
But, do we need the redirection magic for this? After all, if /usr/bin is empty, the shell will not find autoconf there and will not even attempt to run /usr/bin/autoconf. That’s why /scratchbox/tools/bin is included in your PATH by default, and autoconf and a host of other much more useful tools such as ls, cp, and mkdir are available to you from there even if your root filesystem is empty.
But, as soon as you have bootstrapped your system enough to have its own mkdir or autoconf, Scratchbox should step back and hand control over to the growing system. The emulation magic of Scratchbox will make sure that the system’s tools can be used even when they are compiled for a different architecture.
Thus, redirection is not really useful for bootstrapping or incomplete systems, and should not be used to make ghosts of binaries available in places that are really empty. Stepping back and handing over control can then be as simple as putting /scratchbox/tools/bin at the end of PATH.
This is not what Scratchbox does, and I frankly don’t know why. It’s easy to fix, by changing the default path and redirection rules, but still it puzzles me why the defaults for the Maemo SDK are the way they are. Defaults matter a lot when build bots enter the picture that run the Maemo SDK in its default configuration.
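The PATH half of that fix can be sketched in a few lines of shell. This is only an illustration: demote_path_entry is a made-up helper name, not something Scratchbox ships, and it assumes that PATH entries contain no glob characters and that empty entries may be dropped.

```shell
#!/bin/sh
# Move one directory to the end of a colon-separated PATH-style list.
# Usage: demote_path_entry DIR LIST
# The function body runs in a subshell, so the IFS change does not leak.
demote_path_entry() (
    dir=$1
    result=
    IFS=:
    for entry in $2; do
        # Skip every occurrence of DIR; it is appended once at the end.
        [ "$entry" = "$dir" ] && continue
        result=${result:+$result:}$entry
    done
    printf '%s\n' "${result:+$result:}$dir"
)
```

Inside a target, one would then run something like PATH=$(demote_path_entry /scratchbox/tools/bin "$PATH"), so that the system's own tools win as soon as they exist. The redirection rules still need to be changed separately.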
The way Scratchbox tries to stay in control of your growing system, you might think it is jealous and doesn’t want it to grow up enough to become independent. With the current setup, you are bound to grow not only incomplete, but also inconsistent systems that are forever bound to run nowhere else but in Scratchbox.
As a simple example, some portable programs hardcode the location of perl into their scripts. If you need these programs during early phases of bootstrapping, before you have added perl itself to the system, they will pick up /scratchbox/tools/bin/perl and will happily go to work bootstrapping the system further. Later, after bringing perl into the system, you will retrace the dependency cycle so that the programs will now hardcode /usr/bin/perl into their scripts, the real location of perl in the system that you are building. With the default Scratchbox, this is not what is happening: even on the second round, the programs will hardcode /scratchbox/tools/bin/perl. This of course will not work when the program is running in a real Maemo device that only has /usr/bin/perl.
(Incidentally, some scripts in the Maemo SDK+ rootstraps still refer to programs in /scratchbox/tools/bin.)
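A quick way to check a tree for such leftovers is to look at the first line of every file. The following is a rough sketch, not an existing SDK tool: find_leaked_shebangs is a hypothetical name, and the code assumes file names without newlines.

```shell
#!/bin/sh
# List files under ROOT whose "#!" line points into PREFIX.
# PREFIX defaults to /scratchbox/tools/bin.
# Usage: find_leaked_shebangs ROOT [PREFIX]
find_leaked_shebangs() {
    root=$1
    prefix=${2:-/scratchbox/tools/bin}
    find "$root" -type f | while IFS= read -r file; do
        first=
        # Read only the first line; a last line without a trailing
        # newline still counts, unreadable files are skipped.
        IFS= read -r first < "$file" || [ -n "$first" ] || continue
        case $first in
            "#!$prefix"/*|"#! $prefix"/*) printf '%s\n' "$file" ;;
        esac
    done
}
```

Running it over an unpacked rootstrap or over the contents of a freshly built package shows which scripts would break on a device that only has /usr/bin/perl.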
Worse, Scratchbox’s Perl might be sufficiently different from the one you want to have in your system, courtesy of devkits being a slow and notoriously hard-to-maintain distribution mechanism.
The same thing continues to happen even when bootstrapping should be long over. Essentially, what used to be a perfectly portable program has to be changed to compile correctly in the Maemo SDK. This is a useless activity, much more useless than making a program cross-compilable.
Thus, /scratchbox/tools/bin etc. should be at the end of PATH. But that is not enough: Scratchbox will still redirect /usr/bin/perl to /scratchbox/tools/bin/perl and you still have to fight against the core devkit. Except now everything turns to magic, “which” is lying to you, and you can spend the better part of the day figuring out why things don’t behave the way you expect them to.
So, redirection is useless and harmful and the default PATH inside Scratchbox is wrong.
But what about the first trick, avoiding the expensive CPU emulation for processor-hungry programs like the compiler? The speedup is crucial and it is worth tolerating a bit of redirection ugliness to get it. If we can get the speedup in a non-ugly way, that would be better.
The first important thing to realize is that the list of binaries to speed up can be quite short and very explicit: /bin/sh, the coreutils, /usr/bin/make, /usr/bin/perl, /usr/bin/m4, /usr/lib/gcc/*/cc1, /usr/lib/gcc/*/cc1plus, /usr/bin/as, and /usr/bin/ld should get us pretty far. It might even be enough. There is no point in trying to speed up interpreted programs and libraries like shell scripts and Perl modules. Bootstrapping is over: we redirect /usr/bin/perl in order to speed it up, not because we don’t have it yet.
The second important point is that to speed up /usr/bin/perl in this way, it has to be replaced with a functionally equivalent program. Same version, same compile-time configuration, same everything except for the target instruction set. Improving qemu to do some JITing with caching would be great. Taking the configured source and compiling it again, but this time for Intel, is probably a more realistic approach for the time being.
In other words, I don’t want to use Scratchbox’s i386 Perl binary to speed up Maemo’s armel Perl binary. I obviously want to use Maemo’s i386 Perl to speed up Maemo’s armel Perl.
I also don’t want to use Debian’s Perl to speed up Maemo’s Perl. This is what the Maemo SDK+ is doing and it is just as wrong as using Scratchbox’s Perl.
Scratchbox 2 does not so much redirect executables; what it does is better described as assembling a new filesystem according to mapping rules. It does this in a much cleaner and much more flexible way than Scratchbox 1.
What the Maemo SDK+ does with it, however, is to again shadow parts of Maemo freely and massively with things from outside of Maemo. Almost everything comes from a Debian etch chroot. Why?
This time, the reason seems to be that someone tried to learn a lesson from devkits. Devkits are hard to keep up-to-date and people continuously complained about them being behind Debian. So, the solution was to replace the devkits with Debian.
This is still missing the point, though: For people working on Maemo, Debian is just as hard to maintain as the devkits, even more so. Just try to seriously answer the question of exactly which Debian distribution we should be using for (half of) our build dependencies. Stable? Come on, even devkits are more recent. Testing? Maybe, but why not unstable? Maybe some packages from unstable and some from testing? Maybe people can decide on their own?
The complaint that devkits are out-of-date should be countered by removing devkits from the picture, but we shouldn’t stop there. General redirection to places outside of Maemo needs to stop. Maemo itself should contain exactly what we need. Build dependencies in devkits: bad, build dependencies in a distribution: hmmkay-ish. Build dependencies in our own distribution: for the win!
We should not use Perl from Debian, we should use our own Perl. It’s part of Maemo and not using it in the SDK is schizophrenic. If we need something from Debian, we should import it and make it part of Maemo, explicitly and in a controlled way, as we have been doing since the dawn of Maemo for packages that are primarily meant for the devices.
To put it another way: I want a simple setup that allows us to work on Maemo itself. I don’t want an SDK that is a mix of packages from an old version of Debian and from Maemo, implemented via obscure and overly specific hacks to high-level tools like dpkg-checkbuilddeps, and having to switch between underdocumented virtualization modes depending on whether I want to run “make all” or “make install”. Maemo can and should be big enough to contain its own build dependencies, and the virtualization tools should behave in a simple way.
So here is what I did as an experiment:
- I installed the Maemo SDK+ with both diablo_4.1.1 rootstraps.
- I modified the configuration so that the armel rootstrap uses the i386 rootstrap as its “tool root”, and I disabled the gcc magic that Scratchbox 2 usually does.
- I wrote a new mapping mode for Scratchbox 2 that essentially gives you the following layout:
/ -> diablo_4.1.1_armel rootstrap
/home -> host
/dev -> host
/proc -> host
/sys -> host
/etc/passwd -> host
/etc/resolv.conf -> host
/usr/share/scratchbox2 -> host
At this point, you should have a working, fully emulated Diablo 4.1.1 environment. (Maybe this is the same as the “emulate” mode of Scratchbox 2, but I prefer to start from scratch in order to understand the magic better.)
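To make the layout concrete, here is a small shell model of how a path inside the target resolves under it. This is not Scratchbox 2 rule syntax (the real mapping mode is a rule file), and both resolve_path and the ROOTSTRAP location are made up for illustration.

```shell
#!/bin/sh
# Print where a path seen inside the target is actually looked up.
# ROOTSTRAP is a hypothetical location of the unpacked armel rootstrap.
# Usage: resolve_path PATH
resolve_path() {
    path=$1
    rootstrap=${ROOTSTRAP:-/opt/rootstraps/diablo_4.1.1_armel}
    case $path in
        # Whole directory trees that come from the host unchanged.
        /home|/home/*|/dev|/dev/*|/proc|/proc/*|/sys|/sys/*)
            printf '%s\n' "$path" ;;
        /usr/share/scratchbox2|/usr/share/scratchbox2/*)
            printf '%s\n' "$path" ;;
        # Single files that come from the host.
        /etc/passwd|/etc/resolv.conf)
            printf '%s\n' "$path" ;;
        # Everything else lives in the armel rootstrap.
        *)
            printf '%s\n' "$rootstrap$path" ;;
    esac
}
```

The point of the model is that nothing outside this short list escapes the rootstrap: /usr/bin/perl is the rootstrap’s perl, and /etc/passwd is the only part of /etc that comes from the host besides resolv.conf.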
You can compile stuff, but it is slow. Also, qemu is less than transparent and can’t seem to run /usr/bin/make, for example, which is a bit of a show stopper. Also also, find, xargs, and md5sum are missing and some tools expect /scratchbox/tools/bin to be there… oh well.
- The following mappings make native binaries visible inside the Scratchbox target:
/tools -> diablo_4.1.1_i386 rootstrap, read only
/opt/maemo/ -> host
- Then some symlinks for speed:
/bin/bash -> /tools/bin/bash
/usr/bin/make -> /tools/usr/bin/make
/usr/bin/m4 -> /tools/usr/bin/m4
/usr/bin/perl -> /tools/usr/bin/perl
- Also for Perl .so modules:
/usr/lib/perl/5.8/auto/File/Glob/Glob.so -> /tools/...
- For the compiler, I made a small wrapper since as a cross compiler, it doesn’t dare to look into the standard places for header files and libraries. This needs to be done more cleanly and more thoroughly of course and maybe I should be using Scratchbox’s magic instead. But for now I prefer it explicit:
$cc -I/usr/include -L/usr/lib -Wl,-rpath-link -Wl,/usr/lib "$@"
With this, “dpkg-buildpackage” runs successfully and with no real slowdown. I am happy.
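For reference, the compiler wrapper can be written out as a complete script along the following lines. This is a sketch rather than the exact script used in the experiment: REAL_CC is a hypothetical variable that stands in for the $cc above and has to name the actual cross compiler binary.

```shell
#!/bin/sh
# Forward all arguments to the real cross compiler, adding the target's
# standard include and library directories, which a cross compiler does
# not search on its own.
# REAL_CC (hypothetical) must be set to the cross compiler binary.
run_cross_cc() {
    : "${REAL_CC:?set REAL_CC to the cross compiler binary}"
    "$REAL_CC" -I/usr/include -L/usr/lib \
        -Wl,-rpath-link -Wl,/usr/lib "$@"
}
```

Installed as the compiler driver inside the target, the script would end with a call to run_cross_cc "$@". The explicit flags are placed before the caller’s own arguments.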
- Fakeroot doesn’t seem to work. (I used the fantastic “sb -R” to get around this. Love it. But fakeroot just needs to be there as well for completeness.)
- The cross-compiler toolchain needs to move into Maemo and needs to be properly tuned.
- Some tools need to be written to manage the symlinks, maybe in cooperation with the actual packages whose binaries we redirect.
- The Maemo warts need to be removed, such as the missing /usr/bin/find.
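The symlink management tool could start out as small as the sketch below. It is only a starting point under stated assumptions: link_native_tools and TARGET_ROOT are invented names, the native binaries are assumed to be visible under /tools as in the experiment, and the original target binary is kept next to the link with a “.target” suffix so that the redirection can be undone.

```shell
#!/bin/sh
# Point selected paths in the target at their native counterparts in /tools.
# TARGET_ROOT (hypothetical) is the directory holding the target's root
# filesystem; leave it empty when running inside the target itself.
# Usage: link_native_tools /usr/bin/make /usr/bin/m4 ...
link_native_tools() {
    target_root=${TARGET_ROOT:-}
    for path in "$@"; do
        dest=$target_root$path
        # Keep the original target binary so the link can be reverted.
        if [ -e "$dest" ] && [ ! -L "$dest" ]; then
            mv "$dest" "$dest.target"
        fi
        mkdir -p "$(dirname "$dest")"
        ln -sf "/tools$path" "$dest"
    done
}
```

A call such as link_native_tools /bin/bash /usr/bin/make /usr/bin/m4 /usr/bin/perl reproduces the symlinks from the experiment. Cooperation with the packages themselves, so that an upgrade does not silently overwrite a link, is not covered here.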
That’s it! Thanks for reading, and happy crossing!