The Valiant Little Tailor
"The Valiant Little Tailor" (or short: Tailor) is a project developed by Bernhard Heinloth, Valentin Rothberg, Andreas Ruprecht, supervised by Anil Kurmus and Reinhard Tartler, which allows to generate a small Linux kernel configuration tailored to a specific use case. The general idea and the design originate from the ktrim project.
The tools were developed for a practical course with the VAMOS project (1) at Friedrich-Alexander-University Erlangen-Nuremberg, and improved for publications in the proceedings of the HotDep 12 workshop (2) and the NDSS 2013 conference (3).
A new method called Flipper for collecting the tracing information was developed for publication at GPCE '14 (4). If you want to try this, please follow the detailed instructions on the Flipper documentation page and the corresponding README in the tarball.
On the project website (1) you can also find more papers describing the undertaker tool and its variability model in greater detail.
This documentation walks through the steps required to build such a configuration and shows how to use our tools to do it. While the general approach is distribution independent, we were able to improve the results on Ubuntu by using its upstart infrastructure. We explain this in the additional HOWTO file (TailorHowto) as part of a complete recommended workflow for the Ubuntu distribution. The same can be done for other distributions, so patches and instructions for porting are always welcome!
Overview of the approach
In order to tailor a kernel configuration specific for a use case, we can use the kernel's own ftrace infrastructure to collect a log file containing the addresses of all functions that have been instrumented. As there might be functionality needed in the system that is only executed during the boot phase, we provide a simple modification of the initramfs (currently Ubuntu only!) in order to make it possible to start tracing very early in the boot process.
We can then use debug information to correlate these addresses from the text segment to locations in kernel source files. This typically yields a list of around 10,000 to 15,000 distinct locations in the source code.
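The correlation step can be sketched with addr2line directly. This is only a minimal illustration, not necessarily what undertaker-tailor does internally. It assumes that the first whitespace-separated column of each trace line is a hexadecimal address (the exact format of the trace file is an assumption here), and the function names unique_addresses and resolve_addresses are hypothetical.

```shell
#!/bin/sh
# Sketch: map traced addresses to source locations with addr2line.
# Assumption: the first column of each trace line is a hexadecimal
# address; any further column (e.g. a module name) is ignored here.

# Print the distinct addresses found in a trace file, one per line.
unique_addresses() {
    awk 'NF > 0 { print $1 }' "$1" | sort -u
}

# Resolve addresses read from stdin against a binary with debug
# information. addr2line prints one "file:line" entry per address and
# reads addresses from stdin when none are given on the command line.
resolve_addresses() {
    addr2line -e "$1"
}

# Usage (the vmlinux path is an assumption):
#   unique_addresses /run/undertaker-trace.out \
#       | resolve_addresses ~/debug-3.2.0/vmlinux | sort -u
```

Addresses that stem from loadable kernel modules would have to be resolved against the module's own binary, which this sketch does not attempt.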
Using the undertaker tool and its ability to compute Kconfig models for a Linux source tree, we can now get the Kconfig constraints needed to actually compile every one of these locations, including possible dependencies between Kconfig features. By combining these constraints for every location found in the traces, we get a formula describing the Kconfig constraints for the entire list.
This formula can be solved by a SAT solver, setting the variables (and therefore configuration options) to be on or off. By using heuristics to keep as many features turned off as possible, we end up with a kernel configuration containing a very small set of configuration options that are needed for the given use case.
More details on the approach can also be found in (2), (3) and (4).
Requirements
- Ubuntu 12.04 or later
- Binaries with debug information for the current kernel
- The undertaker tool in version 1.4 or later, with all its dependencies
- The addr2line tool, found in the GNU Binary Utilities
- A Linux kernel source tree for your traceable kernel
- Kconfig models with file inference information for the Linux kernel (calculated by calling "undertaker-kconfigdump -i x86" in the Linux source tree you want to trace)
- Root/sudo access on the machine you want to trace
- Possibly a defined use case, in order to run a test workload (although you can always just use it on your own machine, and execute your everyday tasks)
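Whether the command-line tools from this list are installed can be checked up front. The following is a minimal sketch; the function names have_tool and missing_tools are hypothetical, and the check only covers tools on the PATH, not the kernel sources, models, or debug binaries.

```shell
#!/bin/sh
# Sketch: check that the tools named in the requirements are on PATH.

# Return 0 if the named command can be found, 1 otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print every tool from the argument list that is missing;
# return 1 if at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! have_tool "$tool"; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Tool names as used in this documentation.
if ! missing=$(missing_tools undertaker undertaker-kconfigdump \
        undertaker-tailor undertaker-tracecontrol addr2line); then
    echo "Missing tools:" $missing >&2
fi
```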
Tool descriptions
- undertaker-tracecontrol: Allows you to start/stop the collection of system traces
- undertaker-tracecontrol-prepare: Prepares your Ubuntu system to be able to collect traces (see HOWTO file for further instructions for Ubuntu).
- undertaker-traceutil: This tool watches the kernel's own ftrace log for addresses it has not encountered yet, feeds ftrace's ignore list with already discovered functions, and writes a file containing unique addresses encountered during tracing; if the address traced stems from within a loadable kernel module (LKM), the module's name is written to this file, too.
- undertaker-tailor: This script uses the collected log together with debug information to determine which address resolves to which line in the source code, and additionally helps you call the undertaker tool. It accepts user-defined white- and blacklists, and calls the undertaker tool with the correct parameters. It should be executed from within a Linux source tree.
How do I use it?
Download the undertaker tool either from the webpage (http://vamos.informatik.uni-erlangen.de/trac/undertaker) or install it on Debian or Ubuntu by using your package manager.
tailor@machine:~$ sudo apt-get install undertaker
Download the Linux kernel source for the currently running kernel; for example, this can be done with apt-get using the following command:
tailor@machine:~$ apt-get source linux-image-`uname -r`
Generate the model information for this Linux tree by changing into the source tree and executing undertaker-kconfigdump; if you are not using x86, replace the last parameter with your machine's architecture.
tailor@machine:~$ cd linux-3.2.0/
tailor@machine:~/linux-3.2.0$ undertaker-kconfigdump -i x86
Also make sure that you have kernel binaries with debug information, which some distributors provide in their repositories (for Ubuntu, see HOWTO). For the remainder of this documentation, suppose we have stored this in ~/debug-3.2.0/.
Once you have fulfilled all these requirements, you're ready to start tracing. You can do so by typing:
tailor@machine:~$ sudo undertaker-tracecontrol module
This fires up undertaker-traceutil and includes support for LKM addresses; it can also run as a background process once you have entered your password for sudo. On Ubuntu, you can set up the system to start tracing during the boot process, which might capture additional information about your system (e.g. calls that only occur from the startup scripts). Please see HOWTO for this.
Our tool now collects traces and stores them in a text file called undertaker-trace.out in /run. You should now run your use case on the system; for example, if you are tracing a LAMP server, you could run a benchmark or generate some traffic to trigger the execution of relevant code paths in the kernel (see (2) or (3) for the use cases we used).
If you're confident that the collected traces are sufficient (i.e. if the output file '/run/undertaker-trace.out' doesn't grow anymore while still running the test workload), you can stop tracing by using the command
tailor@machine:~$ sudo undertaker-tracecontrol stop
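The check whether the trace file still grows, mentioned before the stop command, can be scripted by comparing the file size at two points in time. This is a minimal sketch; the function names and the 60-second interval in the usage comment are assumptions, not part of the tools.

```shell
#!/bin/sh
# Sketch: report whether a file's size stays the same over an interval.

# Print the size of a file in bytes.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Return 0 if the file did not change size during the given number
# of seconds, 1 otherwise.
is_stable() {
    before=$(file_size "$1")
    sleep "$2"
    after=$(file_size "$1")
    [ "$before" -eq "$after" ]
}

# Usage (run while the test workload is still active):
#   is_stable /run/undertaker-trace.out 60 && echo "no new addresses"
```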
Now you can start the undertaker-tailor script to generate a configuration. First, please change into the previously prepared kernel source tree.
tailor@machine:~$ cd linux-3.2.0/
Please make sure that you have read access to the undertaker-traceutil output file (/run/undertaker-trace.out); otherwise please run
tailor@machine:~/linux-3.2.0$ sudo chmod og+r /run/undertaker-trace.out
Now you can execute the undertaker-tailor tool, passing it the collected trace file. The tool is highly configurable so that it also works on distributions other than Ubuntu, but a typical call on x86 looks like this (for a specific call for Ubuntu, see HOWTO):
tailor@machine:~/linux-3.2.0$ undertaker-tailor -a -c -k ~/debug-3.2.0 /run/undertaker-trace.out
-a (for "automatic") will assume "." as the source and debug information tree, the model "./models/x86.model" and black- and whitelists for the x86 or x86_64 architecture in the subdirectory tailor/lists (or "/usr/etc/undertaker", if the Debian package was installed). Please note that -a currently only works for x86_64 and x86; patches for other architectures are always welcome!
In order to override the path to debug information, the -k parameter is given with the actual path. The -c parameter is used to generate a complete configuration that is expanded by the Kconfig framework after the undertaker run. For a complete list of parameters, please see the output of "undertaker-tailor -h".
Important note on black- and whitelists:
You can supply the undertaker tool with white- and blacklists (using the -b and -w parameters for undertaker-tailor), which contain Kconfig features to be turned on or off according to your needs. If a dependency requirement from the log file conflicts with a list, the list takes precedence. This makes it possible, for example, to automatically remove the ftrace infrastructure from a tailored kernel.
Furthermore, we need these black- and whitelists because some features in Linux are not traceable and therefore cannot appear in the ftrace log file. In the undertaker source subdirectory "tailor/lists" (or if you installed the undertaker Debian package, in "/usr/etc/undertaker/"), we provide sample black- and whitelists for the x86 and x86_64 architectures, which we needed to employ to get a bootable kernel for Ubuntu 12.04. Additionally, the undertaker tool currently has issues parsing a few files. These files are listed in the tailor/lists/undertaker.ignore file (Debian package: /usr/etc/undertaker/undertaker.ignore), and the undertaker-tailor tool will filter them out of the generated list before feeding it into the undertaker tool.
The list is automatically employed on x86 and x86_64 if you use the -a parameter, or you can use a custom version by providing the -i parameter to undertaker-tailor. Once this call is finished, the .config file will contain your tailored configuration, so you can now compile this kernel (using the command below will also generate a .deb package that can be easily installed using dpkg) and use it.
tailor@machine:~/linux-3.2.0$ make deb-pkg -j6
tailor@machine:~/linux-3.2.0$ cd .. && sudo dpkg -i ./linux-image-*.deb
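To get a rough impression of how small the tailored configuration is, one can count the options that are set to "y" or "m" in the generated .config and compare the number with the distribution's configuration. This is a minimal sketch; the function name count_enabled is hypothetical, and the location of the distribution configuration under /boot is an assumption that holds on typical Ubuntu installations.

```shell
#!/bin/sh
# Sketch: count options set to y (built in) or m (module) in a
# Kconfig .config file. Disabled options appear as comments
# ("# CONFIG_FOO is not set") and are therefore not counted.

count_enabled() {
    grep -c -E '^CONFIG_[A-Za-z0-9_]+=[ym]$' "$1"
}

# Usage (paths are assumptions):
#   count_enabled /boot/config-"$(uname -r)"   # distribution kernel
#   count_enabled ~/linux-3.2.0/.config        # tailored kernel
```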
Remarks on the approach
Our approach assumes that the workload run during tracing is representative of the whole use case; in other words, the traces need to be "sufficiently complete". However, as can be seen in the evaluation section in (3), we do not need to execute every single code path that is used in the target system. The results presented there show that a configuration obtained from a smaller sample scenario also provides full usability in the bigger scenario.
In both (2) and (3), reviewers were concerned about what might happen if you execute something on your tailored kernel that was not encountered during tracing, and whether this might cause the kernel to crash. While this cannot be completely excluded (and, with respect to the remark above, seems to be no problem for "closely related code"), it would not be a drawback of our approach, but would rather point to bugs in the application or the kernel. Configuring the kernel with its own infrastructure (which considers interdependencies through the constraints provided for every feature) leaves no "loose ends" in the kernel that could lead to "sudden breaks" in the code, but keeps the kernel in a consistent state. This is also discussed in more detail in (3). Of course we cannot give guarantees for completely different scenarios (e.g. tailoring a kernel for a LAMP server scenario and using it as a gaming machine), but a) this was not the goal of our approach, and b) it should not lead to kernel crashes, but rather to errors handled and explained to you by the people who wrote the application and kernel code.
Known issues
- Proprietary drivers not working: As our approach is based on source code and the Linux source tree, proprietary drivers cannot be traced and analyzed; therefore any devices requiring proprietary drivers will not work with the tailored kernel.
- Sound not working on an Ubuntu Desktop machine: Our approach will not enable any sound parser codecs, it will only select the driver itself. If you want sound, you have to manually edit the config before compiling or add the desired Kconfig option to the whitelist.
- The tools were only tested on Ubuntu; patches or comments about necessary adaptations for porting them to other distributions are welcome.
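For the sound issue listed above, adding an option to a custom whitelist could look like the following sketch. It assumes that a list file contains one Kconfig symbol per line (please check the sample lists shipped with undertaker for the actual format); the function name add_to_list and the file name my.whitelist are hypothetical, and the codec symbol is only an example that may not match your hardware.

```shell
#!/bin/sh
# Sketch: add a Kconfig symbol to a custom list file, at most once.
# Assumption: the list format is one symbol per line.

add_to_list() {
    list=$1
    symbol=$2
    touch "$list"
    # -x matches whole lines only, -F treats the symbol as a fixed string.
    grep -qxF "$symbol" "$list" || echo "$symbol" >> "$list"
}

# Usage (file name and symbol are examples only):
#   add_to_list my.whitelist CONFIG_SND_HDA_CODEC_REALTEK
# The resulting file can then be passed to undertaker-tailor with -w.
```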
References
- (1): VAMOS: Variability Management in Operating Systems, http://www4.cs.fau.de/Research/VAMOS/
- (2): Automatic OS Kernel TCB Reduction by Leveraging Compile-Time Configurability, HotDep '12, Hollywood, CA, http://www4.cs.fau.de/Publications/2012/tartler_12_hotdep.pdf
- (3): Attack Surface Metrics and Automated Compile-Time OS Kernel Tailoring, NDSS '13, San Diego, CA
- (4): Automatic Feature Selection in Large-Scale System-Software Product Lines, GPCE '14, Västerås, Sweden