The personal laptop is the workhorse of the modern-day IT person. It is a highly dynamic system, because you hone it to your needs over a long time: you install software, configure it, structure its filesystem, create shortcuts, write scripts... and tomorrow you reconfigure again. When your laptop gets stolen on the train, a big pain is losing your data... including your config data! It does not have to be this way, but "just make a backup" does not cut it. In this blog post, I first analyze the problem, present a useful way of thinking about it, and explore a spectrum of what can be done. At the end, I present my personal practical solution.
In the following, all examples are taken from Linux, but the concepts are completely OS-agnostic.
The following workflow is the absolute minimum standard for most people, including past me. User data like documents, photos, etc. is backed up. Beyond the user data, you make changes directly to the system. Say, for example, you want to change the keyboard shortcuts of your file manager: you open a settings window, change some settings, and apply them by clicking "Close".
Remaining problem: These changes are not tracked and are therefore not reproducible, at least not automatically.
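For reference, the "user data is backed up" half of this workflow is typically just a copy job. A minimal sketch with rsync (the source folders and destination path are assumptions for illustration):

```bash
# Copy user data (class 4 in the classification below) to an external drive:
# -a preserves permissions and timestamps, --delete mirrors removals.
rsync -a --delete ~/Documents ~/Pictures /mnt/backup/userdata/
```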
Let's hold on for a moment and ask a very basic question. What exactly is the configuration of a system? It is useful to divide it into four classes. The classes are ordered from "hardware-specific" to "hardware-independent".
| Class | Examples |
|---|---|
| 1. OS-to-hardware | `/etc/fstab`, CPU architecture |
| 2. OS-to-user | System language, keyboard layout |
| 3. Tools | Installed software and its configuration |
| 4. User data | JPG files, docx files, ... |
We should make only classes 2 to 4 reproducible, but not class 1. Why? When your laptop is stolen, you are very likely to buy more modern hardware than the one you just lost: it may come with an ARM architecture instead of x86, contain a larger SSD than before, require a new partitioning, etc. Config of class 1 needs to be created anew, not reproduced. Now that we know what we want... how do we achieve it?
It is very helpful to think of configuring a system in terms of two notions: the configuration ("source code") and the target ("the actual configured OS"). Compilers work like this: they take source code and produce a binary. You never make changes to the target, only to the source code. What is the deal? Abstraction. Assembly code ("binary") is painful, but high-level code ("source") provides very useful abstractions, like functions (allowing you to escape `GOTO` spaghetti code). Only the source code is put under version control, because you can always run the compiler to obtain the target.
What's the point? As it turns out, viewing the following discussion through this lens of "config vs. target" can be quite helpful. Moreover, each approach below can be depicted as a diagram of the same sort, with the config on one side, the target on the other, and the routines connecting them.
A very easy way to capture all of classes 2 to 4 is to back up everything, including class 1. When you want to set up your new hardware, you simply copy everything back. This corresponds to two simple routines, "store" and "restore". The approach has the following major disadvantage though.
Remaining problem: You need to recreate class 1 by hand. Consider a new laptop with a new SSD that requires a new partitioning. You will need to know how class 1 is implemented, i.e. how to update `/etc/fstab` by hand. So you crawl through the `fstab` manpage, debug in unknown environments... Being guided through the OS installation process was so much more convenient! If you switched to a new CPU architecture, it is even worse: none of your binaries work anymore.
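For comparison, the "store" routine of this approach is essentially a clone of the entire filesystem. A minimal sketch, again with rsync (the mount point and exclusion list are assumptions; virtual filesystems must be skipped):

```bash
# Full backup of classes 1-4 alike; pseudo-filesystems are excluded.
sudo rsync -aAX --delete \
  --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*"} \
  / /mnt/backup/system/
```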
The problem with approach 1 is that too little is tracked. The problem with approach 2 is that too much is tracked.
The most common improvement to approach 1 is to extend the config by tool-specific config files, the so-called dotfiles¹. Dotfiles typically declare the user configuration for various tools in a very condensed way, often (but not always!) in a human-readable format like JSON, XML, INI, ... For instance, the file `/home/user/.mozilla/firefox/profiles.ini` defines the Firefox profiles of a given user. There are different solutions for keeping track of dotfiles; e.g., one could use GNU Stow in combination with a version control system like Git. The details of managing dotfiles are a topic of their own, but let's note for now that this in principle adds two routines, "store dotfiles" and "restore dotfiles", to our workflow. Alas, tracking dotfiles is not sufficient:
Remaining problem: Dotfiles cover only a small part of classes 2 and 3. For example, although your Firefox profile is stored in some kind of dotfile, the fact that Firefox is installed at all is not tracked².
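For concreteness, here is what the Stow-based tracking mentioned above might look like. This is only a sketch; the repo layout (one Stow "package" directory per tool) is an assumption:

```bash
# Move a dotfile into the version-controlled repo...
mkdir -p ~/dotfiles/bash
mv ~/.bashrc ~/dotfiles/bash/.bashrc
cd ~/dotfiles
git init && git add -A && git commit -m "track dotfiles"

# ...and "restore dotfiles" = recreate the symlink
# ~/.bashrc -> ~/dotfiles/bash/.bashrc in the home directory.
stow --target="$HOME" bash
```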
The problem with approach 3 is that parts of the configuration are not tracked, e.g. which software packages are installed via the package manager. The easiest solution is to write a restore script, e.g. one that runs the `sudo apt install firefox cowsay` command. Commands like these may encode any configuration that is not already captured via dotfiles, in this case "have the `firefox` and `cowsay` packages installed".
Instead of writing one long bash script, one should use a tool like Ansible. The main advantage of Ansible is that it allows splitting such a script into multiple tasks. This makes it easier to execute only parts of the script, e.g. for debugging. Just like the dotfiles, the restore script is now part of the config and therefore version-controlled. This adds one more routine to our workflow, "run script". Notice that installing Firefox now involves first updating the script and then running it; the update has therefore been moved to the config side ("source"), away from the target side.
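To illustrate the task-splitting idea without Ansible syntax, here is a sketch of such a restore script in plain bash; the task names and the selection mechanism are assumptions for illustration:

```bash
#!/usr/bin/env bash
set -euo pipefail

task_packages() {
  sudo apt-get update
  # Re-running is harmless: apt skips packages that are already installed.
  sudo apt-get install -y firefox cowsay
}

task_hosts() {
  echo "configuring /etc/hosts ..."  # see the idempotency discussion below
}

# Run all tasks by default, or only those named on the command line,
# e.g. "./restore.sh packages" to debug a single task.
if [ $# -eq 0 ]; then tasks=(packages hosts); else tasks=("$@"); fi
for t in "${tasks[@]}"; do
  "task_$t"
done
```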
Remaining problem: The restore script comes with a caveat: it must be idempotent. Running a task multiple times must have the same effect as running it once. In mathematical terms, if $f$ is the function that restores the system from state $s$ to a state $f(s)$, then $f$ being idempotent means that $f(f(s)) = f(s)$ for all states $s$.
Notice that while the dotfiles are declarative in nature (they declare the final system state), the restore script is imperative in nature (it specifies how to achieve that state). Ensuring idempotency of this script requires great care and may become quite complex, as the following simple example demonstrates.
Consider the case where you want to append the line `127.0.0.1 test.localhost` to the `/etc/hosts` file. The script needs to check whether the line is already present in the file and, if not, append it. If that check were not performed, the line would be appended multiple times and idempotency would be violated. While Ansible provides some abstractions like "ensure this line is present", it is still a long way from the high-level abstractions that some programming languages provide.
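In plain bash, such a guarded append could look like the following one-liner; writing via sudo tee is just one common way to modify a root-owned file:

```bash
LINE='127.0.0.1 test.localhost'
# -q: quiet, -x: match the whole line, -F: fixed string (no regex).
# Append only if the exact line is missing, keeping the task idempotent.
grep -qxF "$LINE" /etc/hosts || echo "$LINE" | sudo tee -a /etc/hosts >/dev/null
```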
Trying to ensure idempotency, it is very easy to shoot yourself in the foot. The stakes are quite high: bugs in the script may bring the system into a state that cannot easily be rolled back. It is like operating directly on the patient, without any sandboxing that would allow trying things out safely³.
The alternative to an imperative restore script is a declarative source file, simply called the config declaration in the following. Now what is meant by "declarative"? It means that you write down your configuration in a functional language and then simply push the "compile" button, just as you would with any high-level programming language. The `/etc/hosts` example below contrasts this style with the imperative one from above.
Now unlike compiling source code to a single binary, here the source code is compiled to a whole system state, i.e. a file system consisting of binaries and config files. The combination of "compiling" and "applying" a config adds another routine to our workflow, called "activate" for short in the following. Here are the main selling points.
In a declarative language, adding a new entry to `/etc/hosts` is as simple as modifying a single expression; the system then automatically ensures the file is updated correctly. For our example, the change would be to update the following line...
```
make_hosts_file(default_hosts)
```
to...
```
make_hosts_file(default_hosts ++ [{ ip: "127.0.0.1", domain: "test.localhost" }])
```
Not only does such a language allow a cleaner separation of classes 1, 2 and 3, it also makes changes to any of the three classes much easier. Even changing the CPU architecture, resulting in a radically different system, consists only of changing an expression like...
make_system("arm", config)
to...
make_system("x86", config)
Existing solutions for this declarative approach include the Nix package manager and Guix.
Remaining problem: The idea is somewhat radical because once you switch to a high-level language, it is hard to embed low-level components. If a desktop application only allows configuration through a GUI, resulting in a non-human-readable file, it’s difficult to include it in a declarative system. You’d need to either automate the GUI process, find a way to integrate its configuration into the declarative system, or handle it manually.
Unlike server software, which is often accessed remotely via the terminal, a lot of desktop software depends heavily on GUIs. Unix people often love configuration via TUI, and I believe this is also due to the idea of "dealing with source code" (Unix people are programmers). Let me drop a hot take here: GUIs should be used for as much configuration as possible! GUI settings dialogs guide you through the process, provide explanations and, most importantly, prevent invalid configurations ("built-in type-checking"). We should not give up this convenience on the desktop. The workflows around the declarative approach of Nix stand in contrast to this.
Why not take the best of all worlds? Interestingly, the approaches are very compatible with each other. The problem of approach 5 is its incompatibility with GUI-based workflows. But how about we use approach 5 only selectively, where it suits us (e.g. to configure command-line tools), and use approach 4 otherwise?
In the following I summarize what it means to take the best of all worlds: There are four "worlds" that are combined, resulting in six routines. For each world, I highlight its unique advantage.
There was a time when I was very excited about approach 5 (using Nix). I believe this was due to its purity; the idea of "declare your OS in a functional language" seemed very intriguing. I have personally become much more pragmatic and use approach 6, trading purity for simplicity. The exact details of how each world is implemented are each a topic of their own and not the main goal of this blog post. Because I have benefited a lot (!) from reading about other people's solutions online, I conclude this blog post by presenting my personal implementation of approach 6 in the next section.
In practice, I've done quite some scripting to automate all four components. Was this necessary? Probably not. Approach 6 does require integrating multiple tools, though, and I have not found a 4-worlds-in-1 solution yet, although I would not be surprised if one exists. Researching how to make the personal computer reproducible has been a very interesting (and ongoing) journey, and trying things out yourself is the most fun way to learn.
There is one more feature that I have not discussed because it is not relevant for everyone: I have split my whole config into a public repo and a private repo. Why is that? As mentioned, I benefited hugely from open source, and more generally from open knowledge, and this is my contribution; may it be a stepping stone for others. Splitting the whole config into two parts worked remarkably well. Of course, I had this in mind during the implementation, which is why it turned out quite composable.
Here's my implementation of approach 6.
- **Dotfiles:** Two scripts, `store` and `restore`, that copy the dotfiles to the repo's static directory and back. More explanations can be found here. Routines: `store`, `restore`.
- **Declarative config (Nix):** I use `nixpkgs` to install software. However, because of runtime dependencies of some GUI apps, I prefer to use as many packages as possible from Debian's APT package manager instead of mixing the two repositories. Here is the declaration of the Nix config file for my laptop. Routine: `activate`.
- **User data:** A script `luback` that offers two functions, `backup` (= store) and `restore`. The user data is first encrypted and then synced with a remote server using duplicity. Routines: `backup`, `restore`.
Footnotes:
1. So called because their filenames typically start with a dot, as in "`.bashrc`".
3. E.g., a buggy script run may corrupt the `/etc/fstab`, rendering the system in an unbootable state. The original state is recoverable, but clearly this kind of pain is highly undesirable.