
Updating Ubuntu packages that you have local changes for with dgit

By: cks
31 March 2026 at 22:07

Suppose, not entirely hypothetically, that you've made local changes to an Ubuntu package using dgit and now Ubuntu has come out with an update to that package that you want to switch to, with your local changes still on top. Back when I wrote about moving local changes to a new Ubuntu release with dgit, I wrote an appendix with a theory of how to do this, based on a conversation. Now that I've actually done this, I've discovered that there is a minor variation and I'm going to write it down explicitly (with additional notes because I forgot some things between then and now).

I'll assume we're starting from an existing dgit based repository with a full setup of local changes, including an updated debian/changelog. Our first step, for safety, is to make a branch to capture the current state of our repository. I suggest you name this branch after the current upstream package version that you're on top of, for example if the current upstream version you're adding local changes to can be summarized as 'ubuntu2.6':

git branch cslab-2.6

Making a branch allows you to use 'git diff cslab-2.6..' later to see exactly what changed between your versions. A useful thing to do here is to exclude the 'debian/' directory from diffs, which can be done with 'git diff cslab-2.6.. -- . :!debian', although your shell may require you to quote the '!' (cf).

Then we need to use dgit to fetch the upstream updates:

dgit fetch -d ubuntu

We need to use '-d ubuntu', at least in current versions of dgit, or 'dgit fetch' gets confused and fails. At this point we have the updated upstream in the remote tracking branch 'dgit/dgit/jammy,-security,-updates' but our local tree is still not updated.

(All of dgit's remote tracking branches start with 'dgit/dgit/', while all of its local branches start with just 'dgit/'. This is less than optimal for my clarity.)

Normally you would now rebase to shift your local changes on top of the new upstream, but we don't want to immediately do that. The problem is that our top commit is our own dgit-based change to debian/changelog, and we don't want to rebase that commit; instead we'll make a new version of it after we rebase our real local changes. So our first step is to discard our top commit:

git reset --hard HEAD~

(In my original theory I didn't realize we had to drop this commit before the rebase, not after, because otherwise things get confused. At a minimum, you wind up with debian/changelog out of order, and I don't know if dropping your HEAD commit after the rebase works right. It's possible you might get debian/changelog rebase conflicts as well, so I feel dropping your debian/changelog change before the rebase is cleaner.)

Now we can rebase, for which the simpler two-argument form does work (but not plain rebasing, or at least I didn't bother testing plain rebasing):

git rebase dgit/dgit/jammy,-security,-updates dgit/jammy,-security,-updates

(If you are wondering how this command possibly works, as I was part way through writing this entry, note that the first branch is 'dgit/dgit/...', ie our remote tracking branch, and the second branch is 'dgit/...', our local branch with our changes on it.)

At this point we should have all of our local changes stacked on top of the upstream changes, but no debian/changelog entry for them that will bump the package version. We create that with:

gbp dch --since dgit/dgit/jammy,-security,-updates --local .cslab. --ignore-branch --commit

Then we can build with 'dpkg-buildpackage -uc -b', and afterward do 'git clean -xdf; git reset --hard' to reset the tree back to its pristine state.

(My view is that while you can prepare a source package for your work if you want to, the 'source' artifact you really want to save is your dgit VCS repository. This will be (much) less bulky when you clean it up to get rid of all of the stuff (to be polite) that dpkg-buildpackage leaves behind.)

Here in 2026, we're retaining old systems instead of discarding them

By: cks
31 March 2026 at 02:45

I mentioned recently that at work, we're retaining old systems that we would have normally discarded. We're doing this for the obvious reason that new servers have become increasingly expensive, due to escalating prices of RAM (especially DDR5 RAM) and all forms of SSDs, especially as new servers might really require us to buy ones that support U.2 NVMe instead of SATA SSDs (because I'm not sure how available SATA SSDs are these days).

Our servers are generally fairly old anyways, so our retention takes two forms. The straightforward one is that we're likely going to slow down pushing old servers completely out of service. Instead, we'll keep them on the shelf in case we want test or low importance machines, and along with that we're probably going to be more careful about which generation of hardware we use for new machines. We've traditionally simply used the latest hardware any time we turn over a machine (for example, updating it to a new Ubuntu version), but this time around a bunch of those machines will reuse what we consider second generation hardware, or even older hardware for machines where we don't care too much if one is down for a day or two.

The second form of retention is that we're sweeping up older hardware that other groups at the university are disposing of, when in the past we'd have passed on the offer or taken only a small number of machines. For example, we just inherited a bunch of Supermicro servers and Lenovo P330 desktops (both old enough that they use DDR4 RAM), and in the past we'd have taken only a few of each at most. These inherited servers are likely to be used as part of what we consider 'second generation' hardware, equivalent to Dell R340s and R240s (and perhaps somewhat better in practice), so we'll use them for somewhat less important machines but ones where we still actually care.

(A couple of the inherited servers have already been reused as test servers.)

The hardware we're inheriting is perfectly good hardware and it'll probably work reliably for years to come (and if not, we have a fair number of spares now). But it's hardware with several years of use and wear already on it, and there's nothing special about it that makes it significantly better than the sort of second generation hardware we already have. However, we're looking at a future where we may not be able to afford to get new general purpose 1U servers and our current server fleet is all we'll have for a few years, even as some of them break or increasingly age out. So we're hoarding what we can get, in case. Maybe we won't need them, but if we do need them and we pass them up now, we'll really regret it.

(The same logic applies to the desktops. We don't have any immediate, obvious use for them, but at the same time they're not something we could get a replacement for if we pass on them now. We'll probably put a number of them to use for things we might not have bothered with if we had to get new machines; for example, I may set one up as a backup for my vintage 2017 office desktop.)

I suspect that there will be more of this sort of retention university-wide, whether or not the retained hardware gets used in the end. We're not in a situation where we can assume a ready supply of fresh hardware, so we'd maybe better hold on to what we have if it still works.

How old our servers are (as of 2026)

By: cks
30 March 2026 at 02:25

Back in 2022, I wrote about how old our servers were at the time, partly because they're older than you might expect, and today I want to update that with our current situation. My group handles the general departmental infrastructure for the research side of the department (the teaching side is a different group), and we've tended to keep servers for quite a while. Research groups are a different matter; they often have much more modern servers and turn them over much faster.

As in past installments, our normal servers remain Dell 1U servers. What we consider our current generation are Dell R350s, which it looks like we got about two years ago in 2024 (and are now out of production). We still have plenty of Dell R340s and R240s in production, which were our most recent generation in 2022. We still have some Dell R230s and even R210 IIs in production in less important server roles. We also have a fair number of Supermicro servers in production, of assorted ages and in assorted roles (including our fileservers and our giant login server, which is now somewhat old).

(On a casual look, the Dell R210 IIs are all for machines that we consider decidedly unimportant; they're still in service because we haven't had to touch them. Our current view is that R350s are for important servers, and R340s and R240s are acceptable for less important ones.)

In a change from 2022, we turned over the hardware for our fileservers somewhat recently, 'modernizing' all of our ZFS filesystems in the process. The current fileservers have 512 GBytes of RAM in each, so I expect that we'll run this hardware for more than five years unless prices drop drastically back to what they were when we could afford to get a half-dozen machines with a combined multiple terabytes of (DDR5) RAM.

(Today, a single machine with 128 GBytes of DDR5 RAM and some U.2 NVMe drives came out far more expensive than we hoped (and the prices forced us to lower the amount of RAM we were targeting).)

Our SLURM cluster is quite a mix of machines. We have both CPU-focused and GPU-focused machines, and on both sides there's a lot of hand-built machines stuffed into rack cases. On the GPU side, the vendor servers are mostly Dell 3930s; on the CPU side, they're mostly Supermicro servers. A significant number of these servers are relatively old by now; the 3930s appear to date from 2019, for example. We have updated the GPUs somewhat but we mostly haven't bothered to update the servers otherwise, as we assume people mostly want GPU computation in GPU SLURM nodes. Even the CPU nodes are not necessarily the most modern; half of them (still) have Threadripper 2990WX CPUs (launched in 2018, and hand built into the same systems as in 2022). With RAM prices being the way they are, it's unlikely that we'll replace these CPU nodes with anything more recent in the near future.

With current hardware prices being what they are (and current and future likely funding levels), I don't think we're likely to get a new generation of 1U servers in the moderate future. We have one particular important server getting a hardware refresh soon, but apart from that we'll run servers on the hardware we have available today. This may mean we have to accept more hardware failures than usual (our usual amount of server hardware failures is roughly zero), but hopefully we'll have a big enough pool of old spare servers to deal with this.

(I expect us to reuse a lot more old servers than we traditionally have. For instance, our first generation of Linux ZFS fileservers date from 2018 but they've been completely reliable and they have a lot of disk bays and decent amounts of RAM. Surely we can find uses for that.)

PS: If I'm doing the math correctly, we have roughly 10 TBytes of DDR4 RAM of various sizes in machines that report DMI information to our metrics system, compared to roughly 6 TBytes of DDR5 RAM. That DDR5 RAM number is unlikely to go up by much any time soon; the DDR4 number probably will, for various reasons beyond the scope of this entry. This doesn't include our old fileserver hardware, which is currently turned off and not in service (and so not reporting DMI information about their decent amount of DDR4 RAM).

New old systems in the age of hardware shortages

By: cks
29 March 2026 at 02:52

Recently I asked something on the Fediverse:

Lazyweb, if you were going to put together new DDR4-based desktop (because you already have the RAM and disks), what CPU would you use? Integrated graphics would probably be ideal because my needs are modest and that saves wrangling a GPU.

(Also I'm interested in your motherboard opinions, but the motherboard needs 2x M.2 and 2x to 4x SATA, which makes life harder. And maybe 4K@60Hz DisplayPort output, for integrated graphics)

If I was thinking of building a new desktop under normal circumstances, I would use all modern components (which is to say, current generation CPU, motherboard, RAM, and so on). But RAM is absurdly expensive these days, so building a new DDR5-based system with the same 64 GBytes of RAM that I currently have would cost over a thousand dollars Canadian just for the RAM. The only particularly feasible way to replace such an existing system today is to reuse as many components as possible, which means reusing my DDR4 RAM. In turn, this means that a lot of the rest of the system will be 'old'. By this I don't necessarily mean that it will have been manufactured a while ago (although it may have) but that its features and capabilities will be from a while back.

If you want an AMD CPU for your DDR4-based system, it will have to be an AM4 CPU and motherboard. I'm not sure how old good CPUs are for AM4, but the one you want may be as old as a 2022 CPU (Ryzen 5 5600; other more recent options don't seem to be as well regarded). Intel's 14th generation CPUs ("Raptor Lake") from late 2023 still support DDR4 with compatible motherboards, but at this point you're still looking at things launched two years or more ago, which at one point was an eternity in CPUs.

(It's still somewhat of an eternity in CPUs, especially AMD, because AMD has introduced support for various useful instructions since then. For instance, Go's latest garbage collector would like you to have AVX-512 support. Intel desktop CPUs appear to have no AVX-512 at all, though.)

Beyond CPU performance, older CPUs (and often older motherboards) also mean that you have older PCIe standards, fewer PCIe lanes, fewer high speed USB ports, and so on. You're not going to get the latest PCIe from an older CPU and chipset. Then you may step down in other components as well (like GPUs and NVMe drives), depending on how long you expect to keep them, or opt to keep your current components if those are good enough.

My impression is that such 'new old systems' have usually been a relatively unusual thing in the PC market, and that historically people have upgraded to the current generation. This led to a steady increase in baseline capabilities over time, as you could assume that desktop hardware would age out on a somewhat consistent basis. If people are buying new old systems and keeping old systems outright, that may significantly affect not just the progress of performance but also the diffusion of new features (such as AVX-512 support) into the CPU population.

The other aspect of this is, well, why bother upgrading to a new old system at all, instead of keeping your existing old old system? If your old system works, you may not get much from upgrading to a new old system. If your old system doesn't have enough performance or features, spending money on a new old system may not get you enough of an improvement to eliminate your problems (although it may mitigate them a bit). New old systems are effectively a temporary bridge, and there's a limit to how much people are willing to spend on temporary bridges unless they have to. This also seems likely to slow down both the diffusion of nice new CPU features and the steady increase in general performance that you could once assume.

(At work, the current situation has definitely caused us to start retaining machines that we would have discarded in the past, and in fact were planning to discard until quite recently.)

PS: One potentially useful thing you can get out of a new old system like this is access to newer features like PCIe bifurcation or decent UEFI firmware that your current system doesn't support or have.

Canonical's Netplan is hard to deal with in automation

By: cks
28 March 2026 at 03:10

Suppose, not entirely hypothetically, that you've traditionally used /etc/resolv.conf on your Ubuntu servers but you're considering switching to systemd-resolved, partly for fast failover if your normal primary DNS server is unavailable and partly because it feels increasingly dangerous not to, since resolved is the normal configuration and what software is likely to expect. One of the ways that resolv.conf is nice is that you can set the configuration by simply copying a single file that isn't used for anything else. On Ubuntu, this is unfortunately not the case for systemd-resolved.

Canonical expects you to operate all of your Ubuntu server networking through Canonical Netplan. In reality, Netplan renders things down to a systemd-networkd configuration, which has some important effects and creates some limitations. Part of that rendered networkd configuration is your DNS resolution settings, and the natural effect of this is that they have to be associated with some interface, because that's the resolved model of the world. This means that Netplan specifically attaches DNS server information to a specific network interface in your Netplan configuration, so you must find the specific device name and then modify settings within it, settings that are intermingled (in the same file) with settings you can't touch.

(Sometimes Netplan goes the other way, separating interface specific configuration out to a completely separate section.)

Netplan does not give you a good way to do this; if anything, it goes out of its way not to. For example, Netplan can dump its full or partial configuration, but it does so in YAML form with no option for JSON (which you could readily search through in a script with jq). However, if you want to modify the Netplan YAML without editing it by hand, 'netplan set' sometimes requires JSON as input. The lack of any good way to search or query Netplan's YAML matters because for things like DNS settings, you need to know the right interface name. Without support for this in Netplan, you wind up doing hacks to try to get the right interface name.

Netplan also doesn't provide you any good way to remove settings. The current Ubuntu 26.04 beta installer writes a Netplan configuration that locks your interfaces to specific MAC addresses:

  enp1s0:
    match:
      macaddress: "52:54:00:a5:d5:fb"
    [...]
    set-name: "enp1s0"

This is rather undesirable if you may someday swap network cards or transplant server disks from one chassis to another, so we would like to automatically take it out. Netplan provides no support for this; 'netplan set' can't be given a blank replacement, for example (and 'netplan set "network.ethernets.enp1s0.match={}"' doesn't do anything). If Netplan would give you all of the enp1s0 block in JSON format, maybe you could edit the JSON and replace the whole thing, but that's not available so far.

(For extra complication you also need to delete the set-name, which is only valid with a 'match:'.)

Another effect of not being able to delete things in scripts is that you can't write scripts that move settings out to a different Netplan file that has only your settings for what you care about. If you could reliably get the right interface name and you could delete DNS settings from the file the installer wrote, you could fairly readily create an '/etc/netplan/60-resolv.yaml' file that was something close to a drop-in /etc/resolv.conf. But as it is, you can't readily do that.
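For illustration, such a drop-in could look something like the following. The interface name, resolver addresses, and search domain here are all hypothetical, and you still need to know the right interface name, which is exactly the problem described above; Netplan merges files in /etc/netplan/ in lexical order, with later files layered over earlier ones:

  # /etc/netplan/60-resolv.yaml (hypothetical)
  network:
    version: 2
    ethernets:
      enp1s0:
        nameservers:
          addresses: [192.0.2.53, 192.0.2.54]
          search: [example.org]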

There are all sorts of modifications you might want to make through a script, such as automatically configuring a known set of VLANs to attach them to whatever the appropriate host interface is. Scripts are good for automation and they're also good for avoiding errors, especially if you're doing repetitive things with slight differences (such as setting up a dozen VLANs on your DHCP server). Netplan fights you almost all the way about doing anything like this.

My best guess is that all of Canonical's uses of Netplan either use internal tooling that reuses Netplan's (C) API or simply re-write Netplan files from scratch (based on, for example, cloud provider configuration information).

(To save other people the time, the netplan Python package on PyPI seems to be a third party package and was last updated in 2019. Which is a pity, because it theoretically has a quite useful command line tool.)

One bleakly amusing thing I've found out through using 'netplan set' on Ubuntu 26.04 is that the Ubuntu server installer and Netplan itself have slightly different views on how Netplan files should be written. The original installer version of the above didn't have the quotes around the strings; 'netplan set' added them.

(All of this would be better if there was a widely agreed on, generally shipped YAML equivalent of 'jq', or better yet something that could also modify YAML in place as well as query it in forms that were useful for automation. But the 'jq for YAML' ecosystem appears to be fragmented at best.)

Considering mmap() versus plain reads for my recent code

By: cks
26 March 2026 at 23:05

The other day I wrote about a brute force approach to mapping IPv4 /24 subnets to Autonomous System Numbers (ASNs), where I built a big, somewhat sparse file of four-byte records, with the record for each /24 at a fixed byte position determined by its first three octets (so 0.0.0.0/24's ASN, if any, is at byte 0, 0.0.1.0/24 is at byte 4, and so on). My initial approach was to open, lseek(), and read() to access the data; in a comment, Aristotle Pagaltzis wondered if mmap() would perform better. The short answer is that for my specific case I think it would be worse, but the issue is interesting to talk about.

(In general, my view is that you should use mmap() primarily if it makes the code cleaner and simpler. Using mmap() for performance is a potentially fraught endeavour that you need to benchmark.)

In my case I have two strikes against mmap() likely being a performance advantage: I'm working in Python (and specifically Python 2) so I can't really directly use the mmap()'d memory, and I'm normally only making a single lookup in the typical case (because my program is running as a CGI). In the non-mmap() case I expect to do an open(), an lseek(), and a read() (which will trigger the kernel possibly reading from disk and then definitely copying data to me). In the mmap() case I would do open(), mmap(), and then access some page, triggering possible kernel IO and then causing the kernel to manipulate process memory mappings to map the page into my address space. In general, it seems unlikely that mmap() plus the page access handling will be cheaper than lseek() plus read().

(In both the mmap() and read() cases I expect two transitions into and out of the kernel. As far as I know, lseek() is a cheap system call (and certainly it seems unlikely to be more expensive than mmap(), which has to do a bunch of internal kernel work), and the extra work the read() does to copy data from the kernel to user space is probably no more work than the kernel manipulating page tables, and could be less.)
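For concreteness, here's a sketch of the access patterns in Python 3 (the entry's actual code is Python 2); the four-byte-per-/24 record layout follows the description above, and all function names here are mine:

```python
import mmap
import os
import struct

RECORD = 4  # four bytes per /24, matching the data file layout


def offset_for(ip):
    # The first three octets of an IPv4 address select its /24's record.
    a, b, c = (int(x) for x in ip.split(".")[:3])
    return ((a << 16) | (b << 8) | c) * RECORD


def lookup_read(path, ip):
    # open + lseek + read: the approach the entry uses.
    with open(path, "rb") as f:
        f.seek(offset_for(ip))
        data = f.read(RECORD)
    return struct.unpack(">I", data)[0] if len(data) == RECORD else 0


def lookup_pread(path, ip):
    # pread() combines the seek and the read into one system call
    # (available in Python 3's os module, but not in Python 2).
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, RECORD, offset_for(ip))
    finally:
        os.close(fd)
    return struct.unpack(">I", data)[0] if len(data) == RECORD else 0


def lookup_mmap(path, ip):
    # open + mmap, then a slice access that may fault in the page.
    off = offset_for(ip)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            data = m[off:off + RECORD]
    return struct.unpack(">I", data)[0] if len(data) == RECORD else 0
```

In the single-lookup CGI case, lookup_read (or lookup_pread) likely does no more kernel work per process than setting up and tearing down a mapping does.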

If I was doing more lookups in a single process, I could possibly win with the mmap() approach but it's not certain. A lot depends on how often I would be looking up something on an already mapped page and how expensive mapping in a new page is compared to some number of lseek() plus read() system calls (or pread() system calls if I had access to that, which cuts the number of system calls in half). In some scenarios, such as a burst of traffic from the same network or a closely related set of networks, I could see a high hit rate on already mapped pages. In others, the IPv4 addresses are basically random and widely distributed, so many lookups would require mapping new pages.

(Using mmap() makes it unnecessary to keep my own in-process cache, but I don't think it really changes what the kernel will cache for me. Both read()'ing from pages and accessing them through mmap() keeps them recently used.)

Things would also be better in a language where I could easily make zero-copy use of data right out of the mmap()'d pages themselves. Python is not such a language, and I believe that basically any access to the mmap()'d data is going to create new objects and copy some bytes around. I expect that this results in as many intermediate objects and so on as if I used Python's read() stuff.

(Of course if I really cared there's no substitute for actually benchmarking some code. I don't care that much, and the code is simpler with the regular IO approach because I have to use the regular IO approach when writing the data file.)

Early notes on switching some libvirt-based virtual machines to UEFI

By: cks
26 March 2026 at 03:09

I keep around a small collection of virtual machines so I don't have to drag out one of our spare physical servers to test things on. These virtual machines have traditionally used traditional MBR-based booting ('BIOS' in libvirt instead of 'UEFI'), partly because for a long time libvirt didn't support snapshots of UEFI based virtual machines and snapshots are very important for my use of these scratch virtual machines. However, I recently discovered that libvirt now can do snapshots of UEFI based virtual machines, and also all of our physical server installs are UEFI based, so in the past couple of days I've experimented with moving some of my Ubuntu scratch VMs from BIOS to UEFI.

As far as I know, virt-manager and virsh don't directly allow you to switch a virtual machine between BIOS and UEFI after it's been created, partly because the result is probably not going to boot (unless you deliberately set up the OS inside the VM with both an EFI boot and a BIOS MBR boot environment). Within virt-manager, you can only select BIOS or UEFI at setup time, so you have to destroy your virtual machine and recreate it. This works, but it's a bit annoying.

(On the other hand, if you've had some virtual machines sitting around for years and years, you might want to refresh all of their settings anyway.)

It's possible to change between BIOS and UEFI by directly editing the libvirt XML to transform the <os> node. You may want to remove any old snapshots first because I don't know what happens if you revert from a 'changed to UEFI' machine to a snapshot where your virtual machine was a BIOS one. In my view, the easiest way to get the necessary XML is to create (or recreate) another virtual machine with UEFI, and then dump and copy its XML with some minor alterations.

For me, on Fedora with the latest libvirt and company, the <os> XML of a BIOS booting machine is:

 <os>
   <type arch='x86_64' machine='pc-q35-6.1'>hvm</type>
 </os>

Here the 'machine=' is the machine type I picked, which I believe is the better of the two options virt-manager gives me.

My UEFI based machines look like this:

 <os firmware='efi'>
   <type arch='x86_64' machine='pc-q35-9.2'>hvm</type>
   <firmware>
     <feature enabled='yes' name='enrolled-keys'/>
     <feature enabled='yes' name='secure-boot'/>
   </firmware>
   <loader readonly='yes' secure='yes' type='pflash' format='qcow2'>/usr/share/edk2/ovmf/OVMF_CODE_4M.secboot.qcow2</loader>
   <nvram template='/usr/share/edk2/ovmf/OVMF_VARS_4M.secboot.qcow2' templateFormat='qcow2' format='qcow2'>/var/lib/libvirt/qemu/nvram/[machine name]_VARS.qcow2</nvram>
 </os>

Here the '[machine name]' bit is the libvirt name of my virtual machine, such as 'vmguest1'. This nvram file doesn't have to exist in advance; libvirt will create it the first time you start up the virtual machine. I believe it's used to provide snapshots of the UEFI variables and so on to go with snapshots of your physical disks and snapshots of the virtual machine configuration.

(This feature may have landed in libvirt 10.10.0, if I'm reading release notes correctly. Certainly reading the release notes suggests that I don't want to use anything before then with UEFI snapshots.)

Manually changing the XML on one of my scratch machines has worked fine to switch it from BIOS MBR to UEFI booting as far as I can tell, but I carefully cleared all of its disk state and removed all of its snapshots before I tried this. I suspect that I could switch it back to BIOS if I wanted to. Over time, I'll probably change over all of my as yet unchanged scratch virtual machines to UEFI through direct XML editing, because it's the less annoying approach for me. Now that I've looked this up, I'll probably do it through 'virsh edit ...' rather than virt-manager, because that way I get my real editor.

(This is the kind of entry I write for my future use because I don't want to have to re-derive this stuff.)

PS: Much of this comes from this question and answers.

Going from an IPv4 address to an ASN in Python 2 with Unix brute force

By: cks
25 March 2026 at 02:45

For reasons, I've reached the point where I would like to be able to map IPv4 addresses into the organizations responsible for them, which is to say their Autonomous System Number (ASN), for use in DWiki, the blog engine of Wandering Thoughts. So today on the Fediverse I mused:

Current status: wondering if I can design an on-disk (read only) data structure of some sort that would allow a Python 2 program to efficiently map an IP address to an ASN. There are good in-memory data structures for this but you have to load the whole thing into memory and my Python 2 program runs as a CGI so no, not even with pickle.

(Since this is Python 2, about all I have access to is gdbm or rolling my own direct structure.)

Mapping IP addresses to ASNs comes up a lot in routing Internet traffic, so there are good in-memory data structures that are designed to let you efficiently answer these questions once you have everything loaded. But I don't think anyone really worries about on-disk versions of this information, even though that's the case I care about (and I only care about some ASNs, a detail I forgot to put in the Fediverse post).

Then I had a realization:

If I'm willing to do this by /24 (and I am) and represent the ASNs by 16-bit ints, I guess you can do this with a 32 Mbyte sparse file of two-byte blocks. Seek to a two-byte offset determined by the first three octets of the IP, read two bytes, if they're zero there's no ASN mapping we care about, otherwise they're the ASN in some byte order I'd determine.

If I don't care about the specific ASN, just a class of ASNs of interest of which there are at most 255, it's only 16 Mbytes.

(And if all I care about is a yes or no answer, I can represent each /24 by a bit, so the storage required drops even more, to only 2 Mbytes.)

This Fediverse post has a mistake. I thought ASNs were 16-bit numbers, but we've gone well beyond that by now. So I would want to use the one-byte 'class of ASN' approach, with ASNs I don't care about mapping to a class of zero. Alternately I could expand to storing three bytes for every /24, or four bytes to stay aligned with filesystem blocks.

That storage requirement is 'at most' because this will be a Unix sparse file, where filesystem blocks that aren't written to aren't stored on disk; when read, the data in them is all zero. The lookup is efficient, at least in terms of system calls; I'd open the file, lseek() to the position, and read two bytes (causing the system to read a filesystem block, however big that is). Python 2 doesn't have access to pread(), which would combine the lseek() and read() into a single system call.
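A minimal sketch of the one-byte 'class of ASN' variant described above (in Python 3 for convenience; the function names and layout details here are my own, not the actual implementation's):

```python
import struct

RECORD = 1  # one byte per /24; 0 means 'no ASN class we care about'


def slot(ip):
    # The first three octets of an IPv4 address pick out its /24's record.
    a, b, c = (int(x) for x in ip.split(".")[:3])
    return ((a << 16) | (b << 8) | c) * RECORD


def build(path, classes):
    # classes maps 'a.b.c.0/24' CIDR strings to small nonzero integers.
    # Only the records we write occupy disk; everything else stays sparse.
    with open(path, "wb") as f:
        for cidr, cls in sorted(classes.items()):
            f.seek(slot(cidr.split("/")[0]))
            f.write(struct.pack("B", cls))


def lookup(path, ip):
    # open, lseek, read; a short read past the end of file means 'no record'.
    with open(path, "rb") as f:
        f.seek(slot(ip))
        data = f.read(RECORD)
    return data[0] if data else 0
```

Unwritten regions inside the file read back as zero bytes, and a lookup past the end of the file comes back as a short read; both cases mean 'no class we care about'.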

Within the OS this should be reasonably efficient, because if things are active much of the important bits of the mapping file will be cached into memory and won't have to be read from disk. 32 Mbytes is nothing these days, at least in terms of active file cache, and much of the file will be sparse anyway. The OS obviously has reasonably efficient random access to the filesystem blocks of the file, whether in memory or on disk.

This is a fairly brute force approach that's only viable if you're typically making a single query in your process before you finish. It also feels like something that is a good fit for Unix because of sparse files, although 16 Mbytes isn't that big these days even for a non-sparse file.

Realizing the brute force approach feels quite liberating. I've been turning this problem over in my mind for a while but each time I thought of complicated data structures and complicated approaches and it was clear to me that I'd never implement them. This way is simple enough that I could actually do it and it's not too impractical.

PS: I don't know if I'll actually build this, but every time a horde of crawlers descends on Wandering Thoughts from a cloud provider that has a cloud of separate /24s and /23s all over the place, my motivation is going to increase. If I could easily block all netblocks of certain hosting providers all at once, I definitely would.

(To get the ASN data there's pyasn (also). Conveniently it has a simple on-disk format that can be post-processed to go from a set of CIDRs that map to ASNs to a data file that maps from /24s to ASN classes for ASNs (and classes) that I care about.)

Update: After writing most of this entry I got enthused and wrote a stand-alone preliminary implementation (initially storing full ASNs in four-byte records), which can both create the data file and query it. It was surprisingly straightforward and not very much code, which is probably what I should have expected since the core approach is so simple. With four-byte records, a full data file of all recent routes from pyasn is about 53 Mbytes and the data file can be created in less than two minutes, which is pretty good given that the code writes records for about 16.5 million /24s.

(The whole thing even appears to work, although I haven't strongly tested it.)

Fedora's virt-manager started using external snapshots for me as of Fedora 41

By: cks
24 March 2026 at 02:51

Today I made an unpleasant discovery about virt-manager on my (still) Fedora 42 machines that I shared on the Fediverse:

This is my face that Fedora virt-manager appears to have been defaulting to external snapshots for some time and SURPRISE, external snapshots can't be reverted by virsh. This is my face, especially as it seems to have completely screwed up even deleting snapshots on some virtual machines.

(I only discovered this today because today is the first time I tried to touch such a snapshot, either to revert to it or to clean it up. It's possible that there is some hidden default for what sort of snapshot to make and it's only been flipped for me.)

Neither virt-manager nor virsh will clearly tell you about this. In virt-manager you need to click on each snapshot and if it says 'external disk only', congratulations, you're in trouble. In virsh, 'virsh snapshot-list --external <vm>' will list external snapshots, and then 'virsh snapshot-list --tree <vm>' will tell you if they depend on any internal snapshots.

My largest problems came from virtual machines where I had earlier internal snapshots and then I took more snapshots, which became external snapshots from Fedora 41 onward. You definitely can't revert to an external snapshot in this situation, at least not with virsh or virt-manager, and the error messages I got were generic ones about not being able to revert external snapshots. I haven't tested reverting external snapshots for a VM with no internal ones.

(Not being able to revert to external snapshots is a long standing libvirt issue, but it's possible they now work if you only have external snapshots. Otherwise, Fedora 41 and Fedora 42 defaulting to external snapshots is extremely hard to understand (to be polite).)

Update: you can revert an external snapshot in the latest libvirt if all of your snapshots are external. You can't revert them if libvirt helpfully gave you external snapshots on top of internal ones by switching the default type of snapshots (probably in Fedora 41).

If you have an external snapshot that you need to revert to, all I can do is point to a libvirt wiki page on the topic (although it may be outdated by now) along with libvirt's documentation on its snapshot XML. I suspect that there is going to be suffering involved. I haven't tried to do this; when it came up today I could afford to throw away the external snapshot.

If you have internal snapshots and you're willing to throw away the external snapshot and what's built on it, you can use virsh or virt-manager to revert to an internal snapshot and then delete the external snapshot. This leaves the external snapshot's additional disk file or files dangling around for you to delete by hand.

If you have only an external snapshot, it appears that libvirt will let you delete the snapshot through 'virsh snapshot-delete <vm> <external-snapshot>', which preserves the current state of the machine's disks. This only helps if you don't want the snapshot any more, but this is one of my common cases (where I take precautionary snapshots before significant operations and then get rid of them later when I'm satisfied, or at least committed).

The worst situation appears to be if you have an external snapshot made after (and thus on top of) an earlier internal snapshot and you want to keep the live state of things while getting rid of the snapshots. As far as I can tell, it's impossible to do this through libvirt, although some of the documentation suggests that you should be able to. The process outlined in libvirt's Merging disk image chains didn't work for me (see also Disk image chains).

(If it worked, this operation would implicitly invalidate the snapshots and I don't know how you get rid of them inside libvirt, since you can't delete them normally. I suspect that to get rid of them, you need to shut down all of the libvirt daemons and then delete the XML files that (on Fedora) you'll find in /var/lib/libvirt/qemu/snapshot/<domain>.)

One reason to delete external snapshots you don't need is if you ever want to be able to easily revert snapshots in the future. I wouldn't trust making internal snapshots on top of external ones, if libvirt even lets you, so if you want to be able to easily revert, it currently appears that you need to have and use only internal snapshots. Certainly you can't mix new external snapshots with old internal snapshots, as I've seen.

(The 5.1.0 virt-manager release will warn you to not mix snapshot modes and defaults to whatever snapshot mode you're already using. I don't know what it defaults to if you don't have any snapshots; I haven't tried that yet.)

Sidebar: Cleaning this up on the most tangled virtual machine

I've tried the latest preview releases of the libvirt stuff, but it doesn't make a difference in the most tangled situation I have:

$ virsh snapshot-delete hl-fedora-36 fedora41-preupgrade
error: Failed to delete snapshot fedora41-preupgrade
error: Operation not supported: deleting external snapshot that has internal snapshot as parent not supported

This VM has an internal snapshot as the parent because I didn't clean up the first snapshot (taken before a Fedora 41 upgrade) before making the second one (taken before a Fedora 42 upgrade).

In theory one can use 'virsh blockcommit' to reduce everything down to a single file, per the knowledge base section on this. In practice it doesn't work in this situation:

$ virsh blockcommit hl-fedora-36 vda --verbose --pivot --active
error: invalid argument: could not find base image in chain for 'vda'

(I tried with --base too and that didn't help.)

I was going to attribute this to the internal snapshot but then I tried 'virsh blockcommit' on another virtual machine with only an external snapshot and it failed too. So I have no idea how this is supposed to work.

Since I could take a ZFS snapshot of the entire disk storage, I chose violence, which is to say direct usage of qemu-img. First, I determined that I couldn't trivially delete the internal snapshot before I did anything else:

$ qemu-img snapshot -d fedora40-preupgrade fedora35.fedora41-preupgrade
qemu-img: Could not delete snapshot 'fedora40-preupgrade': snapshot not found

The internal snapshot is in the underlying file 'fedora35.qcow2'. Maybe I could have deleted it safely even with an external thing sitting on top of it, but I decided not to do that yet and proceed to the main show:

$ qemu-img commit -d fedora35.fedora41-preupgrade
Image committed.
$ rm fedora35.fedora41-preupgrade

Using 'qemu-img info fedora35.qcow2' showed that the internal snapshot was still there, so I removed it with 'qemu-img snapshot -d' (this time on fedora35.qcow2).

All of this left libvirt's XML drastically out of step with the underlying disk situation. So I removed the XML for the snapshots (after saving a copy), made sure all libvirt services weren't running, and manually edited the VM's XML, where it turned out that all I needed to change was the name of the disk file. This appears to have worked fine.

I suspect that I could have skipped manually removing the internal snapshot and its XML and libvirt would then have been happy to see it and remove it.

(I'm writing all of the commands and results down partly for my future reference.)

Mass production's effects on the cheapest way to get some things

By: cks
23 March 2026 at 02:00

We have a bunch of networks in a number of buildings, and as part of looking after them, we want to monitor whether or not they're actually working. For reasons beyond the scope of this entry we don't do things like collect information from our switches through SNMP, so our best approach is 'ping something on the network in the relevant location'. This requires something to ping. We want that thing to be stable and always on the network, which typically rules out machines and devices run by other people, and we want it to run from standard wall power for various reasons.

You can imagine a bunch of solutions to this for both wired and wireless networks. There are lots of cheap little computers these days that can run Linux, so you could build some yourself or expect to find someone selling them pre-made. However, these are unlikely to be a mass produced volume product, and it turns out that the flipside of things only being cheap when there is volume is that if there is volume, unexpected things can be the cheapest option.

The cheapest wall-powered device you can put on your wireless network to ping these days turns out to be a remote controlled power plug intended for home automation (as a bonus it will report uptime information for you if you set it up right, so you can tell if it lost power recently). They can fail after a few years, but they're inexpensive so we consider them consumables. And if you have another device that turns out to be flaky and has to be power cycled every so often, you can reuse a 'wifi reachability sensor' for its actual remote power control capabilities.

Similarly, as far as we've found, the cheapest wall powered device that plugs into a wired Ethernet and can be given an IP address so it can be pinged is a basic five port managed switch. You give it a 'management IP', plug one port into the network, and optionally plug up its other four ports so no one uses it for connectivity (because it's a cheap switch and you don't necessarily trust it). You might even be able to find one that supports SNMP so you can get some additional information from it (although our current ones don't, as far as I can tell).

In both cases it's clear that these are cheap because of mass production. People are making lots of wireless remote controlled power plugs and five port managed switches, so right now you can get the switches for about $30 Canadian each and the power plugs for $10 Canadian. In both cases what we get is overkill for what we want, and you could do a simpler version that has a smaller, cheaper bill of materials (BOM). But that smaller version wouldn't have the volume so it would cost much more for us to get it or an approximation.

(Even if we designed and built our own, we probably can't beat the price of the wireless remote controlled power plugs. We might be able to get a cheaper BOM for a single-Ethernet simple computer with case and wall plug power supply, but that ignores staff time to design, program, and assemble the thing.)

At one level this makes me sad. We're wasting the reasonably decent capabilities of both devices, and it feels like there should be a more frugal and minimal option. But it's hard to see what it would be and how it could be so cheap and readily available.

A traditional path to getting lingering duplicate systems

By: cks
21 March 2026 at 20:36

In yesterday's entry I described a lingering duplicate system and how it had taken us a long time to get rid of it, but I got too distracted by the story to write down the general thoughts I had on how this sort of thing happens and keeps happening (also, the story turned out to be longer than I expected). We've had other long running duplicate systems, and often they have more or less the same story as yesterday's disk space usage tracking system.

The first system built is a basic system. It's not a bad system, but it's limited and you know it. You can only afford to gather disk usage information once a day and you have nowhere to put it other than in the filesystem, which makes it easy to find and independent of anything else but also stops it updating when the filesystem fills up. Over time you may improve this system (cheaper updates that happen more often, a limited amount of high resolution information), but the fundamental issues with it stick around.

After a while it becomes possible to build a different, better system (you gather disk usage information every few minutes and put it in your new metrics system), or maybe you just realize how to do a better version from scratch. But often the initial version of this new system has its own limitations or works a bit differently or both, or you've only implemented part of what you'd need for a full replacement of the first system. And maybe you're not sure it will fully work, that it's really the right answer, or if you'll be able to support it over the long term (perhaps the cardinality of the metrics will be too overwhelming).

(You may also be wary of falling victim to the "second system effect", since you know you're building a second system.)

Usually this means that you don't want to go through the effort and risk of immediately replacing the old system with the new system (if it's even immediately possible without more work on the new system). So you use the new system for new stuff (providing dashboards of disk space usage) and keep the old system for the old stuff (the officially supported commands that people know). The old system is working so it's easier to have it stay "for now". Even if you replace part of the use of the old system with the new system, you don't replace all of it.

(If your second system started out as only a partial version of the old system, you may also not be pushed to evolve it so that it could fully replace the old system, or that may only happen slowly. In some ways this is a good thing; you're getting practical experience with the basic version of the new system rather than immediately trying to build the full version. This is a reasonable way to avoid the "second system effect", and may lead you to find out that in the new system you want things to operate differently than the old one.)

Since both the old system and the new system are working, you now generally have little motivation to do more work to get rid of the old system. Until you run into clear limitations of the old system, moving back to only having one system is (usually) cleanup work, not a priority. If you wanted to let the new system run for a while to prove itself, it's also easy to simply lose track of this as a piece of future work; you won't necessarily put it on a calendar, and it's something that might be months or a year out even in the best of circumstances.

(The times when the cleanup is a potential priority are when the old system is using resources that you want back, including money for hardware or cloud stuff, or when the old system requires ongoing work.)

A contributing factor is that you may not be sure about what specific behaviors and bits of the old system other things are depending on. Some of these will be actual designed features that you can perhaps recover from documentation, but others may be things that simply grew that way and became accidentally load bearing. Figuring these out may take careful reverse engineering of how the system works and what things are doing with it, which takes work, and when the old system is working it's easier to leave it there.

Lingering duplicate systems and the expense of weeding them out (an illustration)

By: cks
21 March 2026 at 03:05

We have been operating a fileserver environment for a long time now, back before we used ZFS. When you operate fileservers in a traditional general Unix environment, one of the things you need is disk usage information. So a very long time ago, before I even arrived, people built a very Unix-y system to do this. Every night, raw usage information was generated for each filesystem (for a while with 'du'), written to a special system directory in the filesystem, and then used to create a text file with a report showing current usage and the daily and weekly change in everyone's usage. A local 'report disk usage' script would then basically run your pager on this file.

After a while, we were able to improve this system by using native ZFS commands to get per-user 'quota' usage information, which made it much faster than the old way (we couldn't do this originally because we started with ZFS before ZFS tracked this information). Later, this made it reasonable to generate a 'frequent' disk usage report every fifteen minutes (with it keeping a day's worth of data), which could be helpful to identify who had suddenly used a lot of disk space; we wrote some scripts to use this information, but never made them as public as the original script. However, all of this had various limitations, including that it stopped updating once the filesystem had filled up.

Shortly after we set up our Prometheus metrics system and actually had a flexible metrics system we could put things into, we started putting disk space usage information into it, giving us more fine grained data, more history (especially fine grained history, where we'd previously only had the past 24 hours), and the ability to put it into Grafana graphs on dashboards. Soon afterward it became obvious that sometimes the best way to expose information is through a command, so we wrote a command to dump out current disk usage information in a relatively primitive form.

Originally this 'getdiskusage' command produced quite raw output because it wasn't really intended for direct use. But over time, people (especially me) kept wanting more features and options and I never quite felt like writing some scripts to sit on top of it when I could just fiddle the code a bit more. Recently, I added some features and tipped myself over a critical edge, where it felt like I could easily re-do the old scripts to get their information from 'getdiskusage' instead of those frequently written files. One thing led to another and so now we have some new documentation and new (and revised) user-visible commands to go with it.

(The raw files were just lines of 'disk-space login', and this was pretty close to what getdiskusage produced already in some modes.)

However, despite replacing the commands, we haven't yet turned off the infrastructure on our fileservers that creates and updates those old disk usage files. Partly this is because I'd want to clean up all the existing generated files rather than leave them to become increasingly out of date, and that's a bit of a pain, and partly it's because of inertia.

Inertia is also a lot of why it took so long to replace the scripts. We've had the raw capability to replace them for roughly six years (since 'getdiskusage' was written, demonstrating that it was easily possible to extract the data from our metrics system in a usable form), and we'd said to each other that we wanted to do it for about that long, but it was always "someday". One reason for the inertia was that the existing old stuff worked fine, more or less, and also we didn't think very many people used it very often because it wasn't really documented or accessible. Perhaps another reason was that we weren't entirely sure we wanted to commit to the new system, or at least to the exact form we first implemented our disk space metrics in.

DMARC DNS record inheritance and DMARC alignment requirements

By: cks
20 March 2026 at 02:56

To simplify, DMARC is based on the domain in the 'From:' header, and what policy (if any) that domain specifies. As I've written about (and rediscovered) more than once (here and here), DMARC will look up the DNS record for the DMARC policy in exactly one of two places, either in the exact From: domain or on the organization's top level domain. In other words, if a message has a From: of 'someone@breaking.news.example.org', a receiver will first look for a DMARC TXT DNS record with the name _dmarc.breaking.news.example.org and then one with the name _dmarc.example.org.

(But there will be no lookup for _dmarc.news.example.org.)
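The two-candidate lookup order can be sketched as a small Python function. This is an illustration, not any real receiver's code; in practice the organizational domain is determined from the Public Suffix List, which this sketch simply takes as a given:

```python
def dmarc_lookup_names(from_domain, org_domain):
    """Return the DMARC DNS record names a receiver will try, in order.

    Only two names are ever consulted: the exact From: domain and the
    organizational domain. Intermediate names are never looked up.
    org_domain is assumed to have been found via the Public Suffix List."""
    names = ["_dmarc." + from_domain]
    if from_domain != org_domain:
        names.append("_dmarc." + org_domain)
    return names
```

For 'someone@breaking.news.example.org' this yields the two names from the example above, and nothing in between.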

DMARC also has the concept of policy inheritance, where the example.org DMARC DNS TXT record can specify a different DMARC policy for the organizational domain than for subdomains that don't have their own policy. For example, example.org could specify 'p=reject; sp=none' to say that 'From: user@example.org' should be rejected if it fails DMARC but it has no views on a default for 'From: user@news.example.org'.

If you're an innocent person, you might think that if your organization has 'sp=none' on its organization policy, you don't have to be concerned about the DMARC (and DKIM, and SPF) behavior of sub-names that don't have their own DMARC records, including hosts that send as 'From: local-account@host.dept.example.org'. Your organizational policy says 'sp=none', meaning don't do anything with sub-names for DMARC, and surely everyone will follow that.

This is unfortunately not quite true in an environment where people care about DKIM results regardless of DMARC policy settings. The problem is DKIM (and SPF) alignment. Under relaxed DKIM alignment, a 'From: flash@eng.news.example.org' would pass if it's DKIM signed by anything in example.org, for example 'eng.example.org'. Under strict DKIM alignment, it must be signed specifically by 'eng.news.example.org'.

The choice of what DKIM alignment to require is not a 'policy' and is not covered by 'p=' or 'sp=' in DMARC DNS TXT records. It's instead covered by a separate parameter, 'adkim=', and there is no 'sadkim=' parameter that only applies to subdomains. This means that there's no way for example.org to change the alignment policy for just 'From: user@example.org'; the moment they set 'adkim=s' in the _dmarc.example.org DNS TXT record, all sub-names without their own _dmarc.<whatever> records also switch to strict DKIM alignment. Even if the top level domain specifies 'sp=none', various mail systems out there may actively reject your mail because they no longer consider it properly aligned or increase their suspicion score a bit due to the lack of alignment (in some views your mail went from 'properly DKIM signed' to 'not properly DKIM signed').

The only way to deal with this is the same as with policy inheritance. Any host or domain name within your (sub-)organization that appears in From: headers must have its own valid DMARC DNS TXT record. If you want strict DKIM alignment, you need to set that as 'adkim=s'. If you want relaxed alignment, that's in theory the default, but you might find it clearer to explicitly set 'adkim=r' (and probably 'aspf=r', also for clarity).

(Setting alignment explicitly makes it clear to other people and future you that you're deliberately choosing an alignment that might wind up different from your top level organizational alignment.)
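As an illustrative sketch (not any real validator's code), the difference between the two alignment modes looks like this. The organizational-domain guess here is deliberately naive (the last two labels), which is wrong for many real TLDs; real receivers use the Public Suffix List:

```python
def dkim_aligned(from_domain, dkim_domain, adkim="r"):
    """Check DMARC-style alignment between the From: domain and the
    DKIM signing (d=) domain.

    adkim='s' (strict) requires an exact match; adkim='r' (relaxed)
    only requires the same organizational domain. The organizational
    domain computation below is a naive stand-in for the Public
    Suffix List."""
    if adkim == "s":
        return from_domain == dkim_domain
    org = lambda d: ".".join(d.split(".")[-2:])
    return org(from_domain) == org(dkim_domain)
```

Under this sketch, 'eng.news.example.org' signed by 'eng.example.org' is aligned in relaxed mode but not in strict mode, matching the example above.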

PS: As far as I can see this is the behavior the DMARC RFC implicitly requires for all DMARC settings other than 'p=' (which has the 'sp=' version), but I could be wrong and missing something.

One problem with (Python) docstrings is that they're local

By: cks
19 March 2026 at 02:40

When I wrote about documenting my Django forms, I said that I knew I didn't want to put my documentation in docstrings, because I'd written some in the past and then not read it this time around. One of the reasons for that is that Python docstrings have to be attached to functions, or more generally, Python docstrings have to be scattered through your code. The corollary to this is that to find relevant docstrings you have to read through your code and then remember which bits of it are relevant to what you're wondering about.

When your docstring is specifically about the function you already know you want to look at, this is fine. Docstrings work perfectly well for local knowledge, for 'what is this function about' summaries that you want to read before you delve into the function. I feel they work rather less well for finding what function you want to look at (ideally you want some sort of skimmable index for that); if you have to read docstrings to find a function, you're going to be paging through a lot of your code until you hit the right docstring.

This is also why I feel docstrings are a bad fit for documenting my Django forms. Even if I attach them to the Python functions that handle each particular form, the resulting documentation is going to be mingled with my code and spread all through it. Not only is there no overview, but I'd have to skip around my code as I read about how one form interacts with another; there's no single place where I can read about the flow of forms, one leading to another.

(This is the case even if all of the form handling functions are in one spot with nothing between them, because the docstrings will be split up by the code itself and the comments in the code.)

Another issue is that sensible docstrings can only be so big, because they separate the function's 'def' statement from its actual code. You don't want those two too far apart, which pushes docstrings toward being relatively concise. My feeling is that if I have a lot to say about what the function is used for or how it relates to other things, I can't really put it in a docstring. I usually put it in a comment in front of the function (which means that some of my Python code has a mixture of comments and docstrings). The less a function can be described purely by itself (and concisely), the more its docstring is going to sprawl and the more awkward that gets.

(Docstrings on functions are also generally seen as what I could call external documentation, written for people who might want to call the function and understand how it relates to other functions they might also use. Comments are the usual form of internal documentation that you want at hand while reading the function's code.)

It's conventional to say that docstrings are documentation for what they're on. I think it's better to say that docstrings are summaries. Some things can be described purely through summaries (with additional context that the programmer is assumed to have), but not everything can be.

(Comments before a function are also local to some degree, but they intrude less on the function's code since they don't put themselves between 'def' and the rest of things.)

Wayland has good reasons to put the window manager in the display server

By: cks
18 March 2026 at 02:26

I recently ran across Isaac Freund's Separating the Wayland Compositor and Window Manager (via), which is excellent news as far as I'm concerned. But in passing, it says:

Traditionally, Wayland compositors have taken on the role of the window manager as well, but this is not in fact a necessary step to solve the architectural problems with X11. Although, I do not know for sure why the original Wayland authors chose to combine the window manager and Wayland compositor, I assume it was simply the path of least resistance. [...]

Unfortunately, I believe that there are excellent reasons to put the window manager into the display server the way Wayland has, and the Wayland people (who were also X people) were quite familiar with them and how X has had problems over the years because of its split.

One large and more or less core problem is that event handling is deeply entwined with window management. As an example, consider this sequence of (input) events:

  1. your mouse starts out over one window. You type some characters.
  2. you move your mouse over to a second window. You type some more characters.
  3. you click a mouse button without moving the mouse.
  4. you type more characters.

Your window manager is extremely involved in the decisions about where all of those input events go and whether the second window receives a mouse button click event in the third step. If the window manager is separate from whatever is handling input events, either some things trigger synchronous delays in further event handling or sufficiently fast typeahead and actions are in a race with the window manager to see if it handles changes in where future events should go fast enough or if some of your typing and other actions are misdirected to the wrong place because the window manager is lagging.

Embedding the window manager in the display server is the simple and obvious approach to ensuring that the window manager can see and react to all events without lag, and can freely intercept and modify all events as it wishes without clients having to care. The window manager can even do this using extremely local knowledge if it wants. Do you want your window manager to have key bindings that only apply to browser windows, where the same keys are passed through to other programs? An embedded window manager can easily do that (let's assume it can reliably identify browser windows).

(An outdated example of how complicated you can make mouse button bindings, never mind keyboard bindings, is my mouse button bindings in fvwm.)

X has a collection of mechanisms that try to allow window managers to manage 'focus' (which window receives keyboard input), intercept (some) keys at a window manager level, and do other things that modify or intercept events. The whole system is complex, imperfect, and limited, and a variety of these mechanisms have weird side effects on the X events that regular programs receive; you can often see this with a program such as xev. Historically, not all X programs have coped gracefully with all of the interceptions that window managers like fvwm can do.

(X also has two input event systems, just to make life more complicated.)

X's mechanisms also impose limits on what they'll allow a window manager to do. One famous example is that in X, mouse scroll wheel events always go to the X window under the mouse cursor. Even if your window manager uses 'click (a window) to make it take input', mouse scroll wheel input is special and cannot be directed to a window this way. In Wayland, a full server has no such limitations; its window manager portion can direct all events, including mouse scroll wheels, to wherever it feels like.

(This elaborates on a Fediverse post of mine.)

Cleaning old GPG RPM keys that your Fedora install is keeping around

By: cks
17 March 2026 at 01:56

Approximately all RPM packages are signed by GPG keys (or maybe they're supposed to be called PGP keys), which your system stores in the RPM database as pseudo-packages (because why not). If your Fedora install has been around long enough, as mine have, you will have accumulated a drift of old keys, and sometimes you either want to clean them up or something unfortunate happens to one of those keys (I'll get to one case of the latter).

One basic command to see your collection of GPG keys in the RPM database is (taken from this gist):

rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n'

On some systems this will give you a nice short list of keys. On others, your list may be very long.

Since Fedora 42 (cf), DNF has functionality (I believe more or less built in) that should offer to remove old GPG keys that have actually expired. This is in the 'expired PGP keys plugin', which comes from the 'libdnf5-plugin-expired-pgp-keys' package if you don't have it installed (with a brief manpage called 'libdnf5-expired-pgp-keys'). I believe there was a similar DNF4 plugin. However, there are two situations where this seems to not work correctly.

The first situation is now-obsolete GPG keys that haven't expired yet, for various reasons; these may be for past versions of Fedora, for example. These days, the metadata for every DNF repository you use should list a URL for its GPG keys (see the various .repo files in /etc/yum.repos.d/ and look for the 'gpgkey=' lines). So one way to clean up obsolete keys is to fetch all of the current keys for all of your current repositories (or at least the enabled ones), and then remove anything you have that isn't among the list. This process is automated for you by the 'clean-rpm-gpg-pubkey' command and package, which is mentioned in some Fedora upgrade instructions. This will generally clean out most of your obsolete keys, although rare people will have keys that are so old that it chokes on them.

The second situation is apparently a repository operator who is sufficiently clever to have re-issued an expired key using the same key ID and fingerprint but a new expiry date in the future; this fools RPM and related tools and everything chokes. This is unfortunate, since it will often stall all DNF updates unless you disable the repo. One repository operator who has done this is Google, for their Fedora Chrome repository. To fix this you'll have to manually remove the relevant GPG key or keys. Once you've used clean-rpm-gpg-pubkey to reduce your list of GPG keys to a reasonable level, you can use the RPM command I showed above to list all your remaining keys, spot the likely key or keys (based on who owns it, for example), and then use 'rpm -e --allmatches gpg-pubkey-d38b4796-570c8cd3' (or some other appropriate gpg-pubkey name) to manually scrub out the GPG key. Doing a DNF operation such as installing or upgrading a package from the repository should then re-import the current key.

(This also means that it's theoretically harmless to overshoot and remove the wrong key, because it will be fetched back the next time you need it.)

(When I wrote my Fediverse post about discovering clean-rpm-gpg-pubkey, I apparently thought I would remember it without further prompting. This was wrong, and in fact I didn't even remember to use it when I upgraded my home desktop. This time it will hopefully stick, and if not, I have it written down here where it will probably be easier to find.)

Making empirical decisions about web access (here in 2026)

By: cks
16 March 2026 at 02:12

Recently, Denis Warburton wrote in a comment on my entry on how HTTP results today depend on what HTTP User-Agent you use:

Making decisions based on user-provided information is unwise in 2026. The originating ip address is the only source of "truth" ... and even then, that information needs to be further examined before discerning whether or not it is a valid piece of communication.

It's absolutely true that everything except the source IP address is under the control of an attacker (and it always has been), and in one sense you can't trust it. But this doesn't mean you can't use information that's under the attacker's control in making decisions about whether to allow access to something; instead, it means that you have to be thoughtful about how you use the information and what for.

In practice, web agents emit a lot of data in their HTTP headers and requests. Some of these signals are complicated, such as browser version numbers, and some of them require work to use, but this doesn't mean that there's no signal at all that can be derived from all of the data that a web agent emits. For example, consider a web agent that uses the HTTP User-Agent of:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

This web agent is telling you that it's claiming to be Googlebot. Under the right circumstances this can be a valuable signal of malfeasance, one worth denying access over.

Similarly, a web agent that emits user agent hints while its HTTP User-Agent is claiming to be an authentic version of Firefox 147 is giving you the signal that it's not an unaltered, standard version of Firefox, because standard versions of Firefox 147 don't do that. It's most likely something built on Chromium, but in any case you might decide that this signal means it is suspicious enough to be denied access. Neither the User-Agent nor the Sec-CH-UA headers create true facts to definitively identify the browser and both could be faked by the attacker, but the inconsistency is real.

What an attacker tells you (deliberately or accidentally) is a signal, and it's up to you to interpret and use that signal (which I think you should these days). This is an empirical thing, something that depends on the surrounding environment (for example, you have to interpret the attacker's signal in terms of how it differs from the signals of legitimate visitors), on what you're doing, and on what you care about. But then security is always ultimately people, not math, even though tech loves to avoid this sort of empiricism (which is a bad thing).

As a pragmatic thing, it's usually easier to use attacker signals if you allow things by default rather than deny them by default. If you allow by default, your primary concern is false positives (legitimate visitors who are emitting signals you find too suspicious), rather than false negatives, because an attacker that wants to work hard enough can always obtain access. Conveniently, public web sites (such as Wandering Thoughts) are exactly such an allow by default environment, which is why these days I use a lot of signals here when deciding what to accept or block (including IP addresses and networks).

(If you need a deny by default environment with real security, you need to use something that attackers can't fake. IP addresses can be one option in the right circumstances, but they aren't the only one.)

I think dependency cooldowns would be a good idea for Go

By: cks
14 March 2026 at 20:26

Via Filippo Valsorda, I recently heard about a proposal to add dependency cooldowns to Go. The general idea of dependency cooldowns is to make it so that people don't immediately update to new versions of dependencies; instead, you wait some amount of time for people to inspect the new version and so on (either through automated tooling or manual work). Since one of Go's famous features is 'minimum version selection', you might think that a cooldown would be unnecessary, because people have to manually update the version of dependencies anyway and don't automatically get them.

Unfortunately, this is not the actual observed reality. In the actual observed reality, people update dependency versions fast enough to catch out module authors who change what a particular published version of their module contains. This seems to be in part from things like 'Dependabot' automatically cruising around looking for version updates, but in general it seems clear that some amount of people will update to new versions of dependencies the moment those new versions become visible to them. And if a dependency is used widely enough, through random chance there's pretty much always going to be a developer somewhere who is running 'go list -m -u all' right after a new version of the package is released. So I feel that some sort of a cooldown would be useful in practice, even with Go's other protections.

I follow the VCS repositories of a fair number of Go projects, and a lot of their dependency updates are automated, through things like Dependabot. If these things supported dependency cooldowns and if people turned that on, we might get a lot of the benefit without Go's own mechanisms having to add code to support this. On the other hand, not everyone uses Dependabot or equivalent features (especially if people migrate away from Github, as some are) and there's always going to be people checking and doing dependency updates by hand. To support them, we need assistance from tooling.

(In theory this tooling assistance could be showing how old a version is and then leaving it up to people to notice and decide, but in practice I feel that's abdicating responsibilities. We've seen that show before; easy support and defaults matter.)

While I don't have any strong or well informed opinions on how this should be implemented in Go, I do feel that both defaults and avoiding mistakes are important. This biases me towards, say, a setting for this in your go.mod, because then that way it's automatically persistent and everyone who works on your project gets it applied automatically, unlike (for example) an environment variable that you have to make sure everyone has set.

(This elaborates on some badly phrased thoughts I posted on the Fediverse.)

On today's web, HTTP results depend on the HTTP User-Agent you use

By: cks
14 March 2026 at 03:27

Back in the old days, search engines mostly crawled your sites with their regular, clearly identifying HTTP User-Agent headers, but once in a while they would switch up to fetching with a browser's User-Agent. What they were trying to detect was if you served one set of content to "Googlebot" but another set of content to "Firefox", and if you did they tended to penalize you; you were supposed to serve the same content to both, not SEO-bait to Googlebot and wall to wall ads to browsers. Googlebot identified itself as a standard courtesy, not so you could handle it differently.

Obviously those days are long over. It's now routine and fully accepted to serve different things to Googlebot and to regular browsers. Generally websites offer Googlebot more access and plain text, and browsers less access (even paywalls) and JavaScript encrusted content (leading to people setting their User-Agent to Googlebot to bypass paywalls). Since people give Googlebot special access, people impersonate it and other well accepted crawlers and other people (like me) block that impersonation.

This is part of an increasingly common general pattern, which is that different HTTP User-Agents get different results for the same URL. Especially, some HTTP User-Agents will get errors, HTTP redirections, or challenge pages, and other User-Agents won't; instead they'll get the real content. What this means in concrete terms is that it's increasingly bad to take the results from one HTTP User-Agent and assume they apply to another. This isn't just me and Wandering Thoughts; for example, if a site has a standard configuration of Anubis, having a User-Agent that includes 'Mozilla' will cause you to get a challenge page instead of the actual page (cf).

(One of the amusing effects of this is what it does to 'link previews', which require the website displaying the preview to fetch a copy of the URL from the original site. On the Fediverse, fairly often the link preview I see is just some sort of a challenge page.)

In practice, you're probably reasonably safe if you're using close variations of what's fundamentally the same distinctive User-Agent. But you're living dangerously if you try this with browser-like User-Agent values, either two different ones or a browser-like User-Agent and a distinctive non-browser one, because those are the ones that are most frequently forged and abused by covert web crawlers and other malware. Everyone who wants to look normal is imitating a browser, which means looking like a browser is a bad idea today.

Unfortunately, however bad an idea it is, people seem to keep trying fetches with multiple User-Agent header values and then taking a result from one User-Agent and using it in the context of another. Especially, feed reader companies seem to do it, first Feedly and now Inoreader.

You (I) should document the forms of your Django web application

By: cks
13 March 2026 at 03:18

We have a long-standing Django web application to handle (Unix) account requests. Since these are requests, there is some state involved, so for a long time a request could be pending, approved, or rejected, with the extra complexity that an approved request might be incomplete and waiting on the person to pick their login. Recently I added being able to put a request into a new state, 'held', in order to deal with some local complexities where we might have a request that we didn't want to delete but also didn't want to go through to create an account.

(For instance, it's sometimes not clear if new incoming graduate students who've had to defer their arrival are going to turn up later or wind up not coming at all. So now we can put their requests on hold.)

When I initially wrote the new code, I thought that this new 'held' status was relatively weak, and in particular that professors (who approve accounts) could easily take an account request out of 'held' status and approve it. At the time I decided that this was probably a feature, since a professor might know that one of their graduate students was about to turn up after all and this way they didn't have to get us to un-hold the account request. Then the other day we sort of wanted to hold an account request even against the professor involved approving it, and because I knew that the 'held' status was weak this way, I didn't bother trying.

Well, it turns out I was wrong. Because I had forgotten how our forms worked, I hadn't realized that my new 'held' status was less 'held' and more 'frozen', and I only learned better today because I took a stab at creating a real 'frozen' status. In the current state, while it's possible for professors to deliberately un-hold a request, it takes a certain amount of work to find the one obscure place it's possible and you can't do it by accident (and it would be easy to close that possibility off if we decided to). You definitely can't accidentally approve a request that's currently held without realizing it.

(So my admittedly modest amount of work to add a 'frozen' status was sort of wasted, although it did lead to greater understanding in the end.)

Past me, immersed in the application, presumably found all of the rules about who could see what form and what they showed to be obvious (at least in context). Present me is a long distance from past me and did not remember all of those things. Brief documentation on each form would have been really quite handy, and if I'm smart I'll spend some time this time around to write some.

I'm not sure where I'll put any new forms documentation. Probably not in our views.py, which is already big enough. I could put it in urls.py, or I could write a separate README.forms file that doesn't try to embed this in code. And I know that I don't want to put it in Python docstrings, because I wrote some things in Python docstrings on the existing forms functions and then didn't read them. Even if I had read them, the existing docstrings don't entirely cover the sort of information I now know I want to know.

(I think there's a good reason for my not reading my own docstrings, but that's for another entry.)

UEFI-only booting with GRUB has gone okay on our (Ubuntu 24.04) servers

By: cks
12 March 2026 at 01:24

We've been operating Ubuntu servers for a long time and for most of that time we've booted them through traditional MBR BIOS boots. Initially it was entirely through MBR and then later it was still mostly through MBR (somewhat depending on who installed a particular server; my co-workers are more tolerant of UEFI than I am). But when we built the 24.04 version of our customized install media, my co-worker wound up making it UEFI only, and so for the past two years all of our 24.04 machines have been UEFI (with us switching BIOSes on old servers into UEFI mode as we updated them). The headline news is that it's gone okay, more or less as you'd expect and hope by now.

All of our servers have mirrored system disks, and the one UEFI thing we haven't really had to deal with so far is fixing Ubuntu's UEFI boot disk redundancy stuff after one disk fails. I think we know how to do it in theory but we haven't had to go through it in practice. It will probably work out okay but it does make me a bit nervous, along with the related issue that the Ubuntu installer makes it hard to be consistent about which disk your '/boot/efi' filesystem comes from.

(In the installer, /boot/efi winds up on the first disk that you set as the boot device, but the disks aren't always presented in order so you can do this on 'the first disk' in the installer and discover that the first disk it listed was /dev/sdb.)

The Ubuntu 24.04 default bootloader is GRUB, so that's what we've wound up with even though as a UEFI-only environment we could in theory use simpler ones, such as systemd-boot. I'm not particularly enthused about GRUB but in practice it does what we want, which is to reliably boot our servers, and it has the huge benefit that it's actively supported by Ubuntu (okay, Canonical) so they're going to make sure it works right, including with their UEFI disk redundancy stuff. If Ubuntu switches default UEFI bootloaders in their server installs, I expect we'll follow along.

(I don't know if Canonical has any plans to switch away from GRUB to something else. I suspect that they'll stick with GRUB for as long as they support MBR booting, which I suspect will be a while, especially as people look more and more likely to hold on to old hardware for much longer than normally expected.)

PS: One reason I'm writing this down is that I've been unenthused about UEFI for a long time, so I'm not sure I would have predicted our lack of troubles in advance. So I'm going to admit it, UEFI has been actually okay. And in its favour, UEFI has regularized some things that used to be pretty odd in the MBR BIOS era.

(I'm still not happy about the UEFI non-story around redundant system disks, but I've accepted that hacks like the Ubuntu approach are the best we're going to get. I don't know what distributions such as Fedora are doing here; my Fedora machines are MBR based and staying that way until the hardware gets replaced, which on current trends won't be any time soon.)

The story of one of my worst programming failures

By: cks
11 March 2026 at 01:58

Somewhat recently, GeePaw Hill shared the story of what he called his most humiliating experience as a skilled and successful computer programmer. It's an excellent, entertaining story with a lesson for all of us, so I urge you to read it. Today I'm going to tell the story of one of my great failures, where I may have quietly killed part of a professor's research project by developing on a too-small machine.

Once upon a time, back when I was an (advanced) undergraduate, I was hired as a part time research programmer for a Systems professor to work on one of their projects, at first with a new graduate student and then later alone (partly because the graduate student switched from Systems to HCI). One of this professor's research areas was understanding and analyzing disk IO patterns (a significant research area at the time), and my work was to add detailed IO tracing to the Ultrix kernel. Some of this was porting work the professor had done with the 4.x BSD kernel (while a graduate student and postdoc) into the closely related, BSD-derived Ultrix kernel, but we extended the original filesystem level tracing down all the way to capturing block IO traces (still specifically attributed to filesystem events).

We were working on Ultrix because my professor had a research and equipment grant from DEC. DEC was interested in this sort of information for improving the IO performance of the Ultrix kernel, and part of the benefit of working with DEC was that DEC could arrange for us to get IO traces from real customers with real workloads, instead of university research system workloads. Eventually the modified kernel worked, gathered all the data that we wanted (and gave us some insights even on our systems), and was ready for the customer site. We talked to DEC and it was decided that the best approach was that I would go down to Boston with the source code, meet with the DEC people involved, we'd build a kernel for the customer's setup, and then I'd go with the DEC people to the customer site to actually boot into it and turn the tracing on.

Very shortly after we booted the new kernel on the customer's machine and turned tracing on, the kernel paniced. It was a nice, clear panic message from my own code, basically an assertion failure, and what it said was more or less 'disk block number too large to fit into data field'. I looked at that and had a terrible sinking feeling.

This was long enough ago (with small enough disks) that having very compact trace data was extremely important, especially at the block IO layer (where we were generating a lot of trace records). As a result, I'd carefully designed the on-disk trace records to be as small as possible. As part of that I'd tried to cut down the size of fields to be only as big as necessary, and one of the fields I'd minimized was the disk block address of the IO. My minimized field was big enough for the block addresses on our Ultrix machines (donated by DEC), with not very big disks, but it was now obviously too small for the bigger disks that the company had bought from DEC for their servers. In a way I was lucky that I'd taken the precaution of putting in the size check that paniced, because otherwise we could have happily wasted time collecting corrupted traces with truncated block addresses.

(All of this was long enough ago that I can't remember how small the field was, although my mind wants to say 24 bits. If it was 24 bits, I had to be using 4 Kbyte filesystem block addresses, not 512-byte sector addresses.)

Once I saw the panic message, both the mistake and the fix were obvious, and the code and so on were well structured enough that it was simple to make the change; I could almost have done it on the spot (or at least while in Boston). But, well, you only get one kernel panic from your new "we assure you this is going to work" kernel on a customer's machine, especially if you only have one evening to gather your trace data and you can't rebuild a kernel from source while at the customer's site, so the DEC people and I had to pack up and go back empty handed. Afterward, I flew back to Toronto from Boston, made the simple change, and tested everything. But I never went back to Boston for another visit with DEC, and I don't think that part of my professor's research projects went anywhere much after that.

(My visit to Boston and its areas did feature getting driven around at somewhat unnervingly fast speeds on the Massachusetts Turnpike in the sports car of one of the DEC people involved.)

So that's the story of how I may have quietly killed one of my professor's research projects by developing on a too-small machine.

(That's obviously not the only problem. When I was picking the field size, I could have reached out somehow to ask how big DEC's disks got, or maybe ran the field size past my professor to see if it made sense. But I was working alone and being trusted with all of this, and I was an undergraduate, although I had significant professional programming experience by then.)

Sidebar: Fixing an earlier spectacular failure

(All of the following is based on my fallible memory.)

The tracing code worked by adding trace records to a buffer in memory and then writing out the buffer to the trace file when it was necessary. The BSD version of the code that I started with (which traced only filesystem level IO) did this synchronously, created trace records even for writing out the trace buffer, and didn't protect itself against being called again. A recursive call would deadlock but usually it all worked because you didn't add too many new trace records while writing out the buffer.

(Basically, everything that added a trace record to the buffer checked to see if the buffer was too full and if it was, immediately called the 'flush the trace buffer' code.)

This approach blew up spectacularly when I added block IO tracing; the much higher volume of records being added made deadlocks relatively common. The whole approach to writing out the trace buffer had to change completely, into a much more complex one with multiple processes involved and genuinely asynchronous writeout. I still have a vivid memory of making this relatively significant restructuring and then doing a RCS ci with a commit message that included a long, then current computing quote about replacing one set of code with known bugs with a new set of code with new unknown ones.

(At this remove I have no idea what the exact quote was and I can't find it in a quick online search. And unfortunately the code and its RCS history is long since gone.)

Power glitches can leave computer hardware in weird states

By: cks
10 March 2026 at 03:58

Late Friday night, the university's downtown campus experienced some sort of power glitch or power event. A few machines rebooted, a number of machines dropped out of contact for a bit (which probably indicates some switches restarting), and most significantly, some of our switches wound up in a weird, non-working state despite being powered on. This morning we cured the situation by fully power cycling all of them.

This isn't the first time we've seen brief power glitches leave things in unusual states. In the past we've seen it with servers, with BMCs (IPMIs), and with switches. It's usually not every machine, either; some machines won't notice and some will. When we were having semi-regular power glitches, there were definitely some models of server that were more prone to problems than others, but even among those models it usually wasn't universal.

It's fun to speculate about reasons why some particular servers of a susceptible model would survive and others not, but that's somewhat beside today's point, which is that power glitches can get your hardware into weird states (and your hardware isn't broken when and because this happens; it can happen to hardware that's in perfectly good order). We'd like to think that the computers around us are binary, either shut off entirely or working properly, but that clearly isn't the case. A power glitch like this peels back the comforting illusion to show us the unhappy analog truth underneath. Modern computers do a lot of work to protect themselves from such analog problems, but obviously it doesn't always work completely.

(My wild speculation is that the power glitch has shifted at least part of the overall system into a state that's normally impossible, and either this can't be recovered from or the rest of the system doesn't realize that it has to take steps to recover, for example forcing a full restart. See also flea power, where a powered off system still retains some power, and sometimes this matters.)

PS: We've also had a few cases where power cycling the hardware wasn't enough, which is almost certainly flea power at work.

PPS: My steadily increasing awareness of the fundamentally analog nature of a lot of what I take as comfortably digital has come in part from exposure on the Fediverse to people who deal with fun things like differential signaling for copper Ethernet, USB, and PCIe, and the spooky world of DDR training, where very early on your system goes to some effort to work out the signal characteristics of your particular motherboard, RAM, and so on so that it can run the RAM as fast as possible (cf).

(Never mind all of the CPU errata about unusual situations that aren't quite handled properly.)

If there are URLs in your HTTP User-Agent, they should exist and work

By: cks
9 March 2026 at 02:18

One of the things people put in their HTTP User-Agent header for non-browser software is a URL for their software, project, or whatever (I'm all for this). This is a good thing, because it allows people operating web servers to check out who and what you are and decide for themselves if they're going to allow it. Increasingly (and partly for social reasons), I block many 'generic' User-Agent values that come to my attention, for example through their volume.

(I don't block all of them, but if your User-Agent shows up and I can't figure out what it is and whether or not it's legitimate and used by real people, that's probably a block.)

However, there's an important and obvious thing about any URLs in your HTTP User-Agent, which is that they should actually work. The domain or host should exist, the URL should exist in the web server, and the URL's contents should actually explain the software, project, or organization involved. Plus, if you use a HTTPS website, the TLS certificate should be valid.

(A related thing is a generic URL that doesn't give me anything to go on. For example, your URL on a code forge, and either it's not obvious which one of your repositories is doing things or you don't have any public repositories.)

For me, a non-working URL is much more suspicious than a missing URL. HTTP User-Agents without URLs are reasonably common (especially in feed readers), so I don't find them immediately suspicious. Non-working URLs in mysterious User-Agents certainly look like you're attempting to distract me with the appearance of a proper web agent but without the reality of it. If a User-Agent with such a non-working URL comes to my attention, I'm very likely to block it in some way (unless it's very clear that it's a legitimate program used by real people, and it merely has bad habits with its User-Agent).

You would think that people wouldn't make this sort of mistake, but I regret to say that I've seen it repeatedly, in all of the variations. One interesting version I've seen is User-Agent strings with the various 'example.<TLD>' domains in their URLs. I suspect that this comes from software that has some sort of 'operator URL' setting and provides a default value if you don't set one explicitly. I've also seen .lan and .local URLs in User-Agents, which takes somewhat more creativity.

As usual, my view is that software shouldn't provide this sort of default value; instead, it should refuse to work until you configure your own value. However, this makes it slightly more annoying to use, so it will be less popular than more accommodating software. Of course, we can change that calculation by blocking everything that mentions 'example.com', 'example.org', 'example.net' and so on in its User-Agent.

Restricting IP address access to specific ports in eBPF: a sketch

By: cks
8 March 2026 at 03:04

The other day I covered how I think systemd's IPAddressAllow and IPAddressDeny restrictions work, which unfortunately only allows you to limit this to specific (local) ports only if you set up the sockets for those ports in a separate systemd.socket unit. Naturally this raises the question of whether there is a good, scalable way to restrict access to specific ports in eBPF that systemd (or other interested parties) could use. I think the answer is yes, so here is a sketch of how I think you'd do this.

Why we care about a 'scalable' way to do this is because systemd generates and installs its eBPF programs on the fly. Since tcpdump can do this sort of cross-port matching, we could write an eBPF program that did it directly. But such a program could get complex if we were matching a bunch of things, and that complexity might make it hard to generate on the fly (or at least make it complex enough that systemd and other programs didn't want to). So we'd like a way that still allows you to generate a simple eBPF program.

Systemd uses cgroup socket SKB eBPF programs, which attach to a cgroup and filter all network packets on ingress or egress. As far as I can understand from staring at code, these are implemented by extracting the IPv4 or IPv6 address of the other side from the SKB and then querying what eBPF calls a LPM (Longest Prefix Match) map. The normal way to use an LPM map is to use the CIDR prefix length and the start of the CIDR network as the key (for individual IPv4 addresses, the prefix length is 32), and then match against them, so this is what systemd's cgroup program does. This is a nicely scalable way to handle the problem; the eBPF program itself is basically constant, and you have a couple of eBPF maps (for the allow and deny sides) that systemd populates with the relevant information from IPAddressAllow and IPAddressDeny.

However, there's nothing in eBPF that requires the keys to be just CIDR prefixes plus IP addresses. A LPM map key has to start with a 32-bit prefix, but the size of the rest of the key can vary. This means that we can make our keys be 16 bits longer and stick the port number in front of the IP address (and increase the CIDR prefix size appropriately). So to match packets to port 22 from 128.100.0.0/16, your key would be (u32) 32 for the prefix length then something like 0x00 0x16 0x80 0x64 0x00 0x00 (if I'm doing the math and understanding the structure right). When you query this LPM map, you put the appropriate port number in front of the IP address.
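As an illustration of this key layout (this is my sketch of how such a key would be packed in userspace, not systemd's actual code), here's a short Python function that builds a port-plus-IPv4 LPM key:

```python
import ipaddress
import struct

def lpm_key(port, cidr):
    """Pack an LPM trie key: native-endian u32 prefix length (as in the
    kernel's struct bpf_lpm_trie_key), then a 16-bit port in network
    order, then the IPv4 network address. The prefix length covers the
    16 port bits plus the CIDR prefix bits."""
    net = ipaddress.ip_network(cidr)
    prefixlen = 16 + net.prefixlen
    return (struct.pack("=I", prefixlen)
            + struct.pack("!H", port)
            + net.network_address.packed)

key = lpm_key(22, "128.100.0.0/16")
# The data portion after the u32 prefix length:
print(key[4:].hex())
# → 001680640000 (0x00 0x16 for port 22, 0x80 0x64 for 128.100, zero padding)
```

Whether the kernel's LPM trie accepts this key layout depends only on declaring the map's key size as 4 + 6 bytes (rounded as required); the trie itself doesn't care what the bits mean.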

This does mean that each separate port with a separate set of IP address restrictions needs its own set of map entries. If you wanted a set of ports to all have a common set of restrictions, you could use a normally structured LPM map and a second plain hash map where the keys are port numbers. Then you check the port and the IP address separately, rather than trying to combine them in one lookup. And there are more complex schemes if you need them.

Which scheme you'd use depends on how you expect port based access restrictions to be used. Do you expect several different ports, each with its own set of IP access restrictions (or only one port)? Then my first scheme is only a minor change from systemd's current setup, and it's easy to extend it to general IP address controls as well (just use a port number of zero to mean 'this applies to all ports'). If you expect sets of ports to all use a common set of IP access controls, or several sets of ports with different restrictions for each set, then you might want a scheme with more maps.

(In theory you could write this eBPF program and set up these maps yourself, then use systemd resource control features to attach them to your .service unit. In practice, at that point you probably should write host firewall rules instead, it's likely to be simpler. But see this blog post and the related VCS repository, although that uses a more hard-coded approach.)

Your terminal program has to be where xterm's ziconbeep feature is handled

By: cks
7 March 2026 at 03:26

I recently wrote about things that make me so attached to xterm. One of those things is xterm's ziconbeep feature, which causes xterm to visibly and perhaps audibly react when it's iconified or minimized and gets output. A commentator suggested that this feature should ideally be done in the window manager, where it could be more general. Unfortunately we can't do the equivalent of ziconbeep in the window manager, or at least we can't do all of it.

A window manager can sound an audible alert when a specific type of window changes its title in a certain way. This would give us the 'beep' part of ziconbeep in a general way, although we're treading toward a programmable window manager. But then, Gnome Shell now does a lot of stuff in JavaScript and its extensions are written in JS and the whole thing doesn't usually blow up. So we've got prior art for writing an extension that reacts to window title changes and does stuff.

What the window manager can't really do is reliably detect when the window has new output, in order to trigger any beeping and change the visible window title. As far as I know, neither X nor Wayland give you particularly good visibility into whether the program is rendering things, and in some ways of building GUIs, you're always drawing things. In theory, a program might opt to detect that it's been minimized and isn't visible and so not render any updates at all (although it will be tracking what to draw for when it's not minimized), but in practice I think this is unfashionable because it gets in the way of various sorts of live previews of minimized windows (where you want the window's drawing surface to reflect its current state).

Another limitation of this as a general window manager feature is that the window manager doesn't know what changes in the appearance of a window are semantically meaningful and which ones are happening because, for example, you just changed some font preference and the program is picking up on that. Only the program itself knows what's semantically meaningful enough to signal for people's attention. A terminal program can have a simple definition but other programs don't necessarily; your mail client might decide that only certain sorts of new email should trigger a discreet 'pay attention to me' marker.

(Even in a terminal program you might want more control over this than xterm gives you. For example, you might want the terminal program to not trigger 'zicon' stuff for text output but instead to do it when the running program finishes and you return to the shell prompt. This is best done by being able to signal the terminal program through escape sequences.)

How I think systemd IP address restrictions on socket units works

By: cks
6 March 2026 at 04:43

Among the systemd resource controls are IPAddressAllow= and IPAddressDeny=, which allow you to limit what IP addresses your systemd thing can interact with. This is implemented with eBPF. A limitation of these as applied to systemd .service units is that they restrict all traffic, both inbound connections and things your service initiates (like, say, DNS lookups), while you may want only a simple inbound connection filter. However, you can also set these on systemd.socket units. If you do, your IP address restrictions apply only to the socket (or sockets), not to the service unit that it starts. To quote the documentation:

Note that for socket-activated services, the IP access list configured on the socket unit applies to all sockets associated with it directly, but not to any sockets created by the ultimately activated services for it.

So if you have a systemd socket activated service, you can control who can access the socket without restricting who the service itself can talk to.

In general, systemd IP access controls are done through eBPF programs set up on cgroups. If you set up IP access controls on a socket, such as ssh.socket in Ubuntu 24.04, you do get such eBPF programs attached to the ssh.socket cgroup (and there is a ssh.socket cgroup, perhaps because of the eBPF programs):

# pwd
/sys/fs/cgroup/system.slice
# bpftool cgroup list ssh.socket
ID  AttachType      AttachFlags  Name
12  cgroup_inet_ingress   multi  sd_fw_ingress
11  cgroup_inet_egress    multi  sd_fw_egress

However, if you look there are no processes or threads in the ssh.socket cgroup, which is not really surprising but also means there is nothing there for these eBPF programs to apply to. And if you dump the eBPF program itself (with 'bpftool prog dump xlated id 12'), it doesn't really look like it checks for the port number.

What I think must be going on is that the eBPF filtering program is connected to the SSH socket itself. Since I can't find any relevant looking uses in the systemd code of the `SO_ATTACH_*' BPF related options from socket(7) (which would be used with setsockopt(2) to directly attach programs to a socket), I assume that what happens is that if you create or perhaps start using a socket within a cgroup, that socket gets tied to the cgroup and its eBPF programs, and this attachment stays when the socket is passed to another program in a different cgroup.

(I don't know if there's any way to see what eBPF programs are attached to a socket or a file descriptor for a socket.)

If this is what's going on, it unfortunately means that there's no way to extend this feature of socket units to get per-port IP access control in .service units. Systemd isn't writing special eBPF filter programs for socket units that only apply to those exact ports, which you could in theory reuse for a service unit; instead, it's arranging to connect (only) specific sockets to its general, broad IP access control eBPF programs. Programs that make their own listening sockets won't be doing anything to get eBPF programs attached to them (and only them), so we're out of luck.

(One could experiment with relocating programs between cgroups, with the initial cgroup in which the program creates its listening sockets restricted and the other not, but I will leave that up to interested parties.)

Sometimes, non-general solutions are the right answer

By: cks
5 March 2026 at 03:33

I have a Python program that calculates and prints various pieces of Linux memory information on a per-cgroup basis. In the beginning, its life was simple; cgroups had a total memory use that was split between 'user' and '(filesystem) cache', so the program only needed to display either one field or a primary field plus a secondary field. Then I discovered that there was additional important (ie, large) kernel memory use in cgroups and added the ability to report it as an additional option for the secondary field. However, this wasn't really ideal, because now I had a three-way split and I might want to see all three things at once.

A while back I wrote up my realization about flexible string formatting with named arguments. This sparked all sorts of thoughts about writing a general solution for my program that could show any number of fields. Recently I took a stab at implementing this and rapidly ran into problems figuring out how I wanted to do it. I had multiple things that could be calculated and presented, I had to print not just the values but also a header with the right field names, I'd need to think about how I structured argparse argument groups in light of argparse not supporting nested groups, and so on. At a minimum this wasn't going to be a quick change; I was looking at significantly rewriting how the program printed its output.

The other day, I had an obvious realization: while it would be nice to have a fully general solution that could print any number of additional fields, which would meet my needs now and in the future, all that I needed right now was an additional three-field version with the extra fields hard-coded and the whole thing selected through a new command line argument. And this command line argument could drop right into the existing argparse exclusive group for choosing the second field, even though this feels inelegant.

(The fields I want to show are added with '-c' and '-k' respectively in the two field display, so the morally correct way to select both at once would be '-ck', but currently they're exclusive options, which is enforced by argparse. So I added a third option, literally '-b' for 'both'.)
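A minimal sketch of this argparse arrangement (the '-c', '-k', and '-b' option names come from the entry; the help text and everything else here is my invention):

```python
import argparse

parser = argparse.ArgumentParser(description="per-cgroup memory report (sketch)")
# The existing exclusive group for picking the second display field.
group = parser.add_mutually_exclusive_group()
group.add_argument("-c", action="store_true", help="show cache as the second field")
group.add_argument("-k", action="store_true", help="show kernel memory as the second field")
# The inelegant but quick third option: a hard-coded 'show both' mode
# dropped into the same exclusive group.
group.add_argument("-b", action="store_true", help="show both cache and kernel memory")

opts = parser.parse_args(["-b"])
print(opts.b, opts.c, opts.k)
# → True False False
```

Because all three live in one mutually exclusive group, argparse itself rejects '-c -b', which is why '-ck' couldn't simply be made to work without restructuring.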

Actually implementing this hard-coded version was a bit annoying for structural reasons, but I put the whole thing together in not very long; certainly it was much faster than a careful redesign and rewrite (in an output pattern I haven't used before, no less). It's not necessarily the right answer for the long term, but it's definitely the right answer for now (and I'm glad I talked myself into doing it).

(I'm definitely tempted to go back and restructure the whole output reporting to be general. But now there's no rush to it; it's not blocking a feature I want, it's a cleanup.)

A taxonomy of text output (from tools that want to be too clever)

By: cks
4 March 2026 at 01:41

One of my long standing gripes with Debian and Ubuntu is, well, I'll quote myself on the Fediverse:

I understand that Debian wants me to use 'apt' instead of apt-get, but the big reason I don't want to is because you can't turn off that progress bar at the bottom of your screen (or at least if you can it's not documented). That curses progress bar is something that I absolutely don't want (and it would make some of our tooling explode, yes we have tooling around apt-get).

Over time, I've developed opinions on what I want to see tools do for progress reports and other text output, and what I feel is increasingly too clever in tools that makes them more and more inconvenient for me. Today I'm going to try to run down that taxonomy, from best to worst.

  1. Line by line output in plain text with no colours.
  2. Represent progress by printing successive dots (or other characters) on the line until finally you print a newline. This is easy to capture and process later, since the end result is a newline terminated line with no control characters.

  3. Reporting progress by printing dots (or other characters) and then backspacing over them to erase them later. Pagers like less have some ability to handle backspaces, but this will give you heartburn in your own programs.

  4. Reporting progress by repeatedly printing a line, backspacing over it, and reprinting it (as apt-get does). This produces a lot more output, but I think less and anything that already deals with backspacing over things will generally be able to handle it.

  5. Any sort of line output with colours (which don't work in my environment, and when they do work they're usually unreadable). Any sort of terminal codes in the output make it complicated to capture the output with tools like script and then look over them later with pagers like less, although less can process a limited amount of terminal codes, including colours.

  6. Progress bar animation on one line with cursor controls and other special characters. This looks appealing but generates a lot more output and is increasingly hard for programs like less to display, search, or otherwise process. However, your terminal program of choice is probably still going to see this as line by line output and preserve various aspects of scrollback and so on.

  7. Progress output that moves the cursor and the output from its normal line to elsewhere on screen, such as at the bottom (as 'apt autoremove' and other bits of 'apt' do). Now you have a full screen program; viewing, reconstructing, and searching its output later is extremely difficult, and its output will blow up increasingly spectacularly if it's wrong about your window size (including if you resize things while it's running) or what terminal sequences your window responds to. Terminal programs and terminal environments such as tmux or screen may well throw up their hands at doing anything smart with the output, since you look much like a full screen editor, a pager, or programs like top. In some environments this may damage or destroy terminal scrollback.

    An additional reason I dislike this style is that it causes output to not appear at the current line. When I run your command line program, I want your program to print its output right below where I started it, in order, because that's what everything else does. I don't want the output jumping around the screen to random other locations. The only programs I accept that from are genuine full screen programs like top. Programs that insist on displaying things at random places on the screen are not really command line programs, they are TUIs cosplaying as CLIs.

  8. Actual full screen output, as a text UI, with the program clearing the screen and printing status reports all over the place. Fortunately I don't think I've seen any 'command line' programs do this; anything that does tends to be clearly labeled as a TUI program, and people mostly don't provide TUIs for command line tools (partly because it's usually more work).

My strong system administrator's opinion is that if you're tempted to do any of these other than the first, you should provide a command line switch to turn these off. Also, you should detect unusual settings of the $TERM environment variable, like 'dumb' or perhaps 'vt100', and automatically disable your smart output. And you should definitely disable your smart output if $TERM isn't set or you're not outputting to a (pseudo-)terminal.

(Programs that insist on fancy output no matter what make me very unhappy.)
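That detection advice can be sketched in a few lines; the function name and the exact list of unusual $TERM values here are my choices, not any standard:

```python
import os
import sys

def want_fancy_output(stream=sys.stdout):
    """Only use progress bars, colours and cursor movement on a real,
    capable terminal; fall back to plain line-by-line output otherwise."""
    if not stream.isatty():
        # Piped, redirected to a file, or run from tooling: stay plain.
        return False
    term = os.environ.get("TERM")
    if term is None or term in ("dumb", "vt100"):
        # No declared terminal, or one too limited for smart output.
        return False
    return True
```

A program would consult this once at startup, and still honour an explicit command line switch to force plain output regardless.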

Log messages are mostly for the people operating your software

By: cks
3 March 2026 at 04:48

I recently read Evan Hahn's The two kinds of error (via), which talks very briefly in passing about logging, and it sparked a thought. I've previously written my system administrator's view of what an error log level should mean, but that entry leaves out something fundamental about log messages, which is that under most circumstances, log messages are for the people operating your software (I've sort of said this before in a different context). When you're about to add a non-debug log message, one of the questions you should ask is what does someone running your program get out of seeing the message.

Speaking from my own experience, it's very easy to write log messages (and other messages) that are aimed at you, the person developing the program, script, or what have you. They're useful for debugging and for keeping track of the state of the program, and it's natural to write them that way since you're immersed in the program and have all of the context (this is especially a problem for infrequent error messages, which I've learned to make as verbose as possible, and a similar thing applies for infrequently logged messages). But if your software is successful (especially if it gets distributed to other people), most of the people running it won't be the developers, they'll only be operating it.

(This can include a future version of you when you haven't touched this piece of software for months.)

If you want your log messages to be useful for anything other than being mailed to you as part of a 'can you diagnose this' message, they need to be useful for the people operating the software. This doesn't mean 'only report errors that they can fix and need to', although that's part of it. It also means making the information you provide through logs be things that are useful and meaningful to people operating your software, and that they can understand without a magic decoder ring.

If people operating your software won't get anything out of seeing a log message, you probably shouldn't log it by default in the first place (or you need to reword it so that people will get something from it). In Evan Hahn's terminology, this applies to the log messages for both expected errors and unexpected errors, although if the program aborts, it should definitely tell system administrators why it did.
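As a made-up illustration of the difference (none of these messages, names, or URLs come from real software), compare a developer-facing message like "retry_fetch: state 3, rc=-1" with one that tells an operator what failed, for what, why, and what happens next:

```python
def operator_error(action, target, cause, next_step):
    """Compose an error message aimed at the person operating the
    software: what we tried to do, to what, why it failed, and what
    the program will do about it."""
    return f"could not {action} {target}: {cause}; {next_step}"

msg = operator_error("fetch", "https://example.org/feed",
                     "connection timed out", "will retry in 60 seconds")
print(msg)
# → could not fetch https://example.org/feed: connection timed out; will retry in 60 seconds
```

The exact phrasing is taste, but each of the four pieces is something an operator can act on without reading the source.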

For a system administrator, log messages about expected errors let us diagnose what went wrong to cause something to fail, and how interested we are in them depends partly on how common they are. However, how common they are isn't the only thing. MTAs often have what would be considered relatively verbose logs of message processing and will log every expected error like 'couldn't do a DNS lookup' or 'couldn't connect to a remote machine', even though they can happen a lot. This is very useful because one thing we sometimes care a lot about is what happened to and with a specific email message.

The things that make me so attached to xterm as my terminal program

By: cks
2 March 2026 at 04:27

I've said before in various contexts (eg) that I'm very attached to the venerable xterm as my terminal (emulator) program, and I'm not looking forward to the day that I may have to migrate away from it due to Wayland (although I probably can keep running it under XWayland, now that I think about it). But I've never tried to write down a list of the things that make me so attached to it over other alternatives like urxvt, much less more standard ones like gnome-terminal. Today I'm going to try to do that, although my list is probably going to be incomplete.

  • Xterm's ziconbeep feature, which I use heavily. Urxvt can have an equivalent but I don't know if other terminal programs do.

  • I routinely use xterm's very convenient way of making large selections, which is supported in urxvt but not in gnome-terminal (and it can't be since gnome-terminal uses mouse button 3 for its own purposes).

  • The ability to turn off all terminal colours, because they often don't work well with my preferred terminal colour scheme. Other terminal programs have somewhat different and sometimes less annoying colours, but it's still far too easy for programs to display things in unreadable colours.

    Yes, I can set my shell environment and many programs to not use colours, but I can't set all of them; some modern programs simply always use colours on terminals. Xterm can be set to completely ignore them.

  • I'm very used to xterm's specific behavior when it comes to what is a 'word' for double-click selection. You can read the full details in the xterm manual page's section on character classes. I'm not sure if it's possible to fully emulate this behavior in other terminal programs; I once made an incomplete attempt in urxvt, while gnome-terminal is quite different and has few or no options for customizing that behavior (in the Gnome way). Generally the modern double click selection behavior is too broad for me.

    (For instance, I'm extremely attached to double-click selecting only individual directories in full paths, rather than the entire thing. I can always swipe to select an entire path, but if I can't pick out individual path elements with a double click my only choice is character by character selection, which is a giant pain.)

    Based on a quick experiment, I think I can make KDE's konsole behave more or less the way I want by clearing out its entire set of "Word characters" in profiles. I think this isn't quite how xterm behaves but it's probably close enough for my reflexes.

  • Xterm doesn't treat text specially because of its contents, for example by underlining URLs or worse, hijacking clicks on them to do things. I already have well evolved systems for dealing with things like URLs and I don't want my terminal emulator to provide any 'help'. I believe that KDE's konsole can turn this off, but gnome-terminal doesn't seem to have any option for it.

  • Many of xterm's behaviors can be controlled from command line switches. Some other terminal emulators (like gnome-terminal) force you to bundle these behaviors together as 'profiles' and only let you select a profile. Similarly, a lot of xterm's behavior can be temporarily changed on the fly through its context menus, without having to change the profile's settings (and then change them back).

  • Every xterm window is a completely separate program that starts from scratch, and xterm is happy to run on remote servers without complications; this isn't something I can say for all other competitors. Starting from scratch also means things like not deciding to place yourself where your last window was, which is konsole's behavior (and infuriates me).

Of these, the hardest two to duplicate are probably xterm's double click selection behavior of what is a word and xterm's large selection behavior. The latter is hard because it requires the terminal program to not use mouse button 3 for a popup menu.

I use some other xterm features, like key binding, including duplicating windows, but I could live without them, especially if the alternate terminal program directly supports modern cut and paste in addition to xterm's traditional style. And I'm accustomed to a few of xterm's special control characters, especially Ctrl-space, but I think this may be pretty universally supported by now (Ctrl-space is in gnome-terminal).

There are probably things that other terminal programs like konsole, gnome-terminal and so on do that I don't want them to (and that xterm doesn't). But since I don't use anything other than xterm (and a bit of gnome-terminal and once in a while a bit of urxvt), I don't know what those undesired features are. Experimenting with konsole for this entry taught me some things I definitely don't want, such as it automatically placing itself where it was before (including placing a new konsole window on top of one of the existing ones, if you have multiple ones).

(This elaborates on a comment I made elsewhere.)

Sometimes the simplest version of a text table is printed from a command

By: cks
1 March 2026 at 03:17

Back when we had just started with our current metrics and dashboards adventure, I wrote about how sometimes the simplest version of a graph is a text table. Today I will extend that further: sometimes the simplest version of a text table is to have a command that prints it out, rather than making people look at a web page.

We recently had a major power outage at work, and in the aftermath not all of our machines came back. One of my co-workers is an extreme early bird and he came in to the university about as early as it's possible to on the TTC, and started work on troubleshooting what was going on. One of the things he needed to know was what machines were still down, so he could figure out any common elements to them (and see what machines were stubbornly not coming back on even though they ought to be).

We have Grafana dashboards for this, and the information about what machines are down is present in some of them in tabular form. But it's a table embedded in a widget in a web page, and you need a browser to look at it, which you may not have from the server console of some server you just powered up. Since I like command line tools, at one point I wrote some little scripts that make queries to our Prometheus server with curl and run the result through 'jq' to extract things. One of them is called 'promdownhosts' and it prints out what you'd expect. Initially this was just something I used, but several years ago I mentioned my collection of these scripts to my co-workers and we wound up making them group scripts in a central location.

(I initially wrote this script and a few others for use during our planned power outages and other downtimes, because it was a convenient way of seeing what we hadn't yet turned on or might have missed.)
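The real scripts aren't reproduced here, but as a rough sketch of the extraction step: the response shape below is the standard Prometheus /api/v1/query JSON, while the query ('up == 0'), the host names, and doing it in Python instead of curl plus jq are all my assumptions.

```python
import json

# A canned example of what /api/v1/query?query=up%3D%3D0 returns; a real
# promdownhosts would fetch this from the Prometheus server with curl
# or urllib instead of embedding it.
response = json.loads("""
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"instance": "apps1:9100", "job": "node"}, "value": [0, "0"]},
            {"metric": {"instance": "fs3:9100", "job": "node"}, "value": [0, "0"]}]}}
""")

# The equivalent of 'jq -r .data.result[].metric.instance', minus the
# exporter port, one host per line.
for r in response["data"]["result"]:
    print(r["metric"]["instance"].split(":")[0])
```

One host per line is exactly the format that composes well with 'fgrep -v -f /tmp/ignore-these' and the rest of the standard toolbox.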

Early in the morning of that Tuesday, bringing machines back up after the power outage and finding dead PDUs, my co-worker used the 'promdownhosts' script extensively to troubleshoot things. One of the nice aspects of it being a script was that he could put the names of uninteresting machines in a file and then exclude them easily with things like 'promdownhosts | fgrep -v -f /tmp/ignore-these' (something that's much harder to do in a web page dashboard interface, especially if the designer hasn't thought of that). And in general, the script made (and makes) this information quite readily accessible in a compact format that was quick to skim and definitely free of distractions.

Not everything can be presented this way, in a list or a table printed out in plain text from a command line tool. Sometimes tables on a web page are the better option, and it's good to have options in general; sometimes we want to look at this information along with other information too. As I've found out the hard way sometimes, there's only so much information you can cram into a plain text table before the result is increasingly hard to read.

(I have a command that summarizes our current Prometheus alerts and its output is significantly harder to read because I need it to be compact and there's more information to present. It's probably only really suitable for my use because I understand all of its shorthand notations, including the internal Prometheus names for our alerts.)

On the Bourne shell's distinction between shell variables and exported ones

By: cks
28 February 2026 at 03:44

One of the famous things that people run into with the Bourne shell is that it draws a distinction between plain shell variables and special exported shell variables, which are put into the environment of processes started by the shell. This distinction is a source of frustration when you set a variable, run a program, and the program doesn't have the variable available to it:

$ GODEBUG=...
$ go-program
[doesn't see your $GODEBUG setting]

It's also a source of mysterious failures, because more or less all of the environment variables that are present automatically become exported shell variables. So whether or not 'GODEBUG=..; echo running program; go-program' works can depend on whether $GODEBUG was already set when your shell started. The environment variables of regular shell sessions are usually fairly predictable, but the environment variables present when shell scripts get run can be much more varied. This makes it easy to write a shell script that only works right for you, because in your environment it runs with certain environment variables set and so they automatically become exported shell variables.
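A quick demonstration of the difference, with 'sh -c' standing in for the Go program (and an initial unset so the demonstration isn't itself at the mercy of the inherited environment):

```shell
#!/bin/sh
# Start from a known state; an inherited, already-exported GODEBUG
# would otherwise mask the difference (the very gotcha described above).
unset GODEBUG

# Plain shell variable: child processes don't see it.
GODEBUG=http2debug=1
sh -c 'echo "child sees: ${GODEBUG:-nothing}"'

# Exported: now it's in the environment of child processes.
export GODEBUG
sh -c 'echo "child sees: ${GODEBUG:-nothing}"'
```

The first command prints 'child sees: nothing' and the second 'child sees: http2debug=1'.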

I've told you all of that because despite these pains, I believe that the Bourne shell made the right choice here, in addition to a pragmatically necessary choice at the time it was created, in V7 (Research) Unix. So let's start with the pragmatics.

The Bourne shell was created alongside environment variables themselves, and on the comparatively small machines that V7 ran on, you didn't have much room for the combination of program arguments and the new environment. If either grew too big, you got 'argument list too long' when you tried to run programs. This made it important to minimize and control the size of the environment that the shell gave to new processes. If you want to do that without limiting the use of shell variables too much, a split between plain shell variables and exported ones makes sense and requires only a minor bit of syntax (in the form of 'export').

Both machines and exec() size limits are much larger now, so you might think that getting rid of the distinction is a good thing. The Bell Labs Research Unix people thought so, and they did exactly that in Tom Duff's rc shell for V10 Unix and Plan 9. Having used both the Bourne shell and a version of rc for many years, I both agree and disagree with them.

For interactive use, having no distinction between shell variables and exported shell variables is generally great. If I set $GODEBUG, $PYTHONPATH, or any number of any other environment variables that I want to affect programs I run, I don't have to remember to do a special 'export' dance; it just works. This is a sufficiently nice (and obvious) thing that it's an option for the POSIX 'sh', in the form of 'set -a' (and this set option is present in more or less all modern Bourne shells, including Bash).

('Set -a' wasn't in the V7 sh, but I haven't looked to see where it came from. I suspect that it may have come from ksh, since POSIX took a lot of the specification for their 'sh' from ksh.)
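What 'set -a' looks like in practice, again with 'sh -c' standing in for a real program (the variable and path here are just examples):

```shell
#!/bin/sh
unset PYTHONPATH

set -a
# With -a in effect, plain assignments are automatically exported.
PYTHONPATH=/some/where
set +a

sh -c 'echo "child sees: ${PYTHONPATH:-nothing}"'
```

This prints 'child sees: /some/where' even though we never said 'export'; in an interactive shell you'd typically just leave 'set -a' on for the whole session.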

For shell scripting, however, not having a distinction is messy and sometimes painful. If I write an rc script, every shell variable that I use to keep track of something will leak into the environment of programs that I run. The shell variables for intermediate results, the shell variables for command line options, the shell variables used for for loops, you name it, it all winds up in the environment unless I go well out of my way to painfully scrub them all out. For shell scripts, it's quite useful to have the Bourne shell's strong distinction between ordinary shell variables, which are local to your script, and exported shell variables, which you deliberately act to make available to programs.

(This comes up for shell scripts and not for interactive use because you commonly use a lot more shell variables in shell scripts than you do in interactive sessions.)

For a new Unix shell today that's made primarily or almost entirely for interactive use, automatically exporting shell variables into the environment is probably the right choice. If you wanted to be slightly more selective, you could make it so that shell variables with upper case names are automatically exported and everything else can be manually exported. But for a shell that's aimed at scripting, you want to be able to control and limit variable scope, only exporting things that you explicitly want to.

How to redirect a Bash process substitution into a while loop

By: cks
27 February 2026 at 03:37

In some sorts of shell scripts, you often find yourself wanting to work through a bunch of input in the shell; some examples of this for me are here and here. One of the tools for this is a 'while read -r ...' loop, using the shell's builtin read to pull in one or more fields of data (hopefully not making a mistake). Suppose, not hypothetically, that you have a situation where you want to use such a 'while read' loop to accumulate some information from the input, setting shell variables, and then use them later. The innocent and non-working way to write this is:

accum=""
sep=""
some-program |
while read -r avalue; do
   accum="$accum$sep$avalue"
   sep=" or "
done

# Now we want to use $accum

(The recent script where I ran into this issue does much more complex things in the while loop that can't easily be done in other ways.)

This doesn't work because the 'while' is actually happening in a subshell, so the shell variables it sets are lost at the end. To make this work we have to wrap everything from the 'while ...' onward up into a subshell, with that part looking like:

some-program |
(
while read -r avalue; do
   accum="$accum$sep$avalue"
   sep=" or "
done
[...]
)

(You can't get around this with '{ while ...; ... done; }'; since the '{ ... }' is still the right side of a pipeline, Bash will still run the 'while' in a subshell.)

The way around this starts with how you can use a file redirection with a while loop (it goes on the 'done'):

some-program >/some/file
while read -r avalue; do
  [...]
done </some/file
# $accum is still set

So far this is all generic Bourne shell things. Bash has a special feature, process substitution, which allows us to use a process instead of a file, using the otherwise illegal syntax '<(...)'. This is great and exactly what we want, since it avoids creating a temporary file and then having to clean it up. So the innocent and obvious way to try to write things is this:

while read -r avalue; do
  [...]
done <(some-program)

If you try this, you will get the sad error message from Bash of:

line N: syntax error near unexpected token `<(some-program)'
line N: `done <(some-program)'

This is not a helpful error message. I will start by telling you the cure, and then what is going on at a narrow technical level to produce this error message. The cure is:

while read -r avalue; do
  [...]
done < <(some-program)

Note that you must have a space between the two <'s, writing this as '<<(some-program)' will get you a similar syntax error.

The technical reason for this error is that although it looks like redirection, process substitution is a form of substitution, like '$var' (it's in the name, but you, like me, may not know what Bash calls it off the top of your head). The result of process substitution will be, for example, a /dev/fd/N name (and a subprocess that is running our 'some-program' and feeding into the other end of the file descriptor). We can see this directly:

$ echo <(cat /dev/null)
/dev/fd/63

(Your number may vary.)

You can't write 'while ...; done /dev/fd/63'. That's a syntax error. Even though the pre-substitution version looks like redirection, it's not, so it's not accepted.

That '<(...)' is actually a substitution is why our revised version works. Reading '< <(some-program)' right to left, the '<(some-program)' is process substitution, and it (along with other shell expansions) is done first, before redirections. After substitution this looks like '< /dev/fd/NN', which is acceptable syntax. If we leave out the space and write this as '<<(some-program)', the shell throws up its hands at the '<<' bit.

(So from Bash's perspective, this is very similar to 'file=/some/file; while ... ; done < $file', which is perfectly legal.)
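Putting it all together, here's the working pattern end to end. I've wrapped it in an explicit 'bash -c' since process substitution is a Bash feature (your /bin/sh may not have it), and used printf as a stand-in for our hypothetical some-program:

```shell
# Accumulate across a process substitution without losing the variables
# to a pipeline subshell; printf stands in for some-program.
out=$(bash -c '
accum=""
sep=""
while read -r avalue; do
    accum="$accum$sep$avalue"
    sep=" or "
done < <(printf "%s\n" alpha beta gamma)
echo "$accum"
')
echo "$out"
# prints: alpha or beta or gamma
```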

PS: Before I wrote this entry, I didn't know how to get around the 'done <(some-program)' syntax error. Until the penny dropped about the difference between redirections and process substitution, I thought that Bash simply forbade this to make its life easier.

With disk caches, you want to be able to attribute hits and misses

By: cks
26 February 2026 at 03:06

Suppose that you have a disk or filesystem cache in memory (which you do, since pretty much everything has one these days). Most disk caches will give you simple hit and miss information as part of their basic information, but if you're interested in the performance of your disk cache (or in improving it), you want more information. The problem with disk caches is that there are a lot of different sources and types of disk IO, and you can have hit rates that are drastically different between them. Your hit rate for reading data from files may be modest, while your hit rate on certain sorts of metadata may be extremely high. Knowing this is important because it means that your current good performance on things involving that metadata is critically dependent on that hit rate.

(Well, it may be, depending on what storage media you're using and what its access speeds are like. A lot of my exposure to this dates from the days of slow HDDs.)

This potential vast difference is why you want more detailed information in both cache metrics and IO traces. The more narrowly you can attribute IO and the more you know about it, the more useful things you can potentially tell about the performance of your system and what matters to it. This is not merely 'data' versus 'metadata', or synchronous versus asynchronous; ideally you want to know the sort of metadata read being done, and whether the file data being read is synchronous or not, and whether this is a prefetching read or a 'demand' read that really needs the data.

A lot of the time, operating systems are not set up to pass this information down through all of the layers of IO, from the high level filesystem code that knows what it's asking for to the disk driver code that's actually issuing the IOs. Part of the reason for this is that it's a lot of work to pass all of this data along, which means extra CPU and memory costs on what is an increasingly hot path (especially with modern NVMe based storage). These days you may get some of these fine grained details in metrics and perhaps IO traces (eg, for (Open)ZFS), but probably not all the way down to types of metadata.

Of course, disk and filesystem caches (and IO) aren't the only place that this can come up. Any time you have a cache that stores different types of things that are potentially queried quite differently, you can have significant divergence in the types of activity and the activity rates (and cache hit rates) that you're experiencing. Depending on the cache, you may be able to get detailed information from it or you may need to put more detailed instrumentation into the code that queries your somewhat generic cache.

Modern general observability features in operating systems can sometimes let you gather some of this detailed attribution yourself (if the OS doesn't already provide it). However, it's not a certain thing and there are limits; for example, you may have trouble tracing and tracking IO once it gets dispatched asynchronously inside the OS (and most OSes turn IO into asynchronous operations before too long).

Systemd resource controls on user.slice and system.slice work fine

By: cks
25 February 2026 at 03:54

We have a number of systems where we traditionally set strict overcommit handling, and for some time this has caused us some heartburn. Some years ago I speculated that we might want to use resource controls on user.slice or system.slice if they worked, and then recently in a comment here I speculated that this was the way to (relatively) safely limit memory use if it worked.

Well, it does (as far as I can tell, without deep testing). If you want to limit how much of the system's memory people who log in can use, so that system services don't explode, you can set MemoryMin= on system.slice to guarantee some amount of memory to it and all things under it. Alternately, you can set MemoryMax= on user.slice, collectively limiting all user sessions to that amount of memory. In either case my view is that you might want to set MemorySwapMax= on user.slice as well, so that user sessions don't spend all of their time swapping. Which one you set things on depends on which is easier and which you trust more; my inclination is MemoryMax=, although that means you need to dynamically size it based on each machine's total memory.
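As a concrete sketch of the user.slice approach, a drop-in file might look like this (the path follows systemd's drop-in convention; the values are made-up placeholders, not recommendations):

```ini
# /etc/systemd/system/user.slice.d/50-limits.conf (example values only)
[Slice]
# Cap all user sessions collectively; size this from the machine's total RAM.
MemoryMax=48G
# Keep user sessions from spending all their time swapping.
MemorySwapMax=2G
```

After adding or changing a drop-in like this, you need a 'systemctl daemon-reload' for systemd to pick it up.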

(If you want to limit user memory use you'll need to make sure that things like user cron jobs are forced into user sessions, rather than running under cron.service in system.slice.)

Of course this is what you should expect, given systemd's documentation and the kernel documentation. On the other hand, the Linux kernel cgroup and memory system is sufficiently opaque and ever changing that I feel the need to verify that things actually do work (in our environment) as I expect them to. Sometimes there are surprises, or settings that nominally work but don't really affect things the way I expect.

This does raise the question of how much memory you want to reserve for the system. It would be nice if you could use systemd-cgtop to see how much memory your system.slice is currently using, but unfortunately the number it will show is potentially misleadingly high. This is because the memory attributed to any cgroup includes (much) more than program RAM usage. For example, on our servers it seems typical for system.slice to be using under a gigabyte of 'user' RAM but also several gigabytes of filesystem cache and other kernel memory. You probably want to allow for some of that in what memory you reserve for system.slice, but maybe not all of the current usage.

(You can get the current version of the 'memdu' program I use as memdu.py.)

Gnome, GSettings, gconf, and which one you want

By: cks
24 February 2026 at 03:22

On the Fediverse a while back, I said:

Ah yes, GNOME, it is of course my mistake that I used gconf-editor instead of dconf-editor. But at least now Gnome-Terminal no longer intercepts F11, so I can possibly use g-t to enter F11 into serial consoles to get the attention of a BIOS. If everything works in UEFI land.

Gnome has had at least two settings systems, GSettings/dconf (also) and the older GConf. If you're using a modern Gnome program, especially a standard Gnome program like gnome-terminal, it will use GSettings and you will want to use dconf-editor to modify its settings outside of whatever Preferences dialogs it gives you (or doesn't give you). You can also use the gsettings or dconf programs from the command line.

(This can include Gnome-derived desktop environments like Cinnamon, which has updated to using GSettings.)

If the program you're using hasn't been updated to the latest things that Gnome is doing, for example Thunderbird (at least as of 2024), then it will still be using GConf. You need to edit its settings using gconf-editor or gconftool-2, or possibly you'll need to look at the GConf version of general Gnome settings. I don't know if there's anything in Gnome that synchronizes general Gnome GSettings settings into GConf settings for programs that haven't yet been updated.

(This is relevant for programs, like Thunderbird, that use general Gnome settings for things like 'how to open a particular sort of thing'. Although I think modern Gnome may not have very many settings for this because it always goes to the GTK GIO system, based on the Arch Wiki's page on Default Applications.)

Because I've made this mistake between gconf-editor and dconf-editor more than once, I've now created a personal gconf-editor cover script that prints an explanation of the situation when I run it without a special --really argument. Hopefully this will keep me sorted out the next time I run gconf-editor instead of dconf-editor.
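The cover script's logic is just a guard in front of the real program. Here's a sketch of the idea, written as a shell function so it's self-contained (the messages are illustrative, and I've replaced the real script's exec of gconf-editor with an echo):

```shell
# Sketch of a gconf-editor cover script's logic (wording is illustrative).
gconf_editor_cover() {
    if [ "$1" != "--really" ]; then
        echo "gconf-editor edits the old GConf settings;" >&2
        echo "for modern Gnome programs you probably want dconf-editor." >&2
        echo "Re-run with --really if you actually mean GConf." >&2
        return 1
    fi
    shift
    # The real script would do: exec /usr/bin/gconf-editor "$@"
    echo "would exec gconf-editor $*"
}
```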

PS: Probably I want to use gsettings instead of dconf-editor and dconf as much as possible, since gsettings works through the GSettings layer and so apparently has more safety checks than dconf-editor and dconf do.

PPS: Don't ask me what the equivalents are for KDE. KDE settings are currently opaque to me.

PDUs can fail (eventually) and some things related to this

By: cks
22 February 2026 at 23:23

Early last Tuesday there was a widespread power outage at work, which took out power to our machine rooms for about four hours. Most things came back up when the power was restored, but not everything. One of the things that had happened was that one of our rack PDUs had failed. Fixing this took a surprising amount of work.

We don't normally think about our PDUs very much. They sit there, acting as larger and often smarter versions of power bars, and just, well, work. But both power bars and PDUs can fail eventually, and in our environment rack PDUs tend to stay in service long enough to reach that point. We may replace servers in the racks in our machine rooms, but we don't pull out and replace entire racks all that often. The result is that a rack's initial PDU is likely to stay in the rack until it fails.

(This isn't universal; there are plenty of places that install and remove entire racks at a time. If you're turning over an entire rack, you might replace the PDU at the same time you're replacing all of the rest of it. Whole rack replacement is certainly going to keep your wiring neater.)

A rack PDU failing is not a great thing, for the obvious reason: it's going to take out much or all of the servers in the rack unless you have dual power supplies on your servers, each connected to a separate PDU. For racks that have been there for a while and gone through a bunch of changes, it will often turn out to be hard to remove and replace the PDU. Maintaining access to remove PDUs is often not a priority either in placing racks in your machine room or in wiring things up, so it's easy for things to get awkward and encrusted. This was one of the things that happened with our failed PDU last Tuesday; it took quite some work to extract and replace it.

(Some people might have pre-deployed spare PDUs in each rack, but we don't. And if those spare PDUs are already connected to power and turned on, they too can fail over time.)

We're fortunate that we already had spare (smart) PDUs on hand, and we had also pre-configured a couple of them for emergency replacements. If we'd had to order a replacement PDU, things would obviously have been more of a problem. There are probably some research groups around here with their own racks who don't have a spare PDU, because it's an extra chunk of money for an unlikely or uncommon contingency, and they might choose to accept a rack being down for a while.

The importance of limiting syndication feed requests in some way

By: cks
22 February 2026 at 01:27

People sometimes wonder why I care so much about HTTP conditional GETs and rate limiting for syndication feed fetchers. There are multiple reasons, including social reasons to establish norms, but one obvious one is transfer volumes. To illustrate that, I'll look at the statistics for yesterday for feed fetches of the main syndication feed for Wandering Thoughts.

Yesterday there were 7492 feed requests that got HTTP 200 responses, 9419 feed requests that got HTTP 304 Not Modified responses, and 11941 requests that received HTTP 429 responses. The HTTP 200 responses amounted to about 1.26 GBytes, with the average response size being 176 KBytes. This average response size is actually a composite; typical compressed syndication feed responses are on the order of 160 KBytes, while uncompressed ones are on the order of 540 KBytes (but there look to have been only 313 of them, which is fortunate; even so, they're 12% of the transfer volume).

If feed readers didn't do any conditional GETs and I didn't have any rate limiting (and all of the requests that got HTTP 429s would still have been made), the additional feed requests would have amounted to about another 3.5 GBytes of responses sent out to people. Obviously feed readers did do conditional GETs, and 66% of their non rate limited requests were successful conditional GETs. An HTTP 200 response ratio of 44% is probably too pessimistic once we include rate limited requests, so as an extreme approximation we'll guess that 33% of the rate limited requests would have received HTTP 200 responses with a changed feed; that would amount to another 677 MBytes of response traffic (which is less than I expected). If we use the 44% HTTP 200 ratio, it's still only 903 MBytes more.

(This 44% rate may sound high but my syndication feed changes any time someone leaves a comment on a recent entry, because the syndication feed of entries includes a comment count for every entry.)
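These estimates are straightforward arithmetic from the request counts above; as a sketch (using binary units, which is how the figures quoted work out):

```python
# Back-of-the-envelope numbers from the feed request counts above.
avg_kib = 176                      # average HTTP 200 feed response size
n_200, n_304, n_429 = 7492, 9419, 11941

# If every 304 and 429 had instead been a full HTTP 200 response:
extra_gib = (n_304 + n_429) * avg_kib / 1024 / 1024
print(f"no conditional GETs or rate limits: ~{extra_gib:.1f} GiB extra")

# If 33% (or 44%) of the rate-limited requests had gotten full responses:
print(f"33%: ~{0.33 * n_429 * avg_kib / 1024:.0f} MiB extra")   # ~677 MiB
print(f"44%: ~{0.44 * n_429 * avg_kib / 1024:.0f} MiB extra")   # ~903 MiB
```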

Another statistic is that 41% of syndication feed requests yesterday got HTTP 429 responses. The most prolific single IP address received 950 HTTP 429s, which maps to an average request interval of less than two minutes between requests. Another prolific source made 779 requests, which again amounts to an interval of just less than two minutes. There are over 20 single IPs that received more than 96 HTTP 429 responses (which corresponds to an average interval of 15 minutes). There is a lot of syndication feed fetching software out there that is fetching quite frequently.

(Trying to figure out how many HTTP 429 sources did conditional requests is too complex with my current logs, since I don't directly record that information.)

You can avoid the server performance impact of lots of feed fetching by arranging to serve syndication feeds from static files instead of a dynamic system (and then you can limit how frequently you update those files, effectively forcing a maximum number of HTTP 200 fetches per time interval on anything that does conditional GETs). You can't avoid the bandwidth effects, and serving from static files generally leaves you with only modest tools for rate limiting.

PS: The syndication feeds for Wandering Thoughts are so big because I've opted to default to 100 entries in them, but I maintain you should be able to do this sort of thing without having your bandwidth explode.

Consider mentioning your little personal scripts to your co-workers

By: cks
21 February 2026 at 03:59

I have a habit of writing little scripts at work for my own use (perhaps like some number of my readers). They pile up like snowdrifts in my $HOME/adm, except they don't melt away when their time is done but stick around even when they're years obsolete. Every so often I mention one of them to my co-workers; sometimes my co-workers aren't interested, but sometimes they find the script appealing and have me put it into our shared location for 'production' scripts and programs. Sometimes, these production-ized scripts have turned out to be very useful.

(Not infrequently, having my co-workers ask me to move something into 'production' causes me to revise it to make it less of a weird hack. Occasionally this causes drastic changes that significantly improve the script.)

When I say that I mentioned my scripts to my co-workers, that makes it sound more intentional than it often is. A common pattern is that I'll use one of my scripts to get some results that I share, and then my co-workers will ask how I did it and I'll show them the command line, and then they'll ask things like 'what is this ~cks/adm/<program> thing' and 'can you put that somewhere more accessible, it sounds handy'. I do sometimes mention scripts unprompted, if I think they're especially useful, but I've written a lot of scripts over time and many of them aren't of much use for anyone beside me (or at least, I think they're too weird to be shared).

If you have your own collection of scripts, maybe your co-workers would find some of them useful. It probably can't hurt to mention some of them every so often. You do have to mention specific scripts; in my experience 'here is a directory of scripts with a README covering what's there' doesn't really motivate people to go look. Mentioning a specific script with what it can do for people is the way to go, especially if you've just used the script to deal with some situation.

(One possible downside of doing this is the amount of work you may need to do in order to turn your quick hack into something that can be operated and maintained by other people over the longer term. In some cases, you may need to completely rewrite things, preserving the ideas but not the implementation.)

PS: Speaking from personal experience, don't try to write a README for your $HOME/adm unless you're the sort of diligent person who will keep it up to date as you add, change, and ideally remove scripts. My $HOME/adm's README is more than a decade out of date.

Parsing hours and minutes into a useful time in basic Python

By: cks
20 February 2026 at 03:48

Suppose, not hypothetically, that you have a program that optionally takes a time in the past to, for example, report on things as of that time instead of as of right now. You would like to allow people to specify this time as just 'HH:MM', with the meaning being that time today (letting people do 'program --at 08:30'). This is convenient for people using your program but irritatingly hard today with the Python standard library.

(In the following code examples, I need a Unix timestamp and we're working in local time, so I wind up calling time.mktime(). We're working in local time because that's what is useful for us.)

As I discovered or noticed a long time ago, the time module is a thin shim over the C library time functions and inherits their behavior. One of these behaviors is that if you ask time.strptime() to parse a time format of '%H:%M', you get back a struct_time object that is in 1900:

>>> import time
>>> time.strptime("08:10", "%H:%M")
time.struct_time(tm_year=1900, tm_mon=1, tm_mday=1, tm_hour=8, tm_min=10, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=-1)

There are two solutions I can think of, the straightforward brute force approach that uses only the time module and a more theoretically correct version using datetime, which comes in two variations depending on whether you have Python 3.14 or not.

The brute force solution is to re-parse a version of the time string with the date added. Suppose that you have a series of time formats that people can give you, including '%H:%M', and you try them all until one works, with code like this:

 for fmt in tfmts:
     try:
         r = time.strptime(tstr, fmt)
         # Fix up %H:%M and %H%M
         if r.tm_year == 1900:
             dt = time.strftime("%Y-%m-%d ", time.localtime(time.time()))
             # replace original r with the revised one.
             r = time.strptime(dt + tstr, "%Y-%m-%d "+fmt)
         return time.mktime(r)
     except ValueError:
         continue

I think the correct, elegant way using only the standard library is to use datetime to combine today's date and the parsed time into a correct datetime object, which can then be turned into a struct_time and passed to time.mktime. Before Python 3.14, I believe this is:

         r = time.strptime(tstr, fmt)
         if r.tm_year == 1900:
             tm = datetime.time(hour=r.tm_hour, minute=r.tm_min)
             today = datetime.date.today()
             dt = datetime.datetime.combine(today, tm)
             r = dt.timetuple()
         return time.mktime(r)

There are variant approaches to the basic transformation I'm doing here but I think this is the most correct one.

If you have Python 3.14 or later, you have datetime.time.strptime() and I think you can do the slightly clearer:

[...]
             tm = datetime.time.strptime(tstr, fmt)
             today = datetime.date.today()
             dt = datetime.datetime.combine(today, tm)
             r = dt.timetuple()
[...]

If you can work with datetime.datetime objects, you can skip converting back to a time.struct_time object. In my case, the eventual result I need is a Unix timestamp so I have no choice.

You can wrap this up into a general function:

def strptime_today(tstr, fmt):
   r = time.strptime(tstr, fmt)
   if r.tm_year != 1900:
      return r
   tm = datetime.time(hour=r.tm_hour, minute=r.tm_min, second=r.tm_sec)
   today = datetime.date.today()
   dt = datetime.datetime.combine(today, tm)
   return dt.timetuple()

This version of time.strptime() will return the time today if given a time format with only hours, minutes, and possibly seconds. Well, technically it will do this if given any format without the year, but dealing with all of the possible missing fields is left as an exercise for the energetic, partly because there's no (relatively) reliable signal for missing months and days the way there is for years. For many programs, a year of 1900 is not even close to being valid and is some sort of mistake at best, but January 1st is a perfectly ordinary day of the year to care about.

(Now that I've written this function I may update my code to use it, instead of the brute force time package only version.)
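For reference, here's the function made self-contained with its imports, plus a small demonstration of turning 'HH:MM' into a Unix timestamp for today:

```python
import datetime
import time

def strptime_today(tstr, fmt):
    """time.strptime(), but a 1900 result gets today's date substituted in."""
    r = time.strptime(tstr, fmt)
    if r.tm_year != 1900:
        return r
    tm = datetime.time(hour=r.tm_hour, minute=r.tm_min, second=r.tm_sec)
    dt = datetime.datetime.combine(datetime.date.today(), tm)
    return dt.timetuple()

st = strptime_today("08:10", "%H:%M")
ts = time.mktime(st)   # a Unix timestamp for 08:10 today, in local time
```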

How GNU Tar handles deleted things in incremental tar archives

By: cks
19 February 2026 at 04:10

Suppose, not hypothetically, that you have a system that uses GNU Tar for its full and incremental backups (such as Amanda). Or maybe you use GNU Tar directly for this. If you have an incremental backup tar archive, you might be interested in one or both of two questions, which are in some ways mirrors of each other: what files were deleted between the previous incremental and this incremental, or what's the state of the directory tree as of this incremental (if it and all previous backups it depends on were properly restored).

(These questions are of deep interest to people who have deleted some number of files but aren't sure exactly which ones.)

Handling deleted files is one of the challenges of incremental backups, with various approaches. How GNU Tar handles deleted files is sort of documented in Using tar to perform incremental dumps and Dumpdir, but the documentation doesn't explain it specifically. The simple version is that GNU Tar doesn't explicitly record deletions; instead, every incremental tar archive carries a full listing of the directory tree, covering both things that are in this incremental archive and things that come from previous ones. To deduce deleted files, you have to compare two listings of the directory tree.

(As part of this full listing, an incremental tar archive records every directory, even unchanged ones.)

You can get at these full listings with 'tar --list --incremental --verbose --verbose --file ...', but tar prints them in an inconvenient format. You don't get a directory tree, the way you do with plain 'tar -t'; instead you get the Dumpdir contents of each directory printed out separately, and it's up to you to post-process the results to assemble a directory tree with full paths and so on. People have probably written tools to do this, either from tar's output or by directly reading the GNU Tar incremental tar archive format.

In my view, GNU Tar's approach is sensible and it comes with some useful properties (although there are tradeoffs). Conveniently, you can reconstruct the full directory tree as of that point in time from any single incremental archive; you don't have to go through a series of them to build up the picture. This probably also makes things somewhat more resilient if you're missing some incremental archives in the middle, since at least you know what's supposed to be there but you don't have any copy of. Finding where a single file was deleted is better than it would be if there were explicit deletion records, since you can do a binary search across incrementals to find the first one where it doesn't appear. The lack of explicit deletion reports does make it inconvenient to determine everything that was deleted between two successive incrementals, but on the other hand you can determine what was deleted (or added) between any two tar archives without having to go through every incremental between them.

(You could say that GNU Tar incremental archives have a snapshot of the directory tree state instead of carrying a journal of changes to the state.)
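To illustrate the comparison approach with a toy example: once you've post-processed two incremental listings into sorted lists of full paths (the post-processing itself isn't shown here), finding the deletions is a job for comm:

```shell
# Stand-ins for sorted path lists extracted from two successive incrementals.
printf '%s\n' /etc /etc/motd /home /home/cks > paths1.sorted
printf '%s\n' /etc /etc/motd /home > paths2.sorted
# Lines only in the first list: present then, gone now, i.e. deleted.
comm -23 paths1.sorted paths2.sorted
# prints: /home/cks
```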

Two challenges of incremental backups

By: cks
18 February 2026 at 04:25

Roughly speaking, there are two sorts of backups that you can make, full backups and incremental backups. At the abstract level, full backups are pretty simple; you save everything that you find. Incremental backups are more complicated because they save only the things that changed since whatever they're relative to. People want incremental backups despite the extra complexity because they save a lot of space compared to backing up everything all the time.

There are two general challenges that make incremental backups more complicated than full backups. The first challenge is reliably finding everything that's changed, in the face of all of the stuff that can change in filesystems (or other sources of data). Full backups only need to be able to traverse all of the filesystem (or part of it), or in general the data source, and this is almost always a reliable thing because all sorts of things and people use it. Finding everything that has changed has historically been more challenging because it's not something that people do often outside of incremental backups.

(And when people do it they may not notice if they're missing some things, the way they absolutely will notice if a general traversal skips some files.)

The second challenge is handling things that have gone away. Once you have a way to find everything that's changed it's not too difficult to build a backup system that will faithfully reproduce everything that definitely was there as of the incremental. All you need to do is save every changed file and then unpack the sequence of full and incremental backups on top of each other, with the latest version of any particular file overwriting any previous one. But people often want their incremental restore to reflect the state of directories and so on as of the incremental, which means removing things that have been deleted (both files and perhaps entire directory trees). This means that your incrementals need some way to pass on information about things that were there in earlier backups but aren't there now, so that the restore process can either not restore them or remove them as it restores the sequence of full and incremental backups.

While there are a variety of ways to tackle the first challenge, backup systems that want to run quickly are often constrained by what features operating systems offer (and also what features your backup system thinks it can trust, which isn't always the same thing). You can checksum everything all the time and keep a checksum database, but that's usually not going to be the fastest thing. The second challenge is much less constrained by what the operating system provides, which means that in practice it's much more on you (the backup system) to come up with a good solution. Your choice of solution may interact with how you solve the first challenge, and there are tradeoffs in various approaches you can pick (for example, do you represent deletions explicitly in the backup format or are they implicit in various ways).

There is no single right answer to these challenges. I'll go as far as to say that the answer depends partly on what sort of data and changes you expect to see in the backups and partly where you want to put the costs between creating backups and handling restores.

Understanding the limitation of 'do in new frame/window' in GNU Emacs

By: cks
17 February 2026 at 03:09

GNU Emacs has a core model for how it operates, and some of its weird seeming limitations are easier to understand if you internalize that model. One of them is what you have to do in GNU Emacs to get the perfectly sensible operation of 'do <X> in a new frame or window'. For instance, one of the things I periodically want to do in MH-E is 'open a folder in a new frame', so that I can go through it while keeping my main MH-E environment on my inbox to process incoming email.

If you dig through existing GNU Emacs ELisp functions, you won't find a 'make-frame-do-operation' function, which is a bit frustrating. GNU Emacs has a whole collection of operations for making a new frame, and I can run mh-visit-folder in the context of this frame, so it seems like there should be a simple function I could invoke to do this and create my own 'C-x 5 v' binding for 'visit MH-E folder in other frame'.

The clue to what's going on is in the description of C-x 5 5 from the Creating Frames page of the manual, with the emphasis mine:

A more general prefix command that affects the buffer displayed by a subsequent command invoked after this prefix command (other-frame-prefix). It requests the buffer to be displayed by a subsequent command to be shown in another frame.

GNU Emacs frames (and windows) don't run commands and show their output, they display (GNU Emacs) buffers. In order to create a frame, you must have some buffer to display on that frame, and GNU Emacs must know what it is. GNU Emacs has some relatively complex and magical code to implement the 'C-x 5 5' and 'C-x 4 4' prefix commands, but it's all still fundamentally starting from having some buffer to display, not from running a command. The code basically assumes you're running a command that will at some point try to display a buffer, and it hooks into that 'please display this buffer' operation to make the new frame or window and then display the buffer in it.

(Buffers can be created to show files, but they can also be created for a lot of other purposes, including non-file buffers created by ELisp commands that want to present text to you. All of MH-E's buffers are non-file ones, as are things like Magit's information displays.)

The corollary of this is that the most straightforward way to write our own ELisp code to run a command in a new frame is to start out by switching to some buffer in another frame, such as '*scratch*', and then run our command. In an extremely minimal form, this looks like:

(defun mh-visit-folder-other-frame (folder &optional argp)
  "...."
  (interactive [...])
  (switch-to-buffer-other-frame "*scratch*")
  (mh-visit-folder folder argp))

If you know that your command displays a specific buffer, ideally you'll check to see if that buffer exists already and switch to it instead of to some scratch buffer that you're only using because you need to tell Emacs to display some buffer (any buffer) in the new frame.

(In normal GNU Emacs environments you can be pretty confident that there's a *scratch* buffer sitting around. GNU Emacs normally creates it on startup and most people don't delete it. And if you're writing your own code, you can make sure not to delete it yourself.)

Now that I've written this entry, maybe I'll remember 'C-x 5 5' and also stop feeling vaguely irritated every time I do the equivalent by hand ('C-x 5 b', pick *scratch*, and then run my command in the newly created frame).

PS: It's probably possible to write a general ELisp function to run another function and make any buffers it wants to show come up on another frame, using the machinery that 'C-x 5 5' does. I will leave writing this function as an exercise for my readers (although maybe it already exists somewhere).

Sometimes giving syndication feed readers good errors is a mistake

By: cks
16 February 2026 at 03:56

Yesterday I wrote about the problem of giving feed readers error messages that people will actually see, because you can't just give them HTML text; in practice you have to wrap your HTML text up in a stub, single-entry syndication feed (and then serve it with a HTTP 200 success code). In many situations you're going to want to do this by replying to the initial feed request with a HTTP 302 temporary redirection that winds up on your stub syndication feed (instead of, say, a general HTML page explaining things, such as "this resource is out of service but you might want to look at ...").

Yesterday I put this into effect for certain sorts of problems, including claimed HTTP User-Agents that are for old browsers. Then several people reported that this had caused Feedly to start presenting my feed as the special 'your feed reader is (claiming to be) a too-old browser' single entry feed. The apparent direct cause of this is that Feedly made some syndication feed requests with HTTP User-Agent headers of old versions of Chrome and Firefox, which wound up getting a series of HTTP 302 temporary redirections to my new 'your feed reader is a too-old browser' stub feed. Feedly then decided to switch its main feed fetcher over to directly using this new URL for various feeds, despite the HTTP redirections being temporary (and not served for its main feed fetcher, which uses "Feedly/1.0" for its User-Agent).

Feedly has been making these fake browser User-Agent syndication feed fetch attempts for some time, and for some time they've been getting HTTP 302 redirections. However, up until late yesterday, what Feedly wound up on was a regular HTML web page. I have to assume that since this wasn't a valid syndication feed, Feedly ignored it. Only when I did the right thing to give syndication feed readers a good, useful error result did Feedly receive a valid syndication feed and go over the cliff.

Providing a stub syndication feed to communicate errors and problems to syndication feed fetchers is clearly the technically correct answer. However, I'm now somewhat less convinced that it's the most useful answer in practice. In practice, plenty of syndication feed fetchers keep fetching and re-fetching these stub feeds from me, suggesting that people either aren't seeing them or aren't doing anything about it. And now I've seen a feed reader malfunction spectacularly and in a harmful way because I gave it a valid syndication feed result at the end of a temporary HTTP redirection.

(I will probably stick to the current situation, partly because I no longer feel like accepting bad behavior from web agents.)

PS: If you're a feed fetching system, please give your feeds IDs that you put in the User-Agent, so that when they all wind up shifted to the same URL through some misfortune, the website involved can sort them out and redirect them back to the proper URLs.

The problem of delivering errors to syndication feed readers

By: cks
15 February 2026 at 04:30

Suppose, not hypothetically, that there are some feed readers (or at least things fetching your syndication feeds) that are misbehaving or blocked for one reason or another. You could just serve these feed readers HTTP 403 errors and stop there, but you'd like to be more friendly. For regular web browsers, you can either serve a custom HTTP error page that explains the situation or answer with a HTTP 302 temporary redirection to a regular HTML page with the explanation. Often the HTTP 302 redirection will be easier because you can use various regular means to create the HTML pages (and even host them elsewhere if you want). Unfortunately, this probably leaves syndication feed readers out in the cold.

(This can also come up if, for example, you decommission a syndication feed but want to let people know more about the situation than a simple HTTP 404 would give them.)

As far as I know, most syndication feed readers expect that the reply to their HTTP feed fetching request is in some syndication feed format (Atom, RSS, etc), which they will parse, process, and display to the person involved. If they get a reply in a different format, such as text/html, this is an error and it won't be shown to the person. Possibly the HTML <title> element will make it through, or the HTTP status code of an error response, or maybe both. But your carefully written HTML error page is unlikely to be seen.

(Since syndication feed readers need to be able to display HTML in general, they could do something to show people at least the basic HTML text they got back. But I don't think this is very common.)

As a practical thing, if you want people using blocked syndication feed readers to have a chance to see your explanation, you need to reply with a syndication feed with an entry that is your (HTML) message to them (either directly or through HTTP 302 redirections). Creating this stub feed and properly serving it to appropriate visitors may be anywhere from annoying to challenging. Also, you can't reply with HTTP error statuses (and the feed) even though that's arguably the right thing to do. If you want syndication feed readers to process your stub feed, you need to provide it as part of a HTTP 200 reply.

(Speaking from personal experience I can say that hand-writing stub Atom syndication feeds is a pain, and it will drive you to put very little HTML in the result. Which is okay, you can make it mostly a link to your regular HTML page about whatever issue it is.)
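As a hypothetical illustration, here is roughly what such a stub single-entry Atom feed might look like (all of the URLs, IDs, and names are placeholders), generated and sanity-checked in Python rather than written by hand:

```python
# A minimal single-entry Atom feed for delivering an error message to
# syndication feed readers. All URLs, dates, and names are placeholders.
STUB_FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Feed status notice</title>
  <id>https://example.org/feed-status</id>
  <updated>2026-02-15T00:00:00Z</updated>
  <author><name>Site operator</name></author>
  <entry>
    <title>This feed is unavailable to your feed reader</title>
    <id>https://example.org/feed-status/2026-02-15</id>
    <updated>2026-02-15T00:00:00Z</updated>
    <content type="html">
      &lt;p&gt;See &lt;a href="https://example.org/feed-help"&gt;this
      page&lt;/a&gt; for why and what to do about it.&lt;/p&gt;
    </content>
  </entry>
</feed>
"""

# Sanity check that the result is at least well-formed XML.
import xml.etree.ElementTree as ET
root = ET.fromstring(STUB_FEED)
print(root.tag)  # {http://www.w3.org/2005/Atom}feed
```

As suggested above, the HTML content is kept to a bare minimum: a link to a regular page with the real explanation.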

If you're writing a syndication feed reader, I urge you to optionally display the HTML of any HTTP error response or regular HTML page that you receive. If I was writing some sort of blog system today, I would make it possible to automatically generate a syndication feed version of any special error page the software could serve to people (probably through some magic HTTP redirection). That way people can write each explanation only once and have it work in both contexts.

The (very) old "repaint mode" GUI approach

By: cks
14 February 2026 at 04:34

Today I ran across another article that talked in passing about "retained mode" versus "immediate mode" GUI toolkits (this one, via), and gave some code samples. As usual when I read about immediate mode GUIs and see source code, I had a pause of confusion because the code didn't feel right. That's because I keep confusing "immediate mode" as used here with a much older approach, which I will call repaint mode for lack of a better description.

A modern immediate mode system generally uses double buffering; one buffer is displayed while the entire window is re-drawn into the second buffer, and then the two buffers are flipped. I believe that modern retained mode systems also tend to use double buffering to avoid screen tearing and other issues (and I don't know if they can do partial updates or have to re-render the entire new buffer). In the old days, the idea of having two buffers for your program's window was a decided luxury. You might not even have one buffer and instead be drawing directly onto screen memory. I'll call this repaint mode, because you directly repainted some or all of your window any time you needed to change anything in it.

You could do an immediate mode GUI without double buffering, in this repaint mode, but it would typically be slow and look bad. So instead people devoted a significant amount of effort to not repainting everything but instead identifying what they were changing and repainting only it, along with any pixels from other elements of your window that had been 'damaged' from prior activity. If you did do a broader repaint, you (or the OS) typically set clipping regions so that you wouldn't actually touch pixels that didn't need to be changed.

(The OS's display system typically needed to support clipping regions in any situation where windows partially overlapped yours, because it couldn't let you write into their pixels.)

One reason that old display systems worked this way is that it required as little memory as possible, which was an important consideration back in the day (which was more or less the 1980s to the early to mid 1990s). People could optimize their repaint code to be efficient and do as little work as possible, but they couldn't materialize RAM that wasn't there. Today, RAM is relatively plentiful and we care a lot more about non-tearing, coherent updates.

The typical code style for a repaint mode system was that many UI elements would normally only issue drawing commands to update or repaint themselves when they were altered. If you had a slider or a text field and its value was updated as a result of input, the code would typically immediately call its repaint function, which could lead to a relatively tight coupling of input handling to the rendering code (a coupling that I believe Model-view-controller was designed to break). Your system had to be capable of a full window repaint, but if you wanted to look good, it wasn't a common operation. A corollary of this is that your code might spend a significant amount of effort working out what was the minimal amount of repainting you needed to do in order to correctly get between two states (and this code could be quite complicated).

(Some of the time this was hidden from you in widget and toolkit internals, although they didn't necessarily give you minimal repaints as you changed widget organization. Also, just because a drawing operation was issued right away didn't mean that it took effect right away. In X, server side drawing operations might be batched up to be sent to the X server only when your program was about to wait for more X events.)

Because I'm used to this repaint mode style, modern immediate mode code often looks weird to me. There are no event handler connections, no repaint triggers, and so on, but there is an explicit display step. Alternately, you aren't merely configuring widgets and then camping out in the toolkit's main loop, letting it handle events and repaints for you (the widgets approach is the classical style for X applications, including PyTk applications such as pyhosts).

These days, I suspect that any modern toolkit that still looks like a repaint mode system is probably doing double buffering behind the scenes (unless you deliberately turn that off). Drawing directly to what's visible right now on screen is decidedly out of fashion because of issues like screen tearing, and it's not how modern display systems like Wayland want to operate. I don't know if toolkits implement this with a full repaint on the new buffer, or if they try to copy the old buffer to the new one and selectively repaint parts of it, but I suspect that the former works better with modern graphics hardware.

PS: My view is that even the widget toolkit version of repaint mode isn't a variation of retained mode because the philosophy was different. The widget toolkit might batch up operations and defer redoing layout and repainting things until you either returned to its event loop or asked it to update the display, but you expected a more or less direct coupling between your widget operations and repaints. But you can see it as a continuum that leads to retained mode when you decouple and abstract things enough.

(Now that I've written this down, perhaps I'll stop having that weird 'it's wrong somehow' reaction when I see immediate mode GUI code.)

Testing Linux memory limits is a bit of a pain

By: cks
13 February 2026 at 04:23

For reasons outside of the scope of this entry, I want to test how various systemd memory resource limits work and interact with each other (which means that I'm really digging into cgroup v2 memory controls). When I started trying to do this, it turned out that I had no good test program (or programs), although I had some ones that gave me partial answers.

There are two complexities in memory usage testing programs in a cgroups environment. First, you may be able to allocate more memory than you can actually use, depending on your system's settings for strict overcommit. So it's not enough to see how much memory you can allocate using the mechanism of your choice (I tend to use mmap() rather than go through language allocators). After you've either determined how much memory you can allocate or allocated your target amount, you have to at least force the kernel to materialize your memory by writing something to every page of it. Since the kernel can probably swap out some amount of your memory, you may need to keep repeatedly reading all of it.

The second issue is that if you're not in strict overcommit (and sometimes even if you are), the kernel can let you allocate more memory than you can actually use and then hit you with the OOM killer when you try to use it. For my testing, I care about the actual usable amount of memory, not how much memory I can allocate, so I need to deal with this somehow (and this is where my current test programs are inadequate). Since the OOM killer can't be caught by a process (that's sort of the point), the simple approach is probably to have my test program progressively report on how much memory it's touched so far, so I can see how far it got before it was OOM-killed. A more complex approach would be to do the testing in a child process with progress reports back to the parent so it could try to narrow in on how much it could use rather than me guessing that I wanted progress reports every, say, 16 MBytes or 32 MBytes of memory touching.

(Hopefully the OOM killer would only kill the child and not the parent, but with the OOM killer you can never be sure.)
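A minimal sketch of the touch-and-report idea in Python (with deliberately tiny sizes; a real test would use far larger values inside an actual memory-limited cgroup) might look like:

```python
# Minimal sketch of 'touch memory and report progress': mmap an
# anonymous region, write one byte per page to force the kernel to
# materialize it, and report progress as we go so we can see how far
# we got if the OOM killer strikes. Sizes here are tiny for illustration.

import mmap, os

def touch_memory(total_bytes, report_every):
    mem = mmap.mmap(-1, total_bytes)           # anonymous private mapping
    page = os.sysconf("SC_PAGESIZE") if hasattr(os, "sysconf") else 4096
    touched = 0
    for offset in range(0, total_bytes, page):
        mem[offset] = 1                        # materialize this page
        touched = offset + page
        if touched % report_every == 0:        # periodic progress report
            print(f"touched {touched // (1024 * 1024)} MiB")
    return touched

touched = touch_memory(4 * 1024 * 1024, 1024 * 1024)
print("done:", touched, "bytes")
```

For the real thing you'd also want the keep-re-reading loop mentioned above, to stop the kernel from quietly swapping your pages back out.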

I'm probably not the first person to have this sort of need, so I suspect that other people have written test programs and maybe even put them up somewhere. I don't expect to be able to find them in today's ambient Internet search noise, plus this is very close to the much more popular issue of testing your RAM memory.

(Will I put up my little test program when I hack it up? Probably not, it's too much work to do it properly, with actual documentation and so on. And these days I'm not very enthused about putting more repositories on Github, so I'd need to find some alternate place.)

Undo in Vi and its successors, and my views on the mess

By: cks
12 February 2026 at 04:19

The original Bill Joy vi famously only had a single level of undo (which is part of what makes it a product of its time). The 'u' command either undid your latest change or it redid the change, undo'ing your undo. When POSIX and the Single Unix Specification wrote vi into the standard, they required this behavior; the vi specification requires 'u' to work the same as it does in ex, where it is specified as:

Reverse the changes made by the last command that modified the contents of the edit buffer, including undo.

This is one particular piece of POSIX compliance that I think everyone should ignore.

Vim and its derivatives ignore the POSIX requirement and implement multi-level undo and redo in the usual and relatively obvious way. The vim 'u' command only undoes changes but it can undo lots of them, and to redo changes you use Ctrl-r ('r' and 'R' were already taken). Because 'u' (and Ctrl-r) are regular commands they can be used with counts, so you can undo the last 10 changes (or redo the last 10 undos). Vim can be set to vi compatible behavior if you want. I believe that vim's multi-level undo and redo is the default even when it's invoked as 'vi' in an unconfigured environment, but I can't fully test that.

Nvi has opted to remain POSIX compliant and operate in the traditional vi way, while still supporting multi-level undo. To get multi-level undo in nvi, you extend the first 'u' with '.' commands, so 'u..' undoes the most recent three changes. The 'u' command can be extended with '.' in either of its modes (undo'ing or redo'ing), so 'u..u..' is a no-op. The '.' operation doesn't appear to take a count in nvi, so there is no way to do multiple undos (or redos) in one action; you have to step through them by hand. I'm not sure how nvi reacts if you want to do things like move your cursor position during an undo or redo sequence (my limited testing suggests that it can perturb the sequence, so that '.' now doesn't continue undoing or redoing the way vim will continue if you use 'u' or Ctrl-r again).

The vi emulation package evil for GNU Emacs inherits GNU Emacs' multi-level undo and nominally binds undo and redo to 'u' and Ctrl-r respectively. However, I don't understand its actual stock undo behavior. It appears to do multi-level undo if you enter a sequence of 'u' commands and accepts a count for that, but it doesn't feel vi or vim compatible if you intersperse 'u' commands with things like cursor movement, and I don't understand redo at all (evil has some customization settings for undo behavior, especially evil-undo-system). I haven't investigated Evil extensively and this undo and redo stuff makes me less likely to try using it in the future.

The BusyBox implementation of vi is minimal but it can be built with support for 'u' and multi-level undo, which is done by repeatedly invoking 'u'. It doesn't appear to have any redo support, which makes a certain amount of sense in an environment where your biggest concern may be reverting things so they're no worse than they started out. The Ubuntu and Fedora versions of busybox appear to be built this way, but your mileage may vary on other Linuxes.

My personal view is that the vim undo and redo behavior is the best and most human friendly option. Undo and redo are predictable and you can predictably intersperse undo and redo operations with other operations that don't modify the buffer, such as moving the cursor, searching, and yanking portions of text. The nvi behavior essentially creates a special additional undo mode, where you have to remember that you're in a sequence of undo or redo operations and you can't necessarily do other vi operations in the middle (such as cursor movement, searches, or yanks). This matters a lot to me because I routinely use multi-level undo when I'm writing text to rewind my buffer to a previous state and yank out some wording that I've decided I like better than its replacement.

(For additional vi versions, on the Fediverse, I was also pointed to nextvi, which appears to use vim's approach to undo and redo; I believe neatvi also does this but I can't spot any obvious documentation on it. There are vi-inspired editors such as vile and vis, but they're not things people would normally use as a direct replacement for vi. I believe that vile follows the nvi approach of 'u.' while vis follows the vim model of 'uu' and Ctrl-r.)

Moving to make many of my SSH logins not report things on login

By: cks
11 February 2026 at 04:32

I've been logging in to Unix machines for what is now quite a long time. When I started, it was traditional for your login process to be noisy. The login process itself would tell you last login details and the 'message of the day' ('motd'), and people often made their shell .profile or .login report more things, so you could see things like:

Last login: Tue Feb 10 22:16:14 2026 from 128.100.X.Y
 22:22:42 up 1 day, 11:22,  3 users,  load average: 0.40, 2.95, 3.30
cks cks cks
[output from fortune elided]
: <host> ;

(There is no motd shown here but it otherwise hits the typical high points, including a quote from fortune. People didn't always use 'fortune' itself but printing a randomly selected quote on login used to be common.)

Many years ago I modified my shell environment on our servers so that it wouldn't report the currently logged in users, show the motd, or tell me my last login. But I kept the 'uptime' line:

$ ssh cs.toronto.edu
 22:26:05 up 209 days,  5:26, 167 users,  load average: 0.47, 0.51, 0.60
: apps0.cs ;

Except, I typically didn't see that. I saw this only on full login sessions, and when I was in the office I typically used special tools (also, also, also) that didn't actually start a login session and so didn't show me this greeting banner. Only when I was at home did I do SSH logins (with tooling) and so see this, and I didn't do that very much (because I didn't normally work from home, so I had no reason to be routinely opening windows on our servers).

As a long term result of that 2020 thing I work from home a lot more these days and so I open up a lot more SSH logins than I used to. Recently I was thinking about how to make this feel nicer, and it struck me that one of the things I found quietly annoying was that line from 'uptime' (to the point that sometimes my first action on login was to run 'clear', so I had a clean window). It was the one last thing cluttering up 'give me a new window on host X' and making the home experience visibly different from the office experience.

So far I've taken only a small step forward. I've made it so that I skip running 'uptime' if I'm logging in from home and the load on the machine I'm logging in to is sufficiently low to be uninteresting (which is often the case). As I get used to (or really, accept) this little change, I'll probably slowly move to silence 'uptime' more often.

When I think about it, making this change feels long overdue. Printing out all sorts of things on login made sense in a world where I logged in to places relatively infrequently. But that's not the case in my world any more. My terminal windows are mostly transient and I mostly work on servers that I have to start new windows on, and right from very early I made my office environment not treat them as login sessions, with the full output and everything (if I cared about routinely seeing the load on a server, that's what xload was for (cf)).

(I'm bad about admitting to myself that my usage has shifted and old settings no longer make sense.)

A fun Python puzzle with circular imports

By: cks
10 February 2026 at 04:12

Baptiste Mispelon asked an interesting Python quiz (via, via @glyph):

Can someone explain this #Python import behavior?
I'm in a directory with 3 files:

a.py contains `A = 1; from b import *`
b.py contains `from a import *; A += 1`
c.py contains `from a import A; print(A)`

Can you guess and explain what happens when you run `python c.py`?

I encourage you to guess which of the options in the original post is the actual behavior before you read the rest of this entry.

There are two things going on here. The first thing is what actually happens when you do 'from module import ...'. The short version is that this copies the current bindings of names from one module to another. So when module b does 'from a import *', it copies the binding of a.A to b.A and then the += changes that binding. The behavior would be the same if we used 'from a import A' and 'from b import A' in the code, and if we did we could describe what each did in isolation as starting with 'A = 1' (in a), then 'A = a.A; A += 1' (in b), and then 'A = b.A' (back in a) successively (and then in c, 'A = a.A').
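A small demonstration of this binding-copy behavior, using an in-memory module created just for illustration:

```python
# 'from module import name' copies the current binding rather than
# creating a live alias. The module here is built in memory purely
# for illustration.
import sys, types

mod = types.ModuleType("demo_mod")
mod.A = 1
sys.modules["demo_mod"] = mod

from demo_mod import A   # copies the binding: A = demo_mod.A
mod.A = 99               # rebinding demo_mod.A doesn't affect our A
print(A)                 # 1
```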

The second thing going on is that you can import incomplete modules (this is true in both Python 2 and Python 3, which return the same results here). To see how this works we need to combine the description of 'import' and 'from' and the approximation of what happens during loading a module, although neither is completely precise. To summarize, when a module is being loaded, the first thing that happens is that a module namespace is created and is added to sys.modules; then the code of the module is executed in that namespace. When Python encounters a 'from', if there is an entry for the module in sys.modules, Python immediately imports things from it; it implicitly assumes that the module is already fully loaded.

At first I was surprised by this behavior, but the more I think about it the more it seems a reasonable choice. It avoids having to explicitly detect circular imports and it makes circular imports work in the simple case (where you do 'import b' and then don't use anything from b until all imports are finished and the program is running). It has the cost that if you have circular name uses you get an unhelpful error message about 'cannot import name' (or 'NameError: name ... is not defined' if you use 'from module import *'):

$ cat a.py
from b import B; A = 10 + B
$ cat b.py
from a import A; B = 20 + A
$ cat c.py
from a import A; print(A)
$ python c.py
[...]
ImportError: cannot import name 'A' from 'a' [...]

(Python 3.13 does print a nice stack trace that points to the whole set of 'from ...' statements.)

Given all of this, here is what I believe is the sequence of execution in Baptiste Mispelon's example:

  1. c.py does 'from a import A', which initiates a load of the 'a' module.
  2. an 'a' module is created and added to sys.modules
  3. that module begins executing the code from a.py, which creates an 'a.A' name (bound to 1) and then does 'from b import *'.
  4. a 'b' module is created and added to sys.modules.
  5. that module begins executing the code from b.py. This code starts by doing 'from a import *', which finds that 'sys.modules["a"]' exists and copies the a.A name binding, creating b.A (bound to 1).
  6. b.py does 'A += 1', which mutates the b.A binding (but not the separate a.A binding) to be '2'.
  7. b.py finishes its code, returning control to the code from a.py, which is still part way through 'from b import *'. This import copies all names (and their bindings) from sys.modules["b"] into the 'a' module, which means the b.A binding (to 2) overwrites the old a.A binding (to 1).
  8. a.py finishes and returns control to c.py, where 'from a import A' can now complete by copying the a.A name and its binding into 'c', making it the equivalent of 'import a; A = a.A; del a'.
  9. c.py prints the value of this, which is 2.

At the end of things, c.A, a.A, and b.A all exist, and they are all bindings to the same object. The order of binding was 'b.A = 2; a.A = b.A; c.A = a.A'.

(There's also a bonus question, where I have untested answers.)

Sidebar: A related circular import puzzle and the answer

Let's take a slightly different version of my error message example above, that simplifies things by leaving out c.py:

$ cat a.py
from b import B; A = 10 + B
$ cat b.py
from a import A; B = 20 + A
$ python a.py
[...]
ImportError: cannot import name 'B' from 'b' [...]

When I first did this I was quite puzzled until the penny dropped. What's happening is that running 'python a.py' isn't creating an 'a' module but instead a __main__ module, so b.py doesn't find a sys.modules["a"] when it starts and instead creates one and starts loading it. That second version of a.py, now in an "a" module, is what tries to refer to b.B and finds it not there (yet).

Systemd and blocking connections to localhost, including via 'any'

By: cks
9 February 2026 at 04:21

I recently discovered a surprising path to accessing localhost URLs and services, where instead of connecting to 127.0.0.1 or the IPv6 equivalent, you connected to 0.0.0.0 (or the IPv6 equivalent). In that entry I mentioned that I didn't know if systemd's IPAddressDeny would block this. I've now tested this, and the answer is that systemd's restrictions do block this. If you set 'IPAddressDeny=localhost', the service or whatever is blocked from the 0.0.0.0 variation as well (for both outbound and inbound connections). This is exactly the way it should be, so you might wonder why I was uncertain and felt I needed to test it.

There are a variety of ways at different levels that you might implement access controls on a process (or a group of processes) in Linux, for IP addresses or anything else. For example, you might create an eBPF program that filtered the system calls and system call arguments allowed and attach it to a process and all of its children using seccomp(2). Alternately, for filtering IP connections specifically, you might use a cgroup socket address eBPF program (also), which are among the cgroup program types that are available. Or perhaps you'd prefer to use a cgroup socket buffer program.

How a program such as systemd implements filtering has implications for what sort of things it has to consider and know about when doing the filtering. For example, if we reasonably conclude that the kernel will have mapped 0.0.0.0 to 127.0.0.1 by the time it invokes cgroup socket address eBPF programs, such a program doesn't need to have any special handling to block access to localhost by people using '0.0.0.0' as the target address to connect to. On the other hand, if you're filtering at the system call level, the kernel has almost certainly not done such mapping at the time it invokes you, so your connect() filter had better know that '0.0.0.0' is equivalent to 127.0.0.1 and it should block both.
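A quick Python demonstration of the underlying 0.0.0.0 behavior on Linux, which is exactly what a system call level filter would have to know about:

```python
# On Linux, connecting to 0.0.0.0 reaches a listener on localhost,
# which is why a connect() filter that only checks for 127.0.0.1
# would miss this path.
import socket

listener = socket.socket()
listener.bind(("127.0.0.1", 0))        # localhost only, ephemeral port
listener.listen(1)
port = listener.getsockname()[1]

client = socket.socket()
client.connect(("0.0.0.0", port))      # the kernel maps this to localhost
conn, peer = listener.accept()
print("connected from", peer[0])       # 127.0.0.1
conn.close(); client.close(); listener.close()
```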

This diversity is why I felt I couldn't be completely sure about systemd's behavior without actually testing it. To be honest, I didn't know what the specific options were until I researched them for this entry. I knew systemd used eBPF for IPAddressDeny (because it mentions that in the manual page in passing), but I vaguely knew there are a lot of ways and places to use eBPF and I didn't know if systemd's way needed to know about 0.0.0.0 or if systemd did know.

Sidebar: What systemd uses

As I found out through use of 'bpftool cgroup list /sys/fs/cgroup/<relevant thing>' on a systemd service that I knew uses systemd IP address filtering, systemd uses cgroup socket buffer programs, and is presumably looking for good and bad IP addresses and netblocks in those programs. This unfortunately means that it would be hard for systemd to have different filtering for inbound connections as opposed to outgoing connections, because at the socket buffer level it's all packets.

(You'd have to go up a level to more complicated filters on socket address operations.)

The original vi is a product of its time (and its time has passed)

By: cks
8 February 2026 at 03:50

Recently I saw another discussion of how some people are very attached to the original, classical vi and its behaviors (cf). I'm quite sympathetic to this view, since I too am very attached to the idiosyncratic behavior of various programs I've gotten used to (such as xterm's very specific behavior in various areas), but at the same time I had a hot take over on the Fediverse:

Hot take: basic vim (without plugins) is mostly what vi should have been in the first place, and much of the differences between vi and vim are improvements. Multi-level undo and redo in an obvious way? Windows for easier multi-file, cross-file operations? Yes please, sign me up.

Basic vi is a product of its time, namely the early 1980s, and the rather limited Unix machines of the time (yes a VAX 11/780 was limited).

(The touches of vim superintelligence, not so much, and I turn them off.)

For me, vim is a combination of genuine improvements in vi's core editing behavior (cf), frustrating (to me) bits of trying too hard to be smart (which I mostly disable when I run across them), and an extension mechanism I ignore but people use to make vim into a superintelligent editor with things like LSP integrations.

Some of the improvements and additions to vi's core editing may be things that Bill Joy either didn't think of or didn't think were important enough. However, I feel strongly that some or even many of the omitted features and differences are a product of the limited environments vi had to operate in. The poster child for this is vi's support of only a single level of undo, which drastically constrains the potential memory requirements (and implementation complexity) of undo, especially since a single editing operation in vi can make sweeping changes across a large file (consider a whole-file ':...s/../../' substitution, for example).

(The lack of split windows might be one part memory limitations and one part that splitting an 80 by 24 serial terminal screen is much less useful than splitting, say, an 80 by 50 terminal window.)

Vim isn't the only improved version of vi that has added features like multi-level undo and split windows so you can see multiple files at once (or several parts of the same file); there's also at least nvi. I'm used to vim so I'm biased, but I happen to think that a lot of vim's choices for things like multi-level undo are good ones, ones that will be relatively obvious and natural to new people and avoid various sorts of errors and accidents. But other people like nvi and I'm not going to say they're wrong.

I do feel strongly that giving stock vi to anyone who doesn't specifically ask for it is doing them a disservice, and this includes installing stock vi as 'vi' on new Unix installs. At this point, what new people are introduced to and what is the default on systems should be something better and less limited than stock vi. Time has moved on and Unix systems should move on with it.

(I have similar feelings about the default shell for new accounts for people, as opposed to system accounts. Giving people bare Bourne shell is not doing them any favours and is not likely to make a good first impression. I don't care what you give them but it should at least support cursor editing, file completion, and history, and those should be on by default.)

PS: I have complicated feelings about Unixes that install stock vi as 'vi' and something else under its full name, because on the one hand that sounds okay but on the other hand there is so much stuff out there that says to use 'vi' because that's the one name that's universal. And if you then make 'vi' the name of the default (visual) editor, well, it certainly feels like you're steering new people into it and doing them a disservice.

(I don't expect to change the mind of any Unix that is still shipping stock vi as 'vi'. They've made their cultural decisions a long time ago and they're likely happy with the results.)

How we failed to notice a power failure

By: cks
7 February 2026 at 04:25

Over on the Fediverse, I mentioned that we once missed noticing that there had been a power failure. Naturally there is a story there (and this is the expanded version of what I said in the Fediverse thread). A necessary disclaimer is that this was all some time ago and I may be mangling or mis-remembering some of the details.

My department is spread across multiple buildings, one of which has my group's offices and our ancient machine room (which I believe has been there since the building burned down and was rebuilt). But for various reasons, this building doesn't have any of the department's larger meeting rooms. Once upon a time we had a weekly meeting of all the system administrators (and our manager), both my group and all of the Points of Contact, which amounted to a dozen people or so and needed one of the larger meeting rooms, which was of course in a different building than our machine room.

As I was sitting in the meeting room during one weekly meeting, fiddling around, I tried to get my Linux laptop on either our wireless network or our wired laptop network (it's been long enough that I can't remember which). This was back in the days when networking on Linux laptops wasn't a 100% reliable thing, especially wireless, so I initially assumed that my inability to get on the network was the fault of my laptop and its software. Only after a bit of time and also failing on both wired and wireless networking did I ask to see if anyone else (with a more trustworthy laptop) could get on the network. As a ripple of "no, not me" spread around the room, we realized that something was wrong.

(This was in the days before smartphones were pervasive, and also it must have been before the university-wide wireless network was available in that meeting room.)

What was wrong turned out to be a short power failure that had been isolated to the building that our machine room was in. Had people been in their offices, the problem would have been immediately obvious; we'd have seen all networking fail, and the people in the building would have seen the lights go out and so on. But because the power issue hit at exactly the time that we were all in our weekly meeting in a different building, we missed it.

(My memory is that by the time we'd reached the machine room the power was coming back, but obviously we had a variety of work to do to clean the situation up so that was it for the meeting.)

For extra irony, the building we were meeting in was right next to our machine room's building, and the meeting room had a window that literally looked across the alleyway at our building. At least that made it quick and easy to get to the machine room, because we could just walk across the bridge that connects the two buildings.

PS: In our environment, this is such a rare collection of factors that it's not worth trying to set up some sort of alerting for it, especially today in a world with pervasive smartphones (where people outside the meeting room can easily send some of us messages, even with the network down).

(Also, these days we don't normally have such big meetings any more and if we did, they'd be virtual meetings and we'd definitely notice bits of the network going down, one way or another.)

A surprising path to accessing localhost URLs and HTTP services

By: cks
6 February 2026 at 03:43

One of the classic challenges in web security is DNS rebinding. The simple version is that you put some web service on localhost in order to keep outside people from accessing it, and then some joker out in the world makes 'evil.example.org' resolve to 127.0.0.1 and arranges to get you to make requests to it. Sometimes this is through JavaScript in a browser, and sometimes this is by getting you to fetch things from URLs they supply (because you're running a service that fetches and processes things from external URLs, for example).

One way people defend against this is by screening out 127.0.0.0/8, IPv6's ::1, and other dangerous areas of IP address space from DNS results (either in the DNS resolver or in your own code). And you can also block URLs with these as explicit IP addresses, or 'localhost' or the like. Sometimes you might add extra security restrictions to a process or an environment through means like Linux eBPF to screen out which IP addresses you're allowed to connect to (cf, and I don't know whether systemd's restrictions would block this).

As I discovered the other day, if you connect to INADDR_ANY, you connect to localhost (which any number of people already knew). Then in a comment Kevin Lyda reminded me that INADDR_ANY is also known as 0.0.0.0, and '0' is often accepted as a name that will turn into it, resulting in 'ssh 0' working and also (in some browsers) 'http://0:<port>/'. The IPv6 version of INADDR_ANY is also an all-zero address, and '::0' and '::' are both accepted as names for it, and then of course it's easy to create DNS records that resolve to either the IPv4 or IPv6 versions. As I said on the Fediverse:

Surprise: blocking DNS rebinding to localhost requires screening out more than 127/8 and ::1 answers. This is my face.

It turns out that this came up in mid 2024 in the browser context, as '0.0.0.0 Day' (cf). Modern versions of Chrome and Safari apparently explicitly block requests to 0.0.0.0 (and presumably also the IPv6 version), while Firefox will still accept it. And of course your URL-fetching libraries will almost certainly also accept it, especially through DNS lookups of ordinary looking but attacker controlled hostnames.
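As a sketch of what a more complete screen looks like (the function name is mine, not from any particular library), Python's ipaddress module can flag both loopback and the unspecified 'any' addresses:

```python
import ipaddress

def is_localhost_equivalent(ip_str):
    # Screen out loopback (127/8, ::1) *and* the unspecified "any"
    # addresses (0.0.0.0, ::), which can also reach localhost services.
    ip = ipaddress.ip_address(ip_str)
    return ip.is_loopback or ip.is_unspecified

for addr in ("127.0.0.1", "::1", "0.0.0.0", "::", "192.0.2.10"):
    print(addr, is_localhost_equivalent(addr))
```

A real screen would also want to handle IPv4-mapped IPv6 addresses and whatever other address ranges you consider internal.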

In my view, it's not particularly anyone's fault that this slipped through the cracks, both in browsers and in tools that handle fetching content from potentially hostile URLs. The reality of life is that how IP behaves in practice is complicated and some of it is historical practice that's been carried forward and isn't necessarily obvious or well known (and certainly isn't standardized). Then URLs build on top of this somewhat rickety foundation and surprises happen.

(This is related to the issue of browsers being willing to talk to 'local' IPs, which Chrome once attempted to start blocking (and I believe that shipped, but I don't use Chrome any more so I don't know what the current state is).)

The meaning of connecting to INADDR_ANY in TCP and UDP

By: cks
5 February 2026 at 02:55

An interesting change to IP behavior landed in FreeBSD 15, as I discovered by accident. To quote from the general networking section of the FreeBSD 15 release notes:

Making a connection to INADDR_ANY, i.e., using it as an alias for localhost, is now disabled by default. This functionality can be re-enabled by setting the net.inet.ip.connect_inaddr_wild sysctl to 1. cd240957d7ba

The change's commit message has a bit of a different description:

Previously connect() or sendto() to INADDR_ANY reached some socket bound to some host interface address. Although this was intentional it was an artifact of a different era, and is not desirable now.

This is connected to an earlier change and FreeBSD bugzilla #28075, which has some additional background and motivation for the overall change (as well as the history of this feature in 4.x BSD).

The (current) Linux default behavior matches the previous FreeBSD behavior. If you had something listening on localhost (in IPv4, specifically 127.0.0.1) or listening on INADDR_ANY, connecting to INADDR_ANY would reach it and give the source of your connection a localhost address (either 127.0.0.1 or ::1 depending on IPv4 versus IPv6). Obviously the current FreeBSD default behavior has now changed, and the Linux behavior may change at some point (or at least become something that can be changed by a sysctl).

(Linux specifically restricts you to connecting to 127.0.0.1; you can't reach a port listening on, eg, 127.0.0.10, although that is also a localhost address.)
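You can see the traditional behavior directly with a few lines of Python; this sketch assumes a Linux machine with the current default behavior (on FreeBSD 15's new defaults, I'd expect the connect() to fail instead):

```python
import socket

# Listen only on 127.0.0.1, on an ephemeral port.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

# Connect to INADDR_ANY (0.0.0.0) instead of 127.0.0.1; on Linux this
# reaches the localhost listener and the peer shows up as 127.0.0.1.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("0.0.0.0", port))
conn, peer = srv.accept()
print("peer address:", peer[0])
cli.close(); conn.close(); srv.close()
```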

One of the tricky API issues here is that higher level APIs can often be persuaded or tricked into using INADDR_ANY by default when they connect to something. For example, in Go's net package, if you leave the hostname blank, you currently get INADDR_ANY (which is convenient behavior for listening but not necessarily for connecting). In other APIs, your address variable may start with an initial zero value for the target IP address, which is INADDR_ANY for IPv4; if your code never sets it (perhaps because the 'host' is a blank string), you get a connection to INADDR_ANY and thus to localhost. On top of that, a blank host name to connect to may have come about through accident or through an attacker's action (perhaps they can make decoding or parsing the host name fail, leaving the 'host name' blank on you).

I believe that what's happening with Go's tests (whose failures on FreeBSD 15 are how I stumbled over this change) is that the net package guarantees that things like net.Dial("tcp", ":<port>") connect to localhost, so of course the net package has tests to ensure that this stays working. Currently, Go's net package implements this behavior by mapping a blank host to INADDR_ANY, which has traditionally worked and been the easiest way to get the behavior Go wants. It also means that Go can use uniform parsing of 'host:port' for both listening, where ':port' is required to mean listening on INADDR_ANY, and for connecting, where the host has to be localhost. Since this is a high level API, Go can change how the mapping works, and it pretty much has to in order to fully work as documented on FreeBSD 15 in a stock configuration.

(Because that would be a big change to land right before the release of Go 1.26, I suspect that the first bugfix that will land is to skip these tests on FreeBSD, or maybe only on FreeBSD 15+ if that's easy to detect.)

I prefer to pass secrets between programs through standard input

By: cks
4 February 2026 at 04:12

There are a variety of ways to pass secrets from one program to another on Unix, and many of them may expose your secrets under some circumstances. A secret passed on the command line is visible in process listings; a secret passed in the environment can be found in the process's environment (which can usually be inspected by outside parties). When I've had to deal with this in administrative programs in our environment, I have reached for an old Unix standby: pass the secret between programs through file descriptors, specifically standard input and standard output. This can even be used and done in shell scripts. However, there are obviously some cautions, both in general and in shell scripts.

Although Bourne shell script variables look like environment variables, they aren't exported into the environment until you ask for this with 'export'. Naturally you should never do this for the shell variables that hold secrets. Also, these days 'echo' is a built-in in any version of the Bourne shell you want to use, so 'echo $somesecret' does not actually run a process that has the secret visible in its command line arguments. However, you have to be careful what commands you use here; before relying on a potentially convenient command like printf, check (with 'type printf') that your shell actually provides it as a builtin, because an external command would expose the secret in its command line arguments.

As a general caution, you need to either limit the characters that are allowed in secrets or encode the secret somehow (you might as well use base64). If you need to pass more than one thing between your programs this way, you'll need to define a very tiny protocol, if only so that you write down the order that things are sent between programs (and if they are, for example, newline-delimited).
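The same pattern works from any language that can run a child process with a pipe on its standard input; here's a sketch in Python (with 'cat' standing in for a real consumer program):

```python
import subprocess

# Hand a secret to another program via its standard input, so it never
# appears in the child's argv (visible in 'ps') or in its environment.
secret = "not-a-real-secret-value"

res = subprocess.run(
    ["cat"],                  # stand-in for the real consumer
    input=secret + "\n",      # newline-delimited, per a tiny ad-hoc protocol
    capture_output=True,
    text=True,
)
received = res.stdout.rstrip("\n")
print("consumer received", len(received), "characters")
```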

One advantage of passing secrets this way is that it's easy to pass them from machine to machine through mechanisms like SSH (if you have passwordless SSH). Instead of 'provide-secret | consume-secret', you simply change this to 'provide-secret | ssh remote consume-secret'.

In the right (Unix) environment it's possible to pass secrets this way to programs that want to read them from a file, using features like Bash's '<(...)' notation or the underlying Unix features that enable that Bash feature (specifically, /dev/fd).

Passing secrets between programs this way can seem a little janky and improper, but I can testify that it works. We have a number of things that move secrets around this way, including across machines, and they've been doing it for years without problems.

(There are fancy ways to handle this on Linux for some sorts of secrets, generally static secrets, but I don't know of any other generally usable way of doing this for dynamic secrets that are generated on the fly, especially if some of the secrets consumers are shell scripts. But you probably could write a D-Bus based system to do this with all sorts of bells and whistles, if you had to do it a lot and wanted something more professional looking.)

The consoles of UEFI, serial and otherwise, and their discontents

By: cks
3 February 2026 at 03:07

UEFI is the modern firmware standard for x86 PCs and other systems; sometimes the actual implementation is called a UEFI BIOS, but the whole area is a bit confusing. I recently wrote about getting FreeBSD to use a serial console on a UEFI system and mentioned that some UEFI BIOSes could echo console output to a serial port, which caused Greg A. Woods to ask a good question in a comment:

So, how does one get a typical UEFI-supporting system to use a serial console right from the firmware?

The mechanical answer is that you go into your UEFI BIOS settings and see if it has any options for what is usually called 'console redirection'. If you have it, you can turn it on and at that point the UEFI console will include the serial device you picked, theoretically allowing both output and input from the serial device. This is very similar to the 'console redirection' option in 'legacy' pre-UEFI BIOSes, although it's implemented rather differently. An important note here is that UEFI BIOS console redirection only applies to things using the UEFI console. Your UEFI BIOS definitely uses the UEFI console, and your UEFI operating system boot loader hopefully does. Your operating system almost certainly doesn't.

A UEFI BIOS doesn't need to have such an option and typical desktop ones probably don't. The UEFI standard provides a standard set of ways to implement console redirection (and alternate console devices in general), but UEFI doesn't require it; it's perfectly standard compliant for a UEFI BIOS to only support the video console. Even if your UEFI BIOS provides console redirection, your actual experience of trying to use it may vary. Watching boot output is likely to be fine, but trying to interact with the BIOS from your serial port may be annoying.

How all of this works is that UEFI has a notion of an EFI console, which is (to quote the documentation) "used to handle input and output of text-based information intended for the system user during the operation of code in the boot services environment". The EFI console is an abstract thing, and it's also some globally defined variables that include ConIn and ConOut, the device paths of the console input and output device or devices. Device paths can include multiple sub-devices (in generic device path structures), and one of the examples specifically mentioned is:

[...] An example of this would be the ConsoleOut environment variable that consists of both a VGA console and serial output console. This variable would describe a console output stream that is sent to both VGA and serial concurrently and thus has a Device Path that contains two complete Device Paths. [...]

(Sometimes this is 'ConsoleIn' and 'ConsoleOut', eg, and sometimes 'ConIn' and 'ConOut'. Don't ask me why.)

In theory, a UEFI BIOS can hook a wide variety of things up to ConIn, ConOut, or both, as it decides (and implements), possibly including things like IPv4 connections. In practice it's up to the UEFI BIOS to decide what it will bother to support. Server UEFI BIOSes will typically support serial console redirection, which is to say connecting some serial port to ConIn and ConOut in addition to the VGA console. Desktop motherboard UEFI BIOSes probably won't. I don't know if there are very many server UEFI BIOSes that will use only the serial console and exclude the VGA console from ConIn and ConOut.

(Also in theory I believe a UEFI BIOS could wire up ConOut to include a serial port but not connect it to ConIn. In practice I don't know of any that do.)

EFI also defines a protocol (a set of function calls) for console input and output. For input, what people (including the UEFI BIOS itself) get back is either or both of an EFI scan code or a Unicode character. The 'EFI scan code' is used to determine what special key you typed, for example F11 to go into some UEFI BIOS setup mode. The UEFI standard also has an appendix with examples of mapping various sorts of input to these EFI scan codes, which is very relevant for entering anything special over a serial console.

If you look at this appendix B, you'll note that it has entries for both 'ANSI X3.64 / DEC VT200-500 (8-bit mode)' and 'VT100+ (7-bit mode)'. Now you have two UEFI BIOS questions. First, does your UEFI BIOS even implement this, or does it either ignore the whole issue (leaving you with no way to enter special characters) or come up with its own answers? And second, does your BIOS restrict what it recognizes over the serial port to just whatever type it's set the serial port to, or will it recognize either sequence for something like F11? The latter question is very relevant because your terminal emulator environment may or may not generate what your UEFI BIOS wants for special keys like F11 (or it may even intercept some keys, like F11; ideally you can turn this off).

(Another question is what your UEFI BIOS may call the option that controls what serial port key mapping it's using. One machine I've tested on calls the setting "Putty KeyPad" and the correct value for the "ANSI X3.64" version is "XTERMR6", for example, which corresponds to what xterm, Gnome-Terminal and probably other modern terminal programs send.)

Another practical issue is that if you do anything fancy with a UEFI serial console, such as go into the BIOS configuration screens, your UEFI BIOS may generate output that assumes a very specific and unusual terminal resolution. For instance, the Supermicro server I've been using for my FreeBSD testing appears to require a 100x30 terminal in its BIOS configuration screens; if you have any other resolution you get various sorts of jumbled results. Many of our Dell servers take a different approach, where the moment you turn on serial console redirection they choke their BIOS configuration screens down to an ASCII 80x24 environment. OS boot environments may be more forgiving in various ways.

The good news is that your operating system's bootloader will probably limit itself to regular characters, and in practice what you care about a lot of the time is interacting with the bootloader (for example, for alternate boot and disaster recovery), not your UEFI BIOS.

As FreeBSD discusses in loader.efi(8), it's not necessarily straightforward for an operating system boot loader to decode what the UEFI ConIn and ConOut are connected to in order to pass the information to the operating system (which normally won't be using UEFI to talk to its console(s)). This means that the UEFI BIOS console(s) may not wind up being what the OS console(s) are, and you may have to configure them separately.

PS: As you may be able to tell from what I've written here, if you care significantly about UEFI BIOS access from the serial port, you should expect to do a bunch of experimentation with your specific hardware. Remember to re-check your results with new server generations and new UEFI BIOS firmware versions.

Estimating where your Prometheus Blackbox TCP query-response check failed

By: cks
2 February 2026 at 04:20

As covered recently, the normal way to check simple services from outside in a Prometheus environment is with Prometheus Blackbox, which is somewhat complicated to understand. One of its abstractions is a prober, a generic way of checking some service using HTTP, DNS queries, a TCP connection, and so on. The TCP prober supports conducting a query-response dialog once you connect, but currently (as of Blackbox 0.28.0) it doesn't directly expose metrics that tell you where your TCP probe with a query-response set failed (and why), and sometimes you'd like to know.

A somewhat typical query-response probe looks like this:

  smtp_starttls:
    prober: tcp
    tcp:
      query_response:
        - expect: "^220"
        - send: "EHLO something\r"
        - expect: "^250-STARTTLS"
        - expect: "^250 "
        - send: "STARTTLS\r"
        - expect: "^220"
        - starttls: true
        - expect: "^220"
        - send: "QUIT\r"

To understand what metrics we can look for on failure, we need to both understand how each important option in a step can fail, and what metrics they either set on failure or create when they succeed.

  • starttls will fail if it can't successfully negotiate a TLS connection with the server, possibly including if the server's TLS certificate fails to verify. It sets no metrics on failure, but on success it will set various TLS related metrics such as the probe_ssl_* family and probe_tls_version_info.

  • send will fail if there is an error sending the line, such as the TCP connection closing on you. It sets no metrics on either success or failure.

  • expect reads lines from the TCP connection until either a line matches your regular expression, it hits EOF, or it hits a network error. If it hit a network error, including from the other end abruptly terminating the connection in a way that raises a local error, it sets no metrics. If it hit EOF, it sets the metric probe_failed_due_to_regex to 1; if it matched a line, it sets that metric to 0.

    One important case of 'network error' is if the check you're doing times out. This is internally implemented partly by putting a (Go) deadline on the TCP connection, which will cause an error if it runs too long. Typical Blackbox module timeouts aren't very long (how long depends on both configuration settings and how frequent your checks are; they have to be shorter than the check interval).

    If you have multiple 'expect' steps and your check fails at one of them, there's (currently) no way to find out which one it failed at unless you can determine this from other metrics, for example the presence or absence of TLS metrics.

  • expect_bytes fails if it doesn't immediately read those bytes from the TCP connection. If it failed because of an error or because it read fewer bytes than required (including no bytes, ie an EOF), it sets no metrics. If it read enough bytes it sets the probe_failed_due_to_bytes metric to either 0 (if they matched) or 1 (if they didn't).

In many protocols, the consequences of how expect works mean that if the server at the other end spits out some error response instead of the response you expect, your expect will skip over it and then wait endlessly. For instance, if the SMTP server you're probing gives you an SMTP 4xx temporary failure response in either its greeting banner or its reply to your EHLO, your 'expect' will sit there trying to read another line that might start with '220'. Eventually either your check will time out or the SMTP server will, and probably it will be your check (resulting in a 'network error' that leaves no traces in metrics). Generally this means you can only see a probe_failed_due_to_regex of 1 in a TCP probe based module if the other end cleanly closed the connection, so that you saw EOF. This tends to be pretty rare.

(We mostly see it for SSH probes against overloaded machines, where we connect but then the SSH daemon immediately closes the connection without sending the banner, giving us an EOF in our 'expect' for the banner.)

If the probe failed because of a DNS resolution failure, I believe that probe_ip_addr_hash will be 0 and I think probe_ip_protocol will also be 0.

If the check involves TLS, the presence of the TLS metrics in the result means that you got a connection and got as far as starting TLS. In the example above, this would mean that you got almost all of the way to the end.

I'm not sure if there's any good way to detect that the connection attempt failed. You might be able to reasonably guess that from an abnormally low probe_duration_seconds value. If you know the relevant timeout values, you can detect a probe that failed due to timeout by looking for a suitably high probe_duration_seconds value.
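Put as (hypothetical) PromQL, with made-up thresholds that you'd tune to your module's actual timeout:

```
# Failed and ran nearly the full timeout: probably an 'expect' that
# waited until the deadline produced a network error.
probe_success == 0 and probe_duration_seconds > 9.5

# Failed almost instantly: plausibly the TCP connection (or DNS
# resolution) itself failed.
probe_success == 0 and probe_duration_seconds < 0.05
```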

If you have some use of the special labels action, then the presence of a probe_expect_info metric means that the check got to that step. If you don't have any particular information that you want to capture from an expect line, you can use labels (once) to mark that you've succeeded at some expect step by using a constant value for your label.
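As a sketch (the module name and label are mine), a constant-valued label on an early 'expect' step makes probe_expect_info act as a 'we got this far' marker:

```yaml
  smtp_banner_marker:
    prober: tcp
    tcp:
      query_response:
        - expect: "^220"
          labels:
            - name: got_banner
              value: "yes"
        - send: "QUIT\r"
```

If probe_expect_info{got_banner="yes"} is present for a failed probe, the banner 'expect' succeeded and the failure came later.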

(Hopefully all of this will improve at some point and Blackbox will provide, for example, a metric that tells you the step number that a query-response block failed on. See issue #1528, and also issue #1527 where I wish for a way to make an 'expect' fail immediately and definitely if it receives known error responses, such as a SMTP 4xx code.)

Early Linux package manager history and patching upstream source releases

By: cks
1 February 2026 at 03:19

One of the important roles of Linux system package managers like dpkg and RPM is providing a single interface to building programs from source even though the programs may use a wide assortment of build processes. One of the source building features that both dpkg and RPM included (I believe from the start) is patching the upstream source code, as well as providing additional files along with it. My impression is that today this is considered much less important in package managers, and some may make it at least somewhat awkward to patch the source release on the fly. Recently I realized that there may be a reason for this potential oddity in dpkg and RPM.

Both dpkg and RPM are very old (by Linux standards). As covered in Andrew Nesbitt's Package Manager Timeline, both date from the mid-1990s (dpkg in January 1994, RPM in September 1995). Linux itself was quite new at the time and the Unix world was still dominated by commercial Unixes (partly because the march of x86 PCs was only just starting). As a result, Linux was a minority target for a lot of general Unix free software (although obviously not for Linux specific software). I suspect that this was compounded by limitations in early Linux libc, which apparently had some issues with standards (see eg this, also, also, also).

As a minority target, I suspect that Linux regularly had problems compiling upstream software, and for various reasons not all upstreams were interested in fixing (or changing) that (especially if it involved accepting patches to cope with a non standards compliant environment; one reply was to tell Linux to get standards compliant). This probably left early Linux distributions regularly patching software in order to make it build on (their) Linux, leading to first class support for patching upstream source code in early package managers.

(I don't know for sure because at that time I wasn't using Linux or x86 PCs, and I might have been vaguely in the incorrect 'Linux isn't Unix' camp. My first Linux came somewhat later.)

These days things have changed drastically. Linux is much more standards compliant and of course it's a major platform. Free software that works on non-Linux Unixes but doesn't build cleanly on Linux is a rarity, so it's much easier to imagine (or have) a package manager that is focused on building upstream source code unaltered and where patching is uncommon and not as easy (or trivial) as dpkg and RPM make it.

(You still need to be able to patch upstream releases to handle security patches and so on, since projects don't necessarily publish new releases for them. I believe some projects simply issue patches and tell you to apply them to their current release. And you may have to backport a patch yourself if you're sticking on an older release of the project that they no longer do patches for.)

Making a FreeBSD system have a serial console on its second serial port

By: cks
31 January 2026 at 04:57

Over on the Fediverse I said:

Today's other work achievement: getting a UEFI booted FreeBSD 15 machine to use a serial console on its second serial port, not its first one. Why? Because the BMC's Serial over Lan stuff appears to be hardwired to the second serial port, and life is too short to wire up physical serial cables to test servers.

The basics of serial console support for your FreeBSD machine are covered in the loader.conf manual page, under the 'console' setting (in the 'Default Settings' section). But between UEFI and FreeBSD's various consoles, things get complicated, and for me the manual pages didn't do a great job of putting the pieces together clearly. So I'll start with my descriptions of all of the loader.conf variables that are relevant:

console="efi,comconsole"
Sets both the bootloader console and the kernel console to both the EFI console and the serial port, by default COM1 (ttyu0, Linux ttyS0). This is somewhat harmful if your UEFI BIOS is already echoing console output to the serial port (or at least to the serial port you want); you'll get doubled serial output from the FreeBSD bootloader, but not doubled output from the kernel.

boot_multicons="YES"
As covered in loader_simp(8), this establishes multiple low level consoles for kernel messages. It's not necessary if your UEFI BIOS is already echoing console output to the serial port (and the bootloader and kernel can recognize this), but it's harmless to set it just in case.

comconsole_speed="115200"
Sets the serial console speed (and in theory 115200 is the default). It's not necessary if the UEFI BIOS has set things up but it's harmless. See loader_simp(8) again.

comconsole_port="0x2f8"
Sets the serial port used to COM2. It's not necessary if the UEFI BIOS has set things up, but again it's harmless. You can use 0x3f8 to specify COM1, although it's the default. See loader_simp(8).

hw.uart.console="io:0x2f8,br:115200"
This tells the kernel where the serial console is and what baud rate it's at, here COM2 and 115200 baud. The loader will automatically set it for you if you set the comconsole_* variables, so setting both yourself is either redundant or a sign that you also need a 'console=' setting. See loader.efi(8) (and then loader_simp(8) and uart(4)).

(That the loader does this even without a 'comconsole' in your nonexistent 'console=' line may some day be considered a bug and fixed.)

If they agree with each other, you can safely set both hw.uart.console and the comconsole_* variables.

On a system where the UEFI BIOS isn't echoing the UEFI console output to a serial port, the basic version of FreeBSD using both the video console (settings for which are in vt(4)) and the serial console (on the default of COM1), with the primary being the video console, is a loader.conf setting of:

console="efi,comconsole"
boot_multicons="YES"

This will change both the bootloader console and the kernel console after boot. If your UEFI BIOS is already echoing 'console' output to the serial port, bootloader output will be doubled and you'll get to see fun bootloader output like:

LLooaaddiinngg  ccoonnffiigguurreedd  mmoodduulleess......

If you see this (or already know that your UEFI BIOS is doing this), the minimal alternate loader.conf settings (for COM1) are:

# for COM1 / ttyu0
hw.uart.console="io:0x3f8,br:115200"

(The details are covered in loader.efi(8)'s discussion of console considerations.)

If you don't need a 'console=' setting because of your UEFI BIOS, you must set either hw.uart.console or the comconsole_* settings. Technically, setting hw.uart.console is the correct approach; the fact that setting only the comconsole_* variables still works may be a bug.

If you don't explicitly set a serial port to use, FreeBSD will use COM1 (ttyu0, Linux ttyS0) for the bootloader and kernel. This is only possible if you're using 'console=', because otherwise you have to directly or indirectly set 'hw.uart.console', which directly tells the kernel which serial port to use (and the bootloader will use whatever UEFI tells it to). To change the serial port to COM2, you need to set the appropriate one of 'comconsole_port' and 'hw.uart.console' from 0x3f8 (COM1) to the right PC port value of 0x2f8.

So our more or less final COM2 /boot/loader.conf for a case where you can turn off or ignore the BIOS echoing to the serial console is:

console="efi,comconsole"
boot_multicons="YES"
comconsole_speed="115200"
# For the COM2 case
comconsole_port="0x2f8"

If your UEFI BIOS is already echoing 'console' output to the serial port, the minimal version of the above (again for COM2) is:

# For the COM2 case
hw.uart.console="io:0x2f8,br:115200"

(As with Linux, the FreeBSD kernel will only use one serial port as the serial console; you can't send kernel messages to two serial ports. FreeBSD at least makes this explicit in its settings.)

As covered in conscontrol and elsewhere, FreeBSD has a high level console, represented by /dev/console, and a low level console, used directly by the kernel for things like kernel messages. The high level console can only go to one device, normally the first one; this is either the first one in your 'console=' line or whatever UEFI considers the primary console. The low level console can go to multiple devices. Unlike Linux, this can be changed on the fly once the system is up through conscontrol (and also have its state checked).

Conveniently, you don't need to do anything to start a serial login on your chosen console serial port. All four possible (PC) serial ports, /dev/ttyu0 through /dev/ttyu3, come pre-set in /etc/ttys with 'onifconsole' (and 'secure'), so that if the kernel is using one of them, there's a getty started on it. I haven't tested what happens if you use conscontrol to change the console on the fly.

Booting FreeBSD on a UEFI based system is covered through the manual page series of uefi(8), boot(8), loader.efi(8), and loader(8). It's not clear to me if loader.efi is the EFI specific version of loader(8), or if one loads and starts the other in a multi-stage boot process. I suspect it's the former.

Sidebar: What we may wind up with in loader.conf

Here's what I think is a generic commented block for serial console support:

# Uncomment if the UEFI BIOS does not echo to serial port
#console="efi,comconsole"
boot_multicons="YES"
comconsole_speed="115200"
# Uncomment for COM2
#comconsole_port="0x2f8"
# change 0x3f8 (COM1) to 0x2f8 for COM2
hw.uart.console="io:0x3f8,br:115200"

All of this works for me on FreeBSD 15, but your distance may vary.

Why I'm ignoring pretty much all new Python packaging tools

By: cks
30 January 2026 at 03:59

One of the things going on right now is that Python is doing a Python developer survey. On the Fediverse, I follow a number of people who do Python stuff, and they've been posting about various aspects of the survey, including a section on what tools people use for what. This gave me an interesting although very brief look into a world that I'm deliberately ignoring, and I'm doing that because I feel my needs are very simple and are well met by basic, essentially universal tools that I already know and have.

Although I do some small amount of Python programming, I'm not a Python developer; you could call me a consumer of Python things, both programs and packages. The thing I do most is use programs written in Python that aren't single-file, dependency free things, almost always for my own personal use (for example, asncounter and the Python language server). The tool I use for almost all of these is pipx, which I feel handles pretty much everything I could ask for and comes pre-packaged in most Linuxes. Admittedly I've written some tools to make my life nicer.

(One important thing pipx does is install each program separately. This allows me to remove one cleanly and also to use PyPy or CPython as I prefer on a program by program basis.)

For programs that we want to use as part of our operations (for example), the modern, convenient approach is to make a venv and then install the program into it with pip. Pip is functionally universal and the resulting venvs effectively function as self contained artifacts that can be moved or put anywhere (provided that we stick to the same Ubuntu LTS version). So far we haven't tried to upgrade these in place; if a new version of the program comes out, we build a new venv and swap which one is used.

(It's possible that package dependencies of the program could be updated even if it hasn't released a new version, but we treat these built venvs as if they were compiled binaries; once produced, they're not modified.)
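The venv-as-artifact approach above can be sketched in a few lines of shell. This is only an illustration, not our actual tooling; the "frobnicator" program name is hypothetical, and in practice the venv would live at a permanent path rather than under mktemp:

```shell
set -e
# Each program gets its own venv; we treat the finished venv as an
# immutable artifact and build a fresh one for upgrades rather than
# upgrading in place.
dir="$(mktemp -d)/frobnicator"   # in real use, a permanent path
python3 -m venv "$dir"
"$dir/bin/pip" --version         # pip inside the venv, ready to
                                 # 'pip install <program>' into it
```

Because everything the program needs ends up under the one directory, the result can be copied or swapped atomically, as long as the Python version underneath stays the same.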

Finally, our Django based web application now uses a Django setup where Django is installed into a venv and then the production tree of our application lives outside that venv (previously we didn't use venvs at all but that stopped working). Our application isn't versioned or built into a Python artifact; it's a VCS tree and is managed through VCS operations. The Django venv is created separately, and I use pip for that because again pip is universal and familiar. This is a crude and brute force approach but it's also ensured that I haven't had to care about the Python packaging ecosystem (and how to make Python packages) for the past fifteen years. At the moment we use only standard Django without any third party packages that we'd also have to add to the venv and manage, and I expect that we're going to stay that way. A third party package would have to be very attractive (or become extremely necessary) in order for us to take it on and complicate life.

I'm broadly aware that there are a bunch of new Python package management and handling tools that go well beyond pip and pipx in both performance and features. My feeling so far is that I don't need anything more than I have and I don't do the sort of regular Python development where the extra features the newer tools have would make a meaningful difference. And to be honest, I'm wary of some or all of these turning out to be a flavour of the month. My mostly outside impression is that Python packaging and package management has had a great deal of churn over the years, and from seeing the Go ecosystem go through similar things from closer up I know that being stuck with a now abandoned tool is not particularly fun. Pip and pipx aren't the modern hot thing but they're also very unlikely to go away.

Why Linux wound up with system package managers

By: cks
29 January 2026 at 04:37

Yesterday I discussed the two sorts of program package managers, system package managers that manage the whole system and application package managers that mostly or entirely manage third party programs. Commercial Unix got application package managers in the very early 1990s, but Linux's first program managers were system package managers, in dpkg and RPM (or at least those seem to be the first Linux package managers).

The abstract way to describe why is to say that Linux distributions had to assemble a whole thing from separate pieces; the kernel came from one place, libc from another, coreutils from a third, and so on. The concrete version is to think about what problems you'd have without a package manager. Suppose that you assembled a directory tree of all of the source code of the kernel, libc, coreutils, GCC, and so on. Now you need to build all of these things (or rebuild, let's ignore bootstrapping for the moment).

Building everything is complicated partly because everything goes about it differently. The kernel has its own configuration and build system, a variety of things use autoconf but not necessarily with the same set of options to control things like features, GCC has a multi-stage build process, Perl has its own configuration and bootstrapping process, X is frankly weird and vaguely terrifying, and so on. Then not everyone uses 'make install' to actually install their software, so you have another set of variations for all of this.

(The less said about the build processes for either TeX or GNU Emacs in the early to mid 1990s, the better.)

If you do this at any scale, you need to keep track of all of this information (cf) and you want a uniform interface for 'turn this piece into a compiled and ready to unpack blob'. That is, you want a source package (which encapsulates all of the 'how to do it' knowledge) and a command that takes a source package and does a build with it. Once you're building things that you can turn into blobs, it's simpler to always ship a new version of the blob whenever you change anything.

(You want the 'install' part of 'build and install' to result in a blob rather than directly installing things on your running system because until it finishes, you're not entirely sure the build and install has fully worked. Also, this gives you an easy way to split the overall system up into multiple pieces, some of which people don't have to install. And in the very early days, to split them across multiple floppy disks, as SLS did.)

Now you almost have a system package manager with source packages and binary packages. You're building all of the pieces of your Linux distribution in a standard way from something that looks a lot like source packages, and you pretty much want to create binary blobs from them rather than dump everything into a filesystem. People will obviously want a command that takes a binary blob and 'installs' it by unpacking it on their system (and possibly extra stuff), rather than having to run 'tar whatever' all the time themselves, and they'll also want to automatically keep track of which of your packages they've installed rather than having to keep their own records. Now you have all of the essential parts of a system package manager.

(Both dpkg and RPM also keep track of which package installed what files, which is important for upgrading and removing packages, along with things having versions.)

The two subtypes of one sort of package managers, the "program manager"

By: cks
28 January 2026 at 02:08

I've written before that one of the complications of talking about package managers and package management is that there are two common types of package managers, program managers (which manage installed programs on a system level) and module managers (which manage package dependencies for your project within a language ecosystem or maybe a broader ecosystem). Today I realized that there is a further important division within program managers. I will call this division application (package) managers and system (package) managers.

A system package manager is what almost all Linux distributions have (in the form of Debian's dpkg and its set of higher level tools, Fedora's RPM and its set of higher level tools, Arch's pacman, and so on). It manages everything installed by the distribution on the system, from the kernel all the way up to the programs that people run to get work done, but certainly including what we think of as system components like the core C library, basic POSIX utilities, and so on. In modern usage, all updates to the system are done by shipping new package versions, rather than by trying to ship 'patches' that consist of only a few changed files or programs.

(Some Linux distributions are moving some high level programs like Chrome to an application package manager.)

An application package manager doesn't manage the base operating system; instead it only installs, manages, and updates additional (and optional) software components. Sometimes these are actual applications, but at other times, especially historically, these were things like the extra-cost C compiler from your commercial Unix vendor. On Unix, files from these application packages were almost always installed outside of the core system areas like /usr/bin; instead they might go into /opt/<something> or /usr/local or various other things.

(Sometimes vendor software comes with its own internal application package manager, because the vendor wants to ship it in pieces and let you install only some of them while managing the result. And if you want to stretch things a bit, browsers have their own internal 'application package management' for addons.)

A system package manager can also be used for 'applications' and routinely is; many Linux systems provide undeniable applications like Firefox and LibreOffice through the system package manager (not all of them, though). This can include third party packages that put themselves in non-system places like /opt (on Unix) if they want to. I think this is most common on Linux systems, where there's no common dedicated application package manager that's widely used, so third parties wind up building their own packages for the system package manager (which is sure to be there).

For relatively obvious reasons, it's very hard to have multiple system package managers in use on the same system at once; they wind up fighting over who owns what and who changes what in the operating system. It's relatively straightforward to have multiple application package managers in use at once, provided that they keep to their own area so that they aren't overwriting each other.

For the most part, the *BSDs have taken a base system plus application manager approach, with things like their 'ports' system being their application manager. Where people use third party program managers, including pkgsrc on multiple Unixes, Homebrew on macOS, and so on, these are almost always application managers that don't try to also take over and manage the core ('base') operating system programs, libraries, and so on.

(As a result, the *BSDs ship system updates as 'patches', not as new packages, cf OpenBSD's syspatch. I've heard some rumblings that FreeBSD may be working to change this.)

I believe that Microsoft Windows has some degree of system package management, in that it has components that you might or might not install and that can be updated or restored independently, but I don't have much exposure to the Windows world. I will let macOS people speak up in the comments about how that system operates (as people using macOS experience it, not as how it's developed; as developed there are a bunch of different parts to macOS, as one can see from the various open source repositories that Apple publishes).

PS: The Linux flatpak movement is mostly or entirely an application manager, and so usually separate from the system package manager (Snap is the same thing but I ignore Canonical's not-invented-here pet projects as much as possible). You can also see containers as an extremely overweight application 'package' delivery model.

PPS: In my view, to count as package management a system needs to have multiple 'packages' and have some idea of what packages are installed. It's common but not absolutely required for the package manager to keep track of what files belong to what package. Generally this goes along with a way to install and remove packages. A system can be divided up into components without having package management, for example if there's no real tracking of what components you've installed and they're shipped as archives that all get unpacked in the same hierarchy with their files jumbled together.

Forcing a Go generic type to be a pointer type (and some challenges)

By: cks
27 January 2026 at 04:48

Recently I saw a Go example that made me scratch my head and work out what was going on (you can see it here). Here's my understanding of it. Suppose that you want to create a general interface for a generic type that requires any concrete implementation to be a pointer type. We can do this by literally requiring a pointer:

type Pointer[P any] interface {
   *P
}

That this is allowed is not entirely obvious from the specification, but it's not forbidden. We're not allowed to use just 'P' or '~P' in the interface type, because an interface can't directly or indirectly embed its own type parameter, but '*P' isn't doing that directly; instead, it's forcing a pointer version of some underlying type. Actually using it is a bit awkward, but I'll get to that.

We can then require such a generic type to have some methods, for example:

type Index[P any] interface {
   New() *P
   *P
}

This can be implemented by, for example:

type base struct {
	i int
}

func (b *base) New() *base {
	return &base{-1}
}

But suppose we want to have a derived generic type, for example a struct containing an Index field of this Index (generic) type. We'd like to write this in the straightforward way:

type Example[P any] struct {
	Index Index[P]
}

This doesn't work (at least not today); you can't write 'Index[P]' outside of a type constraint. In order to make this work you must create the type with two related generic type constraints:

type Example[T Index[P], P any] struct {
	Index T
}

This unfortunately means that when we use this generic type to construct values of some concrete type, we have to repeat ourselves:

e := Example[*base, base]{&base{0}}

However, requiring both type constraints means that we can write generic methods that use both of them:

func (e *Example[T, P]) Do() {
	e.Index = (T)(new(P))
}

I believe that the P type would otherwise be inaccessible and you'd be unable to construct this, but I could be wrong; these are somewhat deep waters in Go generics.

You run into a similar issue with functions that you simply want to take an argument that is a Pointer (or an Index), because our Pointer (and Index) generic types are specified relative to an underlying type and can't be used without specifying that underlying type, either explicitly or through type inference. So you have to write generic functions that look like:

func Something[T Pointer[P], P any] (p T) {
   [...]
}

This generic function can successfully use type inference when invoked, but it has to be declared this way and if type inference doesn't work in your specific case you'll need to repeat yourself, as with constructing Example values.
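To make the inference concrete, here's a minimal runnable sketch along these lines (my own example, not the original source's code); because T's type set contains only *P, the function can convert its argument to *P and dereference it:

```go
package main

import "fmt"

// Pointer constrains an implementing type to be exactly *P.
type Pointer[P any] interface{ *P }

type base struct{ i int }

// Deref takes any T satisfying Pointer[P]; converting T to *P is
// allowed because *P is the only type in T's type set.
func Deref[T Pointer[P], P any](p T) P {
	return *(*P)(p)
}

func main() {
	// Type inference fills in T = *base and then P = base
	// from the constraint, so we don't have to write
	// Deref[*base, base](...) here.
	fmt.Println(Deref(&base{42}).i) // prints 42
}
```

When inference can't work things out from the call site, you're back to spelling out both type arguments explicitly, just as with constructing Example values.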

Looking into all of this and writing it out has left me less enlightened than I hoped at the start of the process, but Go generics are a complicated thing in general (or at least I find all of their implications and dark corners to be complicated).

(Original source and background, which is slightly different from what I've done here.)

Sidebar: The type inference way out for constructing values

In the computer science tradition, we can add a layer of indirection.

func NewExample[T Index[P], P any] (p *P) Example[T,P] {
    var e Example[T,P]
    e.Index = p
    return e
}

Then you can call this as 'NewExample(&base{0})' and type inference will fill in all of the types, at least in this case. Of course this isn't an in-place construction, which might be important in some situations.

Sidebar: The mind-bending original version

The original version was like this:

type Index[P any, T any] interface {
	New() T
	*P
}

type Example[T Index[P, T], P any] struct {
	Index T
}

In this version, Example has a type parameter that refers to itself, 'T Index[P, T]'. This is legal in a type parameter declaration; what would be illegal is referring to 'Example' in the type parameters. It's also satisfiable (which isn't guaranteed).

Scraping the FreeBSD 'mpd5' daemon to obtain L2TP VPN usage data

By: cks
26 January 2026 at 04:00

We have a collection of VPN servers, some OpenVPN based and some L2TP based. They used to be based on OpenBSD, but we're moving from OpenBSD to FreeBSD and the VPN servers recently moved too. We also have a system for collecting Prometheus metrics on VPN usage, which worked by parsing the output of things. For OpenVPN, our scripts just kept working when we switched to FreeBSD because the two OSes use basically the same OpenVPN setup. This was not the case for our L2TP VPN server.

OpenBSD does L2TP using npppd, which supports a handy command line control program, npppctl, that can readily extract and report status information. On FreeBSD, we wound up using mpd5. Unfortunately, mpd5 has no equivalent of npppctl. Instead, as covered (sort of) in its user manual you get your choice of a TCP based console that's clearly intended for interactive use and a web interface that is also sort of intended for interactive use (and isn't all that well documented).

Fortunately, one convenient thing about the web interface is that it uses HTTP Basic authentication, which means that you can easily talk to it through tools like curl. To do status scraping through the web interface, first you need to turn it on and then you need an unprivileged mpd5 user you'll use for this:

set web self 127.0.0.1 5006
set web open

set user metrics <some-password> user

At this point you can use curl to get responses from the mpd5 web server (from the local host, ie your VPN server itself):

curl -s -u metrics:... --basic 'http://localhost:5006/<something>'

There are two useful things you can ask the web server interface for. First, you can ask it for a complete dump of its status in JSON format, by asking for 'http://localhost:5006/json' (although the documentation claims that the information returned is what 'show summary' in the console would give you, it is more than that). If you understand mpd5 and like parsing and processing JSON, this is probably a good option. We did not opt to do this.

The other option is that you can ask the web interface to run console (interface) commands for you, and then give you the output in either a 'pleasant' HTML page or in a basic plain text version. This is done by requesting either '/cmd?<command>' or '/bincmd?<command>' respectively. For statistics scraping, the most useful version is the 'bincmd' one, and the command we used is 'show session':

curl -s -u metrics:... --basic 'http://localhost:5006/bincmd?show%20session'

This gets you output that looks like:

ng1  172.29.X.Y  B2-2 9375347-B2-2  L2-2  2  9375347-L2-2  someuser  A.B.C.D
RESULT: 0

(I assume 'RESULT: 0' would be something else if there was some sort of problem.)

Of these, the useful fields for us are the first, which gives the local network device, the second, which gives the internal VPN IP of this connection, and the last two, which give us the VPN user and their remote IP. The others are internal MPD things that we (hopefully) don't have to care about. The internal VPN IP isn't necessary for (our) metrics but may be useful for log correlation.
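As an illustrative sketch (not our actual tooling), extracting those fields from the 'show session' output is straightforward; the field positions here are inferred from the sample output above, and the trailing 'RESULT: N' status line is simply skipped rather than checked:

```python
def parse_sessions(text):
    """Parse mpd5 'show session' output (fetched via /bincmd) into
    (network device, internal VPN IP, user, remote IP) tuples.

    Field positions are assumptions based on sample output; the
    'RESULT: N' status line at the end is skipped.
    """
    sessions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "RESULT:":
            continue
        # first field: device; second: internal VPN IP;
        # last two: user and their remote IP.
        sessions.append((fields[0], fields[1], fields[-2], fields[-1]))
    return sessions
```

A more careful version would also check that 'RESULT:' reports 0 and complain otherwise, so scraping failures don't silently produce empty metrics.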

To get traffic volume information, you need to extract the usage information from each local network device that a L2TP session is using (ie, 'ng1' and its friends). As far as I know, the only tool for this in (base) FreeBSD is netstat. Although you can invoke it interface by interface, probably the better thing to do (and what we did) is to use 'netstat -ibn -f link' to dump everything at once and then pick through the output to get the lines that give you packet and byte counts for each L2TP interface, such as ng1 here.

(I'm not sure if dropped packets is relevant for these interfaces; if you think it might be, you want 'netstat -ibnd -f link'.)
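Picking the per-interface counters out of the netstat output can be sketched like this. The column indexes here are an assumption about the 'netstat -ibn -f link' header layout (Name, Mtu, Network, Address, Ipkts, Ierrs, Idrop, Ibytes, Opkts, Oerrs, Obytes, Coll); interfaces without a link-level address can shift the columns, so check the real output on your FreeBSD version before trusting the field numbers:

```python
def l2tp_traffic(netstat_output):
    """Extract (input bytes, output bytes) for each ngN L2TP interface
    from 'netstat -ibn -f link' output.

    Ibytes at index 7 and Obytes at index 10 are assumed column
    positions, not something guaranteed by netstat.
    """
    traffic = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 11 or not fields[0].startswith("ng"):
            continue
        traffic[fields[0]] = (int(fields[7]), int(fields[10]))
    return traffic
```

Joining this dictionary with the device field from 'show session' gives you per-user traffic metrics.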

FreeBSD has a general system, 'libxo', for producing output from many commands in a variety of handy formats. As covered in xo_options, this can be used to get this netstat output in JSON if you find that more convenient. I opted to get the plain text format and use field numbers for the information I wanted for our VPN traffic metrics.

(Partly this was because I could ultimately reuse a lot of my metrics generation tools from the OpenBSD npppctl parsing. Both environments generated two sets of line and field based information, so a significant amount of the work was merely shuffling around which field was used for what.)

PS: Because of how mpd5 behaves, my view is that you don't want to let anyone but system staff log on to the server where you're using it. It is an old C code base and I would not trust it if people can hammer on its TCP console or its web server. I certainly wouldn't expose the web server to a non-localhost network, even apart from the bit where it definitely doesn't support HTTPS.

Printing things in colour is not simple

By: cks
25 January 2026 at 03:47

Recently, Verisimilitude left a comment on my entry on X11's DirectColor visual type, where they mentioned that L Peter Deutsch, the author of Ghostscript, lamented using twenty-four bit colour for Ghostscript rather than a more flexible approach, which you may need in printing things with colour. As it happens, I know a bit about this area for two or three reasons, which come at it from different angles. A long time ago I was peripherally involved in desktop publishing software, which obviously cares about printing colour, and then later I became a hobby photographer and at one point had some exposure to people who care about printing photographs (both colour and black and white).

(The actual PDF format supports much more complex colour models than basic 24-bit sRGB or sGray colour, but apparently Ghostscript turns all of that into 24-bit colour internally. See eg, which suggests that modern Ghostscript has evolved into a more complex internal colour model.)

On the surface, printing colour things out in physical media may seem simple. You convert RGB colour to CMYK colour and then send the result off to the printer, where your inkjet or laser printer uses its CMYK ink or toner to put the result on the paper. Photographic printers provide the first and lesser complication in this model, because serious photographic printers have many more colours of ink than CMYK and they put these inks on various different types of fine art paper that have different effects on how the resulting colours come out.

Photographic printers have so many ink colours because this results in more accurate and faithful colours or, for black and white photographs (where a set of grey inks may be used), in more accurate and faithful greys. Photographers who care about this will carefully profile their printer using its inks on the particular fine art paper they're going to use in order to determine how RGB colours can be most faithfully reproduced. Then as part of the printing process, the photographic print software and the printer driver will cooperate to take the RGB photograph and map its colours to what combination of inks and ink intensity can best do the job.

(Photographers use different fine art papers because the papers have different characteristics; one of the high level ones is matte versus glossy papers. But the rabbit hole of detailed paper differences goes quite deep. So does the issue of how many inks a photo printer should have and what they should be. Naturally photographers who make prints have lots of opinions on this whole area.)

Where this stops being just a print driver issue is that people editing photographs often want to see roughly how they'll look when printed out without actually making a print (which is generally moderately expensive). This requires the print subsystem to be capable of feeding colour mapping results back to the editing layer, so you can see that certain things need to be different at the RGB colour level so that they come out well in the printed photograph. This is of course all an approximation, but at the very least photo editing software like darktable wants to be able to warn you when you're creating an 'out of gamut' colour that can't be accurately printed.

(I don't have any current numbers for the cost of making prints on photographic printers, but it's not trivial, especially if you're making large prints; you'll use a decent amount of ink and the fine art paper isn't cheap either. You don't want to make more test prints than you really have to.)

All of this is still in the realm of RGB colour, though (although colour space and display profiling and management complicate the picture). To go beyond this we need to venture into the twin worlds of printing advertising, including product boxes, and fine art printing. Printed product ads and especially boxes for products not infrequently use spot colours, where part of the box will be printed with a pure ink colour rather than approximated with process colours (CMYK or other). You don't really want to manage spot colours by saying that they're a specific RGB value and then everything with that RGB value will be printed with that spot colour; ideally you want to manage them as a specific spot colour layer for each spot colour you're using. An additional complication is that product boxes for mass products aren't necessarily printed with CMYK inks at all; like photographic prints, they may use a custom ink set that's designed to do a good job with the limited colour gamut that appears on the product box.

(This leads to a fun little game you can play at home.)

Desktop publishing software that wants to do a good job with this needs a bunch of features. I believe that generally you want to handle spot colours as separate editing layers even if they're represented in RGB. You probably also want features to limit the colour space and colours that the product designer can use, because the company that will print your boxes may have told you it has certain standard ink sets and please keep your box colours to things they handle well as much as possible. Or you may want to use only pure spot colours from your set of them and not have a product designer accidentally set something to another colour.

Printing art books of fine art has similar issues. The artwork that you're trying to reproduce in the art book may use paint colours that don't reproduce well in standard CMYK colours, or in any colour set without special inks (one case is metallic colours, which are readily available for fine art paints and which some artists love). The artist whose work you're trying to print may have strong opinions about you doing a good job of it, while the more inks you use (and the more special inks) the more expensive the book will be. Some compromise is inevitable but you have to figure out where and what things will be the most mangled by various ink set options. This means your software should be able to map from something roughly like RGB scans or photographs into ink sets and let you know about where things are going to go badly.

For fine art books, my memory is that there are a variety of tricks that you can play to increase the number of inks you can use. For example, sometimes you can print different sections of the book with different inks. This requires careful grouping of the pages (and artwork) that will be printed on a single large sheet of paper with a single set of inks at the printing plant. It also means that your publishing software needs to track ink sets separately for groups of pages and understand how the printing process will group pages together, so it can warn you if you're putting an artwork onto a page that clashes with the ink set it needs.

(Not all art books run into these issues. I believe that a lot of art books for Japanese anime have relatively few problems here because the art they're reproducing was already made for an environment with a restricted colour gamut. No one animates with true metallic colours for all sorts of reasons.)

To come back to PDFs and colour representation, we can see why you might regret picking a single 24-bit RGB colour representation for everything in a program that handles things that will eventually be printed. I'm not sure there's any reasonable general format that will cover everything you need when doing colour printing, but you certainly might want to include explicit provisions for spot colours (which are very common in product boxes, ads, and so on), and apparently Ghostscript eventually gained support for them (as well as various other colour related things).

Understanding query_response in Prometheus Blackbox's tcp prober

By: cks
24 January 2026 at 02:54

Prometheus Blackbox is somewhat complicated to understand. One of its fundamental abstractions is a 'prober', a generic way of probing some service (such as making HTTP requests or DNS requests). One prober is the 'tcp' prober, which makes a TCP connection and then potentially conducts a conversation with the service to verify its health. For example, here's a ClamAV daemon health check, which connects, sends a line with "PING", and expects to receive "PONG":

  clamd_pingpong:
    prober: tcp
    tcp:
      query_response:
        - send: "PING\n"
        - expect: "PONG"

The conversation with the service is detailed in the query_response configuration block (in YAML). For a long time I thought that this was what it looks like here, a series of entries with one directive per entry, such as 'send', 'expect', or 'starttls' (to switch to TLS after, for example, you send a 'STARTTLS' command to the SMTP or IMAP server).

However, much like an earlier case with Alertmanager, this is not actually what the YAML syntax is. In reality each step in the query_response YAML array can have multiple things. To quote the documentation:

 [ - [ [ expect: <string> ],
       [ expect_bytes: <string> ],
       [ labels:
         - [ name: <string>
             value: <string>
           ], ...
       ],
       [ send: <string> ],
       [ starttls: <boolean | default = false> ]
     ], ...
 ]

When there are multiple keys in a single step, Blackbox handles them in almost the order listed here: first expect, then labels if the expect matched, then expect_bytes, then send, then starttls. Normally you wouldn't have both expect and expect_bytes in the same step (and combining them is tricky). This order is not currently documented, so you have to read prober/query_response.go to determine it.

One reason to combine expect and send together in a single step is that then send can use regular expression match groups from the expect in its text. There's an example of this in the example blackbox.yml file:

  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        # cks: note use of ${1}, from PING
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"

The 'labels:' key is something added in v0.26.0, in #1284. As shown in the example blackbox.yml file, it can be used to do things like extract SSH banner information into labels on a metric:

  ssh_banner_extract:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
      - expect: "^SSH-2.0-([^ -]+)(?: (.*))?$"
        labels:
        - name: ssh_version
          value: "${1}"
        - name: ssh_comments
          value: "${2}"

This creates a metric that looks like this:

probe_expect_info {ssh_comments="Ubuntu-3ubuntu13.14", ssh_version="OpenSSH_9.6p1"} 1

At the moment there are some undocumented restrictions on the 'labels' key (or action or whatever you want to call it). First, it only works if you use it in a step that has an 'expect'. Even if all you want to do is set constant label values (for example to record that you made it to a certain point in your steps), you need to expect something; you can't use 'labels' in a step that otherwise only has, say, 'send'. Second, you can only have one labels in your entire query_response section; if you have more than one, you'll currently experience a Go panic when checking reaches the second.

This is unfortunate because Blackbox is currently lacking good ways to see how far your query_response steps got if the probe fails. Sometimes it's obvious where your probe failed, or irrelevant, but sometimes it's both relevant and not obvious. If you could use multiple labels, you could progressively set fixed labels and tell how far you got by what labels were visible in the scrape metrics.

(And of course you could also record various pieces of useful information that you don't get all at once.)

Sidebar: On (not) condensing expect and send together

My personal view is that I normally don't want to condense 'expect' and 'send' together into one step entry unless I have to, because most of the time it inverts the relationship between the two. In most protocols and protocol interactions, you send something and expect a response; you don't receive something and then send a response to it. In my opinion this is more naturally written in the style:

      query_response:
      - expect: "something"
      - send: "my request"
      - expect: "reply to my request"
      - send: "something else"
      - expect: "reply to something else"

Than as:

      query_response:
      - expect: "something"
        send: "my request"
      - expect: "reply to my request"
        send: "something else"
      - expect: "reply to something else"

What look like pairs (an expect/send in the same step) are not actually pairs; the 'expect' is for a previous 'send' and then 'send' pairs with the next 'expect' in the next step. So it's clearer to write them all as separate steps, which doesn't create any expectations of pairing.

Pitfalls in using Prometheus Blackbox to monitor external SMTP

By: cks
23 January 2026 at 04:15

The news of the day is that Microsoft had a significant outage inside their Microsoft 365 infrastructure. We noticed when we stopped being able to deliver email to the university's institutional email system, which was a bit mysterious in the usual way of today's Internet:

The joys of modern email: "Has Microsoft decided to put all of our email on hold or are they having a global M365 inbound SMTP email incident?"

(For about the last hour and a half, if it's an incident someone is having a bad day.)

We didn't find out immediately when this happened (and if our systems had been working right, we wouldn't have found out when I did, but that's another story). Initially I was going to write an entry about whether or not we should use our monitoring system to monitor external services that other people run, but it turns out that we do try to monitor whether we can do a SMTP conversation to the university's M365-hosted institutional email. There were several things that happened with this monitoring.

The first thing that happened is that the alerts related to it rotted. The university once had a fixed set of on-premise MX targets and we monitored our ability to talk to them and alerted on it. Then the university moved their MX targets to M365 and our old alerts stopped applying, so we commented them out and never added any new alerts for any new checking we were doing.

One of the reasons for that is that we were doing this monitoring through Prometheus Blackbox, and Blackbox is not ideal for monitoring Microsoft 365 MX targets. The way M365 does redundancy in their inbound mail servers for your domain is not by returning multiple DNS MX records, but by returning one MX record for a hostname that has multiple IP addresses (and the IP addresses may change). What a mailer will do is try all of the IP addresses until one responds. What Blackbox does is pick one IP address and probe only it; if that address fails, Blackbox makes no attempt to check the other IP addresses. Failing when one IP address out of many isn't responding is okay for casual checks, but you don't necessarily want to alert on it.

(I believe that Blackbox picks the first IP address in the DNS A record, but this depends on how the Go standard library and possibly your local resolver behaves. If either sort the results, you get the first A record in the sorted result.)

The final issue is that we weren't necessarily checking enough of the SMTP conversation. For various reasons, we decided that all we could safely and confidently check was that the university's mail system accepted a testing SMTP MAIL FROM from our subdomain; we didn't check that it also accepted a SMTP RCPT TO. I believe that during part of this Microsoft 365 incident, the inbound M365 SMTP servers would accept our SMTP MAIL FROM but report an error at the RCPT TO (although I can't be sure). Certainly if we want to have a more realistic check of 'is email to M365 working', we should go as far as a SMTP RCPT TO.
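Based on the query_response behaviour of Blackbox's tcp prober (and the smtp_starttls module in its example blackbox.yml), a fuller check that goes all the way through RCPT TO might look something like the following sketch. The module name, hostnames, and addresses are all made up, Blackbox appends a newline to each send (hence the trailing "\r" for SMTP's CRLF), and you'd want to verify the details against your actual mail system before trusting it:

```yaml
  smtp_rcpt_check:
    prober: tcp
    timeout: 15s
    tcp:
      query_response:
        - expect: "^220 "
        - send: "EHLO probe.example.org\r"
        # "^250 " (with the trailing space) only matches the final
        # line of a multi-line EHLO response.
        - expect: "^250 "
        - send: "MAIL FROM:<probe@sub.example.org>\r"
        - expect: "^250 "
        - send: "RCPT TO:<postmaster@example.org>\r"
        - expect: "^250 "
        - send: "QUIT\r"
```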

(During parts of the incident, DNS lookups didn't succeed for the MX target. Without detailed examination I can't be sure of what happened in the other cases.)

Overall, Blackbox is probably the wrong tool to check an external mail target like M365 if we're serious about it and want to do a good job. At the moment it's not clear to me if we should go to the effort to do better, since it is an external service and there's nothing we can do about problems (although we can let people know, which has some value, but that's another entry).

PS: You can get quite elaborate in a mail deliverability test, but to some degree the more elaborate you get the more pieces of infrastructure you're testing, and you may want a narrow test for better diagnostics.

What ZFS people usually mean when they talk about "ZFS metadata"

By: cks
22 January 2026 at 04:14

Recently I read Understanding ZFS Scrubs and Data Integrity (via), which is a perfectly good article and completely accurate, bearing in mind some qualifications which I'm about to get into. One of the things this article says in the preface is:

In this article, we will walk through what scrubs do, how the Merkle tree layout lets ZFS validate metadata and data from end to end, [...]

This is both completely correct and misleading, because what ZFS people mean when they talk about "metadata" is probably not what ordinary people (who are aware of filesystems) think of as "metadata". This misunderstanding leads people (which once upon a time included me) to believe that ZFS scrubs check much more than they actually do.

Specifically, in normal use "ZFS metadata" is different from "filesystem metadata", like directories. A core ZFS concept is DMU objects (dnodes), which are a basic primitive of ZFS's structure; a DMU object stores data in a more or less generic way. As covered in more detail in my broad overview on how ZFS is structured on disk, filesystem objects like directories, files, ACLs, and so on are all DMU objects that are stored in the filesystem's (DMU) object set and are referred to (for example, in filesystem directories) by object number (the equivalent of an inode number). At this level, filesystem metadata is ZFS data.

What ZFS people and ZFS scrubs mean by "ZFS metadata" are things such as each filesystem's DMU object set (which is itself a DMU object, because in ZFS it's turtles most of the way down), the various DSL (Dataset and Snapshot Layer) objects, the various DMU objects used to track and manage free space in the ZFS pool, and so on. All of this ZFS metadata is organized in a tree that's rooted in the uberblock and the pool's Meta Object Set (MOS) that the uberblock points to. It is this tree that is guarded and verified by checksums and ZFS scrubs, from the very top down to the leaves.

As far as I know, all filesystem level files, directories, symbolic links, ACLs, and so on are leaves of this tree of ZFS metadata; they are merely ZFS data. While they make up a logical filesystem tree (we hope), they aren't a tree at the level of ZFS objects; they're merely DMU objects in the filesystem's object set. Only at the ZFS filesystem layer (ZPL, the "ZFS POSIX Layer") does ZFS look inside these various filesystem objects and maintain structural relationships, such as a filesystem's directory tree or parent information (some of which is maintained using generic ZFS facilities like ZAP objects).

Scrubs must go through the tree of ZFS metadata in order to find everything that's in use in order to verify its checksum, but they don't have to go through the filesystem's directory tree. To verify the checksum of everything in a filesystem, all a scrub has to do is go through the filesystem's DMU object set, which contains every in-use object in the filesystem regardless of whether it's a regular file, a directory, a symbolic link, an ACL, or whatever.

The long painful history of (re)using login to log people in

By: cks
21 January 2026 at 03:36

The news of the time interval is that Linux's usual telnetd has had a giant security vulnerability for a decade. As people on the Fediverse observed, we've been here before; Solaris apparently had a similar bug 20 or so years ago (which was CVE-2007-0882, cf, via), and AIX in the mid 1990s (CVE-1999-0113, source, also), and also apparently SGI Irix, and no doubt many others (eg). It's not necessarily telnetd at fault, either, as I believe it's sometimes been rlogind.

All of these bugs have a simple underlying cause; in a way that root cause is people using Unix correctly and according to its virtue of modularity, where each program does one thing and you string programs together to achieve your goal. Telnetd and rlogind have the already complicated job of talking a protocol to the network, setting up ptys, and so on, so obviously they should leave the also complex job of logging the user in to login, which already exists to do that. In theory this should work fine.

The problem with this is that from more or less the beginning, login has had several versions of its job. From no later than V3 in 1972, login could also be used to switch from one user to another, not just log in initially. In 4.2 BSD, login was modified and reused to become part of rlogind's authentication mechanism (really; .rhosts is checked in the 4.2BSD login.c, not in rlogind). Later, various versions of login were modified to support 'automatic' logins, without challenging for a password (see eg FreeBSD login(1), OpenBSD login(1), and Linux login(1); use of -f for this appears to date back to around 4.3 Tahoe). Sometimes this was explicitly for the use of things that were running as root and had already authenticated the login.

In theory this is all perfectly Unixy. In practice, login figured out which of these variations of its basic job it was being used for based on a combination of command line arguments and what UID it was running as, which made it absolutely critical that programs running as root that reused login never allowed login to be invoked with arguments that would shift it to a different mode than they expected. Telnetd and rlogind have traditionally run as root, creating this exposure.

People are fallible, programmers included, and attackers are very ingenious. Over the years any number of people have found any number of ways to trick network daemons running as root into running login with 'bad' arguments.

The one daemon I don't think has ever been tricked this way is OpenSSH, because from very early on sshd refused to delegate logging people in to login. Instead, sshd has its own code to log people in to the system. This has had its complexities but has also shielded sshd from all of these (login) context problems.

In my view, this is one of the unfortunate times when the ideals of Unix run up against the uncomfortable realities of the world. Network daemons delegating logging people in to login is the correct Unix answer, but in practice it has repeatedly gone wrong and the best answer is OpenSSH's.

TCP, UDP, and listening only on a specific IP address

By: cks
20 January 2026 at 02:33

One of the surprises of TCP and UDP is that when your program listens for incoming TCP connections or UDP packets, you can choose to listen only on a specific IP address instead of all of the IP addresses that the current system has. This behavior started as a de-facto standard but is now explicitly required for TCP in RFC 9293 section 3.9.1.1. There are at least two uses of this feature; to restrict access to your listening daemon, and to run multiple daemons on the same port.

The classical case of restricting access to a listening daemon is a program that listens only on the loopback IP address (IPv4 or IPv6 or both). Since loopback addresses can't be reached from outside the machine, only programs running on the machine can reach the daemon. On a machine with multiple IP addresses that are accessible from different network areas, you can also listen on only one IP address (perhaps an address 'inside' a firewall) to shield your daemon from undesired connections.

(Except in the case of the loopback IP address, this shielding isn't necessarily perfect. People on any of your local networks can always throw packets at you for any of your IP addresses, if they know them. In some situations, listening only on RFC 1918 private addresses can be reasonably safe from the outside world.)

The other use is to run multiple daemons that are listening on the same port but on different IP addresses. For example, you might run a public authoritative DNS server for some zones that is listening on port 53 (TCP and UDP) on your non-localhost IPs and a private resolving DNS server that is listening on localhost:53. Or you could have a 'honeypot' IP address that is running a special SSH server to look for Internet attackers, while still running your regular SSH server (to allow regular access) on your normal IP addresses. Broadly, this can be useful any time you want to have different configurations on the same port for different IP addresses.
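Both uses come down to the same bind-time decision. Here's a minimal Python sketch of two TCP listeners sharing one port on different addresses; it assumes Linux, where all of 127/8 is local, so 127.0.0.2 stands in for a second (hypothetical) non-loopback address:

```python
import socket

# A "private" listener reachable only from the machine itself.
private = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
private.bind(("127.0.0.1", 0))        # loopback only; port 0 = pick one
port = private.getsockname()[1]

# A second listener on the *same* port but a different local address;
# the kernel steers each incoming connection by its destination IP.
public = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
public.bind(("127.0.0.2", port))

private.listen()
public.listen()
```

A connection made to 127.0.0.2 on that port will be handed to the second socket's accept(), never the first's.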

Using restricted listening for access control has a lot of substitutes. Your daemon can check incoming connections and drop them depending on the local or remote IPs, or your host could have some simple firewall rules, or some additional software layer could give you a hand. Also, as mentioned, if you listen on anything other than localhost, you need to be sure that your overall configuration makes that safe enough. The other options are more complex but also more sure, or at least more obviously sure (or flawed).

Using restricted listening to have different things listening on the same TCP or UDP port doesn't have any good substitutes in current systems. Even if the operating system allows multiple things to listen generally on the same port, it has no idea which instance should get which connection or packet. To do this steering today, you'd need either a central 'director' daemon that received all packets or connection attempts and then somehow passed them to the right other program, or you'd have programs listen on different ports and then use OS firewall rules to (re)direct traffic to the right instance.

You can imagine an API that allows all of the programs to tell the operating system which connections they're interested in and which ones they aren't. One simple form of that API is 'listen on a specific IP address instead of all of them', and it conveniently also allows the OS to trivially detect conflicts between programs (even if some of them initially seem artificial).

(It would be nice if OSes gave programs nice APIs for choosing what incoming connections and packets they wanted and what they didn't, but mostly we deal with the APIs we have, not the ones we want.)

Single sign on systems versus X.509 certificates for the web

By: cks
19 January 2026 at 03:59

Modern single sign on specifications such as OIDC and SAML and systems built on top of them are fairly complex things with a lot of moving parts. It's possible to have a somewhat simple surface appearance for using them in web servers, but the actual behind the scenes implementation is typically complicated, and of course you need an identity provider server and its supporting environment as well (which can get complicated). One reaction to this is to suggest using X.509 certificates to authenticate people (as a recent comment on this entry did).

There are a variety of technical considerations here, like to what extent browsers (and other software) might support personal X.509 certificates and make them easy to use, but to my mind there's also an overriding broad consideration that makes the two significantly different. Namely, people can remember passwords but they have to store X.509 certificates. OIDC and SAML may pass around tokens and programs dealing with them may store tokens, but the root of everything is in passwords, and you can recover all the tokens from there. This is not true with X.509 certificates; the certificate is the thing.

(There are also challenges around issuing, managing, checking, and revoking personal X.509 certificates, but let's ignore them.)

To make using X.509 certificates practical for authenticating people, people have to be able to use them on multiple devices and move them between browsers. Many people have multiple devices and people do change what browsers they use (for all that browser and platform vendors like them not to, or at least the ones that are currently popular are often all for that). Today, there is basically nothing that helps people deal with this, and as a result X.509 certificates are at best awkward for people to use (and remember, security is people).

(In common use, it's easy to move passwords between browsers and devices because they're in your head (excluding password managers, which are still not used by a lot of people).)

Of course you could develop standards and software for moving and managing X.509 certificates. In many ways, passkeys show what's possible here, and also show many of the hazards of using things for authentication that can't be memorized (or copied) by people in order to transport them between environments. However, no such standards and software exist today, and no one has ever shown much interest in developing them, even back in the days when personal X.509 certificates were close to your only game in town.

(You could also develop much better browser UIs for dealing with personal X.509 certificates, something that was extremely under-developed back in the days when they were sometimes in use. Even importing such a certificate into your browser could be awkward, never mind using it.)

In the past, people have authenticated to web applications with personal X.509 certificates (as a more secure alternative to passwords). As far as I know, pretty much everyone has given up on that and moved to better options, first passwords (sometimes plus some form of additional confirmation) and then these days trying to get people to use passkeys. One reason they gave up was that actually using X.509 certificates in practice was awkward and something that people found quite annoying.

(I had to use a personal X.509 certificate for a while in order to get free TLS certificates for our servers. It wasn't a particularly great experience and I'm not in the least bit surprised that everyone ditched it for single sign on systems.)

PS: It's no good saying that X.509 certificates would be great if all of the required technology was magically developed, because that's not going to just happen. If you want personal X.509 certificates to be a thing, you have a great deal of work ahead of you and there is no guarantee you'll be successful. No one else is going to do that work for you.

PPS: You can imagine a system where people use their passwords and other multi-factor authentication to issue themselves new personal X.509 certificates signed by your local Certificate Authority, so they can recover from losing the X.509 certificate blob (or get a new certificate for a new device). Congratulations, you have just re-invented a manual version of OIDC tokens (also, it's worse in various ways).

People cannot "just pay attention" to (boring, routine) things

By: cks
18 January 2026 at 02:04

Sometimes, people in technology believe that we can solve problems by getting people to pay attention. This comes up in security, anti-virus efforts, anti-phish efforts, monitoring and alert handling, warning messages emitted by programs, warning messages emitted by compilers and interpreters, and many other specific contexts. We are basically always wrong.

One of the core, foundational results from human factors research, research into human vision, the psychology of perceptions, and other related fields, is that human brains are a mess of heuristics and have far more limited capabilities than we think (and they lie to us all the time). Anyone who takes up photography as a hobby has probably experienced this (I certainly did); you can take plenty of photographs where you literally didn't notice some element in the picture at the time but only saw it after the fact while reviewing the photograph.

(In general photography is a great education on how much our visual system lies to us. For example, daytime shadows are blue, not black.)

One of the things we have a great deal of evidence about from both experiments and practical experience is that people (which is to say, human brains) are extremely bad at noticing changes in boring, routine things. If something we see all the time quietly disappears or is a bit different, the odds are extremely high that people will literally not notice. Our minds have long since registered whatever it is as 'routine' and tuned it out in favour of paying attention to more important things. You cannot get people to pay attention to these routine, almost always basically the same thing by asking them to (or yelling at them to do so, or blaming them when they don't), because our minds don't work that way.

We also have a tendency to see what we expect to see and not see what we don't expect to see, unless what we don't expect shoves itself into our awareness with unusual forcefulness. There is a famous invisible gorilla experiment that shows one aspect of this, but there are many others. This is why practical warnings, alerts, and so on cannot be unobtrusive. Fire alarms are blaringly loud and obtrusive so that you cannot possibly miss them despite not expecting to hear them. A fire alarm that was "pay attention to this light if it starts blinking and makes a pleasant ringing tone" would get people killed.

There are hacks to get people to pay attention anyway, such as checklists, but these hacks are what we could call "not scalable" for many of the situations that people in technology care about. We cannot get people to go through a "should you trust this" checklist every time they receive an email message, especially when phish spammers deliberately craft their messages to create a sense of urgency and short-cut people's judgment. And even checklists are subject to seeing what you expect and not paying attention, especially if you do them over and over again on a routine basis.

(I've written a lot about this in various narrower areas before, eg 1, 2, 3, 4, 5. And in general, everything comes down to people, also.)

Systemd-networkd and giving your virtual devices alternate names

By: cks
17 January 2026 at 03:28

Recently I wrote about how Linux network interface names have a length limit of 15 characters. You can work around this limit by giving network interfaces an 'altname' property, as exposed in (for example) 'ip link'. While you can't work around this at all in Canonical's Netplan, it looks like you can have this for your VLANs in systemd-networkd, since there's AlternativeName= in the systemd.link manual page.

Except, if you look at an actual VLAN configuration as materialized by Netplan (or written out by hand), you'll discover a problem. Your VLANs don't normally have .link files, only .netdev and .network files (and even your normal Ethernet links may not have .link files). The AlternativeName= setting is only valid in .link files, because networkd is like that.

(The AlternativeName= is a '[Link]' section setting and .network files also have a '[Link]' section, but they allow completely different sets of '[Link]' settings. The .netdev file, which is where you define virtual interfaces, doesn't have a '[Link]' section at all, although settings like AlternativeName= apply to them just as much as to regular devices. Alternately, .netdev files could support setting altnames for virtual devices in the '[NetDev]' section along side the mandatory 'Name=' setting.)

You can work around this indirectly, because you can create a .link file for a virtual network device and have it work:

[Match]
Type=vlan
OriginalName=vlan22-mlab

[Link]
AlternativeNamesPolicy=
AlternativeName=vlan22-matterlab

Networkd does the right thing here even though 'vlan22-mlab' doesn't exist when it starts up; when vlan22-mlab comes into existence, it matches the .link file and has the altname stapled on.

Given how awkward this is (and that not everything accepts or sees altnames), I think it's probably not worth bothering with unless you have a very compelling reason to give an altname to a virtual interface. In my case, this is clearly too much work simply to give a VLAN interface its 'proper' name.

Since I tested, I can also say that this works on a Netplan-based Ubuntu server where the underlying VLAN is specified in Netplan. You have to hand write the .link file and stick it in /etc/systemd/network, but after that it cooperates reasonably well with a Netplan VLAN setup.

TCP and UDP and implicit "standard" elements of things

By: cks
16 January 2026 at 02:34

Recently, Verisimilitude left a comment on this entry of mine about binding TCP and UDP ports to a specific address. That got me thinking about features that have become standard elements of things despite not being officially specified and required.

TCP and UDP are more or less officially specified in various RFCs and are implicitly specified by what happens on the wire. As far as I know, nowhere in these standards (or wire behavior) does anything require that a multi-address host machine allow you to listen for incoming TCP or UDP traffic on a specific port on only a restricted subset of those addresses. People talking to your host have to use a specific IP, obviously, and established TCP connections have specific IP addresses associated with them that can't be changed, but that's it. Hosts could have an API where you simply listened to a specific TCP or UDP port and then they provided you with the local IP when you received inbound traffic; it would be up to your program to do any filtering to reject addresses that you didn't want used.

However, I don't think anyone has such an API, and anything that did would likely be considered very odd and 'non-standard'. It's become an implicit standard feature of TCP and UDP that you can opt to listen on only one or a few IP addresses of a multi-address host, including listening only on localhost, and connections to your (TCP) port on other addresses are rejected without the TCP three-way handshake completing. This has leaked through into the behavior that TCP clients expect in practice; if a port is not available on an IP address, clients expect to get a TCP layer 'connection refused', not a successful connection and then an immediate disconnection. If a host had the latter behavior, clients would probably not report it as 'connection refused' and some of them would consider it a sign of a problem on the host.

This particular (API) feature comes from a deliberately designed element of the BSD sockets API, the bind() system call. Allowing you to bind() local addresses to your sockets means that you can set the outgoing IP address for TCP connection attempts and UDP packets, which is important in some situations, but BSD could have provided a different API for that. BSD's bind() API does allow you maximum freedom with only a single system call; you can nail down either or both of the local IP and the local port. Binding the local port (but not necessarily the local IP) was important in BSD Unix because it was part of a security mechanism.

(This created an implicit API requirement for other OSes. If you wanted your OS to have an rlogin client, you had to be able to force the use of a low local port when making TCP connections, because the BSD rlogind.c simply rejected connections from ports that were 1024 and above even in situations where it would ask you for a password anyway.)
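As a minimal sketch of the selective listening that bind() enables, here is what binding to 127.0.0.1 versus the wildcard address looks like in Python (with kernel-chosen ports; nothing here is specific to any particular OS beyond the BSD sockets API):

```python
import socket

# Sketch: the BSD bind() API lets a listener nail down the local
# IP (and port). Binding to 127.0.0.1 means connections to the
# host's other addresses on this port get 'connection refused'
# at the TCP layer, before the three-way handshake completes.
lo = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lo.bind(("127.0.0.1", 0))      # localhost only, kernel-chosen port
lo.listen()
lo_addr, lo_port = lo.getsockname()

wild = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
wild.bind(("0.0.0.0", 0))      # the wildcard: all local addresses
wild.listen()
wild_addr, wild_port = wild.getsockname()

print(lo_addr, wild_addr)
lo.close()
wild.close()
```

Both forms go through the same bind() call; the difference is only in the local address you nail down.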

A number of people copied the BSD sockets API rather than design their own. Even when people designed their own API for handling networking (or IPv4 and later IPv6), my impression is that they copied the features and general ideas of the BSD sockets API rather than starting completely from scratch and deviating significantly from the BSD API. My usual example of a relatively divergent API is Go, which is significantly influenced by a quite different networking history inside Bell Labs and AT&T, but Go's net package still allows you to listen selectively on an IP address.

(Of course Go has to work with the underlying BSD sockets API on many of the systems it runs on; what it can offer is mostly constrained by that, and people will expect it to offer more or less all of the 'standard' BSD socket API features in some form.)

PS: The BSD TCP API doesn't allow a listening program to make a decision about whether to allow or reject an incoming connection attempt, but this turned out to be a pretty sensible design. As we found out with SYN flood attacks, TCP's design means that you want to force the initiator of a connection attempt to prove that they're present before the listening ('server') side spends significant resources on the potential connection.

Linux network interface names have a length limit, and Netplan

By: cks
15 January 2026 at 02:19

Over on the Fediverse, I shared a discovery:

This is my (sad) face that Linux interfaces have a maximum name length. What do you mean I can't call this VLAN interface 'vlan22-matterlab'?

Also, this is my annoyed face that Canonical Netplan doesn't check or report this problem/restriction. Instead your VLAN interface just doesn't get created, and you have to go look at system logs to find systemd-networkd telling you about it.

(This is my face about Netplan in general, of course. The sooner it gets yeeted the better.)

Based on both some Internet searches and looking at kernel headers, I believe the limit is 15 characters for the primary name of an interface. In headers, you will find this called IFNAMSIZ (the kernel) or IF_NAMESIZE (glibc), and it's defined to be 16 but that includes the trailing zero byte for C strings.

(I can be confident that the limit is 15, not 16, because 'vlan22-matterlab' is exactly 16 characters long without a trailing zero byte. Take one character off and it works.)
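The arithmetic is easy to get wrong, so here is a small Python sketch of the kernel's length check (the constant is real; the helper function is mine):

```python
# IFNAMSIZ is 16, but that count includes the trailing NUL byte
# for C strings, so the usable limit is 15 bytes.
IFNAMSIZ = 16

def ifname_ok(name: str) -> bool:
    # A primary interface name must be non-empty and at most
    # IFNAMSIZ - 1 bytes long.
    return 0 < len(name.encode()) <= IFNAMSIZ - 1

print(ifname_ok("vlan22-matterlab"))  # 16 bytes -> False
print(ifname_ok("vlan22-mlab"))       # 11 bytes -> True
```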

At the level of ip commands, the error message you get is on the unhelpful side:

# ip link add dev vlan22-matterlab type wireguard
Error: Attribute failed policy validation.

(I picked the type for illustration purposes.)

Systemd-networkd gives you a much better error message:

/run/systemd/network/10-netplan-vlan22-matterlab.netdev:2: Interface name is not valid or too long, ignoring assignment: vlan22-matterlab

(Then you get some additional errors because there's no name.)

As mentioned in my Fediverse post, Netplan tells you nothing. One direct consequence of this is that in any context where you're writing down your own network interface names, such as VLANs or WireGuard interfaces, simply having 'netplan try' or 'netplan apply' succeed without errors does not mean that your configuration actually works. You'll need to look at error logs and perhaps inventory all your network devices.

(This isn't the first time I've seen Netplan behave this way, and it remains just as dangerous.)

As covered in the ip link manual page, network interfaces can have either or both of aliases and 'altname' properties. These alternate names can be (much) longer than 16 characters, and altnames (managed with 'ip link property') can be used in various contexts to make things convenient (I'm not sure what good aliases are, though). However, this is somewhat irrelevant for people using Netplan, because the current Netplan YAML doesn't allow you to set interface altnames.

You can set altnames in networkd .link files, as covered in the systemd.link manual page. The direct thing you want is AlternativeName=, but apparently you may also want to set a blank alternative names policy, AlternativeNamesPolicy=. Of course this probably only helps if you're using systemd-networkd directly, instead of through Netplan.

PS: Netplan itself has the notion of Ethernet interfaces having symbolic names, such as 'vlanif0', but this is purely internal to Netplan; it's not manifested as an actual interface altname in the 'rendered' systemd-networkd control files that Netplan writes out.

(Technically this applies to all physical device types.)

Safely querying Spamhaus DNSBLs in Exim

By: cks
14 January 2026 at 02:53

When querying Spamhaus DNS blocklists, either their public mirrors or through a DQS account, the DNS blocklists can potentially return error codes in 127.255.255.0/24 (also). Although Exim has a variety of DNS blocklist features, it doesn't yet let you match return codes based on CIDR netblocks. However, it does have a magic way of doing this.

The magic way is to stick '!&0.255.255.0' on the end of the DNS blocklist name. This is a negated DNS (blocklist) matching condition, specifically a negated bitmask (a 'bitwise-and'). The whole thing looks like:

deny dnslists = zen.spamhaus.org!&0.255.255.0

What this literally means is to consider the lookup to have failed if the resulting IP address matches '*.255.255.*'. Because Exim already requires successful lookup results to be in 127.0.0.0/8, this implicitly constrains the entire result to not match 127.255.255.*, which is what we want.

As covered in Additional matching conditions for DNS lists, Exim can match DNS blocklist results by a specific IP or a bitmap, the latter of which is written as, eg, '&0.255.255.0'. When you match by bitmap, the IP address is anded with the bitmap and the result must be the same as the bitmap (meaning that all bits set in the bitmask are set in the IP address):

(ip & bitmask) == bitmask

(You can consider both the IP and the bitmask as 32-bit numbers, or you can consider each octet separately in both, whichever makes it easier.)
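The '&' semantics can be sketched in a few lines of Python (the function name is mine, but the logic is the '(ip & bitmask) == bitmask' rule above):

```python
import ipaddress

def exim_bitmask_match(ip: str, mask: str) -> bool:
    # Exim's '&' condition: the IP anded with the bitmask must
    # equal the bitmask (all bits set in the mask are set in the IP).
    i = int(ipaddress.IPv4Address(ip))
    m = int(ipaddress.IPv4Address(mask))
    return (i & m) == m

# A Spamhaus error code matches '&0.255.255.0' (so '!&' rejects it):
print(exim_bitmask_match("127.255.255.1", "0.255.255.0"))  # True
# An ordinary Zen result does not:
print(exim_bitmask_match("127.0.0.2", "0.255.255.0"))      # False
# Neither does a DBL result in 127.0.1.0/24:
print(exim_bitmask_match("127.0.1.2", "0.255.255.0"))      # False
```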

There's no way to say that the match succeeds if the result of and'ing the IP and the bitmask is non-zero (has any bits set). For a small number of bits, you can sort of approximate that by using multiple bitmasks. For example, to succeed if either of the two lowest bits is set:

a.example&0.0.0.1,0.0.0.2

(The 'lowest bit' here is the lowest bit of the rightmost octet.)

If you negate a bitmask condition by writing it as '!&', the lookup is considered to have failed if the '&<bitmask>' match is successful, which is to say that the IP address anded with the bitmask is the same as the bitmask.

This is why '!&0.255.255.0' does what we want. '&0.255.255.0' successfully matches if the IP address is exactly *.255.255.*, because both middle octets have all their bits set in the mask so they have to have all their bits set in the IP address, and because the first and last octets in the mask are 0, their value in the IP address isn't looked at. Then we negate this, so the lookup is considered to have failed if the bitmask matched, which would mean that Spamhaus returned results in 127.255.255.0/24.

I'm writing all of that out in detail because here is what the current Exim documentation says about negated DNS bitmask conditions:

Negation can also be used with a bitwise-and restriction. The dnslists condition will only be true if a result is returned by the lookup which, anded with the restriction, is all zeroes.

This is not how Exim behaves. If it were, Spamhaus DBL lookups would not work correctly with '!&0.255.255.0'. DBL lookups return results in 127.0.1.0/24; if you bitwise-and that with 0.255.255.0, you get '0.0.1.0', which is not all zeroes.

(It could be useful to have a version of '&' that succeeded if any of the bits in the result were non-zero, but that's not what Exim has today, as discussed above.)

Something you don't want to do when using Spamhaus's DQS with Exim

By: cks
13 January 2026 at 04:16

For reasons outside the scope of this entry, we recently switched from Spamhaus's traditional public DNS (what is now called the 'public mirrors') to an account with their Data Query Service. The DQS data can still be queried via DNS, which presents a problem: DNS queries have no way to carry any sort of access key with them. Spamhaus has solved this problem by embedding your unique access key in the zone name you must use. Rather than querying, say, zen.spamhaus.org, you query '<key>.zen.dq.spamhaus.net'. Because your DQS key is tied to your account and your account has query limits, you don't want to spread your DQS key around for other people to pick up and use.

We use the Exim mailer (which is more of a mailer construction kit out of the box). Exim has a variety of convenient features for using DNS (block) lists. One of them is that when Exim finds an entry in a DNS blocklist in an ACL, it sets some (Exim) variables that you can use later in various contexts, such as creating log messages. To more or less quote from the Exim documentation on (string) expansion variables:

$dnslist_domain
$dnslist_matched
$dnslist_text
$dnslist_value

When a DNS (black) list lookup succeeds, these variables are set to contain the following data from the lookup: the list’s domain name, the key that was looked up, the contents of any associated TXT record, and the value from the main A record. [...]

To make life easier on yourself, it's conventional to use these variables (among others) in things like SMTP error messages and headers that you add to messages:

deny hosts = !+local_networks
     message = $sender_host_address is listed \
               at $dnslist_domain: $dnslist_text
     dnslists = rbl-plus.mail-abuse.example

warn dnslists = weird.example
     add_header = X-Us-DNSBL: listed in $dnslist_domain

However, if you're using Spamhaus DQS, using $dnslist_domain as these examples do is dangerous. The DNS list domain will be the full domain, and that full domain will include your DQS access key, which you will thus be exposing in message headers and SMTP error messages. You probably don't want to do that.

(Certainly it feels like a bad practice to leak a theoretically confidential value into the world, even if the odds are that no one is going to pick it up and abuse it.)

You have two options. The first option is to simply hard code some appropriate name for the list instead of using $dnslist_domain. However, this only works if you're using a single DNS list in each ACL condition, instead of something where you check multiple DNS blocklists at once (with 'dnslists = a.example : b.example : c.example'). It's also a bit annoying to have to repeat yourself.

(This is what I did to our Exim configuration when I realized the problem.)

The second option is that Exim has a comprehensive string expansion language, so determined people can manipulate $dnslist_domain to detect that it contains your DQS key and remove it. The brute force way would be to use ${sg} (from expansion items) to replace your key with nothing, something like (this is untested):

${sg{$dnslist_domain}{<DQS key>}{}}

You could probably wrap this up in an Exim macro, call it 'DNSLIST_NAME', and then write ACLs as, say:

deny hosts = !+local_networks
     message = $sender_host_address is listed \
               at DNSLIST_NAME
     dnslists = rbl-plus.mail-abuse.example

(Because we're using ${sg}, we won't change the name of a DNSBL domain that doesn't contain the DQS key.)

This isn't terrible and it does cope with a single Exim ACL condition that checks multiple DNS blocklists.
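For illustration, here is a Python sketch of what the ${sg} substitution accomplishes, using a hypothetical key (and note that the plain substitution leaves a leading dot behind, which this sketch also trims; Exim's ${sg} treats its pattern as a regular expression, which makes no difference for a plain alphanumeric key):

```python
# Hypothetical DQS key, not a real one.
DQS_KEY = "abc123examplekey"

def public_dnslist_name(domain: str) -> str:
    # Mirrors the idea of ${sg{$dnslist_domain}{<DQS key>}{}}:
    # strip the key out of the domain, then trim the leading dot
    # that the bare substitution leaves behind.
    return domain.replace(DQS_KEY, "").lstrip(".")

print(public_dnslist_name("abc123examplekey.zen.dq.spamhaus.net"))
# -> zen.dq.spamhaus.net
print(public_dnslist_name("rbl-plus.mail-abuse.example"))
# a domain without the key is unchanged
```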

An annoyance in how Netplan requires you to specify VLANs

By: cks
12 January 2026 at 04:27

Netplan is Canonical's more or less mandatory method of specifying networking on Ubuntu. Netplan has a collection of limitations and irritations, and recently I ran into a new one, which is how VLANs can and can't be specified. To explain this, I can start with the YAML configuration language. To quote the top level version, it looks like:

network:
  version: NUMBER
  renderer: STRING
  [...]
  ethernets: MAPPING
  [...]
  vlans: MAPPING
  [...]

To translate this, you specify VLANs separately from your Ethernet or other networking devices. On the one hand, this is nicely flexible. On the other hand it creates a problem, because here is what you have to write for VLAN properties:

network:
  vlans:
    vlan123:
      id: 123
      link: enp5s0
      addresses: <something>

Every VLAN is on top of some networking device, and because VLANs are specified as a separate category of top level devices, you have to name the underlying device in every VLAN (which gets very annoying and old very fast if you have ten or twenty VLANs to specify). Did you decide to switch from a 1G network port to a 10G network port for the link with all of your VLANs on it? Congratulations, you get to go through every 'vlans:' entry and change its 'link:' value. We hope you don't overlook one.

(Or perhaps you had to move the system disks from one model of 1U server to another model of 1U server because the hardware failed. Or you would just like to write generic install instructions with a generic block of YAML that people can insert directly.)
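One stopgap outside of Netplan itself is to generate the repetitive 'vlans:' mapping from a single link variable; a minimal Python sketch, with hypothetical VLAN IDs:

```python
# Generate a Netplan 'vlans:' mapping where the underlying
# link is named exactly once (IDs here are hypothetical).
link = "enp5s0"
vlan_ids = [123, 124, 125]

lines = ["network:", "  vlans:"]
for vid in vlan_ids:
    lines += [
        f"    vlan{vid}:",
        f"      id: {vid}",
        f"      link: {link}",
    ]
yaml_text = "\n".join(lines)
print(yaml_text)
```

Changing the underlying interface then means editing one variable and regenerating, instead of hand-editing every VLAN entry.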

The best way for Netplan to deal with this would be to allow you to also specify VLANs as part of other devices, especially Ethernet devices. Then you could write:

network:
  ethernet:
    enp5s0: 
      vlans:
        vlan123:
          id: 123
          addresses: <something>

Every VLAN specified in enp5s0's configuration would implicitly use enp5s0 as its underlying link device, and you could rename all of them trivially. This also matches how I think most people think of and deal with VLANs, which is that (obviously) they're tied to some underlying device, and you want to think of them as 'children' of the other device.

(You can have an approach to VLANs where they're more free-floating and the interface that delivers any specific VLAN to your server can change, for load balancing or whatever. But you could still do this, since Netplan will need to keep supporting the separate 'vlans:' section.)

If you want to work around this today, you have to go for the far less convenient approach of artificial network names.

network:
  ethernet:
    vlanif0:
      match:
        name: enp5s0

  vlans:
    vlan123:
      id: 123
      link: vlanif0
      addresses: <something>

This way you only need to change one thing if your VLAN network interface changes, but at the cost of doing a non-standard way of setting up the base interface. (Yes, Netplan accepts it, but it's not how the Ubuntu installer will create your netplan files and who knows what other Canonical tools will have a problem with it as a result.)

We have one future Ubuntu server where we're going to need to set up a lot of VLANs on one underlying physical interface. I'm not sure which option we're going to pick, but the 'vlanif0' option is certainly tempting. If nothing else, it probably means we can put all of the VLANs into a separate, generic Netplan file.

Early experience with using Linux tc to fight bufferbloat latency

By: cks
11 January 2026 at 03:52

Over on the Fediverse I mentioned something recently:

Current status: doing extremely "I don't know what I'm really doing, I'm copying from a website¹" things with Linux tc to see if I can improve my home Internet latency under load without doing too much damage to bandwidth or breaking my firewall rules. So far, it seems to work and things² claim to like the result.

¹ <documentation link>
² https://bufferbloat.libreqos.com/ via @davecb

What started this was running into a Fediverse post about the bufferbloat test, trying it, and discovering that (as expected) my home DSL link performed badly, with significant increased latency during downloads, uploads, or both. My memory is that reported figures went up to the area of 400 milliseconds.

Conveniently for me, my Linux home desktop is also my DSL router; it speaks PPPoE directly through my DSL modem. This means that doing traffic shaping on my Linux desktop should cover everything, without any need to wrestle with a limited router OS environment. And there were some more or less cut-and-paste directions on the site.

So my outbound configuration was simple and obviously not harmful:

tc qdisc add root dev ppp0 cake bandwidth 7.6Mbit

The bandwidth is a guess, although one informed by checking both my raw DSL line rate and what testing sites told me.

The inbound configuration was copied from the documentation and it's where I don't understand what I'm doing:

ip link add name ifb4ppp0 type ifb
tc qdisc add dev ppp0 handle ffff: ingress
tc qdisc add dev ifb4ppp0 root cake bandwidth 40Mbit besteffort
ip link set ifb4ppp0 up
tc filter add dev ppp0 parent ffff: matchall action mirred egress redirect dev ifb4ppp0

(This order follows the documentation.)

Here is what I understand about this. As covered in the tc manual page, traffic shaping and scheduling happens only on 'egress', which is to say for outbound traffic. To handle inbound traffic, we need a level of indirection to a special ifb (Intermediate Functional Block) (also) device, that is apparently used only for our (inbound) tc qdisc.

So we have two pieces. The first is the actual traffic shaping on the IFB link, ifb4ppp0, and setting the link 'up' so that it will actually handle traffic instead of throw it away. The second is that we have to push inbound traffic on ppp0 through ifb4ppp0 to get its traffic shaping. To do this we add a special 'ingress' qdisc to ppp0, which applies to inbound traffic, and then we use a tc filter that matches all (ingress) traffic and redirects it to ifb4ppp0 as 'egress' traffic. Since it's now egress traffic, the tc shaping on ifb4ppp0 will now apply to it and do things.

When I set this up I wasn't certain if it was going to break my non-trivial firewall rules on the ppp0 interface. However, everything seems to be fine, and the only thing the tc redirect is affecting is traffic shaping. My firewall blocks and NAT rules are still working.

Applying these tc rules definitely improved my latency scores on the test site; my link went from an F rating to an A rating (and a C rating for downloads and uploads happening at once). Does this improve my latency in practice for things like interactive SSH connections while downloads and uploads are happening? It's hard for me to tell, partly because I don't do such downloads and uploads very often, especially while I'm doing interactive stuff over SSH.

(Of course partly this is because I've sort of conditioned myself out of trying to do interactive SSH while other things are happening on my DSL link.)

The most I can say is that this probably improves things, and that since my DSL connection has drifted into having relatively bad latency to start with (by my standards), it probably helps to minimize how much worse it gets under load.

I do seem to get slightly less bandwidth for transfers than I did before; experimentation says that how much less can be fiddled with by adjusting the tc 'bandwidth' settings, although that also changes latency (more bandwidth creates worse latency). Given that I rarely do large downloads or uploads, I'm willing to trade off slightly lower bandwidth for (much) less of a latency hit. One reason that my bandwidth numbers are approximate anyway is that I'm not sure how much PPPoE DSL framing compensation I need.

(The Arch wiki has a page on advanced traffic control that has some discussion of tc.)

Sidebar: A rewritten command order for ingress traffic

If my understanding is correct, we can rewrite the commands to set up inbound traffic shaping to be more clearly ordered:

# Create and enable ifb link
ip link add name ifb4ppp0 type ifb
ip link set ifb4ppp0 up

# Set CAKE with bandwidth limits for
# our actual shaping, on ifb link.
tc qdisc add dev ifb4ppp0 root cake bandwidth 40Mbit besteffort

# Wire ifb link (with tc shaping) to inbound
# ppp0 traffic.
tc qdisc add dev ppp0 handle ffff: ingress
tc filter add dev ppp0 parent ffff: matchall action mirred egress redirect dev ifb4ppp0

The 'ifb4ppp0' name is arbitrary but conventional, set up as 'ifb4<whatever>'.

Distribution source packages and whether or not to embed in the source code

By: cks
10 January 2026 at 03:46

When I described my current ideal Linux source package format, I said that it should be embedded in the source code of the software being packaged. In a comment, bitprophet had a perfectly reasonable and good preference the other way:

Re: other points: all else equal I think I vaguely prefer the Arch "repo contains just the extras/instructions + a reference to the upstream source" approach as it's cleaner overall, and makes it easier to do "more often than it ought to be" cursed things like "apply some form of newer packaging instructions against an older upstream version" (or vice versa).

The Arch approach is isomorphic to the source RPM format, which has various extras and instructions plus a pre-downloaded set of upstream sources. It's not really isomorphic to the Debian source format because you don't normally work with the split up version; the split up version is just a package distribution thing (as dgit shows).

(I believe the Arch approach is also how the FreeBSD and OpenBSD ports trees work. Also, the source package format you work in is not necessarily how you bundle up and distribute source packages, again as shown by Debian.)

Let's call these two packaging options the inline approach (Debian) and the out of line approach (Arch, RPM). My view is that which one you want depends on what you want to do with software and packages. The out of line approach makes it easier to build unmodified packages, and as bitprophet comments it's easy to do weird build things. If you start from a standard template for the type of build and install the software uses, you can practically write the packaging instructions yourself. And the files you need to keep are quite compact (and if you want, it's relatively easy to put a bunch of them into a single VCS repository, each in its own subdirectory).

However, the out of line approach makes modifying upstream software much more difficult than a good version of the inline approach (such as, for example, dgit). To modify upstream software in the out of line approach you have to go through some process similar to what you'd do in the inline approach, and then turn your modifications into patches that your packaging instructions apply on top of the pristine upstream. Moving changes from version to version may be painful in various ways, and in addition to those nice compact out of line 'extras/instructions' package repos, you may want to keep around your full VCS work tree that you built the patches from.

(Out of line versus inline is a separate issue from whether or not the upstream source code should include packaging instructions in any form; I think that generally the upstream should not.)

As a system administrator, I'm biased toward easy modification of upstream packages and thus upstream source because that's most of why I need to build my own packages. However, these days I'm not sure if that's what a Linux distribution should be focusing on. This is especially true for 'rolling' distributions that mostly deal with security issues and bugs not by patching their own version of the software but by moving to a new upstream version that has the security fix or bug fix. If most of what a distribution packages is unmodified from the upstream version, optimizing for that in your (working) source package format is perfectly sensible.

The Amanda backup system and "dump promotion"

By: cks
9 January 2026 at 03:05

The Amanda backup system is what we use to handle our backups. One of Amanda's core concepts is a 'dump cycle', the amount of time between normally scheduled full backups for filesystems. If you have a dumpcycle of 7 days and Amanda does a full backup of a filesystem on Monday, its normal schedule for the next full backup is next Monday. However, Amanda can 'promote' a full backup ahead of schedule if it believes there's room for the full backup in a given backup run. Promoting full backups is a good idea in theory because it reduces how much data you need to restore a filesystem.

The amanda.conf configuration file has a per-dumptype option that affects this:

maxpromoteday int
Default: 10000. The maximum number of day[s] for a promotion, set it 0 if you don't want promotion, set it to 1 or 2 if your disks get overpromoted.

As written, I find this a little bit opaque (to be polite). What maxpromoteday controls is the maximum of how many days ahead of the normal schedule Amanda will promote a full backup. For example, if you have a 7-day dump cycle, a maxpromoteday of 2, and did a full dump of a filesystem on Monday, the earliest Amanda will possibly schedule a 'promoted' full backup is two days before next Monday, so the coming Saturday or Sunday. By extension, if you set maxpromoteday to '0', Amanda will only consider promoting a full backup of a filesystem zero days ahead of schedule, which is to say 'not at all'. Any value larger than your 'dumpcycle' setting has no effect, because Amanda is already doing full backups that often and so a larger value doesn't add any extra constraints on Amanda's scheduling of full backups.
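The scheduling arithmetic can be sketched as follows (my own simplification, not Amanda's actual scheduler):

```python
# Earliest day, counted from the last full backup, on which
# Amanda may schedule the next full backup of a filesystem.
def earliest_next_full(dumpcycle: int, maxpromoteday: int) -> int:
    # Promotion can never reach further ahead than the dump
    # cycle itself.
    promotion = min(maxpromoteday, dumpcycle)
    return dumpcycle - promotion

print(earliest_next_full(7, 2))      # 5: Saturday, after a Monday full
print(earliest_next_full(7, 0))      # 7: no promotion at all
print(earliest_next_full(7, 10000))  # 0: effectively unconstrained
```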

You might wonder why you'd want to set 'maxpromoteday' down to limit full backup promotions, and naturally there is a story here.

Amanda is a very old backup system, and although it's not necessarily used with physical tapes and tape robots today (our 'tapes' are HDDs), many of its behaviors date back to that era. While the modern version of Amanda can split up a single large backup of a single (large) filesystem across multiple 'tapes', what it refuses to do is to split such a backup across multiple Amanda runs. If a filesystem backup can't be completely written out to tape in the current Amanda run, any partially written amount is ignored; the entire filesystem backup will be (re)written in the next run, using up the full space. If Amanda managed to write 90% of your large filesystem to your backup media today, that 90% is ignored because the last 10% couldn't be written out.

The consequence of this is that if you're backing up large filesystems with Amanda, you really don't want to run out of tape space during a backup run because this can waste hundreds of gigabytes of backup space (or more, if you have multi-terabyte filesystems). In environments like ours where the 'tapes' are artificial and we have a lot of them available to Amanda (our 'tapes' are partitions on HDDs and we have a dozen HDDs or more mounted on each backup server at any given time), the best way to avoid running out of tape space during a single Amanda run is to tell Amanda that it can use a lot of tapes, way more tapes than it should ever actually need.

(Even in theory, Amanda can't perfectly estimate how much space a given full or incremental backup will actually use and so it can run over the tape capacity you actually want it to use. In practice, in many environments you may have to tell Amanda to use 'server side estimates', where it guesses based on past backup behavior, instead of the much more time-consuming 'client side estimates', where it basically does an estimation pass over each filesystem to be backed up.)

However, if you tell Amanda it can use a lot of tapes in a standard Amanda setup, Amanda will see a vast expanse of available tape capacity and enthusiastically reach the perfectly rational conclusion that it should make use of that capacity by aggressively promoting full backups of filesystems (both small and large ones). This is very much not what you (we) actually want. We're letting Amanda use tons of 'tapes' to insure that it never wastes tape space, not so that it can do extra full backups; if Amanda doesn't need to use the tape space we don't want it to touch that tape space.

The easiest way for us to achieve this is to set 'maxpromoteday 0' in our Amanda configuration, at least for Amanda servers that back up very large filesystems (where the wasted tape space of an incompletely written backup could be substantial). Unfortunately I think you'll generally want to set this for all dump types in a particular Amanda server, because over-promotion of even small(er) filesystems could eat up a bunch of tape space that you want to remain unused.
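As an illustration, the relevant amanda.conf settings look something like this sketch (the dumptype name and the runtapes value are invented for the example; maxpromoteday and runtapes are real Amanda parameters):

```
# amanda.conf sketch, illustrative only
runtapes 10            # let a single run use many 'tapes' (HDD partitions)

define dumptype our-big-fs {
    comment "large filesystems: never promote full backups"
    maxpromoteday 0    # disable promotion of full dumps entirely
}
```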

(Amanda talks about 'dumps' because it started out on Unix systems where for a long time the filesystem backup program was called 'dump'. These days your Amanda filesystem backups are probably done with GNU Tar, although I think people still talk about things like 'database dumps' for backups.)

What 24 hours of traffic looks like to our main web server in January 2026

By: cks
8 January 2026 at 03:54

One of the services we operate for the department is a traditional Apache-based shared web server, with things like people's home pages (eg), pages for various groups, and so on (we call this our departmental web server). This web server has been there for a very long time and its URLs have spread everywhere, and in the process it's become quite popular for some things. These days there are a lot of things crawling everything in sight, and our server has no general defenses against them (we don't even have much of a robots.txt).

(Technically our perimeter firewall has basic HTTP and HTTPS brute-force connection rate limits, but people typically have to really work to trigger them and they mostly don't. Although now that I look at yesterday, more IPs wound up listed than I expected, although listings normally last at most five minutes.)

The first, very noticeable thing that we have is people who do very slow downloads from us. Our server rolls over the logs at midnight, but Apache only writes a log record when a HTTP request completes, possibly to the old log file. Yesterday (Tuesday), the last log record was written at 05:24, for a request that started at 22:44. Over the 24 hours that requests were initiated in, we saw 1.2 million requests.

The two most active User-Agents were (in somewhat rounded numbers):

426000 "Mozilla/5.0 (iPhone; CPU iPhone OS 18_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.0 Mobile/15E148 Safari/604.1"
424000 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0 Safari/537.36"

The most active thing that was willing to admit it wasn't a human with a browser was "ChatGPT-User", with just under 20,000 requests. After that came "GoogleOther" and "Amazonbot", at about 12,000 requests each, then "Googlebot" with 10,000 and bingbot with about 6,000. Of course, some of those could be people impersonating the real Googlebot and bingbot.
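As an illustration of the sort of tallying involved (the log lines below are fabricated samples, and real combined-log parsing needs more care than splitting on quotes), counting User-Agents from Apache combined-format logs can be sketched as:

```go
package main

import (
    "fmt"
    "sort"
    "strings"
)

// userAgentCounts tallies the quoted User-Agent field from Apache
// combined-format log lines (the final "..." field on each line).
func userAgentCounts(lines []string) map[string]int {
    counts := make(map[string]int)
    for _, line := range lines {
        // Splitting on '"' leaves the User-Agent as the
        // second-to-last element (the last is an empty string).
        parts := strings.Split(line, `"`)
        if len(parts) < 2 {
            continue
        }
        counts[parts[len(parts)-2]]++
    }
    return counts
}

func main() {
    logLines := []string{
        `1.2.3.4 - - [06/Jan/2026:00:00:01 +0000] "GET / HTTP/1.1" 301 230 "-" "Mozilla/5.0 (iPhone; ...)"`,
        `5.6.7.8 - - [06/Jan/2026:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "GoogleOther"`,
        `1.2.3.4 - - [06/Jan/2026:00:00:03 +0000] "GET /b HTTP/1.1" 301 230 "-" "Mozilla/5.0 (iPhone; ...)"`,
    }
    counts := userAgentCounts(logLines)
    var uas []string
    for ua := range counts {
        uas = append(uas, ua)
    }
    sort.Strings(uas)
    for _, ua := range uas {
        fmt.Printf("%d %s\n", counts[ua], ua)
    }
}
```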

To my surprise, the most popular HTTP result code by far was HTTP 301 Moved Permanently, at 844,000 responses (HTTP 200s were 347,000, everything else was small by comparison). And most of the requests by those two most active User-Agents got HTTP 301 responses (roughly 418,000 each). I don't know what's going on there, but someone seems to have latched on to a lot of URLs that require redirects (which include things like directory URLs without the '/' on the end). On the positive side, most of those requests will have been pretty cheap for Apache to handle.

A single DigitalOcean IP claiming to be running Chrome 61 on 'Windows NT 10.0' made 11,000 requests, most of which got HTTP 404 errors because it was requesting URLs like '/wp-login.php'. There's no point complaining to hosting providers about this sort of thing, it's just background noise. No other single IP stood out to that degree (well, our monitoring system made over 10,000 requests, but that's expected). Google mostly crawled from a few IPs, with large counts, but other crawlers were more spread out.

To find out more traffic information, we need to turn to looking at Autonomous System Numbers (ASNs), using asncounter. This reports:

 count   percent ASN     AS
 463536  36.55   210906  BITE-US, LT
 152237  12.0    212286  LONCONNECT, GB
 65064   5.13    3257    GTT-BACKBONE GTT, US
 53927   4.25    7385    ABUL-14-7385, US
 45255   3.57    8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
 32557   2.57    7029    WINDSTREAM, US
 32101   2.53    55286   SERVER-MANIA, CA
 30037   2.37    15169   GOOGLE, US
 24412   1.92    239     UTORONTO-AS, CA
 21745   1.71    7015    COMCAST-7015, US
 16311   1.29    64200   VIVIDHOSTING, US
 [...]

And then for prefixes:

 count   percent prefix  ASN     AS
 64312   5.07    138.226.96.0/20 3257    GTT-BACKBONE GTT, US
 43459   3.43    85.254.128.0/22 210906  BITE-US, LT
 43161   3.4     185.47.92.0/22  210906  BITE-US, LT
 43111   3.4     45.131.216.0/22 212286  LONCONNECT, GB
 43040   3.39    45.145.136.0/22 212286  LONCONNECT, GB
 42998   3.39    45.138.248.0/22 212286  LONCONNECT, GB
 42870   3.38    185.211.96.0/22 210906  BITE-US, LT
 32365   2.55    85.254.112.0/22 210906  BITE-US, LT
 26937   2.12    66.249.64.0/20  15169   GOOGLE, US
 23785   1.88    128.100.0.0/16  239     UTORONTO-AS, CA
 23088   1.82    45.154.148.0/22 212286  LONCONNECT, GB
 21767   1.72    85.254.42.0/23  210906  BITE-US, LT
 [and then five more BITE-US prefixes at the same
  volume level, then many more prefixes]

Given that we have two extremely prolific User-Agents, let's look at where those requests came from in specific, and you will probably not be surprised at the results:

 count   percent ASN     AS
 462925  54.37   210906  BITE-US, LT
 152155  17.87   212286  LONCONNECT, GB
 64321   7.55    3257    GTT-BACKBONE GTT, US
 53649   6.3     7385    ABUL-14-7385, US
 32287   3.79    7029    WINDSTREAM, US
 31955   3.75    55286   SERVER-MANIA, CA
 21710   2.55    7015    COMCAST-7015, US
 16304   1.92    64200   VIVIDHOSTING, US
 [...]

If you have the ability to block traffic by ASN and you don't need to accept requests from clouds and your traffic is anything like this, you can probably drop a lot of it quite easily.

I can ask a different question: if we exclude those two popular User-Agents and look only at successful requests (HTTP 200 responses), where do they come from?

 count   percent ASN     AS
 38821   11.61   8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
 25510   7.63    15169   GOOGLE, US
 16968   5.07    239     UTORONTO-AS, CA
 12816   3.83    14618   AMAZON-AES, US
 11529   3.45    396982  GOOGLE-CLOUD-PLATFORM, US
 [...]

(There are about 334,000 of these in total.)

The 'UTORONTO-AS' listing includes our own monitoring, with its 10,000 odd requests. Many of Google's requests come from their 66.249.64.0/20 prefix, which is mostly or entirely used by various Google crawlers.

Around 138,000 requests were for a set of commonly used ML training data, and they probably account for most of the bandwidth used by this web server (which typically averages 40 Mbytes/sec of outgoing bandwidth all of the time on weekdays).

(I've previously done HTTP/2 stats for this server as of mid 2025.)

Why we have some AC units on one of our internal networks

By: cks
7 January 2026 at 03:13

I mentioned on the Fediverse a while back that we have air conditioners on our internal network. Well, technically what we have on the internal network is separate (and optional) controller devices that connect to the physical AC units themselves, but as they say, this is close enough. Of course there's a story here:

Why do we have networked AC controllers? Well, they control portable AC units that are in our machine rooms for emergency use, and having their controllers on our internal network means we can possibly turn them on from home if the main room AC stops working out of hours, on weekends, etc.

(It would still be a bad time, just maybe a little less bad.)

Our machine rooms are old (cf) and so are their normal AC units. Over the years we've had enough problems with these AC units that we've steadily accumulated emergency measures. A couple of years ago, these emergency measures reached the stage of pre-deploying wheeled portable AC units with their exhaust hoses connected up to places that would carry their hot air outside of the machine room.

Like most portable ACs, these units are normally controlled in person from their front panels (well, top panels). However, these are somewhat industrial AC units and you could get optional network-accessible controllers for them; after thinking about it, we did and then hooked the controllers (and thus the ACs) up to our internal management network. As I mentioned, the use case for networked control of these AC units is to turn them on from home during emergencies. They don't have anywhere near enough cooling power to cover all of the systems we normally have running in our machine rooms, but we might be able to keep a few critical systems up rather than being completely down.

(We haven't had serious AC issues since we put these portable AC units into place, so we aren't sure how well they'd perform and how much we'd be able to keep up.)

These network controllers can get status information (including temperatures) from the ACs and have some degree of support for SNMP, so we could probably pull information from them for metrics purposes if we wanted to. Right now we haven't looked into this, partly because we have our own temperature monitoring and partly because I'm not sure I trust the SNMP server implementation to be free of bugs, memory leaks, and other things that might cause problems for the overall network controller.

(Like most little things, these network controllers are probably running some terrifyingly ancient Linux kernel and software stack. A quick look at the HTTP server headers says that it's running a clearly old version of nginx on Ubuntu, although it's slightly more recent than I expected.)

Prometheus, Let's Encrypt, and making sure all our TLS certificates are monitored

By: cks
6 January 2026 at 03:11

I recently wrote about the complexities of getting programs to report the TLS certificates they use, where I theorized about writing a script to scrape this information out of places like the Apache configuration files, and then today I realized the obvious specific approach for our environment:

Obvious realization is obvious: since we universally use Let's Encrypt with certbot and follow standard naming, I can just look in /etc/letsencrypt/live to find all live TLS certificates and (a) host name for them, for cross-checking against our monitoring.

Our TLS certificates usually have multiple names associated with them, only one of which is the directory name in /etc/letsencrypt/live. However, we usually monitor the TLS certificate under what we think of as the primary name, and in any case we can make this our standard Prometheus operating procedure.

In our Prometheus environment we create a standard label for the 'host' being monitored, including for metrics obtained through Blackbox. Given that Blackbox exposes TLS certificate metrics, we can use things like direct curl queries to Prometheus to verify that we have TLS certificate monitoring for everything in /etc/letsencrypt/live. The obvious thing to check is that we have a probe_ssl_earliest_cert_expiry metric with the relevant 'host' value for each Let's Encrypt primary name.

If we want to, we can go further by looking at probe_ssl_last_chain_info. This Blackbox metric directly exposes labels for the TLS 'subject' and 'subjectalternative', so we can in theory search them for either the primary name that Let's Encrypt will be using or for what we consider an important name to be covered. It appears that this wouldn't be needed to cover any additional TLS certificates for us, as we're already checking everything under its primary name.

(Well, we are after I found one omission in a manual check today.)

With the right tools (also), I don't need to make this a pre-written shell script that runs on each machine; instead, I can do this centrally by hand every so often. On the one hand this isn't as good as automating it, but on the other hand every bit of locally built automation is another bit of automation we have to maintain ourselves. We mostly haven't had a problem with tracking TLS certificates, and we have other ways to notice failures.

(I should probably write a personal script to do this, just to capture the knowledge.)
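A minimal sketch of this cross-check (the 'host' label is our local Prometheus labeling convention and is an assumption here; a real version would get the names from os.ReadDir("/etc/letsencrypt/live") and send the expression to Prometheus's HTTP API rather than just printing it):

```go
package main

import (
    "fmt"
    "strings"
)

// promQLForCerts builds a PromQL expression matching our Blackbox TLS
// expiry metric for every certbot primary name; any name that comes
// back with no series is a certificate we forgot to monitor.
func promQLForCerts(names []string) string {
    return fmt.Sprintf("probe_ssl_earliest_cert_expiry{host=~%q}",
        strings.Join(names, "|"))
}

func main() {
    // In real use these would come from listing /etc/letsencrypt/live.
    names := []string{"www.example.org", "mail.example.org"}
    fmt.Println(promQLForCerts(names))
}
```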

Some notes to myself on Super-based bindings in GNU Emacs

By: cks
5 January 2026 at 03:11

I recently had to deal with GNU Emacs lsp-mode in a context where I cared a bit about its keybindings, and in the process of that ran across mention of what one could call its leader prefix, s-l. People who use GNU Emacs a lot will know what this specific 's-' notation means, but I'm not one of them, so it took me a bit of research to work it out. This is GNU Emacs' notation for 'Super', one of the theoretical extra key modifiers that you can have on keyboards.

(I suspect that lsp-mode uses s-l as its prefix on its key bindings because everything else good is taken.)

My impression is that it's normal for Unix desktop environments to have a key mapped to 'Super', often the left 'Microsoft' key; this is the case in my unusual X desktop environment. On Windows and macOS machines, you can apparently set up mappings in GNU Emacs itself as covered by Xah Lee in "Emacs Keys: Super Hyper" (via). This gives me a working Super key (if I remember it, which I hopefully will now) when I'm using a GUI GNU Emacs that has direct access to relatively raw key information, either locally or on a server with X forwarding.

However, things aren't so good for me if I'm using GNU Emacs in any sort of terminal window. Unlike Alt, for which there's a standard way to handle it in terminals, there appears to be no special handling for Super in either xterm or Gnome-Terminal. Super plus a regular character gives me the regular character, both locally and over SSH connections. In this environment, the only way to access Super-based bindings is with the special and awkward GNU Emacs way to add Super (and Hyper) to key sequences. For Super, this is 'C-x @ s ...', and you can see why I'm not enthused about typing it all that often. In practice, I'm more likely to invoke obscure (to me) lsp-mode things through M-x and orderless.

Fortunately, I think lsp-mode is the only thing that has Super bindings in my usual GNU Emacs environment, which means this is something I mostly won't need to care about. Given the challenges in using Super, I'll avoid any temptation to bind my own things with it. I also suspect that there's pretty much no hope for (Unix) terminal emulators and the terminal environment to add support for it, which will probably discourage other Emacs addons from using it.

(I did a crude search of all of the .el files I use and no obvious Super bindings turned up other than lsp-mode's.)

A small suggestion in modern Linux: take screenshots (before upgrades)

By: cks
4 January 2026 at 03:50

Mike Hoye recently wrote Powering Up, which is in part about helping people install (desktop) Linux, and the Fediverse thread version of it reminded me of something that I don't do enough of:

A related thing I've taken to doing before potential lurching changes (like Linux distribution upgrades) is to take screenshots and window images. Because comparing a now and then image is a heck of a lot easier than restoring backups, and I can look at it repeatedly as I fix things on the new setup.

Linux distributions and the software they package have a long history of deciding to change things for your own good. They will tinker with font choices, font sizes, default DPI determinations, the size of UI elements, and so on, not quite at the drop of a hat but definitely when you do something like upgrade your distribution and bring in a bunch of significant package version changes (and new programs to replace old programs).

Some people are perfectly okay with these changes. Other people, like me, are quite attached to the specifics of how their current desktop environment looks and will notice and be unhappy about even relatively small changes (eg, also). However, because we're fallible humans, people like me can't always recognize exactly what changed and remember exactly what the old version looked like (these two are related); instead, sometimes all we have is the sense that something changed but we're not quite sure exactly what or exactly how.

Screenshots and window images are the fix for that unspecific feeling. Has something changed? You can call up an old screenshot to check, and to examine what changed (and then maybe work out how to reverse it, or decide to live with the change). Screenshots aren't perfect; for example, they won't necessarily tell you what the old fonts were called or what sizes were being used. But they're a lot better than trying to rely on memory or other options.

It would probably also do me good to get into the habit of taking screenshots periodically, even outside of distribution upgrades. Looking back over time every so often is potentially useful to see more subtle, more long term changes, and perhaps ask myself either why I'm not doing something any more or why I'm still doing it.

(Currently I'm somewhat lackadaisical about taking screenshots even before distribution upgrades. I have a distribution upgrade process but I haven't made screenshots part of it, and I don't have an explicit checklist for the process. Which I definitely should create. Possibly I should also try to capture font information in text form, to the extent that I can find it.)

The complexities of getting programs to report the TLS certificates they use

By: cks
3 January 2026 at 03:17

One of the practical reasons that TLS certificates have dangerous expiry times is that in most environments, it's up to you to remember to add monitoring for each TLS certificate that you use, either as part of general purpose monitoring of the service or specific monitoring for certificate expiry. It would be nice if programs that used TLS certificates inherently monitored their expiry, but that's a fairly big change (for example, you have to decide how to send alerts about that information). A nominally easier change would be for programs routinely to be able to report what TLS certificates they're using, either as part of normal metrics and log messages or through some additional command line switch.

(If your program uses TLS certificates and it has some sort of built in way of reporting metrics, it would be very helpful to system administrators if it reported basic TLS certificate metrics like the 'notAfter' time.)

In a lot of programs, this would be relatively straightforward (in theory). A common pattern is for programs to read in all of the TLS certificates they're going to use on startup, before they drop privileges, which means that these programs reliably know what all of those certificates are (and some programs will abort if some TLS certificates can't be read). They could then report the TLS certificate file paths on startup, either as part of their regular startup or in a special 'just report configuration information' mode. In many cases, you could write your own script that scanned the program's configuration files and did a reasonably good job of finding all of the TLS certificate filenames (and you could then make it report the names those TLS certificates were for, and cross-check this against your existing monitoring).

(I should probably write such a script for our Apache environment, because adding TLS based virtual hosts and then forgetting to monitor them is something we could definitely do.)

However, not all programs are straightforward this way. There are some programs that can at least potentially generate the TLS certificate file name on the fly at runtime (for example, Exim's settings for TLS certificate file names are 'expanded strings' that might depend on connection parameters). And even usually straightforward programs like Apache can have conditional use of TLS certificates, although this probably will only leave you doing some extra monitoring of unused TLS certificates (let's assume you're not using SSLCertificateFile token identifiers). These programs would probably need to log TLS certificate filenames on their first use, assuming that they cache loaded TLS certificates rather than re-read them from scratch every time they're necessary.

There's also no generally obvious and good way to expose this information, which means that logging it or printing it out is only the first step and not necessarily deeply useful by itself. If programs put it into logs, people have to pull it out of logs; if programs report it from the command line, people need to write additional tooling. If a program has built in metrics that it exposes in some way, exposing metrics for any TLS certificates it uses is great, but most programs don't have their own metrics and statistics systems.

(Still, it would be nice if programs supported this first step.)

A Go question: how do you test select based code?

By: cks
2 January 2026 at 03:24

A while back I wrote an entry about understanding reading all available things from a Go channel (with a timeout), where the code used two selects to, well, let me quote myself:

The goal of waitReadAll() is to either receive (read) all currently available items from a channel (possibly a buffered one) or to time out if nothing shows up in time. This requires two nested selects, with the inner one in a for loop.

In a recent comment on that entry, Aristotle Pagaltzis proposed a code variation that only used a single select:

func waitReadAll[T any](c chan T, d time.Duration) ([]T, bool) {
    var out []T
    for {
        select {
        case v, ok := <-c:
            if !ok {
                return out, false
            }
            out = append(out, v)

        case <-time.After(d):
            if len(out) == 0 {
                return out, true
            }

        default:
            return out, true
        }
    }
}

Aristotle Pagaltzis wrote tests for this code in the Go playground, but despite passing those tests, this code has an intrinsic bug that means it can't work as designed. The bug is that if this code is entered with nothing in the channel, the default case is immediately triggered rather than it waiting for the length of the timeout. When I saw this code, I was convinced it had the bug and so I tried to modify the Go playground code to have a test that would expose the bug. However, I couldn't find an easy way to do so at the time, and even now my attempts have been somewhat awkward, so at the least I think it's not obvious how to do this.

In Go 1.25 (and later), the primary tool for testing synchronization and concurrency is the testing/synctest package (also). Running our hypothetical test with synctest.Test() puts it in an environment where time won't advance arbitrarily on us, insuring that the timeout in waitReadAll() won't trigger before we can do other things, like send to the channel. To create ordering in our case, I believe we can use synctest.Wait(). Consider this sketched code inside a synctest.Test():

c := make(chan int)
// sending goroutine:
go func() {
    // Point 1
    synctest.Wait()
    // Point 2
    time.Sleep(1*time.Second)
    c <- 1
}()

// Point 3 (receiving goroutine)
out, ok := waitReadAll(c, 2*time.Second)
// assert ok and len(out) == 1

The synctest.Wait() in the sending goroutine at point 1 will wait until everything is 'durably blocked'; the first durable block point is in theory a working select inside waitReadAll(), called at point 3 in a different goroutine. Then in our sending goroutine at point 2 we use time.Sleep() to wait less than the timeout, forcing ordering, and finally we send to the channel, which waitReadAll() should pick up before it times out. This (and a related test for a timeout) works properly with a working waitReadAll(), but it took a bunch of contortions to avoid having it panic in various ways with the buggy version of waitReadAll(). I'm also not convinced my testing code is completely correct.

(Some of the initial panics came from me learning that you often want to avoid using t.Fatal() inside a synctest bubble; instead you want to call t.Error() and arrange to have the rest of your code still work right.)

Effectively I'm using synctest to try to create an ordering of events between two goroutines without modifying any code to have explicit locking or synchronization. Synctest doesn't completely serialize execution but it does create predictable 'durable blocking' points where I know where everything is if things are working correctly. But it's awkward, and I can't directly wait and check for a blocked select at point 1.

Synctest also makes certain things that normally would be races into safer, probably race-free operations. Consider a version of this test with a bit more checking:

c := make(chan int)
readall := false
go func() {
    // Point 1
    synctest.Wait()
    // Point 2
    time.Sleep(1*time.Second)
    if readall {
       // failure!
    }
    c <- 1
}()

// Point 3
out, ok := waitReadAll(c, 2*time.Second)
readall = true
// assert ok and len(out) == 1

Because of how synctest.Wait() and time work within synctest bubbles, I believe in theory the only way that the two goroutines can access readall at the same time is if waitReadAll() is delaying for the same amount of time as our sending goroutine (instead of the amount of time we told it to). But the whole area is alarmingly subtle and I'm not sure I'm right.

(One of the synctest examples uses an unguarded variable in broadly this way.)

It's entirely possible that there's an easier way to do this sort of testing of select expressions, and I'd certainly hope so. However, synctest itself is quite new, so perhaps there's no better way right now. Also, possibly this sort of low level testing isn't necessary very often in practice. Both Aristotle Pagaltzis and I are in a sort of artificial situation where we're narrowly focused on a single peculiar function.

A little bit of complex design in phone "Level" applications

By: cks
1 January 2026 at 02:24

Modern smartphones have a lot of sensors; for example, they often have sensors that will report the phone's orientation and when it changes (which is used for things like 'wake up the screen when you pick up the phone'). One of the uses for these sensors is for little convenience applications, such as a "Level" app that uses the available sensors to report when the phone is level so you can use it as a level, sometimes for trivial purposes.

For years, this application seemed pretty trivial and obvious to me, with the only somewhat complex bit being figuring out how the person is holding the phone to determine which sort of level they wanted and then adjusting the display to clearly reflect that (while keeping it readable, something that Apple's current efforts partially fail at). Then I had a realization:

Today's random thought: Your phone, like mine, probably has a "Level" app, which is most naturally used with the phone on its side for better accuracy, including resting on top of (or below) things. Your phone (also like mine) probably has buttons on the sides that make its sides not 100% straight and level end to end (because the buttons make bumps). So, how does the Level app deal with that? Does it have a range of 'close enough to level', or some specific compensation, or button detection?

(By 'on its side' I meant with the long side of the phone, as opposed to the top or the bottom, which are often flat and button-less. You can also use the phone as a level horizontally, on top of a flat surface, where you have the bump of the camera lenses to worry about.)

My current phone has a noticeable camera bump, and the app I use to get relatively raw sensor data suggests that there's a detectable, roughly 1.5 degree difference in tilt between resting all of the phone on a surface and just having the phone case edge around the camera bump on the surface (which should make the phone as 'level' as possible). However, once it's reached a horizontal '0 degrees' level, the "Level" app will treat both of them as equivalent (I can tilt the phone back and forth without disturbing the green level marking). This isn't just the Level app being deliberately imprecise; before I achieve a horizontal 0 degrees level, the "Level" app does respond to tilting the phone back and forth, typically changing its tilt reading by a degree.

(Experimentation suggests that the side buttons create less tilt, probably under a degree, and also that the Level app probably ignores that tilt when it's reached 0 degrees of tilt. It may ignore such small changes in tilt in general, and there's certainly some noise in the sensor readings.)

As a system administrator and someone who peers into technology for fun, I'm theoretically well aware that often there's more behind the scenes than is obvious. But still, it can surprise me when I notice an aspect of something I've been using for years without thinking about it. There's a lot of magic that goes into making things work the way we expect them to (for example, digital microwaves doing what you want with time; this Level app behavior also sort of falls under the category of 'good UI').

My ideal Linux source package format (at the moment)

By: cks
31 December 2025 at 04:25

I've written recently on why source packages are complicated and why packages should be declarative (in contrast to Arch style shell scripts), but I haven't said anything about what I'd like in a source package format, which will mostly be from the perspective of a system administrator who sometimes needs to modify upstream packages or package things myself.

A source package format is a compromise. After my recent experiences with dgit, I now feel that the best option is that a source package is a VCS repository directory tree (Git by default) with special control files in a subdirectory. Normally this will be the upstream VCS repository with packaging control files and any local changes merged in as VCS commits. You perform normal builds in this checked out repository, which has the advantage of convenience and the disadvantage that you have to clean up the result, possibly with liberal use of 'git clean' and 'git reset'. Hermetic builds are done by some tool that copies the checked out files to a build area, or clones the repository, or some other option. If a binary package is built in an environment where this information is available, its metadata should include the exact current VCS commit it was built from, and I would make binary packages not build if there were uncommitted changes.

(Making the native source package a VCS tree with all of the source code makes it easy to work on but mingles package control files with the program source. In today's environment with good distributed VCSes I think this is the right tradeoff.)

The control files should be as declarative as possible, and they should directly express major package metadata such as version numbers (unlike the Debian package format, where the version number is derived from debian/changelog). There should be a changelog but it should be relatively free-form, like RPM changelogs. Changelogs are especially useful for local modifications because they go along with the installed binary package, which means that you can get an answer to 'what did we change in this locally modified package' without having to find your source. The main metadata file that controls everything should be kept simple; I would go as far as to say it should have a format that doesn't allow for multi-line strings, and anything that requires multi-line strings should go in additional separate files (including the package description). You could make it TOML but I don't think you should make it YAML.

Both the build time actions, such as configuring and compiling the source, and the binary package install time actions should by default be declarative; you should be able to say 'this is an autoconf based program and it should have the following additional options', and the build system will take care of everything else. Similarly you should be able to directly express that the binary package needs certain standard things done when it's installed, like adding system users and enabling services. However, this will never be enough, so you should also be able to express additional shell script level things that are done to prepare, build, install, or upgrade the package, and so on. Unlike RPM and Debian source packages but somewhat like Arch packages, these should be separate files in the control directory, eg 'pkgmeta/build.sh'. Making these separate files makes it much easier to do things like run shellcheck on them or edit them in syntax-aware editor environments.

(It should be possible to combine standard declarative prepare and build actions with additional shell or other language scripting. We want people to be able to do as much as possible with standard, declarative things. Also, although I used '.sh', you should be able to write these actions in other languages too, such as Python or Perl.)

I feel that, like RPMs, you should at least by default have to explicitly declare what files and directories are included in the binary package. Like RPMs, these installed files should be analyzed to determine the binary package dependencies rather than forcing you to try to declare them in the (source) package metadata (although you'll always have to declare build dependencies in the source package metadata). Like build and install scripts, these file lists should be in separate files, not in the main package metadata file. The RPM collection of magic ways to declare file locations is complex but useful so that, for example, you don't have to keep editing your file lists when the Python version changes. I also feel that you should have to specifically mark any files with unusual permissions, such as setuid or setgid bits, in the file lists.
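To make the file list idea concrete, here is what one of these separate file list files might look like. The syntax and paths are invented (the markers are loosely modeled on RPM's %files directives), so treat this purely as a sketch of the shape:

```
# pkgmeta/files.main -- hypothetical file list for the main binary package
/usr/bin/frobnicate
/usr/share/man/man1/frobnicate.1.gz
%dir /var/lib/frobnicator
# Unusual permissions must be marked explicitly:
%setuid /usr/bin/frobnicate-helper
```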

The natural way to start packaging something new in this system would be to clone its repository and then start adding the package control files. The packaging system could make this easier by having additional tools that you run in the root of your just-cloned repository; they would look around to find indications of things like the name, the version (based on repository tags), the build system in use, and so on, and then write out preliminary versions of the control files. More tools could be used incrementally for things like generating the file lists; you'd run the build and 'install' process, then have a tool inventory the installed files for you (and in the process it could recognize places where it should change absolute paths into specially encoded ones for things like 'the current Python package location').

This sketch leaves a lot of questions open, such as what 'source packages' should look like when published by distributions. One answer is to publish the VCS repository but that's potentially quite heavyweight, so you might want a more minimal form. However, once you create a 'source only' minimal form without the VCS history, you're going to want a way to disentangle your local changes from the upstream source.

Linux distribution packaging should be as declarative as possible

By: cks
30 December 2025 at 01:49

A commentator on my entry on why Debian and RPM (source) packages are complicated suggested looking at Arch Linux packaging, where most of the information is in a single file as more or less a shell script (example). Unfortunately, I'm not a fan of this sort of shell script or shell script like format, ultimately because it's only declarative by convention (although I suspect Arch enforces some of those conventions). One reason that declarative formats are important is that you can analyze and understand what they do without having to execute code. Another reason is that such formats naturally standardize things, which makes it much more likely that any divergence from the standard approach is something that matters, instead of a style difference.

Being able to analyze and manipulate declarative (source) packaging is useful for large scale changes within a distribution. The RPM source package format uses standard, more or less declarative macros to build most software, which I understand has made it relatively easy to build a lot of software with special C and C++ hardening options. You can inject similar things into a shell script based environment, but then you wind up with ad-hoc looking modifications in some circumstances, as we see in the Dovecot example.

Some things about declarative source packages versus Arch style minimalism are issues of what could be called 'hygiene'. RPM packages push you to list and categorize what files will be included in the built binary package, rather than simply assuming that everything installed into a scratch hierarchy should be packaged. This can be frustrating (and there are shortcuts), but it does give you a chance to avoid accidentally shipping unintended files. You could do this with shell script style minimal packaging if you wanted to, of course. Both RPM and Debian packages have standard and relatively declarative ways to modify a pristine upstream package, and while you can do that in Arch packages, it's not declarative, which hampers various sorts of things.

Basically my feeling is that at scale, you're likely to wind up with something that's essentially as formulaic as a declarative source package format without having its assured benefits. There will be standard templates that everyone is supposed to follow and they mostly will, and you'll be able to mostly analyze the result, and that 'mostly' qualification will be quietly annoying.

(On the positive side, the Arch package format does let you run shellcheck on your shell stanzas, which isn't straightforward to do in the RPM source format.)

Expiry times are dangerous, on "The dangers of SSL certificates"

By: cks
29 December 2025 at 03:48

Recently I read Lorin Hochstein's The dangers of SSL certificates (via, among others), which talks about a Bazel build workflow outage caused by an expired TLS certificate. I had some direct reactions to this but after thinking about it I want to step back and say that in general, it's clear that expiry times are dangerous, often more or less regardless of where they appear. TLS certificate expiry times are an obvious and commonly encountered instance of expiry times in cryptography, but TLS certificates aren't the only case; in 2019, Mozilla had an incident where the signing key for Firefox addons expired (I believe the system used certificates, but not web PKI TLS certificates). Another thing that expires is DNS data (not just DNSSEC keys) and there have been incidents where expiring DNS data caused problems. Does a system have caches with expiry times? Someone has probably had an incident where things expired by surprise.

One of the problems with expiry times in general is that they're usually implemented as an abrupt cliff. On one side of the expiry time everything is fine and works perfectly, and one second later on the other side of the expiry time everything is broken. There's no slow degradation, no expiry equivalent of 'overload', and so on, which means that there's nothing indirect to notice and detect in advance. You must directly check and monitor the expiry time, and if you forget, things explode. We're fallible humans so we forget every so often.

This abrupt cliff of failure is a technology choice. In theory we could begin degrading service some time before the expiry time, or we could allow some amount of success for a (short) time after the expiry time, but instead we've chosen to make things be a boolean choice (which has made time synchronization across the Internet increasingly important; your local system can no longer be all that much out of step with Internet time if things are to work well). This is especially striking because expiry times are most often a heuristic, not a hard requirement. We add expiry times to limit hypothetical damage, such as silent key compromise, or to constrain how long out of date DNS data is given to people, or similar things, but we don't usually have particular knowledge that the key or data cannot and must not be used after a specific time (for example, because the data will definitely have changed at that point).

(Of course the mechanics of degrading the service around the expiry time are tricky, especially in a way that the service operator would notice or get reports about.)
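As a minimal sketch of what a non-cliff expiry could look like, here is a Python illustration. The state names and the warning and grace thresholds are entirely invented; real systems would have to plumb these degraded states through their protocols somehow:

```python
# Sketch of a "soft" expiry check instead of a boolean cliff.
# The state names and thresholds are invented for illustration.
from datetime import datetime, timedelta

def expiry_state(expires_at: datetime, now: datetime,
                 warn: timedelta = timedelta(days=7),
                 grace: timedelta = timedelta(hours=6)) -> str:
    """Classify an expiry time into valid / warning / grace / expired."""
    if now < expires_at - warn:
        return "valid"      # comfortably before expiry
    if now < expires_at:
        return "warning"    # still valid, but loudly close to expiry
    if now < expires_at + grace:
        return "grace"      # past expiry, degraded but not dead yet
    return "expired"        # the hard cliff, pushed back by `grace`

exp = datetime(2025, 12, 31, 0, 0)
state = expiry_state(exp, datetime(2025, 12, 30))  # "warning": close to expiry
```

Of course this only helps if callers actually treat "warning" and "grace" differently from "valid", which is the hard part.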

Another problem, related to the abrupt cliff, is that generally expiry times are invisible or almost invisible. Most APIs and user interfaces don't really surface the expiry time until you fall over the cliff; generally you don't even get warnings logged that an expiry time is approaching (either in clients or in servers and services). We implicitly assume that expiry times will never get reached because something will handle the situation before then. Invisible expiry times are fine if they're never reached, but if they're hit as an abrupt cliff you have the worst of two worlds. Again, this isn't a simple problem with an obvious solution; for example, you might need things to know or advertise what is a dangerously close expiry time (if you report the expiry time all of the time, it becomes noise that is ignored; that's already effectively the situation with TLS certificates, where tools will give you all the notAfter dates you could ask for and no one bothers looking).

Some protocols do without expiry times entirely; SSH keypairs are one example (unless you use SSH certificates, but even then the key that signs certificates has no expiry). This has problems and risks that make it not suitable for all environments. If you're working in an environment that has and requires expiry times, another option is to simply set them as far in the future as possible. If you don't expect the thing to ever expire and have no process for replacing it, don't set its expiry time to ten years. But not everything can work this way; your DNS entries will change sooner or later, and often in much less than ten years.

Why Debian and RPM (source) packages are complicated

By: cks
28 December 2025 at 02:44

A commentator on my early notes on dgit mentioned that they found packaging in Debian overly complicated (and I think perhaps RPMs as well) and would rather build and ship a container. On the one hand, this is in a way fair; my impression is that the process of specifying and building a container is rather easier than for source packages. On the other hand, Debian and RPM source packages are complicated for good reasons.

Any reasonably capable source package format needs to contain a number of things. A source package needs to supply the original upstream source code, some amount of distribution changes, instructions for building and 'installing' the source, a list of (some) dependencies (for either or both build time and install time), a list of files and directories it packages, and possibly additional instructions for things to do when the binary package is installed (such as creating users, enabling services, and so on). Then generally you need some system for 'hermetic' builds, ones that don't depend on things in your local (Linux) login environment. You'll also want some amount of metadata to go with the package, like a name, a version number, and a description. Good source package formats also support building multiple binary packages from a single source package, because sometimes you want to split up the built binary files to reduce the amount of stuff some people have to install. A built binary package contains a subset of this; it has (at least) the metadata, the dependencies, a file list, all of the files in the file list, and those install and upgrade time instructions.

Built containers are a self contained blob plus some metadata. You don't need file lists or dependencies or install and removal actions because all of those are about interaction with the rest of the system and by design containers don't interact with the rest of the system. To build a container you still need some of the same information that a source package has, but you need less and it's deliberately more self-contained and freeform. Since the built container is a self contained artifact you don't need a file list, I believe it's uncommon to modify upstream source code as part of the container build process (instead you patch it in advance in your local repository), and your addition of users, activation of services, and so on is mostly free form and at container build time; once built the container is supposed to be ready to go. And my impression is that in practice people mostly don't try to do things like multiple UIDs in a single container.

(You may still want or need to understand what things you install where in the container image, but that's your problem to keep track of; the container format itself only needs a little bit of information from you.)

Containers have also learned from source packages in that they can be layered, which is to say that you can build your container by starting from some other container, either literally or by sticking another level of build instructions on the end. Layered source packages don't make any sense when you're thinking like a distribution, but they make a lot of sense for people who need to modify the distribution's source packages (this is what dgit makes much easier, partly because Git is effectively a layering system; that's one way to look at a sequence of Git commits).

(My impression of container building is that it's a lot more ad-hoc than package building. Both Debian and RPM have tried to standardize and automate a lot of the standard source code building steps, like running autoconf, but the cost of this is that each of them has a bespoke set of 'convenient' automation to learn if you want to build a package from scratch. With containers, you can probably mostly copy the upstream's shell-based build instructions (or these days, their Dockerfile).)

Dgit based building of (potentially modified) Debian packages can be surprisingly close to the container building experience. Like containers, you first prepare your modifications in a repository and then you run some relatively simple commands to build the artifacts you'll actually use. Provided that your modifications don't change the dependencies, files to be packaged, and so on, you don't have to care about how Debian defines and manipulates those, plus you don't even need to know exactly how to build the software (the Debian stuff takes care of that for you, which is to say that the Debian package builders have already worked it out).

In general I don't think you can get much closer to the container build experience than the dgit build experience or the general RPM experience (if you're starting from scratch). Packaging takes work because packages aren't isolated, self contained objects; they're objects that need to be integrated into a whole system in a reversible way (ie, you can uninstall them, or upgrade them even though the upgraded version has a somewhat different set of files). You need more information, more understanding, and a more complicated build process.

(Well, I suppose there are flatpaks (and snaps). But these mostly don't integrate with the rest of your system; they're explicitly designed to be self-contained, standalone artifacts that run in a somewhat less isolated environment than containers.)

Python 2, GNU Emacs, and my LSP environment combine to shoot me in the foot

By: cks
26 December 2025 at 21:50

So I had a thing happen:

This is my angry face that GNU Emacs appears to have re-indented my entire Python file to a different standard without me noticing and I didn't catch it in time. And also it appears impossible in GNU Emacs to FIX this. I do not want four space no tabs, this is historical code that all files should be eight spaces with tabs (yes, Python 2).

That 'Python 2' bit turns out to be load-bearing. The specific problem turned out to be that if I hit TAB with a region selected or M-q when GNU Emacs point was outside a comment, the entire file was reformatted to modern 4-space indents (and long expressions got linewrapped, and some other formatting changes). I'm not sure which happened to trigger the initial reformatting that I didn't notice in time, but I suspect I was trying to use M-q to reflow a file level comment block and had my cursor (point) in the wrong spot. My TAB and M-q bindings are standard, and when I investigated deeply enough I discovered that this was LSP related.

The first thing I learned is that just 'turning off' LSP mode with 'lsp-mode' (or 'M-: (lsp-mode -1)') isn't enough to actually turn off LSP based indentation handling. This is discussed in lsp-mode issue #824, and apparently the solution is some combination of deactivating an additional minor mode, invoking lsp-disconnect through M-x (or using the 's-l w D' key binding if you have Super available), or setting lsp-enable-indentation to 'nil' (probably as a buffer-local variable, although tastes may differ).

The second thing I discovered is that in my environment this doesn't happen for Python 3 code. With my normal Python 3 GNU Emacs LSP environment, using python-lsp-server (pylsp) (also), the LSP environment will make no changes and report 'No formatting changes provided'. My problem only happens in Python 2 buffers, and that's because in Python 2 buffers I wasn't using pylsp (which only officially supports Python 3 code) but instead the older and now unsupported pyls. Either pyls has always behaved differently than pylsp when an LSP client asks it to do formatting, or at some point the LSP protocol and expectations around formatting actions changed and pyls (which has been unmaintained since 2020) didn't change to keep up.

My immediate fix was to set lsp-enable-indentation to nil in my GNU Emacs lsp-mode hook for python-mode. As a longer term thing I'm going to experiment with using pylsp even for Python 2 code, to see how it goes. Otherwise I may wind up disabling LSP for Python 2 code and buffers, although that's somewhat tricky since there's no explicit separate settings for Python 2 versus Python 3. Another immediate fix is that in the future I may be editing this particular code base more in vi(m) or perhaps sam than GNU Emacs.
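For concreteness, the immediate fix looks something like this in an Emacs init file. The variable lsp-enable-indentation and python-mode-hook are real lsp-mode and Emacs names, but treat this as a sketch rather than my exact configuration:

```elisp
;; Sketch of the fix described above: stop lsp-mode from handling
;; indentation in Python buffers. Making the setting buffer-local
;; confines the change to Python.
(add-hook 'python-mode-hook
          (lambda ()
            (setq-local lsp-enable-indentation nil)))
```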

(My Python 2 code is mostly or entirely written using tabs for indentation, so the presence of leading tabs is a reliable way of detecting 'Python 2' code.)

PS: This particular Python 2 program is DWiki, the wiki engine underlying Wandering Thoughts, so while it will move to Python 3 someday and I once got a hacked version vaguely running that way, it's not going to happen any time soon for multiple reasons.

We should probably write some high level overviews of our environment

By: cks
26 December 2025 at 03:28

Over on the Fediverse, I shared an old story that's partly about (system) documentation, and it sparked a thought, which is that we (I) should write up a brief high level overview of our overall environment. This should probably be one level higher than an end of service writeup, which are focused on a specific service (if we write them at all). The reason to do this is because our regular documentation assumes a lot of context and part of that context is what our overall environment is. We know what the environment is because it's the water we work in, but a new person arriving here could very easily be lost.

What I'm thinking of is something as simple as saying (in a bit more words) that we store our data on a bunch of NFS fileservers and people get access to their home directories and so on by logging in to various multi-user Unix servers that all run Ubuntu Linux, or using various standard services like email (IMAP and webmail), Samba/CIFS file access, and printing. Our logins and passwords are distributed around as files from a central password server and a central NFS-mounted filesystem. There's some more that I would write here (including information about our networks) and I'd probably put in a bit more details about some names of the various servers and filesystems, but not too much more.

(At least not in the front matter. Obviously such an overview could get increasingly detailed in later sections.)

A bunch of this information is already on our support website in some form, but I feel the support website is both too detailed and not complete enough. It's too detailed because it's there to show people how to do things, and it's not complete because we deliberately omit some things that we consider implementation details (such as our NFS fileservers). A new person here should certainly read all the way through the support site sooner or later, but that's a lot of information to absorb. A high level overview is a quick start guide that's there to orient people and leave them with fewer moments of 'wait, you have a what?' or 'what is this even talking about?' as they're exposed to our usual documentation.

One reason to keep the high level overview at a high level is that the less specific it is, the less it's going to fall out of date as things change. Updating such a high level overview is always going to be low on the priority list, since it's almost never used, so the less updating it needs the better. I can also write somewhat more detailed high level overviews of specific aspects or sub-parts of our environment, if I find myself feeling that the genuine high level version doesn't say enough. Another reason to keep it high level is to keep it short, because asking a new person to read a couple of pages (at most) as high level orientation is a lot better than throwing them into the deep end with dozens of pages and thousands of words.

(I'm writing this down partly to motivate myself to do this when we go back to work in the new year, even though it feels both trivial and obvious. I have to remind myself that the obvious things about our environment to me are that way partly because I'm soaking in it.)

Some notes on using the Sec-CH-UA HTTP headers that Chrome supports

By: cks
25 December 2025 at 02:39

A while back, Chrome proposed and implemented what are called user agent hints, which are a collection of Sec-CH-UA HTTP headers that can provide you with additional information about the browser beyond what the HTTP User-Agent header provides. As mentioned, only Chrome and browsers derived from Chromium (or if you prefer, 'Blink') support these headers, and only since early 2021 (for Chrome; later for some others). However, Chrome is what a lot of people use. More to the point, Chrome is what a lot of bad crawlers claim to be in their User-Agent header. As has been written up by other people, you can use these headers to detect inconsistencies that give away crawlers.

In an ideal world, it would be enough to detect a recent enough Chrome version and then require it to be consistent between the User-Agent, the platform from Sec-CH-UA-Platform, and the version information from Sec-CH-UA. We don't live in an ideal world. The first issue is that some versions of Chrome don't send these user agent hints by default (I've seen this specifically from Android Pixel devices). To get them to do so, you must reply with an HTTP 307 redirection that includes Accept-CH and Critical-CH headers for the Sec-CH-UA headers you care about. I'm not sure if you can redirect the browser to the current URL; I opt to redirect to the URL with a special query parameter added, which then redirects back to the original version of the URL.

(One advantage of this is that in my HTTP request handling, I can reject a request with the special query parameter if it still doesn't include the Sec-CH-UA headers I ask for. This avoids infinite redirect loops and lets me log definite failures. Chrome browser setups that refuse to provide them even when requested are currently redirected to an error page explaining the situation.)
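A sketch of this 307 dance in Python follows. The header names (Accept-CH, Critical-CH) are the real ones; the marker query parameter scheme is just my approach as described above, and the function shape and names are invented for illustration:

```python
# Sketch of a 307 "ask for client hints" redirect. Header names are real;
# the marker query parameter and function shape are invented.
from urllib.parse import urlsplit, urlunsplit

WANTED_HINTS = "Sec-CH-UA, Sec-CH-UA-Mobile, Sec-CH-UA-Platform"
MARKER = "chlog=1"

def hint_redirect(url: str) -> tuple[int, dict[str, str]]:
    """Return a 307 status and headers asking the browser to retry with hints."""
    scheme, netloc, path, query, frag = urlsplit(url)
    query = f"{query}&{MARKER}" if query else MARKER
    target = urlunsplit((scheme, netloc, path, query, frag))
    return 307, {
        "Location": target,
        # Accept-CH asks for the hints; Critical-CH tells the browser the
        # hints are critical, so it should retry the request immediately.
        "Accept-CH": WANTED_HINTS,
        "Critical-CH": WANTED_HINTS,
        "Vary": "Sec-CH-UA, Sec-CH-UA-Platform",
    }

status, headers = hint_redirect("https://example.org/blog/entry")
```

The handler for the marker URL can then check for the hints and either redirect back to the original URL or serve the error page.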

Cross checking the browser version from Sec-CH-UA against the 'browser version' in the User-Agent is complicated by the question of what is a browser version. This is especially the case because the 'brand names' used in Sec-CH-UA aren't necessarily the '<whatever>/<ver>' names used in the User-Agent; for example, Microsoft Edge will report itself as 'Microsoft Edge' in Sec-CH-UA but 'Edg/' in the User-Agent. Some browsers based on Chrome will report a Chrome version that is the same as their brand name version (this appears to be true for Edge, for example), but others definitely won't, so you may need a mapping table from brand name to User-Agent name if you want to go that far. Sometimes the best you can do is verify the claimed 'Chromium' version against the 'Chrome/' version from the User-Agent.

Platform names definitely require a mapping from the Sec-CH-UA-Platform value to what appears in the User-Agent. On top of that, sometimes browsers will change their User-Agent platform name without changing Sec-CH-UA-Platform. One case I know of is that some versions of Android Opera (and perhaps Chrome) will change their User-Agent to say they're on Linux if you have them ask for the 'desktop' version of a site, but still report the Android values in their Sec-CH-UA headers (and say that they aren't a mobile device in Sec-CH-UA-Mobile, which is fair enough). It's hard to object to this behavior in a world where User-Agent sniffing is one way that websites decide on regular versus 'mobile' versions.
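The cross-checks described above can be sketched in Python like this. The parsing is deliberately simple-minded (the real Sec-CH-UA header is an HTTP structured-field list), the platform mapping is a small sample rather than a complete table, and all the function names are invented:

```python
# Sketch of cross-checking Sec-CH-UA headers against the User-Agent.
# Simple-minded parsing; the platform table is a sample, not complete.
import re

# Sec-CH-UA-Platform value -> substring expected in the User-Agent.
PLATFORM_UA = {
    '"Windows"': "Windows NT",
    '"macOS"': "Mac OS X",
    '"Linux"': "Linux",
    '"Android"': "Android",
}

def secchua_versions(sec_ch_ua: str) -> dict[str, str]:
    """Map brand name to major version from a Sec-CH-UA header value."""
    return dict(re.findall(r'"([^"]+)";v="(\d+)"', sec_ch_ua))

def consistent(sec_ch_ua: str, platform: str, user_agent: str) -> bool:
    brands = secchua_versions(sec_ch_ua)
    m = re.search(r'Chrome/(\d+)', user_agent)
    if m is None or not brands:
        return False
    # Cross-check the Chromium engine version against 'Chrome/<ver>'.
    chromium = brands.get("Chromium") or brands.get("Google Chrome")
    if chromium != m.group(1):
        return False
    expect = PLATFORM_UA.get(platform)
    return expect is not None and expect in user_agent

ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36")
ok = consistent('"Chromium";v="142", "Google Chrome";v="142"',
                '"Windows"', ua)
# ok is True here; a crawler with a stale Chrome/120 User-Agent, or one
# claiming Windows while reporting "macOS" in Sec-CH-UA-Platform, fails.
```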

My use of Sec-CH-UA checks here on Wandering Thoughts has so far turned up several sorts of bad behavior in crawlers. As I sort of expected, the most common behavior is crawlers that claim to be Chrome in their User-Agent (or something derived from it) but don't supply any Sec-CH-UA headers; this is now a straightforward bad idea even if you mention your crawler in your User-Agent. Some crawlers report one Chrome version in Sec-CH-UA but another one in their User-Agent, usually with the User-Agent version being older. I suspect that these crawlers are based on Chromium and periodically update their Chromium version, but statically configure their User-Agent and don't update it. Some of these crawlers also report a different platform between Sec-CH-UA-Platform and their User-Agent (so far all of them have been running on macOS but saying they were Windows 10 or 11 machines in their User-Agent). The third case is things that report they are headless Chrome in their Sec-CH-UA header (and I reject them).

(This is where the Internet Archive gets a dishonorable mention; currently their crawling often has mismatched User-Agent and Sec-CH-UA headers. Sometimes they have a special marker in the User-Agent and sometimes it's just mismatched Chrome information.)

I've also seen some weird cases so far where a crawler provided Sec-CH-UA headers despite claiming to be Firefox in its User-Agent. My data so far is incomplete, but some of these have had mismatches between Sec-CH-UA-Platform and the User-Agent, while another claimed to be Chrome 88 (which in theory is before Chrome supported them) while saying it was Firefox 120 in its User-Agent. I've improved my logging and error reporting so I may get slightly better data on this in a while.

At the same time, checking Sec-CH-UA headers (and checking them against User-Agent headers) will definitely not defeat all bad crawlers. Some crawlers are clearly using either real browsers or software that fakes everything together properly. I suspect the latter because the most recent case involves a horde of IPs claiming to be Chrome 142 on macOS 10.15.7, which I doubt is so universal a configuration (especially on datacenter VPSes and servers). As with email spam, all of this is a constant race of heuristics against the bad actors.

(It's hard to judge my new Sec-CH-UA checks compared to my existing header checks because of check ordering. If I was sufficiently energetic I'd try to do all of the checks before rejecting anything and log all failed checks, but as it is I do checks one by one and reject (or redirect with Critical-CH) at the first failed one.)

Moving local package changes to a new Ubuntu release with dgit

By: cks
23 December 2025 at 23:55

Suppose, not entirely hypothetically, that you've made local changes to an Ubuntu package on one Ubuntu release, such as 22.04 ('jammy'), and now you want to move to another Ubuntu release such as 24.04 ('noble'). If you're working with straight 'apt-get source' Ubuntu source packages, this is done by tediously copying all of your patches over (hopefully the package uses quilt) to duplicate and recreate your 22.04 work.

If you're using dgit, this is much easier. Partly this is because dgit is based on Git, but partly this is because dgit has an extremely convenient feature where it can have several different releases in the same Git repository. So here's what we want to do, assuming you have a dgit repository for your package already.

(For safety you may want to do this in a copy of your repository. I make rsync'd copies of Git repositories all the time for stuff like this.)

Our first step is to fetch the new 24.04 ('noble') version of the package into our dgit repository as a new dgit branch, and then check out the branch:

dgit fetch -d ubuntu noble,-security,-updates
dgit checkout noble,-security,-updates

We could do this in one operation but I'd rather do it in two, in case there are problems with the fetch.

The Git operation we want to do now is to cherry-pick (also) our changes to the 22.04 version of the package onto the 24.04 version of the package. If this goes well the changes will apply cleanly and we're done. However, there is a complication. If we've followed the usual process for making dgit-based local changes, the last commit on our 22.04 version is an update to debian/changelog. We don't want that change, because we need to do our own 'gbp dch' on the 24.04 version after we've moved our own changes over to make our own 24.04 change to debian/changelog (among other things, the 22.04 changelog change has the wrong version number for the 24.04 package).

In general, cherry-picking all our local changes is 'git cherry-pick old-upstream..old-local'. To get all but the last change, we want 'old-local~' instead. Dgit has long and somewhat obscure branch names; its upstream for our 22.04 changes is 'dgit/dgit/jammy,-security,-updates' (ie, the full 'suite' name we had to use with 'dgit clone' and 'dgit fetch'), while our local branch is 'dgit/jammy,-security,-updates'. So our full command, with a 'git log' beforehand to be sure we're getting what we want, is:

git log dgit/dgit/jammy,-security,-updates..dgit/jammy,-security,-updates~
git cherry-pick dgit/dgit/jammy,-security,-updates..dgit/jammy,-security,-updates~

(We've seen this dgit/dgit/... stuff before when doing 'gbp dch'.)

Then we need to make our debian/changelog update. Here, as an important safety tip, don't blindly copy the command you used while building the 22.04 package, using 'jammy,...' in the --since argument, because that will try to create a very confused changelog of everything between the 22.04 version of the package and the 24.04 version. Instead, you obviously need to update it to your new 'noble' 24.04 upstream, making it:

gbp dch --since dgit/dgit/noble,-security,-updates --local .cslab. --ignore-branch --commit

('git reset --hard HEAD~' may be useful if you make a mistake here. As they say, ask me how I know.)

If the cherry-pick doesn't apply cleanly, you'll have to resolve that yourself. If the cherry-pick applies cleanly but the result doesn't build or perhaps doesn't work because the code has changed too much, you'll be using various ways to modify and update your changes. But at least this is a bunch easier than trying to sort out and update a quilt-based patch series.

Appendix: Dealing with Ubuntu package updates

Based on this conversation, if Ubuntu releases a new version of the package, what I think I need to do is to use 'dgit fetch' and then explicitly rebase:

dgit fetch -d ubuntu

You have to use '-d ubuntu' here or 'dgit fetch' gets confused and fails. There may be ways to fix this with git config settings, but setting them all is exhausting and if you miss one it explodes, so I'm going to have to use '-d ubuntu' all the time (unless dgit fixes this someday).

Dgit repositories don't have an explicit Git upstream set, so I don't think we can use plain rebase. Instead I think we need the more complicated form:

git rebase dgit/dgit/jammy,-security,-updates dgit/jammy,-security,-updates

(Until I do it for real, these arguments are speculative. I believe they should work if I understand 'git rebase' correctly, but I'm not completely sure. I might need the full three argument form and to make the 'upstream' a commit hash.)
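Pending trying it on a real dgit tree, the two-argument form can at least be exercised in a throwaway repository (short invented branch names instead of dgit's; 'git rebase <upstream> <branch>' checks out <branch> and rebases it on top of <upstream>):

```shell
#!/bin/sh
# Demo of the two-argument 'git rebase <upstream> <branch>' form.
# Branch and file names are invented for illustration.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q demo; cd demo
git config user.email demo@example.com
git config user.name demo
trunk=$(git symbolic-ref --short HEAD)   # 'main' or 'master', git-version dependent
echo v1 > code && git add code && git commit -qm "upstream v1"
git checkout -qb local
echo ours > ours && git add ours && git commit -qm "our change"
# Upstream moves ahead, as after a 'dgit fetch' of a package update:
git checkout -q "$trunk"
echo v2 > code && git commit -qam "upstream v2"
# Check out 'local' and rebase it onto "$trunk" in one command:
git rebase -q "$trunk" local
git log --oneline                        # our change now sits on upstream v2
```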

Then, as above, we need to drop our debian/changelog change and redo it:

git reset --hard HEAD~
gbp dch --since dgit/dgit/jammy,-security,-updates --local .cslab. --ignore-branch --commit

(There may be a clever way to tell 'git rebase' to skip the last change, or you can do an interactive rebase (with '-i') instead of a non-interactive one and delete it yourself.)

Early notes about using dgit on Ubuntu (LTS)

By: cks
23 December 2025 at 04:25

I recently read Ian Jackson's Debian’s git transition (via) and had a reaction:

I would really like to be able to patch and rebuild Ubuntu packages from a git repository with our local changes (re)based on top of upstream git. It would be much better than quilt'ing and debuild'ing .dsc packages (I have non-complimentary opinions on the Debian source package format). This news gives me hope that it'll be possible someday, but especially for Ubuntu I have no idea how soon or how well documented it will be.

(It could even be better than RPMs.)

The subsequent discussion got me to try out dgit, especially since it had an attractive dgit-user(7) manual page that gave very simple directions on how to make a local change to an upstream package. It turns out that things aren't entirely smooth on Ubuntu, but they're workable.

The starting point is 'dgit clone', but on Ubuntu you currently get to use special arguments that aren't necessary on Debian:

dgit clone -d ubuntu dovecot jammy,-security,-updates

(You don't have to do this on a machine running 'jammy' (Ubuntu 22.04); it may be more convenient to do it from another one, perhaps with a more up-to-date dgit.)

The latest Ubuntu package for something may be in either their <release>-security or their <release>-updates 'suite', so you need both. I think this is equivalent to what 'apt-get source' gets you, but you might want to double check. Once you've gotten the source in a Git repository, you can modify it and commit those modifications as usual, for example through Magit. If you have an existing locally patched version of the package that you did with quilt, you can import all of the quilt patches, either one by one or all at once, and then use Magit's selective commits to sort things out.
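As a sketch of the 'one by one' route, each entry in the quilt series can become its own Git commit. The layout below is entirely hypothetical (a real package's debian/patches may need different -p levels or quilt options that this doesn't handle); the scratch repo and patch are invented so the loop at the end can actually run, and on a real package you'd run just the 'while' loop inside your dgit clone:

```shell
#!/bin/sh
# Hypothetical sketch: turn a quilt series into one Git commit per
# patch, using 'git apply --index' to apply and stage each one.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q demo; cd demo
git config user.email demo@example.com
git config user.name demo
printf 'one\n' > file && git add file && git commit -qm "upstream source"
mkdir -p debian/patches
cat > debian/patches/add-two.patch <<'EOF'
--- a/file
+++ b/file
@@ -1 +1,2 @@
 one
+two
EOF
echo add-two.patch > debian/patches/series
git add debian && git commit -qm "packaging"
# The actual import loop: apply each series entry, commit it separately.
while read -r p; do
    git apply --index "debian/patches/$p"
    git commit -qm "import quilt patch $p"
done < debian/patches/series
```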

Having made your modifications, whether tentative or otherwise, you can now automatically modify debian/changelog:

gbp dch --since dgit/dgit/jammy,-security,-updates --local .cslab. --ignore-branch --commit

(You might want to use -S for snapshots when testing modifications and builds, though I don't know. Our practice is to use --local to add a local suffix to the upstream package version number, so we can keep our packages straight.)

The special bit is the 'dgit/dgit/<whatever you used in dgit clone>', which tells gbp-dch (part of the gbp suite of stuff) where to start the changelog from. Using --commit is optional; what I did was to first run 'gbp dch' without it, then use 'git diff' to inspect the resulting debian/changelog changes, and then 'git restore debian/changelog' and re-run it with a better set of options until eventually I added the '--commit'.

You can then install build-deps (if necessary) and build the binary packages with the dgit-user(7) recommended 'dpkg-buildpackage -uc -b'. Normally I'd say that you absolutely want to build source packages too, but since you have a Git repository with the state frozen that you can rebuild from, I don't think it's necessary here.

(After the build finishes you can admire 'git status' output that will tell you just how many files in your source tree the Debian or Ubuntu package building process modified. One of the nice things about using Git and building from a Git repository is that you can trivially revert them all, rather than resorting to the usual set of painful workarounds.)
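The reset itself is ordinary Git. Here's a scratch demo of cleaning up after a build has scribbled on the tree; the 'build' is simulated by hand-editing files, and 'git restore' needs Git 2.23 or later:

```shell
#!/bin/sh
# Scratch demo: put a tree back to its committed state after a build
# modifies tracked files and leaves untracked debris behind.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q demo; cd demo
git config user.email demo@example.com
git config user.name demo
echo v1 > code && git add code && git commit -qm "packaged source"
# Simulate a package build scribbling on the tree:
echo generated > code
echo debris > build.log
git status --porcelain          # ' M code' and '?? build.log'
git restore .                   # put tracked files back (Git 2.23+)
git clean -qfd                  # delete untracked build debris
git status --porcelain          # now prints nothing
```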

The dgit-user(7) manual page suggests but doesn't confirm that if you're bold, you can build from a tree with uncommitted changes. Personally, even if I was in the process of developing changes I'd commit them and then make liberal use of rebasing, git-absorb, and so on to keep updating my (committed) changes.

It's not clear to me how to integrate upstream updates (for example, a new Ubuntu update to the Dovecot package) with your local changes. It's possible that 'dgit pull' will automatically rebase your changes, or give you the opportunity to do that. If not, you can always do another 'dgit clone' and then manually import your Git changes as patches.

(A disclaimer: at this point I've only cloned, modified, and built one package, although it's a real one we use. Still, I'm sold; the ability to reset the tree after a build is valuable all by itself, never mind having a better way than quilt to handle making changes.)
