Another thing V7 Unix gave us is environment variables

By: cks

Simon Tatham recently wondered "Why is PATH called PATH?". This made me wonder about the closely related question of when environment variables appeared in Unix, and the answer is that the environment and environment variables appeared in V7 Unix, as another of the things that made it so important to Unix history (also).

Up through V6, the exec system call and family of system calls took two arguments, the path and the argument list; we can see this in both the V6 exec(2) manual page and the implementation of the system call in the kernel. As bonus trivia, it appears that the V6 exec() limited you to 510 characters of arguments (and probably V1 through V5 had a similarly low limit, but I haven't looked at their kernel code).

In V7, the exec(2) manual page now documents a possible third argument, and the kernel implementation is much more complex, plus there's an environ(5) manual page about it. Based on h/param.h, V7 also had a much higher size limit on the combined size of arguments and environment variables, which isn't all that surprising given the addition of the environment. Commands like login.c were updated to put some things into the new environment; login sets a default $PATH and a $HOME, for example, and environ(5) documents various other uses (which I haven't checked in the source code).
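(As a concrete illustration, here's a minimal modern sketch of the three-argument exec form that V7 introduced, using today's execve(); the shell command, the $PATH value, and so on are just examples, not anything taken from V7 itself.)

#include <unistd.h>

int main(void)
{
    /* the command, its arguments, and its environment are all examples */
    char *argv[] = { "sh", "-c", "echo PATH is $PATH", (char *)0 };
    char *envp[] = { "PATH=/bin:/usr/bin", "HOME=/", (char *)0 };

    /* the third argument is the environment; execve() only returns on error */
    execve("/bin/sh", argv, envp);
    return 1;
}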

This implies that the V7 shell is where $PATH first appeared in Unix, where the manual page describes it as 'the search path for commands'. This might make you wonder how the V6 shell handled locating commands, and where it looked for them. The details are helpfully documented in the V6 shell manual page, and I'll just quote what it has to say:

If the first argument is the name of an executable file, it is invoked; otherwise the string `/bin/' is prepended to the argument. (In this way most standard commands, which reside in `/bin', are found.) If no such command is found, the string `/usr' is further prepended (to give `/usr/bin/command') and another attempt is made to execute the resulting file. (Certain lesser-used commands live in `/usr/bin'.)

('Invoked' here is carrying some extra freight, since this may not involve a direct kernel exec of the file. An executable file that the kernel didn't like would be directly run by the shell.)

I suspect that '$PATH' was given such a short name (instead of a longer, more explicit one) simply as a matter of Unix style at the time. Pretty much everything in V7 was terse and short in this style for various reasons, and verbose environment variable names would have eaten into that limited exec argument space.

Python argparse and the minor problem of a variable valid argument count

By: cks

Argparse is the standard Python module for handling arguments to command line programs, and because Python makes using things outside the standard library quite annoying for small programs, it's the one I use in my Python based utility programs. Recently I found myself dealing with a little problem where argparse doesn't have a good answer, partly because you can't nest argument groups.

Suppose, not hypothetically, that you have a program that can properly take zero, two, or three command line arguments (which are separate from options), and the command line arguments are of different types (the first is a string and the second two are numbers). Argparse makes it easy to handle having either two or three arguments, no more and no less; the first two arguments have no nargs set, and then the third sets 'nargs="?"'. However, as far as I can see argparse has no direct support for handling the zero-argument case, or rather for forbidding the one-argument one.

(If the first two arguments were of the same type we could easily gather them together into a two-element list with 'nargs=2', but they aren't, so we'd have to tell argparse that both are strings and then try the 'string to int' conversion of the second argument ourselves, losing argparse's handling of it.)

If you set all three arguments to 'nargs="?"' and give them usable default values, you can accept zero, two, or three arguments, and things will work if you supply only one argument (because the second argument will have a usable default). This is the solution I've adopted for my particular program because I'm not stubborn enough to try to roll my own validation on top of argparse, not for a little personal tool.
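As a concrete sketch, that arrangement looks something like the following; the argument names, types, and default values are made up for illustration.

import argparse

parser = argparse.ArgumentParser()
# all three positional arguments are optional and have usable defaults
parser.add_argument("label", nargs="?", default="some-default")
parser.add_argument("low", nargs="?", type=int, default=0)
parser.add_argument("high", nargs="?", type=int, default=100)
args = parser.parse_args()

# Zero, two, or three arguments all work; so does a single argument,
# because 'low' then quietly takes its default value.
print(args.label, args.low, args.high)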

If argparse supported nested groups for arguments, you could potentially make a mutually exclusive argument group that contained two sub-groups, one with nothing in it and one that handled the two and three argument case. This would require argparse not only to support nested groups but to support empty nested groups (and not ignore them), which is at least a little bit tricky.

Alternately, argparse could support a global specification of what numbers of arguments are valid. Or it could support a 'validation' callback that is called with information about what argparse detected and which could signal errors to argparse that argparse handled in its standard way, giving you uniform argument validation and error text and so on.

Unix had good reasons to evolve since V7 (and had to)

By: cks

There's a certain sort of person who feels that the platonic ideal of Unix is somewhere around Research Unix V7 and it's almost all been downhill since then (perhaps with the exception of further Research Unixes and then Plan 9, although very few people got their hands on any of them). For all that I like Unix and started using it long ago when it was simpler (although not as far back as V7), I reject this view and think it's completely mistaken.

V7 Unix was simple but it was also limited, both in its implementation (which often took shortcuts (also, also, also)) and in its overall features (such as short filenames). Obviously V7 didn't have networking, but it also lacked things that most people now think of as perfectly reasonable and good Unix features, like '#!' support for shell scripts in the kernel and processes being in multiple groups at once. That V7 was a simple and limited system meant that its choices were to grow to meet people's quite reasonable needs or to fall out of use.

(Some of these needs were for features and some of them were for performance. The original V7 filesystem was quite simple but also suffered from performance issues, ones that often got worse over time.)

I'll agree that the path that the growth of Unix has taken since V7 is not necessarily ideal; we can all point to various things about modern Unixes that we don't like. Any particular flaws came about partly because people don't necessarily make ideal decisions and partly because we haven't necessarily had perfect understandings of the problems when people had to do something, and then once they'd done something they were constrained by backward compatibility.

(In some ways Plan 9 represents 'Unix without the constraint of backward compatibility', and while I think there are a variety of reasons that it failed to catch on in the world, that lack of compatibility is one of them. Even if you had access to Plan 9, you had to be fairly dedicated to do your work in a Plan 9 environment (and that was before the web made it worse).)

PS: It's my view that the people who are pushing various Unixes forward aren't incompetent, stupid, or foolish. They're rational and talented people who are doing their best in the circumstances that they find themselves in. If you want to throw stones, don't throw them at the people, throw them at the overall environment that constrains and shapes how everything in this world is pushed to evolve. Unix is far from the only thing shaped in potentially undesirable ways by these forces; consider, for example, C++.

(It's also clear that a lot of people involved in the historical evolution of BSD and other Unixes were really quite smart, even if you don't like, for example, the BSD sockets API.)

Mostly stopping GNU Emacs from de-iconifying itself when it feels like it

By: cks

Over on the Fediverse I had a long standing GNU Emacs gripe:

I would rather like to make it so that GNU Emacs never un-iconifies itself when it completes (Lisp-level) actions. If I have Emacs iconified I want it to stay that way, not suddenly appear under my mouse cursor like an extremely large modal popup. (Modal popups suck, they are a relic of single-tasking windowing environments.)

For those of you who use GNU Emacs and have never been unlucky enough to experience this, if you start some long operation in GNU Emacs and then decide to iconify it to get it out of your face, a lot of the time GNU Emacs will abruptly pop itself back open when it finishes, generally with completely unpredictable timing so that it disrupts whatever else you switched to in the mean time.

(This only happens in some X environments. In others, the desktop or window manager ignores what Emacs is trying to do and leaves it minimized in your taskbar.)

To cut straight to the answer, you can avoid a lot of this with the following snippet of Emacs Lisp:

(add-to-list 'display-buffer-alist '(t nil (inhibit-switch-frame . t)))

I believe that this has some side effects but that these side effects will generally be that Emacs doesn't yank around your mouse focus or suddenly raise windows to be on top of everything.

GNU Emacs doesn't have a specific function that it calls to de-iconify a frame, what Emacs calls a top level window. Instead, the deiconification happens in C code inside C-level functions like raise-frame and make-frame-visible, which also do other things and which are called from many places. For instance, one of make-frame-visible's jobs is actually displaying the frame's X level window if it doesn't already exist on the screen.

(There's an iconify-or-deiconify-frame function but if you look that's a Lisp function that calls make-frame-visible. It's only used a little bit in the Emacs Lisp code base.)

A determined person could probably hook these C-level functions through advice-add to make them do nothing if they were called on an existing, mapped frame that was just iconified. That would be the elegant way to do what I want. The inelegant way is to discover, via use of the Emacs Lisp debugger, that everything I seem to care about is going through 'display-buffer' (eventually calling window--maybe-raise-frame), and that display-buffer's behavior can be customized to not 'switch frames', which will wind up causing things to not call window--maybe-raise-frame and not de-iconify GNU Emacs windows on me.

To understand display-buffer-alist I relied on Demystifying Emacs’s Window Manager. My addition to display-buffer-alist has three elements:

  • the t tells display-buffer to always use this alist entry.
  • the nil tells display-buffer that I don't have any special action functions I want to use here and it should just use its regular ones. I think an empty list might be more proper here, but nil works.
  • the '(inhibit-switch-frame . t)' sets the important customization, which will be merged with any other things set by other (matching) alist entries.

The net effect is that 'display-buffer' will see 'inhibit-switch-frame' set for every buffer it's asked to switch to, and so will not de-iconify, raise, or otherwise monkey around with frame things in the process of displaying buffers. It's possible that this will have undesirable side effects in some circumstances, but as far as I can tell things like 'speedbar' and 'C-x 5 <whatever>' still work for me afterward, so new frames are getting created when I want them to be.

(I could change the initial 't' to something more complex, for example to only apply this to MH-E buffers, which is where I mostly encounter the problem. See Demystifying Emacs’s Window Manager for a discussion of how to do this based on the major mode of the buffer.)
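(As an untested sketch of that, restricting the entry to MH-E buffers by checking their major mode with a predicate function would look something like this; the function name is mine and the exact set of MH-E modes to match is up to you.)

(defun my/mh-buffer-p (buffer-name _action)
  "Return non-nil if BUFFER-NAME's buffer is in an MH-E mode."
  (with-current-buffer buffer-name
    (derived-mode-p 'mh-folder-mode 'mh-show-mode)))

(add-to-list 'display-buffer-alist
             '(my/mh-buffer-p nil (inhibit-switch-frame . t)))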

To see if you're affected by this, you can run the following Emacs Lisp in the scratch buffer and then immediately minimize or iconify the window.

(progn
  (sleep-for 5)
  (display-buffer "*scratch*"))

If you're affected, the Emacs window will pop back open in a few seconds (five or less, depending on how fast you minimized the window). If the Emacs window stays minimized or iconified, your desktop environment is probably overriding whatever Emacs is trying to do.

For me this generally happens any time some piece of Emacs Lisp code is taking a long time to get a buffer ready for display and then calls 'display-buffer' at the end to show the buffer. One trigger for this is if the buffer to be displayed contains a bunch of unusual Unicode characters (possibly ones that my font doesn't have anything for). The first time the characters are used, Emacs will apparently stall working out how to render them and then de-iconify itself if I've iconified it out of impatience.

(It's quite possible that there's a better way to do this, and if so I'd love to know about it.)

Sending drawing commands to your display server versus sending images

By: cks

One of the differences between X and Wayland is that in the classical version of X you send drawing commands to the server while in Wayland you send images; this can be called server side rendering versus client side rendering. Client side rendering doesn't preclude a 'network transparent' display protocol, but it does mean that you're shipping around images instead of drawing commands. Is this less efficient? In thinking about it recently, I realized that the answer is that it depends on a number of things.

Let's start out by assuming that the display server and the display clients are equally powerful and capable as far as rendering the graphics goes, so the only question is where the rendering happens (and what makes it better to do it in one place instead of another). The factors that I can think of are:

  • How many different active clients (machines) there are; if there are enough, the active client machines have more aggregate rendering capacity than the server does. But probably you don't usually have all that many different clients all doing rendering at once (that would be a very busy display).

  • The number of drawing commands as compared to the size of the rendered result. In an extreme case in favor of client side rendering, a client executes a whole bunch of drawing commands in order to render a relatively small image (or window, or etc). In an extreme case the other way, a client can send only a few drawing commands to render a large image area.
  • The amount of input data the drawing commands need compared to the output size of the rendered result. An extreme case in favour of client side rendering is if the client is compositing together a (large) stack of things to produce a single rendered result.
  • How efficiently you can encode (and decode) the rendered result or the drawing commands (and their inputs). There's a tradeoff between space used and encoding and decoding time, where you may not be able to afford aggressive encoding because it gets in the way of fast updates.

    What these add up to is the aggregate size of the drawing commands and all of the inputs that they need relative to the rendered result, possibly cleverly encoded on both sides.

  • How much changes from frame to frame and how easily you can encode that in some compact form. Encoding changes in images is a well studied thing (we call it 'video'), but a drawing command model might be able to send only a few commands to change a little bit of what it sent previously for an even bigger saving.

    (This is affected by how a server side rendering server holds the information from clients. Does it execute their draw commands then only retain the final result, as X does, or does it hold their draw commands and re-execute them whenever it needs to re-render things? Let's assume it holds the rendered result, so you can draw over it with new drawing commands rather than having to send a new full set of 'draw this from now onward' commands.)

    A pragmatic advantage of client side rendering is that encoding image to image changes can be implemented generically after any style of rendering; all you need is to retain a copy of the previous frame (or perhaps more frames than that, depending). In a server rendering model, the client needs specific support for determining a set of drawing operations to 'patch' the previous result, and this doesn't necessarily cooperate with an immediate mode approach where the client regenerates the entire set of draw commands from scratch any time it needs to re-render a frame.

I was going to say that the network speed is important too but while it matters, what I think it does is magnify or shrink the effect of the relative size of drawing commands compared to the final result. The faster and lower latency your network is, the less it matters if you ship more data in aggregate. On a slow network, it's much more important.

There's probably other things I'm missing, but even with just these I've wound up feeling that the tradeoffs are not as simple and obvious as I believed before I started thinking about it.

(This was sparked by an offhand Fediverse remark and joke.)

Getting decent error reports in Bash when you're using 'set -e'

By: cks

Suppose that you have a shell script that's not necessarily complex but is at least long. For reliability, you use 'set -e' so that the script will immediately stop on any unexpected errors from commands, and sometimes this happens. Since this isn't supposed to happen, it would be nice to print some useful information about what went wrong, such as where it happened, what the failing command's exit status was, and what the command was. The good news is that if you're willing to make your script specifically a Bash script, you can do this quite easily.

The Bash trick you need is:

trap 'echo "Exit status $? at line $LINENO from: $BASH_COMMAND"' ERR

This uses three Bash features: the special '$LINENO' and '$BASH_COMMAND' shell variables (which hold, respectively, the line number and the command being executed when the trap fires), and the special 'ERR' Bash 'trap' condition that causes your 'trap' statement to be invoked right when 'set -e' is causing your script to fail and exit.
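Here's a tiny self-contained sketch of the whole pattern; the failing command is just 'false' for illustration.

#!/bin/bash
set -e
trap 'echo "Exit status $? at line $LINENO from: $BASH_COMMAND" 1>&2' ERR

echo "doing some work"
false        # fails, fires the ERR trap, and then the script exits
echo "never reached"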

Using 'ERR' instead of 'EXIT' (or '0' if you're a traditionalist like me) is necessary in order to get the correct line number in Bash. If you switch this to 'trap ... EXIT', the line number that Bash will report is the line that the 'trap' was defined on, not the line that the failing command is on (although the command being executed remains the same). This makes a certain amount of sense from the right angle; the shell is currently on that line as it's exiting.

As far as I know, no other version of the Bourne shell can do all of this. The OpenBSD version of /bin/sh has a '$LINENO' variable and 'trap ... 0' preserves its value (instead of resetting it to the line of the 'trap'), but it has no access to the current command. The FreeBSD version of /bin/sh resets '$LINENO' to the line of your 'trap ... 0', so the best you can do is report the exit status. Dash, the Ubuntu 24.04 default /bin/sh, doesn't have '$LINENO', effectively putting you in the same situation as FreeBSD.

(On Fedora, /bin/sh is Bash, and the Fedora version of Bash supports all of 'trap .. ERR', $LINENO, and $BASH_COMMAND even when invoked as '#!/bin/sh' by your script. You probably shouldn't count on this; if you want Bash, use '#!/bin/bash'.)

NFS v4 delegations on a Linux NFS server can act as mandatory locks

By: cks

Over on the Fediverse, I shared an unhappy learning experience:

Linux kernel NFS: we don't have mandatory locks.
Also Linux kernel NFS: if the server has delegated a file to a NFS client that's now not responding, good luck writing to the file from any other machine. Your writes will hang.

NFS v4 delegations are a feature where the NFS server, such as your Linux fileserver, hands a lot of authority over a particular file to a client that is using that file. There are various sorts of delegations, but even a basic read delegation will force the NFS server to recall the delegation if anything else wants to write to the file or to remove it. Recalling a delegation requires notifying the NFS v4 client that it has lost the delegation and then having the client accept and respond to that. NFS v4 clients have to respond to the loss of a delegation because they may be holding local state that needs to be flushed back to the NFS server before the delegation can be released.

(After all, the NFS v4 server promised the client 'this file is yours to fiddle around with, I will consult you before touching it'.)

Under some circumstances, when the NFS v4 server is unable to contact the NFS v4 client, it will simply sit there waiting and as part of that will not allow you to do things that require the delegation to be released. I don't know if there's a delegation recall timeout, although I suspect that there is, and I don't know how to find out what the timeout is, but whatever the value is, it's substantial (it may be the 90 second 'default lease time' from nfsd4_init_leases_net(), or perhaps the 'grace', also probably 90 seconds, or perhaps the two added together).

(90 seconds is not what I consider a tolerable amount of time for my editor to completely freeze when I tell it to write out a new version of the file. When NFS is involved, I will typically assume that something has gone badly wrong well before then.)

As mentioned, the NFS v4 RFC also explicitly notes that NFS v4 clients may have to flush file state in order to release their delegation, and this itself may take some time. So even without an unavailable client machine, recalling a delegation may stall for some possibly arbitrary amount of time (depending on how the NFS v4 server behaves; the RFC encourages NFS v4 servers to not be hasty if the client seems to be making a good faith effort to clear its state). Both the slow client recall and the hung client recall can happen even in the absence of any actual file locks; in my case, the now-unavailable client merely having read from the file was enough to block things.

This blocking recall is effectively a mandatory lock, and it affects both remote operations over NFS and local operations on the fileserver itself. Short of waiting out whatever timeout applies, you have two realistic choices to deal with this (the non-realistic choice is to reboot the fileserver). First, you can bring the NFS client back to life, or at least something that's at its IP address and responds to the server with NFS v4 errors. Second, I believe you can force everything from the client to expire through /proc/fs/nfsd/clients/<ID>, by writing 'expire' to the client's 'ctl' file. You can find the right client ID by grep'ing for something in all of the clients/*/info files.
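(As a sketch, with a made-up IP address and client ID, that looks something like this on the fileserver:)

# find which clients/<ID> belongs to the dead client machine
grep -l '192.0.2.10' /proc/fs/nfsd/clients/*/info
# then force all of that client's state, delegations included, to expire
echo expire > /proc/fs/nfsd/clients/17/ctl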

Discovering this makes me somewhat more inclined than before to consider entirely disabling 'leases', the underlying kernel feature that is used to implement these NFS v4 delegations (I discovered how to do this when investigating NFS v4 client locks on the server). This will also affect local processes on the fileserver, but that now feels like a feature since hung NFS v4 delegation recalls will stall or stop even local operations.

Projects can't be divorced from the people involved in them

By: cks

Among computer geeks, myself included, there's a long running optimistic belief that projects can be considered in isolation and 'evaluated on their own merits', divorced from the specific people or organizations that are involved with them and the culture that they have created. At best, this view imagines that we can treat everyone involved in the development of something as a reliable Vulcan, driven entirely by cold logic with no human sentiment involved. This is demonstrably false (ask anyone about the sharp edge of Linus Torvalds' emails), but convenient, at least for people with privilege.

(A related thing is considering projects in isolation from the organizations that create and run them, for example ignoring that something is from Google, of 'killed by Google' fame.)

Over time, I have come to understand and know that this is false, much like other things I used to accept. The people involved with a project bring with them attitudes and social views, and they create a culture through their actions, their expressed views, and even their presence. Their mere presence matters because it affects other people, and how other people will or won't interact with the project.

(To put it one way, the odds that I will want to be involved in a project run by someone who openly expresses their view that bicyclists are the scum of the earth and should be violently run off the road are rather low, regardless of how they behave within the confines of the project. I'm not a Vulcan myself and so I am not going to be able to divorce my interactions with this person from my knowledge that they would like to see me and my bike club friends injured or dead.)

You can't divorce a project from its culture or its people (partly because the people create and sustain that culture); the culture and the specific people are entwined into how 'the project' (which is to say, the crowd of people involved in it) behaves, and who it attracts and repels. And once established, the culture of a project, like the culture of anything, is very hard to change, partly because it acts as a filter for who becomes involved in the project. The people who create a project gather like-minded people who see nothing wrong with the culture and often act to perpetuate it, unless the project becomes so big and so important that other people force their way in (usually because a corporation is paying them to put up with the initial culture).

(There is culture everywhere. C++ has a culture (or several), for example, as does Rust. Are they good cultures? People have various opinions that I will let you read about yourself.)

Realizing we needed two sorts of alerts for our temperature monitoring

By: cks

We have a long standing system to monitor the temperatures of our machine rooms and alert us if there are problems. A recent discussion about the state of the temperature in one of them made me realize that we want to monitor and alert for two different problems, and because they're different we need two different sorts of alerts in our monitoring system.

The first, obvious problem is a machine room AC failure, where the AC shuts off or becomes almost completely ineffective. In our machine rooms, an AC failure causes a rapid and sustained rise in temperature to well above its normal maximum level (which is typically reached just before the AC starts its next cooling cycle). AC failures are high priority issues that we want to alert about rapidly, because we don't have much time before machines start to cook themselves (and they probably won't shut themselves down before the damage has been done).

The second problem is an AC unit that can't keep up with the room's heat load; perhaps its filters are (too) clogged, or it's not getting enough cooling from the roof chillers, or various other mysterious AC reasons. The AC hasn't failed and it is still able to cool things to some degree and keep the temperature from racing up, but over time the room's temperature steadily drifts upward. Often the AC will still be cycling on and off to some degree and we'll see the room temperature vary up and down as a result; at other times the room temperature will basically reach a level and more or less stay there, presumably with the AC running continuously.

One issue we ran into is that a fast triggering alert that was implicitly written for the AC failure case can wind up flapping up and down if insufficient AC has caused the room to slowly drift close to its triggering temperature level. As the AC works (and perhaps cycles on and off), the room temperature will shift above and then back below the trigger level, and the alert flaps.

We can't detect both situations with a single alert, so we need at least two. Currently, the 'AC is not keeping up' alert looks for sustained elevated temperatures with the temperature always at or above a certain level over (much) more time than the AC should take to bring it down, even if the AC has to avoid starting for a bit of time to not cycle too fast. The 'AC may have failed' alert looks for high temperatures over a relatively short period of time, although we may want to make this an average over a short period of time.

(The advantage of an average is that if the temperature is shooting up, it may trigger faster than a 'the temperature is above X for Y minutes' alert. The drawback is that an average can flap more readily than a 'must be above X for Y time' alert.)
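To illustrate the difference between the two, here's a rough sketch of them written as Prometheus alert rules (which is an assumption about the monitoring system); the metric name, temperature thresholds, and durations are all made up.

groups:
  - name: machineroom-temperature
    rules:
      # 'AC may have failed': a high (average) temperature over a short window.
      - alert: MachineRoomACFailure
        expr: avg_over_time(room_temp_celsius[5m]) > 30
      # 'AC not keeping up': the temperature stays at or above an elevated
      # level for much longer than an AC cooling cycle should take.
      - alert: MachineRoomACNotKeepingUp
        expr: min_over_time(room_temp_celsius[45m]) >= 26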

Checklists are hard (but still a good thing)

By: cks

We recently had a big downtime at work where part of the work was me doing a relatively complex and touchy thing. Naturally I made a checklist, but also naturally my checklist turned out to be incomplete, with some things I'd forgotten and some steps that weren't quite right or complete. This is a good illustration that checklists are hard to create.

Checklists are hard partly because they require us to try to remember, reconstruct, and understand everything in what's often a relatively complex system that is too big for us to hold in our mind. If your understanding is incomplete you can overlook something and so leave out a step or a part of a step, and even if you write down a step you may not fully remember (and record) why the step has to be there. My view is that this is especially likely in system administration where we may have any number of things that have been quietly sitting in the corner for some time, working away without problems, and so they've slipped out of our minds.

(For example, one of the issues that we ran into in this downtime was not remembering all of the hosts that ran crontab jobs that used one particular filesystem. Of course we thought we did know, so we didn't try to systematically look for such crontab jobs.)

To get a really solid checklist you have to be able to test it, much like all documentation needs testing. Unfortunately, a lot of the checklists I write (or don't write) are for one-off things that we can't really test in advance for various reasons, for example because they involve a large scale change to our live systems (that requires a downtime). If you're lucky you'll realize that you don't know something or aren't confident in something while writing the checklist, so you can investigate it and hopefully get it right, but some of the time you'll be confident you understand the problem but you're wrong.

Despite any imperfections, checklists are still a good thing. An imperfect written down checklist is better than relying on your memory and mind on the fly almost all of the time (the rare exceptions are when you wouldn't even dare do the operation without a checklist but an imperfect checklist tempts you into doing it and fumbling).

(You can try to improve the situation by keeping notes on what was missed in the checklist and then saving or publishing these notes somewhere. You can review these after the fact notes on what was missed in this specific checklist if you have to do the thing again, or look for specific types of things you tend to overlook and should specifically check for the next time you're making a checklist that touches on some area.)

A logic to Apache accepting query parameters for static files

By: cks

One of my little web twitches is the lax handling of unknown query parameters. As part of this twitch I've long been a bit irritated that Apache accepts query parameters even on static files, when they definitely have no meaning at all. You could say that this is merely Apache being accepting in general, but recently I noticed a combination of Apache features that can provide an additional reason for Apache to do this.

Apache has various features to redirect from old URLs on your site to new URLs, such as Redirect and RewriteRule. As covered in the relevant documentation for each of them, these rewrites preserve query parameters (although for RewriteRule you can turn that off with the QSD flag). This behavior makes sense in a lot of cases; if you've moved an application from one URL to another (or from one host to another) and it uses query parameters, you almost certainly want the query parameters to carry over with the HTTP redirection that people using old URLs will get.

(Here by 'an application' I mean anything that accepts and acts on query parameters. It might be a CGI, a PHP page or set of pages, a reverse proxy to something else, a Django application implemented with mod_wsgi, or various other things.)

A lot of the time if you use a redirect in Apache on URLs for an application, you'll be sending people to the new location of that application or its replacement. However, some of the time you'll be redirecting from an application to a static page, for example a page that says "this application has gone away". At least by default, your redirection from the application to the static page will carry query parameters along with it, and it would be a bad experience (for the people visiting and you) if the default result was that Apache served some sort of error page because it received query parameters on a static file.

(A closely related change is replacing a single-URL application, such as a basic CGI, with a static web page. Maybe the whole thing is no longer supported, or maybe everything now has a single useful response regardless of query parameters. Here again you can legitimately receive query parameters on a static file.)
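As a sketch of both cases with made-up URLs, either of these will redirect a retired application's URLs to a static page, carrying any query string along unless you explicitly discard it with QSD:

# mod_alias: the query string is carried over to the static page
Redirect permanent /oldapp /app-gone.html

# mod_rewrite: the same, except the QSD flag discards the query string
RewriteEngine on
RewriteRule ^/otherapp(.*)$ /other-gone.html [R=permanent,QSD,L]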

Realizing this made me more sympathetic to Apache's behavior of accepting query parameters on static files. It's a relatively reasonable pragmatic choice even if (like me) you're not one of the people who feel unknown query parameters should always be ignored (which is the de facto requirement on the modern web, so my feelings about it are irrelevant).

Why Ubuntu 24.04's ls can show a puzzling error message on NFS filesystems

By: cks

Suppose that you're on Ubuntu 24.04, using NFS v4 filesystems mounted from a Linux NFS fileserver, and at some point you do a 'ls -l' or a 'ls -ld' of something you don't own. You may then be confused and angered:

; /bin/ls -ld ckstst
/bin/ls: ckstst: Permission denied
drwx------ 64 ckstst [...] 131 Jul 17 12:06 ckstst

(There are situations where this doesn't happen or doesn't repeat, which I don't understand but which I'm assuming are NFS caching in action.)

If you apply strace to the problem, you'll find that the failing system call is listxattr(2), which is trying to list 'extended attributes'. On Ubuntu 24.04, ls comes from Coreutils, and Coreutils apparently started using listxattr() in version 9.4.

The Linux NFS v4 code supports extended attributes (xattrs), which are from RFC 8276; they're supported in both the client and the server since mid-2020 if I'm reading git logs correctly. Both the normal Ubuntu 22.04 LTS and 24.04 LTS server kernels are recent enough to include this support on both the server and clients, and I don't believe there's any way to turn them off by themselves in the kernel server (although if you disable NFS v4.2 they may disappear too).

However, the NFS v4 server doesn't treat listxattr() operations the way the kernel normally does. Normally, the kernel will let you do listxattr() on an object (a directory, a file, etc) that you don't have read permissions on, just as it will let you do stat() on it. However, the NFS v4 server code specifically requires that you have read access to the object. If you don't, you get EACCES (no second S).

(The sausage is made in nfsd_listxattr() in fs/nfsd/vfs.c, specifically in the fh_verify() call that uses NFSD_MAY_READ instead of NFSD_MAY_NOP, which is what eg GETATTR uses.)
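(If you want to poke at the kernel behavior directly, here's a little sketch of a program that just calls listxattr(2) on its argument and reports any error; run against an unreadable file or directory on a NFS v4 mount from a Linux server, it fails with EACCES.)

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
    char buf[8192];
    ssize_t n;

    if (argc != 2) {
        fprintf(stderr, "usage: %s path\n", argv[0]);
        return 2;
    }
    n = listxattr(argv[1], buf, sizeof(buf));
    if (n < 0)
        fprintf(stderr, "%s: listxattr: %s\n", argv[1], strerror(errno));
    else
        printf("%s: %zd bytes of xattr names\n", argv[1], n);
    return 0;
}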

In January of this year, Coreutils applied a workaround to this problem, which appeared in Coreutils 9.6 (and is mentioned in the release notes).

Normally we'd have found this last year, but we've been slow to roll out Ubuntu 24.04 LTS machines and apparently until now no one ever did a 'ls -l' of unreadable things on one of them (well, on a NFS mounted filesystem).

(This elaborates on a Fediverse post. Our patch is somewhat different than the official one.)

Two tools I've been using to look into my web traffic volume

By: cks

These days, there's an unusually large plague of web crawlers, many of them attributed to LLM activities and most of them acting anonymously, with forged user agents and sometimes widely distributed source IPs. Recently I've been using two tools more and more to try to identify and assess suspicious traffic sources.

The first tool is Anarcat's asncounter. Asncounter takes IP addresses, for example from your web server logs, and maps them to ASNs (roughly who owns an IP address) and to CIDR netblocks that belong to those ASNs (a single ASN can have a lot of netblocks). This gives you information like:

count   percent ASN     AS
1460    7.55    24940   HETZNER-AS, DE
[...]
count   percent prefix  ASN     AS
1095    5.66    66.249.64.0/20  15169   GOOGLE, US
[...]
85      0.44    49.13.0.0/16    24940   HETZNER-AS, DE
85      0.44    65.21.0.0/16    24940   HETZNER-AS, DE
82      0.42    138.201.0.0/16  24940   HETZNER-AS, DE
71      0.37    135.181.0.0/16  24940   HETZNER-AS, DE
68      0.35    65.108.0.0/16   24940   HETZNER-AS, DE
[...]

While Hetzner is my biggest traffic source by ASN, it's not my biggest source by 'prefix' (a CIDR netblock), because this Hetzner traffic is split up across a bunch of their networks. Since most software operates by CIDR netblocks, not by ASNs, this difference can be important (and unfortunate if you want to block all traffic from a particular ASN).

The second tool is grepcidr. Grepcidr will let you search through a log file, such as your web server logs, for traffic from any particular netblock (or a group of netblocks), such as Google's '66.249.64.0/20'. This lets me find out what sort of requests came from a potentially suspicious network block, for example 'grepcidr 49.13.0.0/16 /var/log/...'. If what I see looks suspicious and has little or no legitimate traffic, I can consider taking steps against that netblock.
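For instance, assuming Apache style logs in a hypothetical location, I can count a netblock's requests and then see what it was asking for:

grepcidr 49.13.0.0/16 /var/log/apache2/access.log | wc -l
grepcidr 49.13.0.0/16 /var/log/apache2/access.log |
    awk '{print $7}' | sort | uniq -c | sort -rn | head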

Asncounter is probably not (yet) packaged in your Linux distribution. Grepcidr may be, but if it's not it's a C program and simple to compile.

(It wouldn't be too hard to put together an 'asngrep' that would cut out the middleman, but I've so far not attempted to do this.)

PS: Both asncounter and grepcidr can be applied to other sorts of logs with IP addresses, for example sources of SSH brute force password scans. But my web logs are all that I've used them for so far.

People want someone to be responsible for software that fails

By: cks

There are various things in the open source tech news these days, like bureaucratic cybersecurity risk assessment requests to open source projects, maintainers rejecting the current problematic approach to security issues (also), and 'software supply chain security'. One of my recent thoughts as a result of all of this is that the current situation is fundamentally unsustainable, and one part of it is because people are increasingly going to require someone to be held responsible for software that fails and does damage ('damage' in an abstract sense; people know it when they see it).

This isn't anything unique or special about software. People feel the same way about buildings, bridges, vehicles, food, and anything else that actually matters to leading a regular life, and eventually they've managed to turn that feeling into concrete results for most things. Software has so far had a long period of not being held to account, but then once upon a time so did food and food safety, and food has always been very important to people while software spent a long time not being visibly a big deal (or, if you prefer, not being as visibly slipshod as it is today, when a lot more people are directly exposed to a lot more software and thus to its failings).

The bottom line is that people don't consider it (morally) acceptable when no one is held responsible for either negligence or, worse, deliberate choices that cause harm. A field can only duck and evade their outrage for so long; sooner or later it stops being able to shrug and walk away. Software is now systemically important in the world, which means that its failings can do real harm, and people have noticed.

(Which is to say that an increasing number of people have been harmed by software and didn't like it, and the number and frequency is only going to go up.)

There are a lot of ways that this could happen, with the EU CRA being only one of them; as various drafts of the EU CRA have shown, there are a lot of ways that things could go badly in the process. And it could also be that the forces of unbridled pure-profit capitalism will manage to fight this off no matter how much people want it, as they're busy doing with other things in the world (see, for example, the LLM crawler plague). But if companies do fight this off I don't think we're going to enjoy that world very much for multiple reasons, and people's desire for this is still going to very much be there. The days of people's indifference are over and one way or another we're going to have to deal with that. Both our software and our profession will be shaped by how we do.

Doing web things with CGIs is mostly no longer a good idea

By: cks

Recently I saw Serving 200 million requests per day with a cgi-bin (via, and there's a follow-up), which talks about how fast modern CGIs can be in compiled languages like Rust and Go (Rust more so than Go, because Go has a runtime that it has to start every time a Go program is executed). I'm a long standing fan of CGIs (and Wandering Thoughts, this blog, runs as a CGI some of the time), but while I admire these articles, I think that you mostly shouldn't consider trying to actually write a CGI these days.

Where CGI programs shine is in their simple deployment and development model. You write a little program, you put the little program somewhere, and it just works (and it's not going to be particularly slow these days). The programs run only when they get used, and if you're using Apache, you can also make these little programs run as the user who owns that web area instead of the web server user.

Where CGI programs fall down today is that they're unpopular, no longer well supported in various programming environments and frameworks, and they don't integrate with various other tools because these days the tools expect to operate as HTTP (reverse) proxies in front of your HTTP service (for example, Anubis for anti-crawler protections). It's easy to write, for example, a Go HTTP based web service; you can find lots of examples of how to do it (and the pieces are part of Go's standard library). If you want to write a Go CGI, you're actually in luck because Go put that in the standard library, but you're not going to find anywhere near as many examples and of course you won't get that integration with other HTTP reverse proxy tools. Other languages are not necessarily going to be as friendly as Go (including Python, which has removed the 'cgi' standard library package in 3.13).
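(As a quick sketch of what the Go standard library version looks like, with a deliberately trivial handler:)

package main

import (
	"fmt"
	"net/http"
	"net/http/cgi"
)

func main() {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Hello, you asked for %s\n", r.URL.Path)
	})
	// cgi.Serve reads the CGI environment the web server set up and runs
	// the handler once, for this single request.
	if err := cgi.Serve(handler); err != nil {
		panic(err)
	}
}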

(Similarly, many modern web servers are less friendly to CGIs than Apache is and will make you assemble more pieces to run them, reducing a number of the deployment advantages of CGIs.)

Only running these 'backend' HTTP server programs when they're needed is not easy today (although it's possible with systemd), so if you have a lot of little things that you can't bundle together into one server program, CGIs may still make sense despite what is generally the extra hassle of developing and running them. But otherwise, a HTTP based service that you run behind your general purpose web server is what modern web development is steering you toward and it's almost certainly going to be the easiest path.

(There's also a lot of large scale software support for deploying things that are HTTP services, with things like load balancers and smart routing frontends and so on and so forth, never mind containers and orchestration environments. If you want to use CGIs in this environment you basically get to add in a little web server as the way the outside world invokes them.)

Improving my GNU Emacs 'which-key' experience with a Ctrl-h binding

By: cks

One of the GNU Emacs packages that I use is which-key (which is now in GNU Emacs v30). I use it because I don't always remember specific extended key bindings (ones that start with a prefix) that are available to me, especially in MH-E, which effectively has its own set of key bindings for a large collection of commands. Which-key gives me a useful but minimal popup prompt about what's available and I can usually use that to navigate to what I want. Recently I read Omar Antolín's The case against which-key: a polemic, which didn't convince me but did teach me two useful tricks in one.

The first thing I learned from it, which I could have already known if I was paying attention, was the Ctrl-h keybinding that's available in extended key bindings to get some additional help. In stock GNU Emacs this Ctrl-h information is what I consider only somewhat helpful, and with which-key turned on it's basically not helpful at all from what I can see (which may partly explain why I didn't pay it any attention before).

The second thing is a way to make this Ctrl-h key binding useful in combination with which-key, using Embark and, I believe, a number of my other minibuffer completion things. That is, you bind Ctrl-h to an Embark function designed for this:

(setq prefix-help-command #'embark-prefix-help-command)

As illustrated in the article, using Ctrl-h with this binding effectively switches your multi-key entry over to minibuffer completion, complete with all of the completion add-ons you have configured. If you've got things like Vertico, Marginalia, and Orderless configured, this is an excellent way to pick through the available bindings to figure out what you want; with my configuration I get the key itself, the ELisp function name, and the ELisp help summary (and then I can cursor down and up through the list).

The Embark version is too much information if I just need a little reminder of what's possible; that's what the basic which-key display is great for. But if the basic information isn't enough, the Embark binding for Ctrl-h is a great supplement, and it even reaches through multi-key sequences (which is something that which-key doesn't do, at least in my setup, and I have some of them in MH-E).

People still use our old-fashioned Unix login servers

By: cks

Every so often I think about random things, and today's random thing was how our environment might look if it was rebuilt from scratch as a modern style greenfield development. One of the obvious assumptions is that it'd involve a lot of use of containers, which led me to wondering how you handle traditional Unix style login servers. This is a relevant issue for us because we have such traditional login servers and somewhat to our surprise, they still see plenty of use.

We have two sorts of login servers. There's effectively one general purpose login server that people aren't supposed to do heavy duty computation on (and which uses per-user CPU and RAM limits to help with that), and four 'compute' login servers where they can go wild and use up all of the CPUs and memory they can get their hands on (with no guarantees that there will be any, those machines are basically first come, first served; for guaranteed CPUs and RAM people need to use our SLURM cluster). Usage of these servers has declined over time, but they still see a reasonable amount of use, including by people who have only recently joined the department (as graduate students or otherwise).

What people log in to our compute servers to do probably hasn't changed much, at least in one sense; people probably don't log in to a compute server to read their mail with their favorite text mode mail reader (yes, we have Alpine and Mutt users). What people use the general purpose 'application' login server for likely has changed a fair bit over time. It used to be that people logged in to run editors, mail readers, and other text and terminal based programs. However, now a lot of logins seem to be done either to SSH to other machines that aren't accessible from the outside world or to run the back-ends of various development environments like VSCode. Some people still use the general purpose login server for traditional Unix login things (me included), but I think it's rarer these days.

(Another use of both sorts of servers is to run cron jobs; various people have various cron jobs on one or the other of our login servers. We have to carefully preserve them when we reinstall these machines as part of upgrading Ubuntu releases.)

PS: I believe the reason people run IDE backends on our login servers is because they have their code on our fileservers, in their (NFS-mounted) home directories. And in turn I suspect people put the code there partly because they're going to run the code on either or both of our SLURM cluster or the general compute servers. But in general we're not well informed about what people are using our login servers for due to our support model.

The development version of OpenZFS is sometimes dangerous, illustrated

By: cks

I've used OpenZFS on my office and home desktops (on Linux) for what is a long time now, and over that time I've consistently used the development version of OpenZFS, updating to the latest git tip on a regular basis (cf). There have been occasional issues but I've said, and continue to say, that the code that goes into the development version is generally well tested and I usually don't worry too much about it. But I do worry somewhat, and I do things like read every commit message for the development version and I sometimes hold off on updating my version if a particular significant change has recently landed.

But, well, sometimes things go wrong in a development version. As covered in Rob Norris's An (almost) catastrophic OpenZFS bug and the humans that made it (and Rust is here too) (via), there was a recently discovered bug in the development version of OpenZFS that could or would have corrupted RAIDZ vdevs. When I saw the fix commit go by in the development version, I felt extremely lucky that I use mirror vdevs, not raidz, and so avoided being affected by this.

(While I might have detected this at the first scrub after some data was corrupted, the data would have been gone and at a minimum I'd have had to restore it from backups. Which I don't currently have on my home desktop.)

In general this is a pointed reminder that the development version of OpenZFS isn't perfect, no matter how long I and other people have been lucky with it. You might want to think twice before running the development version in order to, for example, get support for the very latest kernels that are used by distributions like Fedora. Perhaps you're better off delaying your kernel upgrades a bit longer and sticking to released branches.

I don't know if this is going to change my practices around running the development version of OpenZFS on my desktops. It may make me more reluctant to update to the very latest version on my home desktop; it would be straightforward to have that run only time-delayed versions of what I've already run through at least one scrub cycle on my office desktop (where I have backups). And I probably won't switch to the next release version when it comes out, partly because of kernel support issues.

What OSes we use here (as of July 2025)

By: cks

About five years ago I wrote an entry on what OSes we were using at the time. Five years is both a short time and a long time here, and in that time some things have changed.

Our primary OS is still Ubuntu LTS; it's our default and we use it on almost everything. On the one hand, these days 'almost everything' covers somewhat more ground than it did in 2020, as some machines have moved from OpenBSD to Ubuntu. On the other hand, as time goes by I'm less and less confident that we'll still be using Ubuntu in five years, because I expect Canonical to start making (more) unfortunate and unacceptable changes any day now. Our most likely replacement Linux is Debian.

CentOS is dead here, killed by a combination of our desire to not have two Linux variants to deal with and CentOS Stream. We got rid of the last of our CentOS machines last year. Conveniently, our previous commercial anti-spam system vendor effectively got out of the business so we didn't have to find a new Unix that they supported.

We're still using OpenBSD, but it's increasingly looking like a legacy OS that's going to be replaced by FreeBSD as we rebuild the various machines that currently run OpenBSD. Our primary interests are better firewall performance and painless mirrored root disks, but if we're going to run some FreeBSD machines and it can do everything OpenBSD can, we'd like to run fewer Unixes so we'll probably replace all of the OpenBSD machines with FreeBSD ones over time. This is a shift in progress and we'll see how far it goes, but I don't expect the number of OpenBSD machines we run to go up any more; instead it's a question of how far down the number goes.

(Our opinions about not using Linux for firewalls haven't changed. We like PF, it's just we like FreeBSD as a host for it more than OpenBSD.)

We continue to not use containers so we don't have to think about a separate, minimal Linux for container images.

There are a lot of research groups here and they run a lot of machines, so research group machines are most likely running a wide assortment of Linuxes and Unixes. We know that Ubuntu (both LTS and non-LTS) is reasonably popular among research groups, but I'm sure there are people with other distributions and probably some use of FreeBSD, OpenBSD, and so on. I believe there may be a few people still using Solaris machines.

(My office desktop continues to run Fedora, but I wouldn't run it on any production server due to the frequent distribution version updates. We don't want to be upgrading distribution versions every six months.)

Overall I'd say we've become a bit more of an Ubuntu LTS monoculture than we were before, but it's not a big change, partly because we were already mostly Ubuntu. Given our views on things like firewalls, we're probably never going to be all-Ubuntu or all-Linux.

(Maybe) understanding how to use systemd-socket-proxyd

By: cks

I recently read systemd has been a complete, utter, unmitigated success (via among other places), where I found a mention of an interesting systemd piece that I'd previously been unaware of, systemd-socket-proxyd. As covered in the article, the major purpose of systemd-socket-proxyd is to bridge between systemd's dynamic socket activation and a conventional program that listens on some socket, so that you can dynamically activate the program when a connection comes in. Unfortunately the systemd-socket-proxyd manual page is a little bit opaque about how it works for this purpose (and what the limitations are). Even though I'm familiar with systemd stuff, I had to think about it for a bit before things clicked.

A systemd socket unit activates the corresponding service unit when a connection comes in on the socket. For simple services that are activated separately for each connection (with 'Accept=yes'), this is actually a templated unit, but if you're using it to activate a regular daemon like sshd (with 'Accept=no') it will be a single .service unit. When systemd activates this unit, it will pass the socket to it either through systemd's native mechanism or an inetd-compatible mechanism using standard input. If your listening program supports either mechanism, you don't need systemd-socket-proxyd and your life is simple. But plenty of interesting programs don't; they expect to start up and bind to their listening socket themselves. To work with these programs, systemd-socket-proxyd accepts a socket (or several) from systemd and then proxies connections on that socket to the socket your program is actually listening to (which will not be the official socket, such as port 80 or 443).

All of this is perfectly fine and straightforward, but the question is, how do we get our real program to be automatically started when a connection comes in and triggers systemd's socket activation? The answer, which isn't explicitly described in the manual page but which appears in the examples, is that we make the socket's .service unit (which will run systemd-socket-proxyd) also depend on the .service unit for our real service with a 'Requires=' and an 'After='. When a connection comes in on the main socket that systemd is doing socket activation for, call it 'fred.socket', systemd will try to activate the corresponding .service unit, 'fred.service'. As it does this, it sees that fred.service depends on 'realthing.service' and must be started after it, so it will start 'realthing.service' first. Your real program will then start, bind to its local socket, and then have systemd-socket-proxyd proxy the first connection to it.

To automatically stop everything when things are idle, you set systemd-socket-proxyd's --exit-idle-time option and also set StopWhenUnneeded=true on your program's real service unit ('realthing.service' here). Then when systemd-socket-proxyd is idle for long enough, it will exit, systemd will notice that the 'fred.service' unit is no longer active, see that there's nothing that needs your real service unit any more, and shut that unit down too, causing your real program to exit.
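To make this concrete, here's a minimal sketch of the three units involved, written as shell commands that drop them into /etc/systemd/system. The 'fred' and 'realthing' names come from the description above, but the port numbers, the path to systemd-socket-proxyd, and the real program's command line are illustrative assumptions rather than anything from the manual page.

# The public socket that systemd listens on for socket activation.
cat >/etc/systemd/system/fred.socket <<'EOF'
[Socket]
ListenStream=80

[Install]
WantedBy=sockets.target
EOF

# Started when a connection comes in; it pulls in the real service first
# and then proxies the activated socket to the program's own socket.
cat >/etc/systemd/system/fred.service <<'EOF'
[Unit]
Requires=realthing.service
After=realthing.service

[Service]
# The path to systemd-socket-proxyd varies between distributions.
ExecStart=/usr/lib/systemd/systemd-socket-proxyd --exit-idle-time=1min 127.0.0.1:8080
EOF

# The real program, configured elsewhere to listen on 127.0.0.1:8080;
# it gets shut down again once nothing needs it.
cat >/etc/systemd/system/realthing.service <<'EOF'
[Unit]
StopWhenUnneeded=true

[Service]
ExecStart=/usr/local/sbin/realthing
EOF

systemctl daemon-reload
systemctl enable --now fred.socket

With this in place, the first connection to port 80 starts both services, and once systemd-socket-proxyd has been idle for long enough everything winds back down.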

The obvious limitation of using systemd-socket-proxyd is that your real program no longer knows the actual source of the connection. If you use systemd-socket-proxyd to relay HTTP connections on port 80 to an nginx instance that's activated on demand (as shown in the examples in the systemd-socket-proxyd manual page), that nginx sees and will log all of the connections as local ones. There are usage patterns where this information will be added by something else (for example, a frontend server that is a reverse proxy to a bunch of activated on demand backend servers), but otherwise you're out of luck as far as I know.

Another potential issue is that systemd's idea of when the .service unit for your real program has 'started' (and thus when it can start running systemd-socket-proxyd) may not match when your real program actually gets around to setting up its socket. I don't know if systemd-socket-proxyd will wait and retry a bit to cope with the situation where it gets started a bit faster than your real program can get its socket ready.

(Systemd has ways that your real program can signal readiness, but if your program can use these ways it may well also support being passed sockets from systemd as a direct socket activated thing.)

Linux 'exportfs -r' stops on errors (well, problems)

By: cks

Linux's NFS export handling system has a very convenient option where you don't have to put all of your exports into one file, /etc/exports, but can instead write them into a bunch of separate files in /etc/exports.d. This is very convenient for allowing you to manage filesystem exports separately from each other and to add, remove, or modify only a single filesystem's exports. Also, one of the things that exportfs(8) can do is 'reexport' all current exports, synchronizing the system state to what is in /etc/exports and /etc/exports.d; this is 'exportfs -r', and is a handy thing to do after you've done various manipulations of files in /etc/exports.d.

Although it's not documented and not explicit in 'exportfs -v -r' (which will claim to be 'exporting ...' for various things), I have an important safety tip which I discovered today: exportfs does nothing on a re-export if you have any problems in your exports. In particular, if any single file in /etc/exports.d has a problem, no files from /etc/exports.d get processed and no exports are updated.

One potential problem with such files is syntax errors, which is fair enough as a 'problem'. But another problem is that they refer to directories that don't exist, for example because you have lingering exports for a ZFS pool that you've temporarily exported (which deletes the directories that the pool's filesystems may have previously been mounted on). A missing directory is an error even if the exportfs options include 'mountpoint', which only does the export if the directory is a mount point.

When I stubbed my toe on this I was surprised. What I'd vaguely expected was that the error would cause only the particular file in /etc/exports.d to not be processed, and that it wouldn't be a fatal error for the entire process. Exportfs itself prints no notices about this being a fatal problem, and it will happily continue to process other files in /etc/exports.d (as you can see with 'exportfs -v -r' with the right ordering of where the problem file is) and claim to be exporting them.
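If you want to see this behavior for yourself, a hypothetical demonstration looks something like this (the directory and network are made up, and note that exportfs only reads files in /etc/exports.d whose names end in '.exports'):

# an exports.d file pointing at a directory that doesn't exist
echo '/no/such/dir 192.168.100.0/24(ro,mountpoint)' >/etc/exports.d/broken.exports
exportfs -v -r    # claims to be 'exporting ...' various things ...
exportfs -v       # ... but the actual exports haven't changed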

Oh well, now I know and hopefully it will stick.

Systemd user units, user sessions, and environment variables

By: cks

A variety of things in typical graphical desktop sessions communicate through the use of environment variables; for example, X's $DISPLAY environment variable. Somewhat famously, modern desktops run a lot of things as systemd user units, and it might be nice to do that yourself (cf). When you put these two facts together, you wind up with a question, namely how the environment works in systemd user units and what problems you're going to run into.

The simplest case is using systemd-run to run a user scope unit ('systemd-run --user --scope --'), for example to run a CPU heavy thing with low priority. In this situation, the new scope will inherit your entire current environment and nothing else. As far as I know, there's no way to do this with other sorts of things that systemd-run will start.
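For example (the command itself is just an illustration), running a big build at low priority in a user scope that inherits my full current environment:

# '-p CPUWeight=' further deprioritizes the scope under CPU contention
systemd-run --user --scope -p CPUWeight=20 -- nice -n 19 make -j4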

Non-scope user units by default inherit their environment from your user "systemd manager". I believe that there is always only a single user manager for all sessions of a particular user, regardless of how you've logged in. When starting things via 'systemd-run', you can selectively pass environment variables from your current environment with 'systemd-run --user -E <var> -E <var> -E ...'. If the variable is unset in your environment but set in the user systemd manager, this will unset it for the new systemd-run started unit. As you can tell, this will get very tedious if you want to pass a lot of variables from your current environment into the new unit.

You can manipulate your user "systemd manager environment block", as systemctl describes it in Environment Commands. In particular, you can export current environment settings to it with 'systemctl --user import-environment VAR VAR2 ...'. If you look at this with 'systemctl --user show-environment', you'll see that your desktop environment has pushed a lot of environment variables into the systemd manager environment block, including things like $DISPLAY (if you're on X). All of these environment variables for X, Wayland, DBus, and so on are probably part of how the assorted user units that are part of your desktop session talk to the display and so on.
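As a concrete sketch of the two approaches (the variable names are real enough, but the unit name and program here are made up for illustration):

# push selected variables from this session into the user manager's
# environment block, then inspect the result
systemctl --user import-environment DISPLAY XAUTHORITY
systemctl --user show-environment

# or pass variables to just one transient unit, leaving the manager's
# environment block alone
systemd-run --user --unit=some-x-thing -E DISPLAY -E XAUTHORITY /usr/bin/some-x-program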

You may now see a little problem. What happens if you're logged in with a desktop X session, and then you go elsewhere and SSH in to your machine (maybe with X forwarding) and try to start a graphical program as a systemd user unit? Since you only have a single systemd manager regardless of how many sessions you have, the systemd user unit you started from your SSH session will inherit all of the environment variables that your desktop session set and it will think it has graphics and open up a window on your desktop (which is hopefully locked, and in any case it's not useful to you over SSH). If you import the SSH session's $DISPLAY (or whatever) into the systemd manager's environment, you'll damage your desktop session.

For specific environment variables, you can override or remove them with 'systemd-run --user -E ...' (for example, to override or remove $DISPLAY). But hunting down all of the session environment variables that may trigger undesired effects is up to you, making systemd-run's user scope units by far the easiest way to deal with this.

(I don't know if there's something extra-special about scope units that enables them and only them to be passed your entire environment, or if this is simply a limitation in systemd-run that it doesn't try to implement this for anything else.)

The reason I find all of this regrettable is that it makes putting applications and other session processes into their own units much harder than it should be. Systemd-run's scope units inherit your session environment but can't be detached, so at a minimum you have extra systemd-run processes sticking around (and putting everything into scopes when some of them might be services is unaesthetic). Other units can be detached but don't inherit your environment, requiring assorted contortions to make things work.

PS: Possibly I'm missing something obvious about how to do this correctly, or perhaps there's an existing helper that can be used generically for this purpose.

The easiest way to interact with programs is to run them in terminals

By: cks

I recently wrote about a new little script of mine, which I use to start programs in terminals in a way that I can interact with them (to simplify things a bit). Much of what I start with this tool doesn't need to run in a terminal window at all; the actual program will talk directly to the X server or arrange to talk to my Firefox or the like. I could in theory start them directly from my X session startup script, as I do with other things.

The reason I haven't put these things in my X session startup is that running things in shell sessions in terminal windows is the easiest way to interact with them in all sorts of ways. It's trivial to stop the program or restart it, to look at its output, or to rerun it with slightly different arguments if I need to, and the program automatically inherits various aspects of my current X environment, and so on. You can do all of these things with programs in ways other than using shell sessions in terminals, but it's generally going to be more awkward.

(For instance, on systemd based Linuxes, I could make some of these programs into systemd user services, but I'd still have to use systemd commands to manipulate them. If I run them as standalone programs started from my X session script, it's even more work to stop them, start them again, and so on.)

For well established programs that I expect to never restart or want to look at output from, I'll run them from my X session startup script. But new programs, like these, get to spend a while in terminal windows because that's the easiest way. And some will be permanent terminal window occupants because they sometimes produce (text) output.

On the one hand, using terminal windows for this is simple and effective, and I could probably make it better by using a multi-tabbed terminal program, with one tab for each program (or the equivalent in a regular terminal program with screen or tmux). On the other hand, it feels a bit sad that in 2025, our best approach for flexible interaction with a program and monitoring its output is 'put it in a terminal'.

(It's also irritating that with some programs, the easiest and best way to make sure that they really exit when you want them to shut down, rather than "helpfully" lingering on in various ways, is to run them from a terminal and then Ctrl-C them when you're done with them. I have to use a certain video conferencing application that is quite eager to stay running if you tell it to 'quit', and this is my solution to it. Someday I may have to figure out how to put it in a systemd user unit so that it can't stage some sort of great escape into the background.)

Filesystems and the problems of exposing their internal features

By: cks

Modern filesystems often have a variety of sophisticated features that go well beyond standard POSIX style IO, such as transactional journals of (all) changes and storing data in compressed form. For certain usage cases, it could be nice to get direct access to those features; for example, so your web server could potentially directly serve static files in their compressed form, without having the kernel uncompress them and then the web server re-compress them (let's assume we can make all of the details work out in this sort of situation, which isn't a given). But filesystems only very rarely expose this sort of thing to programs, even through private interfaces that don't have to be standardized by the operating system.

One of the reasons for filesystems to not do this is that they don't want to turn what are currently internal filesystem details into an API (it's not quite right to call them only an 'implementation' detail, because often the filesystem has to support the resulting on-disk structures more or less forever). Another issue is that the implementation inside the kernel is often not even written so that the necessary information could be provided to a user-level program, especially efficiently.

Even when exposing a feature doesn't necessarily require providing programs with internal information from the filesystem, filesystems may not want to make promises to user space about what they do and when they do it. One place this comes up is the periodic request that filesystems like ZFS expose some sort of 'transaction' feature, where the filesystem promises that either all of a certain set of operations are visible or none of them are. Supporting such a feature doesn't just require ZFS or some other filesystem to promise to tell you when all of the things are durably on disk; it also requires the filesystem to not make any of them visible early, despite things like memory pressure or the filesystem's other natural activities.

Sidebar: Filesystem compression versus program compression

When you start looking, how ZFS does compression (and probably how other filesystems do it) is quite different from how programs want to handle compressed data. A program such as a web server needs a compressed stream of data that the recipient can uncompress as a single (streaming) thing, but this is probably not what the filesystem does. To use ZFS as an example of filesystem behavior, ZFS compresses blocks independently and separately (typically in 128 Kbyte blocks), may use different compression schemes for different blocks, and may not compress a block at all. Since ZFS reads and writes blocks independently and has metadata for each of them, this is perfectly fine for it but obviously is somewhat messy for a program to deal with.

Operating system kernels could return multiple values from system calls

By: cks

In yesterday's entry, I talked about how Unix's errno is so limited partly because of how the early Unix kernels didn't return multiple values from system calls. It's worth noting that this isn't a limitation in operating system kernels and typical system call interfaces; instead, it's a limitation imposed by C. If anything, it's natural to return multiple values from system calls.

Typically, system call interfaces use CPU registers because it's much easier (and usually faster) for the kernel to access (user) CPU register values than it is to read or write things from and to user process memory. If you can pass system call arguments in registers, you do so, and similarly for returning results. Most CPU architectures have more than one register that you could put system call results into, so it's generally not particularly hard to say that your OS returns results in the following N CPU registers (quite possibly the registers that are also used for passing arguments).

Using multiple CPU registers for system call return values was even used by Research Unix on the PDP-11, for certain system calls. This is most visible in versions that are old enough to document the PDP-11 assembly versions of system calls; see, for example, the V4 pipe(2) system call, which returns the two ends of the pipe in r0 and r1. Early Unix put errno error codes and non-error results in the same place not because it had no choice but because it was easier that way.

(Because I looked it up, V7 returned a second value in r1 in pipe(), getuid(), getgid(), getpid(), and wait(). All of the other system calls seem to have only used r0; if r1 was unused by a particular call, the generic trap handling code preserved it over the system call.)

I don't know if there's any common operating system today with a system call ABI that routinely returns multiple values, but I suspect not. I also suspect that if you were designing an OS and a system call ABI today and were targeting it for a modern language that directly supported multiple return values, you would probably put multiple return values in your system call ABI. Ideally, including one for an error code, to avoid anything like errno's limitations; in fact it would probably be the first return value, to cope with any system calls that had no ordinary return value and simply returned success or some failure.

What is going on in Unix with errno's limited nature

By: cks

If you read manual pages, such as Linux's errno(3), you'll soon discover an important and peculiar seeming limitation of looking at errno. To quote the Linux version:

The value in errno is significant only when the return value of the call indicated an error (i.e., -1 from most system calls; -1 or NULL from most library functions); a function that succeeds is allowed to change errno. The value of errno is never set to zero by any system call or library function.

This is also more or less what POSIX says in errno, although in standards language that's less clear. All of this is a sign of what has traditionally been going on behind the scenes in Unix.

The classical Unix approach to kernel system calls doesn't return multiple values, for example the regular return value and errno. Instead, Unix kernels have traditionally returned either a success value or the errno value along with an indication of failure, telling them apart in various ways (such as the PDP-11 return method). At the C library level, the simple approach taken in early Unix was that system call wrappers only bothered to set the C level errno if the kernel signaled an error. See, for example, the V7 libc/crt/cerror.s combined with libc/sys/dup.s, where the dup() wrapper only jumps to cerror and sets errno if the kernel signals an error. The system call wrappers could all have explicitly set errno to 0 on success, but they didn't.

The next issue is that various C library calls may make a number of system calls themselves, some of which may fail without the library call itself failing. The classical case is stdio checking to see whether stdout is connected to a terminal and so should be line buffered, which was traditionally implemented by trying a terminal-only ioctl() on the file descriptor; this fails with ENOTTY on non-terminal file descriptors. Even if stdio did a successful write() rather than only buffering your output, the write() system call wrapper wouldn't change the existing ENOTTY errno value from the failed ioctl(). So you can have a fwrite() (or printf() or puts() or other stdio call) that succeeds while 'setting' errno to some value such as ENOTTY.

When ANSI C and POSIX came along, they inherited this existing situation and there wasn't much they could do about it (POSIX was mostly documenting existing practice). I believe that they also wanted to allow a situation where POSIX functions were implemented on top of whatever oddball system calls you wanted to have your library code do, even if they set errno. So the only thing POSIX could really require was the traditional Unix behavior that if something failed and it was documented to set errno on failure, you could then look at errno and have it be meaningful.

(This was what existing Unixes were already mostly doing and specifying it put minimal constraints on any new POSIX environments, including POSIX environments on top of other operating systems.)

(This elaborates on a Fediverse post of mine, and you can run into this in non-C languages that have true multi-value returns under the right circumstances.)

On sysadmins (not) changing (OpenSSL) cipher suite strings

By: cks

Recently I read Apps shouldn’t let users enter OpenSSL cipher-suite strings by Frank Denis (via), which advocates for providing at most a high level interface to people that lets them express intentions like 'forward secrecy is required' or 'I have to comply with FIPS 140-3'. As a system administrator, I've certainly been guilty of not keeping OpenSSL cipher suite strings up to date, so I have a good deal of sympathy for the general view of trusting the clients and the libraries (and also possibly the servers). But at the same time, I think that this approach has some issues. In particular, if you're only going to set generic intents, you have to trust that the programs and libraries have good defaults. Unfortunately, historically the times when system administrators have most reached for setting specific OpenSSL cipher suite strings were when something came up all of a sudden and they didn't trust the library or program defaults to be up to date.

The obvious conclusion is that an application or library that wants people to only set high level options needs to commit to agility and fast updates so that it always has good defaults. This needs more than just the upstream developers making prompt updates when issues come up, because in practice a lot of people will get the program or library through their distribution or other packaging mechanism. A library that really wants people to trust it here needs to work with distributions to make sure that this sort of update can rapidly flow through, even for older distribution versions with older versions of the library and so on.

(For obvious reasons, people are generally pretty reluctant to touch TLS libraries and would like to do it as little as possible, leaving it to specialists and even then as much as possible to the upstream. Bad things can and have happened here.)

If I was doing this for a library, I would be tempted to give the library two sets of configuration files. One set, the official public set, would be the high level configuration that system administrators were supposed to use to express high level intents, as covered by Frank Denis. The other set would be internal configuration that expressed all of those low level details about cipher suite preferences, what cipher suites to use when, and so on, and was for use by the library developers and people packaging and distributing the library. The goal is to make it so that emergency cipher changes can be shipped as relatively low risk and easily backported internal configuration file changes, rather than higher risk (and thus slower to update) code changes. In an environment with reproducible binary builds, it'd be ideal if you could rebuild the library package with only the configuration files changed and get library shared objects and so on that were binary identical to the previous versions, so distributions could have quite high confidence in newly-built updates.

(System administrators who opted to edit this second set of files themselves would be on their own. In packaging systems like RPM and Debian .debs, I wouldn't even have these files marked as 'configuration files'.)

How you can wind up trying to allocate zero bytes in C

By: cks

One of the reactions I saw to my entry on malloc(0) being allowed to return NULL is to wonder why you'd ever do a zero-size allocation in the first place. Unfortunately, it's relatively easy to stumble into this with simple code in certain not particularly uncommon situations. The most obvious one is if you're allocating memory for a variable-size object, such as a Python tuple or a JSON array. In a simple C implementation these will typically have a fixed struct that contains a pointer to a C memory block with either the actual elements or an array of pointers to them. The natural way to set this up is to write code that winds up calling 'malloc(nelems * sizeof(...))' or something like that, like this:

#include <stdlib.h>

/* a minimal header for a variable-size array object */
typedef struct {
   unsigned int nelems;
   void **data;
} array_header;

array_header *
alloc_array(unsigned int nelems)
{
   array_header *h;
   h = malloc(sizeof(array_header));
   if (h == NULL) return NULL;

   /* get space for the element pointers except oops */
   h->data = malloc(nelems * sizeof(void *));
   if (h->data == NULL) {
      free(h);
      return NULL;
   }

   h->nelems = nelems;
   /* maybe some other initialization */

   return h;
}

(As a disclaimer, I haven't tried to compile this C code because I'm lazy, so it may contain mistakes.)

Then someone asks your code to create an empty tuple or JSON array and on some systems, things will explode because nelems will be 0 and you will wind up doing 'malloc(0)' and that malloc() will return NULL, as it's allowed to, and your code will think it's out of memory. You can obviously prevent this from happening, but it requires more code and thus requires you to have thought of the possibility.

(Allocating C strings doesn't have this problem because you always need one byte for the terminating 0 byte, but it can come up with other forms of strings where you track the length explicitly.)

One tricky bit about this code is that it will only ever go wrong in an obvious way on some uncommon systems. On most systems today, 'malloc(0)' returns a non-NULL result, usually because the allocator rounds up the amount of memory you asked for to some minimum size. So you can write this code and have it pass all of your tests on common platforms and then some day someone reports that it fails on, for example, AIX.

(It's possible that modern C linters and checkers will catch this; I'm out of touch with the state of the art there.)

As a side note, if malloc(0) returns anything other than NULL, I believe that each call is required to return a unique pointer (see eg the POSIX description of malloc()). I believe that these unique pointers don't have to point to actual allocated memory; they could point to some large reserved arena and be simply allocated in sequence, with a free() of them effectively doing nothing. But it's probably simpler to have your allocator round the size up and return real allocated memory, since then you don't have to handle things like the reserved arena running out of space.

The "personal computer" model scales better than the "terminal" model

By: cks

In an aside in a recent entry, I said that one reason that X terminals faded away is that what I called the "personal computer" model of computing had some pragmatic advantages over the "terminal" model. One of them is that broadly, the personal computer model scales better, even though sometimes it may be more expensive or less capable at any given point in time. But first, let me define my terms. What I mean by the "personal computer" model is one where computing resources are distributed, where everyone is given a computer of some sort and is expected to do much of their work with that computer. What I mean by the "terminal" model is where most computing is done on shared machines, and the objects people have are simply used to access those shared machines.

The terminal model has the advantage that the devices you give each individual person can be cheaper, since they don't need to do as much. It has the potential disadvantage that you need some number of big shared machines for everyone to do their work on, and those machines are often expensive. However, historically, some of the time those big shared servers (plus their terminals) have been less expensive than getting everyone their own computer that was capable enough. So the "terminal" model may win at a particular fixed point in time and for a particular set of capacity needs.

The problem with the terminal model is those big shared resources, which become an expensive choke point. If you want to add some more terminals, you need to also budget for more server capacity. If some of your people turn out to need more power than you initially expected, you're going to need more server capacity. And so on. The problem is that your server capacity generally has to be bought in big, expensive units and increments, a problem that has come up before.

The personal computer model is potentially more expensive up front but it's much easier to scale it, because you buy computer capacity in much smaller units. If you get more people, you get each of them a personal computer. If some of your people need more power, you get them (and just them) more capable, more expensive personal computers. If you're a bit short of budget for hardware updates, you can have some people use their current personal computers for longer. In general, you're free to vary things on a very fine grained level, at the level of individual people.

(Of course you may still have some shared resources, like backups and perhaps shared disk space, but there are relatively fine grained solutions for that too.)

PS: I don't know if big compute is cheaper than a bunch of small compute today, given that we've run into various limits in scaling up things like CPU performance, power and heat limits, and so on. There are "cloud desktop" offerings from various providers, but I'm not sure these are winners based on the hardware economics alone, plus today you'd need something to be the "terminal" as well and that thing is likely to be a capable computer itself, not the modern equivalent of an X terminal.

How history works in the version of the rc shell that I use

By: cks

Broadly, there have been three approaches to command history in Unix shells. In the beginning there was none, which was certainly simple but which led people to be unhappy. Then csh gave us in-memory command history, which could be recalled and edited with shell builtins like '!!' but which lasted only as long as that shell process did. Finally, people started putting 'readline style' interactive command editing into shells, which included some history of past commands that you could get back with cursor-up, and picked up the GNU Readline feature of a $HISTORY file. Broadly speaking, the shell would save the in-memory (readline) history to $HISTORY when it exited and load the in-memory (readline) history from $HISTORY when it started.

I use a reimplementation of rc, the shell created by Tom Duff, and my version of the shell started out with a rather different and more minimal mechanism for history. In the initial release of this rc, all the shell itself did was write every command executed to $history (if that variable was set). Inspecting and reusing commands from a $history file was left up to you, although rc provided a helper program that could be used in a variety of ways. For example, in a terminal window I commonly used '-p' to print the last command and then either copied and pasted it with the mouse or used an rc function I wrote to repeat it directly.

(You didn't have to set $history to the same file in every instance of rc. I arranged to have a per-shell history file that was removed when the shell exited, because I was only interested in short term 'repeat a previous command' usage of history.)

Later, the version of rc that I use got support for GNU Readline and other line editing environments (and I started using it). GNU Readline maintains its own in-memory command history, which is used for things like cursor-up to the previous line. In rc, this in-memory command history is distinct from the $history file history, and things can get confusing if you mix the two (for example, cursor-up to an invocation of your 'repeat the last command' function won't necessarily repeat the command you expect).

It turns out that at least for GNU Readline, the current implementation in rc does the obvious thing; if $history is set when rc starts, the commands from it are read into GNU Readline's in-memory history. This is one half of the traditional $HISTORY behavior. Rc's current GNU Readline code doesn't attempt to save its in-memory history back to $history on exit, because if $history is set the regular rc code has already been recording all of your commands there. Rc otherwise has no shell builtins to manipulate GNU Readline's command history, because GNU Readline and other line editing alternatives are just optional extra features that have relatively minimal hooks into the core of rc.

(In theory this allows thenshell to inject a synthetic command history into rc on startup, but it requires thenshell to know exactly how I handle my per-shell history file.)

Sidebar: How I create per-shell history in this version of rc

The version of rc that I use doesn't have an 'initialization' shell function that runs when the shell is started, but it does support a 'prompt' function that's run just before the prompt is printed. So my prompt function keeps track of the 'expected shell PID' in a variable and compares it to the actual PID. If there's a mismatch (including the variable being unset), the prompt function goes through a per-shell initialization, including setting up my per-shell $history value.

A new little shell script to improve my desktop environment

By: cks

Recently on the Fediverse I posted a puzzle about a little shell script:

A silly little Unix shell thing that I've vaguely wanted for ages but only put together today. See if you can guess what it's for:

#!/bin/sh
trap 'exec $SHELL' 2
"$@"
exec $SHELL

(The use of this is pretty obscure and is due to my eccentric X environment.)

The actual version I now use wound up slightly more complicated, and I call it 'thenshell'. What it does (as suggested by the name) is run something and then, after the thing either exits or is Ctrl-C'd, run a shell. This is pointless in normal circumstances but becomes very relevant if you use this as the command for a terminal window to run instead of your shell, as in 'xterm -e thenshell <something>'.

Over time, I've accumulated a number of things I want to run in my eccentric desktop environment, such as my system for opening URLs from remote machines and my alert monitoring. But some of the time I want to stop and restart these (or I need to restart them), and in general I want to notice if they produce some output, so I've been running them in terminal windows. Up until now I've had to manually start a terminal and run these programs each time I restart my desktop environment, which is annoying and sometimes I forget to do it for something. My new 'thenshell' shell script handles this; it runs whatever and then if it's interrupted or exits, starts a shell so I can see things, restart the program, or whatever.
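So now my X session startup script can start these terminal windows itself; a sketch of what such lines look like (the two program names here are stand-ins for my actual tools):

xterm -T urls -e thenshell remote-url-opener &
xterm -T alerts -e thenshell alert-monitor &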

Thenshell isn't quite a perfect duplicate of the manual version. One obvious limitation is that it doesn't put the command into the shell's command history, so I can't just cursor-up and hit return to restart it. But this is a small thing compared to having all of these things automatically started for me.

(Actually, I think I might be able to get this into a version of thenshell that knows exactly how my shell and my environment handle history, but it would be more than a bit of a hack. I may still try it, partly because it would be nifty.)

Current cups-browsed seems to be bad for central CUPS print servers

By: cks

Suppose, not hypothetically, that you have a central CUPS print server, and that people also have Linux desktops or laptops that they point at your print server to print to your printers. As of at least Ubuntu 24.04, if you're doing this you probably want to get people to turn off and disable cups-browsed on their machines. If you don't, your central print server may see a constant flood of connections from client machines running cups-browsed. Their machines are probably running it, as I believe that cups-browsed is installed and activated by default these days in most desktop Linux environments.

(We didn't really notice this in prior Ubuntu versions, although it's possible cups-browsed was always doing something like this and what's changed in the Ubuntu 24.04 version is that it's doing it more and faster.)
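Turning it off on a client machine is simple enough; on Ubuntu and similar systems the unit is typically called cups-browsed.service:

systemctl disable --now cups-browsed.service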

I'm not entirely sure why this happens, and I'm also not sure what the CUPS requests typically involve, but one pattern that we see is that such clients will make a lot of requests to the CUPS server's /admin/ URL. I'm not sure what's in these requests, because CUPS immediately rejects them as unauthenticated. Another thing we've seen is frequent attempts to get printer attributes for printers that don't exist and that have name patterns that look like local printers. One of the reasons that the clients are hitting the /admin/ endpoint may be to somehow add these printers to our CUPS server, which is definitely not going to work.

(We've also seen signs that some Ubuntu 24.04 applications can repeatedly spam the CUPS server, probably with status requests for printers or print jobs. This may be something enabled or encouraged by cups-browsed.)

My impression is that modern Linux desktop software, things like cups-browsed included, is not really spending much time thinking about larger scale, managed Unix environments where there are a bunch of printers (or at least print queues), the 'print server' is not on your local machine and not run by you, anything random you pick up through broadcast on the local network is suspect, and so on. I broadly sympathize with this, because such environments are a small minority now, but it would be nice if client side CUPS software didn't cause problems in them.

(I suspect that cups-browsed and its friends are okay in an environment where either the 'print server' is local or it's operated by you and doesn't require authentication, there's only a few printers, everyone on the local network is friendly and if you see a printer it's definitely okay to use it, and so on. This describes a lot of Linux desktop environments, including my home desktop.)

Tape drives (and robots) versus hard disk drives, and volume

By: cks

In a conversation on the Fediverse, I had some feelings on tapes versus disks:

I wish tape drives and tape robots were cheaper. At work economics made us switch to backups on HDDs, and apart from other issues it quietly bugs me that every one of them bundles in a complex read-write mechanism on top of the (magnetic) storage that's what we really want.

(But those complex read/write mechanisms are remarkably inexpensive due to massive volume, while the corresponding tape drive+robot read/write mechanism is ... not.)

(I've written up our backup system, also.)

As you can read in many places, hard drives are mechanical marvels, made to extremely fine tolerances, high performance targets, and startlingly long lifetimes (all things considered). And you can get them for really quite low prices.

At a conceptual level, an LTO tape system is the storage medium (the LTO tape) separated from the read/write head and the motors (the tape drive). When you compare this to hard drives, you get to build and buy the 'tape drive' portion only once, instead of including a copy in each instance of the storage medium (the tapes). In theory this should make the whole collection a lot cheaper. In practice it only does so once you have a quite large number of tapes, because the cost of tape drives (and tape robots to move tapes in and out of the drives) is really quite high (and has been for a relatively long time).

There are probably technology challenges and complexities that come with the tape drive operating in an unsealed and less well controlled environment than hard disk mechanisms. But it's hard to avoid the assumption that a lot of the price difference has to do with the vast difference in volume. We make hard drives and thus all of their components in high volume, and have for decades, so there's been a lot of effort spent on making them inexpensively and in bulk. Tape drives are a specialty item with far lower production volumes and are sold to much less price sensitive buyers (as compared to consumer level hard drives, which have a lot of parts in common with 'enterprise' HDDs).

I understand all of this but it still bugs me a bit. It's perfectly understandable but inelegant.

Some notes on X terminals in their heyday

By: cks

I recently wrote about how the X Window System didn't immediately have (thin client) X terminals. X terminals are now a relatively obscure part of history and it may not be obvious to people today why they were a relatively significant deal at the time. So today I'm going to add some additional notes about X terminals in their heyday, from their introduction around 1989 through the mid 1990s.

One of the reactions to my entry that I've seen is to wonder if there was much point to X terminals, since it seems like they should be close to much more functional normal computers and all you'd save is perhaps storage. Practically this wasn't the case in 1989 when they were introduced; NCD's initial models cost substantially less than, say, a Sparcstation 1 (also introduced in 1989), apparently less than half the cost of even a diskless Sparcstation 1. I believe that one reason for this is that memory was comparatively more expensive in those days and X terminals could get away with much, much less of it, since they didn't need to run a Unix kernel and enough of a Unix user space to boot up the X server (and I believe that some or all of the software was run directly from ROM instead of being loaded into precious RAM).

(The NCD16 apparently started at 1 MByte of RAM and the NCD19 at 2 MBytes, for example. You could apparently get a Sparcstation 1 with that little memory but you probably didn't want to use it for much.)

In one sense, early PCs were competition for X terminals in that they put computation on people's desks, but in another sense they weren't, because you couldn't use them as an inexpensive way to get Unix on people's desks. There eventually was at least one piece of software for this, DESQview/X, but it appeared later and you'd have needed to also buy the PC to run it on, as well as a 'high resolution' black and white display card and monitor. Of course, eventually the march of PCs made all of that cheap, which was part of the diminishing interest in X terminals in the later part of the 1990s and onward.

(I suspect that one reason that X terminals had lower hardware costs was that they probably had what today we would call a 'unified memory system', where the framebuffer's RAM was regular RAM instead of having to be separate because it came on a separate physical card.)

You might wonder how well X terminals worked over the 10 MBit Ethernet that was all you had at the time. With the right programs it could work pretty well, because the original approach of X was that you sent drawing commands to the X server, not rendered bitmaps. If you were using things that could send simple, compact rendering commands to your X terminal, such as xterm, 10M Ethernet could be perfectly okay. Anything that required shipping bitmapped graphics could be not as impressive, or even not something you'd want to touch, but for what you typically used monochrome X for between 1989 and 1995 or so, this was generally okay.

(Today many things on X want to ship bitmaps around, even for things like displaying text. But back in the day text was shipped as, well, text, and it was the X server that rendered the fonts.)

When looking at the servers you'd need for a given number of diskless Unix workstations or X terminals, the X terminals required less server side disk space but potentially more server side memory and CPU capacity, and were easier to administer. As noted by some commentators here, you might also save on commercial software licensing costs if you could license it only for your few servers instead of your lots of Unix workstations. I don't know how the system administration load actually compared to a similar number of PCs or Macs, but in my Unix circles we thought we scaled much better and could much more easily support many seats (and many potential users if you had, for example, many more students than lab desktops).

My perception is that what killed off X terminals as particularly attractive, even for Unix places, was that on the one hand the extra hardware capabilities PCs needed over X terminals kept getting cheaper and cheaper and on the other hand people started demanding more features and performance, like decent colour displays. That brought the X terminal 'advantage' more or less down to easier administration, and in the end that wasn't enough (although some X terminals and X 'thin client' setups clung on quite late, eg the SunRay, which we had some of in the 2000s).

Of course that's a Unix centric view. In a larger view, Unix was displaced on the desktop by PCs, which naturally limited the demand for both X terminals and dedicated Unix workstations (which were significantly marketed toward the basic end of Unix performance, and see also). By no later than the end of the 1990s, PCs were better basic Unix workstations than the other options and you could use them to run other software too if you wanted to, so they mostly ran over everything else even in the remaining holdouts.

(We ran what were effectively X terminals quite late, but the last few generations were basic PCs running LTSP not dedicated hardware. All our Sun Rays got retired well before the LTSP machines.)

(I think that the 'personal computer' model has or at least had some significant pragmatic advantages over the 'terminal' model, but that's something for another entry.)

Some bits on malloc(0) in C being allowed to return NULL

By: cks

One of the little traps in standard C and POSIX is that malloc(0) is allowed to return NULL instead of a pointer. This makes people unhappy for various reasons. Today I wound up reading 017. malloc(0) & realloc(…, 0) ≠ 0, which runs through a whole collection of Unix malloc() versions and finds that almost none of them return NULL on malloc(0), except for some Unix System V releases that ship with an optional 'fast' malloc library that does return NULL on zero-sized allocations. Then AT&T wrote the System V Interface Definition, which requires this 'fast malloc' behavior, except that actual System V releases (probably) didn't behave this way unless you explicitly used the fast malloc instead of the standard one.

(Apparently AIX may behave this way, eg, and it's old enough to have influenced POSIX and C. But I suspect that AIX got this behavior by making the System V fast malloc their only malloc, possibly when the SVID nominally required this behavior. AIX may have wound up weird but IBM didn't write it from scratch.)

When I read all of this today and considered what POSIX had done, one of my thoughts was about non-Unix C compilers (partly because I'd recently heard about the historical Whitesmiths C compiler source code being released). C was standardized at a time when C was being increasingly heavily used on various personal computers, including in environments that were somewhat hostile to it, and also other non-Unix environments. These C implementations used their own standard libraries, including malloc(), so maybe they had adopted the NULL return behavior.

As far as I can tell, Whitesmiths' malloc() doesn't have this behavior (also). However, I did find the NULL-returning behavior in the MS-DOS version of Manx Aztec C, or at least it's in version 5.2a; the two earlier versions also available have a simpler malloc() that always rounds up, like the Whitesmiths malloc(). My memory is that you could get the Manx Aztec C compiler for the Amiga with library source, but I'm not particularly good at poking around the Amiga image available, so I was unable to spot it if it's included in that version, and I haven't looked at the other Aztec C versions.

(I wouldn't be surprised if a number of 1980s non-Unix C compilers had this behavior, but I don't know where to find good information on this. If someone has written a comprehensive history page on malloc(0) that covers non-Unix C compilers, I haven't found it.)

On systems with small amounts of memory, one reason to specifically make your malloc() return NULL for 0-sized allocations is to reduce memory usage if someone makes a number of such allocations through some general code path that deals with variable-sized objects. Otherwise you'd have to consume some minimum amount of memory even for these useless allocations.

PS: Minix version 1 also rounds up the size of malloc(0).

(Yes, I got nerd-sniped by this and my own curiosity.)

Compute GPUs can have odd failures under Linux (still)

By: cks

Back in the early days of GPU computation, the hardware, drivers, and software were so relatively untrustworthy that our early GPU machines had to be specifically reserved by people and that reservation gave them the ability to remotely power cycle the machine to recover it (this was in the days before our SLURM cluster). Things have gotten much better since then, with things like hardware and driver changes so that programs with bugs couldn't hard-lock the GPU hardware. But every so often we run into odd failures where something funny is going on that we don't understand.

We have one particular SLURM GPU node that has been flaky for a while, with the specific issue being that every so often the NVIDIA GPU would throw up its hands and drop off the PCIe bus until we rebooted the system. This didn't happen every time it was used, or with any consistent pattern, although some people's jobs seemed to regularly trigger this behavior. Recently I dug up a simple to use GPU stress test program, and when this machine's GPU did its disappearing act this Saturday, I grabbed the machine, rebooted it, ran the stress test program, and promptly had the GPU disappear again. Success, I thought, and since it was Saturday, I stopped there, planning to repeat this process today (Monday) at work, while doing various monitoring things.

Since I'm writing a Wandering Thoughts entry about it, you can probably guess the punchline. Nothing has changed on this machine since Saturday, but all today the GPU stress test program could not make the GPU disappear. Not with the same basic usage I'd used Saturday, and not with a different usage that took the GPU to full power draw and a reported temperature of 80C (which was a higher temperature and power draw than the GPU had been at when it disappeared, based on our Prometheus metrics). If I'd been unable to reproduce the failure at all with the GPU stress program, that would have been one thing, but reproducing it once and then not again is just irritating.
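As an aside, if you want to watch a GPU's temperature and power draw while running this sort of stress test, polling nvidia-smi works well enough; this is a generic sketch, not necessarily how our own monitoring is set up:

nvidia-smi --query-gpu=timestamp,temperature.gpu,power.draw,utilization.gpu --format=csv -l 5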

(The machine was assembled from parts, with an RTX 4090 and a Ryzen Threadripper 1950X in an X399 Taichi motherboard that is probably not even vaguely running the latest BIOS, seeing as the base hardware was built many years ago, although the GPU has been swapped around since then. Everything is in a pretty roomy 4U case, but if the failure was consistent we'd have assumed cooling issues.)

I don't really have any theories for what could be going on, but I suppose I should try to find a GPU stress test program that exercises every last corner of the GPU's capabilities at full power rather than using only one or two parts at a time. On CPUs, different loads light up different functional units, and I assume the same is true on GPUs, so perhaps the problem is in one specific functional unit or a combination of them.

(Although this doesn't explain why the GPU stress test program was able to cause the problem on Saturday but not today, unless a full reboot didn't completely clear out the GPU's state. Possibly we should physically power this machine off entirely for long enough to dissipate any lingering things.)

The X Window System didn't immediately have X terminals

By: cks

For a while, X terminals were a reasonably popular way to give people comparatively inexpensive X desktops. These X terminals relied on X's network transparency so that only the X server had to run on the X terminal itself, with all of your terminal windows and other programs running on a server somewhere and just displaying on the X terminal. For a long time, using a big server and a lab full of X terminals was significantly cheaper than setting up a lab full of actual workstations (until inexpensive and capable PCs showed up). Given that X started with network transparency and X terminals are so obvious, you might be surprised to find out that X didn't start with them.

In the early days, X ran on workstations. Some of them were diskless workstations, and on some of them (especially the diskless ones), you would log in to a server somewhere to do a lot of your more heavy duty work. But they were full workstations, with a full local Unix environment and you expected to run your window manager and other programs locally even if you did your real work on servers. Although probably some people who had underpowered workstations sitting around experimented with only running the X server locally, with everything else done remotely (except perhaps the window manager).

The first X terminals arrived only once X was reasonably well established as the successful cross-vendor Unix windowing system. NCD, who I suspect were among the first people to make an X terminal, was founded only in 1987 and of course didn't immediately ship a product (it may have shipped its first product in 1989). One indication of the delay in X terminals is that XDM was only released with X11R3, in October of 1988. You technically didn't need XDM to have an X terminal, but it made life much easier, so its late arrival is a sign that X terminals didn't arrive much before then.

(It's quite possible that the possibility for an 'X terminal' was on people's minds even in the early days of X. The Bell Labs Blit was a 'graphical terminal' that had papers written and published about it sometime in 1983 or 1984, and the Blit was definitely known in various universities and so on. Bell Labs even gave people a few of them, which is part of how I wound up using one for a while. Sadly I'm not sure what happened to it in the end, although by now it would probably be a historical artifact.)

(This entry was prompted by a comment on a recent entry of mine.)

PS: A number of people seem to have introduced X terminals in 1989; I didn't spot any in 1988 or earlier.

Sidebar: Using an X terminal without XDM

If you didn't have XDM available or didn't want to have to rely on it, you could give your X terminal the ability to open up a local terminal window that ran a telnet client. To start up an X environment, people would telnet into their local server, set $DISPLAY (or have it automatically set by the site's login scripts), and start at least their window manager by hand. This required your X terminal to not use any access control (at least when you were doing the telnet thing), but strong access control wasn't exactly an X terminal feature in the first place.

My pragmatic view on virtual screens versus window groups

By: cks

I recently read z3bra's 2014 Avoid workspaces (via) which starts out with the tag "Virtual desktops considered harmful". At one level I don't disagree with z3bra's conclusion that you probably want flexible groupings of windows, and I also (mostly) don't use single-purpose virtual screens. But I do it another way, which I think is easier than z3bra's (2014) approach.

I've written about how I use virtual screens in my desktop environment, although a bit of that is now out of date. The short summary is that I mostly have a main virtual screen and then 'overflow' virtual screens where I move to if I need to do something else without cleaning up the main virtual screen (as a system administrator, I can be quite interrupt-driven or working on more than one thing at once). This sounds a lot like window groups, and I'm sure I could do it with them in another window manager. The advantage to me of fvwm's virtual screens is that it's very easy to move windows from one to another.

If I start a window in one virtual screen, for what I think is going to be one purpose, and it turns out that I need it for another purpose too, on another virtual screen, I don't have to fiddle around with, say, adding or changing its tags. Instead I can simply grab it and move it to the new virtual screen (or, for terminal windows and some others, iconify them on one screen, switch screens, and deiconify them). This makes it fast, fluid, and convenient to shuffle things around, especially for windows where I can do this by iconifying and deiconify them.

This is somewhat specific to (fvwm's idea of) virtual screens, where the screens have a spatial relationship to each other and you can grab windows and move them around to change their virtual screen (either directly or through FvwmPager). In particular, I don't have to switch between virtual screens to drag a window on to my current one; I can grab it in a couple of ways and yank it to where I am now.

In other words, it's the direct manipulation of window grouping that makes this work so nicely. Unfortunately I'm not sure how to get direct manipulation of currently not visible windows without something like virtual screens or virtual desktops. You could have a 'show all windows' feature, but that still requires bouncing between that all-windows view (to tag in new windows) and your regular view. Maybe that would work fluidly enough, especially with today's fast graphics.

Quick numbers on how common HTTP/2 is on our departmental web server

By: cks

Our general purpose departmental web server has supported HTTP/2 for a while. When we added HTTP/2 support it was basically because it was there; HTTP/2 was the new and shiny thing, our Apache configuration could support it, and so it seemed like a friendly gesture to turn HTTP/2 on. Until now, I've never looked at the statistics for how many HTTP requests use HTTP/2 and how many use other HTTP versions.

Our general purpose web server supports both HTTP access and HTTPS access, unless people opt to forcefully redirect their own pages from one to the other (we have plenty of old pages with mixed content problems, so we can't do such a redirection globally). However, these days that may not be much of an issue and browsers may force HTTPS on the initial connection, which will succeed with our server. I mention all of this because unfortunately our logs don't let me see how many requests are HTTP versus HTTPS. In some environments I could assume that all HTTP/2.0 requests were HTTPS, but the standard Ubuntu Apache HTTP/2 configuration enables h2c so I believe we can do HTTP/2.0 over HTTP connections without any sign of this in our current logs.

The overall number is that about 55% of the requests are HTTP/2.0 and all but a tiny trace of the remaining 45% are HTTP/1.1. However, this isn't uniform. For instance, we've somehow become a load bearing source of commonly used ML training data, and requests for this data are about 70% HTTP/2.0. Meanwhile, a URL hierarchy that maps to our anonymous FTP area sees much less activity, probably much of it from automated crawlers, and only 21% of the requests were HTTP/2.0.
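
As an aside, getting these numbers is a straightforward bit of log crunching. Here's a minimal Python sketch, assuming Apache's common/combined log format where the quoted request line ends with the protocol version (adjust the parsing if your LogFormat differs); feed it an access log on standard input.

#!/usr/bin/env python3
# Tally HTTP protocol versions from an Apache access log on stdin.
# Assumes the request line is logged as, eg, "GET /path HTTP/2.0";
# lines that don't match (malformed requests and so on) are skipped.
import collections
import re
import sys

REQ = re.compile(r'"[A-Z]+ [^"]* (HTTP/[\d.]+)"')

counts = collections.Counter()
for line in sys.stdin:
    m = REQ.search(line)
    if m:
        counts[m.group(1)] += 1

total = sum(counts.values())
for proto, n in counts.most_common():
    print(f"{proto}: {n} ({n / total:.1%})")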

If I look at the claimed User-Agents for HTTP/1.1 requests, some things jump out. A lot of requests come from 'pytorch/vision', along with 'GoogleOther', GPTBot, something claiming to be Chrome 83, PetalBot, Applebot, no User-Agent at all, Scrapy, and a whole menagerie of other crawlers. Actual probably authentic browser user agent values are mostly absent, which isn't a really big surprise since I think browsers aggressively do HTTP/2.0 these days.

(A lot of those 'pytorch/vision' requests were for that commonly used ML training data, but they seem to have been dwarfed by the HTTP/2.0 requests from browsers.)

Given even this cursory log analysis, I suspect that for our web server, HTTP/1.1 requests are significantly correlated with access from non-browsers, including crawlers (both overt and covert). Again this isn't really a surprise if modern browsers are trying to use HTTP/2 as much as possible, since most people are running modern browsers (especially Chrome).

What I've observed about Linux kernel WireGuard on 10G Ethernet so far

By: cks

I wrote about a performance mystery with WireGuard on 10G Ethernet, and since then I've done additional measurements with results that both give some clarity and leave me scratching my head a bit more. So here is what I know about the general performance characteristics of Linux kernel WireGuard on a mixture of Ubuntu 22.04 and 24.04 servers with stock settings, and using TCP streams inside the WireGuard tunnels (because the high bandwidth thing we care about runs over TCP).

  • CPU performance is important even when WireGuard isn't saturating the CPU.

  • CPU performance seems to be more important on the receiving side than on the sending side. If you have two machines, one faster than the other, you get more bandwidth sending a TCP stream from the slower machine to the faster one. I don't know if this is an artifact of the Linux kernel implementation or if the WireGuard protocol requires the receiver to do more work than the sender.

  • There seems to be a single-peer bandwidth limit (related to CPU speeds). You can increase the total WireGuard bandwidth of a given server by talking to more than one peer.

  • When talking to a single peer, there's both a unidirectional bandwidth limit and a bidirectional bandwidth limit. If you send and receive to a single peer at once, you don't get the sum of the unidirectional send and unidirectional receive; you get less.

  • There's probably also a total WireGuard bandwidth that, in our environment, falls short of 10G bandwidth (ie, a server talking WireGuard to multiple peers can't saturate its 10G connection, although maybe it could if I had enough peers in my test setup).

The best performance between a pair of WireGuard peers I've gotten is from two servers with Xeon E-2226G CPUs; these can push their 10G Ethernet to about 850 MBytes/sec of WireGuard bandwidth in one direction and about 630 MBytes/sec in each direction if they're both sending and receiving. These servers (and other servers with slower CPUs) can basically saturate their 10G-T network links with plain (non-WireGuard) TCP.

If I were to build a high performance 'WireGuard gateway' today, I'd build it with a fast CPU and dual 10G networks, with WireGuard traffic coming in (and going out) one 10G interface and the resulting gatewayed traffic using the other. WireGuard on fast CPUs can run fast enough that a single 10G interface could limit total bandwidth under the right (or wrong) circumstances; segmenting WireGuard and clear traffic onto different interfaces avoids that.

(A WireGuard gateway that only served clients at 1G or less would likely be perfectly fine with a single 10G interface and reasonably fast CPUs. But I'd want to test how many 1G clients it took to reach the total WireGuard bandwidth limit on a 10G WireGuard server before I was completely confident about that.)

I feel open source has turned into two worlds

By: cks

One piece of open source news of the time interval is that the sole maintainer of libxml2 will no longer be treating security issues any differently than bugs (also, via Fediverse discussions). In my circles, the reaction to this has generally been positive, and it's seen as an early sign of more of this to come, as more open source maintainers revolt. I have various thoughts on this, but in light of what I wrote about open source moral obligation and popularity, one thing this incident has crystallized for me is that I draw an increasingly sharp distinction between corporate use of open source software and people's cooperative use of it.

Obvious general examples of the latter are the Debian Linux distribution and BSD distributions like OpenBSD and FreeBSD. These are independent open source projects that are maintained by volunteers (although some of them are paid to work on the project). Everyone is working together in cooperation and the result is no one's product or owned object. And at the small scale, everyone who incorporates libxml2, some Python module, or whatever into a personal open source thing is part of this cooperative sphere.

(Ubuntu is not, because Ubuntu is Canonical's. Fedora is probably not really, for all that it has volunteers working on it; it lives and dies at Red Hat's whim, and Red Hat has already demonstrated with CentOS that that whim can change drastically.)

Corporate use of open source software is things like corporations deciding to make libxml2 a security sensitive, load bearing part of their products. Yes, the license allows them to do that and allows them to not support libxml2, but I feel that it's qualitatively different from the personal cooperative sphere of open source, and as a result the social rules are different. You might not want to leave Debian (which is fundamentally people) in the lurch over a security issue, but if a corporation shows up with a security issue, well, you tap the sign. They're not in open source as a cooperative venture, they are using it to make money. Corporations are not like people, even if they employ people who make 'people open source' noises.

Existing open source licenses, practices, and culture don't draw this distinction (and it would be hard for licenses to do so), but I think we're going to see an increasing amount of it in the future. Corporate use of open source under the current regime is an increasingly bad deal for the open source people involved, so I don't think the current situation is sustainable. Even if licenses don't change, everything else can.

(See also 'software supply chain security', especially "I am not a supplier".)

A performance mystery with Linux WireGuard on 10G Ethernet

By: cks

As a followup on discovering that WireGuard can saturate a 1G Ethernet (on Linux), I set up WireGuard on some slower servers here that have 10G networking. This isn't an ideal test but it's more representative of what we would see with our actual fileservers, since I used spare fileserver hardware. What I got out of it was a performance and CPU usage mystery.

What I expected to see was that WireGuard performance would top out at some level above 1G as the slower CPUs on both the sending and the receiving host ran into their limits, and I definitely wouldn't see them drive the network as fast as they could without WireGuard. What I actually saw was that WireGuard did hit a speed limit but the CPU usage didn't seem to saturate, either for kernel WireGuard processing or for the iperf3 process. These machines can manage to come relatively close to 10G bandwidth with bare TCP, while with WireGuard they were running around 400 MBytes/sec of on-the-wire bandwidth (which translates to somewhat less inside the WireGuard connection, due to overheads).

One possible explanation for this is increased packet handling latency, where the introduction of WireGuard adds delays that keep things from running at full speed. Another possible explanation is that I'm running into CPU limits that aren't obvious from simple tools like top and htop. One interesting thing is that if I do a test in both directions at once (either an iperf3 bidirectional test or two iperf3 sessions, one in each direction), the bandwidth in each direction is slightly over half the unidirectional bandwidth (while a bidirectional test without WireGuard runs at full speed in both directions at once). This certainly makes it look like there's a total WireGuard bandwidth limit in these servers somewhere; unidirectional traffic gets basically all of it, while bidirectional traffic splits it fairly between each direction.

I looked at 'perf top' on the receiving 10G machine and kernel spin lock stuff seems to come in surprisingly high. I tried having a 1G test machine also send WireGuard traffic to the receiving 10G test machine at the same time and the incoming bandwidth does go up by about 100 MBytes/sec, so perhaps on these servers I'm running into a single-peer bandwidth limitation. I can probably arrange to test this tomorrow.

(I can't usefully try both of my 1G WireGuard test machines at once because they're both connected to the same 1G switch, with a 1G uplink into our 10G switch fabric.)

PS: The two 10G servers are running Ubuntu 24.04 and Ubuntu 22.04 respectively with standard kernels; the faster server with more CPUs was the 'receiving' server here, and is running 24.04. The two 1G test servers are running Ubuntu 24.04.

Linux kernel WireGuard can go 'fast' on decent hardware

By: cks

I'm used to thinking of encryption as a slow thing that can't deliver anywhere near network saturation, even on basic gigabit Ethernet connections. This is broadly the experience we see with our current VPN servers, which struggle to turn in more than relatively anemic bandwidth with OpenVPN and L2TP, and so for a long time I assumed it would also be our experience with WireGuard if we tried to put anything serious behind it. I'd seen the 2023 Tailscale blog post about this but discounted it as something we were unlikely to see; their kernel throughput on powerful-sounding AWS nodes was anemic by 10G standards, so I assumed our likely less powerful servers wouldn't even get 1G rates.

Today, for reasons beyond the scope of this entry, I wound up wondering how fast we could make WireGuard go. So I grabbed a couple of spare servers we had with reasonably modern CPUs (by our limited standards), put our standard Ubuntu 24.04 on them, and took a quick look to see how fast I could make them go over 1G networking. To my surprise, the answer is that WireGuard can saturate that 1G network with no particularly special tuning, and the system CPU usage is relatively low (4.5% on the client iperf3 side, 8% on the server iperf3 side; each server has a single Xeon E-2226G). The low usage suggests that we could push well over 1G of WireGuard bandwidth through a 10G link, which means that I'm going to set one up for testing at some point.

While the Xeon E-2226G is not a particularly impressive CPU, it's better than the CPUs our NFS fileservers have (the current hardware has Xeon Silver 4410Ys). But I suspect that we could sustain over 1G of WireGuard bandwidth even on them, if we wanted to terminate WireGuard on the fileservers instead of on a 'gateway' machine with a fast CPU (and a 10G link).

More broadly, I probably need to reset my assumptions about the relative speed of encryption as compared to network speeds. These days I suspect a lot of encryption methods can saturate a 1G network link, at least in theory, since I don't think WireGuard is exceptionally good in this respect (as I understand it, encryption speed wasn't particularly a design goal; it was designed to be secure first). Actual implementations may vary for various reasons so perhaps our VPN servers need some tuneups.

(The actual bandwidth achieved inside WireGuard is less than the 1G data rate because simply being encrypted adds some overhead. This is also something I'm going to have to remember when doing future testing; if I want to see how fast WireGuard is driving the underlying networking, I should look at the underlying networking data rate, not necessarily WireGuard's rate.)

My views on the choice of name for SMTP senders to use in TLS SNI

By: cks

TLS SNI (Server Name Indication) is something that a significant minority of sending mail servers use when they do TLS with SMTP. One of the reasons that it's not used more generally is apparently that there's confusion about what TLS SNI name to use. Based on our logs, in practice essentially everyone using TLS SNI uses the MX target name as the SNI name; if something is MX'd to 'inbound.example.org', then sending mailers will send the SNI name 'inbound.example.org'.
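
Much of this is simply the path of least resistance for TLS client code, where the SNI defaults to the hostname you connected to, which for outbound SMTP is normally the MX target. As an illustration (this is not anyone's actual mailer, just Python's smtplib with the same hypothetical MX target name), the SNI below ends up being 'inbound.example.org' without the sending code ever thinking about it:

#!/usr/bin/env python3
# Illustrative only: smtplib hands the host you connected to over as the
# TLS server_hostname, so the SNI sent during STARTTLS here is the MX
# target name, not the recipient's domain.
import smtplib
import ssl

ctx = ssl.create_default_context()
with smtplib.SMTP("inbound.example.org", 25, timeout=30) as conn:
    conn.starttls(context=ctx)  # SNI = 'inbound.example.org'
    conn.ehlo()                 # re-EHLO over the now-encrypted channel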

One other option for the TLS SNI name is the domain name of the recipient. Using the TLS SNI of the recipient domain would let SMTP frontends route the connection to an appropriate backend in still encrypted form, although you'd have to use a custom protocol. If you then required a matching name in the server TLS certificate, you'd also have assurance that you were delivering the email to a mail server that should handle that domain's mail, rather than having someone intercept your DNS MX query and provide their own server as the place to send 'example.org' mail. However, this has some problems.

First, it means that a sending mailer couldn't aggregate email messages to multiple target domains all hosted by the same MX target into a single connection. I suspect that this isn't a big issue and most email isn't aggregated this way in the first place. More importantly, if the receiving server's TLS certificate had to match the SNI it received, you would need to equip your inbound mail servers with a collection of potentially high value TLS certificates for your bare domain names. The 'inbound.example.org' server would need a TLS server certificate for 'example.org' (and maybe 'example.net' if you had both and handled both in the same inbound server). In the current state of TLS, I don't believe this TLS certificate could be scoped down so that it couldn't be used for HTTPS.

This would also be troublesome for outsourcing your email handling to someone. If you outsourced your email to example.org, you would have to give them a TLS certificate (or a series of TLS certificates) for your bare domain, so they could handle email for it with the right TLS certificate.

Sending the target MX name as the TLS SNI name is less useful for some things and is more prone to DNS interception and other ways to tamper with DNS results (assuming there's no DNSSEC in action). But it has the virtue that it's simple to implement on both sides and you don't have to give your inbound mail server access to potentially sensitive TLS certificates. It can have a TLS certificate just for its own name, where the only thing this can really be used for is mail just because that's the only thing the server does. And of course it works well with outsourced mail handling, because the MX target server only needs a TLS certificate for its own name, which its organization should be able to easily provide.

Arguably part of the confusion here is uncertainty over what using TLS SNI during SMTP STARTTLS is supposed to achieve and what security properties we want from the resulting TLS connection. In HTTPS TLS, using SNI has a clear purpose; it lets you handle multiple websites, each with their own TLS certificate, on a single IP, and you want the TLS certificate to be valid for the TLS SNI name. I don't think we have any such clarity for TLS SNI and general server TLS certificate validation in SMTP. For example, are we using TLS certificates to prove that the SMTP server you're talking to is who you think it is (the MX target name), or that it's authorized to handle email for a particular domain (the domain name of the destination)?

(This elaborates on a Fediverse post in a discussion of this area.)

Revisiting ZFS's ZIL, separate log devices, and writes

By: cks

Many years ago I wrote a couple of entries about ZFS's ZIL optimizations for writes and then an update for separate log devices. In completely unsurprising news, OpenZFS's behavior has changed since then and gotten simpler. The basic background for this entry is the flow of activity in the ZIL (ZFS Intent Log).

When you write data to a ZFS filesystem, your write will be classified as 'indirect', 'copied', or 'needcopy'. A 'copied' write is immediately put into the in-memory ZIL even before the ZIL is flushed to disk, a 'needcopy' write will be put into the in-memory ZIL if a (filesystem) sync() or fsync() happens and then written to disk as part of the ZIL flush, and an 'indirect' write will always be written to its final place in the filesystem even if the ZIL is flushed to disk, with the ZIL just containing a pointer to the regular location (although at that point the ZIL flush depends on those regular writes). ZFS keeps metrics on how much you have of all of these, and they're potentially relevant in various situations.

As of the current development version of OpenZFS (and I believe for some time in released versions), how writes are classified is like this, in order:

  1. If you have 'logbias=throughput' set or the write is an O_DIRECT write, it is an indirect write.

  2. If you don't have a separate log device and the write is equal to or larger than zfs_immediate_write_sz (32 KBytes by default), it is an indirect write.

  3. If this is a synchronous write, it is a 'copied' write, including if your filesystem has 'sync=always' set.

  4. Otherwise it's a 'needcopy' write.
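
Put as a minimal sketch (with invented names; the real decision is made inside OpenZFS's ZIL code), that ordering looks like this:

# An illustrative sketch of the classification order above; the names
# here are made up for clarity and only zfs_immediate_write_sz is a
# real tunable.
ZFS_IMMEDIATE_WRITE_SZ = 32 * 1024  # default zfs_immediate_write_sz

def classify_write(size, o_direct, sync_write, logbias_throughput, has_slog):
    if logbias_throughput or o_direct:
        return "indirect"
    if not has_slog and size >= ZFS_IMMEDIATE_WRITE_SZ:
        return "indirect"
    if sync_write:  # includes filesystems with sync=always
        return "copied"
    return "needcopy"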

If your system is doing normal IO (well, normal writes) and you don't have a separate log device, large writes are indirect writes and small writes are 'needcopy' writes. This keeps both of them out of the in-memory ZIL. However, on our systems I see a certain volume of 'copied' writes, suggesting that some programs or ZFS operations force synchronous writes. This seems to be especially common on our ZFS based NFS fileservers, but it happens to some degree even on the ZFS fileserver that mostly does local IO.

The corollary to this is that if you do have a separate log device and you don't do O_DIRECT writes (and don't set logbias=throughput), all of your writes will go to your log device during ZIL flushes, because they'll fall through the first two cases and into case three or four. If you have a sufficiently high write volume combined with ZIL flushes, this may increase the size of a separate log device that you want and also make you want one that has a high write bandwidth (and can commit things to durable storage rapidly).

(We don't use any separate log devices for various reasons and I don't have well informed views of when you should use them and what sort of device you should use.)

Once upon a time (when I wrote my old entry), there was a zil_slog_limit tunable that pushed some writes back to being indirect writes even if you had a separate log device, under somewhat complex circumstances. That was apparently removed in 2017 and was partly not working even before then (also).

Will (more) powerful discrete GPUs become required in practice in PCs?

By: cks

One of the slow discussions I'm involved in over on the Fediverse started with someone wondering what modern GPU to get to run Linux on Wayland (the current answer is said to be an Intel Arc B580, if you have a modern distribution version). I'm a bit interested in this question but not very much, because I've traditionally considered big discrete GPU cards to be vast overkill for my needs. I use an old, text-focused type of X environment and I don't play games, so apart from needing to drive big displays at 60Hz (or maybe someday better than that), it's been a long time since I needed to care about how powerful my graphics was. These days I use 'onboard' graphics whenever possible, which is to say the modest GPU that Intel and AMD now integrate on many CPU models.

(My office desktop has more or less the lowest end discrete AMD GPU with suitable dual outputs that we could find at the time because my CPU didn't have onboard graphics. My current home desktop uses what is now rather old onboard Intel graphics.)

However, graphics aren't the only thing you can do with GPUs these days (and they haven't been for some time). Increasingly, people do a lot of GPU computing (and not just for LLMs; darktable can use your GPU for image processing on digital photographs). In the old days, this GPU processing was basically not worth even trying on your typical onboard GPU (darktable basically laughed at my onboard Intel graphics), and my impression is that's still mostly the case if you want to do serious work. If you're serious, you want a lot of GPU memory, a lot of GPU processing units, and so on, and you only really get that on dedicated discrete GPUs.

You'll probably always be able to use a desktop for straightforward basic things with only onboard graphics (if only because of laptop systems that have price, power, and thermal limits that don't allow for powerful, power-hungry, and hot GPUs). But that doesn't necessarily mean that it will be practical to be a programmer or system administrator without a discrete GPU that can do serious computing, or at least that you'll enjoy it very much. I can imagine a future where your choices are to have a desktop with a good discrete GPU so that you can do necessary (GPU) computation, bulk analysis, and so on locally, or to remote off to some GPU-equipped machine to do the compute-intensive side of your work.

(An alternate version of this future is that CPU vendors stuff more and more GPU compute capacity into CPUs and the routine GPU computation keeps itself to within what the onboard GPU compute units can deliver. After all, we're already seeing CPU vendors include dedicated GPU computation capacity that's not intended for graphics.)

Even if discrete GPUs don't become outright required, it's possible that they'll become so useful and beneficial that I'll feel the need to get one; not having one would be workable but clearly limiting. I might feel that about a discrete GPU today if I did certain sorts of things, such as large scale photo or video processing.

I don't know if I believe in this general future, where a lot of important things require (good) GPU computation in order to work decently well. It seems a bit extreme. But I've been quite wrong about technology trends in the past that similarly felt extreme, so nowadays I'm not so sure of my judgment.

What would a multi-user web server look like? (A thought experiment)

By: cks

Every so often my thoughts turn to absurd ideas. Today's absurd idea is sparked by my silly systemd wish for moving processes between systemd units, which in turn was sparked by a local issue with Apache CGIs (and suexec). This got me thinking about what a modern 'multi-user' web server would look like, where by multi-user I mean a web server that's intended to serve content operated by many different people (such as many different people's CGIs). Today you can sort of do this for CGIs through Apache suexec, but as noted this has limits.

The obvious way to implement this would be to run a web server process for every different person's web area and then reverse proxy to the appropriate process. Since there might be a lot of people and not all of them are visited very often, you would want these web server processes to be started on demand and then shut down automatically after a period of inactivity, rather than running all of the time (on Linux you could sort of put this together with systemd socket units). These web server processes would run as appropriate Unix UIDs, not as the web server UID, and on Linux under appropriate systemd hierarchies with appropriate limits set.

(Starting web server units through systemd would also mean that your main web server process didn't have to be privileged or have a privileged helper, as Apache does with suexec. You could have the front end web server do the process starting and supervision itself, but then it would also need the privileges to change UIDs and the support for setting other per-user context information, some of which is system dependent.)

Although I'm not entirely fond of it, the simplest way to communicate between the main web server and the per-person web server would be through HTTP. Since HTTP reverse proxies are widely supported, this would also allow people to choose what program they'd use as their 'web server', rather than your default. However, you'd want to provide a default simple web server to handle static files, CGIs, and maybe PHP (which would be even simpler than my idea of a modern simple web server).
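
As an illustration of how small that default per-person backend could be, Python's standard library already does static files plus CGIs in a handful of lines. This is purely a sketch with a made-up directory and port (and recent Python versions have deprecated the CGI handler it uses), but it gives the flavour:

#!/usr/bin/env python3
# A toy per-person backend: static files plus CGIs out of one web area,
# listening on localhost for the front-end web server to reverse proxy
# to. The directory and port are placeholders.
from http.server import CGIHTTPRequestHandler, HTTPServer
import os

os.chdir("/home/someuser/www")                        # hypothetical web area
CGIHTTPRequestHandler.cgi_directories = ["/cgi-bin"]  # CGIs under ./cgi-bin
HTTPServer(("127.0.0.1", 8081), CGIHTTPRequestHandler).serve_forever()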

The main (or front-end) web server would still want to have a bunch of features like global rate limiting, since it's the only thing in a position to see aggregate requests across everyone's individual server. If you wanted to make life more complicated but also potentially more convenient, you could choose different protocols to handle different people's areas. One person could be handled via a HTTP reverse proxy, but another person might be handled through FastCGI because they purely use PHP and that's most convenient for them (provided that their FastCGI server could handle being started on demand and then stopping later).

While I started thinking of this in the context of personal home pages and personal CGIs, as we support on our main web server, you could also use this for having different people and groups manage different parts of your URL hierarchy, or even different virtual hosts (by making the URL hierarchy of the virtual host that was handed to someone be '(almost) everything').

With a certain amount of work you could probably build this today on Linux with systemd (Unix) socket activation, although I don't know what front-end or back-end web server you'd want to use. To me, it feels like there's a certain elegance to the 'everyone gets their own web server running under their own UID, go wild' aspect of this, rather than having to try to make one web server running as one UID do everything.

Some thoughts on GNOME's systemd dependencies and non-Linux Unixes

By: cks

One of the pieces of news of the time interval is (GNOME is) Introducing stronger dependencies on systemd (via). Back in the old days, GNOME was a reasonably cross-platform Unix desktop environment, one that you could run on, for example, FreeBSD. I believe that's been less and less true over time already (although the FreeBSD handbook has no disclaimers), but GNOME adding more relatively hard dependencies on systemd really puts a stake in it, since systemd is emphatically Linux-only.

It's possible that people in FreeBSD will do more and more work to create emulations of systemd things that GNOME uses, but personally I think this is quixotic and not sustainable. The more likely outcome is that FreeBSD and other Unixes will drop GNOME entirely, retaining only desktop environments that are more interested in being cross-Unix (although I'm not sure what those are these days; it's possible that GNOME is simply more visible in its Linux dependency).

An aspect of this shift that's of more interest to me is that GNOME is the desktop environment and (I believe) GUI toolkit that has been most vocal about wanting to drop support for X, and relatively soon. The current GDM has apparently already dropped support for starting non-Wayland sessions, for example (at least on Fedora, although it's possible that Fedora has been more aggressive than GNOME itself recommends). This loss of X support in GNOME has been a real force pushing for Wayland, probably including Wayland on FreeBSD. However, if FreeBSD no longer supports GNOME, the general importance of Wayland for FreeBSD may go down. Wayland's importance would especially go down if the general Unix desktop world splits into one camp that is increasingly Linux-dependent due to systemd and Wayland requirements, and another camp that is increasingly 'old school' non-systemd and X only. This second camp would become what you'd find on FreeBSD and other non-Linux Unixes.

(Despite what Wayland people may tell you, there are still a lot of desktop environments that have little or no Wayland support.)

However, this leaves the future of X GUI applications that use GTK somewhat up in the air. If GTK is going to seriously remain a cross-platform thing and the BSDs are effectively only doing X, then GTK needs to retain X support and GTK based applications will work on FreeBSD (at least as much as they ever do). But if the GNOME people decide that 'cross-platform' for GTK doesn't include X, the BSDs would be stuck in an awkward situation. One possibility is that there are enough people using FreeBSD (and X) with GTK applications that they would push the GNOME developers to keep GTK's X support.

(I care about this because I want to keep using X for as long as possible. One thing that would force me to Wayland is if important programs such as Firefox stop working on X because the GUI toolkits they use have dropped X support. The more pressure there is from FreeBSD people to keep the X support in toolkits, the better for me.)

Python argparse has a limitation on argument groups that makes me sad

By: cks

Argparse is the straightforward standard library module for handling command line arguments, with a number of nice features. One of those nice features is groups of mutually exclusive arguments. If people can only give one of '--quiet' and '--verbose' and both together make no sense, you can put them in a mutually exclusive group and argparse will check for you and generate an appropriate error. However, mutually exclusive groups have a little limitation that makes me sad.

Suppose, not hypothetically, that you have a Python program that has some timeouts. You'd like people using the program to be able to adjust the various sorts of timeouts away from their default values and also to be able to switch it to a mode where it never times out at all. Generally it makes no sense to adjust the timeouts and also to say not to have any timeouts, so you'd like to put these in a mutually exclusive group. If you have only a single timeout, this works fine; you can have a group with '--no-timeout' and '--timeout <TIME>' and it works. However, if you have multiple sorts of timeouts that people may want to adjust all of, this doesn't work. If you put all of the options in a single mutually exclusive group, people can only adjust one timeout, not several of them. What you want is for the '--no-timeouts' switch to be mutually exclusive with a group of all of the timeout switches.

Unfortunately, if you read the current argparse documentation, you will find this note:

Changed in version 3.11: Calling add_argument_group() or add_mutually_exclusive_group() on a mutually exclusive group is deprecated. These features were never supported and do not always work correctly. The functions exist on the API by accident through inheritance and will be removed in the future.

You can nest a mutually exclusive group inside a regular group, and there are some uses for this. But you can't nest any sort of group inside a mutually exclusive group (or a regular group inside of a regular group). At least not officially, and there are apparently known issues with doing so that won't ever be fixed, so you probably shouldn't do it at all.

Oh well, it would have been nice.

(I suspect one reason that this isn't officially supported is that working out just what was conflicting with what in a pile of nested groups (and what error message to emit) might be a bit complex and require explicit code to handle this case.)

As an extended side note, checking this by hand isn't necessarily all that easy. If you have something, such as timeouts, that have a default value but can be changed by the user, the natural way to set them up in argparse is to make the argparse default value your real default value and then use the value argparse sets in your program. If the person running the program used the switch, you'll get their value, and if not you'll get your default value, and everything works out. Unfortunately this usage makes it difficult or impossible to see if the person running your program explicitly gave a particular switch. As far as I know, argparse doesn't expose this information, so at a minimum you have to know what your default value is and then check to see if the current value is different (and this doesn't catch the admittedly unlikely case of the person using the switch with the default value).
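
The workaround I know of is to not give argparse your real defaults at all: use None as a sentinel default, check what was explicitly set, and only then fill in the real values. A minimal sketch, with invented option names:

import argparse

DEF_CONNECT_TIMEOUT = 30.0  # the real defaults live in your code,
DEF_READ_TIMEOUT = 60.0     # not in argparse

p = argparse.ArgumentParser()
p.add_argument("--no-timeouts", action="store_true")
p.add_argument("--connect-timeout", type=float, default=None)
p.add_argument("--read-timeout", type=float, default=None)
opts = p.parse_args()

given = [o for o in ("connect_timeout", "read_timeout")
         if getattr(opts, o) is not None]
if opts.no_timeouts and given:
    p.error("--no-timeouts conflicts with setting specific timeouts")

# Only now fill in the real defaults, after the conflict check.
if opts.connect_timeout is None:
    opts.connect_timeout = DEF_CONNECT_TIMEOUT
if opts.read_timeout is None:
    opts.read_timeout = DEF_READ_TIMEOUT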

Potential issues in running your own identity provider

By: cks

Over on the Fediverse, Simon Tatham had a comment about (using) cloud identity providers that's sparked some discussion. Yesterday I wrote about the facets of identity providers. Today I'm sort of writing about why you might not want to run your own identity provider, despite the hazards of depending on the security of some outside third party. I'll do this by talking about what I see as being involved in the whole thing.

The hardcore option is to rely on no outside services at all, not even for multi-factor authentication. This pretty much reduces your choices for MFA down to TOTP and perhaps WebAuthn, either with devices or with hardware keys. And of course you're going to have to manage all aspects of your MFA yourself. I'm not sure if there's capable open source software here that will let people enroll multiple second factors, handle invalidating one, and so on.
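
(As a small illustration of scale, the actual TOTP code generation is only a few lines of standard library Python, sketched below with the usual RFC 6238 defaults; everything hard about doing MFA yourself lives in the enrollment, secret storage, rate limiting, and invalidation around it.)

# Standard library TOTP (RFC 6238 over RFC 4226) with the common
# defaults of HMAC-SHA1, 30 second steps, and 6 digit codes. A real
# verifier would also accept a window of +/- one step for clock skew.
import base64, hashlib, hmac, struct, time

def totp(secret_b32, digits=6, period=30, now=None):
    key = base64.b32decode(secret_b32.upper() + "=" * (-len(secret_b32) % 8))
    counter = int((time.time() if now is None else now) // period)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)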

One facet of being an identity provider is managing identities. There's a wide variety of ways to do this; there's Unix accounts, LDAP databases, and so on. But you need a central system for it, one that's flexible enough to cope with the real world, and that system is load bearing and security sensitive. You will need to keep it secure and you'll want to keep logs and audit records, and also backups so you can restore things if it explodes (or go all the way to redundant systems for this). If the identity service holds what's considered 'personal information' in various jurisdictions, you'll need to worry about an attacker being able to bulk-extract that information, and you'll need to build enough audit trails so you can tell to what extent that happened. Your identity system will need to be connected to other systems in your organization so it knows when people appear and disappear and can react appropriately; this can be complex and may require downstream integrations with other systems (either yours or third parties) to push updates to them.

Obviously you have to handle primary authentication yourself (usually through passwords). This requires you to build and operate a secure password store as well as a way of using it for authentication, either through existing technology like LDAP or something else (this may or may not be integrated with your identity service software, as passwords are often considered part of the identity). Like the identity service but more so, this system will need logs and audit trails so you can find out when and how people authenticated to it. The log and audit information emitted by open source software may not always meet your needs, in which case you may wind up doing some hacks. Depending on how exposed this primary authentication service is, it may need its own ratelimiting and alerting on signs of potential compromised accounts or (brute force) attacks. You will also definitely want to consider reacting in some way to accounts that pass primary authentication but then fail second-factor authentication.

Finally, you will need to operate the 'identity provider' portion of things, which will probably do either or both of OIDC and SAML (but maybe you (also) need Kerberos, or Active Directory, or other things). You will have to obtain the software for this, keep it up to date, worry about its security and the security of the system or systems it runs on, make sure it has logs and audit trails that you capture, and ideally make sure it has ratelimits and other things that monitor for and react to signs of attacks, because it's likely to be a fairly exposed system.

If you're a sufficiently big organization, some or all of these services probably need to be redundant, running on multiple servers (perhaps in multiple locations) so the failure of a single server doesn't lock you out of everything. In general, all of these expose you to all of the complexities of running your own servers and services, and each and all of them are load bearing and highly security sensitive, which probably means that you should be actively paying attention to them more or less all of the time.

If you're lucky you can find suitable all-in-one software that will handle all the facets you need (identity, primary authentication, OIDC/SAML/etc IdP, and perhaps MFA authentication) in a way that works for you and your organization. If not, you're going to have to integrate various different pieces of software, possibly leaving you with quite a custom tangle (this is our situation). The all-in-one software generally seems to have a reputation of being pretty complex to set up and operate, which is not surprising given how much ground it needs to cover (and how many protocols it may need to support to interoperate with other systems that want to either push data to it or pull data and authentication from it). As an all-consuming owner of identity and authentication, my impression is that such software is also something that's hard to add to an existing environment after the fact and hard to swap out for anything else.

(So when you pick all-in-one open source software for this, you really have to hope that it stays good, reliable software for many years to come. This may mean you need to build up a lot of expertise before you commit so that you really understand your choices, and perhaps even do pilot projects to 'kick the tires' on candidate software. The modular DIY approach is more work but it's potentially easier to swap out the pieces as you learn more and your needs change.)

The obvious advantage of a good cloud identity provider is that they've already built all of these systems and they have the expertise and infrastructure to operate them well. Much like other cloud services, you can treat them as a (reliable) black box that just works. Because the cloud identity provider works at a much bigger scale than you do, they can also afford to invest a lot more into security and monitoring, and they have a lot more visibility into how attackers work and so on. In many organizations, especially smaller ones, looking after your own identity provider is a part time job for a small handful of technical people. In a cloud identity provider, it is the full time job of a bunch of developers, operations, and security specialists.

(This is much like the situation with email (also). The scale at which cloud providers operate dwarfs what you can manage. However, your identity provider is probably more security sensitive and the quality difference between doing it yourself and using a cloud identity provider may not be as large as it is with email.)

Thinking about facets of (cloud) identity providers

By: cks

Over on the Fediverse, Simon Tatham had a comment about cloud identity providers, and this sparked some thoughts of my own. One of my thoughts is that in today's world, a sufficiently large organization may have a number of facets to its identity provider situation (which is certainly the case for my institution). Breaking up identity provision into multiple facets can leave it unclear if and to what extent you could be said to be using a 'cloud identity provider'.

First off, you may outsource 'multi-factor authentication', which is to say your additional factor, to a specialist SaaS provider who can handle the complexities of modern MFA options, such as phone apps for push-based authentication approval. This SaaS provider can turn off your ability to authenticate, but they probably can't authenticate as a person all by themselves because you 'own' the first factor authentication. Well, unless you have situations where people only authenticate via their additional factor and so your password or other first factor authentication is bypassed.

Next is the potential distinction between an identity provider and an authentication source. The identity provider implements things like OIDC and SAML, and you may have to use a big one in order to get MFA support for things like IMAP. However, the identity provider can delegate authenticating people to something else you run using some technology (which might be OIDC or SAML but also could be something else). In some cases this delegation can be quite visible to people authenticating; they will show up to the cloud identity provider, enter their email address, and wind up on your web-based single sign on system. You can even have multiple identity providers all working from the same authentication source. The obvious exposure here is that a compromised identity provider can manufacture attested identities that never passed through your authentication source.

Along with authentication, someone needs to be (or at least should be) the 'system of record' as to what people actually exist within your organization, what relevant information you know about them, and so on. Your outsourced MFA SaaS and your (cloud) identity providers will probably have their own copies of this data where you push updates to them. Depending on how systems consume the IdP information and what other data sources they check (eg, if they check back in with your system of record), a compromised identity provider could invent new people in your organization out of thin air, or alter the attributes of existing people.

(Small IdP systems often delegate both password validation and knowing who exists and what attributes they have to other systems, like LDAP servers. One practical difference is whether the identity provider system asks you for the password or whether it sends you to something else for that.)

If you have no in-house authentication or 'who exists' identity system and you've offloaded all of these to some external provider (or several external providers that you keep in sync somehow), you're clearly at the mercy of that cloud identity provider. Otherwise, it's less clear and a lot more situational as to when you could be said to be using a cloud identity provider and thus how exposed you are. I think one useful line to look at is to ask whether a particular identity provider is used by third party services or if it's only used for that provider's own services. Or to put it in concrete terms, as an example, do you use Github identities only as part of using Github, or do you authenticate other things through your Github identities?

(With that said, the blast radius of just a Github (identity) compromise might be substantial, or similarly for Google, Microsoft, or whatever large provider of lots of different services that you use.)

A silly systemd wish for moving new processes around systemd units

By: cks

Linux cgroups offer a bunch of robust features for limiting resource usage and handling resource contention between different groups of processes, which you can use to implement things like per-user memory and CPU resource limits. On a systemd based system, which is to say basically almost all Linuxes today, systemd more or less completely owns the cgroup hierarchy and using cgroups for resource limits requires that the processes involved be placed inside relevant systemd units, and for that matter that the systemd units exist.

Unfortunately, the mechanisms for doing this are a little bit under-developed. If you're dealing with something that goes through PAM and for which putting processes into user slices based on the UID running them is the right answer, you can use pam_systemd (which we do for various reasons). If you want a different hierarchy and things go through PAM, you can perhaps write a PAM session module that does this, copying code from pam_systemd, but I don't know if there's anything for that today. If you have processes that are started in ways that don't go through PAM, as far as I know you're currently out of luck. One case that's quite relevant for us is Apache CGI processes run through suexec.

It would be nice to be able to do better, since the odds that everything that starts processes will pick up the ability to talk to systemd to set up slices, sessions, and so on for them seem rather low. Some things have specific magic support for this, but I don't think the process is well documented and I believe it requires that things change how they start programs (so eg suexec would have to know how to do this). This means that what I'm wishing for is a daemon that would be given some sort of rules and use them to move processes between systemd slices and other units, possibly creating things like user sessions on the fly. Then you could write a rule that said 'if a process is in the Apache system cgroup and its UID isn't <X>, put it in a slice in a user hierarchy'.

An extra problem is that this daemon probably wouldn't be perfect, since it would have to react to processes after they'd appeared rather than intercept their creation; some processes could slip through the cracks or otherwise do weird things. This would make it sort of a hack, rather than something that I suspect anyone would want as a proper feature.

(I don't know if a kernel LSM could make this more reliable by intercepting and acting on certain things, like setuid() calls.)

PS: Possibly the correct answer is to persuade the Apache people to make suexec consult PAM, even if the standard suexec PAM stack does nothing. Then you could in theory add pam_systemd or whatever there. It appears that Debian may have had a custom patch for this at one point but I believe they gave it up years and years ago.

Adding your own attributes to Python functions and Python typing

By: cks

Every so often I have some Python code where I have a collection of functions and along with the functions, some additional information about them. For example, the functions might implement subcommands and there might be information about help text, the number of command line arguments, and so on. There are a variety of approaches for this, but a very simple one I've tended to use is to put one or more additional attributes on the functions. This looks like:

def dosomething(....):
  [....]
dosomething._cmdhelp_ = "..."

(These days you might use a decorator on the function instead of explicitly attaching attributes, but I started doing this before decorators were as much of a thing in Python.)

Unfortunately, as I've recently discovered, this pattern is one that (current) Python type checkers don't really like. For perfectly rational reasons, Python type checkers like to verify that every attribute you're setting on something actually exists and isn't, say, a typo. Functions don't normally have random user-defined attributes and so type checkers will typically complain about this (as I found out in the course of recent experiments with new type checkers).

For new code, there are probably better patterns these days, such as writing a decorator that auto-registers the subcommand's function along with its help text, argument information, and so on. For existing code, this is a bit annoying, although I can probably suppress the warnings. It would be nice if type checkers understood this idiom but adding your own attributes to individual functions (or other standard Python types) is probably so rare there's no real point.
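
For what it's worth, the decorator version of the pattern keeps the extra information out of function attributes entirely, which type checkers are perfectly happy with. A minimal sketch, with invented names:

# A registration decorator instead of ad-hoc function attributes; the
# registry maps subcommand names to (function, help text) pairs.
subcommands = {}

def subcommand(name, cmdhelp):
    def register(func):
        subcommands[name] = (func, cmdhelp)
        return func
    return register

@subcommand("dosomething", "do something to the given targets")
def dosomething(args):
    ...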

(And it's not as if I'm adding this attribute to all functions in my program, only to the ones that implement subcommands.)

The moral that I draw from this is that old code that I may want to use type-inferring type checkers on (cf) may have problems beyond missing type hints and needing some simple changes to pacify the type checkers. It's probably not worth doing a big overhaul of such code to modernize it. Alternately, perhaps I want to make the code simpler and less tricky, even though it's more verbose to write (for example, explicitly listing all of the subcommand functions along with their help text and other things in a dict). The more verbose version will be easier for me in the future (or my co-workers) to follow, even if it's less clever and more typing up front.

You could automate (some) boilerplate Go error handling with a formatter

By: cks

Recently I read [ On | No ] syntactic support for error handling, where the Go team have decided that they're not going to do anything with error handling (contrary to my fears). One part of it sparked a thought about making it less annoying to write basic, boilerplate error handling, and I briefly said something on the Fediverse:

A thought on Go error handling: you could reduce the annoyance of writing the boiler plate by creating a code formatter (along the lines of goimports) that added the standard 'if (err != nil) {...}' stuff to any unhandled 'err' return in your code. Then you'd write code with just 'a, err := Thing(); b, err := Thing2(a)' and so on, and the code formatter would fill in things for you.

This thought was inspired by the model of goimports; my impression is that almost everyone uses goimports to automatically update import statements as part of (auto-)formatting their code (certainly I do). If goimports can update imports as part of 'formatting', then in theory we can extend it to add boilerplate 'check and return' error handling. This would take a set of code like this:

a, err := Thing()
b, err := Thing2(a)
return Thing3(b)

This code would be turned into:

a, err := Thing()
if (err != nil) {
  return <zero value>, err
}
b, err := Thing2(a)
if (err != nil) {
  return <zero value>, err
}
return Thing3(b)

(Here by '<zero value>' I mean some suitable zero value, like 'nil', not the literal text '<zero value>'.)

This would only happen if the error return was assigned to a variable and then the variable was unused (before potentially being reassigned). If you wrote 'a, _ := Thing()', there would be no error check added, and if you re-ran the formatter on the post-formatter code it wouldn't do anything because all of the error variables are used. Determining if the error variable was used would need some control flow analysis, and determining what to return and the syntax of creating appropriate zero values for non-error return values would take some function signature and type analysis.

(As an extra trick, if you had an 'if (err != nil) {return ...}' block already, the formatter could copy the 'return ...' into the blocks it added.)

A Go code formatter isn't the only place you could implement this feature. These days many people would be able to use it if it was a 'code action' in a LSP server such as gopls, which already supports other code actions. As a LSP server code action, you could easily apply it selectively only to a function (or a section of a function), rather than having to trust the formatter to run over your entire file. You could also have a LSP server code action that added the boilerplate error check immediately after a call, so you'd write the function call line and then pick 'add boilerplate error check' at the end.

I don't know if this is a good idea, either as a standalone formatter or as a LSP code feature. I'm not a big user of gopls code actions (I mostly ignore them) and I'm not sure I'd be happy writing code that looked like the starting point. But perhaps some people would be, especially if it was an 'expand this function call to have boilerplate error checks' in gopls.

(I don't write enough Go code to have strong feelings about the boilerplate error handling. At my scale, hand-writing the standard 'if (err != nil)' stuff is fine, and I not infrequently want to do something with the error.)

Python type checkers work in different ways and can check different things

By: cks

For all of the time so far that I've been poking at Python's type checking, I've known that there was more than one program for type checking but I've basically ignored that and used mypy. My understanding was that mypy was the first Python type checker and the only fully community-based one, with the other type checkers the product of corporations and sometimes at least partially tied to things like Microsoft's efforts to get everyone hooked on VSCode, and I assumed that the type checkers mostly differed in things like speed and what they integrated with. Recently, I read Pyrefly vs. ty: Comparing Python’s Two New Rust-Based Type Checkers (via) and discovered that I was wrong about this, and at least some Python type checkers work quite differently from mypy in ways that matter to me.

Famously (for those who've used it), mypy really wants you to add explicit typing information to your code. I believe it has some ability to deduce types for you, but at least for me it doesn't do very much to our programs without types (although part of this is that I need to turn on '--check-untyped-defs'). Other type checkers are more willing to be aggressive about deducing types from your code without explicit typing information. This is potentially interesting to me because we have a lot of code without types at work and we'll probably never add explicit type hints to it. Being able to use type checking to spot potential errors in this un-hinted code would be useful, if the various type checkers can understand the code well enough.

(In quick experiments, some of the type checkers need some additional hints, like explicitly initializing objects with the right types instead of 'None' or adding asserts to tell them that values are set. In theory they could deduce this stuff from code flow analysis, although in some cases it might need relatively sophisticated value propagation.)
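
The kind of small hint I mean looks like the following made-up example; an assert (or an explicit initial value) is often enough for a checker to narrow a 'something or None' value:

import re

def host_of(line):
    m = re.match(r"(\S+) ", line)
    # Without this assert, a checker that infers 'Match | None' for m
    # will complain about the .group() call below.
    assert m is not None
    return m.group(1)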

Discovering this means that I'm at least going to keep my eye on the alternate type checkers, and maybe add some little bits to our programs to make the type checkers happier with things. These are early days for both of the new ones from the article and my experiments suggest that some of their deduced typing is off some of the time, but I can hope that will improve with more development.

(There are also some idioms that bits of our code use that probably will never be fully accepted by type checkers, but that's another entry.)

PS: My early experiments didn't turn up anything in the code that I tried it on, but then this code is already running stably in production. It would be a bit weird to discover a significant type confusion bug in any of it at this point. Still, checking is reassuring, especially about sections of the code that aren't exercised very often, such as error handling.

I have divided (and partly uninformed) views on OpenTelemetry

By: cks

OpenTelemetry ('OTel') is one of the current in things in the broad metrics and monitoring space. As I understand it, it's fundamentally a set of standards (ie, specifications) for how things can emit metrics, logs, and traces; the intended purpose is (presumably) so that people writing programs can stop having to decide if they expose Prometheus format metrics, or Influx format metrics, or statsd format metrics, or so on. They expose one standard format, OpenTelemetry, and then everything (theoretically) can consume it. All of this has come on to my radar because Prometheus can increasingly ingest OpenTelemetry format metrics and we make significant use of Prometheus.

If OpenTelemetry is just another metrics format that things will produce and Prometheus will consume just as it consumes Prometheus format metrics today, that seems perfectly okay. I'm pretty indifferent to the metrics formats involved, presuming that they're straightforward to generate and I never have to drop everything and convert all of our things that generate (Prometheus format) metrics to generating OpenTelemetry metrics. This would be especially hard because OpenTelemetry seems to require either Protobuf or (complex) JSON, while the Prometheus metrics format is simple text.
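
To show what I mean by 'simple text', here's a rough sketch of generating the Prometheus exposition format by hand; the metric names and labels are made up for illustration:

import time


def prometheus_lines(temps):
    # Each metric is a '# TYPE' line plus one 'name{labels} value' line
    # per label combination; that's essentially the whole format.
    out = ["# HELP machineroom_temp_celsius Machine room temperature.",
           "# TYPE machineroom_temp_celsius gauge"]
    for room, temp in sorted(temps.items()):
        out.append('machineroom_temp_celsius{room="%s"} %s' % (room, temp))
    out.append("# TYPE script_generated_timestamp_seconds gauge")
    out.append("script_generated_timestamp_seconds %f" % time.time())
    return "\n".join(out) + "\n"


print(prometheus_lines({"ms2020": 21.5, "ms3030": 24.0}), end="")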

However, this is where I start getting twitchy. OpenTelemetry certainly gives off the air of being a complex ecosystem, and on top of that it also seems to be an application focused ecosystem, not a system focused one. I don't think that metrics are as highly regarded in application focused ecosystems as logs and traces are, while we care a lot about metrics and not very much about the others, at least in an OpenTelemetry context. To the extent that OpenTelemetry diverts people away from producing simple, easy to use and consume metrics, I'm going to wind up being unhappy with it. If what 'OpenTelemetry support' turns out to mean in practice is that more and more things have minimal metrics but lots of logs and traces, that will be a loss for us.

Or to put it another way, I worry that an application focused OpenTelemetry will pull the air away from the metrics focused things that I care about. I don't know how realistic this worry is. Hopefully it's not.

(Partly I'm underinformed about OpenTelemetry because, as mentioned, I often feel disconnected from the mainstream of 'observability', so I don't particularly try to keep up with it.)

Things are different between system and application monitoring

By: cks

We mostly run systems, not applications, due to our generally different system administration environment. Many organizations instead run applications. Although these applications may be hosted on some number of systems, the organizations don't care about the systems, not really; they care about how the applications work (and the systems only potentially matter if the applications have problems). It's my increasing feeling that this has created differences in the general field of monitoring such systems (as well as alerting), which is a potential issue for us because most of the attention is focused on the application area of things.

When you run your own applications, you get to give them all of the 'three pillars of observability' (metrics, traces, and logs, see here for example). In fact, emitting logs is sort of the default state of affairs for applications, and you may have to go out of your way to add metrics (my understanding is that traces can be easier). Some people even process logs to generate metrics, something that's supported by various log ingestion pipelines these days. And generally you can send your monitoring output to wherever you want, in whatever format you want, and often you can do things like structuring them.

When what you run is systems, life is a lot different. Your typical Unix system will most easily provide low level metrics about things. To the extent that the kernel and standard applications emit logs, these logs come in a variety of formats that are generally beyond your control and are generally emitted to only a few places, and the overall logs of what's happening on the system are often extremely incomplete (partly because 'what's happening on the system' is a very high volume thing). You can basically forget about having traces. In the modern Linux world of eBPF it's possible to do better if you try hard, but you'll probably be building custom tooling for your extra logs and traces so they'd better be sufficiently important (and you need the relevant expertise, which may include reading kernel and program source code).

The result is that for people like us who run systems, our first stop for monitoring is metrics and they're what we care most about; our overall unstructured logs are at best a secondary thing, and tracing some form of activity is likely to be something done only to troubleshoot problems. Meanwhile, my strong impression is that application people focus on logs and if they have them, traces, with metrics only a distant and much less important third (especially in the actual applications, since metrics can be produced by third party tools from their logs).

(This is part of why I'm so relatively indifferent to smart log searching systems. Our central syslog server is less about searching logs and much more about preserving them in one place for investigations.)

The types of TLS we see when sending email to other people (as of May 2025)

By: cks

This is a companion to my entry on the types of TLS seen for incoming email; this is for May 2025 because that's when the data I'm using comes from. This data covers nine days and about 12,800 external mail deliveries that originated from people (instead of from things like mail forwarding), and I'm going to be rounding numbers off for my own reasons.

Of those external deliveries, almost all of them used some form of TLS; basically 99% (call it 12,675, although it turns out a bunch of those should be ignored for reasons beyond the scope of this entry). Almost all of the 'real' messages used TLS 1.3; 89% used TLS 1.3 and 11% used TLS 1.2, with no other TLS versions used. Interestingly, the outgoing top level key exchanges are different than the incoming ones:

  5240  X=TLS1.3:ECDHE_SECP256R1
  2530  X=TLS1.3:ECDHE_X25519
   660  X=TLS1.2:ECDHE_SECP256R1
   220  X=TLS1.2:ECDHE_X25519
    18  X=TLS1.2:RSA
     5  X=TLS1.2:ECDHE_SECP384R1
     4  X=TLS1.2:ECDHE_SECP521R1
     2  X=TLS1.3:ECDHE_SECP384R1
     2  X=TLS1.2:DHE_CUSTOM2048

The destinations that used TLS 1.2 are much more assorted than I would have expected and they make me wonder about the cipher preferences that our Ubuntu 22.04 version of Exim is telling servers about. On the other hand, some of the surprising ones are symmetrical; for example, people at Amazon appear to genuinely be using TLS 1.2 both when receiving email from us and when sending email to their correspondents here (amazonses.com uses TLS 1.3 for outgoing email, but amazon.com doesn't seem to).

We send a lot of email to some places. One of them is hosted by Microsoft, and uses TLS 1.3 with ECDHE SECP256R1; another is GMail, which (with us) uses TLS 1.3 with ECDHE X25519. These distort the overall outgoing statistics, although there is probably a similar effect for our incoming email. Now that I'm poking at these logs, I've wound up feeling that a real analysis would have to look at the organizations running the MX targets of the domains (which is too much work for me for now).

The non-TLS destinations are mostly some groups within the university who seem to still have (old) mail servers that haven't been set up with TLS for various reasons. There are a few outside organizations, including a university or two that surprise me.

(These statistics are less interesting than I was hoping, but so it goes sometimes. I don't know unless I go look.)

The types of TLS seen on our external SMTP MX (as of May 2025)

By: cks

Back in April 2023 I did some statistics on what versions of TLS our external SMTP email gateway was seeing. Today, for reasons outside of the scope of this entry, I feel like revisiting those numbers to show how things have changed (somewhat). As with the first set of numbers, these cover the previous nine days of email to us, a fairly large computer science department in a fairly large university.

(Conveniently, our external SMTP gateway is still running Ubuntu 22.04, as it was two years ago, and so it still has the same selection of TLS versions and cipher suites and so on.)

Over the past nine days, we've received 90,938 email messages, of which 74,256 used some version of TLS, so roughly 82% of our incoming email is encrypted and 18% isn't. This latter number (and percent) is not really great, but it is what it is and it's substantially smaller than it was two years ago. What Exim reports for TLS versions breaks down as follows:

 58596  X=TLS1.3   [79%]
 15509  X=TLS1.2   [21%]
   135  X=TLS1.0
    16  X=TLS1.1

The TLS 1.0 email appears to be all or almost all from spammers, and I think it's mostly from one particularly prolific spam source (that rotates IPs, sending domains, and so on). The TLS 1.1 email is from a small handful of places, and anyway there's almost none of it. The TLS 1.2 email is a substantial portion and appears to come from a number of places, including some hosting providers, various significant universities, ieee.org, lists.ubuntu.com, and others. If we ignore various machines within the university, the non-TLS email appears to be mostly but not entirely spammers; one notable real non-TLS source is Air Canada.

The heartening thing to me is that in two years, incoming TLS versions have switched so that TLS 1.3 is dominant and TLS 1.2 has shrunk to only a fifth. I'm not sure I would have guessed that things would change that fast.

Exim conveniently formats its TLS information so I can show a top level view of the broad key exchanges in use:

 48011  X=TLS1.3:ECDHE_X25519
 11774  X=TLS1.2:ECDHE_SECP256R1
  9968  X=TLS1.3:ECDHE_SECP384R1
  1799  X=TLS1.2:ECDHE_X25519
  1631  X=TLS1.2:ECDHE_SECP521R1
   617  X=TLS1.3:ECDHE_SECP256R1
   159  X=TLS1.2:ECDHE_SECP384R1
   146  X=TLS1.2:RSA
   129  X=TLS1.0:ECDHE_SECP256R1
    12  X=TLS1.1:ECDHE_SECP521R1
     4  X=TLS1.1:RSA
     4  X=TLS1.0:RSA
     2  X=TLS1.0:ECDHE_SECP521R1

Pretty clearly, in TLS 1.3 ECDHE with X25519 has decisively won (at least in inbound SMTP) over the NIST curves, although there are still a decent number of holdouts. This dominance isn't there in TLS 1.2, where instead ECDHE with X25519 is a minority position and the dominant one is SECP256R1.
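
For what it's worth, producing these tallies doesn't take much. Below is a sketch of the sort of thing I mean; it assumes your Exim log_selector gives you an X= field and that the field uses the GnuTLS-style '__' separated cipher strings shown in raw form in the sidebar at the end of this entry (the log path is just an example):

import collections
import re
import sys

xfield = re.compile(r' X=(\S+)')
counts = collections.Counter()

# The path is an example; point it at wherever your Exim main log lives.
# You may also want to restrict this to arrival ('<=') or delivery ('=>')
# lines, depending on which direction of email you care about.
logname = sys.argv[1] if len(sys.argv) > 1 else "/var/log/exim4/mainlog"
with open(logname) as log:
    for line in log:
        m = xfield.search(line)
        if not m:
            continue
        parts = m.group(1).split(":")
        if len(parts) < 2:
            continue
        # Keep only the TLS version and the key exchange, eg
        # 'TLS1.3:ECDHE_X25519', dropping the rest of the cipher string.
        counts[parts[0] + ":" + parts[1].split("__")[0]] += 1

for what, n in counts.most_common():
    print("%7d  X=%s" % (n, what))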

Overall there were 31 different full cipher suites used, and so I'll give a little (partial) breakdown by protocol:

 24653  X=TLS1.3: ECDHE_X25519 RSA_PSS_RSAE_SHA256 AES_256_GCM: 256
 23083  X=TLS1.3: ECDHE_X25519 RSA_PSS_RSAE_SHA256 AES_128_GCM: 128
  9968  X=TLS1.3: ECDHE_SECP384R1 RSA_PSS_RSAE_SHA256 AES_256_GCM: 256
   617  X=TLS1.3: ECDHE_SECP256R1 RSA_PSS_RSAE_SHA256 AES_256_GCM: 256
   275  X=TLS1.3: ECDHE_X25519 RSA_PSS_RSAE_SHA512 AES_256_GCM: 256

  8199  X=TLS1.2: ECDHE_SECP256R1 RSA_SHA512 AES_256_GCM: 256
  1978  X=TLS1.2: ECDHE_SECP256R1 RSA_SHA256 AES_128_GCM: 128
  1590  X=TLS1.2: ECDHE_SECP521R1 RSA_SHA512 AES_256_GCM: 256
  1428  X=TLS1.2: ECDHE_SECP256R1 RSA_SHA512 AES_128_GCM: 128
   738  X=TLS1.2: ECDHE_X25519 RSA_PSS_RSAE_SHA256 AES_128_GCM: 128
 [... 21 different ones in total ...]
     1  X=TLS1.2: RSA AES_128_CBC SHA1: 128

    12  X=TLS1.1: ECDHE_SECP521R1 RSA_SHA1 AES_256_CBC SHA1: 256
     4  X=TLS1.1: RSA AES_256_CBC SHA1: 256

   129  X=TLS1.0: ECDHE_SECP256R1 RSA_SHA1 AES_256_CBC SHA1: 256
     4  X=TLS1.0: RSA AES_256_CBC SHA1: 256
     2  X=TLS1.0: ECDHE_SECP521R1 RSA_SHA1 AES_256_CBC SHA1: 256

(This has been slightly reformatted from how Exim presents its ciphers to create more word breaks.)

The least used cipher is the TLS 1.2 one above that was used once. It's somewhat amusing to me that TLS 1.3 has such an even division between X25519 ECDHE with 128-bit or 256-bit AES GCM (which was also there two years ago).

Sidebar: the TLS 1.2 RSA ciphers

Here they are:

    97  X=TLS1.2:RSA__AES_256_CBC__SHA1:256
    48  X=TLS1.2:RSA__AES_256_GCM:256
     1  X=TLS1.2:RSA__AES_128_CBC__SHA1:128

As before, I don't know how horrified I should be.

In POSIX, you can theoretically use inode zero

By: cks

When I wrote about the length of file names in early Unix, I noted that inode numbers were unsigned 16-bit integers and thus you could only have at most 65,536 inodes in any given filesystem. Over on the Fediverse, JdeBP correctly noted that I had an off by one error. The original Unix directory entry format used a zero inode number to mark deleted entries, which meant that you couldn't actually use inode zero for anything (not even the root directory of the filesystem, which needed a non-zero inode number in order to have a '.' entry).

(Contrary to what I said in that Fediverse thread, I think that V7 and earlier may not have actually had a zero inode. The magic happens in usr/sys/h/param.h in the itod() and itoo() macros. These give a result for inode 0, but I suspect they're never supposed to be used; if I'm doing it right, inode 1 is at offset 0 within block 2.)
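
For the curious, the arithmetic of those macros is easy to replay outside of C. What follows is my reading of the V7 definitions (64-byte inodes packed eight to a 512-byte block, with the inode table starting at block 2), so treat it as a sketch rather than gospel:

def itod(ino):
    "Disk block that holds inode number 'ino' (V7: (ino + 15) >> 3)."
    return (ino + 15) >> 3

def itoo(ino):
    "Offset, in inodes, of inode number 'ino' within that block."
    return (ino + 15) & 0o7

for ino in (0, 1, 8, 9):
    print(ino, itod(ino), itoo(ino))
# Inode 1 comes out at offset 0 in block 2, as mentioned above. Inode 0
# 'works' too, giving offset 7 in block 1, but block 1 is the superblock,
# which is a good hint that inode 0 was never meant to be looked up.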

Since I'm the sort of person that I am, this made me wonder if you could legally use inode zero today in a POSIX compliant system. The Single Unix Specification, which is more or less POSIX, sets out that ino_t is some unsigned integer type, but it doesn't constrain its value. Instead, inode numbers are simply called the 'file serial number' in places like sys/stat.h and dirent.h, and the stat() family of functions, readdir() and posix_getdents() don't put any restrictions on the inode numbers except that st_dev and st_ino together uniquely identify a file. In the normal way to read standards, anything not explicitly commented on is allowed, so you're allowed to return a zero for the inode value in these things (provided that there is only one per st_dev, or at least that all of them are the same file, hardlinked together).
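
The 'uniquely identify a file' requirement is easy to see in practice; it's the whole basis of checks like the following (which is also what os.path.samefile() does for you), and a zero st_ino would satisfy it perfectly well as long as it was used consistently:

import os

def same_file(a, b):
    # Two names refer to the same file exactly when both the device and
    # the inode ('file serial number') match.
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)

print(same_file("/etc/passwd", "/etc/passwd"))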

On Linux, I don't think there's any official restrictions on whether there can be a zero inode in some weird filesystem (eg, also), although kernel and user-space code may make assumptions. FreeBSD doesn't document any restrictions in stat(2) or getdirentries(2). The OpenBSD stat(2) and getdents(2) manual pages are similarly silent about whether or not the inode number can be zero.

(The tar archive format doesn't include inode numbers. The cpio archive format does, but I can't find a mention of a zero inode value having special meaning. The POSIX pax archive format is similarly silent; both cpio and pax use the inode number only as part of recording hardlinks.)

Whether it would be a good idea to make a little filesystem that returned a zero inode value for some file is another question. Enterprising people can try it and see, which these days might be possible with various 'filesystem in user space' features (although these might filter and restrict the inode numbers that you can use). My personal expectation is that there are a variety of things that expect non-zero inode numbers for any real file and might use a zero inode number as, for example, a 'no such file' signal value.

The length of file names in early Unix

By: cks

If you use Unix today, you can enjoy relatively long file names on more or less any filesystem that you care to name. But it wasn't always this way. Research V7 had 14-byte filenames, and the System III/System V lineage continued this restriction until it merged with BSD Unix, which had significantly increased this limit as part of moving to a new filesystem (initially called the 'Fast File System', for good reasons). You might wonder where this unusual number came from, and for that matter, what the file name limit was on very early Unixes (it was 8 bytes, which surprised me; I vaguely assumed that it had been 14 from the start).

I've mentioned before that the early versions of Unix had a quite simple format for directory entries. In V7, we can find the directory structure specified in sys/dir.h (dir(5) helpfully directs you to sys/dir.h), which is so short that I will quote it in full:

#ifndef	DIRSIZ
#define	DIRSIZ	14
#endif
struct	direct
{
    ino_t    d_ino;
    char     d_name[DIRSIZ];
};

To fill in the last blank, ino_t was a 16-bit (two byte) unsigned integer (and field alignment on PDP-11s meant that this structure required no padding), for a total of 16 bytes. This directory structure goes back to V4 Unix. In V3 Unix and before, directory entries were only ten bytes long, with 8 byte file names.

(Unix V4 (the Fourth Edition) was when the kernel was rewritten in C, so that may have been considered a good time to do this change. I do have to wonder how they handled the move from the old directory format to the new one, since Unix at this time didn't have multiple filesystem types inside the kernel; you just had the filesystem, plus all of your user tools knew the directory structure.)

One benefit of the change in filename size is that 16-byte directory entries fit evenly in 512-byte disk blocks (or other powers-of-two buffer sizes). You never have a directory entry that spans two disk blocks, so you can deal with directories a block at a time. Ten byte directory entries don't have this property; eight-byte ones would, but then that would leave space for only six character file names, and presumably that was considered too small even in Unix V1.
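
To make the layout concrete, here's a sketch of unpacking such 16-byte entries with Python's struct module; it assumes the PDP-11's little-endian 16-bit inode numbers and NUL-padded names, and the sample data is made up rather than taken from a real V7 filesystem image:

import struct

DIRSIZ = 14
ENTRY = struct.Struct("<H14s")     # ino_t d_ino; char d_name[DIRSIZ]
assert ENTRY.size == 16
assert 512 % ENTRY.size == 0       # entries never straddle a disk block

def entries(block):
    for d_ino, d_name in ENTRY.iter_unpack(block):
        if d_ino == 0:
            continue               # a zero inode marks a deleted entry
        yield d_ino, d_name.rstrip(b"\0").decode("ascii")

sample = ENTRY.pack(2, b".") + ENTRY.pack(2, b"..") + ENTRY.pack(5, b"unix")
print(list(entries(sample)))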

PS: That inode numbers in V7 (and earlier) were 16-bit unsigned integers does mean what you think it means; there could only be at most 65,536 inodes in a single classical V7 filesystem. If you needed more files, you had better make more filesystems. Early Unix had a lot of low limits like that, some of them quite hard-coded.

My blocking of some crawlers is an editorial decision unrelated to crawl volume

By: cks

Recently I read a lobste.rs comment on one of my recent entries that said, in part:

Repeat after me everyone: the problem with these scrapers is not that they scrape for LLM’s, it’s that they are ill-mannered to the point of being abusive. LLM’s have nothing to do with it.

This may be some people's view but it is not mine. For me, blocking web scrapers here on Wandering Thoughts is partly an editorial decision of whether I want any of my resources or my writing to be fed into whatever they're doing. I will certainly block scrapers for doing what I consider an abusive level of crawling, and in practice most of the scrapers that I block come to my attention due to their volume, but I will block low-volume scrapers because I simply don't like what they're doing it for.

Are you a 'brand intelligence' firm that scrapes the web and sells your services to brands and advertisers? Blocked. In general, do you charge for access to whatever you're generating from scraping me? Probably blocked. Are you building a free search site for a cause (and with a point of view) that I don't particularly like? Almost certainly blocked. All of this is an editorial decision on my part on what I want to be even vaguely associated with and what I don't, not a technical decision based on the scraping's effects on my site.

I am not going to even bother trying to 'justify' this decision. It's a decision that needs no justification to some and to others, it's one that can never be justified. My view is that ethics matter. Technology and our decisions of what to do with technology are not politically neutral. We can make choices, and passively not doing anything is a choice too.

(I could say a lot of things here, probably badly, but ethics and politics are in part about what sort of a society we want, and there's no such thing as a neutral stance on that. See also.)

I would block LLM scrapers regardless of how polite they are. The only difference them being politer would make is that I would be less likely to notice (and then block) them. I'm probably not alone in this view.

Our Grafana and Loki installs have quietly become 'legacy software' here

By: cks

At this point we've been running Grafana for quite some time (since late 2018), and (Grafana) Loki for rather less time and on a more ad-hoc and experimental basis. However, over time both have become 'legacy software' here, by which I mean that we (I) have frozen their versions and don't update them any more, and we (I) mostly or entirely don't touch their configurations any more (including, with Grafana, building or changing dashboards).

We froze our Grafana version due to backward compatibility issues. With Loki I could say that I ran out of enthusiasm for going through updates, but part of it was that Loki explicitly deprecated 'promtail' in favour of a more complex solution ('Alloy') that seemed to mostly neglect the one promtail feature we seriously cared about, namely reading logs from the systemd/journald complex. Another factor was that it became increasingly obvious that Loki was not intended for our simple setup and future versions of Loki might well work even worse in it than our current version does.

Part of Grafana and Loki going without updates and becoming 'legacy' is that any future changes in them would be big changes. If we ever have to update our Grafana version, we'll likely have to rebuild a significant number of our current dashboards, because they use panels that aren't supported any more and the replacements have a quite different look and effect, requiring substantial dashboard changes for the dashboards to stay decently usable. With Loki, if the current version stopped working I'd probably either discard the idea entirely (which would make me a bit sad, as I've done useful things through Loki) or switch to something else that had similar functionality. Trying to navigate the rapids of updating to a current Loki is probably roughly as much work (and has roughly as much chance of requiring me to restart our log collection from scratch) as moving to another project.

(People keep mentioning VictoriaLogs (and I know people have had good experiences with it), but my motivation for touching any part of our Loki environment is very low. It works, it hasn't eaten the server it's on and shows no sign of doing that any time soon, and I'm disinclined to do any more work with smart log collection until a clear need shows up. Our canonical source of history for logs continues to be our central syslog server.)

Intel versus AMD is currently an emotional decision for me

By: cks

I recently read Michael Stapelberg's My 2025 high-end Linux PC. One of the decisions Stapelberg made was choosing an Intel (desktop) CPU because of better (ie lower) idle power draw. This is a perfectly rational decision to make, one with good reasoning behind it, and also as I read the article I realized that it was one I wouldn't have made. Not because I don't value idle power draw; like Stapelberg's machine but more so, my desktops spend most of their time essentially idle. Instead, it was because I realized (or confirmed my opinion) that right now, I can't stand to buy Intel CPUs.

I am tired of all sorts of aspects of Intel. I'm tired of their relentless CPU product micro-segmentation across desktops and servers, with things like ECC allowed in some but not all models. I'm tired of their whole dance of P-cores and E-cores, and also of having to carefully read spec sheets to understand the P-core and E-core tradeoffs for a particular model. I'm tired of Intel just generally being behind AMD and repeatedly falling on its face with desperate warmed over CPU refreshes that try to make up for its process node failings. I'm tired of Intel's hardware design failure with their 13th and 14th generation CPUs (see eg here). I'm sure AMD Ryzens have CPU errata too that would horrify me if I knew, but they're not getting rubbed in my face the way the Intel issue is.

At this point Intel has very little going for its desktop CPUs as compared to the current generation AMD Ryzens. Intel CPUs have better idle power levels, and may have better single-core burst performance. In absolute performance I probably won't notice much difference, and unlike Stapelberg I don't do the kind of work where I really care about build speed (and if I do, I have access to much more powerful machines). As far as the idle power goes, I likely will notice the better idle power level (some of the time), but my system is likely to idle at lower power in general than Stapelberg's will, especially at home where I'll try to use the onboard graphics if at all possible (so I won't have the (idle) power price of a GPU card).

(At work I need to drive two 4K displays at 60Hz and I don't think there are many motherboards that will do that with onboard graphics, even if the CPU's built in graphics system is up to it in general.)

But I don't care about the idle power issue. If or when I build a new home desktop, I'll eat the extra 20 watts or so of idle power usage for an AMD CPU (although this may vary in practice, especially with screens blanked). And I'll do it because right now I simply don't want to give Intel my money.

My GNU Emacs settings for the vertico package (as of mid 2025)

By: cks

As covered in my Emacs packages, vertico is one of the third party Emacs packages that I have installed to modify how minibuffer completion works for me, or at least how it looks. In my experience, vertico took a significant amount of customization before I really liked it (eventually including some custom code), so I'm going to write down some notes about why I made various settings.

Vertico itself is there to always show me a number of the completion targets, as a help to narrowing in on what I want; I'm willing to trade vertical space during completion for a better view of what I'm navigating around. It's not the only way to do this (there's fido-vertical-mode in standard GNU Emacs, for example), but it's what I started with and it has a number of settings that let me control both how densely the completions are presented (and so how many of them I get to see at once) and how they're presented.

The first thing I do with vertico is override its key binding for TAB, because I want standard Emacs minibuffer tab completion, not vertico's default behavior of inserting the candidate that completion is currently on. Specifically, my key bindings are:

 :bind (:map vertico-map
             ("TAB" . minibuffer-complete)
             ;; M-v is taken by vertico
             ("M-g M-c" . switch-to-completions)
             ;; Original tab binding, which we want sometimes when
             ;; using orderless completion.
             ("M-TAB" . vertico-insert))

I normally work by using regular tab completion and orderless's completion until I'm happy, then hitting M-TAB if necessary and then RET. I use M-g M-c so rarely that I'd forgotten it until writing this entry. Using M-TAB is especially likely for a long filename completion, where I might use the cursor keys (or theoretically the mouse) to move vertico's selection to a directory and then hit M-TAB to fill it in so I can then tab-complete within it.

Normally, vertico displays a single column of completion candidates, which potentially leaves a lot of wasted space on the right; I use marginalia to add information for some sorts of completion targets (such as Emacs Lisp function names) in this space. For other sorts of completions where there's no particular additional information, such as MH-E mail folder names, I use vertico's vertico-multiform-mode to switch to a vertico-grid so that I fill the space with several columns of completion candidates and reduce the number of vertical lines that vertico uses (both are part of vertico's extensions).

(I also have vertico-mouse enabled when I'm using Emacs under X, but in practice I mostly don't use it.)

Another important change (for me) is to turn off vertico's default behavior of remembering the history of your completions and putting recently used entries first in the list. This sounds like a fine idea, but in practice I want my completion order to be completely predictable and I'm rarely completing the same thing over and over again. The one exception is my custom MH-E folder completion, where I do enable history because I may be, for example, refiling messages into one of a few folders. This is done through another extension, vertico-sort, or at least I think it is.

(When vertico is installed as an ELPA or MELPA package and then use-package'd, you apparently get all of the extensions without necessarily having to specifically enable them and can just use bits from them.)

My feeling is that effective use of vertico probably requires this sort of customization if you regularly use minibuffer completion for anything beyond standard things where vertico (and possibly marginalia) can make good use of all of your horizontal space. Beyond what key bindings and other vertico behavior you can stand and what behavior you have to change, you want to figure out how to tune vertico so that it's significantly useful for each thing you regularly complete, instead of mostly showing you a lot of empty space and useless results. This is intrinsically a relatively personal thing.

PS: One area where vertico's completion history is not as useful as it looks is filename completion or anything that looks like it (such as standard MH-E folder completion). This is because Emacs filename completion and thus vertico's history happens component by component, while you probably want your history to give you the full path that you wound up completing.

PPS: I experimented with setting vertico-resize, but found that the resulting jumping around was too visually distracting.

A thought on JavaScript "proof of work" anti-scraper systems

By: cks

One of the things that people are increasingly using these days to deal with the issue of aggressive LLM and other web scrapers is JavaScript based "proof of work" systems, where your web server requires visiting clients to run some JavaScript to solve a challenge; one such system (increasingly widely used) is Xe Iaso's Anubis. One of the things that people say about these systems is that LLM scrapers will just start spending the CPU time to run this challenge JavaScript, and LLM scrapers may well have lots of CPU time available through means such as compromised machines. One of my thoughts is that things are not quite as simple for the LLM scrapers as they look.

An LLM scraper is operating in a hostile environment (although its operator may not realize this). In a hostile environment, dealing with JavaScript proof of work systems is not as simple as simply running it, because you can't particularly tell a JavaScript proof of work system from JavaScript that does other things. Letting your scraper run JavaScript means that it can also run JavaScript for other purposes, for example for people who would like to exploit your scraper's CPU to do some cryptocurrency mining, or simply have you run JavaScript for as long as you'll let it keep going (perhaps because they've recognized you as a LLM scraper and want to waste as much of your CPU as possible).

An LLM scraper can try to recognize a JavaScript proof of work system but this is a losing game. The other parties have every reason to make themselves look like a proof of work system, and the proof of work systems don't necessarily have an interest in being recognized (partly because this might allow LLM scrapers to short-cut their JavaScript with optimized host implementations of the challenges). And as both spammers and cryptocurrency miners have demonstrated, there is no honor among thieves. If LLM scrapers dangle free computation in front of people, someone will spring up to take advantage of it. This leaves LLM scrapers trying to pick a JavaScript runtime limit that doesn't cut them off from too many sites, while sites can try to recognize LLM scrapers and increase their proof of work difficulty if they see a suspect.

(This is probably not an original thought, but it's been floating around my head for a while.)

PS: JavaScript proof of work systems aren't the greatest thing, but they're going to happen unless someone convincingly demonstrates a better alternative.

What keeps Wandering Thoughts more or less free of comment spam (2025 edition)

By: cks

Like everywhere else, Wandering Thoughts (this blog) gets a certain amount of automated comment spam attempts. Over the years I've fiddled around with a variety of anti-spam precautions, although not all of them have worked out over time. It's been a long time since I've written anything about this, because one particular trick has been extremely effective ever since I introduced it.

That one trick is a honeypot text field in my 'write a comment' form. This field is normally hidden by CSS, and in any case the label for the field says not to put anything in it. However, for a very long time now, automated comment spam systems seem to operate by stuffing some text into every (text) form field that they find before they submit the form, which always trips over this. I log the form field's text out of curiosity; sometimes it's garbage and sometimes it's (probably) meaningful for the spam comment that the system is trying to submit.
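
The server side check is about as simple as the idea sounds. This isn't the actual code behind Wandering Thoughts, just the general shape of it, with a made-up field name:

def looks_like_form_spam(form):
    # 'website2' is a hypothetical honeypot field name; it's hidden by CSS
    # and the label tells humans to leave it empty.
    honeypot = form.get("website2", "")
    if honeypot.strip():
        # Log what got stuffed into the field; sometimes it's interesting.
        print("honeypot tripped:", honeypot[:200])
        return True
    return False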

Obviously this doesn't stop human-submitted spam, which I get a small amount of every so often. In general I don't expect anything I can reasonably do to stop humans who do the work themselves; we've seen this play out in email and I don't have any expectations that I can do better. It also probably wouldn't work if I was using a popular platform that had this as a general standard feature, because then it would be worth the time of the people writing automated comment spam systems to automatically recognize it and work around it.

Making comments on Wandering Thoughts also has an additional small obstacle in the way of automated comment spammers, which is that you must initially preview your comment before you can submit it (although you don't have to submit the comment that you previewed, you can edit it after the first preview). Based on a quick look at my server logs, I don't think this matters to the current automated comment spam systems that try things here, as they only appear to try submitting once. I consider requiring people to preview their comment before posting it to be a good idea in general, especially since Wandering Thoughts uses a custom wiki-syntax and a forced preview gives people some chance of noticing any mistakes.

(I think some amount of people trying to write comments here do miss this requirement and wind up not actually posting their comment in the end. Or maybe they decide not to after writing one version of it; server logs give me only so much information.)

In a world that is increasingly introducing various sorts of aggressive precautions against LLM crawlers, including 'proof of work' challenges, all of this may become increasingly irrelevant. This could go either way; either the automated comment spammers die off as more and more systems have protections that are too aggressive for them to deal with, or the automated systems become increasingly browser-based and sidestep my major precaution because they no longer 'see' the honeypot field.

Fedora's DNF 5 and the curse of mandatory too-smart output

By: cks

DNF is Fedora's high(er) level package management system, which pretty much any system administrator is going to have to use to install and upgrade packages. Fedora 41 and later have switched from DNF 4 to DNF 5 as their normal (and probably almost mandatory) version of DNF. I ran into some problems with this switch, and since then I've found other issues, all of which boil down to a simple issue: DNF 5 insists on doing too-smart output.

Regardless of what you set your $TERM to and what else you do, if DNF 5 is connected to a terminal (and perhaps if it isn't), it will pretty-print its output in an assortment of ways. As far as I can tell it simply assumes ANSI cursor addressability, among other things, and will always fit its output to the width of your terminal window, truncating output as necessary. This includes output from RPM package scripts that are running as part of the update. Did one of them print a line longer than your current terminal width? Tough, it was probably truncated. Are you using script so that you can capture and review all of the output from DNF and RPM package scripts? Again, tough, you can't turn off the progress bars and other things that will make a complete mess of the typescript.

(It's possible that you can find the information you want in /var/log/dnf5.log in un-truncated and readable form, but if so it's buried in debug output and I'm not sure I trust dnf5.log in general.)

DNF 5 is far from the only offender these days. An increasing number of command line programs simply assume that they should always produce 'smart' output (ideally only if they're connected to a terminal). They have no command line option to turn this off and since they always use 'ANSI' escape sequences, they ignore the tradition of '$TERM' and especially 'TERM=dumb' to turn that off. Some of them can specifically disable colour output (typically with one of a number of environment variables, which may or may not be documented, and sometimes with a command line option), but that's usually the limit of their willingness to stop doing things. The idea of printing one whole line at a time as you do things and not printing progress bars, interleaving output, and so on has increasingly become a non-starter for modern command line tools.

(Another semi-offender is Debian's 'apt' and also 'apt-get' to some extent, although apt-get's progress bars can be turned off and 'apt' is explicitly a more user friendly front end to apt-get and friends.)

PS: I can't run DNF with its output directed into a file because it wants you to interact with it to approve things, and I don't feel like letting it run freely without that.

Thinking about what you'd want in a modern simple web server

By: cks

Over on the Fediverse, I said:

I'm currently thinking about what you'd want in a simple modern web server that made life easy for sites that weren't purely static. I think you want CGI, FastCGI, and HTTP reverse proxying, plus process supervision. Automatic HTTPS of course. Rate limiting support, and who knows what you'd want to make it easier to deal with the LLM crawler problem.

(This is where I imagine a 'stick a third party proxy in the middle' mode of operation.)

What I left out of my Fediverse post is that this would be aimed at small scale sites. Larger, more complex sites can and should invest in the power, performance, and so on of headline choices like Apache and Nginx. And yes, one obvious candidate in this area is Caddy, but at the same time something that has "more scalable" (than alternatives) as a headline feature is not really targeting the same area as I'm thinking of.

This goal of simplicity of operation is why I put "process supervision" into the list of features. In a traditional reverse proxy situation (whether this is FastCGI or HTTP), you manage the reverse proxy process separately from the main webserver, but that requires more work from you. Putting process supervision into the web server has the goal of making all of that more transparent to you. Ideally, in common configurations you wouldn't even really care that there was a separate process handling FastCGI, PHP, or whatever; you could just put things into a directory or add some simple configuration to the web server and restart it, and everything would work. Ideally this would extend to automatically supporting PHP by just putting PHP files somewhere in the directory tree, just like CGI; internally the web server would start a FastCGI process to handle them or something.

(Possibly you'd implement CGI through a FastCGI gateway, but if so this would be more or less pre-configured into the web server and it'd ship with a FastCGI gateway for this (and for PHP).)

This is also the goal for making it easy to stick a third party filtering proxy in the middle of processing requests. Rather than having to explicitly set up two web servers (a frontend and a backend) with an anti-LLM filtering proxy in the middle, you would write some web server configuration bits and then your one web server would split itself into a frontend and a backend with the filtering proxy in the middle. There's no technical reason you can't do this, and even control what's run through the filtering proxy and what's served directly by the front end web server.

This simple web server should probably include support for HTTP Basic Authentication, so that you can easily create access restricted areas within your website. I'm not sure if it should include support for any other sort of authentication, but if it did it would probably be OpenID Connect (OIDC), since that would let you (and other people) authenticate through external identity providers.

It would be nice if the web server included some degree of support for more or less automatic smart in-memory (or on-disk) caching, so that if some popular site linked to your little server, things wouldn't explode (or these days, if a link to your site was shared on the Fediverse and all of the Fediverse servers that it propagated to immediately descended on your server). At the very least there should be enough rate limiting that your little server wouldn't fall over, and perhaps some degree of bandwidth limits you could set so that you wouldn't wake up to discover you had run over your outgoing bandwidth limits and were facing large charges.

I doubt anyone is going to write such a web server, since this isn't likely to be the kind of web server that sets the world on fire, and probably something like Caddy is more or less good enough.

(Doing a good job of writing such a server would also involve a fair amount of research to learn what people want to run at a small scale, how much they know, what sort of server resources they have or want to use, what server side languages they wind up using, what features they need, and so on. I certainly don't know enough about the small scale web today.)

PS: One reason I'm interested in this is that I'd sort of like such a server myself. These days I use Apache and I'm quite familiar with it, but at the same time I know it's a big beast and sometimes it has entirely too many configuration options and special settings and so on.

The five platforms we have to cover when planning systems

By: cks

Suppose, not entirely hypothetically, that you're going to need a 'VPN' system that authenticates through OIDC. What platforms do you need this VPN system to support? In our environment, the answer is that we have five platforms that we need to care about, and they're the obvious four plus one more: Windows, macOS, iOS, Android, and Linux.

We need to cover these five platforms because people here use our services from all of those platforms. Both Windows and macOS are popular on laptops (and desktops, which still linger around), and there are enough people who use Linux that it's something we need to care about. On mobile devices (phones and tablets), obviously iOS and Android are the two big options, with people using either or both. We don't usually worry about the versions of Windows and macOS and suggest that people stick to supported ones, but that may need to change with Windows 10.

Needing to support mobile devices unquestionably narrows our options for what we can use, at least in theory, because there are certain sorts of things you can semi-reasonably do on Linux, macOS, and Windows that are infeasible to do (at least for us) on mobile devices. But we have to support access to various of our services even on iOS and Android, which constrains us to certain sorts of solutions, and ideally ones that can deal with network interruptions (which are quite common on mobile devices in Toronto, as anyone who takes our subways is familiar with).

(And obviously it's easier for open source systems to support Linux, macOS, and Windows than it is for them to extend this support to Android and especially iOS. This extends to us patching and rebuilding them for local needs; with various modern languages, we can produce Windows or macOS binaries from modified open source projects. Not so much for mobile devices.)

In an ideal world it would be easy to find out the support matrix of platforms (and features) for any given project. In this world, the information can sometimes be obscure, especially for what features are supported on what platforms. One of my resolutions to myself is that when I find interesting projects but they seem to have platform limitations, I should note down where in their documentation they discuss this, so I can find it later to see if things have changed (or to discuss with people why certain projects might be troublesome).

Python, type hints, and feeling like they create a different language

By: cks

At this point I've only written a few, relatively small programs with type hints. At times when doing this, I've wound up feeling that I was writing programs in a language that wasn't quite exactly Python (but obviously was closely related to it). What was idiomatic in one language was non-idiomatic in the other, and I wanted to write code differently. This feeling of difference is one reason I've kept going back and forth over whether I should use type hints (well, in personal programs).

Looking back, I suspect that this is partly a product of a style where I tried to use typing.NewType a lot. As I found out, this may not really be what I want to do. Using type aliases (or just structural descriptions of the types) seems like it's going to be easier, since it's mostly just a matter of marking up things. I also suspect that this feeling that typed Python is a somewhat different language from plain Python is a product of my lack of experience with typed Python (which I can fix by doing more with types in my own code, perhaps revising existing programs to add type annotations).

However, I suspect some of this feeling of difference is that you (I) want to structure 'typed' Python code differently than untyped code. In untyped Python, duck typing is fine, including things like returning None or some meaningful type, and you can to a certain extent pass things around without caring what type they are. In this sort of situation, typed Python has pushed me toward narrowing the types involved in my code (although typing.Optional can help here). Sometimes this is a good thing; at other times, I wind up using '0.0' to mean 'this float value is not set' when in untyped Python I would use 'None' (because propagating the type difference of the second way through the code is too annoying). Or to put it another way, typed Python feels less casual, and there are good and bad aspects to this.
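
Here's the tradeoff I mean in miniature, with hypothetical function names. The Optional version is more honest about 'not set', but every caller now has to deal with None; the sentinel version keeps a plain float at the cost of 0.0 quietly meaning 'no value':

from typing import Optional


def last_temp_optional(readings: list[float]) -> Optional[float]:
    return readings[-1] if readings else None


def last_temp_sentinel(readings: list[float]) -> float:
    # 0.0 stands in for 'no reading', which the type no longer captures.
    return readings[-1] if readings else 0.0


readings: list[float] = []
t = last_temp_optional(readings)
if t is not None:    # the check that type checkers force on every caller
    print("last reading:", t)
print("with the sentinel:", last_temp_sentinel(readings))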

Unfortunately, one significant source of Python code that I work on is effectively off limits for type hints, and that's the Python code I write for work. For that code, I need to stick to the subset of Python that my co-workers know and can readily understand, and that subset doesn't include Python's type hints. I could try to teach my co-workers about type hints, but my view is that if I'm wrestling with whether it's worth it, my co-workers will be even less receptive to the idea of trying to learn and remember them (especially when they look at my Python code only infrequently). If we were constantly working with medium to large Python programs where type hints were valuable for documenting things and avoiding irritating errors it would be one thing, but as it is our programs are small and we can go months between touching any Python code. I care about Python type hints and have active exposure to them, and even I have to refresh my memory on them from time to time.

(Perhaps some day type hints will be pervasive enough in third party Python code and code examples that my co-workers will absorb and remember them through osmosis, but that day isn't today.)

The lack of a good command line way to sort IPv6 addresses

By: cks

A few years ago, I wrote about how 'sort -V' can sort IPv4 addresses into their natural order for you. Even back then I was smart enough to put in that 'IPv4' qualification and note that this didn't work with IPv6 addresses, and said that I didn't know of any way to handle IPv6 addresses with existing command line tools. As far as I know, that remains the case today, although you can probably build a Perl, Python, or other language program that does such sorting for you if you need to do this regularly.

Unix tools like 'sort' are pretty flexible, so you might innocently wonder why it can't be coerced into sorting IPv6 addresses. The first problem is that IPv6 addresses are written in hex without leading 0s, not decimal. Conventional sort will correctly sort hex numbers if all of the numbers are the same length, but IPv6 addresses are written in hex groups that conventionally drop leading zeros, so you will have 'ff' instead of '00ff' in common output (or '0' instead of '0000'). The second and bigger problem is the IPv6 '::' notation, which stands for the longest run of all-zero fields, ie some number of '0000' fields.

(I'm ignoring IPv6 scopes and zones for this, let's assume we have public IPv6 addresses.)

If IPv6 addresses were written out in full, with leading 0s on fields and all their 0000 fields, you could handle them as a simple conventional sort (you wouldn't even need to tell sort that the field separator was ':'). Unfortunately they almost never are, so you need to either transform them to that form, print them out, sort the output, and perhaps transform them back, or read them into a program as 128-bit numbers, sort the numbers, and print them back out as IPv6 addresses. Ideally your language of choice for this has a way to sort a collection of IPv6 addresses.
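
In Python's case the standard library already has the necessary type; the ipaddress module parses the '::' and dropped-leading-zero forms, and its address objects compare as 128-bit numbers. That makes a sorting filter a pretty small program (this sketch assumes one IPv6 address per line on standard input):

import ipaddress
import sys

addrs = [ipaddress.ip_address(line.strip())
         for line in sys.stdin if line.strip()]
for a in sorted(addrs):
    print(a)
    # print(a.exploded) would instead emit the fully written out form,
    # which is also what you'd generate if you wanted plain 'sort' to
    # handle the result later.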

The very determined can probably do this with awk with enough work (people have done amazing things in awk). But my feeling is that doing this in conventional Unix command line tools is a Turing tarpit; you might as well use a language where there's a type of IPv6 addresses that exposes the functionality that you need.

(And because IPv6 addresses are so complex, I suspect that GNU Sort will never support them directly. If you need GNU Sort to deal with them, the best option is a program that turns them into their full form.)

PS: People have probably written programs to sort IPv6 addresses, but with the state of the Internet today, the challenge is finding them.

It's not obvious how to verify TLS client certificates issued for domains

By: cks

TLS server certificate verification has two parts; you first verify that the TLS certificate is a valid, CA-signed certificate, and then you verify that the TLS certificate is for the host you're connecting to. One of the practical issues with TLS 'Client Authentication' certificates for host and domain names (which are on the way out) is that there's no standard meaning for how you do the second part of this verification, and if you even should. In particular, what host name are you validating the TLS client certificate against?

Some existing protocols provide the 'client host name' to the server; for example, SMTP has the EHLO command. However, existing protocols tend not to have explicitly standardized using this name (or any specific approach) for verifying a TLS client certificate if one is presented to the server, and large mail providers vary in what they send as a TLS client certificate in SMTP conversations. For example, Google's use of 'smtp.gmail.com' doesn't match any of the other names available, so its only meaning is 'this connection comes from a machine that has access to private keys for a TLS certificate for smtp.gmail.com', which hopefully means that it belongs to GMail and is supposed to be used for this purpose.

If there is no validation of the TLS client certificate host name, that is all that a validly signed TLS client certificate means; the connecting host has access to the private keys and so can be presumed to be 'part of' that domain or host. This isn't nothing, but it doesn't authenticate what exactly the client host is. If you want to validate the host name, you have to decide what to validate against and there are multiple answers. If you design the protocol you can have the protocol send a client host name and then validate the TLS certificate against the hostname; this is slightly better than using the TLS certificate's hostname as is in the rest of your processing, since the TLS certificate might have a wildcard host name. Otherwise, you might validate the TLS certificate host name against its reverse DNS, which is more complicated than you might expect and which will fail if DNS isn't working. If the TLS client certificate doesn't have a wildcard, you could also try to look up the IP addresses associated with the host names in the TLS certificate and see if any of the IP addresses match, but again you're depending on DNS.
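
As an illustration of the reverse DNS flavour of this (and only that flavour; there's no standard here), below is a sketch of what a server might do with Python's ssl module. The wildcard handling is deliberately crude, and getpeercert() only hands back a populated dictionary if the TLS layer already validated the certificate chain:

import socket
import ssl


def client_cert_matches_ptr(conn: ssl.SSLSocket) -> bool:
    ip = conn.getpeername()[0]
    try:
        ptr_name = socket.gethostbyaddr(ip)[0].lower()
    except OSError:
        return False        # no working reverse DNS is one failure mode
    cert = conn.getpeercert()
    if not cert:
        return False        # no client certificate was presented
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "DNS":
            continue
        name = value.lower()
        if name == ptr_name:
            return True
        if name.startswith("*.") and ptr_name.endswith(name[1:]):
            return True     # crude wildcard matching, on purpose
    return False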

(You can require non-wildcard TLS certificate names in your protocol, but people may not like it for various reasons.)

This dependency on DNS for TLS client certificates is different from the DNS dependency for TLS server certificates. If DNS doesn't work for the server case, you're not connecting at all since you have no target IPs; if you can connect, you have a target hostname to validate against (in the straightforward case of using a hostname instead of an IP address). In the TLS client certificate case, the client can connect but then the TLS server may deny it access for apparently arbitrary reasons.

That your protocol has to specifically decide what verifying TLS client certificates means (and there are multiple possible answers) is, I suspect, one reason that TLS client certificates aren't used more in general Internet protocols. In turn this is a disincentive for servers implementing TLS-based protocols (including SMTP) from telling TLS clients that they can send a TLS client certificate, since it's not clear what you should do with it if one is sent.

Let's Encrypt drops "Client Authentication" from its TLS certificates

By: cks

The TLS news of the time interval is that Let's Encrypt certificates will no longer be usable to authenticate your client to a TLS server (via a number of people on the Fediverse). This is driven by a change in Chrome's "Root Program", covered in section 3.2, with a further discussion of this in Chrome's charmingly named Moving Forward, Together in the "Understanding dedicated hierarchies" section; apparently only half of the current root Certificate Authorities actually issue TLS server certificates. As far as I know this is not yet a CA/Browser Forum requirement, so this is all driven by Chrome.

In TLS client authentication, a TLS client (the thing connecting to a TLS server) can present its own TLS certificate to the TLS server, just as the TLS server presents its certificate to the client. The server can then authenticate the client certificate however it wants to, although how to do this is not as clear as when you're authenticating a TLS server's certificate. To enable this usage, a TLS certificate and the entire certificate chain must be marked as 'you can use these TLS certificates for client authentication' (and similarly, a TLS certificate that will be used to authenticate a server to clients must be marked as such). That marking is what Let's Encrypt is removing.
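
You can look at this marking for yourself with OpenSSL (1.1.1 or later for the '-ext' option); 'cert.pem' here is a stand-in for whatever certificate you want to inspect:

openssl x509 -in cert.pem -noout -ext extendedKeyUsage

Today a Let's Encrypt certificate should show both 'TLS Web Server Authentication' and 'TLS Web Client Authentication' in the output; after the change, only the server usage will be listed.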

This doesn't affect public web PKI, which basically never used conventional CA-issued host and domain TLS certificates as TLS client certificates (websites that used TLS client certificates used other sorts of TLS certificates). It does potentially affect some non-web public TLS, where domain TLS certificates have seen small usage in adding more authentication to SMTP connections between mail systems. I run some spam trap SMTP servers that advertise that sending mail systems can include a TLS client certificate if the sender wants to, and some senders (including GMail and Outlook) do send proper public TLS certificates (and somewhat more SMTP senders include bad TLS certificates). Most mail servers don't, though, and given that one of the best sources of free TLS certificates has just dropped support for this usage, that's unlikely to change. Let's Encrypt's TLS certificates can still be used by your SMTP server for receiving email, but you'll no longer be able to use them for sending it.

On the one hand, I don't think this is going to have material effects on much public Internet traffic and TLS usage. On the other hand, it does cut off some possibilities in non-web public TLS, at least until someone starts up a free, ACME-enabled Certificate Authority that will issue TLS client certificates. And probably some number of mail servers will keep sending their TLS certificates to people as client certificates even though they're no longer valid for that purpose.

PS: If you're building your own system and you want to, there's nothing stopping you from accepting public TLS server certificates from TLS clients (although you'll have to tell your TLS library to validate them as TLS server certificates, not client certificates, since they won't be marked as valid for TLS client usage). Doing the security analysis is up to you but I don't think it's a fatally flawed idea.

Classical "Single user computers" were a flawed or at least limited idea

By: cks

Every so often people yearn for a lost (1980s or so) era of 'single user computers', whether these are simple personal computers or high end things like Lisp machines and Smalltalk workstations. It's my view that the whole idea of a 1980s style "single user computer" is not what we actually want and has some significant flaws in practice.

The platonic image of a single user computer in this style was one where everything about the computer (or at least its software) was open to your inspection and modification, from the very lowest level of the 'operating system' (which was more of a runtime environment than an OS as such) to the highest things you interacted with (both Lisp machines and Smalltalk environments often touted this as a significant attraction, and it's often repeated in stories about them). In personal computers this was a simple machine that you had full control over from system boot onward.

The problem is that this unitary, open environment is (or was) complex and often lacked resilience. Famously, in the case of early personal computers, you could crash the entire system with programming mistakes, and if there's one thing people do all the time, it's make mistakes. Most personal computers mitigated this by only doing one thing at once, but even then it was unpleasant, and the Amiga would let you blow multiple processes up at once if you could fit them all into RAM. Even on better protected systems, like Lisp and Smalltalk, you still had the complexity and connectedness of a unitary environment.

One of the things that we've learned from computing over the past N decades is that separation, isolation, and abstraction are good ideas. People can only keep track of so many things in their heads at once, and modularity (in the broad sense) is one large way we keep things within that limit (or at least closer to it). Single user computers were quite personal but usually not very modular. There are reasons that people moved to computers with things like memory protection, multiple processes, and various sorts of privilege separation.

(Let us not forget the great power of just having things in separate objects, where you can move around or manipulate or revert just one object instead of 'your entire world'.)

I think that there is a role for computers that are unapologetically designed to be used by only a single person who is in full control of everything and able to change it if they want to. But I don't think any of the classical "single user computer" designs are how we want to realize a modern version of the idea.

(As a practical matter I think that a usable modern computer system has to be beyond the understanding of any single person. There is just too much complexity involved in anything except very restricted computing, even if you start from complete scratch. This implies that an 'understandable' system really needs strong boundaries between its modules so that you can focus on the bits that are of interest to you without having to learn lots of things about the rest of the system or risk changing things you don't intend to.)

Two broad approaches to having Multi-Factor Authentication everywhere

By: cks

In this modern age, more and more people are facing more and more pressure to have pervasive Multi-Factor Authentication, with every authentication your people perform protected by MFA in some way. I've come to feel that there are two broad approaches to achieving this and one of them is more realistic than the other, although it's also less appealing in some ways and less neat (and arguably less secure).

The 'proper' way to protect everything with MFA is to separately and individually add MFA to everything you have that does authentication. Ideally you will have a central 'single sign on' system, perhaps using OIDC, and certainly your people will want you to have only one form of MFA even if it's not all run through your SSO. What this implies is that you need to add MFA to every service and protocol you have, which ranges from generally easy (websites) through being annoying to people or requiring odd things (SSH) to almost impossible at the moment (IMAP, authenticated SMTP, and POP3). If you opt to set it up with no exemptions for internal access, this approach to MFA ensures that absolutely everything is MFA protected without any holes through which an un-MFA'd authentication can be done.

The other way is to create some form of MFA-protected network access (a VPN, a mesh network, a MFA-authenticated SSH jumphost, there are many options) and then restrict all non-MFA access to coming through this MFA-protected network access. For services where it's easy enough, you might support additional MFA authenticated access from outside your special network. For other services where MFA isn't easy or isn't feasible, they're only accessible from the MFA-protected environment and a necessary step for getting access to them is to bring up your MFA-protected connection. This approach to MFA has the obvious problem that if someone gets access to your MFA-protected network, they have non-MFA access to everything else, and the not as obvious problem that attackers might be able to MFA as one person to the network access and then do non-MFA authentication as another person on your systems and services.

The proper way is quite appealing to system administrators. It gives us an array of interesting challenges to solve, neat technology to poke at, and appealingly strong security guarantees. Unfortunately the proper way has two downsides: there's essentially no chance of it covering your IMAP and authenticated SMTP services any time soon (unless you're willing to accept some significant restrictions), and it requires your people to learn and use a bewildering variety of special purpose, one-off interfaces and sometimes software (and when it needs software, there may be restrictions on what platforms the software is readily available on). Although it's less neat and less nominally secure, the practical advantage of the MFA-protected network access approach is that it's universal and it's one single thing for people to deal with (and by extension, as long as the network system itself covers all platforms you care about, your services are fully accessible from all platforms).

(In practice the MFA protected network approach will probably be two things for people to deal with, not one, since if you have websites the natural way to protect them is with OIDC (or if you have to, SAML) through your single sign on system. Hopefully your SSO system is also what's being used for the MFA network access, so people only have to sign on to it once a day or whatever.)

Using awk to check your script's configuration file

By: cks

Suppose, not hypothetically, that you have a shell script with a relatively simple configuration file format that people can still accidentally get wrong. You'd like to check the configuration file for problems before you use it in the rest of your script, for example by using it with 'join' (where things like the wrong number or type of fields will be a problem). Recently on the Fediverse I shared how I was doing this with awk, so here's a slightly more elaborate and filled out version:

errs=$(awk '
         $1 ~ "^#" { next }
         NF != 3 {
            printf " line %d: wrong number of fields\n", NR;
            next }
         [...]
         ' "$cfg_file"
       )

if [ -n "$errs" ]; then
   echo "$prog: Errors found in '$cfg_file'. Stopping." 1>&2
   echo "$errs" 1>&2
   exit 1
fi

(Here I've chosen to have awk's diagnostic messages indented by one space when the script prints them out, hence the space before 'line %d: ...'.)

The advantage of having awk simply print out the errors it detects and letting the script deal with them later is that you don't need to mess around with awk's exit status; your awk program can simply print what it finds and be done. Using awk for the syntax checks is handy because it lets you express a fair amount of logic and checks relatively simply (you can even check for duplicate entries and so on), and it also gives you line numbers for free.

One trick with using awk in this way is to progressively filter things in your checks (by skipping further processing of the current line with 'next'). We start out by skipping all comments, then we report and otherwise skip every line with the wrong number of fields, and then every check after this can assume that at least we have the right number of fields so it can confidently check what should be in each one. If the number of fields in a line is wrong there's no point in complaining about how one of them has the wrong sort of value, and the early check and 'next' to skip the rest of this line's processing is the simple way.

If you're also having awk process the configuration file later you might be tempted to have it check for errors at the same time, in an all-in-one awk program, but my view is that it's simpler to split the error checking from the processing. That way you don't have to worry about stopping the processing if you detect errors or intermingle processing logic with checking logic. You do have to make sure the two versions have the same handling of comments and so on, but in simple configuration file formats this is usually easy.

(Speaking from personal experience, you don't want to use '$1 == "#"' as your comment definition, because then you can't just stick a '#' in front of an existing configuration file line to comment it out. Instead you have to remember to make it '# ', and someday you'll forget.)

PS: If your awk program is big and complex enough, it might make more sense to use a here document to create a shell variable containing it, which will let you avoid certain sorts of annoying quoting problems.
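
For illustration, here's a sketch of that here document approach, reusing the checks from above (the variable name is arbitrary):

# A quoted here document keeps the awk program free of shell expansion
# and avoids wrestling with nested single quotes.
awkprog=$(cat <<'EOF'
$1 ~ "^#" { next }
NF != 3 {
   printf " line %d: wrong number of fields\n", NR;
   next }
EOF
)

errs=$(awk "$awkprog" "$cfg_file")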

Our need for re-provisioning support in mesh networks (and elsewhere)

By: cks

In a comment on my entry on how WireGuard mesh networks need a provisioning system, vcarceler pointed me to Innernet (also), an interesting but opinionated provisioning system for WireGuard. However, two bits of it combined made me twitch a bit; Innernet only allows you to provision a given node once, and once a node is assigned an internal IP, that IP is never reused. This lack of support for re-provisioning machines would be a problem for us and we'd likely have to do something about it, one way or another. Nor is this an issue unique to Innernet, as a number of mesh network systems have it.

Our important servers have fixed, durable identities, and in practice these identities are both DNS names and IP addresses (we have some generic machines, but they aren't as important). We also regularly re-provision these servers, which is to say that we reinstall them from scratch, usually on new hardware. In the usual course of events this happens roughly every two years or every four years, depending on whether we're upgrading the machine for every Ubuntu LTS release or every other one. Over time this is a lot of re-provisionings, and we need the re-provisioned servers to keep their 'identity' when this happens.

We especially need to be able to rebuild a dead server as an identical replacement if its hardware completely breaks and eats its system disks. We're already in a crisis at that point; we don't want a worse one because we can't exactly replace the server and instead have to build a new server that merely fills the same role, and only does that once DNS is updated, configurations are updated, etc etc.

This is relatively straightforward for regular Linux servers with regular networking; there's the issue of SSH host keys, but there are several solutions. But obviously there's a problem if the server is also a mesh network node and the mesh network system will not let it be re-provisioned under the same name or the same internal IP address. Accepting this limitation would make it difficult to use the mesh network for some things, especially things where we don't want to depend on DNS working (for example, sending system logs via syslog). Working around the limitation requires reverse engineering where the mesh network system stores local state and hopefully being able to save a copy elsewhere and restore it; among other things, this has implications for the mesh network system's security model.
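
As an aside, one of those several SSH host key solutions is simply to preserve the keys across the reinstall. A rough sketch, with an invented backup location (on Ubuntu the service is 'ssh'):

# Save the host's SSH identity before the old install is wiped.
tar -czf "/backup/$(hostname)-hostkeys.tar.gz" /etc/ssh/ssh_host_*

# After reinstalling, restore the keys and restart the SSH daemon.
tar -xzf "/backup/$(hostname)-hostkeys.tar.gz" -C /
systemctl restart ssh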

For us, it would be better if mesh networking systems explicitly allowed this re-provisioning. They could make it a non-default setting that took explicit manual action on the part of the network administrator (and possibly required nodes to cooperate and extend more trust than normal to the central provisioning system). Or a system like Innernet could have a separate class of IP addresses, call them 'service addresses', that could be assigned and reassigned to nodes by administrators. A node would always have its unique identity but could also be assigned one or more service addresses.

(Of course our other option is to not use a mesh network system that imposes this restriction, even if it would otherwise make our lives easier. Unless we really need the system for some other reason or its local state management is explicitly documented, this is our more likely choice.)

PS: The other problem with permanently 'consuming' IP addresses as machines are re-provisioned is that you run out of them sooner or later unless you use gigantic network blocks that are many times larger than the number of servers you'll ever have (well, in IPv4, but we're not going to switch to IPv6 just to enable a mesh network provisioning system).

How and why typical (SaaS) pricing is too high for university departments

By: cks

One thing I've seen repeatedly is that companies that sell SaaS or SaaS-like things and offer educational pricing (because they want to sell to universities too) are setting (initial) educational pricing that is in practice much too high. Today I'm going to work through a schematic example to explain what I mean. All of this is based on how it works in Canadian and, I believe, US universities; other university systems may be somewhat different.

Let's suppose that you're a SaaS vendor and like many vendors, you price your product at $X per person per month; I'll pick $5 (US, because most of the time the prices are in USD). Since you want to sell to universities and other educational institutions and you understand they don't have as much money to spend as regular companies, you offer a generous academic discount; they pay only $3 USD per person per month.

(If these numbers seem low, I'm deliberately stacking the deck in the favour of the SaaS company. Things get worse for your pricing as the numbers go up.)

The research and graduate student side of a large but not enormous university department is considering your software. They have 100 professors 'in' the department, 50 technical and administrative support staff (this is a low ratio), and professors have an average of 10 graduate students, research assistants, postdocs, outside collaborators, undergraduate students helping out with research projects, and so on around them, for a total of 1,000 additional people 'in' the department who will also have to be covered. These 1,150 people will cost the department $3,450 USD a month for your software, a total of $41,400 USD a year, which is a significant saving over what a commercial company would pay for the same number of people.

Unfortunately, unless your software is extremely compelling or absolutely necessary, this cost is likely to be a very tough sell. In many departments, that's enough money to fund (or mostly fund) an additional low-level staff position, and it's certainly enough money to hire more TAs, supplement more graduate student stipends (these are often the same thing, since hiring graduate students as TAs is one of the ways that you support them), or pay for summer projects, all of which are likely to be more useful and meaningful to the department than a year of your service. It's also more than enough money to cause people in the department to ask awkward questions like 'how much technical staff time will it take to put together an inferior but functional enough alternative to this', which may well not be $41,000 worth of time (especially not every year).

(Of course putting together a complete equivalent of your SaaS will cost much more than that, since you have multiple full time programmers working on it and you've invested years in your software at this point. But university departments are already used to not having nice things, and staff time is often considered almost free.)

If you decide to make your pricing nicer by only charging based on the actual number of people who wind up using your stuff, unfortunately you've probably made the situation worse for the university department. One thing that's worse than a large predictable bill is an uncertain but possibly large bill; the department will have to reserve and allocate the money in its budget to cover the full cost, and then figure out what to do with the unused budget at the end of the year (or the end of every month, or whatever). Among other things, this may lead to awkward conversations with higher powers about how the department's initial budget and actual spending don't necessarily match up.

As we can see from the numbers, one big part of the issue is those 1,000 non-professor, non-staff people. These people aren't really "employees" the way they would be in a conventional organization (and mostly don't think of themselves as employees), and the university isn't set up to support their work and spend money on them in the way it is for the people it considers actual employees. The university cares if a staff member or a professor can't get their work done, and having them work faster or better is potentially valuable to the university. This is mostly not true for graduate students and many other additional people around a department (and almost entirely not true if the person is an outside collaborator, an undergraduate doing extra work to prepare for graduate studies elsewhere, and so on).

In practice, most of those 1,000 extra people will and must be supported on a shoestring basis (for everything, not just for your SaaS). The university as a whole and their department in particular will probably only pay a meaningful per-person price for them for things that are either absolutely necessary or extremely compelling. At the same time, often the software that the department is considering is something that those people should be using too, and they may need a substitute if the department can't afford the software for them. And once the department has the substitute, it becomes budgetarily tempting and perhaps politically better if everyone uses the substitute and the department doesn't get your software at all.

(It's probably okay to charge a very low price for such people, as opposed to just throwing them in for free, but it has to be low enough that the department or the university doesn't have to think hard about it. One way to look at it is that regardless of the numbers, the collective group of those extra people is 'less important' to provide services to than the technical staff, the administrative staff, and the professors, and the costs probably should work out accordingly. Certainly the collective group of extra people isn't more important than the other groups, despite having a lot more people in it.)

Incidentally, all of this applies just as much (if not more so) when the 'vendor' is the university's central organizations and they decide to charge (back) people within the university for something on a per-person basis. If this is truly cost recovery and accurately represents the actual costs to provide the service, then it's not going to be something that most graduate students get (unless the university opts to explicitly subsidize it for them).

PS: All of this is much worse if undergraduate students need to be covered too, because there are even more of them. But often the department or the university can get away with not covering them, partly because their interactions with the university are often much narrower than those of graduate students.

Using WireGuard seriously as a mesh network needs a provisioning system

By: cks

One thing that my recent experience expanding our WireGuard mesh network has driven home to me is how (and why) WireGuard needs a provisioning system, especially if you're using it as a mesh networking system. In fact I think that if you use a mesh WireGuard setup at any real scale, you're going to wind up either adopting or building such a provisioning system.

In a 'VPN' WireGuard setup with a bunch of clients and one or a small number of gateway servers, adding a new client is mostly a matter of generating some critical information and giving it to the client. However, it's possible to more or less automate this and make it relatively easy for people who want to connect to you to do this. You'll still need to update your WireGuard VPN server too, but at least you only have one of them (probably), and it may well be the host where you generate the client configuration and provide it to the client's owner.

The extra problem with adding a new client to a WireGuard mesh network is that there are many more WireGuard nodes that need to be updated (and also the new client needs a lot more information; it needs to know about all of the other nodes it's supposed to talk to). More broadly, every time you change the mesh network configuration, every node needs to update with the new information. If you add a client, remove a client, or a client changes its keys for some reason (perhaps it had to be re-provisioned because the hardware died), all of these mean that nodes need updates (or at least the nodes that talk to the changed node). In the VPN model, only the VPN server node (and the new client) needed updates.

Our little WireGuard mesh is operating at a small scale, so we can afford to do this by hand. As you have more WireGuard nodes and more changes in nodes, you're not going to want to manually update things one by one, any more than you want to do that for other system administration work. Thus, you're going to want some sort of a provisioning system, where at a minimum you can say 'this is a new node' or 'this node has been removed' and all of your WireGuard configurations are regenerated, propagated to WireGuard nodes, trigger WireGuard configuration reloads, and so on. Some amount of this can be relatively generic in your configuration management system, but not all of it.

(Many configuration systems can propagate client-specific files to clients on changes and then trigger client side actions when the files are updated. But you have to build the per-client WireGuard configuration.)
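
To give a feel for the per-client part, here's a deliberately minimal sketch that regenerates a peer list for every node from a flat 'nodes' file of 'name public-key endpoint mesh-ip' lines. The file format, the 'generated/' directory, and everything else here are invented for illustration, not how any real provisioning system works:

# Each node gets a file listing [Peer] stanzas for every other node;
# its own [Interface] section (private key, address) lives elsewhere.
while read -r name pubkey endpoint meship; do
    awk -v self="$name" '
        $1 == self { next }
        {
            printf "[Peer]\n"
            printf "PublicKey = %s\n", $2
            printf "Endpoint = %s\n", $3
            printf "AllowedIPs = %s/32\n\n", $4
        }' nodes > "generated/$name-peers.conf"
done < nodes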

PS: I haven't looked into systems that will do this for you, either as pure WireGuard provisioning systems or as bigger 'mesh networking using WireGuard' software, so I don't have any opinions on how you want to handle this. I don't even know if people have built and published things that are just WireGuard provisioning systems, or if everything out there is a 'mesh networking based on WireGuard' complex system.

Some notes on using 'join' to supplement one file with data from another

By: cks

Recently I said something vaguely grumpy about the venerable Unix 'join' tool. As the POSIX specification page for join will unhelpfully tell you, join is a 'relational database operator', which means that it implements the rough equivalent of SQL joins. One way to use join is to add additional information for some lines in your input data.

Suppose, not entirely hypothetically, that we have an input file (or data stream) that starts with a login name and contains some additional information, and that for some logins (but not all of them) we have useful additional data about them in another file. Using join, the simple case of this is easy, if the 'master' and 'suppl' files are already sorted:

join -1 1 -2 1 -a 1 master suppl

(I'm sticking to POSIX syntax here. Some versions of join accept '-j 1' as an alternative to '-1 1 -2 1'.)

Our specific options tell join to join each line of 'master' and 'suppl' on the first field in each (the login) and print them, and also print all of the lines from 'master' that didn't have a login in 'suppl' (that's the '-a 1' argument). For lines with matching logins, we get all of the fields from 'master' and then all of the extra fields from 'suppl'; for lines from 'master' that don't match, we just get the fields from 'master'. Generally you'll tell apart which lines got supplemented and which ones didn't by how many fields they have.

If we want something other than all of the fields in the order that they are in the existing data source, in theory we have the '-o <list>' option to tell join what fields from each source to output. However, this option has a little problem, which I will show you by quoting the important bit from the POSIX standard (emphasis mine):

The fields specified by list shall be written for all selected output lines. Fields selected by list that do not appear in the input shall be treated as empty output fields.

What that means is that if we're also printing non-joined lines from our 'master' file, our '-o' still applies and any fields we specified from 'suppl' will be blank and empty (unless you use '-e'). This can be inconvenient if you were re-ordering fields so that, for example, a field from 'suppl' was listed before some fields from 'master'. It also means that you want to use '1.1' to get the login from 'master', which is always going to be there, not '2.1', the login from 'suppl', which is only there some of the time.

(All of this assumes that your supplementary file is listed second and the master file first.)

On the other hand, using '-e' we can simplify life in some situations. Suppose that 'suppl' contains only one additional interesting piece of information, and it has a default value that you'll use if 'suppl' doesn't contain a line for the login. Then if 'master' has three fields and 'suppl' two, we can write:

join -1 1 -2 1 -a 1 -e "$DEFVALUE" -o '1.1,1.2,1.3,2.2' master suppl

Now we don't have to try to tell whether or not a line from 'master' was supplemented by counting how many fields it has; everything has the same number of fields, it's just sometimes the last (supplementary) field is the default value.

(This is harder to apply if you have multiple fields from the 'suppl' file, but possibly you can find a 'there is nothing here' value that works for the rest of your processing.)
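
To make this concrete with a small made-up example (the logins and the 'none' default are invented; this is how GNU join behaves, and I believe POSIX join in general), suppose 'master' contains:

alice staff 1001
bob student 2002
carol staff 1003

and 'suppl' contains:

alice ra
carol ta

Then 'join -1 1 -2 1 -a 1 -e none -o 1.1,1.2,1.3,2.2 master suppl' prints:

alice staff 1001 ra
bob student 2002 none
carol staff 1003 ta

Every output line has the same four fields; 'bob' simply gets the default value in the last one.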

In Apache, using OIDC instead of SAML makes for easier testing

By: cks

In my earlier installment, I wrote about my views on the common Apache modules for SAML and OIDC authentication, where I concluded that OpenIDC was generally easier to use than Mellon (for SAML). Recently I came up with another reason to prefer OIDC, one strong enough that we converted one of our remaining Mellon uses over to OIDC. The advantage is that OIDC is easier to test if you're building a new version of your web server under another name.

Suppose that you're (re)building a version of your Apache based web server with authentication on, for example, a new version of Ubuntu, using a test server name. You want to test that everything still works before you deploy it, including your authentication. If you're using Mellon, as far as I can see you have to generate an entirely new SP configuration using your test server's name and then load it into your SAML IdP. You can't use your existing SAML SP configuration from your existing web server, because it specifies the exact URL the SAML IdP needs to use for various parts of the SAML protocol, and of course those URLs point to your production web server under its production name. As far as I know, to get another set of URLs that point to your test server, you need to set up an entirely new SP configuration.

OIDC has an equivalent thing in its redirect URI, but the OIDC redirect URL works somewhat differently. OIDC identity providers typically allow you to list multiple allowed redirect URIs for a given OIDC client, and it's the client that tells the server what redirect URI to use during authentication. So when you need to test your new server build under a different name, you don't need to register a new OIDC client; you can just add some more redirect URIs to your existing production OIDC client registration to allow your new test server to provide its own redirect URI. In the OpenIDC module, this will typically require no Apache configuration changes at all (from the production version), as the module automatically uses the current virtual host as the host for the redirect URI. This makes testing rather easier in practice, and it also generally tests the Apache OIDC configuration you'll use in production, instead of a changed version of it.

(You can put a hostname in the Apache OIDCRedirectURI directive, but it's simpler to not do so. Even if you did use a full URL in this, that's a single change in a text file.)

Choosing between "it works for now" and "it works in the long term"

By: cks

A comment on my entry about how Netplan can only have WireGuard peers in one file made me realize one of my implicit system administration views (it's the first one by Jon). That is the tradeoff between something that works now and something that not only works now but is likely to keep working in the long term. In system administration this is a tradeoff, not an obvious choice, because what you want is different depending on the circumstances.

Something that works now is, for example, something that works because of how Netplan's code is currently written, where you can hack around an issue by structuring your code, your configuration files, or your system in a particular way. As a system administrator I do a surprisingly large amount of these, for example to fix or work around issues in systemd units that people have written in less than ideal or simply mistaken ways.

Something that's going to keep working in the longer term is doing things 'correctly', which is to say in whatever way that the software wants you to do and supports. Sometimes this means doing things the hard way when the software doesn't actually implement some feature that would make your life better, even if you could work around it with something that works now but isn't necessarily guaranteed to keep working in the future.

When you need something to work and there's no other way to do it, you have to take a solution that (only) works now. Sometimes you take a 'works now' solution even if there's an alternative because you expect your works-now version to be good enough for the lifetime of this system, this OS release, or whatever; you'll revisit things for the next version (at least in theory, workarounds to get things going can last a surprisingly long time if they don't break anything). You can't always insist on a 'works now and in the future' solution.

On the other hand, sometimes you don't want to do a works-now thing even if you could. A works-now thing is in some sense technical debt, with all that that implies, and this particular situation isn't important enough to justify taking on such debt. You may solve the problem properly, or you may decide that the problem isn't big and important enough to solve at all and you'll leave things in their imperfect state. One of the things I think about when making this decision is how annoying it would be and how much would have to change if my works-now solution broke because of some update.

(Another is how ugly the works-now solution is, including how big of a note we're going to want to write for our future selves so we can understand what this peculiar load bearing thing is. The longer the note, the more I generally wind up questioning the decision.)

It can feel bad to not deal with a problem by taking a works-now solution. After all, it works, and otherwise you're stuck with the problem (or with less pleasant solutions). But sometimes it's the right option and the works-now solution is simply 'too clever'.

(I've undoubtedly made this decision many times over my career. But Jon's comment and my reply to it crystalized the distinction between a 'works now' and a 'works for the long term' solution in my mind in a way that I think I can sort of articulate.)

Netplan can only have WireGuard peers in one file

By: cks

We have started using WireGuard to build a small mesh network so that machines outside of our network can securely get at some services inside it (for example, to send syslog entries to our central syslog server). Since this is all on Ubuntu, we set it up through Netplan, which works but which I said 'has warts' in my first entry about it. Today I discovered another wart due to what I'll call the WireGuard provisioning problem:

Current status: provisioning WireGuard endpoints is exhausting, at least in Ubuntu 22.04 and 24.04 with netplan. So many netplan files to update. I wonder if Netplan will accept files that just define a single peer for a WG network, but I suspect not.

The core WireGuard provisioning problem is that when you add a new WireGuard peer, you have to tell all of the other peers about it (or at least all of the other peers you want to be able to talk to the new peer). When you're using Netplan, it would be convenient if you could put each peer in a separate file in /etc/netplan; then when you add a new peer, you just propagate the new Netplan file for the peer to everything (and do the special Netplan dance required to update peers).

(Apparently I should now call it 'Canonical Netplan', as that's what its front page calls it. At least that makes it clear exactly who is responsible for Netplan's state and how it's not going to be widely used.)

Unfortunately this doesn't work, and it fails in a dangerous way: Netplan only notices the WireGuard peers defined in one netplan file (at least on servers, using systemd-networkd as the backend). If you put each peer in its own file, only the first peer is picked up. If you define some peers in the file where you define your WireGuard private key, local address, and so on, and some peers in another file, only peers from whichever is first will be used (even if the first file only defines peers, which isn't enough to bring up a WireGuard device by itself). As far as I can see, Netplan doesn't report any errors or warnings to the system logs on boot about this situation; instead, you silently get incomplete WireGuard configurations.

This is visibly and clearly a Netplan issue, because on servers you can inspect the systemd-networkd files written by Netplan (in /run/systemd/network). When I do this, the WireGuard .netdev file has only the peers from one file defined in it (and the .netdev file matches the state of the WireGuard interface). This is especially striking when the netplan file with the private key and listening port (and some peers) is second; since the .netdev file contains the private key and so on, Netplan is clearly merging data from more than one netplan file, not completely ignoring everything except the first one. It's just ignoring any peers encountered after the first set of them.
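
One way to see the problem for yourself on a server is to compare what Netplan wrote into the generated .netdev file with what the live interface knows. A sketch, where the exact file name will depend on your device name:

# Count the peer stanzas Netplan actually generated (the file will be
# named something like 10-netplan-wg0.netdev):
grep -c '^\[WireGuardPeer\]' /run/systemd/network/10-netplan-wg0.netdev

# Compare with the peers the running interface knows about:
wg show wg0 peers | wc -l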

My overall conclusion is that in Netplan, you need to put all configuration for a given WireGuard interface into a single file, however tempting it might be to try splitting it up (for example, to put core WireGuard configuration stuff in one file and then list all peers in another one).

I don't know if this is an already filed Netplan bug and I don't plan on bothering to file one for it, partly because I don't expect Canonical to fix Netplan issues any more than I expect them to fix anything else and partly for other reasons.

PS: I'm aware that we could build a system to generate the Netplan WireGuard file, or maybe find a YAML manipulating program that could insert and delete blocks that matched some criteria. I'm not interested in building yet another bespoke custom system to deal with what is (for us) a minor problem, since we don't expect to be constantly deploying or removing WireGuard peers.

I moved my local Firefox changes between Git trees the easy way

By: cks

Firefox recently officially switched to Git, in a completely different Git tree than their old mirror. This presented me a little bit of a problem because I have a collection of local changes I make to my own Firefox builds, which I carry as constantly-rebased commits on top of the upstream Firefox tree. The change in upstream trees meant that I was going to have to move my commits to the new tree. When I wrote my first entry I thought I might try to do this in some clever way similar to rebasing my own changes on top of something that was rebased, but in the end I decided to do it the simple and brute force way that I was confident would either work or would leave me in a situation I could back out from easily.

This simple and brute force way was to get both my old tree and my new 'firefox' tree up to date, then export my changes with 'git format-patch' from the old tree and import them into the new tree with 'git am'. There were a few irritations along the way, of course. First I (re)discovered that 'git am' can't directly consume the directory of patches you create with 'git format-patch'. Git-am will consume a Maildir of patches, but git-format-patch will only give you a directory full of files with names like '00NN-<author>-<title>.patch', which is not a proper Maildir. The solution is to cat all of the .patch files together, in order, into some other file, which is now a mailbox that git-am will handle. The other minor thing is that git-am unsurprisingly has no 'dry-run' option (which would probably be hard to implement). Of course in my situation, I can always reset 'main' back to 'origin/main', which was one reason I was willing to try this.

(Looking at the 'git format-patch' manual page suggests that what I might have wanted was the '--stdout' option, which would have automatically created the mbox format version for me. On the other hand it was sort of nice to be able to look at the list of patches and see that they were exactly what I expected.)
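
For the record, the whole dance is short. The directory names here are hypothetical stand-ins for my old and new trees, and the branch names are whatever your local and upstream branches actually are:

# In the old tree: export everything past upstream as one mbox stream.
cd ~/src/gecko-dev
git format-patch --stdout origin/main..main > /tmp/local-changes.mbox

# In the new tree: apply the commits on top of the new history.
cd ~/src/firefox
git am /tmp/local-changes.mbox

# If it goes badly, abandon the half-applied state and start over.
git am --abort
git reset --hard origin/main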

On the one hand, moving my changes in this brute force way (and to a completely separate new tree) feels like giving in to my unfamiliarity with git. There are probably clever git ways to do this move in a single tree without having to turn everything into patches and then apply them (even if most of that is automated). On the other hand, this got the job done with minimal hassles and time consumed, and sometimes I need to put a stop to my programmer's urge to be clever.

LLMs ('AI') are coming for our jobs whether or not they work

By: cks

Over on the Fediverse, I said something about this:

Hot take: I don't really know what vibe coding is but I can confidently predict that it's 'coming for', if not your job, then definitely the jobs of the people who work in internal development at medium to large non-tech companies. I can predict this because management at such companies has *always* wanted to get rid of programmers, and has consistently seized on every excuse presented by the industry to do so. COBOL, report generators, rule based systems, etc etc etc at length.

(The story I heard is that at one point COBOL's English language basis was at least said to enable non-programmers to understand COBOL programs and maybe even write them, and this was seen as a feature by organizations adopting it.)

The current LLM craze is also coming for the jobs of system administrators for the same reason; we're overhead, just like internal development at (most) non-tech companies. In most non-tech organizations, both internal development and system administration are something similar to janitorial services; you have to have it because otherwise your organization falls over, but you don't like it and you're happy to spend as little on it as possible. And, unfortunately, we have a long history in technology that shows the long term results don't matter for the people making short term decisions about how many people to hire and who.

(Are they eating their seed corn? Well, they probably don't think it matters to them, and anyway that's a collective problem, which 'the market' is generally bad at solving.)

As I sort of suggested by using 'excuse' in my Fediverse post, it doesn't really matter if LLMs truly work, especially if they work over the long run. All they need to do in order to get senior management enthused about 'cutting costs' is appear to work well enough over the short term, and appearing to work is not necessarily a matter of substance. In sort of a flipside of how part of computer security is convincing people, sometimes it's enough to simply convince (senior) people and not have obvious failures.

(I have other thoughts about the LLM craze and 'vibe coding', as I understand it, but they don't fit within the margins of this entry.)

PS: I know it's picky of me to call this an 'LLM craze' instead of an 'AI craze', but I feel I have to both as someone who works in a computer science department that does all sorts of AI research beyond LLMs and as someone who was around for a much, much earlier 'AI' craze (that wasn't all of AI either, cf).

These days, Linux audio seems to just work (at least for me)

By: cks

For a long time, the common perception was that 'Linux audio' was the punchline for a not particularly funny joke. I sort of shared that belief; although audio had basically worked for me for a long time, I had a simple configuration and dreaded having to make more complex audio work in my unusual desktop environment. But these days, audio seems to just work for me, even in systems that have somewhat complex audio options.

On my office desktop, I've wound up with three potential audio outputs and two audio inputs: the motherboard's standard sound system, a USB headset with a microphone that I use for online meetings, the microphone on my USB webcam, and (to my surprise) a HDMI audio output because my LCD displays do in fact have tiny little speakers built in. In PulseAudio (or whatever is emulating it today), I have the program I use for online meetings set to use the USB headset and everything else plays sound through the motherboard's sound system (which I have basic desktop speakers plugged into). All of this works sufficiently seamlessly that I don't think about it, although I do keep a script around to reset the default audio destination.
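
That reset script is nothing fancy; a sketch of the sort of thing it does, with a made-up sink name ('pactl list short sinks' shows the real ones):

# Make the motherboard output the default sink again and move any
# currently playing streams over to it.
sink="alsa_output.pci-0000_00_1f.3.analog-stereo"   # made-up name
pactl set-default-sink "$sink"
pactl list short sink-inputs | while read -r id _; do
    pactl move-sink-input "$id" "$sink"
done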

On my home desktop, for a long time I had a simple single-output audio system that played through the motherboard's sound system (plus a microphone on a USB webcam that was mostly not connected). Recently I got an outboard USB DAC and, contrary to my fears, it basically plugged in and just worked. It was easy to set the USB DAC as the default output in pavucontrol and all of the settings related to it stick around even when I put it to sleep overnight and it drops off the USB bus. I was quite pleased by how painless the USB DAC was to get working, since I'd been expecting much more hassles.

(Normally I wouldn't bother meticulously switching the USB DAC to standby mode when I'm not using it for an extended time, but I noticed that the case is clearly cooler when it rests in standby mode.)

This is still a relatively simple audio configuration because it's basically static. I can imagine more complex ones, where you have audio outputs that aren't always present and that you want some programs (or more generally audio sources) to use when they are present, perhaps even with priorities. I don't know if the Linux audio systems that Linux distributions are using these days could cope with that, or if they did would give you any easy way to configure it.

(I'm aware that PulseAudio and so on can be fearsomely complex under the hood. As far as the current actual audio system goes, I believe that what my Fedora 41 machines are using for audio is PipeWire (also) with WirePlumber, based on what processes seem to be running. I think this is the current Fedora 41 audio configuration in general, but I'm not sure.)

The HTTP status codes of responses from about 22 hours of traffic to here (part 2)

By: cks

A few months ago, I wrote an entry about this topic, because I'd started putting in some blocks against crawlers, including things that claimed to be old versions of browsers, and I'd also started rate-limiting syndication feed fetching. Unfortunately, my rules at the time were flawed, rejecting a lot of people that I actually wanted to accept. So here are some revised numbers from today, a day when my logs suggest that I've seen what I'd call broadly typical traffic and traffic levels.

I'll start with the overall numbers (for HTTP status codes) for all requests:

  10592 403		[26.6%]
   9872 304		[24.8%]
   9388 429		[23.6%]
   8037 200		[20.2%]
   1629 302		[ 4.1%]
    114 301
     47 404
      2 400
      2 206

This is a much more balanced picture of activity than the last time around, with a lot less of the overall traffic being HTTP 403s. The HTTP 403s are from aggressive blocks, the HTTP 304s and HTTP 429s are mostly from syndication feed fetchers, and the HTTP 302s are mostly from things with various flaws that I redirect to informative static pages instead of giving HTTP 403s. The two HTTP 206s were from Facebook's 'externalhit' agent on a recent entry. A disturbing number of the HTTP 403s were from Bing's crawler and almost 500 of them were from something claiming to be an Akkoma Fediverse server. 8.5% of the HTTP 403s were from something using Go's default User-Agent string.
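
As an aside, producing a status code breakdown like the one above is a one-liner if your web server writes common or combined format logs, where the status is the ninth whitespace-separated field; 'access.log' is a stand-in name:

awk '{ count[$9]++ } END { for (s in count) print count[s], s }' access.log | sort -rn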

The most popular User-Agent strings today for successful requests (of anything) were for versions of NetNewsWire, FreshRSS, and Miniflux, then Googlebot and Applebot, and then Chrome 130 on 'Windows NT 10'. Although I haven't checked, I assume that all of the first three were for syndication feeds specifically, with few or no fetches of other things. Meanwhile, Googlebot and Applebot can only fetch regular pages; they're blocked from syndication feeds.

The picture for syndication feeds looks like this:

   9923 304		[42%]
   9535 429		[40%]
   1984 403		[ 8.5%]
   1600 200		[ 6.8%]
    301 302
     34 301
      1 404

On the one hand it's nice that 42% of syndication feed fetches successfully did a conditional GET. On the other hand, it's not nice that 40% of them got rate-limited, or that there were clearly more explicitly blocked requests than there were HTTP 200 responses. On the sort of good side, 37% of the blocked feed fetches were from one IP that's using "Go-http-client/1.1" as its User-Agent (and which accounts for 80% of the blocks of that). This time around, about 58% of the requests were for my syndication feed, which is better than it was before but still not great.

These days, if certain problems are detected in a request I redirect the request to a static page about the problem. This gives me some indication of how often these issues are detected, although crawlers may be re-visiting the pages on their own (I can't tell). Today's breakdown of this is roughly:

   78%  too-old browser
   13%  too generic a User-Agent
    9%  unexpectedly using HTTP/1.0

There were slightly more HTTP 302 responses from requests to here than there were requests for these static pages, so I suspect that not everything that gets these redirects follows them (or at least doesn't bother re-fetching the static page).

I hope that the better balance in HTTP status codes here is a sign that I have my blocks in a better state than I did a couple of months ago. It would be even better if the bad crawlers would go away, but there's little sign of that happening any time soon.

The complexity of mixing mesh networking and routes to subnets

By: cks

One of the in things these days is encrypted (overlay) mesh networks, where you have a bunch of nodes and the nodes have encrypted connections to each other that they use for (at least) internal IP traffic. WireGuard is one of the things that can be used for this. A popular thing to add to such mesh network solutions is 'subnet routes', where nodes will act as gateways to specific subnets, not just endpoints in themselves. This way, if you have an internal network of servers at your cloud provider, you can establish a single node on your mesh network and route to the internal network through that node, rather than having to enroll every machine in the internal network.

(There are various reasons not to enroll every machine, including that on some of them it would be a security or stability risk.)

In simple configurations this is easy to reason about and easy to set up through the tools that these systems tend to give you. Unfortunately, our network configuration isn't simple. We have an environment with multiple internal networks, some of which are partially firewalled off from each other, and where people would want to enroll various internal machines in any mesh networking setup (partly so they can be reached directly). This creates problems for a simple 'every node can advertise some routes and you accept the whole bundle' model.

The first problem is what I'll call the direct subnet problem. Suppose that you have a subnet with a bunch of machines on it and two of them are nodes (call them A and B), with one of them (call it A) advertising a route to the subnet so that other machines in the mesh can reach it. The direct subnet problem is that you don't want B to ever send its traffic for the subnet to A; since it's directly connected to the subnet, it should send the traffic directly. Whether or not this happens automatically depends on various implementation choices the setup makes.

The second problem is the indirect subnet problem. Suppose that you have a collection of internal networks that can all talk to each other (perhaps through firewalls and somewhat selectively). Not all of the machines on all of the internal networks are part of the mesh, and you want people who are outside of your networks to be able to reach all of the internal machines, so you have a mesh node that advertises routes to all of your internal networks. However, if a mesh node is already inside your perimeter and can reach your internal networks, you don't want it to go through your mesh gateway; you want it to send its traffic directly.

(You especially want this if mesh nodes have different mesh IPs from their normal IPs, because you probably want the traffic to come from the normal IP, not the mesh IP.)

You can handle the direct subnet case with a general rule like 'if you're directly attached to this network, ignore a mesh subnet route to it', or by some automatic system like route priorities. The indirect subnet case can't be handled automatically because it requires knowledge about your specific network configuration and what can reach what without the mesh (and what you want to reach what without the mesh, since some traffic you want to go over the mesh even if there's a non-mesh route between the two nodes). As far as I can see, to deal with this you need the ability to selectively configure or accept (subnet) routes on a mesh node by mesh node basis.
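
Going back to the direct subnet case for a moment: on Linux, one crude way to express 'the directly connected route should win' is plain route metrics. This is a hedged sketch with an invented subnet and interface names, not something any particular mesh system does for you:

# On node B, which is directly attached to 10.10.0.0/24, give the
# mesh's copy of the route a much worse metric so the kernel's
# connected route keeps winning:
ip route add 10.10.0.0/24 dev wg0 metric 10000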

(In a simple topology you can get away with accepting or not accepting all subnet routes, but in a more complex one you can't. You might have two separate locations, each with their own set of internal subnets. Mesh nodes in each location want the other location's subnet routes, but not their own location's subnet routes.)

Being reminded that Git commits are separate from Git trees

By: cks

Firefox's official source repository has moved to Git, but to a completely new Git repository, not the Git mirror that I've used for the past few years. This led me to a lament on the Fediverse:

This is my sad face that Firefox's switch to using git of course has completely different commit IDs than the old not-official gecko-dev git repository, meaning that I get to re-clone everything from scratch (all ~8 GB of it). Oh well, so it goes in the land of commit hashes.

Then Tim Chase pointed out something that I should have thought of:

If you add the new repo as a secondary remote in your existing one and pull from it, would it mitigate pulling all the blobs (which likely remain the same), limiting your transfer to just the commit-objects (and possibly some treeish items and tags)?

Git is famously a form of content-addressed storage, or more specifically a tree of content addressed storage, where as much as possible is kept the same over time. This includes all the portions of the actual source tree. A Git commit doesn't directly include a source tree; instead it just has the hash of the source tree (well, its top level, cf).

What this means is that if you completely change the commits so that all of them have new hashes, for example by rebuilding your history from scratch in a new version of the repository, but you keep the actual tree contents the same in most or all of the commits, the only thing that actually changes is the commits. If you add this new repository (with its new commit history) as a Git remote to your existing repository and pull from it, most or all of the tree contents are the same across the two sets of commits and won't have to be fetched. So you don't fetch gigabytes of tree contents, you only fetch megabytes (one hopes) of commits.
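
In git terms the suggestion is just this (the remote name is arbitrary and the URL is a placeholder):

# In the existing clone of the old mirror:
git remote add firefox-new https://example.invalid/firefox.git   # placeholder URL
git fetch firefox-new
# Only objects that aren't already present get transferred; since the
# tree and blob objects are shared, that's mostly just the new commits.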

As I mentioned on the Fediverse, I was told this too late to save me from re-fetching the entire new Firefox repository from scratch on my office desktop (which has lots of bandwidth). I may yet try this on my home desktop, or alternately use it on my office desktop to easily move my local changes on top of the new official Git history.

(I think this is effectively rebasing my own changes on top of something that's been rebased, which I've done before, although not recently. I'll also want to refresh my understanding of what 'git rebase' does.)

The appeal of keyboard launchers for (Unix) desktops

By: cks

A keyboard launcher is a big part of my (modern) desktop, but over on the Fediverse I recently said something about them in general:

I don't necessarily suggest that people use dmenu or some equivalent. Keyboard launchers in GUI desktops are an acquired taste and you need to do a bunch of setup and infrastructure work before they really shine. But if you like driving things by the keyboard and will write scripts, dmenu or equivalents can be awesome.

The basic job of a pure keyboard launcher is to let you hit a key, start typing, and then select and do 'something'. Generally the keyboard launcher will make a window appear so that you can see what you're typing and maybe what you could complete it to or select.

The simplest and generally easiest way to use a keyboard launcher, and how many of them come configured to work, is to use it to select and run programs. You can find a version of this idea in GNOME, and even Windows has a pseudo-launcher in that you can hit a key to pop up the Start menu and the modern Start menu lets you type in stuff to search your programs (and other things). One problem with the GNOME version, and many basic versions, is that in practice you don't necessarily launch desktop programs all that often or launch very many different ones, so you can have easier ways to invoke the ones you care about. One problem with the Windows version (at least in my experience) is that it will do too much, which is to say that no matter what garbage you type into it by accident, it will do something with that garbage (such as launching a web search).

The happy spot for a keyboard launcher is somewhere in the middle, where they do a variety of things that are useful for you but not without limits. The best keyboard launcher setup for your desktop is one that gives you fast access to whatever things you do a lot, ideally with completion so you type as little as possible. When you have it tuned up and working smoothly the feel is magical; I tap a key, type a couple of characters and then hit tab, hit return, and the right thing happens without me thinking about it, all fast enough that I can and do type ahead blindly (which then goes wrong if the keyboard launcher doesn't start fast enough).

The problem with keyboard launchers, and why they're not for everyone, is that everyone has a different set of things that they do a lot and that are useful for them to trigger entirely through the keyboard. No keyboard launcher will come precisely set up for what you do a lot in its default installation, so at a minimum you need to spend the time and effort to curate what the launcher will do and how it does it. If you're more ambitious, you may need to build supporting scripts that give the launcher a list of things to complete and then act on them when you complete one. If you don't curate the launcher and instead throw in the kitchen sink, you wind up with the Windows experience where it will certainly do something when you type things, but perhaps not really what you wanted.

(For example, I routinely ssh to a lot of our machines, so my particular keyboard launcher setup lets me type a machine name (with completion) to start a session to it. But I had to build all of that, including sourcing the machine names I wanted included from somewhere, and this isn't necessarily useful for people who aren't constantly ssh'ing to machines.)
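As an illustration of the flavour of this (not my actual setup, and the host list file is an invented detail), the core of such a thing can be a tiny script:

#!/bin/sh
# offer a list of machines via dmenu and ssh to whichever one gets picked,
# in a new terminal window
host=$(dmenu -i < "$HOME/.ssh-hostlist")
[ -n "$host" ] || exit 0
exec xterm -e ssh "$host"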

There are a variety of keyboard launchers for both X and Wayland, basically none of which I have any experience with. See the Arch Wiki section on application launchers. Someday I will have to get a Wayland equivalent to my particular modified dmenu, a thought that fills me with no more enthusiasm than any other part of replacing my whole X environment.

PS: Another issue with keyboard launchers is that sometimes you're wrong about what you want to do with them. I once built an entire keyboard launcher setup to select terminal windows and then later wound up abandoning it when I didn't use it enough.

Updating venv-based things by replacing the venv not updating it

By: cks

These days, we have mostly switched over to installing third-party Python programs (and sometimes things like Django) in virtual environments instead of various past practices. This is clearly the way Python expects you to do things, and problems increasingly emerge if you don't. One of the issues I've been thinking about is how we want to handle updating these programs when they release new versions, because there are two approaches.

One option would be to update the existing venv in place, through various 'pip' commands. However, pip-based upgrades have some long standing issues, and also they give you no straightforward way to revert an upgrade if something goes wrong. The other option is to build a separate venv with the new version of the program (and all of its current dependency versions) and then swap the whole new venv into place, which works because venvs can generally be moved around. You can even work with symbolic links, creating a situation where you refer to 'dir/program', which is a symlink to 'dir/venvs/program-1.2.0' or 'dir/venvs/program-1.3.0' or whatever you want today.
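A minimal sketch of the symlink version, with a made up program name, version, and paths:

# build an entirely new venv for the new version, leaving the current one alone
python3 -m venv /opt/venvs/frobber-1.3.0
/opt/venvs/frobber-1.3.0/bin/pip install frobber==1.3.0
# repoint the 'live' name at the new venv; rolling back is just repointing it
ln -sfn /opt/venvs/frobber-1.3.0 /opt/frobber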

In practice we're more likely to have 'dir/program' be a real venv and just create 'dir/program-new', rename directories, and so on. The full scale version with always versioned directories is likely to only be used for things, like Django, where we want to be able to easily see what version we're running and switch back very simply.

Our Django upgrades were always going to be handled by building entirely new venvs and switching to them (it's the venv version of what we did before). We haven't had upgrades of other venv based programs until recently, and when I started thinking about it, I reached the obvious conclusion: we'll update everything by building a new venv and replacing the old one, because this deals with pretty much all of the issues at the small cost of yet more disk space for yet more venvs.

(This feels quite obvious once I'd made the decision, but I want to write it down anyway. And who knows, maybe there are reasons to update venvs in place. The one that I can think of is to only change the main program version but not any of the dependencies, if they're still compatible.)

The glass box/opaque box unit testing argument in light of standards

By: cks

One of the traditional divides in unit testing is whether you should write 'glass box' or 'opaque box' tests (like GeePawHill I think I prefer those terms to the traditional ones), which is to say whether you should write tests exploiting your knowledge of the module's code or without it. Since I prefer testing inside my modules, I'm implicitly on the side of glass box tests; even if I'm testing public APIs, I write tests with knowledge of potential corner cases. Recently, another reason for this occurred to me, by analogy to standards.

I've read about standards (and read the actual standards) enough by now to have absorbed the lesson that it is very hard to write a (computer) standard that can't be implemented perversely. Our standards need good faith implementations and there's only so much you can do to make it hard for people implementing them in bad faith. After that, you have to let the 'market' sort it out (including the market of whether or not people want to use perverse implementations, which generally they don't).

(Of course sometimes the market doesn't give you a real choice. Optimizing C compilers are an example, where your only two real options (GCC and LLVM) have aggressively exploited arguably perverse readings of 'undefined behavior' as part of their code optimization passes. There's some recent evidence that this might not always be worth it [PDF], via.)

If you look at them in the right way, unit tests are also a sort of standard. And like standards, opaque box unit tests have a very hard time of completely preventing perverse implementations. While people usually don't deliberately create perverse implementations, these can happen by accident or through misunderstandings, and there can be areas of perverse problems due to bugs. Your cheapest assurance that you don't have a perverse implementation is to peer inside and then write glass box tests that in part target the areas where perverse problems could arise. If you write opaque box tests, you're basically hoping that you can imagine all of the perverse mistakes that you'll make.

(Some things are amenable to exhaustive testing, but usually not very many.)

PS: One way to get perverse implementations is 'write code until all of the tests pass, then stop'. This doesn't guarantee a perverse implementation but it certainly puts the onus on the tests to force the implementation to do things, much like with standards (cf).

Trying to understand OpenID Connect (OIDC) and its relation to OAuth2

By: cks

The OIDC specification describes it as "a simple identity layer" on top of OAuth2. As I've been discovering, this is sort of technically true but also misleading. Since I think I've finally sorted this out, here's what I've come to understand about the relationship.

OAuth2 describes a HTTP-based protocol where a client (typically using a web browser) can obtain an access token from an authorization server and then present this token to a resource server to gain access to something. For example, your mail client works with a browser to obtain an access token from an OAuth2 identity provider, which it then presents to your IMAP server. However, the base OAuth2 specification is only concerned with the interaction between clients and the authorization server; it explicitly has nothing to say about issues like how a resource server validates and uses the access tokens. This is right at the start of RFC 6749:

The interaction between the authorization server and resource server is beyond the scope of this specification. [...]

Because it's purely about the client to authorization server flows, the base OAuth2 RFC provides nothing that will let your IMAP server validate the alleged 'OAuth2 access token' your mail client has handed it (or find out from the token who you are). There were customary ways to do this, and then later you had RFC 7662 Token Introspection or perhaps RFC 9068 JWT access tokens, but those are all outside basic OAuth2.

(This has obvious effects on interoperability. You can't write a resource server that works with arbitrary OAuth2 identity providers, or an OAuth2 identity provider of your own that everyone will be able to work with. I suspect that this is one reason why, for example, IMAP mail clients often only support a few big OAuth2 identity providers.)

OIDC takes the OAuth2 specification and augments it in a number of ways. In addition to an OAuth2 access token, an OIDC identity provider can also give clients (you) an ID Token that's a (signed) JSON Web Token (JWT) that has a specific structure and contains at least a minimal set of information about who authenticated. An OIDC IdP also provides an official Userinfo endpoint that will provide information about an access token, although this is different information than the RFC 7662 Token Introspection endpoint.

Both of these changes make resource servers and by extension OIDC identity providers much more generic. If a client hands a resource server either an OIDC ID Token or an OIDC Access Token, the resource server ('consumer') has standard ways to inspect and verify them. If your resource server isn't too picky (or is sufficiently configurable), I think it can work with either an OIDC Userinfo endpoint or an OAuth2 RFC 7662 Token Introspection endpoint (I believe this is true of Dovecot, cf).
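At the HTTP level the two sorts of checks look roughly like the following; the URLs and credentials here are invented, and in real life they come from the IdP's configuration or metadata:

# OIDC Userinfo: present the access token, get back claims about its subject
curl -s -H "Authorization: Bearer $ACCESS_TOKEN" https://idp.example.org/userinfo

# OAuth2 RFC 7662 token introspection: the resource server authenticates itself
# and asks the IdP whether the token is active and who and what it's for
curl -s -u resource-server:its-secret -d "token=$ACCESS_TOKEN" https://idp.example.org/introspect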

(OIDC is especially convenient in cases like websites, where the client that gets the OIDC ID Token and Access Token is the same thing that uses them.)

An OAuth2 client can talk to an OIDC IdP as if it was an OAuth2 IdP and get back an access token, because the OIDC IdP protocol flow is compatible with the OAuth2 protocol flow. This access token could be described as an 'OAuth2' access token, but this is sort of meaningless to say since OAuth2 gives you nothing you can do with an access token. An OAuth2 resource server (such as an IMAP server) that expects to get 'OAuth2 access tokens' may or may not be able to interact with any particular OIDC IdP to verify those OIDC IdP provided tokens to its satisfaction; it depends on what the resource server supports and requires. For example, if the resource server specifically requires RFC 7662 Token Introspection you may be out of luck, because OIDC IdPs aren't required to support that and not all do.

In practice, I believe that OIDC has been around for long enough and has been popular enough that consumers of 'OAuth2 access tokens', like your IMAP server, will likely have been updated so that they can work with OIDC Access Tokens. Servers can do this either by verifying the access tokens through an OIDC Userinfo endpoint (with suitable server configuration to tell them what to look for) or by letting you tell them that the access token is a JWT and verifying the JWT. OIDC doesn't require the access token to be a JWT but OIDC IdPs can (and do) use JWTs for this, and perhaps you can actually have your client software send the ID Token (which is guaranteed to be a JWT) instead of the OIDC Access Token.

(It helps that OIDC is obviously better if you want to write 'resource server' side software that works with any IdP without elaborate and perhaps custom configuration or even programming for each separate IdP.)

(I have to thank Henryk Plötz for helping me understand OAuth2's limited scope.)

(The basic OAuth2 has been extended with multiple additional standards, see eg RFC 8414, and if enough of them are implemented in both your IdP and your resource servers, some of this is fixed. OIDC has also been extended somewhat, see eg OpenID Provider Metadata discovery.)

Looking at OIDC tokens and getting information on them as a 'consumer'

By: cks

In OIDC, roughly speaking and as I understand it, there are three possible roles: the identity provider ('OP'), a Client or 'Relying Party' (the program, website, or whatever that has you authenticate with the IdP and that may then use the resulting authentication information), and what is sometimes called a 'resource server', which uses the IdP's authentication information that it gets from you (your client, acting as a RP). 'Resource Server' is actually an OAuth2 term, which comes into the picture because OIDC is 'a simple identity layer' on top of OAuth2 (to quote from the core OIDC specification). A website authenticating you with OIDC can be described as acting both as a 'RP' and a 'RS', but in cases like IMAP authentication with OIDC/OAuth2, the two roles are separate; your mail client is a RP, and the IMAP server is a RS. I will broadly call both RPs and RSs 'consumers' of OIDC tokens.

When you talk to an OIDC IdP to authenticate, you can get back either or both of an ID Token and an Access Token. The ID Token is always a JWT with some claims in it, including the 'sub(ject)', the 'issuer', and the 'aud(ience)' (which is what client the token was requested by), although this may not be all of the claims you asked for and are entitled to. In general, to verify an ID Token (as a consumer), you need to extract the issuer, consult the issuer's provider metadata to find how to get their keys, and then fetch the keys so you can check the signature on the ID Token (and then proceed to do a number of additional verifications on the information in the token, as covered in the specification). You may cache the keys to save yourself the network traffic, which allows you to do offline verification of ID Tokens. Quite commonly, you'll only accept ID Tokens from pre-configured issuers, not any random IdP on the Internet (ie, you will verify that the 'iss' claim is what you expect). As far as I know, there's no particular way in OIDC to tell if the IdP still considers the ID Token valid or to go from an ID Token alone to all of the claims you're entitled to.
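The metadata and key fetching side of this is plain HTTPS; a sketch with an invented issuer (the actual keys URL is whatever the metadata's 'jwks_uri' field says):

# fetch the issuer's provider metadata, which includes 'jwks_uri' among other things
curl -s https://idp.example.org/.well-known/openid-configuration
# fetch the signing keys from wherever jwks_uri points (the path here is made up)
curl -s https://idp.example.org/oidc/jwks
# a JWT library then checks the ID Token's signature against those keys; with
# the keys cached, this check can be done offline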

The Access Token officially doesn't have to be anything more than an opaque string. To validate it and get the full set of OIDC claim information, including the token's subject (ie, who it's for), you can use the provider's Userinfo endpoint. However, this doesn't necessarily give you the 'aud' information that will let you verify that this Access Token was created for use with you and not someone else. If you have to know this information, there are two approaches, although an OIDC identity provider doesn't have to support either.

The first is that the Access Token may actually be a RFC 9068 JWT. If it is, you can validate it in the usual OIDC JWT way (as for an ID Token) and then use the information inside, possibly in combination with what you get from the Userinfo endpoint. The second is that your OAuth2 provider may support an RFC 7662 Token Introspection endpoint. This endpoint is not exposed in the issuer's provider metadata and isn't mandatory in OIDC, so your IdP may or may not support it (ours doesn't, although that may change someday).

(There's also an informal 'standard' way of obtaining information about Access Tokens that predates RFC 7662. For all of the usual reasons, this may still be supported by some large, well-established OIDC/OAuth2 identity providers.)

Under some circumstances, the ID Token and the Access Token may be tied together in that the ID Token contains a claim field that you can use to validate that you have the matching Access Token. Otherwise, if you're purely a Resource Server and someone hands you a theoretically matching ID Token and Access Token, all that you can definitely do is use the Access Token with the Userinfo endpoint and verify that the 'sub' matches. If you have a JWT Access Token or a Token Introspection endpoint, you can get more information and do more checks (and maybe the Userinfo endpoint also gives you an 'aud' claim).

If you're a straightforward Relying Party client, you get both the ID Token and the Access Token at the same time and you're supposed to keep track of them together yourself. If you're acting as a 'resource server' as well and need the additional claims that may not be in the ID Token, you're probably going to use the Access Token to talk to the Userinfo endpoint to get them; this is typically how websites acting as OIDC clients behave.

Because the only OIDC standard way to get additional claims is to obtain an Access Token and use it to access the Userinfo endpoint, I think that many OIDC clients that are acting as both a RP and a RS will always request both an ID Token and an Access Token. Unless you know the Access Token is a JWT, you want both; you'll verify the audience in the ID Token, and then use the Access Token to obtain the additional claims. Programs that are only getting things to pass to another server (for example, a mail client that will send OIDC/OAuth2 authentication to the server) may only get an Access Token, or in some protocols only obtain an ID Token.

(If you don't know all of this and you get a mail client testing program to dump the 'token' it obtains from the OIDC IdP, you can get confused because a JWT format Access Token can look just like an ID Token.)

This means that OIDC doesn't necessarily provide a consumer with a completely self-contained single object that both has all of the information about the person who authenticated and lets you be sure that this object is intended for you. An ID Token by itself doesn't necessarily contain all of the claims, and while you can use any (opaque) Access Token to obtain a full set of claims, I believe that these claims don't have to include the 'aud' claim (although your OIDC IdP may choose to include it).

This is in a sense okay for OIDC. My understanding is that OIDC is not particularly aimed at the 'bearer token' usage case where the RP and the Resource Server are separate systems; instead, it's aimed at the 'website authenticating you' case where the RP is the party that will directly rely on the OIDC information. In this case the RP has (or can have) both the ID Token and the Access Token and all is fine.

(A lot of my understanding on this is due to the generosity of @Denvercoder9 and others after I was confused about this.)

Sidebar: Authorization flow versus implicit flow in OIDC authentication

In the implicit flow, you send people to the OIDC IdP and the OIDC IdP directly returns the ID Token and Access Token you asked for to your redirect URI, or rather has the person's browser do it. In this flow, the ID Token includes a partial hash of the Access Token and you use this to verify that the two are tied together. You need to do this because you don't actually know what happened in the person's browser to send them to your redirect URI, and it's possible things were corrupted by an attacker.

In the authorization flow, you send people to the OIDC IdP and it redirects them back to you with an 'authorization code'. You then use this code to call the OIDC IdP again at another endpoint, which replies with both the ID Token and the Access Token. Because you got both of these at once during the same HTTP conversation directly with the IdP, you automatically know that they go together. As a result, the ID Token doesn't have to contain any partial hash of the Access Token, although it can.

I think the corollary of this is that if you want to be able to hand the ID Token and the Access Token to a Resource Server and allow it to verify that the two are connected, you want to use the implicit flow, because that definitely means that the ID Token has the partial hash the Resource Server will need.

(There's also a hybrid flow which I'll let people read about in the standard.)

Chrome and the burden of developing a browser

By: cks

One part of the news of the time interval is that the US courts may require Google to spin off Chrome (cf). Over on the Fediverse, I felt this wasn't a good thing:

I have to reluctantly agree that separating Chrome from Google would probably go very badly¹. Browsers are very valuable but also very expensive public goods, and our track record of funding and organizing them as such in a way to not wind up captive to something is pretty bad (see: Mozilla, which is at best questionable on this). Google is not ideal but at least Chrome is mostly a sideline, not a main hustle.

¹ <Lauren Weinstein Fediverse post> [...]

One possible reaction to this is that it would be good for everyone if people stopped spending so much money on browsers and so everything involving them slowed down. Unfortunately, I don't think that this would work out the way people want, because popular browsers are costly beasts. To quote what I said on the Fediverse:

I suspect that the cost of simply keeping the lights on in a modern browser is probably on the order of plural millions of dollars a year. This is not implementing new things, this is fixing bugs, keeping up with security issues, monitoring CAs, and keeping the development, CI, testing, and update infrastructure running. This has costs for people, for servers, and for bandwidth.

The reality of the modern Internet is that browsers are load bearing infrastructure; a huge amount of things run through them, including and especially on minority platforms. Among other things, no browser is 'secure' and all of them are constantly under attack. We want browser projects that are used by lots of people to have enough resources (in people, build infrastructure, update servers, and so on) to be able to rapidly push out security updates. All browsers need a security team and any browser with addons (which should be all of them) needs a security team for monitoring and dealing with addons too.

(Browsers are also the people who keep Certificate Authorities honest, and Chrome is very important in this because of how many people use it.)

On the whole, it's a good thing for the web that Chrome is in the hands of an organization that can spend tens of millions of dollars a year on maintaining it without having to directly monetize it in some way. It would be better if we could collectively fund browsers as the public good that they are without having corporations in the way, because Google absolutely corrupts Chrome (also) and Mozilla has stumbled spectacularly (more than once). But we have to deal with the world that we have, not the world that we'd like to have, and in this world no government seems to be interested in seriously funding obvious Internet public goods (not only browsers but also, for example, free TLS Certificate Authorities).

(It's not obvious that a government funded browser would come out better overall, but at least there would be a chance of something different than the narrowing status quo.)

PS: Another reason that spending on browsers might not drop is that Apple (with Safari) and Microsoft (with Edge) are also in the picture. Both of these companies might take the opportunity to slow down, or they might decide that Chrome's potentially weak new position was a good moment to push for greater dominance and maybe lock-in through feature leads.

The many ways of getting access to information ('claims') in OIDC

By: cks

Any authentication and authorization framework, such as OIDC, needs a way for the identity provider (an 'OIDC OP') to provide information about the person or thing that was just authenticated. In OIDC specifically, what you get are claims that are grouped into scopes. You have to ask for specific scopes, and the IdP may restrict what scopes a particular client has access to. Well, that is not quite the full story, and the full story is complicated (more so than I expected when I started writing this entry).

When you talk to the OIDC identity server (OP) to authenticate, you (the program or website or whatever acting as the client) can get back either or both of an ID Token and an Access Token. I believe that in general your Access Token is an opaque string, although there's a standard for making it a JWT. Your ID Token is ultimately some JSON (okay, it's a JWT) and has certain mandatory claims like 'sub' (the subject) that you don't have to ask for with a scope. It would be nice if all of the claims from all of the scopes that you asked for were automatically included in the ID Token, but the OIDC standard doesn't require this. Apparently many but not all OIDC OPs include all the claims (at least by default); however, our OIDC OP doesn't currently do so, and I believe that Google's OIDC OP also doesn't include some claims.

(Unsurprisingly, I believe that there is a certain amount of OIDC-using software out there that assumes that all OIDC OPs return all claims in the ID Token.)

The standard approved and always available way to obtain the additional claims (which in some cases will be basically all claims) is to present your Access Token (not your ID Token) to the OIDC Userinfo endpoint at your OIDC OP. If your Access Token is (still) valid, what you will get back is either a plain, unsigned JSON listing of those claims (and their values) or perhaps a signed JWT of the same thing (which you can find out from the provider metadata). As far as I can see, you don't necessarily use the ID Token in this additional information flow, although you may want to be cautious and verify that the 'sub' claim is the same in the Userinfo response and the ID Token that is theoretically paired with your Access Token.

(As far as I can tell, the ID Token doesn't include a copy of the Access Token as another pseudo-claim. The two are provided to you at the same time (if you asked the OIDC OP for both), but are independent. The ID Token can't quite be verified offline because you need to get the necessary public key from the OIDC OP to check the signature.)

If I'm understanding things correctly (which may not be the case), in an OAuth2 authentication context, such as using OAUTHBEARER with the Dovecot IMAP server, I believe your local program will send the Access Token to the remote end and not do much with the ID Token, if it even requested one. The remote end then uses the Access Token with a pre-configured Userinfo endpoint to get a bunch of claims, and incidentally to validate that the Access Token is still good. In other protocols, such as the current version of OpenPubkey, your local program sends the ID Token (perhaps wrapped up) and so needs it to already include the full claims, and can't use the Userinfo approach. If what you have is a website that is both receiving the OIDC stuff and processing it, I believe that the website will normally ask for both the ID Token and the Access Token and then augment the ID Token information with additional claims from the Userinfo response (this is what the Apache OIDC module does, as far as I can see).

An OIDC OP may optionally allow clients to specifically request that certain claims be included in the ID Token that they get, through the "claims" request parameter on the initial request. One potential complication here is that you have to ask for specific claims, not simple 'all claims in this scope'; it's up to you to know what potentially non-standard claims you should ask for (and I believe that the claims you get have to be covered by the scopes you asked for and that the OIDC OP allows you to get). I don't know how widely implemented this is, but our OIDC OP supports it.
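The "claims" parameter itself is a small JSON structure that gets URL-encoded into the authorization request. A sketch with illustrative claim names, where 'groups' stands in for a non-standard claim:

# ask for 'email' and 'groups' to be included in the ID Token specifically;
# this JSON is sent as the 'claims' query parameter of the authorization request
claims='{"id_token": {"email": null, "groups": {"essential": true}}}'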

(An OIDC OP can list all of its available claims in its metadata, but doesn't have to. I believe that most OPs will list their scopes, although technically this is just 'recommended'.)

If you really want a self-contained signed object that has all of the information, I think you have to hope for an OIDC OP that either puts all claims in the ID Token by default or lets you ask for all of the claims you care about to be added for your request. Even if an OIDC OP gives you a signed userinfo response, it may not include all of the ID Token information and it might not be possible to validate various things later. You can always validate an Access Token by making a Userinfo request with it, but I don't know if there's any way to validate an ID Token.

We've chosen to 'modernize' all of our ZFS filesystems

By: cks

We are almost all of the way to the end of a multi-month process of upgrading our ZFS fileservers from Ubuntu 22.04 to 24.04 by also moving to more recent hardware. This involved migrating all of our pools and filesystems, terabytes of data in total. Our traditional way of doing this sort of migration (which we used, for example, when going from our OmniOS fileservers to our Linux fileservers) was the good old reliable 'zfs send | zfs receive' approach of sending snapshots over. This sort of migration is fast, reliable, and straightforward. However, it has one drawback, which is that it preserves all of the old filesystem's history, including things that may be behind problems like kernel panics, and possibly other old relics.

We've been running ZFS for long enough that we had some ZFS filesystems that were still at ZFS filesystem version 4. In late 2023, we upgraded them all to ZFS filesystem version 5, and after that we got some infrequent kernel panics. We could never reproduce the kernel panics and they were very infrequent, but 'infrequent' is not the same as 'never' (the previous state of affairs), and it seemed likely that they were in some way related to upgrading our filesystem versions, which in turn was related to us having some number of very old filesystems. So in this migration, we deliberately decided to 'migrate' filesystems the hard way. Which is to say, rather than migrating the filesystems, we migrated the data with user level tools, moving it into pools and filesystems that were created from scratch on our new Ubuntu 24.04 fileservers (which led us to discover that default property values sometimes change in ways that we care about).
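As an illustration of the general shape of such a user level copy (the mount points are invented and this isn't necessarily the exact tool or options we used):

# copy one filesystem's contents into a freshly created filesystem, preserving
# ownership, hardlinks, ACLs, xattrs, and sparseness
rsync -aHAXS --numeric-ids /oldfs/homes1/ /newfs/homes1/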

(The filesystems reused the same names as their old versions, because that keeps things easier for our people and for us.)

It's possible that this user level rewriting of all data has wound up laying things out in a better way (although all of this is on SSDs), and it's certainly ensured that everything has modern metadata associated with it and so on. The 'fragmentation' value of the new pools on the new fileservers is certainly rather lower than the value for most old pools, although what that means is a bit complicated.

There's a bit of me that misses the deep history of our old filesystems, some of which dated back to our first generation Solaris ZFS fileservers. However, on the whole I'm happy that we're now using filesystems that don't have ancient historical relics and peculiarities that may not be well supported by OpenZFS's code any more (and which were only likely to get less tested and more obscure over time).

(Our pools were all (re)created from scratch as part of our migration from OmniOS to Linux, and anyway would have been remade from scratch again in this migration even if we moved the filesystems with 'zfs send'.)

My Cinnamon desktop customizations (as of 2025)

By: cks

A long time ago I wrote up some basic customizations of Cinnamon, shortly after I started using Cinnamon (also) on my laptop of the time. Since then, the laptop got replaced with another one and various things changed in both the land of Cinnamon and my customizations (eg, also). Today I feel like writing down a general outline of my current customizations, which fall into a number of areas from the modest but visible to the large but invisible.

The large but invisible category is that just like on my main fvwm-based desktop environment, I use xcape (plus a custom Cinnamon key binding for a weird key combination) to invoke my custom dmenu setup (1, 2) when I tap the CapsLock key. I have dmenu set to come up horizontally on the top of the display, which Cinnamon conveniently leaves alone in the default setup (it has its bar at the bottom). And of course I make CapsLock into an additional Control key when held.

(On the laptop I'm using a very old method of doing this. On more modern Cinnamon setups in virtual machines, I do this with Settings → Keyboard → Layout → Options, and then in the CapsLock section set CapsLock to be an additional Ctrl key.)
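The xcape side of this has the following general shape; the generated keysym here is just an example, not my actual weird key combination:

# with CapsLock already acting as an extra Control key, a quick tap of it
# generates a separate keysym (F13 as an example) that a Cinnamon custom
# keybinding can then bind to the dmenu script
xcape -e 'Control_L=F13'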

To start xcape up and do some other things, like load X resources, I have a personal entry in Settings → Startup Applications that runs a script in my ~/bin/X11. I could probably do this in a more modern way with an assortment of .desktop files in ~/.config/autostart (which is where my 'Startup Applications' entries actually wind up) that run each thing individually, or perhaps with some systemd user units. But the current approach works and is easy to modify if I want to add or remove things (I can just edit the script).

I have a number of Cinnamon 'applets' installed on my laptop and my other Cinnamon VM setups. The ones I have everywhere are Spices Update and Shutdown Applet, the latter because if I tell the (virtual) machine to log me off, shut down, or restart, I generally don't want to be nagged about it. On my laptop I also have CPU Frequency Applet (set to only display a summary) and CPU Temperature Indicator, for no compelling reason. In all environments I also pin launchers for Firefox and (Gnome) Terminal to the Cinnamon bottom bar, because I start both of them often enough. I position the Shutdown Applet on the left side, next to the launchers, because I think of it as a peculiar 'launcher' instead of an applet (on the right).

(The default Cinnamon keybindings also start a terminal with Ctrl + Alt + T, which you can still find through the same process from several years ago provided that you don't cleverly put something in .local/share/glib-2.0/schemas and then run 'glib-compile-schemas .' in that directory. If I was a smarter bear, I'd understand what I should have done when I was experimenting with something.)

On my virtual machines with Cinnamon, I don't bother with the whole xcape and dmenu framework, but I do set up the applets and the launchers and fix CapsLock.

(This entry was sort of inspired by someone I know who just became a Linux desktop user (after being a long time terminal user).)

Sidebar: My Cinnamon 'window manager' custom keybindings

I have these (on my laptop) and perpetually forget about them, so I'm going to write them down now so perhaps that will change.

move-to-corner-ne=['<Alt><Super>Right']
move-to-corner-nw=['<Alt><Super>Left']
move-to-corner-se=['<Primary><Alt><Super>Right']
move-to-corner-sw=['<Primary><Alt><Super>Left']
move-to-side-e=['<Shift><Alt><Super>Right']
move-to-side-n=['<Shift><Alt><Super>Up']
move-to-side-s=['<Shift><Alt><Super>Down']
move-to-side-w=['<Shift><Alt><Super>Left']

I have some other keybindings on the laptop but they're even less important, especially once I added dmenu.
