utcc.utoronto.ca/~cks
Finding out what your big RPMs are, in two different 'sizes'
Suppose, not hypothetically, that you have an old Fedora system with a lot of packages installed and a 70 GByte root filesystem, which is now awkwardly small during system upgrades and so on. You would like to find out which of your roughly 7,500 packages are contributing the most to your space usage. (The real solution is to move to a bigger pair of NVMe drives, but that involves various yak shaving and you want to upgrade to Fedora 43 today.) The simple version of 'how big are your RPMs' is t...
Two little scripts: addup and sumup
(Once again it's been a while since the last little script.) Every so often I find myself in a situation where I have a bunch of lines with multiple columns and I want to either add up all of the numbers in one column (for example, to get total transfer volume from Apache log files) or add up all of the numbers in one column grouped by the value of a second column. This leads to two scripts, which I call 'addup' and 'sumup'. Addup is a simple awk script that adds up all the values from some...
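As a rough illustration of the two operations described above (summing one column, and summing one column grouped by another), here is a minimal Python sketch. It is not the author's awk scripts; the function signatures, the 1-based column arguments, and the choice to skip short or non-numeric lines are all assumptions made for this example.

```python
from collections import defaultdict


def addup(lines, col=1):
    """Sum the numbers in whitespace-separated column `col` (1-based).

    Lines that are too short or hold a non-numeric value are skipped
    (an assumption; the original scripts may behave differently).
    """
    total = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) < col:
            continue
        try:
            total += float(fields[col - 1])
        except ValueError:
            continue
    return total


def sumup(lines, key_col=1, val_col=2):
    """Sum column `val_col`, grouped by the value in column `key_col`."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) < max(key_col, val_col):
            continue
        try:
            totals[fields[key_col - 1]] += float(fields[val_col - 1])
        except ValueError:
            continue
    return dict(totals)


# Hypothetical input: a URL path and a transfer size per line.
sample = ["a.html 100", "b.html 250", "a.html 50"]
grand_total = addup(sample, col=2)
per_key = sumup(sample, key_col=1, val_col=2)
```

Here `grand_total` is the sum of the second column across all lines, and `per_key` maps each first-column value to its own subtotal.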
Web server ratelimits are a precaution to let me stop worrying
These days, Wandering Thoughts has some hacked together HTTP request rate limits. They don't exist for strong technical reasons; my blog engine setup here can generally stand up to even fairly extreme traffic floods (through an extensive series of hacks). It's definitely possible to overwhelm Wandering Thoughts with a high enough request volume, and HTTP rate limits will certainly help with that, but that's not really why they exist. My HTTP rate limits exist for ultimately social reasons and be...
Using 'pkg' for everything on FreeBSD 15 has been nice
Traditionally, the FreeBSD base system was managed through freebsd-update (also), which I would call primarily a patch-based system, while third party software was (usually) managed through pkg, a package manager. This was a quite traditional split, but it had some less than ideal aspects, and as of FreeBSD 15 you can choose to manage FreeBSD through pkg using what is called freebsd-base (which is also known as 'pkgbase'). If you're installing FreeBSD 15 from scratch, the installer will let y...
I should use argument groups in Python's argparse module more than I do
For reasons well outside the scope of this entry, the other day I looked at the --help output from one of my old Python programs. This particular program has a lot of options, but when I'd written it, I had used argparse argument groups to break up the large list of options into logical groups, starting with the most important and running down to the 'you should probably ignore these' ones. The result was far more readable than it would have been without the grouping. (I want to call these 'opti...
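A minimal sketch of what argument groups look like in practice, using the standard library's `ArgumentParser.add_argument_group()`. The program name, group titles, and options here are hypothetical and are not taken from the program discussed in the entry.

```python
import argparse


def build_parser():
    # All names below are made up for illustration.
    parser = argparse.ArgumentParser(
        prog="example", description="Demonstrate argparse argument groups."
    )

    # The most important options go in the first group created,
    # so they appear first in the --help output.
    main = parser.add_argument_group("main options")
    main.add_argument("-o", "--output", help="where to write results")
    main.add_argument("-v", "--verbose", action="store_true",
                      help="report more about what is happening")

    # Rarely needed options go in a later group, with a description
    # telling people they can skip them.
    obscure = parser.add_argument_group(
        "obscure options", "You should probably ignore these."
    )
    obscure.add_argument("--buffer-size", type=int, default=4096,
                         help="internal buffer size in bytes")
    return parser


help_text = build_parser().format_help()
args = build_parser().parse_args(["-v"])
```

Groups only affect how the help text is laid out; parsing is unchanged, so `args.verbose` and `args.buffer_size` are read exactly as they would be without groups.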
Updating Ubuntu packages that you have local changes for with dgit
Suppose, not entirely hypothetically, that you've made local changes to an Ubuntu package using dgit and now Ubuntu has come out with an update to that package that you want to switch to, with your local changes still on top. Back when I wrote about moving local changes to a new Ubuntu release with dgit, I wrote an appendix with a theory of how to do this, based on a conversation. Now that I've actually done this, I've discovered that there is a minor variation and I'm going to write it down e...
Here in 2026, we're retaining old systems instead of discarding them
I mentioned recently that at work, we're retaining old systems that we would have normally discarded. We're doing this for the obvious reason that new servers have become increasingly expensive, due to escalating prices of RAM (especially DDR5 RAM) and all forms of SSDs, especially as new servers might really require us to buy ones that support U.2 NVMe instead of SATA SSDs (because I'm not sure how available SATA SSDs are these days). Our servers are generally fairly old anyways, so our reten...
How old our servers are (as of 2026)
Back in 2022, I wrote about how old our servers were at the time, partly because they're older than you might expect, and today I want to update that with our current situation. My group handles the general departmental infrastructure for the research side of the department (the teaching side is a different group), and we've tended to keep servers for quite a while. Research groups are a different matter; they often have much more modern servers and turn them over much faster. As in past instal...
New old systems in the age of hardware shortages
Recently I asked something on the Fediverse: Lazyweb, if you were going to put together new DDR4-based desktop (because you already have the RAM and disks), what CPU would you use? Integrated graphics would probably be ideal because my needs are modest and that saves wrangling a GPU. (Also I'm interested in your motherboard opinions, but the motherboard needs 2x M.2 and 2x to 4x SATA, which makes life harder. And maybe 4K@60Hz DisplayPort output, for integrated graphics) If I was thinking of bu...
Canonical's Netplan is hard to deal with in automation
Suppose, not entirely hypothetically, that you've traditionally used /etc/resolv.conf on your Ubuntu servers but you're considering switching to systemd-resolved, partly for fast failover if your normal primary DNS server is unavailable and partly because it feels increasingly dangerous not to, since resolved is the normal configuration and what software is likely to expect. One of the ways that resolv.conf is nice is that you can set the configuration by simply copying a single file that isn't ...
Considering mmap() versus plain reads for my recent code
The other day I wrote about a brute force approach to mapping IPv4 /24 subnets to Autonomous System Numbers (ASNs), where I built a big, somewhat sparse file of four-byte records, with the record for each /24 at a fixed byte position determined by its first three octets (so 0.0.0.0/24's ASN, if any, is at byte 0, 0.0.1.0/24 is at byte 4, and so on). My initial approach was to open, lseek(), and read() to access the data; in a comment, Aristotle Pagaltzis wondered if mmap() would perform better....
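To make the two access patterns concrete, here is a small Python 3 sketch of looking up a four-byte record at the fixed offset described above, once with seek-and-read and once through `mmap`. It is not the author's code: the big-endian unsigned encoding, the use of 0 to mean "no ASN", and every function name are assumptions for illustration, and the sketch says nothing about which approach is faster.

```python
import mmap
import os
import struct
import tempfile

# Assumption: each record is a big-endian unsigned 32-bit ASN, 0 = none.
RECORD = struct.Struct(">I")


def offset_for(ip):
    """Byte offset of the record for the /24 containing dotted-quad `ip`."""
    a, b, c, _ = (int(part) for part in ip.split("."))
    return ((a << 16) | (b << 8) | c) * RECORD.size


def lookup_read(path, ip):
    """Open the file, seek to the record, and read it."""
    with open(path, "rb") as f:
        f.seek(offset_for(ip))
        data = f.read(RECORD.size)
    if len(data) < RECORD.size:  # past the end of the file
        return 0
    return RECORD.unpack(data)[0]


def lookup_mmap(mm, ip):
    """Unpack the record directly from an already mapped file."""
    off = offset_for(ip)
    if off + RECORD.size > len(mm):
        return 0
    return RECORD.unpack_from(mm, off)[0]


def demo():
    # Tiny stand-in file: an empty record for 0.0.0.0/24 and a made-up
    # ASN (64512, from the private-use range) for 0.0.1.0/24.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp.write(RECORD.pack(0) + RECORD.pack(64512))
        tmp.close()
        with open(tmp.name, "rb") as f:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                via_mmap = lookup_mmap(mm, "0.0.1.77")
            finally:
                mm.close()
        via_read = lookup_read(tmp.name, "0.0.1.77")
        missing = lookup_read(tmp.name, "0.0.2.0")
        return via_read, via_mmap, missing
    finally:
        os.unlink(tmp.name)
```

With `mmap` the mapping is set up once and each lookup is just an indexed unpack, while the read version pays for an open, a seek, and a read on every lookup unless the file is kept open between calls.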
Early notes on switching some libvirt-based virtual machines to UEFI
I keep around a small collection of virtual machines so I don't have to drag out one of our spare physical servers to test things on. These virtual machines have traditionally used traditional MBR-based booting ('BIOS' in libvirt instead of 'UEFI'), partly because for a long time libvirt didn't support snapshots of UEFI based virtual machines and snapshots are very important for my use of these scratch virtual machines. However, I recently discovered that libvirt now can do snapshots of UEFI ba...
Going from an IPv4 address to an ASN in Python 2 with Unix brute force
For reasons, I've reached the point where I would like to be able to map IPv4 addresses into the organizations responsible for them, which is to say their Autonomous System Number (ASN), for use in DWiki, the blog engine of Wandering Thoughts. So today on the Fediverse I mused: Current status: wondering if I can design an on-disk (read only) data structure of some sort that would allow a Python 2 program to efficiently map an IP address to an ASN. There are good in-memory data structures fo...
Fedora's virt-manager started using external snapshots for me as of Fedora 41
Today I made an unpleasant discovery about virt-manager on my (still) Fedora 42 machines that I shared on the Fediverse: This is my face that Fedora virt-manager appears to have been defaulting to external snapshots for some time and SURPRISE, external snapshots can't be reverted by virsh. This is my face, especially as it seems to have completely screwed up even deleting snapshots on some virtual machines. (I only discovered this today because today is the first time I tried to touch such a sn...
Mass production's effects on the cheapest way to get some things
We have a bunch of networks in a number of buildings, and as part of looking after them, we want to monitor whether or not they're actually working. For reasons beyond the scope of this entry we don't do things like collect information from our switches through SNMP, so our best approach is 'ping something on the network in the relevant location'. This requires something to ping. We want that thing to be stable and always on the network, which typically rules out machines and devices run by oth...
A traditional path to getting lingering duplicate systems
In yesterday's entry I described a lingering duplicate system and how it had taken us a long time to get rid of it, but I got too distracted by the story to write down the general thoughts I had on how this sort of thing happens and keeps happening (also, the story turned out to be longer than I expected). We've had other long running duplicate systems, and often they have more or less the same story as yesterday's disk space usage tracking system. The first system built is a basic system. It'...
Lingering duplicate systems and the expense of weeding them out (an illustration)
We have been operating a fileserver environment for a long time now, back before we used ZFS. When you operate fileservers in a traditional general Unix environment, one of the things you need is disk usage information. So a very long time ago, before I even arrived, people built a very Unix-y system to do this. Every night, raw usage information was generated for each filesystem (for a while with 'du'), written to a special system directory in the filesystem, and then used to create a text ...
DMARC DNS record inheritance and DMARC alignment requirements
To simplify, DMARC is based on the domain in the 'From:' header, and what policy (if any) that domain specifies. As I've written about (and rediscovered) more than once (here and here), DMARC will look up the DNS record for the DMARC policy in exactly one of two places, either in the exact From: domain or on the organization's top level domain. In other words, if a message has a From: of '[email protected]', a receiver will first look for a DMARC TXT DNS record with the name _d...
One problem with (Python) docstrings is that they're local
When I wrote about documenting my Django forms, I said that I knew I didn't want to put my documentation in docstrings, because I'd written some in the past and then not read it this time around. One of the reasons for that is that Python docstrings have to be attached to functions, or more generally, Python docstrings have to be scattered through your code. The corollary to this is that to find relevant docstrings you have to read through your code and then remember which bits of it are releva...
Wayland has good reasons to put the window manager in the display server
I recently ran across Isaac Freund's Separating the Wayland Compositor and Window Manager (via), which is excellent news as far as I'm concerned. But in passing, it says: Traditionally, Wayland compositors have taken on the role of the window manager as well, but this is not in fact a necessary step to solve the architectural problems with X11. Although, I do not know for sure why the original Wayland authors chose to combine the window manager and Wayland compositor, I assume it was simply th...