Sunday, October 6, 2024

Pulling at threads: File Capabilities

For unimportant reasons, on my Ubuntu 24.04 installation, I went looking for binaries in /usr/bin and /usr/sbin with file capabilities set.  There were three:

  • ping: cap_net_raw=ep
  • mtr-packet: cap_net_raw=ep
  • kwin_wayland: cap_sys_resource=ep

The =ep notation means that the listed capability is in the file’s “effective” and “permitted” sets, but not “inheritable.”  A process that executes the binary receives the capability, but does not automatically pass it on to other programs it runs.

ping and mtr-packet are “as expected.”  They want to send unusual network packets, so they need that right.  (This is the sort of thing I would also expect to see on nmap, if it were installed.)

kwin_wayland was a bit more surprising to see.  Why does it want that?  From reading capabilities(7) and running strings /usr/bin/kwin_wayland, my best guess is that kwin needs to raise its RLIMIT_NOFILE (the maximum number of open files).
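
If that guess is right, the privileged operation itself is tiny.  Here is a minimal sketch of it in PHP (purely for illustration; kwin is C++ and I haven’t read its code): raising the soft limit up to the existing hard limit needs no privilege, while raising the hard limit beyond its current value is what requires cap_sys_resource (or root).

    <?php
    // Minimal sketch of bumping RLIMIT_NOFILE.  Raising the soft limit up to
    // the hard limit is unprivileged; raising the hard limit beyond its
    // current value needs cap_sys_resource (or root).

    $before = posix_getrlimit();
    printf("before: soft=%s hard=%s\n",
        $before['soft openfiles'], $before['hard openfiles']);

    // Ask for a generous value for both limits.  If 1048576 exceeds the
    // current hard limit and we lack cap_sys_resource, this fails with EPERM.
    if (!posix_setrlimit(POSIX_RLIMIT_NOFILE, 1048576, 1048576)) {
        fwrite(STDERR, "setrlimit(RLIMIT_NOFILE) failed: "
            . posix_strerror(posix_get_last_error()) . "\n");
    }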

There’s a separate kwin_wayland_wrapper file.  A quick check showed that it was not a shell script (a common form of wrapper), but an actual executable.  Could it have had the capability, set the limits, and launched the main process?  For that matter, could this whole startup sequence have been structured through systemd, so that none of kwin’s components needed elevated capabilities?

The latter question is easily answered: no.  This clearly isn’t a system service, and if it were run from the systemd user instance, that instance never had any elevated privileges to pass along.  (The goal, as I understand it, is that a bug in the systemd user session can’t lead to privilege escalation, and “not having those privileges” is the surest way to avoid malicious use of them.)

If kwin actually adjusts the limit dynamically, in response to the number of clients it sees, then the answer to the former question would also be “no.”  To exercise the capability at any time, kwin itself must retain it.

I haven’t read the code to confirm any of this.  Really, this situation seems to be exactly what capabilities are for: allowing limited actions, like raising resource limits, without giving away broad access to the entire system.  Even if I were to engineer a less-privileged alternative, it doesn’t seem like it would measurably improve the “security,” especially not for cap_sys_resource.  It was just a fun little thought experiment.

Sunday, September 29, 2024

Scattered Thoughts on Distrobox

Distrobox’s aim is to integrate a container with the host OS’s data, to the extent possible.  By default, everything (processes, network, filesystem) is shared, and in particular the home directory is mounted into the container as well.  It is not trying to be “a sandbox,” even when using the --unshare-all option.

I also found out the hard way that Distrobox integrates devices explicitly: if some of those USB devices are later unplugged, the container will not start.  This happened when I pulled my laptop away from its usual dock (with USB hub, keyboard, and fancy mouse) and tried to use a distrobox.  Thankfully, I wasn’t fully offline, so I was able to rebuild the container.

Before it stopped performing properly in Ubuntu Studio 23.10, I used distrobox to build ares and install it into my home directory.  That process yielded a ~/.local/share/applications/ares.desktop file, which my host desktop picked up, but which would not actually work from the host.  After exporting, I always needed to be careful to click the “ares (on ares)” entry in the menu, so that it started in the ares container.

I have observed that distroboxes must be stopped to be deleted, but then distrobox will want to start them to un-export the apps.  Very recent distrobox versions ask whether to start the container and do the un-exporting, but there’s still a base assumption that you wanted the distrobox specifically to export GUI apps from it.  It clearly doesn’t track whether any apps are exported, because it always asks, even if none are.

Sunday, September 22, 2024

CloudSearch's Tricky prefix Operator

We ran into an interesting problem with CloudSearch.  Maybe I did it wrong, but I stored customer names in CloudSearch as “text” type with “English” analysis.  We do generic-search-bar scans with prefix searches, like (or (prefix field=name 'moon') (prefix field=address 'moon')).
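
As a sketch of the kind of query involved, using the AWS SDK for PHP’s CloudSearchDomainClient (the endpoint and field names below are placeholders, not our real ones):

    <?php
    require 'vendor/autoload.php';

    use Aws\CloudSearchDomain\CloudSearchDomainClient;

    // Placeholder search endpoint; each CloudSearch domain has its own.
    $client = new CloudSearchDomainClient([
        'endpoint' => 'https://search-customers-xxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com',
        'region'   => 'us-east-1',
        'version'  => '2013-01-01',
    ]);

    // Generic search bar: prefix-match the term against a couple of text fields.
    // (Real code would escape single quotes in $term before interpolating it.)
    $term = 'moon';
    $result = $client->search([
        'queryParser' => 'structured',
        'query'       => "(or (prefix field=name '{$term}') (prefix field=address '{$term}'))",
        'size'        => 25,
    ]);

    foreach ($result['hits']['hit'] as $hit) {
        echo $hit['id'], PHP_EOL;
    }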

Then, a developer found that a search term of “john” would find customers with a name of “johns”, but a search for “johns” would not find them!  The root cause turned out to be that the English analyzer stems anything that looks like a plural, storing “Johns” as “john”.

Normally, this isn’t a problem.  When (and only when) using a prefix search, the query term is not stemmed, even though the indexed terms were.  Thus, a prefix search for “johns” will match “Johnson” (stored as “johnson”) but not “Johns” (stored as “john”).  Doing a regular search through the CloudSearch Console will turn up the expected customers, and so might checking the database directly, adding to the confusion.  It even works as expected for most names, because “Karl” and “Packard” don’t look like plurals.

We added a custom analyzer with no stemming, set our text fields to use it, and reindexed.
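
Sketched with the CloudSearch configuration API (domain and field names are placeholders again), the fix amounts to defining an analysis scheme with algorithmic stemming turned off, pointing the text fields at it, and rebuilding the index:

    <?php
    require 'vendor/autoload.php';

    use Aws\CloudSearch\CloudSearchClient;

    $cs = new CloudSearchClient(['region' => 'us-east-1', 'version' => '2013-01-01']);

    // English analysis, but with algorithmic stemming disabled, so "johns" is
    // indexed as "johns" rather than "john".
    $cs->defineAnalysisScheme([
        'DomainName'     => 'customers',
        'AnalysisScheme' => [
            'AnalysisSchemeName'     => 'en_nostem',
            'AnalysisSchemeLanguage' => 'en',
            'AnalysisOptions'        => ['AlgorithmicStemming' => 'none'],
        ],
    ]);

    // Point the text fields at the new scheme...
    foreach (['name', 'address'] as $field) {
        $cs->defineIndexField([
            'DomainName' => 'customers',
            'IndexField' => [
                'IndexFieldName' => $field,
                'IndexFieldType' => 'text',
                'TextOptions'    => ['AnalysisScheme' => 'en_nostem'],
            ],
        ]);
    }

    // ...and rebuild the index so existing documents pick it up.
    $cs->indexDocuments(['DomainName' => 'customers']);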

Sunday, September 15, 2024

The Wrong Terminal

Somewhere in my Pop!_OS 22.04 settings, I set Tilix as the preferred terminal emulator.  When I use the Super+T* keyboard shortcut, I get a Tilix window.  However, when I use the Super+A (for Applications) UI to run a launcher that Distrobox created for a container, the command line doesn’t come up in Tilix… it comes up in gnome-terminal instead.  Why is that, and can I fix it?

AIUI, all the Application Launcher UI does is ask the system to open the .desktop file that Distrobox added.  That file has the “run in terminal” option set, but has no way to request a specific terminal.  That choice is made, eventually, by the GIO library.

GIO isn’t desktop-specific, so it doesn’t read the desktop settings.  It actually builds in a hard-coded list of terminals that it knows how to work with, like gnome-terminal, konsole, and eventually (I am assuming) ye olde xterm.  It walks down the list and runs the first one that exists on the system, which happens to be gnome-terminal.  AFAIK, there is no configuration for this, at any level.

It is also possible that one of the distributions in the chain (Debian, Ubuntu, or Pop!_OS) patched GIO to try x-terminal-emulator first.  If so, it would go through the alternatives system, which would send it straight to gnome-terminal, since between that and Tilix, gnome-terminal has priority.  We are deep into speculative territory, but if all of that were the case, I could “make it work” with a system-level change that results in every user preferring Tilix… but only in cases where x-terminal-emulator is involved.

I want the admin user to keep all the defaults, like gnome-terminal, because the less that account deviates from stock, the less likely I am to configure a weird problem for myself.** (Especially under Gnome, which has a configuration system that they would rather nobody used.  For simplicity.)  Changing the alternatives globally is in direct contradiction to that goal.

It seems that the “simplest” solution is to change the .desktop file so it no longer asks to launch in a terminal, and update the command to launch the desired terminal itself.  That would work in the short term, but fall “out of sync” if I ever switched away from Tilix as the desktop default, or uninstalled it.  It’s not robust.

It seems like there’s some sort of desktop-environment standard missing here.  If we don’t want to involve threads or communication inside GIO, then there would need to be a way for the Gnome libraries to hand it an “XDG Configuration” or something, so that settings like “current terminal app” could be passed in.

If we relax the constraints, then a D-Bus call would be reasonable… but in that case, maybe GIO could be removed from the sequence entirely.  The Applications UI would make the D-Bus call to “launch the thing,” and a desktop-specific implementation would pick it up and do it properly.

It seems like there should be solutions, but from searching the web, it looks like the UX has simply worked this way for years.

* Super is the |□| key, because it's a System76 laptop, a Darter Pro 8 in particular.

** As a side effect, this makes the admin account feel like “someone else’s computer,” which makes me take more care with it.  I may not want to break my own things, exactly, but I would feel even worse about breaking other people’s stuff.

Monday, September 9, 2024

Some Solutions to a Problem

We have an EC2 instance with a quickly-produced shell script that runs on boot and points a DNS name at the instance’s public IPv4 address.  Since time was of the essence, it hard-codes everything about this, in particular the DNS name to use.
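
The real thing is a few hastily written lines of shell, but the logic amounts to something like this sketch (rendered in PHP, and assuming the record lives in Route 53; the hosted zone ID and hostname are invented stand-ins for the hard-coded production values):

    <?php
    require 'vendor/autoload.php';

    use Aws\Route53\Route53Client;

    // The instance's current public IPv4, from the metadata service (IMDSv1
    // shown for brevity; IMDSv2 would need a session token first).
    $ip = trim(file_get_contents('http://169.254.169.254/latest/meta-data/public-ipv4'));

    // Hard-coded values baked into the image, and therefore into every snapshot.
    $zoneId = 'Z0123456789EXAMPLE';     // placeholder hosted zone ID
    $name   = 'service.example.com.';   // placeholder production DNS name

    $r53 = new Route53Client(['region' => 'us-east-1', 'version' => '2013-04-01']);
    $r53->changeResourceRecordSets([
        'HostedZoneId' => $zoneId,
        'ChangeBatch'  => [
            'Changes' => [[
                'Action'            => 'UPSERT',
                'ResourceRecordSet' => [
                    'Name'            => $name,
                    'Type'            => 'A',
                    'TTL'             => 60,
                    'ResourceRecords' => [['Value' => $ip]],
                ],
            ]],
        ],
    ]);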

This means that if we bring up a copy of this instance, based on a snapshot of its root volume, the copy will overwrite the DNS record for the production service.  We need to stop this.

As a side project, it would be nice to remove the hard-coding of the DNS name.  It would be trivial to “stop DNS name conflicts” if we did not have a DNS name stored on the instance’s disk image to begin with.

What are the options?

Sunday, September 1, 2024

A Problem of Semantic Versioning

For a while, we’ve been unable to upgrade to PHPUnit 11 due to a conflict in transitive dependencies (demonstrated below).  The crux of the problem is:

  1. Psalm (5.25.0) directly requires nikic/php-parser: ^4.16, prohibiting 5.x.
  2. PHPUnit (11.3.1) transitively requires nikic/php-parser: ^5.1, prohibiting 4.x.
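
The two constraints have an empty intersection, which composer/semver (the same library Composer uses for constraint matching) makes easy to check:

    <?php
    require 'vendor/autoload.php';

    use Composer\Semver\Semver;

    // No nikic/php-parser release can satisfy both requirements at once.
    var_dump(Semver::satisfies('4.19.1', '^4.16'));  // true:  acceptable to Psalm
    var_dump(Semver::satisfies('4.19.1', '^5.1'));   // false: rejected by PHPUnit's chain
    var_dump(Semver::satisfies('5.1.0', '^4.16'));   // false: rejected by Psalm
    var_dump(Semver::satisfies('5.1.0', '^5.1'));    // true:  acceptable to PHPUnit's chain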

In the short term, it is possible to stay on PHPUnit 10.x, but the conflict brings to light a certain limitation of Semantic Versioning: it tells you how to create version numbers for your own code base, but it carries no information about the dependencies of that code.

When the required PHP runtime version goes up, what kind of change is that?  SemVer prescribes incrementing the major number for “incompatible API changes,” or the patch for “backward compatible bug fixes.”

So, is it a bug fix?  Is it incompatible? Or is the question ill-formed?

It feels wrong to increment the patch version for such a change.  Such a release states, “We are now preventing installation on {whatever Enterprise Linux versions} and below, and in exchange, you get absolutely nothing.  There are no new features.  Or perhaps we fixed bugs, but now you can’t access those fixes.”  That sounds… rude.

Meanwhile, it seems gratuitous to bump the major version on a strict schedule, merely because, every year, another old PHP version falls out of upstream support.  It appears to cause a lot of churn in the API, simply because a major version change is an opportunity to “break” that API.  PHPUnit is particularly annoying about this, constantly moving the deck chairs around.

In between is the feature release.  I have the same misgivings as with the patch version, although weaker.  Hypothetically, a project could release X.3.0 while continuing to maintain X.2.Y, but I’m not sure how many of them do.  When people have a new shiny thing to chase, they don’t enjoy spending any time on the old, tarnishing one.

What if we take the path of never raising the minimum versions of our dependencies?  I have also seen a project try this.  They were starving themselves of contributors, because few volunteers want to make their patch work on PHP 5.2–8.1.  (At the PHP 8.1 release in 2021, PHP 5.2 had reached its “end of life” about 11 years prior, four years after its own release in 2006.)  Aside from that issue, they were also either unable to pick up new features in the other packages they used, or forever adding run-time feature detection.

As in most things engineering, it comes down to trade-offs… but versions end up being a social question, and projects do not determine their answers in isolation.  The ecosystem as a whole has to work together.  When projects don’t, users have to deal with the results, like the nikic/php-parser situation.  And maybe that means users will migrate away from Psalm, if it isn’t moving fast enough to permit use alongside other popular packages.

Sunday, August 18, 2024

The Missing Call

I decided to combine (and minify) some CSS files for our backend administration site, so I wrote the code to load, minify, and output the final stylesheet.  I was very careful to write to a temporary file, even check the fclose() return code, rename it into place, and so on.  I even renamed the original to a backup, so that I could attempt to rename it back if the first rename succeeded but the second failed.

For style points, I updated it to set the last-modified time of the generated file to the timestamp of the newest input, so that If-Modified-Since headers would work correctly.

I tested it, multiple times, with the main and backup files present and absent in various combinations.  It looked great.  I pushed it out to production… and that wasn’t so great.

We just had no styles at all. Yikes!  I had some logic in there for “if production and minified CSS exists, use it; else, fall back to the source stylesheets.”  I hastily changed that to if (false) and pushed another deployment, so I could figure out what happened.

It didn’t take long.  The web server log helpfully noted that the site.min.css file wasn’t readable by the web server’s OS user.

I had used tempnam(), which created the output file with mode 600 (rw- --- ---).  Per long-standing philosophy, the deployment runs as a separate user from the web server, so a file that’s readable only by the deployer can’t be served by the web server.  Oops.

I had considered the direct failure modes of all of the filesystem calls I was making, but I hadn’t considered the indirect consequences of the actions being performed.  I added a chmod(0o644) call and its error check, and deployed again.  After that, the site worked.
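
For posterity, here is the shape of the corrected sequence as a standalone sketch.  The paths and the trivial minify() stand-in are placeholders, and the backup/rename-back handling from the real code is omitted for brevity.

    <?php
    // Stand-in for the real minifier.
    function minify(string $css): string
    {
        return preg_replace('/\s+/', ' ', $css);
    }

    function fail(string $msg): never
    {
        fwrite(STDERR, $msg . PHP_EOL);
        exit(1);
    }

    $sources = glob('assets/css/*.css');       // placeholder input paths
    $target  = 'public/site.min.css';          // placeholder output path

    $css    = '';
    $latest = 0;
    foreach ($sources as $file) {
        $css    .= minify(file_get_contents($file));
        $latest  = max($latest, filemtime($file));
    }

    // Write to a temporary file in the target directory, so the rename is atomic.
    $tmp = tempnam(dirname($target), 'sitecss');
    $fh  = $tmp !== false ? fopen($tmp, 'wb') : false;
    if ($fh === false || fwrite($fh, $css) === false || !fclose($fh)) {
        fail('failed to write the temporary stylesheet');
    }

    // The missing call: tempnam() creates the file as mode 600, which the web
    // server's user cannot read.
    if (!chmod($tmp, 0o644)) {
        fail("failed to chmod $tmp");
    }

    // Carry the newest source mtime over, so If-Modified-Since keeps working.
    touch($tmp, $latest);

    if (!rename($tmp, $target)) {
        fail("failed to move $tmp into place");
    }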