Sunday, February 16, 2025

My Experience with Switching from Psalm to PHPStan

Because Psalm could not be installed alongside PHP 8.4 or PHPUnit 11 at the time (January 15, 2025, before the Psalm 6.0 release), I finally gave PHPStan a try.

The big difference that has caused the most trouble is that PHPStan wants iterable/container types to explicitly document their contents.  Any time a method returns array, PHPStan wants to know: array of what?  Psalm was happy to observe what the method actually put into the returned array and treat that de facto type as the developer’s intention.
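To make that concrete, here is a minimal sketch of the kind of annotation PHPStan is after (the class and method names are invented for illustration):

    final class UserRepository
    {
        /**
         * @return array<int, string>  map of user ID to display name
         */
        public function displayNames(): array
        {
            // With the annotation, PHPStan knows the keys are ints and the
            // values are strings, instead of reporting that the return type
            // has no value type specified in its iterable type.
            return [7 => 'Alice', 12 => 'Bob'];
        }
    }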

Outside of the smallest library repositories, I ended up ignoring that rule.  It is responsible for maybe 75% of the issue reports.  If I can take the count from 1200 down to 275 with a single ignore, that is the difference between “there are too many things to deal with” and “I can make a dent in this today.”
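For reference, a blanket ignore for this rule in phpstan.neon looks roughly like the following; missingType.iterableValue is the error identifier recent PHPStan releases use for it, so double-check against your version:

    parameters:
        ignoreErrors:
            # silence "no value type specified in iterable type" project-wide
            - identifier: missingType.iterableValue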

The next obvious difference has been that PHPStan insists on handling the potential null returned from preg_replace('/\W+/', '', $str) calls.  Psalm assumed that passing three string arguments to preg_replace() would always produce a string of some sort.
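Satisfying PHPStan here is mechanical, if repetitive; one way to handle it (a sketch, with error handling to taste):

    $clean = preg_replace('/\W+/', '', $str);
    if ($clean === null) {
        // preg_replace() signals failure with null; surface it rather than passing it along
        throw new RuntimeException('preg_replace failed: ' . preg_last_error_msg());
    }
    // from here on, PHPStan treats $clean as a plain string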

There’s also a class of issues that PHPStan reports due to a disagreement in the other direction.  Psalm seemed to think that number_format() returned string|false, which pushed me to add an is_numeric() check on the result.  PHPStan thinks that check is redundant, i.e. that number_format() has already returned a numeric-string.

I don’t have a sense yet for how effective PHPStan is at finding problems overall.  In code that was previously checked with Psalm, many defects visible to static analysis have already been removed, leaving little fruit behind for PHPStan to pick.

As of early February, the migration to PHPStan can be considered a success.  I haven’t touched PHPStan Pro, but I may try it if I ever want to fix those hundreds of array-type issues.

Sunday, February 9, 2025

The Ruthless Elimination of Differences

I am excited for image-based Linux.  Yes, I usually complain about people upending things just when they get stable, but I think there’s a real benefit here: the elimination of differences.

Why, exactly, does installing Ubuntu have to unpack a bunch of .deb files inside a system?  Thousands or millions of machines will burn CPU running maintainer scripts, hopefully producing identical output, when most of the desired result could have been saved as an image in the first place.  Upstream should know what’s in ubuntu-minimal!  Looking through a different lens, Gentoo distributes a stage3 tarball.

In theory, an installation CD could carry the minimal image, the installer overlay, and the flavor’s overlay.  The installer’s boot loader would bring up the kernel, use the minimal+installer pair as root file system, and the installer would unpack the minimal+flavor images into the new disk partition.

“Image-based Linux” more or less takes this one step further, running the entire system directly from the images (or a single combined image).  Everyone gets to use the same pre-made images, and bugs become less dependent on the history of package operations.

If any of this sounds like Puppy Linux, that’s not entirely accidental.

This is also the space where things like ABRoot are being introduced.  Image-based Linux lends itself well to having an integrated rollback/recovery pathway.  Even on my non-image systems, having “a recovery partition” has been more valuable than I ever anticipated.  It let me test backups without having to work very hard to simulate a disaster.  I also created my own recovery partition when I was still using a Realtek USB WiFi device, to avoid being stranded without internet.  (Word to the wise: use MediaTek instead, or an Intel PCIe card if you want a non-USB option.)

Image-based Linux and the tools around it are poised to make real improvements to the repeatability and reliability of the systems built this way.  I don’t know when I, personally, might benefit (my daily driver is macOS now), but I am very excited about the progress being made here.

Sunday, February 2, 2025

We Finally Put Up a WAF

Someone sent an awful lot of requests to a system for long enough that management noticed.  Working with the responsible admin, I ended up proposing AWS WAF “to see what would happen.”

What happened: the WAF blocked 10,000 requests per minute, and someone got the message.  That released the pressure on the DynamoDB table behind the system, allowing it to jump straight from max to min capacity (1/16th of max) after fifteen minutes.

It seems some automated vulnerability scanner had gotten into an infinite loop.  There were a lot of repeated URLs in the access logs, as if it wasn’t clearing pages from its queue when they returned an OK response but unexpected data.  The reason “everything” returns OK is that any unknown URL (outside of a specific static-content prefix) returns a page with the React app root, and lets JavaScript worry about rendering whatever should be there.

I went ahead and put the same WAF on my own systems, promptly breaking them.  Meanwhile, our automated testing provider started reporting failures, with every request to the original system returning Forbidden.

The testing platform… is a bot.  I had to write an exception for it.

Turning my attention back to my systems, I put together a second WAF so I could apply different policies.  My system includes an API or two, so I needed to allow HTTP libraries and non-browser clients.  I linked in the exception for the testing platform as well.  Things went much more smoothly after that.
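For the record, “allowing HTTP libraries and non-browsers” amounts to overriding the relevant rules of the AWS-managed Bot Control rule group so they count instead of block.  The web ACL rule is shaped roughly like the following; the two rule names are from memory, so verify them against the current Bot Control rule list:

    {
      "Name": "bot-control",
      "Priority": 1,
      "OverrideAction": { "None": {} },
      "Statement": {
        "ManagedRuleGroupStatement": {
          "VendorName": "AWS",
          "Name": "AWSManagedRulesBotControlRuleSet",
          "RuleActionOverrides": [
            { "Name": "CategoryHttpLibrary", "ActionToUse": { "Count": {} } },
            { "Name": "SignalNonBrowserUserAgent", "ActionToUse": { "Count": {} } }
          ]
        }
      },
      "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "bot-control"
      }
    }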

I know that the WAF is fundamentally “enumerating badness,” but it is clearly better than no filtering at all.  It is also much less effort and risk than trying to enumerate goodness, which is why this sort of thing persists.

Sunday, January 26, 2025

Separate Components Allow “Least Privilege”

At work, our AWS image-building process exists as a set of scripts that are run manually, in order, if everything looks okay to the human at the terminal.

The problem is, the image auto-installs security updates once it launches; if enough of them accumulate, instances launched from the image start failing to become ready for traffic before the timeout expires.  The system is guaranteed to decay if I don’t periodically build a fresh image by hand.

One reason it isn’t automated (say, rebuilding itself weekly using AWS infrastructure) is that the process requires more permissions than normal business operations.  It must be able to launch and terminate instances, tag things, and reconfigure which image Auto Scaling launches.  Those would be scary permissions to give to any instance, since ours generally have write access only to specific data storage locations.

However, if the various steps (build, test, update configuration) each existed as a separate object visible to AWS, we could give each component its own permissions.  The configuration-update step would be the only one with access to the image ID in SSM Parameter Store.  Likewise, that step would be forbidden from creating or terminating EC2 instances.

Given the implementation details of our particular system, we could split the process into the following components:

  1. Configuration and test script deployment (run by developers off-AWS; stores to S3)
  2. Ubuntu base image lookup (read-only)
  3. Image build sequence (create/wait/destroy EC2 instances, tag instance and image, read configuration scripts from S3, invoke the AWS-maintained SSM Automation to create the image from the instance)
  4. Image test sequence (create/destroy EC2 instances, read the test script from S3)
  5. Configuration update (describe the image, update a specific SSM Parameter Store item)
  6. Garbage collection (read running configuration to determine “unused” status, deregister images, delete their snapshots)

We only need to update the configuration scripts if we make changes to them; otherwise, doing a rebuild simply pulls in the available security updates.  Nothing inside AWS gets special permission to write these files.

After that, we can run step 2 to find an appropriate base AMI on our cron host.  That requires no write access, so all the cron host really needs is permission to start and monitor the latter tasks.  Tasks 3-5, in particular, are simple enough (once inputs are determined in step 2) to be run via AWS SSM Automation. I imagine there will be a “coordinating” automation that runs those three tasks in order, and the granular tasks exist mainly for ease of debugging each one.  Finally, garbage collection is somewhat compute-heavy but requires no waiting, so AWS Lambda might be the best option for it.
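To make “unique permissions” concrete, the configuration-update component would carry little more than a policy like this (the account ID, region, and parameter name are hypothetical placeholders):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DescribeImagesOnly",
          "Effect": "Allow",
          "Action": "ec2:DescribeImages",
          "Resource": "*"
        },
        {
          "Sid": "UpdateTheOneParameter",
          "Effect": "Allow",
          "Action": "ssm:PutParameter",
          "Resource": "arn:aws:ssm:us-east-1:123456789012:parameter/app/base-image-id"
        }
      ]
    }

Notably, a component holding only this policy cannot create or terminate a single EC2 instance.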

The end result will be a major improvement: each step will actually have minimal privileges, and the union of all privileges is much less than the admin privileges that I am technically granting to the process.  It “works” in the sense that we trust my entire laptop, but what if we didn’t need to?

Sunday, January 19, 2025

Reminiscing about fvwm

Inspired by Chris Siebenmann talking about his setup and reminiscing about MGR, let’s jump in the Epoch and set the dial for 2002.

I was in college, absolutely blown away by the customizability of X11.  You could have a “full desktop” like CDE on the Suns, or KDE/Gnome on my home computer (variously Linux and FreeBSD); or else you could choose to run with only a window manager.  Those ranged from quasi-desktops like WindowMaker and Fluxbox, down to minimalist options like Ratpoison (so named because it was intended to be used without the mouse), with what I can only describe as “normal” options in between.

In this milieu, I found fvwm2 and really dug into that.  The configuration makes it more of a “window manager construction kit” than a fully-defined window manager.  (In contrast, one could barely do anything but theme Metacity.)  I put my window title bars down the left side, because it let me have ten more pixels of vertical space!  And, let’s be honest, just because I could.

I had keyboard shortcuts for everything.  Nobody in Unix space used Macs in those days; we were busy calling the iMac a “lampshade.” Therefore, Linux applications all used ctrl/alt/shift for shortcuts, and left the Windows key (Super) free for my window management shortcuts.  Oh, and I think I drew my own icons, so I could have Amiga-style raise/lower buttons in the title bars.

I also found out I had limits.  I configured a 3x3 grid of virtual desktops, each with 3x3 pages on it, but even “9 places to look for windows” would have been absurdly oversized.  In practice, I only ever used 3 pages on 1 desktop.  By the time I wrote Layer Juggling, I had forgotten about fvwm’s “pages” layer entirely.

I had a great run with fvwm and WindowMaker for 2–4 years there, but it became clear I was trading features (like having a volume control, or instantly applying themes) for an environment that nobody else could use.  Meanwhile, I was losing familiarity with Windows, which would slow me down whenever I used anyone else’s computer.  I switched to KDE by 2004, and I would eventually, reluctantly, capitulate and try Gnome again in 2008.  More or less just in time for Gnome and Canonical to blow everything up again!

Ironically, over time, there has been less need to use other people’s computers.  Besides which, I have kept using Dvorak for 20+ years now, despite that being a much bigger issue when switching systems.

Sunday, January 12, 2025

Systemd Allows Unknown Units in Before/After

Most of the time, my development virtual machine guest would boot and run perfectly fine.  Sometimes, though, the FastCGI service backing one of the websites would not be up and running.  It had a ConditionPathExists, and if the code to run the service wasn’t mounted, it wouldn’t start.

The intention was to allow colleagues to import a copy of this guest, then set up the mount to share the project from the host as they saw fit.  On their first boot, with no sharing, ConditionPathExists would prevent the FastCGI service from attempting to start, and therefore, systemd would not report that the system was degraded.  Another point about this system is that the sharing mechanism is unspecified: colleagues are free to use NFS (as I do), Plan9 file sharing, or the hypervisor’s shared-files mechanism.  The host paths are also unspecified, so there is no way I can set up the guest to expect specific sharing in advance.

In practice, sometimes NFS wasn’t ready in my guest before systemd checked the conditions for the FastCGI service.  The obvious answer was to add After=remote-fs.target to the FastCGI service.  I quickly had my own post-configuration scripts install a drop-in with this directive.
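The drop-in itself is tiny; assuming the unit is named something like myapp-fcgi.service (a stand-in here), it is just:

    # /etc/systemd/system/myapp-fcgi.service.d/wait-for-mounts.conf
    [Unit]
    After=remote-fs.target

followed by a systemctl daemon-reload.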

However, that’s a local solution to a global problem.  My colleagues can’t benefit from it, and I should minimize the burden on them when they periodically set up new guest images.  The fewer things they must remember, the better.

It turns out the answer was even simpler: I could skip the drop-in and add the After= line to the main service file. I added both remote-fs.target and the hypervisor’s guest services to the line, which means:

  1. In production, there are no remote filesystems to mount, nor guest services; there is no latency introduced.
  2. When using NFS or similar, systemd waits for the remote filesystem before starting the FastCGI service.
  3. With the hypervisor’s file sharing, the guest services mount the shared files before starting the FastCGI service.
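Concretely, the unit ends up looking something like this (the path and the guest-service name are stand-ins for the real ones; vmtoolsd.service is just one example of a hypervisor guest service):

    [Unit]
    Description=FastCGI backend for the development site
    ConditionPathExists=/srv/project/current
    # units listed here that don't exist on a given machine are simply ignored
    After=remote-fs.target vmtoolsd.service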

My guest doesn’t actually have the guest services installed, but the FastCGI service starts up as intended.  Looking at systemctl list-units --all output, the guest services are (now) listed as not-found and inactive, which is pretty much what I would expect from a dangling reference.  systemd knows about them because I listed them in After=, but since nothing requires them, the missing definitions don’t cause any problems.

Sunday, January 5, 2025

Residual Config Without Config Files

apt makes a distinction between “removed” and “purged.” In both, the packages are uninstalled; in the former state, config files remain, and in the latter, those are also removed.  Actually, that’s not quite the whole story.

A package can have no configuration files, yet still be in “residual-config” state when removed.  This happens if the package defines a postrm maintainer script.  These scripts can contain basically any shell commands, so their actions aren’t visible in any list of files.

The specific package I was looking into was a library, with a postrm script that ran ldconfig… during removal.  The package was shown in residual-config state simply because it had a postrm script.  Although that script would do nothing during purge, apt (and dpkg) can’t know that.
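For a shared library, such a postrm is usually nothing more exciting than the following sketch (paraphrasing the typical generated script):

    #!/bin/sh
    set -e
    # refresh the dynamic linker cache when the library is removed;
    # on purge there is nothing left for this script to do
    if [ "$1" = "remove" ]; then
        ldconfig
    fi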

How to list residual-config packages: apt list 2>/dev/null | grep residual-config, or dpkg -l | grep ^rc.

Listing configuration files: try one of these answers, as this gets real complex, real fast.

Reading a postrm script: look at /var/lib/dpkg/info/{PACKAGE}[:{ARCH}].postrm (the ARCH component may not be present).