Sunday, March 23, 2025
I had a hard time getting IPv6 to work properly on my VPS. It has a static address, which I published to DNS ages ago, but it wasn’t fully operational. That wasn’t obvious, because the system was able to accept and respond to incoming IPv6 connections; it just wasn’t able to originate outgoing ones. Thanks to Happy Eyeballs, everything cheerfully fell back to IPv4 and left me none the wiser. Probably for years. (Since inbound traffic could be responded to, the IPv6 network-transfer graph looked plausible, too.)
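(For anyone checking their own host: the quickest test I know of is to force an outbound IPv6 connection and see whether it completes. The targets below are just well-known endpoints with AAAA records, nothing special, and ping -6 assumes an iputils-style ping.)
ping -6 -c 3 2001:4860:4860::8888
curl -6 -sS -o /dev/null -w '%{http_code}\n' https://example.com/
If these stall while their IPv4 equivalents succeed, outbound IPv6 is broken even when inbound works fine.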
Sunday, March 16, 2025
Spammers Get Confused About Temporary Errors
When I wrote Everything Needs Rate Limits, I mentioned in passing that the disk-full state prevented receiving email. The MTA was returning a temporary “try again later” code, but the clients weren’t handling it gracefully. I got several session transcripts emailed to me that were of the form:
> EHLO some-host-name
< 250 OK + capabilities listing
> MAIL FROM sender-address
< 250 OK
> RCPT TO recipient-address
< 452 Insufficient storage, try again later
> RCPT TO recipient-address
< 452 Insufficient storage, try again later
> RCPT TO recipient-address
< 452 Insufficient storage, try again later
> DATA …
< 554 No valid recipients given
X Connection lost
The client recognized that they were getting some sort of error, but their idea of “later” was milliseconds later, and the disk-space problem was not clearing at CPU speeds. After enough rejections of their recipient address, they YOLO’d it and sent the email body anyway, to no avail.
(I also think it’s pretty interesting that the MTA is happy to tell everyone else ENOSPC, yet deliver these error emails through to me. I suppose it purposely stops accepting email while it still has some reserve disk space, so that it can continue to deliver critical errors for a while.)
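That supposition matches at least one MTA’s documented behavior. If this box runs Postfix (an assumption on my part), its SMTP server returns temporary failures once free queue space drops below roughly 1.5 × message_size_limit, and the reserve can be raised explicitly in main.cf:
# main.cf: keep ~1 GB of the queue filesystem in reserve (value is illustrative)
queue_minfree = 1073741824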
Sunday, March 9, 2025
Finding When You Pushed and Pulled a Git Repository
I found myself needing to know if I had pushed something before or after some operational event. The commit itself was dated late on Friday before the event; could it have been deployed then, or did I wait until Monday to push?
It turns out that git reflog has the answers; as it’s a log of what the branch tip pointed to, it records commits, amends, and push/pull activity. It doesn’t normally include a date in the output, but we can convince it to give us one by using the --pretty formatting option.
git reflog --pretty='%cd %h %gd %gs'
This format means “commit date”, Thu Nov 28 17:56:08 -0500 style; the hash; the “shortened reflog selector,” like HEAD@{1}; and finally, the reflog summary, which has information like commit (amend): Awesome subject line.
Within the reflog, for push/pull events, the “commit date” is actually the date of the push/pull completing its task.
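One subtlety: a push doesn’t move the local branch tip, so its entry lands on the remote-tracking ref’s reflog rather than HEAD’s. The same format works there, with entries like update by push carrying the completion time (origin/main is a stand-in for whatever branch applies):
git reflog --pretty='%cd %h %gd %gs' origin/main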
The major limitation here is that the reflog is local; we can only look at our own activity with this method.
However, in my situation, that was enough to exonerate my commit, allowing us to turn our attention elsewhere for an explanation.
Sunday, March 2, 2025
Everything Needs Rate Limits
For reasons of “anything else would cost more,” my web server and email MTA are running on the same VPS. Which apparently means, if a known issue in some web-side software fills the disk, email quits flowing in.
Fortunately, I was at liberty to investigate immediately. The root cause was a bot making things up and filling a cache with negative responses. The actual bot was promptly banned, not only via robots.txt (which used to return the text "# 200 OK Have Fun" so that I wouldn’t get 404 errors for it) but also via User-Agent in the web server, since it had obviously already seen the permissive robots.txt contents.
(Also, the only email that failed to be received was spam. “Lucky” me!)
It worried me, though. What if another bot did this? Am I going to play Whac-A-Mole® with it? (Don’t get me wrong—that’s not a bad game, but this version doesn’t give me any tickets redeemable for prizes at the front counter.)
To give me more runway to respond to future problems, I added per-source rate limits. This should prevent the cache from filling in units of MB/s. Concurrently, there is also a custom disk-usage alarm that will activate if free space falls below a “normal” amount, giving me a chance to catch problems before the MTA starts refusing service again.
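For illustration, a per-source limit of this shape (nginx syntax; the zone name and numbers are invented for the example) looks roughly like:
# in the http {} context: track clients by IP, allow 5 requests/second each
limit_req_zone $binary_remote_addr zone=persrc:10m rate=5r/s;
server {
    location / {
        limit_req zone=persrc burst=10 nodelay;  # absorb small spikes
        limit_req_status 429;                    # tell polite clients to back off
    }
}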
A global rate limit for new connections is still being considered. The current bot menace has been beaten for today, and future bots from a single IP will also find themselves running into limits, but someone running a crawler network could still cause plenty of trouble. The problem is that a global rate limit is probably something that would be HTTP-oblivious, responding with an RST packet instead of an application-layer 429 message.
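The connection-level version would look something like this nftables rule (a sketch with invented numbers, assuming an existing inet filter table), which shows the HTTP-obliviousness: over-limit clients get a raw TCP reset, never a 429:
nft add rule inet filter input tcp dport { 80, 443 } ct state new limit rate over 300/second counter reject with tcp reset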
I know “everything needs rate limits” is common wisdom, at least in some circles, but I ran public sites for at least 14 years without them. Sadly for the nostalgia, it seems those days are gone.
Wednesday, February 26, 2025
AWS Auto Scaling, Load Balancers, and Availability Zones
We ran into an odd situation: one of our EC2 instances was attached to a target group, but “unused” on the load balancer, which ended up tripping the “not enough healthy hosts” alarm.
This ended up being my fault: I had removed a subnet (thus, availability zone) from the load balancer, but did not realize it was still associated with the Auto Scaling group.
When increasing capacity, AWS picked one of the four subnets available on the group, as instructed. The instance was cheerfully launched into that subnet, but it happened to be the one that wasn’t associated with the load balancer. Conversely, the fifth zone the load balancer used to have was pointless, because Auto Scaling would never launch an instance there.
(This all came about as part of an effort to reduce our availability zone footprint and improve colocation of resources. There are probably diminishing returns, and three zones to a region should be enough.)
tl;dr: Auto Scaling has a subnet configuration that is independent of any load balancer’s configuration. For successful operation, the load balancer must have every subnet that is assigned to any Auto Scaling group using that load balancer.
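A quick way to audit for this mismatch with the AWS CLI (resource names here are placeholders) is to list the subnets on each side and confirm the group’s set is a subset of the load balancer’s:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names my-asg --query 'AutoScalingGroups[].VPCZoneIdentifier' --output text
aws elbv2 describe-load-balancers --names my-alb --query 'LoadBalancers[].AvailabilityZones[].SubnetId' --output text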
Sunday, February 23, 2025
Podman Desktop Isn’t Great
Since buying a new computer, my primary desktop is no longer Linux, and containers are no longer native. I decided that the path of least resistance would be to try Podman Desktop, but it leaves a lot to be desired.
Sunday, February 16, 2025
My Experience with Switching from Psalm to PHPStan
Due to Psalm’s lack of support for being installed with PHP 8.4 or PHPUnit 11 at the time (January 15, 2025, prior to the Psalm 6.0 release), I finally gave PHPStan a try.
The big difference that has caused the most trouble is that PHPStan wants iterable/container types to explicitly document their contents. Any time a method returns array, PHPStan wants to know: an array of what? Psalm was happy to observe what the method put in the array for return, and use that de facto type as the developer’s intention.
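A made-up illustration (this is the missingType.iterableValue rule, if I have the identifier right):
/** @return array<int, string> */
function names(): array { return ['alice', 'bob']; }  // fine: contents documented

/** @return array */
function ids(): array { return [1, 2]; }  // flagged: no value type specified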
Outside of the smallest library repositories, that rule got ignored. It is responsible for maybe 75% of issue reports. If I can take it from 1200 down to 275 with a single ignore, that is the difference between “there are too many things to deal with” and “I can make a dent in this today.”
The next obvious difference has been that PHPStan is much more interested in handling the potential null returned from preg_replace('/\\W+/', '', $str); calls. Psalm assumed that giving three string arguments to preg_replace() would always result in a string of some sort.
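The guard it wants looks something like this (my sketch of the pattern, not code from the repository):
$clean = preg_replace('/\W+/', '', $str);
if ($clean === null) {
    // PCRE failure (e.g. backtrack limit exceeded); preg_replace() returns null here
    throw new RuntimeException(preg_last_error_msg());
}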
There’s also a class of issues reported by PHPStan due to a disagreement in the other direction. Psalm seemed to think that number_format() returned string|false, requiring an is_numeric() check on the variable. PHPStan thinks that check is redundant, i.e. that number_format() has already returned a numeric-string.
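For what it’s worth, the PHP manual sides with PHPStan: number_format() is declared to return string, never false. The numeric-string inference, as far as I can tell, applies to separator-free calls like this one:
$price = number_format($total, 2, '.', '');  // e.g. "1234.50": inferred numeric-string
if (!is_numeric($price)) {                   // PHPStan: condition is always false
    throw new LogicException('unreachable');
}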
I don’t have a sense yet for how effective PHPStan is at finding problems overall. In code that was previously checked with Psalm, many defects visible to static analysis have already been removed, leaving little fruit behind for PHPStan to pick.
As of early February, PHPStan can be considered a successful migration. I haven’t touched PHPStan Pro, but I may try it if I ever want to fix those hundreds of issues with array types.