Sunday, May 11, 2025

Thoughts from Trying Generators in PHP

I am late to the party, but I have been playing with Generators in PHP more, and running into the limitations of module boundaries.

Some module might produce a Generator so that iteration can be performed in chunks, reducing peak RAM.  For example, it might produce results one store at a time, instead of loading all stores into a giant array.  Code that processes an entire database table, but wants to lower lock contention and memory use, can also benefit; it can use a Generator to isolate the fetch-in-pages logic from the processing of individual records.  The consumer sees one stream of results, while the Generator fetches more as needed.
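As a minimal sketch of the fetch-in-pages idea: the paginate() function and the $fetchPage callable below are hypothetical names, and an in-memory array stands in for the database table.  The consumer's foreach never sees the page boundaries.

```php
<?php
declare(strict_types=1);

/**
 * Yield rows one at a time while fetching them a page at a time.
 * $fetchPage is a hypothetical callable: (int $offset, int $limit): array
 */
function paginate(callable $fetchPage, int $pageSize = 100): Generator
{
    $offset = 0;
    do {
        $rows = $fetchPage($offset, $pageSize);
        foreach ($rows as $row) {
            yield $row;
        }
        $offset += count($rows);
        // A short (or empty) page means the source is exhausted.
    } while (count($rows) === $pageSize);
}

// Stand-in data source: 250 fake record IDs served from an in-memory array.
$table = range(1, 250);
$fetchPage = fn (int $offset, int $limit): array => array_slice($table, $offset, $limit);

// The consumer sees one stream; only one page is held by the Generator at a time.
$total = 0;
foreach (paginate($fetchPage, 100) as $id) {
    $total += $id;
}
```

With a real database, the callable would presumably run a LIMIT/OFFSET (or keyset) query per page, so that no single query has to hold the whole table.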

In short, there are plenty of use cases.

The problem comes when a caller wants to pass “the data” produced by the Generator to another function or method that specifically takes an array.  Once that happens, either the destination needs to be reworked to accept the broader iterable type, or the efforts toward efficiency are erased by an iterator_to_array() call.
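To illustrate the boundary problem, here is a small sketch with made-up function names (storeNames(), labelsFromArray(), labelsFromIterable()).  It shows the array declaration rejecting a Generator, the iterator_to_array() escape hatch, and the iterable alternative.

```php
<?php
declare(strict_types=1);

function storeNames(): Generator
{
    yield 'Downtown';
    yield 'Airport';
    yield 'Harbor';
}

// Accepts only arrays: passing a Generator here throws a TypeError.
function labelsFromArray(array $names): array
{
    return array_map('strtoupper', $names);
}

// Accepts any iterable, so arrays and Generators both work.
function labelsFromIterable(iterable $names): array
{
    $labels = [];
    foreach ($names as $name) {
        $labels[] = strtoupper($name);
    }
    return $labels;
}

$rejected = false;
try {
    labelsFromArray(storeNames());
} catch (TypeError $e) {
    $rejected = true;
}

// Works, but every value is materialized into an array before the call.
$viaArray = labelsFromArray(iterator_to_array(storeNames(), false));

// Consumes the Generator directly, one value at a time.
$viaIterable = labelsFromIterable(storeNames());
```

Passing false as the second argument to iterator_to_array() discards the Generator's keys, which avoids surprises when keys repeat.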

(Of course, back when generators were introduced to PHP, I didn’t use type declarations, so I could have gotten away with throwing a generator at something that assumed it would receive an array or PDOStatement. Dealing with larger teams and beginning to use an IDE were both great reasons to add the type information, and the array type forbids passing a Generator in its place.)

A separate issue is that anything consuming a Generator (thus, anything type-hinted iterable) needs to be aware of its once-only nature.  This only sometimes becomes a problem: for instance, when a template wants to display aggregate statistics over the data set ahead of the data set itself, which takes two passes over the data.
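A short sketch of the once-only behavior, using a hypothetical prices() generator: the first pass works, and a second foreach over the same object fails with an Exception.

```php
<?php
declare(strict_types=1);

function prices(): Generator
{
    yield 10;
    yield 20;
    yield 30;
}

$data = prices();

// First pass: compute the aggregate.
$sum = 0;
foreach ($data as $price) {
    $sum += $price;
}

// Second pass over the same Generator: PHP throws an Exception,
// because a finished Generator cannot be traversed again.
$secondPassFailed = false;
try {
    foreach ($data as $price) {
        // never reached
    }
} catch (Exception $e) {
    $secondPassFailed = true;
}
```

A consumer that truly needs two passes has to buffer the values or ask the producer for a fresh Generator, and either choice has to be made knowingly.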

Generators can also produce “return” values, which can be fetched, once the regular values have been produced, by code that knows it is dealing with a Generator.  (I might change my mind later, with more experience, but it doesn’t pass the vibe check.  It feels a lot like requiring methods of a class to be called in a specific order, which is usually best to avoid.)  It implies that the entire system should lean into handling Generators in particular, and not allow them to mix with other iterable types.
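For reference, the return-value mechanism looks roughly like this.  The importRows() function is invented for the example; getReturn() is the real Generator method, and it throws if the Generator has not finished yet, which is the call-order dependence in question.

```php
<?php
declare(strict_types=1);

// Yields the non-empty rows, then returns how many rows were skipped.
function importRows(array $rows): Generator
{
    $skipped = 0;
    foreach ($rows as $row) {
        if ($row === '') {
            $skipped++;
            continue;
        }
        yield $row;
    }
    return $skipped;
}

$gen = importRows(['a', '', 'b', '']);

// Asking for the return value before the Generator has finished throws.
$calledTooEarly = false;
try {
    $gen->getReturn();
} catch (Exception $e) {
    $calledTooEarly = true;
}

$seen = [];
foreach ($gen as $row) {
    $seen[] = $row;
}

// Only valid now that iteration has run to completion.
$skipped = $gen->getReturn();
```

Note that a function type-hinted iterable could not call getReturn() at all without first checking that it was handed a Generator rather than an array.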

These are (mostly) things I was vaguely aware of from reading about Python generators, but they weren’t on my mind while writing PHP.

Sunday, May 4, 2025

The fiserv Outage

Editor’s Note: this post was penned offline early Friday evening, before the author had knowledge of the issue being resolved, and fiserv processing the backlog as of 16:45.  We have chosen to simply add some links, now that we are online to retrieve them.  The post follows.

As I write, on Friday, 2025-05-02, fiserv has been offline all day, or substantially all day.  This company acts as a third party to a number of banks, providing wire transfers, ACH, and/or direct deposit services, and possibly even online/mobile banking.  A number of large banks, including Ally Bank, Bank of America, Capital One, and Synchrony have been affected in some way by this outage, as was my regional bank.

I don’t know anything about the root causes yet.  It would be irresponsible to speculate about those causes, so of course I am going to.

Sunday, April 20, 2025

Simplicity can be Imaginary

There’s a comic about simplicity: how an Apple product has one place to touch, a Google product has one search field, and “your company’s app” has dozens of fields with interrelated requirements, obscure codes, strange highlighting, and “…” buttons.

The thing is, for internal or even b2b apps, the user probably knows what kind of thing they want to search on.  If they are trying to look up a customer ID, then matching it against PO numbers is irrelevant; it will just take time and produce extraneous results.  If they can tell the computer directly, “Find customer #33448” and jump straight to the customer record, it saves them an extra round-trip through a search result page they didn’t need.

“Your company’s app” from the comic comes across as more of a data-entry page than the main point of interaction.  One might still organize the form along required/optional dimensions, and put auto-loaded fields in proximity with what will automatically update them.  However, to make the business happen, there’s a minimum amount of data that is genuinely required, that shouldn’t be crammed down to one textarea and parsed back out.

Sunday, April 13, 2025

The Enterprise’s Goals

When I was a n00b on the internets, I heard whisperings about awful, over-complex “rule based systems” out there, somewhere.  Programmers scoffed at them for essentially being programs that were being written by non-programmers; nebulous “managers” allegedly dreamed of replacing expensive programmers with cheap office staff.  I did not understand at the time where these systems came from, if everyone seemed to think they were so bad.

Part of that answer is simple.  “Programmers aren’t ‘everyone.’” Oops.

The other part of that answer turns out to be review and auditing. Anything that exists in code is opaque to the business staff; they largely have to trust the programmers on it, or demonstrate defective outcomes. (And at that point, they need to wait for the necessary programming and deployment for it to be fixed.  If it is a big enough problem that customers or clients are exploiting in the meantime, that delay can become costly.)

Functionality that is exposed as ‘configuration data’ to the office staff becomes reviewable by other office staff, such as managers, and errors can be corrected more quickly.  External auditors can use the same review capability for their own work.  The next problem is that this data might not be flexible enough, which pushes toward the development of conditions and actions, and the rule-based system is born.

It was never about the programmers; it was about the business being able to view its own source code.

Sunday, April 6, 2025

Every Change Might Be Breaking

We originally had the “automatic minor version upgrade” option active on Amazon RDS.  This option simply does not work very well.  Sometimes, for no clear reason (and without notification), it would stop applying upgrades, and require manual updates to get moving again.  We mostly lived with it, and then we hit the worst-case scenario: it did perform the upgrade, and then one of our scripts stopped working.

Not only that, it managed to break while I was on vacation.

(Obligatory xkcd about spacebar heating.)

Since then, we don’t use that option.  When I’m good and ready, I peruse the changelogs, then schedule the update to happen when I will be in the office to handle unexpected issues.

For their part, AWS recommends testing the app against the new version of the database before performing any upgrades.  This is implicitly a recommendation against using automatic minor upgrades, because there is no automated process to test the upgrade first.

One knows an analysis tool is looking at AWS with a security-first paradigm when it recommends switching the automatic upgrade option back on for the database.  It is technically correct that new releases MAY contain security fixes, but upgrading to them MAY cause an automated denial of service.  It is not a simple, inconsequential task.

Sunday, March 23, 2025

Some Notes from Fixing my Server’s IPv6 / SLAAC

I had a hard time getting IPv6 to work properly on my VPS. It has a static address, which I published to DNS (ages ago), but it wasn’t fully operational. It wasn’t obvious, because it was able to accept and respond to incoming IPv6, but it was not able to generate outgoing IPv6 connections. Thanks to Happy Eyeballs, the system cheerfully fell back to IPv4 and left me none the wiser.  Probably for years. (Since inbound traffic could be responded to, the IPv6 network-transfer graph looked plausible, too.)

Sunday, March 16, 2025

Spammers Get Confused About Temporary Errors

When I wrote Everything Needs Rate Limits, I mentioned in passing that the disk-full state prevented receiving email.  The MTA was returning a temporary “try again later” code, but the clients weren’t backing off in response.  I got several session transcripts emailed to me that were of the form:

> EHLO some-host-name
< 200 OK + capabilities listing
> MAIL FROM sender-address
< 200 OK
> RCPT TO recipient-address
< 415 Resource unavailable, try again later
> RCPT TO recipient-address
< 415 Resource unavailable, try again later
> RCPT TO recipient-address
< 415 Resource unavailable, try again later
> DATA …
< 500 No recipient given
X Connection lost

The client recognized that it was getting some sort of error, but its idea of “later” was milliseconds later, and the disk space problem was not being cleared at CPU speeds.  After enough rejections of the recipient-address, it YOLO’d it and sent the email body, to no avail.
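For contrast, a well-behaved client treats the first digit of the reply code as the signal: 4xx is a transient failure to retry after a real delay, while 5xx is permanent.  The sketch below is only an illustration of that idea; the function names, the five-minute base delay, and the four-hour cap are all assumptions, not values taken from any particular MTA.

```php
<?php
declare(strict_types=1);

// Classify an SMTP reply by its first digit.
function classifyReply(int $code): string
{
    return match (intdiv($code, 100)) {
        2 => 'ok',
        3 => 'continue',
        4 => 'retry-later',   // transient failure: queue and try again later
        5 => 'give-up',       // permanent failure: do not retry as-is
        default => 'unknown',
    };
}

// Exponential backoff in seconds: base, 2x base, 4x base, ... up to a cap.
// The default base and cap are illustrative assumptions.
function retryDelaySeconds(int $attempt, int $base = 300, int $cap = 14400): int
{
    return min($cap, $base * (2 ** ($attempt - 1)));
}

// The rejection from the transcript above is a 4xx, so it means "later".
$action = classifyReply(415);
$firstDelay = retryDelaySeconds(1);
$tenthDelay = retryDelaySeconds(10);
```

Under these assumed defaults the first retry waits five minutes, which is at least the right order of magnitude for a full disk to get noticed and cleared.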

(I also think it’s pretty interesting that the MTA is happy to tell everyone else ENOSPC, yet deliver these error emails through to me.  I suppose it purposely stops accepting email with some reserve disk space, so that it can continue to deliver critical errors for a while.)