For reasons of “anything else would cost more,” my web server and email MTA are running on the same VPS. Which apparently means, if a known issue in some web-side software fills the disk, email quits flowing in.
Fortunately, I was at liberty to investigate immediately. The root cause was a bot making things up and filling a cache with negative responses. The offending bot was promptly banned, not only via robots.txt
(which used to return the text "# 200 OK Have Fun" so that I wouldn’t get 404 errors for it) but also by User-Agent in the web server, since it had obviously already seen the permissive robots.txt contents.
(Also, the only email that failed to be received were spam messages. “Lucky” me!)
It worried me, though. What if another bot did this? Am I going to play Whac-A-Mole® with it? (Don’t get me wrong—that’s not a bad game, but this version doesn’t give me any tickets redeemable for prizes at the front counter.)
To give me more runway to respond to future problems, I added per-source rate limits. These should keep the cache from filling at megabytes per second. Alongside them, there is now a custom disk-usage alarm that fires if free space falls below a “normal” amount, giving me a chance to catch problems before the MTA starts refusing service again.
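The alarm doesn’t need to be fancy. As a sketch of the idea—the mount point, threshold, and delivery method below are placeholders rather than my actual setup—a periodic check run from cron can be as small as:

```python
# Rough sketch of a disk-usage alarm: check free space on the filesystem
# the cache (and the mail spool) share, and complain loudly if it drops
# below a floor. Path, threshold, and alert delivery are placeholders.
import shutil
import subprocess

MOUNT_POINT = "/"                # filesystem shared by web cache and MTA
MIN_FREE_BYTES = 2 * 1024**3     # 2 GiB floor; pick whatever "normal" means

def check_free_space() -> None:
    usage = shutil.disk_usage(MOUNT_POINT)
    if usage.free < MIN_FREE_BYTES:
        # Deliver the alert by a path that still works when mail itself
        # may be broken; here it just lands in syslog via logger(1).
        subprocess.run(
            ["logger", "-p", "user.alert",
             f"low disk: {usage.free} bytes free on {MOUNT_POINT}"],
            check=False,
        )

if __name__ == "__main__":
    check_free_space()
```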
I’m still considering a global rate limit for new connections. The current bot menace has been beaten for today, and future bots coming from a single IP will find themselves running into the per-source limits, but someone running a crawler network could still cause plenty of trouble. The problem is that a global limit would probably be HTTP-oblivious, responding with a TCP RST instead of an application-layer 429 message.
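For contrast, the application-aware flavor of limiting—the kind that gets to answer with a 429—is roughly a per-source token bucket. The sketch below is illustrative only; the rates, the WSGI wrapping, and the in-memory bucket table are stand-ins, not what is actually running here:

```python
# Per-source token bucket: each client IP gets a bucket that refills at
# RATE tokens/second up to BURST; a request spends one token or is refused.
import time
from collections import defaultdict

RATE = 5.0    # tokens added per second, per source (illustrative)
BURST = 20.0  # bucket capacity (illustrative)

_buckets = defaultdict(lambda: {"tokens": BURST, "stamp": time.monotonic()})

def allow(source_ip: str) -> bool:
    """Return True if this request fits under the per-source limit."""
    b = _buckets[source_ip]
    now = time.monotonic()
    b["tokens"] = min(BURST, b["tokens"] + (now - b["stamp"]) * RATE)
    b["stamp"] = now
    if b["tokens"] >= 1.0:
        b["tokens"] -= 1.0
        return True
    return False

# Because the decision happens after HTTP parsing, a denied request can
# still get a polite 429 instead of a dropped connection.
def app(environ, start_response):
    if not allow(environ.get("REMOTE_ADDR", "?")):
        start_response("429 Too Many Requests",
                       [("Retry-After", "1"), ("Content-Type", "text/plain")])
        return [b"Slow down.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello.\n"]
```

A connection-level global limit never gets that far: it has to decide before any request is parsed, so the over-limit client just sees its connection reset.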
I know that “everything needs rate limits” is common wisdom, at least in some circles, but I ran public sites for at least 14 years without them. Sadly for the nostalgia, it seems those days are gone.