Wednesday, August 7, 2024

AWS CodeDeploy’s Blue/Green Deployment Gotcha

Once, long after I had forgotten how the whole thing was bootstrapped, I accidentally deleted the target group associated with a CodeDeploy application configured for blue/green deployment.  That’s how I found out (rediscovered?) that CodeDeploy doesn’t create a target group for blue/green deployments; it copies an existing one.  Since I had just deleted that existing one, I couldn’t do a (re)deployment to bring the system back online!

(Also, it cemented my opinion that prompts should be like, “Type ‘delete foo-production-dAdw4a1Ta’ to delete the target group” rather than “Type ‘delete’ to delete.” Guess which way the AWS Console is set up.)
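The kind of prompt I mean is easy to sketch.  This is a minimal illustration, not how any AWS tool actually works; the function name is hypothetical, and the resource name is just the example from above.

```python
def confirm_delete(resource_name: str, typed: str) -> bool:
    """Hypothetical check: allow deletion only if the user typed
    'delete <exact resource name>', not just the word 'delete'."""
    return typed.strip() == f"delete {resource_name}"


# Usage:
# confirm_delete("foo-production-dAdw4a1Ta", "delete")                            -> False
# confirm_delete("foo-production-dAdw4a1Ta", "delete foo-production-dAdw4a1Ta")  -> True
```

Typing the name forces you to read which resource you are about to destroy, which is exactly the moment I skipped.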

I started up an instance to add to a new target group, and it promptly fell over.  The AMI had health monitoring baked in, and one of the health checks was “CodeDeploy has installed the application on the instance.”  Since CodeDeploy had not launched the instance in order to install the application on it, that health check failed, and Auto Scaling dutifully terminated the instance to replace it.

Meanwhile, the lack of healthy instances was helpfully sending alerts and bringing my boss’ attention to the problem.

[Now I wonder whether issuing a redeploy could have worked at this point.  The target group was there to copy, even if the instances weren’t functional.  I guess we’ll never know; I’m not about to delete the target group again just to find out!]

I ended up switching the configuration to EC2 health checks instead of HTTP, and then everything was stable enough to issue a proper redeployment through CodeDeploy.  With service restored, I put the health checks back to HTTP.
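For the record, here is roughly what that toggle looks like in code.  This is a sketch under an assumption: it treats the “EC2 vs. HTTP” switch as the Auto Scaling group’s `HealthCheckType` (`"EC2"` vs. `"ELB"`, where `"ELB"` defers to the target group’s HTTP check).  The helper names and the group name are hypothetical; the call itself is boto3’s `update_auto_scaling_group`.

```python
from typing import Any, Dict

# Assumption: "EC2" ignores the load balancer's HTTP health check,
# while "ELB" makes Auto Scaling honor it.
VALID_TYPES = ("EC2", "ELB")


def health_check_request(group_name: str, check_type: str) -> Dict[str, Any]:
    """Build keyword arguments for the Auto Scaling update_auto_scaling_group call."""
    if check_type not in VALID_TYPES:
        raise ValueError(f"check_type must be one of {VALID_TYPES}, got {check_type!r}")
    return {"AutoScalingGroupName": group_name, "HealthCheckType": check_type}


def set_health_check_type(client: Any, group_name: str, check_type: str) -> None:
    """Apply the change via a boto3 Auto Scaling client (or anything with the same method)."""
    client.update_auto_scaling_group(**health_check_request(group_name, check_type))


# Usage (needs boto3 and AWS credentials; "foo-production-asg" is hypothetical):
#   import boto3
#   asg = boto3.client("autoscaling")
#   set_health_check_type(asg, "foo-production-asg", "EC2")  # stop HTTP-driven replacements
#   ...redeploy through CodeDeploy...
#   set_health_check_type(asg, "foo-production-asg", "ELB")  # restore the stricter check
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise without touching a real account.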

And then, with production in service again, I finally got to work on moving staging from in-place to blue/green.  Ironically, I would have learned the lesson either way, but breaking production made it really stick.
