07/05/2024
chocolatey
appveyor
I maintain several Chocolatey community packages on GitHub. I use the chocolatey-au tool, running in a scheduled AppVeyor build, to automatically update packages when new versions of the software are released. A few months ago the Zoom package stopped updating automatically, and it appeared that the AppVeyor build agent was not seeing the latest version in the Zoom version feed. After troubleshooting what I thought was a caching issue, I realized the build agent was still running on a Windows Server 2012 OS. It appears the Zoom version feed inspects the User-Agent HTTP request header and serves the latest version that your operating system can support. Once I updated the build agent to use the Visual Studio 2022 image, the chocolatey-au tool worked as expected.
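For anyone hitting the same thing, the fix on my end was a one-line change to appveyor.yml. A minimal sketch, assuming you configure the build worker image in the YAML rather than in the AppVeyor project settings UI:

image: Visual Studio 2022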
#TIL that you can RDP into an AppVeyor build agent to help troubleshoot issues. First you need to add the following to the on_finish block in your .yml file:
on_finish:
  - ps: $blockRdp = $true; iex ((new-object net.webclient).DownloadString('https://raw.githubusercontent.com/appveyor/ci/master/scripts/enable-rdp.ps1'))
This will block the job from finishing and output the RDP connection details in the console. Before re-running the job, you need to add an environment variable named APPVEYOR_RDP_PASSWORD to your build settings; its value will be the password for your RDP connection.
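You can add the variable through the AppVeyor UI, or define it in the .yml itself. A minimal sketch, assuming you encrypt the value first (AppVeyor supports encrypted secure values in YAML); the placeholder below is not a real encrypted string:

environment:
  APPVEYOR_RDP_PASSWORD:
    secure: <encrypted-value>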
07/10/2024
azure
dev-ops
github-action
Azure App Service deployments using slot swapping are an easy way to do zero-downtime deployments. The basic workflow is to deploy your app to the staging slot in Azure, then swap that slot with production. Azure magic ensures that the application cuts over to the new version without disruption, as long as you've architected your application in a way that supports this. Reading about the az webapp deployment slot swap command, one of the first steps is supposed to be warming up the slot automatically. In reality, what we were finding was a lengthy operation (20+ minutes) followed by a failure with no helpful details. As part of the troubleshooting process I added a step to ping the staging slot after deployment, and saw that it resulted in random errors, mostly 503 Service Unavailable. This was happening on several different systems. I ended up adding a few retries and saw that the site would eventually return 200 OK after a minute or so, and then the slot swap step would work reliably within 2 minutes. It seems that when your site does not immediately return 200 OK at the start of a slot swap, the command freaks out.
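For context, the swap itself is driven by an az CLI call along these lines; the resource group, app, and slot names below are placeholders rather than our actual values:

az webapp deployment slot swap \
  --resource-group <resource-group> \
  --name <app-name> \
  --slot staging \
  --target-slot production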
In our GitHub Actions deployment jobs, I added the following step to manually warm up the site. The warmup_url input should point to a page on your staging slot; we point it at a health checks endpoint so we can ensure stability before swapping. --connect-timeout 30 gives each attempt 30 seconds to connect, and --retry 4 with --retry-delay 30 retries up to 4 times with a 30-second delay between attempts. The -f flag makes curl treat HTTP error responses as failures, and --retry-all-errors is an aggressive form of retry that ensures a retry on pretty much everything that isn't a 200-level success code. The second curl statement, using --fail-with-body, fails the step if the site still isn't healthy, which fails the entire job altogether.
- name: Warmup
  if: inputs.warmup_url != ''
  run: |
    # Retry the warmup URL until the staging slot responds successfully
    curl \
      --connect-timeout 30 \
      --retry 4 \
      --retry-delay 30 \
      -f \
      --retry-all-errors \
      "${{ inputs.warmup_url }}"
    # Final check: fail the step (and the job) if the slot is still unhealthy
    curl \
      --fail-with-body \
      "${{ inputs.warmup_url }}"
After adding this to all of our slot swap deployments, we saw consistently more reliable behavior and no more frustrating 20-minute black hole failures.