High API Error Rates
Incident Report for Labelbox
Postmortem

A deployment of our API at 8:15 PDT triggered downscaling of the service’s available pods during a period of high traffic. Once the new pods were ready, it took approximately 5 minutes for our autoscaling to rescale the service back up to the number of pods necessary to handle all the traffic, so during that period we saw a high rate of 502 errors.

In response to this issue, we have improved our autoscaling logic to spin up 10 new instances before switching traffic to the new deployment version.
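
The gating idea behind this change can be illustrated with a minimal sketch. It is not our actual autoscaling code: the `Deployment` type, the function names, and the readiness check are hypothetical, and only the figure of 10 instances comes from the fix described above. The sketch assumes traffic stays on the old version until the new version reports enough ready pods.

```python
from dataclasses import dataclass

# Number of new instances to have ready before cutover (figure from the postmortem).
PREWARM_PODS = 10


@dataclass
class Deployment:
    """Hypothetical snapshot of one deployed version of the API service."""
    version: str
    ready_pods: int


def can_switch_traffic(new: Deployment, required_ready: int = PREWARM_PODS) -> bool:
    """Return True only when the new version has at least `required_ready` ready pods."""
    return new.ready_pods >= required_ready


def choose_live_version(old: Deployment, new: Deployment,
                        required_ready: int = PREWARM_PODS) -> str:
    """Keep serving from the old version until the new one is warmed up."""
    if can_switch_traffic(new, required_ready):
        return new.version
    return old.version
```

With this check in place, a rollout where the new version has only 3 ready pods keeps traffic on the old version, and the switch happens once 10 or more pods are ready.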

Posted Jul 17, 2019 - 22:09 UTC

Resolved
https://api.labelbox.com had a high 502 error rate for 8 minutes while a new deployment was being released. Some users experienced problems logging in during this time.
Posted Jul 15, 2019 - 20:15 UTC