On July 6th at 07:45 UTC, we added capacity to mitigate a spike in data requests that had reduced our payload processing capacity. This may have appeared on some customer devices as graph gaps and delayed alerting.
The next day, sometime after 07:00 UTC, the same event recurred. We applied the same mitigation and limited the impact to a few minutes, between 07:00 and 07:11 UTC.
Over the following couple of days the event recurred, and we eliminated the impact by keeping the mitigation in place. We then identified a set of abnormal API calls to https://api.serverdensity.io. These were legitimate customer requests that adhered to the API specification, but their timing and size caused a knock-on impact on the payload processing engine.
Over the past few weeks, we have worked with the customers making these requests and reviewed our API implementation to prevent this issue from recurring.