PagerDuty notifications not delivered
Incident Report for Server Density
Postmortem

PagerDuty have now published their postmortem for the notifications outage behind this incident, during which PagerDuty silently dropped notifications for a period of 24 hours. The full writeup can be read online. We recommend customers set up multiple notification methods, e.g. e-mail or push notifications through our mobile apps, so that they are always notified even if one method is unavailable.

We are also working on improvements to our monitoring to extend the end-to-end testing we conduct on our notification providers. We already test up to the point of acceptance at their APIs, and will be adding further tests to verify that messages are delivered to the final destination.

Posted Sep 23, 2014 - 17:09 BST

Resolved
Starting earlier today (exact time to be confirmed), notifications sent to PagerDuty via our integration with their API were not successfully delivered. All other notification types were unaffected.

We have a range of mechanisms in place to ensure alerts are delivered correctly. During this time, however, all calls to the PagerDuty API succeeded and returned valid API responses plus incident IDs, so our own monitoring showed everything working correctly.

After we spotted the missing alerts manually at 18:35 UTC, we notified PagerDuty, who diagnosed the problem and pushed out a fix at around 21:30 UTC.

We put a lot of time into ensuring alert delivery, and this incident has highlighted that we need further checks against 3rd party integrations to ensure that, when they say they have received the alert data, the next expected actions actually happen. In PagerDuty's case, we will be implementing regular, automated checks that send test events to their API and then confirm they are actually created as incidents within PagerDuty.
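
As a rough illustration of what such a check could look like, here is a minimal sketch. It is not our production code. The names check_end_to_end, send_event and list_incident_keys are hypothetical, and the two callables stand in for whatever provider client is used, so nothing here assumes a particular PagerDuty endpoint or response format. The idea is to tag each test event with a unique marker and treat the check as passed only once that marker appears among the provider's incidents, rather than when the API accepts the request.

```python
import time
import uuid
from typing import Callable, Iterable


def check_end_to_end(
    send_event: Callable[[str], None],
    list_incident_keys: Callable[[], Iterable[str]],
    timeout_s: float = 60.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send a uniquely tagged test event and wait for it to become an incident.

    send_event and list_incident_keys are hypothetical wrappers around the
    provider's API, injected so the check itself stays provider-agnostic.
    """
    marker = f"e2e-test-{uuid.uuid4()}"

    # The provider may accept this call and still never create an incident,
    # which is exactly the failure mode this check is meant to catch.
    send_event(marker)

    deadline = time.monotonic() + timeout_s
    while True:
        if marker in set(list_incident_keys()):
            return True  # the event really did turn into an incident
        if time.monotonic() >= deadline:
            return False  # accepted but never acted on: raise the alarm
        sleep(poll_interval_s)
```

A scheduler would run this at a regular interval and alert through a different notification channel whenever it returns False, so that a silent failure at one provider cannot hide itself.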

We are waiting for a root cause analysis and time period for the incident from PagerDuty and will post further information once we have it.
Posted Sep 11, 2014 - 23:20 BST