All Systems Operational
Alerting Operational
Alert Delivery Operational
SMS Operational
E-mail Operational
PagerDuty (Incident Creation) Operational
PagerDuty (Notification Delivery) Operational
Slack Operational
Webhooks Operational
HipChat Operational
Push notifications (global) Operational
Push notifications (iOS) Operational
Push notifications (Android) Operational
Agent payloads Operational
API Operational
Availability monitoring Operational
Web UI Operational
Past Incidents
Feb 19, 2017

No incidents reported today.

Feb 18, 2017
Today, between 00:11 and 15:17 UTC, notifications generated from alerts were queued but not sent. The notification queue has now been fully consumed and all pending notifications have been delivered. We will publish a postmortem of this incident on Monday, with details of the work already under way to prevent a recurrence.
Feb 18, 16:05 GMT
Feb 17, 2017

No incidents reported.

Feb 16, 2017

No incidents reported.

Feb 15, 2017

No incidents reported.

Feb 14, 2017

No incidents reported.

Feb 13, 2017

No incidents reported.

Feb 12, 2017

No incidents reported.

Feb 11, 2017

No incidents reported.

Feb 10, 2017

No incidents reported.

Feb 9, 2017

No incidents reported.

Feb 8, 2017
Resolved - We have not seen a recurrence of this issue in the more than 24 hours since we deployed a code fix, so we are considering it resolved. We will publish a detailed postmortem in a few days.
Feb 8, 16:50 GMT
Monitoring - Graph loading is back to normal. A gap may be visible between 12:48 and 13:05 UTC. This will even out in the next few days.
We are continuing to monitor closely for recurrences while we work on the root cause.
Feb 6, 13:16 GMT
Identified - This is happening again now and we are working to restore graphs.
Feb 6, 13:03 GMT
Monitoring - We have restored full redundancy after confirming that the replacement members are working within the expected parameters.
We will monitor this more closely for the next few hours.
Feb 5, 23:08 GMT
Update - The slowest members have been removed. Graphs should now load faster and the gaps should be gone. We are continuing to work on restoring full redundancy.
Feb 5, 22:20 GMT
Identified - We have identified a slow member in one of the metrics clusters that power graphs and are removing it from rotation.
Some gaps may appear in graphs.
Feb 5, 21:53 GMT
Feb 7, 2017

No incidents reported.

Feb 6, 2017
Resolved - The notification queue completed consuming at 13:45 UTC.
Notification delivery has been in real time since then.
Feb 6, 14:07 GMT
Identified - We have identified a stuck notification processor and restarted it. The notification queue is now being consumed, but notification delivery is delayed while the backlog clears.
Feb 6, 13:02 GMT
Investigating - We're currently investigating a drop in notification processing.
Feb 6, 12:28 GMT