Monitoring - Today we have confirmed the source of the request spike causing this incident.
We don't expect a recurrence after today, and we have moved the incident state to "Monitoring" while we work with the source of these requests to remove its negative impact.
Jul 11, 08:19 BST
Update - During the last occurrences we narrowed down the source of the request spike to our API (api.serverdensity.io), rather than, e.g., the user-facing app or incoming device payloads.
Today we were able to prevent the daily 07:00 UTC occurrence by blocking a set of suspect API calls. This has narrowed the scope of the issue even further, bringing us closer to a solution. Today's impact was a 4-minute unavailability (06:58 - 07:02) of that set of API calls.
Jul 9, 08:15 BST
Update - We have kept this incident open this long because the event only occurs at 07:00 UTC, which prevents us from continuously verifying possible fixes. We are continuing to work on it.
We'll post another update tomorrow after 07:00 UTC.
Jul 8, 08:37 BST
Update - Between 06:00 and 06:11 UTC this incident recurred. The impact was mitigated immediately, but we are still investigating the root cause of this data request spike.
Jul 7, 08:24 BST
Update - Payload processing has been normal since 08:15 UTC. We're continuing to work on the cause of the observed request spike.
Jul 6, 12:02 BST
Identified - We have identified a reduction in our device payload processing capacity caused by an abnormal spike in data requests. This may appear on some devices as missing metrics data. Alerting is not affected.
We've increased capacity while we identify and resolve the cause of the request spike.
Jul 6, 08:49 BST
Alerting Operational
Alert Delivery Operational
SMS Operational
E-mail Operational
PagerDuty (Incident Creation) Operational
PagerDuty (Notification Delivery) Operational
Slack Operational
Webhooks Operational
HipChat Operational
Push notifications (global) Operational
Push notifications (iOS) Operational
Push notifications (Android) Operational
Agent payloads Operational
API Operational
Availability monitoring Operational
Web UI Operational
Past Incidents
Jul 24, 2017

No incidents reported today.

Jul 23, 2017

No incidents reported.

Jul 22, 2017

No incidents reported.

Jul 21, 2017

No incidents reported.

Jul 20, 2017
Resolved - We have not seen re-occurrences of this incident from our provider and Server Density systems have been operating as expected since the network connectivity was restored at 07:09 UTC.
We'll share our provider post-mortem as soon as it becomes available.
Jul 20, 08:56 BST
Monitoring - We have just received confirmation from our provider: "Backend network connectivity has been restored to affected customer hosts."
We are monitoring the systems closely in case of another occurrence.
Jul 20, 08:22 BST
Update - Update from our provider: "Network connectivity was briefly restored to affected customer hosts however the issue has recurred. Engineers are still working with hardware vendor to mitigate the disruption to customer hosts."
Jul 20, 07:59 BST
Update - Update from our provider: "Attempts to restore connectivity to affected customer VLANs have been unsuccessful. Engineers are continuing to investigate and are reaching out to the hardware vendor for assistance."
Jul 20, 06:50 BST
Update - Update from our provider: "Engineers have identified that some customer VLANs on bcr01a.wdc04 are down. This is the cause of the disruption for some customer hosts behind bcr01a.wdc04 and Engineers are working to recover service to these hosts as quickly as possible."
Jul 20, 05:59 BST
Identified - Update from our provider: "Backend private network connectivity may [be] disrupted for customer[s] in wdc04, pod01. Networking staff are investigating at this time and working to restore connectivity to affected customer hosts."

We are working on restoring our services to normal operation.
Jul 20, 05:59 BST
Update - One of our providers is performing emergency maintenance that is affecting several of our services:

"Network Engineers have identified a software defect on a redundant backend customer router (BCR), bcr01.wdc04, which provides private connectivity for your server(s). To prevent an unplanned outage resulting in significant downtime, Engineers will be upgrading the firmware during an EMERGENCY maintenance window at the time referenced above."

The period of the maintenance window is: 04:00 UTC - 06:00 UTC
Jul 20, 05:53 BST
Investigating - We are currently working on identifying the cause of the stall. During this period you might have issues logging in to the UI, and gaps might be visible in your data.
Jul 20, 05:45 BST
Jul 19, 2017

No incidents reported.

Jul 18, 2017

No incidents reported.

Jul 17, 2017

No incidents reported.

Jul 16, 2017

No incidents reported.

Jul 15, 2017

No incidents reported.

Jul 14, 2017

No incidents reported.

Jul 13, 2017

No incidents reported.

Jul 12, 2017

No incidents reported.

Jul 10, 2017

No incidents reported.