This issue has been resolved since last night, and we monitored overnight to ensure it did not reappear. We're now looking into possible causes and ways to mitigate this in the future.
Mar 11, 12:55 GMT
The fix we attempted looks good and we are processing payloads again. We're monitoring to ensure things remain stable, and we'll investigate the root cause in the morning. It looks like a race condition involving queue processing and DNS.
Unfortunately, any payloads sent from 17:06 until now will not be processed, and customers will see a gap in their graphs for this period. "No data" alerts have been re-enabled and other alert processing is active again.
Mar 10, 20:17 GMT
We've identified a race condition that seems to be causing this issue, related to DNS caching. Currently testing a fix for this.
Mar 10, 20:00 GMT
We're still investigating this issue and will update when we have more information.
Mar 10, 19:45 GMT
We are investigating an issue with payloads not being processed correctly from our queues. Payloads sent during this time are being stored but not processed, so alerts will not be triggered. "No data" alerting has been stopped by our automated procedures.
Mar 10, 18:32 GMT