We have not seen a recurrence of this issue in over 24 hours since deploying a code fix, so we're considering it resolved and will publish a detailed postmortem in a few days.
Feb 8, 16:50 GMT
Graph loading has returned to normal. A gap may be visible between 12:48 and 13:05 UTC; this will even out over the next few days.
We are continuing to monitor closely for recurrences while we work on the root cause.
Feb 6, 13:16 GMT
This is happening again now and we are working to restore graphs.
Feb 6, 13:03 GMT
We have restored full redundancy after confirming that the replacement members are working within expected parameters.
We will monitor this closely for the next few hours.
Feb 5, 23:08 GMT
The slowest members have been removed; graphs should now load faster and the gaps should be gone. We're continuing to work on restoring normal redundancy.
Feb 5, 22:20 GMT
We have identified a slow member in one of the metrics clusters that power graphs and are removing it from rotation.
Some graphs may show gaps.
Feb 5, 21:53 GMT