Server Density uses CloudFlare in front of all our web traffic to provide performance enhancements, a CDN and security functionality. Worldwide traffic is directed to the closest CloudFlare data centre which then proxies the traffic to our own infrastructure.
At 15:29 UTC, a large Qatar-based ISP misconfigured their BGP announcements, resulting in a route leak. The Internet is built on BGP trust between networks (autonomous systems, each identified by an AS number), so if an ISP incorrectly announces an IP address (or, in this case, a number of prefixes), upstream networks or peers can route packets incorrectly. This change propagated to large carriers such as NTT, TeliaSonera and Level 3, meaning the impact was much larger and not isolated to Doha, Qatar.
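To make the effect of an incorrect announcement more concrete, here is a minimal Python sketch of one way it can pull traffic away from the legitimate network: routers forward to the most specific matching prefix. This is an illustration under assumptions, not a reconstruction of this incident. The prefixes and AS numbers are reserved documentation values, the `best_route` helper is hypothetical, and the source does not say whether the leaked routes were more specific or were simply preferred for other reasons.

```python
import ipaddress

# Hypothetical routing table. The prefix and AS numbers are documentation
# values (RFC 5737 / RFC 5398), NOT the real ones involved in this incident.
routes = [
    (ipaddress.ip_network("203.0.113.0/24"), "AS64496 (legitimate origin)"),
]


def best_route(table, destination):
    """Return the next hop of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the largest prefix length wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


before = best_route(routes, "203.0.113.10")

# Assumption for illustration: the leak appears as a more specific prefix.
routes.append((ipaddress.ip_network("203.0.113.0/25"), "AS64511 (leaking ISP)"))

after = best_route(routes, "203.0.113.10")        # inside the leaked /25
unaffected = best_route(routes, "203.0.113.200")  # outside the leaked /25

print(before)
print(after)
print(unaffected)
```

In this toy table, only destinations covered by the incorrectly announced prefix change path, which is one reason a leak can affect a subset of traffic rather than all of it.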
BGPmon (used to assess routing health) notified CloudFlare of the route leak at 15:41 UTC. At 15:42 UTC the Doha POP was removed from production and requests were sent via other locations. Network engineers reached out to inform the ISP of their misconfiguration and the changes were reverted at 16:04 UTC. Routing was back to normal at 16:08 UTC.
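The intervals between these events can be worked out with a short Python sketch. The timestamps are the ones quoted above; the event labels and the `minutes_since_start` helper are only illustrative names.

```python
from datetime import datetime

FMT = "%H:%M"

# Times (UTC, same day) as stated in the incident timeline.
events = [
    ("route leak begins", "15:29"),
    ("BGPmon notifies CloudFlare", "15:41"),
    ("Doha POP removed from production", "15:42"),
    ("ISP reverts the changes", "16:04"),
    ("routing back to normal", "16:08"),
]

start = datetime.strptime(events[0][1], FMT)


def minutes_since_start(hhmm):
    """Whole minutes elapsed between the start of the leak and hhmm."""
    delta = datetime.strptime(hhmm, FMT) - start
    return int(delta.total_seconds() // 60)


elapsed = {label: minutes_since_start(t) for label, t in events}

for label, mins in elapsed.items():
    print(f"+{mins:>2} min  {label}")
```

On these figures, detection came 12 minutes after the leak began, mitigation one minute after that, and routing was back to normal 39 minutes after the start.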
An external timeline and visualisation of this route leak, detailing the changes, is available at https://bgpstream.com/event/2424.
Between 15:29 UTC and 16:05 UTC, a subset of Server Density monitored devices may have been unable to send postbacks to our global endpoints. According to our monitoring, this affected around 20% of customer devices.
Unfortunately, route leaks are notoriously difficult to mitigate due to the fundamental design of the Internet's BGP architecture. Some systems we rely on are outside our control, but it is still ultimately our responsibility to ensure the availability of our services. We will be evaluating how we can minimise the impact of similar incidents in the future.
In the medium term, CloudFlare are evaluating IP renumbering for select prefixes and providing more proactive training for their peers. In the long term, they are working on reducing the impact caused by route leaks.