distributed system architecture
Fault tolerant distributed system failure; real world example and mitigation
Starting October 28th and fully resolving on October 31st, Roblox experienced a 73-hour outage.¹ Fifty million players regularly use Roblox every day and, to create the experience our players expect, our scale involves hundreds of internal online services. As with any large-scale service, we have service interruptions from time to