How Our Advanced Engineering Tools Are Shaping Immersive Worlds
- As a massive-scale platform serving 85.3 million daily active users*, Roblox delivers a broad range of experiences with high reliability and low latency. Achieving this level of performance depends on cutting-edge engineering tools and processes that are frictionless, streamlined, and continually push the boundaries of innovation.
- More than 1,000 Roblox engineers use three main products: our microservice lifecycle platform, our code center, and our advanced observability platform.
- We’ve significantly reduced downtime and consistently decreased the mean time to mitigate by 50% in two consecutive years.
- Our newly developed engineering productivity index (EPI) provides a comprehensive view of our engineering efforts, and in Q4 2024, the overall EPI improved by 12.9% year-over-year.
At Roblox, we’re reimagining the way people connect and communicate in immersive worlds. It’s an ambitious endeavor, and manifesting it requires a wide range of innovative and industry-leading infrastructure.
Roblox is a global platform, and our 85.3 million daily active users* can communicate with each other in real time across many different languages, platforms, and devices, from low-end Android phones to high-end consoles. We support numerous modalities of content—text, voice, 3D data, and more—all with extremely high reliability and low latency.
For our engineers to support these specialized needs, they rely on frictionless, streamlined and reliable tools and processes, most of which we’ve built ourselves. Today, we’re excited to showcase some of the innovative tools and strategies we’re using to build the future of Roblox, as well as a preview of some we’re planning on building soon. With fast and efficient tools and practices like these, our goal is to make Roblox a highly attractive destination for talent.
Innovating With State-of-the-Art Engineering Tools
At the heart of our engineering productivity strategy are three tools: our microservice lifecycle platform, our code center—an inner loop development tool—and our advanced observability platform. Together, these tools enable more than a thousand Roblox engineers to tackle challenging problems.
Application Lifecycle Management Platform
Our application lifecycle management platform is a homegrown microservice that allows engineers to easily create, deploy, monitor, and debug thousands of microservices, all in a single, streamlined interface. Prior to this platform, managing microservices at Roblox came with a steep learning curve, inefficient manual processes, and frequent context switching between internal tools.
The application lifecycle management platform eliminated this dynamic and empowered our engineers to spend less time managing tools and processes and more time solving complex technical challenges, improving systems, and delivering impactful features for our users.
Code Center
Designed within Roblox to refine our inner-loop processes, our code center reduced the time and friction that engineers faced during code reviews. The tool expedites reviews and enhances communication through real-time Slack notifications and scheduled digests. In this way, it ensures timely feedback that leads to higher-quality code reviews and quicker iteration.
The code center has quickly become a vital tool for Roblox engineers looking to optimize their coding activities, with pull requests already seeing a 20% improvement in the P75 time needed to land changes.
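To make the P75 figure concrete, here is a minimal sketch of how a 75th-percentile time-to-land and its relative improvement could be computed. The sample durations and the helper names `p75` and `relative_improvement` are hypothetical, chosen only for illustration; this is not how the code center itself reports the metric.

```python
from statistics import quantiles


def p75(durations):
    """Return the 75th percentile of a list of durations."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is P75.
    return quantiles(durations, n=4, method="inclusive")[2]


def relative_improvement(before, after):
    """Fractional reduction from `before` to `after` (positive means faster)."""
    return (before - after) / before


# Hypothetical hours-to-land for two sets of pull requests (illustrative data).
hours_before = [2.0, 4.0, 6.0, 8.0, 10.0]
hours_after = [1.6, 3.2, 4.8, 6.4, 8.0]

before_p75 = p75(hours_before)
after_p75 = p75(hours_after)
improvement = relative_improvement(before_p75, after_p75)

print(f"P75 before: {before_p75:.1f} h, after: {after_p75:.1f} h")
print(f"Relative improvement: {improvement:.0%}")
```

Tracking P75 rather than the mean keeps the number focused on the slower pull requests without letting a handful of extreme outliers dominate it.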
Advanced Observability Platform
Our advanced observability platform seamlessly integrates homegrown, open-source, and vendor solutions, offering a highly specialized infrastructure with a strong focus on reliability and scalability. Every day, this infrastructure collects billions of time series and tens of terabytes of structured runtime information—logs, traces, system events, profiling data, and more—that help our engineers monitor, debug, and test efficiently with confidence.
At the same time, we’re also dedicated to exploring how automation and AI can further improve our processes. For instance, we integrated a comprehensive set of default alerts covering latency, traffic, errors, and saturation across more than 1,500 microservices using our common microservice framework, all without a single line of code. We also enhanced our continuous deployment system with automated canary analysis, which successfully prevented hundreds of bugs from reaching our production environment in just the first six months after launch.
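As a rough illustration of the idea behind automated canary analysis, the sketch below compares a canary’s error rate and tail latency against the current baseline and decides whether a rollout should proceed. The `Metrics` type, the `canary_passes` function, and both thresholds are assumptions made for this example; they do not describe Roblox’s actual deployment system.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Aggregated request metrics for one deployment group (hypothetical shape)."""
    requests: int
    errors: int
    p99_latency_ms: float

    @property
    def error_rate(self) -> float:
        # Guard against division by zero when no traffic has been observed.
        return self.errors / self.requests if self.requests else 0.0


def canary_passes(baseline: Metrics, canary: Metrics,
                  max_error_rate_delta: float = 0.005,
                  max_latency_ratio: float = 1.2) -> bool:
    """Return True if the canary is not meaningfully worse than the baseline.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    error_ok = canary.error_rate - baseline.error_rate <= max_error_rate_delta
    latency_ok = canary.p99_latency_ms <= baseline.p99_latency_ms * max_latency_ratio
    return error_ok and latency_ok


# Illustrative data: one healthy canary and one with an elevated error rate.
baseline = Metrics(requests=10_000, errors=20, p99_latency_ms=180.0)
healthy_canary = Metrics(requests=1_000, errors=3, p99_latency_ms=190.0)
bad_canary = Metrics(requests=1_000, errors=40, p99_latency_ms=185.0)

for name, canary in [("healthy", healthy_canary), ("bad", bad_canary)]:
    decision = "promote" if canary_passes(baseline, canary) else "roll back"
    print(f"{name} canary: {decision}")
```

A production system would typically add statistical tests and more signals, but the core decision is the same: block the rollout when the canary regresses beyond an agreed tolerance.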
These innovations not only have a major impact internally, they also enhance the everyday experience of Roblox users. We’ve significantly reduced downtime and consistently decreased the mean time to mitigate (MTTM) by 50% in two consecutive years. The end result is a more seamless and reliable experience for everyone in our ecosystem.
Measuring and Enhancing Engineering Productivity
Building tools isn’t very helpful without ways to tell if they’re working. So alongside our efforts to improve productivity at Roblox, we’ve worked to understand what productivity means here and what affects it.
To that end, our newly developed engineering productivity index (EPI) provides a comprehensive view of our engineering efforts that’s similar to how a car’s dashboard displays an overview of the vehicle’s performance and health.
In the fourth quarter of 2024, we improved the overall EPI by 12.9% year-over-year, primarily driven by velocity, while maintaining the same quality bar.
While simpler and more applicable for Roblox purposes than frameworks like DORA or SPACE, this metric offers a holistic evaluation of productivity across our myriad groups and developer archetypes, like Engine and ML. We use the EPI to provide targeted feedback and recommendations that empower teams to monitor and increase their productivity quarter-over-quarter.
The EPI is composed of three elements:
- Velocity: This element measures the pace of development by leveraging a broad set of signals, including cycle time and deployment cadence.
- Quality: This element relies on metrics like code coverage and trunk health to show us what we need to do to move fast while still producing quality products.
- Self-Reported Productivity: We constantly seek direct and targeted feedback from our engineers. This feedback provides crucial insight into pain points that impact productivity and satisfaction. This type of information has been vital in understanding the challenges that our engineers face and can’t be captured via other metrics. This element also helps shape our roadmap by directly informing our decisions to build solutions like our code center and application lifecycle management platform.
Each of these metrics plays a vital role in overall productivity at Roblox. For example, we don’t want a high velocity score and a low quality score, or vice versa. By improving the EPI, we’re able to optimize all three.
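The post does not say how the three elements are combined, so the following is only a sketch of one plausible approach: a weighted average of component scores, plus the year-over-year change between two periods. The weights, the 0 to 100 score scale, and the sample scores are all hypothetical and do not reflect the real EPI formula or its data.

```python
def composite_index(scores, weights):
    """Weighted average of component scores; weights are expected to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[name] * weights[name] for name in weights)


def year_over_year_change(current, previous):
    """Fractional change from the previous period to the current one."""
    return (current - previous) / previous


# Hypothetical weights and 0-100 component scores (illustrative only).
WEIGHTS = {"velocity": 0.4, "quality": 0.3, "self_reported": 0.3}
last_year = {"velocity": 60.0, "quality": 80.0, "self_reported": 70.0}
this_year = {"velocity": 75.0, "quality": 80.0, "self_reported": 70.0}

index_last = composite_index(last_year, WEIGHTS)
index_now = composite_index(this_year, WEIGHTS)
change = year_over_year_change(index_now, index_last)

print(f"Index last year: {index_last:.1f}, this year: {index_now:.1f}")
print(f"Year-over-year change: {change:.1%}")
```

In this toy data only velocity moves while quality holds steady, which mirrors the kind of change described above: a faster pace without a lower quality bar.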
AI and the Next Frontier of Engineering
At Roblox, we’re focused on the craft of engineering. We are also reimagining the future of engineering by pioneering AI-driven tools that transform how engineers create, collaborate, and innovate. With a relentless pursuit of excellence, we are integrating AI into every facet of the development process. We are incorporating it into existing tools like our application lifecycle management platform and code center, and we are leveraging AI-powered coding assistants to accelerate code authoring and reviews, redefine collaboration, and revolutionize how we deploy and maintain services.
Our vision extends beyond automation; we are creating an engineering experience where AI acts as a proactive partner, streamlining workflows, enhancing code quality, and boosting sentiment. By investing in our inner-loop development lifecycle, proactive quality assurance, and Roblox-specific AI integrations, we are paving a path where engineering velocity, innovation, and collaboration reach new heights. The next few years will mark an exciting evolution, and we are committed to making Roblox the ultimate destination for world-class engineers eager to build the next generation of immersive experiences.
* As of the three months ending Dec. 31, 2024.