SRE - Orchestration

at Roblox



San Mateo, CA


Production Engineering

Why Roblox

Roblox is ushering in the next generation of entertainment, allowing people to imagine, create, and play together in immersive, user-generated worlds. We’re the fastest-growing entertainment platform that lets anyone teach themselves how to code, publish, and monetize any experience imaginable, across any device, reaching millions of players across the globe.

The impact that you can have at Roblox is powerful. We’re looking for someone who’s eager to take on a meaningful role in the success of Roblox on a massive scale. Someone who takes play seriously but isn’t afraid to have some fun. Someone who’s ready to take Roblox, and their career, to the next level.

In 2018, we were honored to be recognized as a Certified Great Place to Work®. We’ve fostered a company culture that empowers people to do the most defining work of their career in an environment that’s made up of the most passionate, team-oriented, visionary, crazy-smart people you’ll ever meet. Join the Roblox team where play rules and the possibilities are endless.

As a Site Reliability Engineer, you’ll play a contributing role in helping us scale our global network infrastructure at a time of incredible growth for our business. At Roblox, you’ll have boundless opportunities to shape the future of the Imagination Platform™ and demonstrate your passion for delivering awesome solutions in front of a global audience. If you know what it takes to build and maintain large-scale infrastructure that can sustain over two million concurrent players year-round, and you take play as seriously as we do, you’ll fit right into our highly skilled and ever-expanding engineering team.

You are:

  • Experienced: you have a BS degree (or equivalent professional experience) in Computer Science or a related engineering field, with at least 2 years of experience working with containers and orchestration systems (such as Docker and Nomad).
  • A troubleshooting connoisseur: you come ready for action, wielding a wealth of container and orchestration experience, and can solve problems such as containers failing to launch and other issues that arise from running containerized services.
  • An expert planner: you’re always thinking two steps ahead; you have first-hand experience with capacity planning on different orchestration systems.
  • Self-organized: you’re excited about getting in front of complex problems, effectively organizing your work to get the job done by any means possible; overcoming emergent high-impact issues and contributing to long-running projects comes naturally to you.
  • A problem solver: you ask the right questions to solve issues within your expertise and you use data to test your theories.

You will:

  • Maintain and deploy HashiCorp products including Nomad, Consul, and Vault.
  • Maintain and deploy Portworx to manage our container storage.
  • Help other teams move their applications to run inside containers on our HashiCorp stack.
  • Maintain our Chef servers and update Chef cookbooks as needed to help manage our infrastructure.
  • Innovate on our platform by helping build out our CI/CD pipelines to automate all our systems.
  • Help standardize our infrastructure using products like MAAS to deploy and manage our physical hardware.
  • Work with other teams in our organization to provide support for the tools they are working to deploy.
  • Participate in the on-call rotation for our critical infrastructure pieces around the globe.

You'll Love:

  • Excellent medical, dental, and vision coverage
  • A rewarding 401(k) program
  • Flexible vacation policy
  • Free catered lunches five times a week and several fully stocked kitchens with unlimited snacks
  • Onsite fitness center and fitness program credit
  • Annual Caltrain Go Pass
  • A Roblox Admin badge for your avatar

Roblox – Powering Imagination


[ID1422]