SRE - Telemetry

at Roblox


San Mateo, CA


Production Engineering

WHY ROBLOX?

Roblox is ushering in the next generation of entertainment, allowing people to imagine, create, and play together in immersive, user-generated worlds. We’re the fastest-growing entertainment platform that lets anyone teach themselves how to code, publish, and monetize any experience imaginable, across any device, reaching millions of players across the globe.

The impact that you can have at Roblox is powerful. We’re looking for someone who’s eager to take on a meaningful role in the success of Roblox on a massive scale. Someone who takes play seriously, but isn’t afraid to have some fun either. Someone who’s ready to take Roblox, and their career, to the next level.
 
In 2018, we were honored to be recognized as a Certified Great Place to Work®. We’ve fostered a company culture that empowers people to do the most defining work of their career in an environment that’s made up of the most passionate, team-oriented, visionary, crazy-smart people you’ll ever meet. Join the Roblox team where play rules and the possibilities are endless.
 
The engineers at Roblox are working on the hardest problems in tech today: distributed systems, real-time communication, 3D co-experience, massive data processing, social networking, rendering, physics, and more. As a Roblox engineer, you will have real ownership and impact across one or more of these domains.
 
The Roblox Telemetry Team is looking for a Site Reliability Engineer (SRE) to help build and operate our core telemetry systems, which are responsible for collecting, visualizing, and acting on metrics and logs. You will have the opportunity to be an early member of a team building out a global infrastructure.

You Will:

  • Contribute to the development of systems and tools associated with our production infrastructure
  • Develop and execute on the telemetry strategy
  • Improve user experience by reducing mean time to recovery (MTTR) and by predicting and preventing outages
  • Own the telemetry infrastructure

You Have:

  • BA/BS degree in a relevant engineering field or equivalent practical experience
  • Experience with Linux systems and shells, daemons, and processes
  • Experience with programming languages, like Python or Go
  • Experience with distributed systems in a production operations environment
  • Experience working within large-scale Internet infrastructure
  • A self-organized working style and comfort in a fast-paced environment
  • Strong knowledge of telemetry systems/tools, like ELK and TICK
  • Systems configuration management experience with automation tools

You'll Love:

  • Excellent medical, dental, and vision coverage
  • A rewarding 401(k) program
  • Flexible vacation policy
  • Free catered lunches five times a week and several fully stocked kitchens with unlimited snacks
  • Onsite fitness center and fitness program credit
  • Annual Caltrain Go Pass
  • A Roblox Admin badge for your avatar

Roblox – Powering Imagination
