Inside the Tech - Solving for Safety in Immersive Voice Communication

By Roblox

Published Jan 18, 2024

Inside the Tech is a blog series that accompanies our Tech Talks Podcast. In episode 20 of the podcast, The Evolution of Roblox Avatars, Roblox CEO David Baszucki spoke with Senior Director of Engineering Kiran Bhat, Senior Director of Product Mahesh Ramasubramanian, and Principal Product Manager Effie Goenawan about the future of immersive communication through avatars and the technical challenges we’re solving to power it. In this edition of Inside the Tech, we talked with Senior Engineering Manager Andrew Portner to learn more about one of those technical challenges, safety in immersive voice communication, and how the team’s work is helping to foster a safe and civil digital environment for all on our platform.

In episode 20, Baszucki, Bhat, Ramasubramanian, and Goenawan discuss in depth how expressive avatars allow us not only to express ourselves digitally, but also to communicate more immersively through voice, facial expressions, and body language.

What are the biggest technical challenges your team is taking on?

We prioritize maintaining a safe and positive experience for our users. Safety and civility are always top of mind for us, but handling them in real time can be a big technical challenge. Whenever there’s an issue, we want to be able to review it and take action in real time, but this is difficult at our scale. To handle this scale effectively, we need to leverage automated safety systems.

Another technical challenge that we’re focused on is the accuracy of our safety measures for moderation. There are two moderation approaches to address policy violations and provide accurate feedback in real time: reactive and proactive. For reactive moderation, which works by responding to reports from people on the platform, we're developing machine learning (ML) models to accurately identify different types of policy violations. For proactive moderation, we're working on real-time detection of content that potentially violates our policies and on educating users about their behavior. Understanding the spoken word and improving audio quality are complex problems. We’re already seeing progress, but our ultimate goal is to have a highly precise model that can detect policy-violating behavior in real time.

What are some of the innovative approaches and solutions we’re using to tackle these technical challenges?

We have developed an end-to-end ML model that can analyze audio data and provide a confidence level for each type of policy violation (e.g., how likely is this to be bullying, profanity, etc.). This model has significantly improved our ability to automatically close certain reports. We take action when our model is confident and we can be sure that it outperforms humans. Within just a handful of months after launching, we were able to moderate almost all English voice abuse reports with this model. We've developed these models in-house, and it's a testament to the collaboration between a lot of open source technologies and our own work to create the tech behind it.
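To make the idea of acting only on confident predictions concrete, here is a minimal sketch of how per-category confidence scores could be routed to either an automatic action or a human reviewer. It is an illustration under stated assumptions, not a description of the production system: the category names, the threshold values, and the names `AUTO_ACTION_THRESHOLDS`, `Decision`, and `route_report` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical per-category thresholds; the interview does not give real values.
AUTO_ACTION_THRESHOLDS: Dict[str, float] = {"bullying": 0.95, "profanity": 0.90}


@dataclass
class Decision:
    action: str  # "auto_action" or "human_review"
    category: Optional[str]
    confidence: float


def route_report(scores: Dict[str, float]) -> Decision:
    """Route one abuse report given per-category confidence scores in [0, 1]."""
    if not scores:
        return Decision("human_review", None, 0.0)
    # Keep only categories whose score clears that category's own threshold.
    # Categories without a configured threshold never qualify.
    confident = {
        category: score
        for category, score in scores.items()
        if category in AUTO_ACTION_THRESHOLDS
        and score >= AUTO_ACTION_THRESHOLDS[category]
    }
    if confident:
        category = max(confident, key=confident.get)
        return Decision("auto_action", category, confident[category])
    # Otherwise report the top-scoring category and defer to a human moderator.
    category = max(scores, key=scores.get)
    return Decision("human_review", category, scores[category])


print(route_report({"bullying": 0.97, "profanity": 0.20}))
print(route_report({"bullying": 0.60, "profanity": 0.55}))
```

Using a separate threshold per category lets each one be tuned only as far as the model has been shown to be reliable for that kind of violation, with everything else falling back to human review.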

Determining what is appropriate in real time seems pretty complex. How does that work?

There's a lot of thought put into making the system contextually aware. We also look at patterns over time before we take action so we can be sure that our actions are justified. Our policies are nuanced depending on a person's age, whether they're in a public space or a private chat, and many other factors. We are exploring new ways to promote civility in real time and ML is at the heart of it. We recently launched automated push notifications to remind users of our policies. We’re also looking into other factors like tone of voice to better understand a person’s intentions and distinguish things like sarcasm or jokes. Lastly, we're also building a multilingual model since some people speak multiple languages or even switch languages mid-sentence. For any of this to be possible, we have to have an accurate model.
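One simple way to picture "looking at patterns over time" is a sliding window that only triggers once several flagged utterances occur close together. The sketch below assumes this particular mechanism purely for illustration; the class name `ViolationWindow`, the flag count, and the window length are hypothetical and not taken from the interview.

```python
from collections import deque


class ViolationWindow:
    """Track flagged utterances for one user inside a sliding time window."""

    def __init__(self, max_flags: int = 3, window_seconds: float = 60.0) -> None:
        self.max_flags = max_flags            # hypothetical pattern threshold
        self.window_seconds = window_seconds  # hypothetical window length
        self._timestamps = deque()

    def record(self, timestamp: float) -> bool:
        """Record a flag; return True once enough flags fall inside the window."""
        self._timestamps.append(timestamp)
        # Discard flags that are older than the window relative to the newest one.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.max_flags


window = ViolationWindow(max_flags=3, window_seconds=60.0)
for t in (0.0, 10.0, 20.0):
    should_notify = window.record(t)
print(should_notify)
```

A single flagged utterance never triggers anything here; only a run of flags within the window does, which is one way to reduce the impact of an isolated misclassification.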

Currently, we are focused on addressing the most prominent forms of abuse, such as harassment, discrimination, and profanity. These make up the majority of abuse reports. Our aim is to have a significant impact in these areas and set the industry norms for what promoting and maintaining a civil online conversation looks like. We're excited about the potential of using ML in real time, as it enables us to effectively foster a safe and civil experience for everyone.

How are the challenges we’re solving at Roblox unique? What are we in a position to solve first?

Our Chat with Spatial Voice technology creates a more immersive experience, mimicking real-world communication. For instance, if I’m standing to the left of someone, they’ll hear me in their left ear. We’re creating an analog to how communication works in the real world, and this is a challenge we’re in a position to solve first.
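The left-ear example can be illustrated with a generic constant-power stereo pan. This is a textbook technique used here as a sketch, not the actual Chat with Spatial Voice implementation, which is not described in the interview; the function name `stereo_gains` and the 2D coordinate convention are assumptions.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def stereo_gains(listener: Vec2, forward: Vec2, speaker: Vec2) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for a speaker heard by a listener.

    `forward` is the direction the listener faces; it need not be normalized.
    """
    dx, dy = speaker[0] - listener[0], speaker[1] - listener[1]
    distance = math.hypot(dx, dy)
    forward_len = math.hypot(forward[0], forward[1])
    if distance == 0.0 or forward_len == 0.0:
        pan = 0.0  # no usable direction: keep the voice centered
    else:
        # The listener's right-hand direction is `forward` rotated 90 degrees clockwise.
        right_x, right_y = forward[1] / forward_len, -forward[0] / forward_len
        # pan is -1.0 for fully left, 0.0 for center, +1.0 for fully right.
        pan = (dx * right_x + dy * right_y) / distance
        pan = max(-1.0, min(1.0, pan))  # guard against floating-point overshoot
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


# Listener at the origin facing "north"; the speaker stands directly to the left.
left, right = stereo_gains((0.0, 0.0), (0.0, 1.0), (-5.0, 0.0))
print(left, right)
```

This sketch ignores distance attenuation and cannot distinguish a voice in front from one behind; a full spatial audio system would handle both.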

As a gamer myself, I've witnessed a lot of harassment and bullying in online gaming. It's a problem that often goes unchecked due to user anonymity and a lack of consequences. However, the technical challenges that we’re tackling around this differ from what other platforms are facing in a couple of areas. On some gaming platforms, interactions are limited to teammates. Roblox offers a variety of ways to hang out in a social environment that more closely mimics real life. With advancements in ML and real-time signal processing, we're able to effectively detect and address abusive behavior, which means we’re not only a more realistic environment, but also one where everyone feels safe to interact and connect with others. The combination of our technology, our immersive platform, and our commitment to educating users about our policies puts us in a position to tackle these challenges head on.

What are some of the key things that you’ve learned from doing this technical work?

I feel like I've learned a great deal. I'm not an ML engineer. I’ve worked mostly on the front end in gaming, so just being able to go deeper than I had before into how these models work has been huge. My hope is that the actions we’re taking to promote civility translate to a level of empathy in the online community that has been lacking.

One last learning is that everything depends on the training data you put in. And for the data to be accurate, humans have to agree on the labels being used to categorize certain policy-violating behaviors. It's really important to train on quality data that everyone can agree on. It's a really hard problem to solve. You begin to see areas where ML is way ahead of everything else, and then other areas where it's still in the early stages. There are many areas where ML is still growing, so being cognizant of its current limits is key.
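Agreement between human labelers can be quantified. One standard measure is Cohen's kappa, which compares how often two annotators agree against how often they would agree by chance. The interview does not say which agreement measure the team uses, so the sketch below is only an example of the general idea, with made-up labels.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Fraction of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four audio clips.
annotator_1 = ["abuse", "abuse", "ok", "ok"]
annotator_2 = ["abuse", "ok", "abuse", "ok"]
print(cohens_kappa(annotator_1, annotator_2))
```

A kappa near 1 means the annotators agree far more than chance would predict; a value near 0 suggests the label definitions need to be clarified before the data is used for training.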

Which Roblox value does your team most align with?

Respecting the community is our guiding value throughout this process. First, we need to focus on improving civility and reducing policy violations on our platform. This has a significant impact on the overall user experience. Second, we must carefully consider how we roll out these new features. We need to be mindful of false positives (e.g. incorrectly marking something as abuse) in the model and avoid incorrectly penalizing users. Monitoring the performance of our models and their impact on user engagement is crucial.
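As a small worked example of monitoring false positives, the sketch below computes precision and false-positive rate from model predictions checked against human-verified outcomes. The function name `false_positive_metrics` and the sample data are hypothetical; the interview does not specify which metrics the team tracks.

```python
from typing import Dict, Sequence


def false_positive_metrics(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Dict[str, float]:
    """Compute precision and false-positive rate for abuse predictions.

    predicted[i] is True if the model flagged item i as abuse;
    actual[i] is True if a human review confirmed it as abuse.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = [(bool(p), bool(a)) for p, a in zip(predicted, actual)]
    true_pos = sum(p and a for p, a in pairs)
    false_pos = sum(p and not a for p, a in pairs)  # flagged, but not abuse
    true_neg = sum(not p and not a for p, a in pairs)
    flagged = true_pos + false_pos
    negatives = false_pos + true_neg
    return {
        "precision": true_pos / flagged if flagged else 0.0,
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
    }


# Made-up sample: five reports, one of which was flagged incorrectly.
metrics = false_positive_metrics(
    predicted=[True, True, False, False, True],
    actual=[True, False, False, False, True],
)
print(metrics)
```

Precision answers "of the users we penalized, how many deserved it?", which is the quantity most directly tied to avoiding incorrect penalties.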

What excites you the most about where Roblox and your team are headed?

We have made significant progress in improving public voice communication, but there is still much more to be done. Private communication is an exciting area to explore. I think there's a huge opportunity to improve private communication, to allow users to express themselves to close friends, to have a voice call going across experiences or during an experience while they interact with their friends. I think there's also an opportunity to foster these communities with better tools to enable users to self-organize, join communities, share content, and share ideas.

As we continue to grow, how do we scale our chat technology to support these expanding communities? We're just scratching the surface on a lot of what we can do, and I think there's a chance to improve the civility of online communication and collaboration across the industry in a way that has not been done before. With the right technology and ML capabilities, we're in a unique position to shape the future of civil online communication.