Revolutionizing Creation on Roblox with Generative AI
Earlier this year, we shared our vision for generative artificial intelligence (AI) on Roblox and the intuitive new tools that will enable every user to become a creator. As these tools evolve rapidly across the industry, I want to share some of the progress we have made, how we plan to make generative AI creation broadly accessible, and why we believe generative AI is one of the key factors shaping the direction of Roblox.
Advances in generative AI and large language models (LLMs) are simplifying and accelerating creation, while maintaining safety and without requiring massive compute resources, and they point to a future of limitless possibilities for immersive creations. In addition, AI models are becoming multimodal, meaning they are trained on multiple types of content, such as images, code, text, 3D models, and audio, enabling new breakthroughs for creators. These same models are beginning to also produce multimodal outputs, such as a model that can create a text output, as well as some visuals that complement the text. We see these AI breakthroughs as an enormous opportunity to simultaneously increase efficiency for more experienced creators and to enable even more people to bring great ideas to life on Roblox. At this year’s Roblox Developers Conference (RDC), we announced several new tools that will bring generative AI into Roblox Studio and beyond to help anyone on Roblox scale faster, iterate more quickly, and augment their skills to create even better content.
Roblox Assistant
Roblox has always provided creators with the tools, services, and support they need to build immersive 3D experiences. At the same time, we have seen our creators begin using third-party generative and conversational AI to help with their creations. While these tools do a good job of lightening creators’ workloads, current versions are either not tailored to end-to-end Roblox workflows or not trained on Roblox code, slang, and jargon. That means creators face significant additional work to use these versions to create content for Roblox. We have been exploring how to bring the value of these tools directly into Roblox Studio, and at RDC we shared an early demo of Assistant.
Assistant is our conversational AI that enables creators of all skill levels to spend significantly less time on the mundane, repetitive tasks involved in creating and more time on high-value activities, like narrative, gameplay, and experience design. Roblox is uniquely positioned to build this conversational AI model for immersive 3D worlds, thanks to our access to a large set of public 3D models to train on, our ability to integrate a model with our platform APIs, and our growing suite of innovative AI solutions. Creators will be able to use natural language text prompts to create scenes, edit 3D models, and apply interactive behaviors to objects. Assistant will support the three phases of creation: learning, coding, and building:
- Learning: Whether a creator is brand-new to developing on Roblox or a seasoned veteran, Roblox Assistant will help answer questions across a wide range of surfaces using natural language.
- Coding: Assistant will expand on our recent Code Assist tool. For example, developers will be able to ask Assistant to improve their code, explain a section of code, or help debug code that isn’t working properly and suggest fixes.
- Building: Assistant will help creators rapidly prototype new ideas. For example, a new creator could generate entire scenes and try out different versions simply by typing a prompt like “Add some streetlights along this road” or “Make a forest with different kinds of trees. Now add some bushes and flowers.”
Working with Assistant will be collaborative, interactive, and iterative, enabling creators to provide feedback and have Assistant work to provide the right solution. It will be like having an expert creator as a partner that you can bounce ideas off of and try out ideas until you get it right.
To make Assistant the best partner it can be, we made another announcement at RDC: We invited developers to opt in to contribute their anonymized Luau script data. This script data will help make our AI tools, like Code Assist and Assistant, significantly better at suggesting and creating more efficient code, giving back to the Roblox developers who use them. Further, if developers opt to share beyond Roblox, their script data will be added to a data set made available to third parties to train their AI chat tools to be better at suggesting Luau code, giving back to Luau developers everywhere.
To be clear, through comprehensive user research and transparent conversations with top developers, we’ve designed this program to be opt-in, and we will help ensure that all participants understand and consent to what it entails. As a thank you to those who choose to participate in sharing script data with Roblox, we will grant access to the more powerful versions of Assistant and Code Assist that are powered by this community-trained model. Those who do not participate will still be able to use the current versions of Assistant and Code Assist.
Easier Avatar Creation
Ultimately, we want every one of our 65.5 million daily users to have an avatar that truly represents them and expresses who they are. We recently released the ability for our UGC Program members to create and sell both avatar bodies and standalone heads. Today, that process requires access to Studio or our UGC Program, a fairly high level of skill, and multiple days of work to enable facial expression, body movement, 3D rigging, etc. This makes avatars time-consuming to create and has, to date, limited the number of options available. We want to go even further.
To enable everyone on Roblox to have a personalized, expressive avatar, we need to make avatars very easy to generate and customize. At RDC, we announced a new tool, coming in 2024, that will simplify the creation of a custom avatar from one or more images. With this tool, any creator with access to Studio or our UGC program will be able to upload an image, have an avatar created for them, and then modify it as they like. Longer term, we intend to also make this available directly within experiences on Roblox.
To make this possible, we are training AI models using Roblox’s avatar schema and a set of Roblox-owned 3D avatar models. One approach leverages research for generating 3D stylized avatars from 2D images. We are also looking at using pre-trained text-to-image diffusion models to augment limited 3D training data with 2D generative techniques, and using a generative adversarial network (GAN)-based 3D generation network for training. Finally, we are working on using ControlNet to layer in predefined poses to guide the resulting multi-view images of the avatars.
This process generates a 3D mesh for the avatar. Next, we leverage 3D semantic segmentation research, trained on 3D avatar poses, to take that 3D mesh and adjust it to add appropriate facial features, caging, rigging, and textures, in essence turning the static 3D mesh into a Roblox avatar. Finally, a mesh-editing tool allows users to morph and adjust the model to make it look more like the version they are imagining. And all of this happens fast, within minutes, generating a new avatar that can be imported into Roblox and used in an experience.
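The image-to-avatar pipeline described above can be sketched as a sequence of stages in plain Python. Every function name and implementation here is an illustrative stand-in, not a real Roblox API; the heavy ML steps (diffusion, ControlNet-guided multi-view rendering, mesh reconstruction, segmentation) are stubbed out so only the overall flow is shown:

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for the stages described above.
# None of these names are real Roblox APIs; each ML step is a stub.

@dataclass
class Mesh:
    vertices: int
    has_rig: bool = False
    has_cage: bool = False
    textured: bool = False

def generate_multiview_images(photo: str, poses: list[str]) -> list[str]:
    """Diffusion + ControlNet step: render the subject in predefined poses."""
    return [f"{photo}@{pose}" for pose in poses]

def reconstruct_mesh(views: list[str]) -> Mesh:
    """2D-to-3D step: build a static stylized mesh from the multi-view images."""
    return Mesh(vertices=5000 * len(views))

def segment_and_rig(mesh: Mesh) -> Mesh:
    """3D semantic segmentation step: add facial features, caging, rigging, textures."""
    mesh.has_rig = mesh.has_cage = mesh.textured = True
    return mesh

def photo_to_avatar(photo: str) -> Mesh:
    views = generate_multiview_images(photo, ["front", "side", "back"])
    return segment_and_rig(reconstruct_mesh(views))

avatar = photo_to_avatar("selfie.png")
print(avatar.has_rig)  # the static mesh is now rigged and ready for editing
```

The key design point the sketch captures is that each stage consumes the previous stage’s output, so any one stage (say, the mesh reconstruction technique) can be swapped out without changing the rest of the pipeline.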
Moderating Voice Communication
AI for us isn’t just about creation; it’s also about maintaining a diverse, safe, and civil community far more efficiently, at scale. As we begin rolling out new voice features, including voice chat and Roblox Connect, the new name for the avatar calling feature and APIs we announced at RDC, we face a new challenge: moderating spoken communication in real time. The current industry standard for this is a process known as Automatic Speech Recognition (ASR), which essentially takes an audio file, transcribes it to convert it into text, then analyzes the text to look for inappropriate language, keywords, etc.
This works well for companies using it at a smaller scale, but as we explored using this same ASR process to moderate voice communication, we quickly realized that it’s difficult and inefficient at our scale. This approach also loses important information carried in the speaker’s volume and tone, and misses the broader context of the conversation. Of the millions of minutes of conversation, across many languages, that we would be transcribing every day, only a tiny fraction sounds at all inappropriate. And as we continue to scale, that system would require more and more compute power to keep up. So we took a closer look at how we could do this more efficiently, by building a pipeline that goes directly from the live audio to labeling content to indicate whether it violates our policies or not.
Ultimately, we were able to build an in-house custom voice-detection system by using ASR to classify our in-house voice data sets, then using that classified voice data to train the system. More specifically, to train this new system, we begin with audio and create a transcript. We then run the transcript through our Roblox text filter system to classify the audio. This text filter system is very good at detecting policy-violating language on Roblox because we have spent years optimizing it for Roblox-specific slang, abbreviations, and jargon. At the end of these layers of training, we have a model that’s capable of detecting policy violations directly from audio in real time.
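The labeling step described above, where ASR transcripts run through the existing text filter become training labels for the audio, can be sketched as follows. Every function body here is a toy stub: the real ASR is a full speech model, and the real filter is vastly richer than a blocklist.

```python
# Minimal sketch of the weak-labeling pipeline: transcribe audio with ASR,
# label the transcript with the existing text filter, and keep the
# (audio, label) pairs as training data for a direct audio classifier.

BLOCKLIST = {"badword"}  # stand-in for the real, far richer text filter

def asr_transcribe(audio_clip: str) -> str:
    """Stub for automatic speech recognition."""
    return audio_clip.replace(".wav", "").replace("_", " ")

def text_filter(transcript: str) -> bool:
    """Stub for the Roblox text filter: True means policy-violating."""
    return any(word in BLOCKLIST for word in transcript.split())

def build_training_set(audio_clips: list[str]) -> list[tuple[str, bool]]:
    """Pair each raw audio clip with the filter's label on its transcript."""
    return [(clip, text_filter(asr_transcribe(clip))) for clip in audio_clips]

data = build_training_set(["hello_there.wav", "badword_here.wav"])
print(data)  # [('hello_there.wav', False), ('badword_here.wav', True)]
```

Note that ASR and the text filter are only needed at training time; the model trained on these pairs classifies raw audio directly, which is where the compute savings at inference come from.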
While this system does have the ability to detect specific keywords such as profanity, policy violations are rarely just one word. One word can often seem problematic in one context and just fine in a different context. Essentially, these types of violations involve what you’re saying, how you’re saying it, and the context in which the statements are made.
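As a toy illustration of this point (and nothing like our production model), the same word can be scored differently depending on the words around it. The cue-word sets below are invented for the example:

```python
# Keyword-only matching vs. a context-window check. A single ambiguous
# word is flagged only when hostile cue words appear near it.

AMBIGUOUS = {"shoot"}
HOSTILE_CONTEXT = {"going", "to", "you"}  # invented cue words for this toy

def keyword_only(tokens: list[str]) -> bool:
    """Flags any occurrence of an ambiguous word, context-blind."""
    return any(t in AMBIGUOUS for t in tokens)

def with_context(tokens: list[str], window: int = 3) -> bool:
    """Flags an ambiguous word only if all hostile cues appear nearby."""
    for i, tok in enumerate(tokens):
        if tok in AMBIGUOUS:
            context = set(tokens[max(0, i - window):i + window + 1])
            if HOSTILE_CONTEXT <= context:
                return True
    return False

benign = "let us shoot some hoops".split()
hostile = "i am going to shoot you".split()
print(keyword_only(benign), with_context(benign))    # True False
print(keyword_only(hostile), with_context(hostile))  # True True
```

The keyword-only check false-positives on the basketball sentence; the context-aware check does not, which is the gap the transformer-based approach described next is designed to close at scale.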
To get better at understanding context, we leverage the native power of a transformer-based architecture, which is very good at sequence summarization. It can take a sequence of data, like an audio stream, and summarize it for you. This architecture enables us to preserve a longer audio sequence so we can detect not only words but also context and intonations. Once all of these elements are combined, the resulting end-to-end system takes audio as input and outputs a classification: policy-violating or not. It can detect not just keywords and policy-violating phrases but also the tone, sentiment, and other context that are critical to determining intent. This new system, which detects policy-violating speech directly from audio, is significantly more compute efficient than a traditional ASR system, which will make it much easier to scale as we continue to reimagine how people come together.
We also needed a new way to warn those on our voice communication tools of the potential consequences of this type of language. With this innovative detection system at our disposal, we are now experimenting with ways to affect online behavior to maintain a safe environment. We know people sometimes violate our policies unintentionally, and we want to understand whether an occasional reminder might help prevent further offenses. To help with this, we are experimenting with real-time user feedback through notifications. If the system detects repeated policy violations in what you say, a notification will pop up on your screen letting you know that your language violates our policies and pointing you to our policies for more information.
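One simple way such a repeated-violation reminder could work is a rolling-window counter per user. The threshold and window length below are made-up values for illustration, not our actual policy parameters:

```python
from collections import deque

WINDOW_SECONDS = 600   # look at the last 10 minutes (illustrative value)
THRESHOLD = 3          # violations before a reminder fires (illustrative value)

class ViolationTracker:
    """Counts detected violations in a rolling time window for one user."""

    def __init__(self):
        self.events = deque()  # timestamps of recent violations

    def record(self, timestamp: float) -> bool:
        """Record one detected violation; return True if a reminder should show."""
        self.events.append(timestamp)
        # Drop violations that have aged out of the window.
        while self.events and timestamp - self.events[0] > WINDOW_SECONDS:
            self.events.popleft()
        return len(self.events) >= THRESHOLD

tracker = ViolationTracker()
print(tracker.record(0.0))    # False: first violation
print(tracker.record(60.0))   # False: second violation
print(tracker.record(120.0))  # True: third violation within the window
```

Keying the reminder to repeated detections within a window, rather than to a single flagged utterance, matches the goal stated above: nudging people who may be violating policy unintentionally without reacting to every isolated false positive.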
Voice stream notifications are just one element of the moderation system, however. We also look at behavioral patterns on the platform, as well as complaints from others on Roblox, to drive our overall moderation decisions. The aggregate of these signals could result in stronger consequences, including having access to audio features revoked, or for more serious infractions, being banned from the platform entirely. Keeping our community safe and civil is critical as these advances in multimodal AI models, generative AI, and LLMs come together to enable incredible new tools and capabilities for creators.
We believe that giving creators these tools will not only lower the barrier to entry for newer creators but also help experienced creators offload the more tedious tasks of the process. This will allow them to spend more time on the inventive aspects of fine-tuning and ideating. Our goal with all of this is to enable everyone, everywhere to bring their ideas to life and to vastly increase the diversity of avatars, items, and experiences available on Roblox. We are also sharing information and tools to help protect new creations.
We’re already imagining amazing possibilities: Say someone is able to create an avatar doppelganger directly from a photo. They could then customize that avatar to make it taller or render it in an anime style. Or they could build an experience by asking Assistant to add cars, buildings, and scenery, set lighting or wind conditions, or change the terrain. From there, they could iterate to refine things just by typing back and forth with Assistant. We know the reality of what people create with these tools, as they become available, will go well beyond what we can even imagine.