Revolutionizing Creation on Roblox with Generative AI
Earlier this year, we shared our vision for generative artificial intelligence (AI) on Roblox and the intuitive new tools that will enable every user to become a creator. As these tools develop rapidly across the industry, I want to share an update on the progress we've made, the work still ahead to democratize creation with generative AI, and why we believe generative AI is a key element of Roblox's future.
Advances in generative AI and large language models (LLMs) present an incredible opportunity to pioneer the future of immersive experiences, because these technologies enable easier, faster creation while remaining safe and without demanding enormous compute resources. In addition, advances in multimodal AI models—models trained on multiple types of content, including images, code, text, 3D models, and audio—open the door to new advances in creation tools. These same models are also beginning to produce multimodal outputs, such as a model that can create a text output as well as some visuals that complement the text. We see these AI breakthroughs as an enormous opportunity to simultaneously increase efficiency for more experienced creators and to enable even more people to bring great ideas to life on Roblox. At this year’s Roblox Developers Conference (RDC), we announced several new tools that will bring generative AI into Roblox Studio and beyond to help anyone on Roblox scale faster, iterate more quickly, and augment their skills to create even better content.
Roblox Assistant
Roblox has always provided creators with the tools, services, and support they need to build immersive 3D experiences. At the same time, we've seen creators begin to use third-party generative and conversational AI to help with their creations. While these tools can reduce a creator's workload, these off-the-shelf versions of AI are not designed for end-to-end Roblox workflows and were not trained on Roblox code, slang, and terminology. That means creators face significant additional work to use these versions to create content for Roblox. We've been exploring how to bring the value of these tools into Roblox Studio, and at RDC we shared an early example: Assistant.
Assistant is our conversational AI that enables creators of all skill levels to spend significantly less time on the mundane, repetitive tasks involved in creating and more time on high-value activities, like narrative, gameplay, and experience design. Roblox is uniquely positioned to build this conversational AI model for immersive 3D worlds, thanks to our access to a large set of public 3D models to train on, our ability to integrate a model with our platform APIs, and our growing suite of innovative AI solutions. Creators will be able to use natural language text prompts to create scenes, edit 3D models, and apply interactive behaviors to objects. Assistant will support the three phases of creation: learning, coding, and building:
- Learning: Whether a creator is brand-new to developing on Roblox or a seasoned veteran, Roblox Assistant will help answer questions across a wide range of surfaces using natural language.
- Coding: Assistant will expand on our recent Code Assist tool. For example, developers will be able to ask Assistant to improve their code, explain a piece of code, or help debug and suggest fixes for code that isn't working properly.
- Building: Assistant will help creators rapidly prototype new ideas. For example, a new creator could generate entire scenes and try out different versions simply by typing a prompt like “Add some streetlights along this road” or “Make a forest with different kinds of trees. Now add some bushes and flowers.”
Working with Assistant will be collaborative, interactive, and iterative, enabling creators to provide feedback and have Assistant work to provide the right solution. It will be like having an expert creator as a partner that you can bounce ideas off of and try out ideas until you get it right.
To make Assistant the best partner it can be, we made another announcement at RDC: We invited developers to opt in to contribute their anonymized Luau script data. This script data will help make our AI tools, like Code Assist and Assistant, significantly better at suggesting and creating more efficient code, giving back to the Roblox developers who use them. Further, if developers opt to share beyond Roblox, their script data will be added to a data set made available to third parties to train their AI chat tools to be better at suggesting Luau code, giving back to Luau developers everywhere.
To be clear, through comprehensive user research and transparent conversations with top developers, we’ve designed this to be opt-in and will help ensure that all participants understand and consent to what the program entails. As a thank you to those who choose to participate in sharing script data with Roblox, we will grant access to the more powerful versions of Assistant and Code Assist that are powered by this community-trained model. Those who don't opt in will continue to have access to our current versions of Assistant and Code Assist.
Easier Avatar Creation
Our highest ambition is for every one of our 65.5 million daily users to have an avatar that truly represents them and expresses who they are. We recently released the ability for our UGC Program members to create and sell both avatar bodies and standalone heads. Today, that process requires access to Studio or our UGC Program, a fairly high level of skill, and multiple days of work to enable facial expression, body movement, 3D rigging, etc. This makes avatars time-consuming to create and has, to date, limited the number of options available. We want to go even further.
To enable everyone on Roblox to have a personalized, expressive avatar, we need to make avatars very easy to generate and customize. At RDC, we announced a new tool, launching in 2024, that will make it easy to create a custom avatar from one or more images. With this tool, any creator with access to Studio or our UGC program will be able to upload an image, have an avatar created for them, and then modify it as they like. Longer term, we intend to also make this available directly within experiences on Roblox.
To achieve this, we are training AI models on Roblox's avatar schema and on a Roblox-owned set of 3D avatar models. One approach leverages research for generating 3D stylized avatars from 2D images. We are also looking at using pre-trained text-to-image diffusion models to augment limited 3D training data with 2D generative techniques, and using a generative adversarial network (GAN)-based 3D generation network for training. Finally, we are working on using ControlNet to layer in predefined poses to guide the resulting multi-view images of the avatars.
This process generates a 3D mesh for the avatar. Next, we leverage 3D semantic segmentation research, trained on 3D avatar poses, to take that 3D mesh and adjust it to add appropriate facial features, caging, rigging, and textures, in essence turning the static 3D mesh into a Roblox avatar. Finally, a mesh-editing tool allows users to morph and adjust the model to make it look more like the version they are imagining. And all of this happens fast—within minutes—generating a new avatar that can be imported into Roblox and used in an experience.
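The staged flow described above—image to mesh, mesh to rigged avatar, then user-driven editing—can be sketched as a simple pipeline. Every stage here is a hypothetical placeholder standing in for the models described in the text; none of these function names are actual Roblox APIs.

```python
# Illustrative sketch only: each stage is a stand-in for the generative
# models and tools described in the post, not a real Roblox API.
from dataclasses import dataclass


@dataclass
class AvatarArtifact:
    stage: str    # which pipeline stage produced this artifact
    data: object  # the underlying asset (image, mesh, rigged model, ...)


def generate_mesh(image: bytes) -> AvatarArtifact:
    """Stage 1 (hypothetical): 2D image -> stylized 3D mesh."""
    return AvatarArtifact("mesh", image)


def rig_and_texture(mesh: AvatarArtifact) -> AvatarArtifact:
    """Stage 2 (hypothetical): semantic segmentation adds facial
    features, caging, rigging, and textures to the static mesh."""
    return AvatarArtifact("rigged", mesh.data)


def user_edit(avatar: AvatarArtifact) -> AvatarArtifact:
    """Stage 3 (hypothetical): mesh-editing tool lets the user morph
    the result toward the version they are imagining."""
    return AvatarArtifact("final", avatar.data)


def create_avatar(image: bytes) -> AvatarArtifact:
    """Run the full image-to-avatar pipeline end to end."""
    return user_edit(rig_and_texture(generate_mesh(image)))
```

The value of modeling it this way is that each stage can be swapped independently as the underlying research (diffusion, GAN-based generation, ControlNet pose guidance) matures.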
Moderating Voice Communication
AI for us isn't just about creation; it also gives us a much more efficient way to ensure a diverse, safe, and civil community at scale. As we begin rolling out new voice features—including voice chat and Roblox Connect, the new capability we announced at RDC for calling friends as your avatar, along with its APIs—we face a new challenge: moderating spoken language in real time. The current industry standard for this is a process known as Automatic Speech Recognition (ASR), which essentially takes an audio file, transcribes it to convert it into text, then analyzes the text to look for inappropriate language, keywords, etc.
This works well for companies using it at a smaller scale, but as we explored using this same ASR process to moderate voice communication, we quickly realized that it’s difficult and inefficient at our scale. This approach also discards very useful information, such as the speaker's volume and tone, and the broader context of the conversation. Of the millions of minutes of conversation we would have to transcribe every day, across multiple languages, only a small fraction is likely to sound inappropriate. And as we continue to scale, that system would require more and more compute power to keep up. So we took a closer look at how we could do this more efficiently, by building a pipeline that goes directly from the live audio to labeling content to indicate whether it violates our policies or not.
Ultimately, we were able to build an in-house custom voice-detection system by using ASR to classify our in-house voice data sets, then using that classified voice data to train the system. More specifically, to train this new system, we begin with audio and create a transcript. We then run the transcript through our Roblox text filter system to classify the audio. This text filter system is very good at detecting policy-violating language on Roblox because we've spent years optimizing it for Roblox-specific slang, abbreviations, and terminology. At the end of these layers of training, we have a model that’s capable of detecting policy violations directly from audio in real time.
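The labeling step described above can be sketched as a small pipeline in which an existing ASR model and the text filter act as "teachers" that label raw audio clips, producing training pairs for a direct audio-to-label classifier. The `transcribe` and `text_filter` functions below are trivial hypothetical stand-ins, not Roblox's actual systems, and the blocked-term list is invented for illustration.

```python
# Sketch of the ASR-teacher labeling pipeline; all components are
# placeholders, not Roblox's real ASR model or text filter.
from dataclasses import dataclass


@dataclass
class LabeledClip:
    audio: bytes          # raw audio clip
    violates_policy: bool  # label produced by the teacher pipeline


BLOCKED_TERMS = {"badword"}  # invented stand-in for the real filter vocabulary


def transcribe(audio: bytes) -> str:
    """Stand-in for an ASR model; here we pretend the 'audio' bytes
    decode directly to their own transcript."""
    return audio.decode("utf-8")


def text_filter(transcript: str) -> bool:
    """Stand-in for the Roblox text filter: True if policy-violating."""
    return any(term in transcript.lower() for term in BLOCKED_TERMS)


def label_clips(clips: list[bytes]) -> list[LabeledClip]:
    """Run ASR + text filter over raw clips to build a training set.
    The resulting pairs would train an audio-only classifier, removing
    ASR from the serving path entirely."""
    return [LabeledClip(c, text_filter(transcribe(c))) for c in clips]
```

Once trained on enough of these pairs, the student model classifies audio directly, which is what makes the serving path cheaper than transcribe-then-filter.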
While this system does have the ability to detect specific keywords such as profanity, policy violations are rarely just one word. One word can often seem problematic in one context and just fine in a different context. Essentially, these types of violations involve what you’re saying, how you’re saying it, and the context in which the statements are made.
To get better at understanding context, we leverage the native power of a transformer-based architecture, which is very good at sequence summarization. It can take a sequence of data, like an audio stream, and summarize it for you. This architecture enables us to preserve a longer audio sequence so we can detect not only words but also context and intonations. Combining all of these elements gives us a final system whose input is audio and whose output is a classification: policy violation or not. This system can detect keywords and policy-violating phrases, but it can also detect tone, sentiment, and other context that is important for determining intent. This new system, which detects policy-violating speech directly from audio, is significantly more compute efficient than a traditional ASR system, which will make it much easier to scale as we continue to reimagine how people come together.
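The "sequence summarization" idea at the heart of this can be illustrated with the attention-pooling building block transformers use: a sequence of per-frame audio embeddings is collapsed into a single summary vector by attention weights, so the summary can reflect the whole utterance rather than isolated words. This is a minimal sketch of the mechanism, not Roblox's actual model.

```python
# Minimal attention-pooling sketch: summarize a (T, D) sequence of
# audio-frame embeddings into one D-dim vector. Illustrative only.
import numpy as np


def attention_summary(frames: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention of a (learned) query against T frames.

    frames: shape (T, D) -- per-frame embeddings of the audio stream
    query:  shape (D,)   -- pooling query (learned in a real model)
    returns: shape (D,)  -- attention-weighted summary of the sequence
    """
    # Score each frame against the query, scaled by sqrt(D) for stability.
    scores = frames @ query / np.sqrt(frames.shape[1])
    # Softmax over the time axis to get attention weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted average of frames: the sequence summary.
    return weights @ frames
```

In a real classifier this summary vector would feed a small head that outputs the violation/no-violation decision; because the weights span the whole sequence, tone and surrounding context can influence the label, not just individual keywords.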
We also needed a new way to warn those on our voice communication tools of the potential consequences of this type of language. With this innovative detection system at our disposal, we are now experimenting with ways to affect online behavior to maintain a safe environment. We know people sometimes violate our policies unintentionally and we want to understand if an occasional reminder might help prevent further offenses. To help with this, we are experimenting with real-time user feedback through notifications. If the system detects that you have repeatedly said things that violate our policies, we display a pop-up notification on your screen letting you know that your language violated our policies and directing you to our policies for more information.
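The reminder logic this describes can be sketched as a simple per-user counter that surfaces a notification once flags cross a threshold. The threshold value and message text below are invented for illustration; they are not Roblox's actual parameters.

```python
# Illustrative sketch of a repeated-violation reminder; the threshold
# and message are hypothetical, not Roblox's real policy values.
from collections import defaultdict
from typing import Optional


class VoiceReminder:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold              # flags before a reminder fires
        self.counts = defaultdict(int)          # per-user flag counts

    def record_flag(self, user_id: str) -> Optional[str]:
        """Record one policy-violation flag from the audio classifier.

        Returns a notification message exactly once, when the user
        crosses the threshold; otherwise returns None.
        """
        self.counts[user_id] += 1
        if self.counts[user_id] == self.threshold:
            return "Your recent voice chat violated our Community Standards."
        return None
```

In practice this signal would feed the broader moderation system described below, alongside behavioral patterns and reports from other users.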
Voice stream notifications are just one element of the moderation system, however. We also look at behavioral patterns on the platform, as well as complaints from others on Roblox, to drive our overall moderation decisions. The aggregate of these signals could result in stronger consequences, including having access to audio features revoked, or for more serious infractions, being banned from the platform entirely. Keeping our community safe and civil is critical as these advances in multimodal AI models, generative AI, and LLMs come together to enable incredible new tools and capabilities for creators.
We believe that providing these tools to creators will both lower the barrier to entry for less experienced creators and free more experienced creators from the more tedious parts of the process. This will allow them to spend more time on the inventive aspects of fine-tuning and ideating. Our goal with all of this is to enable everyone, everywhere to bring their ideas to life and to vastly increase the diversity of avatars, items, and experiences available on Roblox. We are also sharing information and tools to help protect new creations.
We’re already imagining amazing possibilities: Say someone creates an avatar doppelganger directly from a photo; they could then customize their avatar to make it taller or render it in anime style. Or they could build an experience by asking Assistant to add cars, buildings, and scenery, set lighting or wind conditions, or change the terrain. From there, they could iterate to refine things just by typing back and forth with Assistant. We know the reality of what people create with these tools, as they become available, will go well beyond what we can even imagine.