Scaling Safety and Civility on Roblox

Roblox December 4, 2025


Roblox has always been designed to protect our youngest users; we are now adapting to a growing audience of older users.
With text, voice, visuals, 3D models, and code, Roblox is in a unique position to succeed with multimodal AI solutions.
We improve safety across the industry wherever we can, via open source, collaboration with partners, or support for legislation.

Safety and civility have been foundational to Roblox since its inception nearly two decades ago. On day one, we committed to building safety features, tools, and moderation capabilities into the design of our products. Before we launch any new feature, we’ve already begun thinking about how to keep the community safe from potential harms. This process of designing features for safety and civility from the outset, including early testing to see how a new feature might be misused, helps us innovate. We continually evaluate the latest research and technology available to keep our policies, tools, and systems as accurate and efficient as possible.

When it comes to safety, Roblox is uniquely positioned. Most platforms began as a place for adults and are now retroactively working to build in protections for teens and children. But our platform was developed from the beginning as a safe, protective space for children to create and learn, and we are now adapting to a rapidly growing audience that’s aging up. In addition, the volume of content we moderate has grown exponentially, thanks to exciting new generative AI features and tools that empower even more people to easily create and communicate on Roblox. These are not unexpected challenges—our mission is to connect a billion people with optimism and civility. We are always looking at the future to understand what new safety policies and tools we’ll need as we grow and adapt.

Many of our safety features and tools are based on innovative AI solutions that run alongside an expert team of thousands who are dedicated to safety. This strategic blend of experienced humans and intelligent automation is imperative as we work to scale the volume of content we moderate 24/7. We also believe in nurturing partnerships with organizations focused on online safety, and, when relevant, we support legislation that we strongly believe will improve the industry as a whole.

Leading with AI to Safely Scale

The sheer scale of our platform demands AI systems that meet or top industry-leading benchmarks for accuracy and efficiency, allowing us to quickly respond as the community grows, policies and requirements evolve, and new challenges arise. Today, more than 71 million daily active users in 190 countries communicate and share content on Roblox. Every day, people send billions of chat messages to their friends on Roblox. Our Creator Store has millions of items for sale—and creators add new avatars and items to Marketplace every day. And this will only get larger as we continue to grow and enable new ways for people to create and communicate on Roblox.

As the broader industry makes great leaps in machine learning (ML), large language models (LLMs), and multimodal AI, we invest heavily in ways to leverage these new solutions to make Roblox even safer. AI solutions already help us moderate text chat, immersive voice communication, images, and 3D models and meshes. We are now using many of these same technologies to make creation on Roblox faster and easier for our community.

Innovating with Multimodal AI Systems

By its very nature, our platform combines text, voice, images, 3D models, and code. Multimodal AI, in which systems are trained on multiple types of data together to produce more accurate, sophisticated results than a unimodal system, presents a unique opportunity for Roblox. Multimodal systems are capable of detecting combinations of content types (such as images and text) that may be problematic in ways that the individual elements aren’t. To imagine how this might work, let’s say a kid is using an avatar that looks like a pig: totally fine, right? Now imagine someone else sends a chat message that says “This looks just like you!” That message might violate our policies around bullying.

A model trained only on 3D models would approve the avatar. And a model trained only on text would approve the text and ignore the context of the avatar. Only something trained across text and 3D models would be able to quickly detect and flag the issue in this example. We are in the early days for these multimodal models, but we see a world, in the not too distant future, where our system responds to an abuse report by reviewing an entire experience. It could process the code, the visuals, the avatars, and communications within it as input and determine whether further investigation or consequence is warranted.

We’ve already made significant advances using multimodal techniques, such as our model that detects policy violations in voice communications in near real time. We intend to share advances like these when we see the opportunity to increase safety and civility not just on Roblox but across the industry. In fact, we are sharing our first open source model, a voice safety classifier, with the industry.

Moderating Content at Scale

At Roblox, we review most content types to catch critical policy violations before they appear on the platform. Doing this without causing noticeable delays for the people publishing their content requires speed as well as accuracy. Groundbreaking AI solutions help us make better decisions in real time to help keep problematic content off of Roblox—and if anything does make it through to the platform, we have systems in place to identify and remove that content, including our robust user reporting systems.

We’ve seen the accuracy of our automated moderation tools surpass that of human moderators when it comes to repeatable, simple tasks. By automating these simpler cases, we free up our human moderators to spend the bulk of their time on what they do best—the more complex tasks that require critical thinking and deeper investigation. When it comes to safety, however, we know that automation cannot completely replace human review. Our human moderators are invaluable for helping us continually oversee and test our ML models for quality and consistency, and for creating high-quality labeled data sets to keep our systems current. They help identify new slang and abbreviations in all 16 languages we support and flag cases that come up frequently so that the system can be trained to recognize them.

We know that even high-quality ML systems can make mistakes, so we have human moderators in our appeals process. Our moderators help us get it right for the individual who filed the appeal, and can flag the need for further training on the types of cases where mistakes were made. With this, our system grows increasingly accurate over time, essentially learning from its mistakes.

Most important, humans are always involved in any critical investigations involving high-risk cases, such as extremism or child endangerment. For these cases, we have a dedicated internal team working to proactively identify and remove malicious actors and to investigate difficult cases in our most critical areas. This team also partners with our product team, sharing insights from the work they are doing to continually improve the safety of our platform and products.

Moderating Communication

Our text filter has been trained on Roblox-specific language, including slang and abbreviations. The 2.5 billion chat messages sent every day on Roblox go through this filter, which is adept at detecting policy-violating language. This filter detects violations in all the languages we support, which is especially important now that we’ve released real-time AI chat translations.

We’ve previously shared how we moderate voice communication in real time via an in-house custom voice detection system. The innovation here is the ability to go directly from the live audio to having the AI system label the audio as policy violating or not—in a matter of seconds. As we began testing our voice moderation system, we found that, in many cases, people were unintentionally violating our policies because they weren’t familiar with our rules. We developed a real-time safety system to help notify people when their speech violates one of our policies.

These notifications are an early, mild warning, akin to being politely asked to watch your language in a public park with young children around. In testing, these interventions have proved successful in reminding people to be respectful and directing them to our policies to learn more. When compared against engagement data, the results of our testing are encouraging and indicate that these tools may effectively keep bad actors off the platform while encouraging truly engaged users to improve their behavior on Roblox. Since rolling out real-time safety to all English-speaking users in January, we have seen a 53 percent reduction in voice-related abuse reports per daily active user.

Moderating Creation

For visual assets, including avatars and avatar accessories, we use computer vision (CV). One technique involves taking photographs of the item from multiple angles. The system then reviews those photographs to determine what the next step should be. If nothing seems amiss, the item is approved. If something is clearly violating a policy, the item is blocked and we tell the creator what we think is wrong. If the system is not sure, the item is sent to a human moderator to take a closer look and make the final decision.
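
The three-way outcome described above (approve, block, or escalate to a human) can be sketched as a thresholded triage step. This is a minimal illustration, not Roblox's implementation: the per-view violation scores, the `triage` function, the choice to act on the highest-scoring view, and both threshold values are assumptions made for the example.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    BLOCK = "block"
    HUMAN_REVIEW = "human_review"


def triage(view_scores, approve_below=0.2, block_above=0.9):
    """Map per-view violation scores (0.0 to 1.0) to a moderation decision.

    view_scores: one hypothetical classifier score per rendered angle.
    approve_below / block_above: illustrative thresholds, not real values.
    """
    if not view_scores:
        # Nothing to judge: fall back to a human rather than guess.
        return Decision.HUMAN_REVIEW

    # A single bad angle is enough, so act on the worst view.
    worst = max(view_scores)

    if worst >= block_above:
        return Decision.BLOCK
    if worst < approve_below:
        return Decision.APPROVE
    # Anything in between is the "not sure" case.
    return Decision.HUMAN_REVIEW
```

Acting on the maximum score rather than the average reflects the idea that a violation visible from only one angle should still be caught; a real system might combine views differently.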

We do a version of this same process for avatars, accessories, code, and full 3D models. For full models, we go a step further and assess all the code and other elements that make up the model. If we are assessing a car, we break it down into its components—the steering wheel, seats, tires, and the code underneath it all—to determine whether any might be problematic. If there’s an avatar that looks like a puppy, we need to assess whether the ears and the nose and the tongue are problematic.

We need to be able to assess in the other direction as well. What if the individual components are all perfectly fine but their overall effect violates our policies? A mustache, a khaki jacket, and a red armband, for example, are not problematic on their own. But imagine these assembled together on someone’s avatar, with a cross-like symbol on the armband and one arm raised in a Nazi salute, and a problem becomes clear.
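
One way to picture assessing in both directions is to check each component on its own and then check the assembled set. The sketch below is a toy stand-in that uses hand-written tag lists where a production system would rely on trained models; the `assess_item` function, the `Assessment` type, and the placeholder tags are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    flagged_components: list = field(default_factory=list)
    flagged_combinations: list = field(default_factory=list)

    @property
    def violates(self):
        # An item fails if any single part or any combination is flagged.
        return bool(self.flagged_components or self.flagged_combinations)


def assess_item(component_tags, banned_tags, banned_combinations):
    """Check an item part by part, then as a whole.

    component_tags: labels for the parts that make up the item.
    banned_tags: labels that are a problem on their own.
    banned_combinations: sets of labels that are only a problem together.
    """
    tags = set(component_tags)

    # Direction 1: is any individual component a problem?
    flagged_components = sorted(tags & set(banned_tags))

    # Direction 2: do otherwise acceptable components combine into a problem?
    flagged_combinations = [
        sorted(combo) for combo in banned_combinations if set(combo) <= tags
    ]
    return Assessment(flagged_components, flagged_combinations)


# Each part is fine alone, but the pair is listed as a problem combination.
result = assess_item(
    ["part_a", "part_b", "part_c"],
    banned_tags={"part_x"},
    banned_combinations=[{"part_a", "part_b"}],
)
print(result.violates, result.flagged_combinations)
```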

This is where our in-house models differ from the available off-the-shelf CV models. Those are generally trained on real-world items. They can recognize a car or a dog but not the component parts of those things. Our models have been trained and optimized to assess items down to the smallest component parts.

Collaborating with Partners

We use all the tools available to us to keep everyone on Roblox safe—but we feel equally strongly about sharing what we learn beyond Roblox. In fact, we are sharing our first open source model, a voice safety classifier, to help others improve their own voice safety systems. We also partner with third-party groups to share knowledge and best practices as the industry evolves. We build and maintain close relationships with a wide range of organizations, including parental advocacy groups, mental health organizations, government agencies, and law enforcement agencies. They give us valuable insights into the concerns that parents, policymakers, and other groups have about online safety. In return, we are able to share our learnings and the technology we use to keep the platform safe and civil.

We have a track record of putting the safety of the youngest and most vulnerable people on our platform first. We have established programs, such as our Trusted Flagger Program, to help us scale our reach as we work to protect the people on our platform. We collaborate with policymakers on key child safety initiatives, legislation, and other efforts. For example, we were the first and one of the only companies to support the California Age-Appropriate Design Code Act, because we believe it’s in the best interest of young people. When we believe something will help young people, we want to propagate it to everyone. More recently, we signed a letter of support for California Bill SB 933, which updates state laws to expressly prohibit AI-generated child sexual abuse material.

Working Toward a Safer Future

This work is never finished. We are already working on the next generation of safety tools and features, even as we make it easier for anyone to create on Roblox. As we grow and provide new ways to create and share, we will continue to develop new, groundbreaking solutions to keep everyone safe and civil on Roblox—and beyond.

The post Scaling Safety and Civility on Roblox appeared first on Roblox Blog.
