Reddit has introduced an AI-powered safety filter that will help sift out posts that contain harassing or other objectionable content.
The "harassment filter," quietly added to the platform's support page last week and spotted by Android Authority, uses a large language model (LLM) "trained on moderator actions and content removed by Reddit’s internal tools and enforcement teams," Reddit explains. The tool is intended to support the already demanding work of Reddit moderators tasked with supervising the online communities they're a part of.
Just last month, Bloomberg reported that Reddit had signed a content licensing deal with a major "AI player," which would offer site and user data to train potential AI tech.
When a community and its moderators turn on the filter, a new flag will appear in the site's mod queue indicating content (posts and comments) that has been flagged as "potential harassment." Moderators can then approve or remove the content, and report back to Reddit on whether it was accurately detected.
The platform has introduced a slew of new features and updated experiences in recent months, ahead of its stock market debut this month. Last year, Reddit announced the Modmail Harassment Filter, which acts like a "spam" folder for moderator messages containing potentially abusive content.
How to set up Reddit's harassment filter
For desktop, go to the About Community tab on the right sidebar and select Mod Tools. For iOS and Android, click on the Mod Tools button below your community's banner.
Go to Moderation. Click on Safety.
Select the Harassment filter option, and toggle it on.
Choose between the Low and High filter options. The Low setting filters the least content, but is more accurate in spotting harassment. The High setting does a broader sweep of posts, and thus will filter more of them. Reddit recommends using the High option if your community encounters a "significant amount of harassing content."
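Reddit has not published how the two settings work under the hood, but one way to picture the trade-off is as a score cutoff: a stricter cutoff catches less content with fewer false alarms, while a looser one sweeps up more. The sketch below is purely illustrative. The threshold values, the `filter_for_review` function, and the sample scores are all made up for this example and are not Reddit's implementation or API.

```python
# Hypothetical cutoffs; Reddit has not disclosed how Low and High are implemented.
THRESHOLDS = {"low": 0.9, "high": 0.6}


def filter_for_review(scored_items, level):
    """Return the texts whose harassment score meets the cutoff for `level`."""
    cutoff = THRESHOLDS[level]
    return [text for text, score in scored_items if score >= cutoff]


# Made-up scores standing in for a model's output (0 = benign, 1 = harassing).
sample = [
    ("great write-up, thanks", 0.05),
    ("this take is pretty dumb", 0.65),
    ("nobody wants you here, leave", 0.95),
]

low_flagged = filter_for_review(sample, "low")
high_flagged = filter_for_review(sample, "high")

print("Low: ", low_flagged)
print("High:", high_flagged)
```

In this toy setup, everything the Low setting catches is also caught by High, and High additionally sends borderline comments to the mod queue for a human decision.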
While Reddit says administrators will continue to automatically remove posts that directly violate Reddit’s Content Policy, the harassment filter gives communities oversight of objectionable but still "policy-complying" content that might slip through the cracks.