When a post contains certain trigger words, this tool will pass its content to an AI-based service for assessment against a number of moderation categories, and based on that assessment it may hold the post for human moderation before it goes live on the site.
Given we have very few moderators, this gives the site a mechanism to apply some sensible moderation.
The categories moderation can cover are:
- Harassment
- Harassment/Threatening
- Sexual
- Hate
- Hate/Threatening
- Illegal
- Illegal/Violent
- Self Harm/Intent
- Self Harm/Instructional
- Self Harm
- Sexual/Minors
- Violence
- Violence/Graphic
We will initially set the threshold for intervention pretty high, and as stated it will only apply to posts that contain trigger words. There will inevitably be some fine-tuning required as time goes on, and feedback will be welcome as always.
If you post something that the tool decides requires moderation (review by a site mod), you'll get a notification, and the post won't go live on the site until it has been reviewed and approved.
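For anyone curious, the flow described above boils down to a simple gate-and-threshold check. This is just an illustrative sketch, not the actual implementation: the trigger-word list, the 0.9 threshold, and the `score_post` stub (which stands in for the external AI assessment call) are all placeholder assumptions.

```python
# Sketch of the moderation flow: trigger-word gate, then AI scoring,
# then a threshold check to decide whether a post is held for review.

TRIGGER_WORDS = {"example_trigger"}  # placeholder; the real list is private
THRESHOLD = 0.9                      # illustrative "pretty high" threshold

def contains_trigger(text: str) -> bool:
    """Cheap first pass: only posts with trigger words get assessed at all."""
    return any(word in TRIGGER_WORDS for word in text.lower().split())

def score_post(text: str) -> dict[str, float]:
    """Stub for the external AI assessment.

    The real tool returns a score per category (Harassment, Hate,
    Violence, etc.); here we just return zeros for illustration.
    """
    return {"harassment": 0.0, "hate": 0.0, "violence": 0.0}

def needs_review(text: str) -> bool:
    if not contains_trigger(text):
        return False  # no trigger words: the post goes live immediately
    scores = score_post(text)
    # Held for a human mod only if any category crosses the threshold
    return any(score >= THRESHOLD for score in scores.values())
```

Posts with no trigger words never reach the scoring step, which keeps the expensive AI call rare and the moderation light-touch by default.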
As always with WHO, the goal is to be light-touch... and before anyone asks, 'cսnt' will not be a trigger word.
There are also options for us to target all posts on certain threads or all posts by certain posters (though these won't be enabled at launch).
Hopefully this will help fill the gap in active moderation that we've struggled with for a while now.