If I understood that correctly, it is still a way to detect separate words. But I don't think it will work if the "bad word" is part of a larger single word. For example, if "Blue" were a bad word on the list, this system would detect BLUE but not BLUEMAN. Maybe I can try something with the OpenAI integration: a bad-word detector prompt that returns TRUE or FALSE…
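To make the BLUE/BLUEMAN difference concrete, here is a minimal Python sketch comparing whole-word matching with plain substring matching. The blocklist and function names are hypothetical, and "blue" just stands in for a real bad word. Whole-word matching misses BLUEMAN; substring matching catches it, but neither catches a spaced-out spelling like "B L U E".

```python
import re

# Hypothetical blocklist for illustration only; "blue" stands in for a real bad word.
BAD_WORDS = ["blue"]

def whole_word_match(text: str, words: list[str]) -> bool:
    """True if any listed word appears as a separate word (case-insensitive)."""
    return any(re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE) for w in words)

def substring_match(text: str, words: list[str]) -> bool:
    """True if any listed word appears anywhere, even inside a longer word."""
    lowered = text.lower()
    return any(w.lower() in lowered for w in words)

# Print each sample with (whole-word result, substring result).
for sample in ["BLUE", "BLUEMAN", "B L U E"]:
    print(sample, whole_word_match(sample, BAD_WORDS), substring_match(sample, BAD_WORDS))
```

Substring matching has the opposite problem too: it flags innocent words that merely contain a listed word, which is part of why pure keyword lists are hard to get right.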
If I were going to do this, I think I would definitely turn to OpenAI.
But don’t just look for words, because blindly filtering keywords will never work and is easy to get around. Words on their own are not inherently good or bad. It’s the way that they are used and how they are combined that can cause offence. So ask the AI to do a bit of sentiment analysis and pick out phrases that could be considered offensive or harmful.
I think what I would do is flag for moderation, rather than just rejecting outright. But that's up to you, of course.
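Combining the suggestions above (a TRUE/FALSE prompt that judges phrases in context rather than single keywords, and flagging instead of rejecting), a minimal Python sketch might look like this. The prompt wording and the `build_prompt`, `parse_verdict` and `moderate` helpers are all hypothetical. `ask_model` is an assumed callable standing in for whatever request your OpenAI integration actually makes, which lets the logic be tried without an API key.

```python
from typing import Callable, Optional

# Hypothetical prompt wording; tune it for your own app and languages.
PROMPT_TEMPLATE = (
    "You are a content moderator. The text below may be in any language. "
    "Judge the whole message in context, not individual keywords: "
    "does it contain phrases that could reasonably be considered offensive or harmful? "
    "Answer with exactly one word, TRUE or FALSE.\n\nText:\n{text}"
)

def build_prompt(text: str) -> str:
    """Insert the user's text into the moderation prompt."""
    return PROMPT_TEMPLATE.format(text=text)

def parse_verdict(reply: str) -> Optional[bool]:
    """Map the model's reply to True/False; None if it is neither."""
    answer = reply.strip().strip(".").upper()
    if answer == "TRUE":
        return True
    if answer == "FALSE":
        return False
    return None

def moderate(text: str, ask_model: Callable[[str], str]) -> str:
    """Return 'publish' or 'flag'. Nothing is rejected outright:
    a TRUE verdict, or a reply that is not clearly TRUE/FALSE,
    sends the text to human review instead."""
    verdict = parse_verdict(ask_model(build_prompt(text)))
    return "publish" if verdict is False else "flag"

# Usage with a stand-in model that always answers TRUE:
print(moderate("some user comment", lambda prompt: "TRUE"))  # prints "flag"
```

Treating an unclear reply as "flag" is a deliberate choice in this sketch: the model occasionally answers with more than one word, and sending those cases to a human is safer than silently publishing them.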
After trying different approaches, I am convinced that the best one is letting OpenAI do the job.
It's very easy to set up, and with the right prompt it can handle everything I needed.
My app is in 3 languages, and with just this one setting I'm able to filter bad words in all 3 of them.
The previous approach was almost impossible. It was very time-consuming, and it could only find isolated words from a giant list, which was not enough to catch all the possible combinations in 3 languages.