Toxicity Removal in Large Language Models

Built LSTM and Transformer models from scratch, plus a BERT-based transformer model, to predict sentence toxicity
Incorporated feature engineering with a part-of-speech tagger to capture the sentiment and style of toxic comments
Successfully ranked an LLM's responses to prompts by toxicity and used style transfer to decrease their toxicity
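As an illustration of the part-of-speech feature engineering mentioned above, here is a minimal sketch. It assumes a tagger (for example `nltk.pos_tag`) has already produced Penn Treebank tags. The tag groups and the function name `pos_tag_features` are hypothetical choices made for this example, not the project's actual feature set.

```python
from collections import Counter

# Hypothetical coarse groups built from Penn Treebank tag prefixes
# (e.g. "VB" also covers VBD, VBG, VBP, ...). Illustrative only.
TAG_GROUPS = {
    "noun": ("NN",),
    "verb": ("VB",),
    "adjective": ("JJ",),
    "adverb": ("RB",),
    "pronoun": ("PRP", "WP"),
    "interjection": ("UH",),
}


def pos_tag_features(tagged_tokens):
    """Return the fraction of tokens that fall into each coarse POS group.

    `tagged_tokens` is a list of (word, tag) pairs, such as the output of
    a Penn Treebank POS tagger.
    """
    total = len(tagged_tokens)
    counts = Counter()
    for _word, tag in tagged_tokens:
        for group, prefixes in TAG_GROUPS.items():
            # str.startswith accepts a tuple of prefixes.
            if tag.startswith(prefixes):
                counts[group] += 1
                break
    # Normalize by length so short and long comments are comparable;
    # an empty comment yields all-zero features.
    return {group: (counts[group] / total if total else 0.0) for group in TAG_GROUPS}


tagged = [("you", "PRP"), ("are", "VBP"), ("so", "RB"), ("stupid", "JJ")]
print(pos_tag_features(tagged))
```

A vector like this could be concatenated with a model's text representation as extra style features. Whether the project combined them that way is not stated here.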

Download Slides; GitHub Repository
