Toxicity Removal in Large Language Models

  • Built LSTM and Transformer models from scratch, as well as a BERT-based transformer model, to predict sentence toxicity
  • Incorporated feature engineering with a part-of-speech tagger to capture the sentiment and style of toxic comments
  • Successfully ranked an LLM's responses to prompts by toxicity and used style transfer to reduce the toxicity of those responses
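The part-of-speech feature engineering can be illustrated with a minimal sketch. It assumes each comment has already been tokenized and tagged with Penn Treebank tags (the tag set NLTK's `pos_tag` returns by default). The tag groupings in `TAG_GROUPS`, the extra `all_caps` "shouting" feature, and the function name `pos_features` are hypothetical choices for illustration, not the project's documented feature set.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical groupings of Penn Treebank tags; the project's real
# feature set is not documented, so these are illustrative assumptions.
TAG_GROUPS = {
    "adjective": {"JJ", "JJR", "JJS"},
    "pronoun": {"PRP", "PRP$"},
    "interjection": {"UH"},
    "verb": {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
}


def pos_features(tagged_tokens: List[Tuple[str, str]]) -> Dict[str, float]:
    """Turn (token, tag) pairs into length-normalized style features."""
    total = len(tagged_tokens)
    counts = Counter()
    for token, tag in tagged_tokens:
        for group, tags in TAG_GROUPS.items():
            if tag in tags:
                counts[group] += 1
        # Hypothetical "shouting" feature: all-caps words longer than one letter
        if token.isalpha() and token.isupper() and len(token) > 1:
            counts["all_caps"] += 1
    names = list(TAG_GROUPS) + ["all_caps"]
    # Divide by comment length so short and long comments are comparable
    return {name: (counts[name] / total if total else 0.0) for name in names}


# Example input, tagged by hand here instead of calling a real tagger
example = [("you", "PRP"), ("ARE", "VBP"), ("so", "RB"), ("stupid", "JJ"), ("ugh", "UH")]
print(pos_features(example))
```

One plausible design, not confirmed by the source, is to concatenate a vector like this with the neural model's sentence representation before the final classification layer.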
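Ranking LLM responses by toxicity reduces to scoring each response with a classifier and sorting by score. The sketch below assumes the trained model is exposed as a function returning a toxicity score in [0, 1]; `toy_score` and its `TOXIC_WORDS` lexicon are hypothetical stand-ins for that model, and the `threshold` used to pick responses for style transfer is likewise an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in lexicon; a real pipeline would call the trained
# LSTM/Transformer/BERT classifier instead of counting keywords.
TOXIC_WORDS = {"stupid", "idiot", "hate"}


def toy_score(text: str) -> float:
    """Placeholder scorer: fraction of words found in TOXIC_WORDS."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    if not words:
        return 0.0
    return sum(w in TOXIC_WORDS for w in words) / len(words)


def rank_by_toxicity(
    responses: List[str], score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Return (response, score) pairs sorted from most to least toxic."""
    scored = [(r, score_fn(r)) for r in responses]
    # sorted() is stable even with reverse=True, so ties keep input order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def needs_detox(ranked: List[Tuple[str, float]], threshold: float = 0.3) -> List[str]:
    """Select responses scoring at or above a hypothetical threshold for rewriting."""
    return [r for r, s in ranked if s >= threshold]


responses = [
    "Thanks for asking, happy to help.",
    "You are an idiot.",
    "I hate this stupid question!",
]
ranked = rank_by_toxicity(responses, toy_score)
print(ranked)
print(needs_detox(ranked))
```

Keeping the scorer as a parameter lets any of the trained classifiers be swapped in without changing the ranking logic; the selected responses would then be passed to the style-transfer model.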
