Toxicity Removal in Large Language Models
- Built LSTM and Transformer models from scratch, plus a BERT-based model, to predict sentence toxicity
- Engineered features with a part-of-speech tagger to capture the sentiment and style of toxic comments
- Ranked an LLM's responses to prompts by toxicity and used style transfer to decrease toxicity