SemEval 2020 - Dong and Choi

XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language Identification in Social Media Using Transformer Encoders

Xiangjue Dong, Jinho D. Choi


This paper presents six document classification models using the latest transformer encoders and a high-performing ensemble model for a task of offensive language identification in social media. For the individual models, deep transformer layers are applied to perform multi-head attentions. For the ensemble model, the utterance representations taken from those individual models are concatenated and fed into a linear decoder to make the final decisions. Our ensemble model outperforms the individual models and shows up to 8.6% improvement over the individual models on the development set. On the test set, it achieves macro-F1 of 90.9% and becomes one of the high performing systems among 85 participants in the sub-task A of this shared task. Our analysis shows that although the ensemble model significantly improves the accuracy on the development set, the improvement is not as evident on the test set.

Venue / Year

Proceedings of the International Workshop on Semantic Evaluation 2020 Task 12: OffensEval 2: Multilingual Offensive Language Identification in Social Media (SemEval) / 2020


Anthology | Paper | BibTeX