Text Classification and Mining

This project aims to advance natural language processing (NLP) by developing models for text classification and mining tasks. It focuses on building robust algorithms that categorize textual data and extract insights from it. Key research areas include sentiment analysis, hate speech detection, sarcasm identification, and competence-level classification. Our goal is to improve the accuracy and efficiency of automated text understanding across multiple domains, with potential applications in content moderation, market research, and social media analysis.

Director

Jinho Choi - Associate Professor at Emory University

Funding

ECLAIR: Competence-level Analysis (01/2019 ~ 05/2021)
Alfresco Software Inc. (01/2016 ~ 12/2016)
Infosys Inc. (09/2015 ~ 05/2016)

Publications

Competence-Level Prediction & Job Description Matching Using Context-Aware Transformer Models. Li, C.; Fisher, E.; Thomas, R.; Pittard, S.; Hertzberg, V.; and Choi, J. D. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language Identification in Social Media Using Transformer Encoders. Dong, X.; and Choi, J. D. Proceedings of the International Workshop on Semantic Evaluation 2020 Task 12: OffensEval 2: Multilingual Offensive Language Identification in Social Media (SemEval), 2020.
Transformer-based Context-aware Sarcasm Detection in Conversation Threads from Social Media. Dong, X.; Li, C.; and Choi, J. D. Proceedings of the ACL Workshop on Figurative Language Processing: Shared Task on Sarcasm Detection (FigLang:ST), 2020.
Event Analysis on the 2016 U.S. Presidential Election Using Social Media. Shaban, T.; Hexter, L.; and Choi, J. D. Proceedings of the International Conference on Social Informatics (SocInfo), Oxford, UK, 2017.
Lexicon Integrated CNN Models with Attention for Sentiment Analysis. Shin, B.; Lee, T.; and Choi, J. D. Proceedings of the EMNLP Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), 2017.
Improving Document Clustering by Eliminating Unnatural Language. Jang, M.; Choi, J. D.; and Allan, J. Proceedings of the EMNLP Workshop on Noisy User-generated Text (WNUT), 2017.
Computational Exploration of the Linguistic Structures of Future-Oriented Expression: Classification and Categorization. Nie, A.; Shepard, J.; Choi, J. D.; Copley, B.; and Wolff, P. Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop (NAACL:SRW), 2015.