PhD 2021F: Kaustubh Dhole

NL-Augmenter : Task-specific Natural Language Transformations

Kaustubh Dhole

Date: 2021-10-22 / 3:00–4:00 PM


Natural language transformation, or augmentation, comprises methods for increasing the variety of training data for natural language tasks without manually collecting additional examples. Most strategies either modify existing data (transformations) or create synthetic data, for example through counterfactual data augmentation, with the aim of having the extended data act as a regularizer that reduces overfitting and bias when training ML models. However, the space of natural language is discrete, and simple perturbations cannot capture the full range and complexity of natural language phenomena; covering that space well enough to evaluate datasets and models properly calls for a broad, collaborative effort. Toward this goal, NL-Augmenter seeks to gather transformations, perturbations, and filters that generate additional data for training or for testing model robustness. Following the success of open collaborative efforts like BIG-bench and many others, we invited submissions via our participant-driven repository, NL-Augmenter. The repository received 162 submissions from around 116 participants.
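To make the idea of a transformation concrete, the following is a minimal sketch of a character-level perturbation that mimics typing errors by swapping letters with keyboard neighbours. The class name, the abbreviated neighbour map, and the `generate` interface are illustrative assumptions for this sketch, not the actual NL-Augmenter API.

```python
import random


class ButterFingersPerturbation:
    """Illustrative sketch (hypothetical interface): perturb a sentence by
    replacing some letters with adjacent keys on a QWERTY keyboard."""

    # Abbreviated, assumed neighbour map; a real map would cover the full keyboard.
    NEIGHBOURS = {
        "a": "qws", "e": "wrd", "i": "uok", "o": "ipl",
        "n": "bhm", "t": "rgy", "s": "awd", "r": "etf",
    }

    def __init__(self, prob=0.1, seed=0):
        self.prob = prob                # per-character perturbation probability
        self.rng = random.Random(seed)  # seeded for reproducible augmentation

    def generate(self, sentence):
        """Return a list of perturbed variants of the input sentence."""
        chars = []
        for ch in sentence:
            key = ch.lower()
            if key in self.NEIGHBOURS and self.rng.random() < self.prob:
                chars.append(self.rng.choice(self.NEIGHBOURS[key]))
            else:
                chars.append(ch)
        return ["".join(chars)]


perturb = ButterFingersPerturbation(prob=0.3, seed=42)
print(perturb.generate("the quick brown fox"))
```

Because each replacement substitutes exactly one character, the perturbed sentence keeps the original length, and the fixed seed makes the augmented output reproducible across training runs.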