Fact Checking in Low-Resource Languages: A New Dataset and Transformer Model for the Burmese Language
Wednesday, 13 November 2024 | 09:00 to 11:00 EST | Room 857, Eighth Floor, Kaneff Tower, Keele Campus, York University and virtually via Zoom
With Lwin Moe, York University
Respondent: Ye Min Tun, Adjunct Professor, Johns Hopkins School of Advanced International Studies
Misinformation on Burmese social media has become a serious problem, fueling hate speech and violence, especially during the 2017 Rohingya genocide. Despite efforts by platforms such as Facebook to curb harmful content using AI and Burmese-speaking moderators, the lack of resources and fact-checking datasets in Burmese remains a major obstacle. This paper addresses this challenge by creating a large fact-checking dataset and NLP models for Burmese. Using a machine-translated version of the Fake News Challenge (FNC-1) dataset, named FNC-1B, three BERT-based classifiers were trained and evaluated. The best-performing model achieved 81% accuracy on the translated FNC-1B dataset and 79% on a manually annotated Burmese dataset. This performance is comparable to that of the BERT model for fact checking in English, which achieved 82% accuracy on the English version of the dataset. This research represents a crucial first step toward creating fact-checking datasets and tools tailored to Burmese, which are essential for combating misinformation online. The results also show that language-specific pre-training of BERT models is critical for handling misinformation in low-resource languages such as Burmese. Moreover, the study demonstrates that models trained on a translated dataset can perform fact-checking tasks effectively in real-world Burmese contexts, offering a viable means of mitigating the spread of false information online.
This event is part of Burma Past and Present: Religion, Ethnicity and Power, a series of readings and discussions of works in progress. We will be reading and discussing a work in progress with the author. Please email hmlwin@yorku.ca to receive a copy of the reading.
This event is hybrid. Virtual attendees should register in advance through the event's online registration link.