Title: Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering
Authors: Arisoy, Ebru; Menevse, Merve Unlu; Manav, Yusufcan; Ozgur, Arzucan
Date: 2025-12-05
Type: Conference Object
ISSN: 2308-457X
DOI: https://doi.org/10.21437/Interspeech.2025-1965
Scopus ID: 2-s2.0-105020060826
Language: en
Access: info:eu-repo/semantics/closedAccess
Keywords: Spoken Question Answering; Large Language Models; Data Generation

Abstract: Large Language Models (LLMs) are powerful tools for generating synthetic data, offering a promising solution to data scarcity in low-resource scenarios. This study evaluates the effectiveness of LLMs in generating question-answer pairs to enhance the performance of question answering (QA) models trained with limited annotated data. While synthetic data generation has been widely explored for text-based QA, its impact on spoken QA remains underexplored. We specifically investigate the role of LLM-generated data in improving spoken QA models, showing performance gains across both text-based and spoken QA tasks. Experimental results on subsets of SQuAD, Spoken SQuAD, and a Turkish spoken QA dataset demonstrate significant relative F1 score improvements of 7.8%, 7.0%, and 2.7%, respectively, over models trained solely on restricted human-annotated data. Furthermore, our findings highlight the robustness of LLM-generated data in spoken QA settings, even in the presence of noise.