Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering

Arisoy, Ebru; Menevse, Merve Unlu; Manav, Yusufcan; Ozgur, Arzucan

Date

2025

Publisher

International Speech Communication Association

Abstract

Large Language Models (LLMs) are powerful tools for generating synthetic data, offering a promising solution to data scarcity in low-resource scenarios. This study evaluates the effectiveness of LLMs in generating question-answer pairs to enhance the performance of question answering (QA) models trained with limited annotated data. While synthetic data generation has been widely explored for text-based QA, its impact on spoken QA remains underexplored. We specifically investigate the role of LLM-generated data in improving spoken QA models, showing performance gains across both text-based and spoken QA tasks. Experimental results on subsets of SQuAD, Spoken SQuAD, and a Turkish spoken QA dataset demonstrate significant relative F1 score improvements of 7.8%, 7.0%, and 2.7%, respectively, over models trained solely on limited human-annotated data. Furthermore, our findings highlight the robustness of LLM-generated data in spoken QA settings, even in the presence of noise.

Keywords

Spoken Question Answering, Large Language Models, Data Generation

Source

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH -- 26th Interspeech Conference 2025 -- 2025-08-17 through 2025-08-21 -- Rotterdam, Netherlands -- 213554

Start Page

1773

End Page

1777

URI

https://doi.org/10.21437/Interspeech.2025-1965

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
