Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering

Loading...
Publication Logo

Date

2025

Journal Title

Journal ISSN

Volume Title

Publisher

International Speech Communication Association
ISCA-Int Speech Communication Assoc

Open Access Color

OpenAIRE Downloads

OpenAIRE Views

Research Projects

Journal Issue

Abstract

Large Language Models (LLMs) are powerful tools for generating synthetic data, offering a promising solution to data scarcity in low-resource scenarios. This study evaluates the effectiveness of LLMs in generating question-answer pairs to enhance the performance of question answering (QA) models trained with limited annotated data. While synthetic data generation has been widely explored for text-based QA, its impact on spoken QA remains underexplored. We specifically investigate the role of LLM-generated data in improving spoken QA models, showing performance gains across both text-based and spoken QA tasks. Experimental results on subsets of the SQuAD, Spoken SQuAD, and a Turkish spoken QA dataset demonstrate significant relative F1 score improvements of 7.8%, 7.0%, and 2.7%, respectively, over models trained solely on restricted human-annotated data. Furthermore, our findings highlight the robustness of LLM-generated data in spoken QA settings, even in the presence of noise.

Description

Keywords

Spoken Question Answering, Large Language Models, Data Generation

Fields of Science

Citation

WoS Q

N/A

Scopus Q

N/A
OpenCitations Logo
OpenCitations Citation Count
N/A

Source

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH -- 26th Interspeech Conference 2025 -- 2025-08-17 through 2025-08-21 -- Rotterdam -- 213554
2025 Interspeech Conference -- Aug 17-21, 2025 -- Rotterdam, Netherlands

Volume

Issue

Start Page

1773

End Page

1777
PlumX Metrics
Citations

Scopus : 0

Captures

Mendeley Readers : 1

Google Scholar Logo
Google Scholar™
OpenAlex Logo
OpenAlex FWCI
2.8039

Sustainable Development Goals