Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering

dc.contributor.author Arisoy, Ebru
dc.contributor.author Menevse, Merve Unlu
dc.contributor.author Manav, Yusufcan
dc.contributor.author Ozgur, Arzucan
dc.date.accessioned 2025-12-05T17:08:11Z
dc.date.available 2025-12-05T17:08:11Z
dc.date.issued 2025
dc.description.abstract Large Language Models (LLMs) are powerful tools for generating synthetic data, offering a promising solution to data scarcity in low-resource scenarios. This study evaluates the effectiveness of LLMs in generating question-answer pairs to enhance the performance of question answering (QA) models trained with limited annotated data. While synthetic data generation has been widely explored for text-based QA, its impact on spoken QA remains underexplored. We specifically investigate the role of LLM-generated data in improving spoken QA models, showing performance gains across both text-based and spoken QA tasks. Experimental results on subsets of SQuAD, Spoken SQuAD, and a Turkish spoken QA dataset demonstrate significant relative F1 score improvements of 7.8%, 7.0%, and 2.7%, respectively, over models trained solely on restricted human-annotated data. Furthermore, our findings highlight the robustness of LLM-generated data in spoken QA settings, even in the presence of noise. en_US
dc.identifier.doi 10.21437/Interspeech.2025-1965
dc.identifier.issn 2308-457X
dc.identifier.scopus 2-s2.0-105020060826
dc.identifier.uri https://doi.org/10.21437/Interspeech.2025-1965
dc.language.iso en
dc.publisher International Speech Communication Association
dc.relation.ispartof Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH -- 26th Interspeech Conference 2025 -- 2025-08-17 through 2025-08-21 -- Rotterdam -- 213554
dc.relation.ispartof 2025 Interspeech Conference -- Aug 17-21, 2025 -- Rotterdam, Netherlands en_US
dc.relation.ispartofseries Interspeech
dc.rights info:eu-repo/semantics/closedAccess
dc.subject Spoken Question Answering en_US
dc.subject Large Language Models en_US
dc.subject Data Generation en_US
dc.title Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering
dc.type Conference Object
dspace.entity.type Publication
gdc.author.institutional Arısoy, Ebru
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.department Faculty of Engineering, Department of Electrical and Electronics Engineering
gdc.description.department Mef University en_US
gdc.description.departmenttemp [Arisoy, Ebru] MEF Univ, Elect & Elect Engn, Maslak, Turkiye; [Menevse, Merve Unlu; Ozgur, Arzucan] Bogazici Univ, Comp Engn, Istanbul, Turkiye; [Manav, Yusufcan] Allianz Partners, Munich, Germany en_US
gdc.description.endpage 1777 en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality N/A
gdc.description.startpage 1773 en_US
gdc.description.woscitationindex Conference Proceedings Citation Index - Science - Conference Proceedings Citation Index - Social Science & Humanities
gdc.description.wosquality N/A
gdc.identifier.openalex W4415432975
gdc.identifier.wos WOS:001585350500359
gdc.index.type WoS
gdc.index.type Scopus
gdc.openalex.fwci 2.8039
gdc.openalex.normalizedpercentile 0.93
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 0
gdc.plumx.mendeley 1
gdc.plumx.scopuscites 0
gdc.publishedmonth August
gdc.scopus.citedcount 0
gdc.virtual.author Arısoy Saraçlar, Ebru
gdc.wos.citedcount 0
gdc.yokperiod YÖK - 2024-25
relation.isAuthorOfPublication 0b895153-5793-4e46-bc2f-06a28b30f531
relation.isAuthorOfPublication.latestForDiscovery 0b895153-5793-4e46-bc2f-06a28b30f531
relation.isOrgUnitOfPublication a6e60d5c-b0c7-474a-b49b-284dc710c078
relation.isOrgUnitOfPublication 0d54cd31-4133-46d5-b5cc-280b2c077ac3
relation.isOrgUnitOfPublication de19334f-6a5b-4f7b-9410-9433c48d1e5a
relation.isOrgUnitOfPublication.latestForDiscovery a6e60d5c-b0c7-474a-b49b-284dc710c078
