Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering

dc.contributor.author Arisoy, E.
dc.contributor.author Menevşe, M.U.
dc.contributor.author Manav, Y.
dc.contributor.author Özgür, A.
dc.date.accessioned 2025-12-05T17:08:11Z
dc.date.available 2025-12-05T17:08:11Z
dc.date.issued 2025
dc.description.abstract Large Language Models (LLMs) are powerful tools for generating synthetic data, offering a promising solution to data scarcity in low-resource scenarios. This study evaluates the effectiveness of LLMs in generating question-answer pairs to enhance the performance of question answering (QA) models trained with limited annotated data. While synthetic data generation has been widely explored for text-based QA, its impact on spoken QA remains underexplored. We specifically investigate the role of LLM-generated data in improving spoken QA models, showing performance gains across both text-based and spoken QA tasks. Experimental results on subsets of the SQuAD, Spoken SQuAD, and a Turkish spoken QA dataset demonstrate significant relative F1 score improvements of 7.8%, 7.0%, and 2.7%, respectively, over models trained solely on restricted human-annotated data. Furthermore, our findings highlight the robustness of LLM-generated data in spoken QA settings, even in the presence of noise. © 2025 International Speech Communication Association. All rights reserved. en_US
dc.identifier.doi 10.21437/Interspeech.2025-1965
dc.identifier.isbn 9781713836902
dc.identifier.isbn 9781713820697
dc.identifier.isbn 9781605603162
dc.identifier.isbn 9781617821233
dc.identifier.isbn 9781604234497
dc.identifier.issn 1990-9772
dc.identifier.issn 2958-1796
dc.identifier.scopus 2-s2.0-105020060826
dc.identifier.uri https://doi.org/10.21437/Interspeech.2025-1965
dc.identifier.uri https://hdl.handle.net/20.500.11779/3141
dc.language.iso en en_US
dc.publisher International Speech Communication Association en_US
dc.relation.ispartof Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH -- 26th Interspeech Conference 2025 -- 2025-08-17 through 2025-08-21 -- Rotterdam -- 213554 en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Data Generation en_US
dc.subject Large Language Models en_US
dc.subject Spoken Question Answering en_US
dc.title Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.scopusid 14030977200
gdc.author.scopusid 58137783500
gdc.author.scopusid 57219551922
gdc.author.scopusid 56230487200
gdc.description.department MEF University en_US
gdc.description.departmenttemp [Arisoy] Ebru, Department of Electrical and Electronic Engineering, MEF University, Istanbul, Turkey; [Menevşe] Merve Ünlü, Computer Engineering, Boğaziçi Üniversitesi, Bebek, Istanbul, Turkey; [Manav] Yusufcan, Allianz, Munich, Bayern, Germany; [Özgür] Arzucan, Computer Engineering, Boğaziçi Üniversitesi, Bebek, Istanbul, Turkey en_US
gdc.description.endpage 1777 en_US
gdc.description.publicationcategory Conference Object - International - Institutional Faculty Member en_US
gdc.description.scopusquality N/A
gdc.description.startpage 1773 en_US
gdc.description.wosquality N/A
relation.isOrgUnitOfPublication a6e60d5c-b0c7-474a-b49b-284dc710c078
relation.isOrgUnitOfPublication.latestForDiscovery a6e60d5c-b0c7-474a-b49b-284dc710c078
