Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering
| dc.contributor.author | Arisoy, Ebru | |
| dc.contributor.author | Menevse, Merve Unlu | |
| dc.contributor.author | Manav, Yusufcan | |
| dc.contributor.author | Ozgur, Arzucan | |
| dc.date.accessioned | 2025-12-05T17:08:11Z | |
| dc.date.available | 2025-12-05T17:08:11Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Large Language Models (LLMs) are powerful tools for generating synthetic data, offering a promising solution to data scarcity in low-resource scenarios. This study evaluates the effectiveness of LLMs in generating question-answer pairs to enhance the performance of question answering (QA) models trained with limited annotated data. While synthetic data generation has been widely explored for text-based QA, its impact on spoken QA remains underexplored. We specifically investigate the role of LLM-generated data in improving spoken QA models, showing performance gains across both text-based and spoken QA tasks. Experimental results on subsets of SQuAD, Spoken SQuAD, and a Turkish spoken QA dataset demonstrate significant relative F1 score improvements of 7.8%, 7.0%, and 2.7%, respectively, over models trained solely on restricted human-annotated data. Furthermore, our findings highlight the robustness of LLM-generated data in spoken QA settings, even in the presence of noise. | en_US |
| dc.identifier.doi | 10.21437/Interspeech.2025-1965 | |
| dc.identifier.issn | 2308-457X | |
| dc.identifier.scopus | 2-s2.0-105020060826 | |
| dc.identifier.uri | https://doi.org/10.21437/Interspeech.2025-1965 | |
| dc.language.iso | en | |
| dc.publisher | International Speech Communication Association | |
| dc.publisher | ISCA-Int Speech Communication Assoc | en_US |
| dc.relation.ispartof | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH -- 26th Interspeech Conference 2025 -- 2025-08-17 through 2025-08-21 -- Rotterdam -- 213554 | |
| dc.relation.ispartof | 2025 Interspeech Conference -- Aug 17-21, 2025 -- Rotterdam, Netherlands | en_US |
| dc.relation.ispartofseries | Interspeech | |
| dc.rights | info:eu-repo/semantics/closedAccess | |
| dc.subject | Spoken Question Answering | en_US |
| dc.subject | Large Language Models | en_US |
| dc.subject | Data Generation | en_US |
| dc.title | Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering | |
| dc.type | Conference Object | |
| dspace.entity.type | Publication | |
| gdc.author.institutional | Arısoy, Ebru | |
| gdc.coar.access | metadata only access | |
| gdc.coar.type | text::conference output | |
| gdc.collaboration.industrial | false | |
| gdc.description.department | Faculty of Engineering, Department of Electrical and Electronics Engineering | |
| gdc.description.department | MEF University | en_US |
| gdc.description.departmenttemp | [Arisoy, Ebru] MEF Univ, Elect & Elect Engn, Maslak, Turkiye; [Menevse, Merve Unlu; Ozgur, Arzucan] Bogazici Univ, Comp Engn, Istanbul, Turkiye; [Manav, Yusufcan] Allianz Partners, Munich, Germany | en_US |
| gdc.description.endpage | 1777 | en_US |
| gdc.description.publicationcategory | Conference Item - International - Institutional Faculty Member | |
| gdc.description.scopusquality | N/A | |
| gdc.description.startpage | 1773 | en_US |
| gdc.description.woscitationindex | Conference Proceedings Citation Index - Science; Conference Proceedings Citation Index - Social Science & Humanities | |
| gdc.description.wosquality | N/A | |
| gdc.identifier.openalex | W4415432975 | |
| gdc.identifier.wos | WOS:001585350500359 | |
| gdc.index.type | WoS | |
| gdc.index.type | Scopus | |
| gdc.openalex.fwci | 2.8039 | |
| gdc.openalex.normalizedpercentile | 0.93 | |
| gdc.openalex.toppercent | TOP 10% | |
| gdc.opencitations.count | 0 | |
| gdc.plumx.mendeley | 1 | |
| gdc.plumx.scopuscites | 0 | |
| gdc.publishedmonth | August | |
| gdc.scopus.citedcount | 0 | |
| gdc.virtual.author | Arısoy Saraçlar, Ebru | |
| gdc.wos.citedcount | 0 | |
| gdc.yokperiod | YÖK - 2024-25 | |
| relation.isAuthorOfPublication | 0b895153-5793-4e46-bc2f-06a28b30f531 | |
| relation.isAuthorOfPublication.latestForDiscovery | 0b895153-5793-4e46-bc2f-06a28b30f531 | |
| relation.isOrgUnitOfPublication | a6e60d5c-b0c7-474a-b49b-284dc710c078 | |
| relation.isOrgUnitOfPublication | 0d54cd31-4133-46d5-b5cc-280b2c077ac3 | |
| relation.isOrgUnitOfPublication | de19334f-6a5b-4f7b-9410-9433c48d1e5a | |
| relation.isOrgUnitOfPublication.latestForDiscovery | a6e60d5c-b0c7-474a-b49b-284dc710c078 | |
