Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering
| dc.contributor.author | Arisoy, E. | |
| dc.contributor.author | Menevşe, M.U. | |
| dc.contributor.author | Manav, Y. | |
| dc.contributor.author | Özgür, A. | |
| dc.date.accessioned | 2025-12-05T17:08:11Z | |
| dc.date.available | 2025-12-05T17:08:11Z | |
| dc.date.issued | 2025 | |
| dc.description | Meta | en_US |
| dc.description.abstract | Large Language Models (LLMs) are powerful tools for generating synthetic data, offering a promising solution to data scarcity in low-resource scenarios. This study evaluates the effectiveness of LLMs in generating question-answer pairs to enhance the performance of question answering (QA) models trained with limited annotated data. While synthetic data generation has been widely explored for text-based QA, its impact on spoken QA remains underexplored. We specifically investigate the role of LLM-generated data in improving spoken QA models, showing performance gains across both text-based and spoken QA tasks. Experimental results on subsets of SQuAD, Spoken SQuAD, and a Turkish spoken QA dataset demonstrate significant relative F1 score improvements of 7.8%, 7.0%, and 2.7%, respectively, over models trained solely on restricted human-annotated data. Furthermore, our findings highlight the robustness of LLM-generated data in spoken QA settings, even in the presence of noise. © 2025 International Speech Communication Association. All rights reserved. | en_US |
| dc.identifier.doi | 10.21437/Interspeech.2025-1965 | |
| dc.identifier.isbn | 9781713836902 | |
| dc.identifier.isbn | 9781713820697 | |
| dc.identifier.isbn | 9781605603162 | |
| dc.identifier.isbn | 9781617821233 | |
| dc.identifier.isbn | 9781604234497 | |
| dc.identifier.issn | 1990-9772 | |
| dc.identifier.issn | 2958-1796 | |
| dc.identifier.scopus | 2-s2.0-105020060826 | |
| dc.identifier.uri | https://doi.org/10.21437/Interspeech.2025-1965 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.11779/3141 | |
| dc.language.iso | en | |
| dc.publisher | International Speech Communication Association | |
| dc.relation.ispartof | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH -- 26th Interspeech Conference 2025 -- 2025-08-17 through 2025-08-21 -- Rotterdam -- 213554 | |
| dc.rights | info:eu-repo/semantics/closedAccess | |
| dc.subject | Data Generation | en_US |
| dc.subject | Large Language Models | en_US |
| dc.subject | Spoken Question Answering | en_US |
| dc.title | Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering | |
| dc.type | Conference Object | |
| dspace.entity.type | Publication | |
| gdc.author.institutional | Arısoy, Ebru | |
| gdc.author.scopusid | 14030977200 | |
| gdc.author.scopusid | 58137783500 | |
| gdc.author.scopusid | 57219551922 | |
| gdc.author.scopusid | 56230487200 | |
| gdc.coar.access | metadata only access | |
| gdc.coar.type | text::conference output | |
| gdc.description.department | Faculty of Engineering, Department of Electrical and Electronics Engineering | |
| gdc.description.endpage | 1777 | en_US |
| gdc.description.publicationcategory | Conference Item - International - Institutional Faculty Member | |
| gdc.description.scopusquality | N/A | |
| gdc.description.startpage | 1773 | en_US |
| gdc.description.wosquality | N/A | |
| gdc.identifier.openalex | W4415432975 | |
| gdc.index.type | Scopus | |
| gdc.openalex.fwci | 0.0 | |
| gdc.openalex.normalizedpercentile | 0.19 | |
| gdc.openalex.toppercent | TOP 10% | |
| gdc.opencitations.count | 0 | |
| gdc.plumx.mendeley | 1 | |
| gdc.plumx.scopuscites | 0 | |
| gdc.publishedmonth | August | |
| gdc.scopus.citedcount | 0 | |
| gdc.virtual.author | Arısoy Saraçlar, Ebru | |
| gdc.yokperiod | YÖK - 2024-25 | |
| relation.isAuthorOfPublication | 0b895153-5793-4e46-bc2f-06a28b30f531 | |
| relation.isAuthorOfPublication.latestForDiscovery | 0b895153-5793-4e46-bc2f-06a28b30f531 | |
| relation.isOrgUnitOfPublication | a6e60d5c-b0c7-474a-b49b-284dc710c078 | |
| relation.isOrgUnitOfPublication | 0d54cd31-4133-46d5-b5cc-280b2c077ac3 | |
| relation.isOrgUnitOfPublication | de19334f-6a5b-4f7b-9410-9433c48d1e5a | |
| relation.isOrgUnitOfPublication.latestForDiscovery | a6e60d5c-b0c7-474a-b49b-284dc710c078 |
