Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.11779/2303
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Arısoy, Ebru | - |
dc.contributor.author | Özgür, Arzucan | - |
dc.contributor.author | Ünlü Menevşe, Merve | - |
dc.contributor.author | Manav, Yusufcan | - |
dc.date.accessioned | 2024-06-21T17:28:17Z | - |
dc.date.available | 2024-06-21T17:28:17Z | - |
dc.date.issued | 2024 | - |
dc.identifier.isbn | 9782493814104 | - |
dc.identifier.uri | https://hdl.handle.net/20.500.11779/2303 | - |
dc.description | Aequa-Tech; Baidu; Bloomberg; Dataforce (Transperfect); et al.; Intesa San Paolo Bank | en_US |
dc.description.abstract | This paper focuses on dealing with data scarcity in spoken question answering (QA) using automatic question-answer generation and a carefully selected fine-tuning strategy that leverages limited annotated data (paragraphs and question-answer pairs). Spoken QA is a challenging task due to using spoken documents, i.e., erroneous automatic speech recognition (ASR) transcriptions, and the scarcity of spoken QA data. We propose a framework for utilizing limited annotated data effectively to improve spoken QA performance. To deal with data scarcity, we train a question-answer generation model with annotated data and then produce large amounts of question-answer pairs from unannotated data (paragraphs). Our experiments demonstrate that incorporating limited annotated data and the automatically generated data through a carefully selected fine-tuning strategy leads to 5.5% relative F1 gain over the model trained only with annotated data. Moreover, the proposed framework is also effective in high ASR errors. © 2024 ELRA Language Resource Association: CC BY-NC 4.0. | en_US |
dc.language.iso | en | en_US |
dc.publisher | European Language Resources Association (ELRA) | en_US |
dc.relation.ispartof | 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings -- Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 -- 20 May 2024 through 25 May 2024 -- Hybrid, Torino -- 199620 | en_US |
dc.subject | Spoken question answering | en_US |
dc.subject | Question generation | en_US |
dc.title | Dealing With Data Scarcity in Spoken Question Answering | en_US |
dc.type | Conference Object | en_US |
dc.identifier.scopus | 2-s2.0-85195947153 | en_US |
dc.authorscopusid | 58137783500 | - |
dc.authorscopusid | 57219551922 | - |
dc.authorscopusid | 14030977200 | - |
dc.authorscopusid | 56230487200 | - |
dc.description.PublishedMonth | May | en_US |
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
dc.identifier.endpage | 4455 | en_US |
dc.identifier.startpage | 4449 | en_US |
dc.department | Faculty of Engineering, Department of Electrical and Electronics Engineering | en_US |
dc.institutionauthor | Arısoy, Ebru | - |
dc.identifier.citationcount | 0 | - |
item.grantfulltext | restricted | - |
item.fulltext | With Fulltext | - |
item.languageiso639-1 | en | - |
item.openairetype | Conference Object | - |
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
item.cerifentitytype | Publications | - |
crisitem.author.dept | 02.05. Department of Electrical and Electronics Engineering | - |
Appears in Collections: Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Files in This Item:
File | Size | Format |
---|---|---|
Full Text - Article.pdf (Restricted Access) | 991.32 kB | Adobe PDF |
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.