Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.11779/2303
Title: | Dealing With Data Scarcity in Spoken Question Answering |
Authors: | Arısoy, Ebru; Özgür, Arzucan; Ünlü Menevşe, Merve; Manav, Yusufcan |
Keywords: | Spoken question answering; Question generation |
Publisher: | European Language Resources Association (ELRA) |
Abstract: | This paper focuses on dealing with data scarcity in spoken question answering (QA) using automatic question-answer generation and a carefully selected fine-tuning strategy that leverages limited annotated data (paragraphs and question-answer pairs). Spoken QA is a challenging task due to using spoken documents, i.e., erroneous automatic speech recognition (ASR) transcriptions, and the scarcity of spoken QA data. We propose a framework for utilizing limited annotated data effectively to improve spoken QA performance. To deal with data scarcity, we train a question-answer generation model with annotated data and then produce large amounts of question-answer pairs from unannotated data (paragraphs). Our experiments demonstrate that incorporating limited annotated data and the automatically generated data through a carefully selected fine-tuning strategy leads to 5.5% relative F1 gain over the model trained only with annotated data. Moreover, the proposed framework is also effective in high ASR errors. © 2024 ELRA Language Resource Association: CC BY-NC 4.0. |
Description: | Aequa-Tech; Baidu; Bloomberg; Dataforce (Transperfect); et al.; Intesa San Paolo Bank |
URI: | https://hdl.handle.net/20.500.11779/2303 |
ISBN: | 9782493814104 |
Appears in Collections: | Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection |
Files in This Item:
File | Size | Format |
---|---|---|
Full Text - Article.pdf (Restricted Access) | 991.32 kB | Adobe PDF |
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.