Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11779/2303
Title: Dealing With Data Scarcity in Spoken Question Answering
Authors: Arısoy, Ebru
Özgür, Arzucan
Ünlü Menevşe, Merve
Manav, Yusufcan
Keywords: Spoken question answering
Question generation
Publisher: European Language Resources Association (ELRA)
Abstract: This paper addresses data scarcity in spoken question answering (QA) using automatic question-answer generation and a carefully selected fine-tuning strategy that leverages limited annotated data (paragraphs and question-answer pairs). Spoken QA is challenging both because it operates on spoken documents, i.e., erroneous automatic speech recognition (ASR) transcriptions, and because spoken QA data is scarce. We propose a framework for utilizing limited annotated data effectively to improve spoken QA performance. To deal with data scarcity, we train a question-answer generation model on annotated data and then produce large amounts of question-answer pairs from unannotated data (paragraphs). Our experiments demonstrate that combining the limited annotated data with the automatically generated data through a carefully selected fine-tuning strategy yields a 5.5% relative F1 gain over the model trained only on annotated data. Moreover, the proposed framework remains effective under high ASR error rates. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Description: Aequa-Tech; Baidu; Bloomberg; Dataforce (Transperfect); et al.; Intesa San Paolo Bank
URI: https://hdl.handle.net/20.500.11779/2303
ISBN: 9782493814104
Appears in Collections: Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Files in This Item:
Full Text - Article.pdf (991.32 kB, Adobe PDF, Restricted Access)

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.