Dealing With Data Scarcity in Spoken Question Answering

dc.contributor.author Arısoy, Ebru
dc.contributor.author Özgür, Arzucan
dc.contributor.author Ünlü Menevşe, Merve
dc.contributor.author Manav, Yusufcan
dc.date.accessioned 2024-06-21T17:28:17Z
dc.date.available 2024-06-21T17:28:17Z
dc.date.issued 2024
dc.description Aequa-Tech; Baidu; Bloomberg; Dataforce (Transperfect); et al.; Intesa San Paolo Bank
dc.description.abstract This paper focuses on dealing with data scarcity in spoken question answering (QA) using automatic question-answer generation and a carefully selected fine-tuning strategy that leverages limited annotated data (paragraphs and question-answer pairs). Spoken QA is a challenging task because it operates on spoken documents, i.e., erroneous automatic speech recognition (ASR) transcriptions, and because spoken QA data are scarce. We propose a framework for using limited annotated data effectively to improve spoken QA performance. To deal with data scarcity, we train a question-answer generation model on the annotated data and then use it to produce large amounts of question-answer pairs from unannotated data (paragraphs). Our experiments demonstrate that combining the limited annotated data with the automatically generated data through a carefully selected fine-tuning strategy yields a 5.5% relative F1 gain over the model trained only on annotated data. Moreover, the proposed framework remains effective under high ASR error rates. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
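The abstract reports its result as a relative F1 gain, but this record does not say how F1 is computed. The sketch below assumes the SQuAD v1.1-style token-overlap F1 that is common for extractive QA (lowercasing, stripping punctuation and English articles, then scoring overlapping tokens); it is an illustration, not the authors' evaluation code. The function names `normalize_answer`, `token_f1` and `relative_gain` are hypothetical. Relative gain is taken as (F1_new - F1_base) / F1_base, so a 5.5% relative gain on a hypothetical baseline F1 of 60.0 would mean reaching 63.3.

```python
import re
import string
from collections import Counter

# Assumption: SQuAD v1.1-style answer normalization; the paper's exact metric is not stated here.
PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between a predicted and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most min(count_pred, count_ref) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def relative_gain(new_score: float, baseline_score: float) -> float:
    """Relative improvement of new_score over baseline_score (0.055 means 5.5%)."""
    return (new_score - baseline_score) / baseline_score


if __name__ == "__main__":
    # Prediction covers 2 of the 4 reference tokens: precision 1.0, recall 0.5, F1 = 2/3.
    print(token_f1("the Eiffel Tower", "Eiffel Tower in Paris"))
    # Hypothetical scores, not results from the paper.
    print(relative_gain(63.3, 60.0))
```

Corpus-level F1 would then be the mean of `token_f1` over all questions (taking the maximum over reference answers when several exist), which is again an assumption about the evaluation protocol.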
dc.identifier.citationcount 0
dc.identifier.isbn 9782493814104
dc.identifier.scopus 2-s2.0-85195947153
dc.identifier.uri https://hdl.handle.net/20.500.11779/2303
dc.language.iso en
dc.publisher European Language Resources Association (ELRA)
dc.relation.ispartof 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings -- Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 -- 20 May 2024 through 25 May 2024 -- Hybrid, Torino -- 199620
dc.rights info:eu-repo/semantics/openAccess
dc.subject Spoken question answering
dc.subject Question generation
dc.title Dealing With Data Scarcity in Spoken Question Answering
dc.type Conference Object
dspace.entity.type Publication
gdc.author.institutional Arısoy, Ebru
gdc.author.institutional Arısoy Saraçlar, Ebru
gdc.author.scopusid 58137783500
gdc.author.scopusid 57219551922
gdc.author.scopusid 14030977200
gdc.author.scopusid 56230487200
gdc.coar.access open access
gdc.coar.type text::conference output
gdc.description.department Faculty of Engineering, Department of Electrical and Electronics Engineering
gdc.description.endpage 4455
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member
gdc.description.scopusquality N/A
gdc.description.startpage 4449
gdc.description.wosquality N/A
gdc.publishedmonth May
gdc.scopus.citedcount 0
gdc.wos.publishedmonth May
gdc.wos.yokperiod YÖK - 2023-24
relation.isAuthorOfPublication 0b895153-5793-4e46-bc2f-06a28b30f531
relation.isAuthorOfPublication.latestForDiscovery 0b895153-5793-4e46-bc2f-06a28b30f531
relation.isOrgUnitOfPublication de19334f-6a5b-4f7b-9410-9433c48d1e5a
relation.isOrgUnitOfPublication 0d54cd31-4133-46d5-b5cc-280b2c077ac3
relation.isOrgUnitOfPublication a6e60d5c-b0c7-474a-b49b-284dc710c078
relation.isOrgUnitOfPublication.latestForDiscovery de19334f-6a5b-4f7b-9410-9433c48d1e5a

Files

Original bundle

Name: Full Text - Article.pdf
Size: 991.32 KB
Format: Adobe Portable Document Format