Dealing With Data Scarcity in Spoken Question Answering

Arısoy, Ebru; Özgür, Arzucan; Ünlü Menevşe, Merve; Manav, Yusufcan

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11779/2303

Full metadata record

DC Field	Value	Language
dc.contributor.author	Arısoy, Ebru	-
dc.contributor.author	Özgür, Arzucan	-
dc.contributor.author	Ünlü Menevşe, Merve	-
dc.contributor.author	Manav, Yusufcan	-
dc.date.accessioned	2024-06-21T17:28:17Z	-
dc.date.available	2024-06-21T17:28:17Z	-
dc.date.issued	2024	-
dc.identifier.isbn	9782493814104	-
dc.identifier.uri	https://hdl.handle.net/20.500.11779/2303	-
dc.description	Aequa-Tech; Baidu; Bloomberg; Dataforce (Transperfect); et al.; Intesa San Paolo Bank	en_US
dc.description.abstract	This paper focuses on dealing with data scarcity in spoken question answering (QA) using automatic question-answer generation and a carefully selected fine-tuning strategy that leverages limited annotated data (paragraphs and question-answer pairs). Spoken QA is a challenging task due to using spoken documents, i.e., erroneous automatic speech recognition (ASR) transcriptions, and the scarcity of spoken QA data. We propose a framework for utilizing limited annotated data effectively to improve spoken QA performance. To deal with data scarcity, we train a question-answer generation model with annotated data and then produce large amounts of question-answer pairs from unannotated data (paragraphs). Our experiments demonstrate that incorporating limited annotated data and the automatically generated data through a carefully selected fine-tuning strategy leads to 5.5% relative F1 gain over the model trained only with annotated data. Moreover, the proposed framework is also effective in high ASR errors. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.	en_US
dc.language.iso	en	en_US
dc.publisher	European Language Resources Association (ELRA)	en_US
dc.relation.ispartof	2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings -- Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 -- 20 May 2024 through 25 May 2024 -- Hybrid, Torino -- 199620	en_US
dc.subject	Spoken question answering	en_US
dc.subject	Question generation	en_US
dc.title	Dealing With Data Scarcity in Spoken Question Answering	en_US
dc.type	Conference Object	en_US
dc.identifier.scopus	2-s2.0-85195947153	en_US
dc.authorscopusid	58137783500	-
dc.authorscopusid	57219551922	-
dc.authorscopusid	14030977200	-
dc.authorscopusid	56230487200	-
dc.description.PublishedMonth	May	en_US
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
dc.identifier.endpage	4455	en_US
dc.identifier.startpage	4449	en_US
dc.department	Faculty of Engineering, Department of Electrical and Electronics Engineering	en_US
dc.institutionauthor	Arısoy, Ebru	-
dc.identifier.citationcount	0	-
item.grantfulltext	restricted	-
item.fulltext	With Fulltext	-
item.languageiso639-1	en	-
item.openairetype	Conference Object	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.cerifentitytype	Publications	-
crisitem.author.dept	02.05. Department of Electrical and Electronics Engineering	-
Appears in Collections:	Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Files in This Item:

File	Size	Format
Full Text - Article.pdf (Restricted Access)	991.32 kB	Adobe PDF
