Dealing With Data Scarcity in Spoken Question Answering
| dc.contributor.author | Arısoy, Ebru | |
| dc.contributor.author | Özgür, Arzucan | |
| dc.contributor.author | Ünlü Menevşe, Merve | |
| dc.contributor.author | Manav, Yusufcan | |
| dc.date.accessioned | 2024-06-21T17:28:17Z | |
| dc.date.available | 2024-06-21T17:28:17Z | |
| dc.date.issued | 2024 | |
| dc.description | Aequa-Tech; Baidu; Bloomberg; Dataforce (Transperfect); et al.; Intesa San Paolo Bank | |
| dc.description.abstract | This paper addresses data scarcity in spoken question answering (QA) using automatic question-answer generation and a carefully selected fine-tuning strategy that leverages limited annotated data (paragraphs and question-answer pairs). Spoken QA is challenging because it operates on spoken documents, i.e., erroneous automatic speech recognition (ASR) transcriptions, and because spoken QA data is scarce. We propose a framework for utilizing limited annotated data effectively to improve spoken QA performance. To deal with data scarcity, we train a question-answer generation model with annotated data and then produce large amounts of question-answer pairs from unannotated data (paragraphs). Our experiments demonstrate that incorporating limited annotated data and the automatically generated data through a carefully selected fine-tuning strategy leads to a 5.5% relative F1 gain over the model trained only with annotated data. Moreover, the proposed framework remains effective when ASR error rates are high. © 2024 ELRA Language Resource Association: CC BY-NC 4.0. | |
| dc.identifier.citationcount | 0 | |
| dc.identifier.isbn | 9782493814104 | |
| dc.identifier.scopus | 2-s2.0-85195947153 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.11779/2303 | |
| dc.language.iso | en | |
| dc.publisher | European Language Resources Association (ELRA) | |
| dc.relation.ispartof | 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings -- Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 -- 20 May 2024 through 25 May 2024 -- Hybrid, Torino -- 199620 | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.subject | Spoken question answering | |
| dc.subject | Question generation | |
| dc.title | Dealing With Data Scarcity in Spoken Question Answering | |
| dc.type | Conference Object | |
| dspace.entity.type | Publication | |
| gdc.author.institutional | Arısoy, Ebru | |
| gdc.author.institutional | Arısoy Saraçlar, Ebru | |
| gdc.author.scopusid | 58137783500 | |
| gdc.author.scopusid | 57219551922 | |
| gdc.author.scopusid | 14030977200 | |
| gdc.author.scopusid | 56230487200 | |
| gdc.coar.access | open access | |
| gdc.coar.type | text::conference output | |
| gdc.description.department | Faculty of Engineering, Department of Electrical and Electronics Engineering | |
| gdc.description.endpage | 4455 | |
| gdc.description.publicationcategory | Conference Item - International - Institutional Faculty Member | |
| gdc.description.scopusquality | N/A | |
| gdc.description.startpage | 4449 | |
| gdc.description.wosquality | N/A | |
| gdc.publishedmonth | May | |
| gdc.scopus.citedcount | 0 | |
| gdc.wos.publishedmonth | May | |
| gdc.wos.yokperiod | YÖK - 2023-24 | |
| relation.isAuthorOfPublication | 0b895153-5793-4e46-bc2f-06a28b30f531 | |
| relation.isAuthorOfPublication.latestForDiscovery | 0b895153-5793-4e46-bc2f-06a28b30f531 | |
| relation.isOrgUnitOfPublication | de19334f-6a5b-4f7b-9410-9433c48d1e5a | |
| relation.isOrgUnitOfPublication | 0d54cd31-4133-46d5-b5cc-280b2c077ac3 | |
| relation.isOrgUnitOfPublication | a6e60d5c-b0c7-474a-b49b-284dc710c078 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | de19334f-6a5b-4f7b-9410-9433c48d1e5a |
Files
Original bundle
- Name: Full Text - Article.pdf
- Size: 991.32 KB
- Format: Adobe Portable Document Format