Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.11779/1998
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Manav, Y. | - |
dc.contributor.author | Menevşe, M.Ü. | - |
dc.contributor.author | Özgür, A. | - |
dc.contributor.author | Arısoy, Ebru | - |
dc.date.accessioned | 2023-10-18T12:13:23Z | |
dc.date.available | 2023-10-18T12:13:23Z | |
dc.date.issued | 2022 | - |
dc.identifier.citation | Menevşe, M. Ü., Manav, Y., Arisoy, E., & Özgür, A. (2022, December). A Framework for Automatic Generation of Spoken Question-Answering Data. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 4659-4666). | en_US |
dc.identifier.uri | https://hdl.handle.net/20.500.11779/1998 | - |
dc.description | The authors would like to thank Şeniz Demir for providing the Turkish Wikipedia dataset, Emrah Budur for providing the English to Turkish machine translated SQuAD dataset and the anonymous reviewers for their valuable feedback. | en_US |
dc.description.abstract | This paper describes a framework to automatically generate a spoken question answering (QA) dataset. The framework consists of a question generation (QG) module to generate questions automatically from given text documents, a text-to-speech (TTS) module to convert the text documents into spoken form and an automatic speech recognition (ASR) module to transcribe the spoken content. The final dataset contains question-answer pairs for both the reference text and ASR transcriptions as well as the audio files corresponding to each reference text. For QG and ASR systems we used pre-trained multilingual encoder-decoder transformer models and fine-tuned these models using a limited amount of manually generated QA data and TTS-based speech data, respectively. As a proof of concept, we investigated the proposed framework for Turkish and generated the Turkish Question Answering (TurQuAse) dataset using Wikipedia articles. Manual evaluation of the automatically generated question-answer pairs and QA performance evaluation with state-of-the-art models on TurQuAse show that the proposed framework is efficient for automatically generating spoken QA datasets. To the best of our knowledge, TurQuAse is the first publicly available spoken question answering dataset for Turkish. The proposed framework can be easily extended to other languages where a limited amount of QA data is available. © 2022 Association for Computational Linguistics. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Association for Computational Linguistics (ACL) | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Speech module | en_US |
dc.subject | Turkish | en_US |
dc.subject | Speech-recognition modules | en_US |
dc.subject | Question-answer pairs | en_US |
dc.subject | Question answering | en_US |
dc.subject | Speech recognition | en_US |
dc.subject | Computational linguistics | en_US |
dc.subject | Audio files | en_US |
dc.subject | Text to speech | en_US |
dc.subject | Automatic speech recognition | en_US |
dc.subject | Text document | en_US |
dc.subject | Character recognition | en_US |
dc.subject | Automatic generation | en_US |
dc.title | A Framework for Automatic Generation of Spoken Question-Answering Data | en_US |
dc.type | Conference Object | en_US |
dc.identifier.scopus | 2-s2.0-85149897199 | en_US |
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
dc.identifier.endpage | 4695 | en_US |
dc.identifier.startpage | 4688 | en_US |
dc.department | Faculty of Engineering, Department of Electrical and Electronics Engineering | en_US |
dc.relation.journal | 2022 Findings of the Association for Computational Linguistics: EMNLP 2022 -- 7 December 2022 through 11 December 2022 -- 186900 | en_US |
dc.relation.journal | Findings of the Association for Computational Linguistics: EMNLP 2022 | en_US |
dc.institutionauthor | Arısoy, Ebru | - |
item.cerifentitytype | Publications | - |
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
item.openairetype | Conference Object | - |
item.languageiso639-1 | en | - |
item.grantfulltext | embargo_20400101 | - |
item.fulltext | With Fulltext | - |
crisitem.author.dept | 02.05. Department of Electrical and Electronics Engineering | - |
Appears in Collections: | Department of Electrical and Electronics Engineering Collection; Scopus Indexed Publications Collection |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
2022.findings-emnlp.342.pdf (embargoed until 2040-01-01) | Full Text - Article | 178.7 kB | Adobe PDF | |
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.