A Framework for Automatic Generation of Spoken Question-Answering Data

Manav, Y.; Menevşe, M.Ü.; Özgür, A.; Arısoy, Ebru

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11779/1998

Full metadata record

DC Field	Value	Language
dc.contributor.author	Manav, Y.	-
dc.contributor.author	Menevşe, M.Ü.	-
dc.contributor.author	Özgür, A.	-
dc.contributor.author	Arısoy, Ebru	-
dc.date.accessioned	2023-10-18T12:13:23Z
dc.date.available	2023-10-18T12:13:23Z
dc.date.issued	2022	-
dc.identifier.citation	Menevşe, M. Ü., Manav, Y., Arısoy, E., & Özgür, A. (2022, December). A Framework for Automatic Generation of Spoken Question-Answering Data. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 4659-4666).	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.11779/1998	-
dc.description	The authors would like to thank Şeniz Demir for providing the Turkish Wikipedia dataset, Emrah Budur for providing the English to Turkish machine translated SQuAD dataset and the anonymous reviewers for their valuable feedback.	en_US
dc.description.abstract	This paper describes a framework to automatically generate a spoken question answering (QA) dataset. The framework consists of a question generation (QG) module to generate questions automatically from given text documents, a text-to-speech (TTS) module to convert the text documents into spoken form and an automatic speech recognition (ASR) module to transcribe the spoken content. The final dataset contains question-answer pairs for both the reference text and ASR transcriptions as well as the audio files corresponding to each reference text. For QG and ASR systems we used pre-trained multilingual encoder-decoder transformer models and fine-tuned these models using a limited amount of manually generated QA data and TTS-based speech data, respectively. As a proof of concept, we investigated the proposed framework for Turkish and generated the Turkish Question Answering (TurQuAse) dataset using Wikipedia articles. Manual evaluation of the automatically generated question-answer pairs and QA performance evaluation with state-of-the-art models on TurQuAse show that the proposed framework is efficient for automatically generating spoken QA datasets. To the best of our knowledge, TurQuAse is the first publicly available spoken question answering dataset for Turkish. The proposed framework can be easily extended to other languages where a limited amount of QA data is available. © 2022 Association for Computational Linguistics.	en_US
dc.language.iso	en	en_US
dc.publisher	Association for Computational Linguistics (ACL)	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Speech module	en_US
dc.subject	Turkish	en_US
dc.subject	Speech-recognition modules	en_US
dc.subject	Question-answer pairs	en_US
dc.subject	Question answering	en_US
dc.subject	Speech recognition	en_US
dc.subject	Computational linguistics	en_US
dc.subject	Audio files	en_US
dc.subject	Text to speech	en_US
dc.subject	Automatic speech recognition	en_US
dc.subject	Text document	en_US
dc.subject	Character recognition	en_US
dc.subject	Automatic generation	en_US
dc.title	A Framework for Automatic Generation of Spoken Question-Answering Data	en_US
dc.type	Conference Object	en_US
dc.identifier.scopus	2-s2.0-85149897199	en_US
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
dc.identifier.endpage	4695	en_US
dc.identifier.startpage	4688	en_US
dc.department	Mühendislik Fakültesi, Elektrik Elektronik Mühendisliği Bölümü	en_US
dc.relation.journal	2022 Findings of the Association for Computational Linguistics: EMNLP 2022 -- 7 December 2022 through 11 December 2022 -- 186900	en_US
dc.relation.journal	Findings of the Association for Computational Linguistics: EMNLP 2022	en_US
dc.institutionauthor	Arısoy, Ebru	-
item.cerifentitytype	Publications	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.openairetype	Conference Object	-
item.languageiso639-1	en	-
item.grantfulltext	embargo_20400101	-
item.fulltext	With Fulltext	-
crisitem.author.dept	02.05. Department of Electrical and Electronics Engineering	-
Appears in Collections:	Elektrik Elektronik Mühendisliği Bölümü Koleksiyonu; Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Files in This Item:

File	Description	Size	Format
2022.findings-emnlp.342.pdf (embargoed until 2040-01-01)	Full Text - Article	178.7 kB	Adobe PDF
