Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11779/1998
Full metadata record
DC Field | Value | Language
dc.contributor.author | Manav, Y. | -
dc.contributor.author | Menevşe, M.Ü. | -
dc.contributor.author | Özgür, A. | -
dc.contributor.author | Arısoy, Ebru | -
dc.date.accessioned | 2023-10-18T12:13:23Z |
dc.date.available | 2023-10-18T12:13:23Z |
dc.date.issued | 2022 | -
dc.identifier.citation | Menevşe, M. Ü., Manav, Y., Arisoy, E., & Özgür, A. (2022, December). A Framework for Automatic Generation of Spoken Question-Answering Data. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 4659-4666). | en_US
dc.identifier.uri | https://hdl.handle.net/20.500.11779/1998 | -
dc.description | The authors would like to thank Şeniz Demir for providing the Turkish Wikipedia dataset, Emrah Budur for providing the English to Turkish machine translated SQuAD dataset and the anonymous reviewers for their valuable feedback. | en_US
dc.description.abstract | This paper describes a framework to automatically generate a spoken question answering (QA) dataset. The framework consists of a question generation (QG) module to generate questions automatically from given text documents, a text-to-speech (TTS) module to convert the text documents into spoken form and an automatic speech recognition (ASR) module to transcribe the spoken content. The final dataset contains question-answer pairs for both the reference text and ASR transcriptions as well as the audio files corresponding to each reference text. For QG and ASR systems we used pre-trained multilingual encoder-decoder transformer models and fine-tuned these models using a limited amount of manually generated QA data and TTS-based speech data, respectively. As a proof of concept, we investigated the proposed framework for Turkish and generated the Turkish Question Answering (TurQuAse) dataset using Wikipedia articles. Manual evaluation of the automatically generated question-answer pairs and QA performance evaluation with state-of-the-art models on TurQuAse show that the proposed framework is efficient for automatically generating spoken QA datasets. To the best of our knowledge, TurQuAse is the first publicly available spoken question answering dataset for Turkish. The proposed framework can be easily extended to other languages where a limited amount of QA data is available. © 2022 Association for Computational Linguistics. | en_US
dc.language.iso | en | en_US
dc.publisher | Association for Computational Linguistics (ACL) | en_US
dc.rights | info:eu-repo/semantics/closedAccess | en_US
dc.subject | Speech module | en_US
dc.subject | Turkishs | en_US
dc.subject | Speech-recognition modules | en_US
dc.subject | Question-answer pairs | en_US
dc.subject | Question answering | en_US
dc.subject | Speech recognition | en_US
dc.subject | Computational linguistics | en_US
dc.subject | Audio files | en_US
dc.subject | Text to speech | en_US
dc.subject | Automatic speech recognition | en_US
dc.subject | Text document | en_US
dc.subject | Character recognition | en_US
dc.subject | Automatic generation | en_US
dc.title | A Framework for Automatic Generation of Spoken Question-Answering Data | en_US
dc.type | Conference Object | en_US
dc.identifier.scopus | 2-s2.0-85149897199 | en_US
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US
dc.identifier.endpage | 4695 | en_US
dc.identifier.startpage | 4688 | en_US
dc.department | Faculty of Engineering, Department of Electrical and Electronics Engineering | en_US
dc.relation.journal | 2022 Findings of the Association for Computational Linguistics: EMNLP 2022 -- 7 December 2022 through 11 December 2022 -- 186900 | en_US
dc.relation.journal | Findings of the Association for Computational Linguistics: EMNLP 2022 | en_US
dc.institutionauthor | Arısoy, Ebru | -
item.cerifentitytype | Publications | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.openairetype | Conference Object | -
item.languageiso639-1 | en | -
item.grantfulltext | embargo_20400101 | -
item.fulltext | With Fulltext | -
crisitem.author.dept | 02.05. Department of Electrical and Electronics Engineering | -
Appears in Collections: Elektrik Elektronik Mühendisliği Bölümü Koleksiyonu
Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Files in This Item:
File | Description | Size | Format
2022.findings-emnlp.342.pdf (embargoed until 2040-01-01) | Full Text - Article | 178.7 kB | Adobe PDF



Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.