Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks

Demir, Şeniz

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11779/1985

Full metadata record

DC Field	Value	Language
dc.contributor.author	Demir, Şeniz	-
dc.date.accessioned	2023-10-18T12:06:13Z	-
dc.date.available	2023-10-18T12:06:13Z	-
dc.date.issued	2023	-
dc.identifier.citation	Demir, S. (2023). Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(2), 1-27.	en_US
dc.identifier.issn	2375-4699	-
dc.identifier.issn	2375-4702	-
dc.identifier.uri	https://hdl.handle.net/20.500.11779/1985	-
dc.identifier.uri	https://doi.org/10.1145/3543826	-
dc.description	TUBITAK-ARDEB [117E977]	en_US
dc.description	This work is supported by TUBITAK-ARDEB under the grant number 117E977.	en_US
dc.description.abstract	End-to-end data-driven approaches lead to rapid development of language generation and dialogue systems. Despite the need for large amounts of well-organized data, these approaches jointly learn multiple components of the traditional generation pipeline without requiring costly human intervention. End-to-end approaches also enable the use of loosely aligned parallel datasets in system development by relaxing the degree of semantic correspondences between training data representations and text spans. However, their potential in Turkish language generation has not yet been fully exploited. In this work, we apply sequence-to-sequence (Seq2Seq) neural models to Turkish data-to-text generation where the input data given in the form of a meaning representation is verbalized. We explore encoder-decoder architectures with attention mechanism in unidirectional, bidirectional, and stacked recurrent neural network (RNN) models. Our models generate one-sentence biographies and dining venue descriptions using a crowdsourced dataset where all field value pairs that appear in meaning representations are fully captured in reference sentences. To support this work, we also explore the performances of our models on a more challenging dataset, where the content of a meaning representation is too large to fit into a single sentence, and hence content selection and surface realization need to be learned jointly. This dataset is retrieved by coupling introductory sentences of person-related Turkish Wikipedia articles with their contained infobox tables. Our empirical experiments on both datasets demonstrate that Seq2Seq models are capable of generating coherent and fluent biographies and venue descriptions from field value pairs. We argue that the wealth of knowledge residing in our datasets and the insights obtained from this study hold the potential to give rise to the development of new end-to-end generation approaches for Turkish and other morphologically rich languages.	en_US
dc.language.iso	en	en_US
dc.publisher	Association for Computing Machinery	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	State-of-the-art	en_US
dc.subject	Sequence-to-sequence model	en_US
dc.subject	Turkish	en_US
dc.subject	Wikipedia	en_US
dc.subject	Natural-language generation	en_US
dc.subject	Data-to-text generation	en_US
dc.title	Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1145/3543826	-
dc.identifier.scopus	2-s2.0-85152906599	en_US
dc.description.PublishedMonth	December	en_US
dc.description.woscitationindex	Science Citation Index Expanded	-
dc.identifier.wosquality	Q4	-
dc.description.WoSDocumentType	article	-
dc.description.WoSInternationalCollaboration	Not conducted with international collaboration - NO	en_US
dc.description.WoSPublishedMonth	April	en_US
dc.description.WoSIndexDate	2023	en_US
dc.description.WoSYOKperiod	YÖK - 2022-23	en_US
dc.identifier.scopusquality	Q2	-
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
dc.identifier.issue	2	en_US
dc.identifier.volume	22	en_US
dc.department	Faculty of Engineering, Department of Electrical and Electronics Engineering	en_US
dc.relation.journal	ACM Transactions on Asian and Low-Resource Language Information Processing	en_US
dc.identifier.wos	WOS:000963394900006	en_US
dc.institutionauthor	Demir, Şeniz	-
item.grantfulltext	none	-
item.fulltext	No Fulltext	-
item.languageiso639-1	en	-
item.openairetype	Article	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.cerifentitytype	Publications	-
crisitem.author.dept	02.02. Department of Computer Engineering	-
Appears in Collections:	Electrical and Electronics Engineering Department Collection; Scopus Indexed Publications Collection; WoS Indexed Publications Collection

Scopus citations: 2 (checked on Nov 16, 2024)
Web of Science citations: 1 (checked on Nov 16, 2024)
Page views: 66 (checked on Nov 18, 2024)