Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks

dc.contributor.author Demir, Şeniz
dc.date.accessioned 2023-10-18T12:06:13Z
dc.date.available 2023-10-18T12:06:13Z
dc.date.issued 2023
dc.description TUBITAK-ARDEB [117E977]
dc.description This work is supported by TUBITAK-ARDEB under the grant number 117E977.
dc.description.abstract End-to-end data-driven approaches lead to rapid development of language generation and dialogue systems. Despite the need for large amounts of well-organized data, these approaches jointly learn multiple components of the traditional generation pipeline without requiring costly human intervention. End-to-end approaches also enable the use of loosely aligned parallel datasets in system development by relaxing the degree of semantic correspondences between training data representations and text spans. However, their potential in Turkish language generation has not yet been fully exploited. In this work, we apply sequence-to-sequence (Seq2Seq) neural models to Turkish data-to-text generation where the input data given in the form of a meaning representation is verbalized. We explore encoder-decoder architectures with an attention mechanism in unidirectional, bidirectional, and stacked recurrent neural network (RNN) models. Our models generate one-sentence biographies and dining venue descriptions using a crowdsourced dataset where all field-value pairs that appear in meaning representations are fully captured in reference sentences. To support this work, we also explore the performances of our models on a more challenging dataset, where the content of a meaning representation is too large to fit into a single sentence, and hence content selection and surface realization need to be learned jointly. This dataset is retrieved by coupling introductory sentences of person-related Turkish Wikipedia articles with their contained infobox tables. Our empirical experiments on both datasets demonstrate that Seq2Seq models are capable of generating coherent and fluent biographies and venue descriptions from field-value pairs.
We argue that the wealth of knowledge residing in our datasets and the insights obtained from this study hold the potential to give rise to the development of new end-to-end generation approaches for Turkish and other morphologically rich languages.
dc.identifier.citation Demir, S. (2023). Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(2), 1-27.
dc.identifier.doi 10.1145/3543826
dc.identifier.issn 2375-4699
dc.identifier.issn 2375-4702
dc.identifier.scopus 2-s2.0-85152906599
dc.identifier.uri https://hdl.handle.net/20.500.11779/1985
dc.identifier.uri https://doi.org/10.1145/3543826
dc.language.iso en
dc.publisher Assoc Computing Machinery
dc.relation.ispartof ACM Transactions on Asian and Low-Resource Language Information Processing
dc.rights info:eu-repo/semantics/closedAccess
dc.subject State-of-the-art
dc.subject Sequence-to-sequence model
dc.subject Turkish
dc.subject Wikipedia
dc.subject Natural-language generation
dc.subject Data-to-text generation
dc.title Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks
dc.type Article
dspace.entity.type Publication
gdc.author.institutional Demir, Şeniz
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.description.department Faculty of Engineering, Department of Electrical and Electronics Engineering
gdc.description.endpage 27
gdc.description.issue 2
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member
gdc.description.scopusquality Q2
gdc.description.startpage 1
gdc.description.volume 22
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q3
gdc.identifier.openalex W4284959267
gdc.identifier.wos WOS:000963394900006
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 2.0
gdc.oaire.influence 2.6678442E-9
gdc.oaire.isgreen true
gdc.oaire.popularity 3.1228258E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.openalex.collaboration National
gdc.openalex.fwci 0.78319528
gdc.openalex.normalizedpercentile 0.7
gdc.opencitations.count 2
gdc.plumx.crossrefcites 2
gdc.plumx.mendeley 15
gdc.plumx.scopuscites 4
gdc.publishedmonth December
gdc.relation.journal ACM Transactions on Asian and Low-Resource Language Information Processing
gdc.scopus.citedcount 4
gdc.virtual.author Demir, Şeniz
gdc.wos.citedcount 1
gdc.wos.collaboration Not carried out with international collaboration - NO
gdc.wos.documenttype article
gdc.wos.indexdate 2023
gdc.wos.publishedmonth December
gdc.yokperiod YÖK - 2023-24
relation.isAuthorOfPublication 93fa0200-13f7-446a-bdc2-118401cab062
relation.isAuthorOfPublication.latestForDiscovery 93fa0200-13f7-446a-bdc2-118401cab062
relation.isOrgUnitOfPublication 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3
relation.isOrgUnitOfPublication 0d54cd31-4133-46d5-b5cc-280b2c077ac3
relation.isOrgUnitOfPublication a6e60d5c-b0c7-474a-b49b-284dc710c078
relation.isOrgUnitOfPublication.latestForDiscovery 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3
