A Benchmark Dataset for Turkish Data-to-Text Generation
Date
2022
Authors
Demir, Şeniz
Publisher
Elsevier
Open Access Color
Green Open Access
Yes
Publicly Funded
No
Abstract
In recent decades, data-to-text (D2T) systems that learn directly from data have gained considerable attention in natural language generation. These systems require large volumes of high-quality data, but some natural languages unfortunately lack readily available generation datasets. This article describes our efforts to create a new Turkish dataset (Tr-D2T) that consists of meaning representation and reference sentence pairs without fine-grained word alignments. We use Turkish web resources and existing datasets in other languages to produce meaning representations, and we collect reference sentences from native speakers through crowdsourcing. We focus in particular on generating single-sentence biographies and dining venue descriptions. To motivate future Turkish D2T studies, we present detailed benchmarking results for different sequence-to-sequence neural models trained on this dataset. To the best of our knowledge, this is the first work to provide preliminary findings and lessons learned from the creation of a new Turkish D2T dataset. Moreover, it is the first extensive study to report the generation performance of transformer and recurrent neural network models from meaning representations in this morphologically rich language.
Keywords
Turkish, Neural models, Dining venue domain, Biography domain, Data-to-text generation, Crowdsourcing
Fields of Science
0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology
Citation
Demir, S., & Oktem, S. (16 July 2022). A benchmark dataset for Turkish data-to-text generation. Computer Speech & Language, 77, pp. 1-45. https://doi.org/10.1016/j.csl.2022.101433
WoS Q
Q2
Scopus Q
Q1

OpenCitations Citation Count
1
Source
Computer Speech & Language
Volume
77
Issue
Start Page
1
End Page
45
PlumX Metrics
Citations
Scopus: 2
Captures
Mendeley Readers: 12
SCOPUS™ Citations
2
checked on Feb 03, 2026
Web of Science™ Citations
3
checked on Feb 03, 2026
Page Views
247
checked on Feb 03, 2026
Downloads
30
checked on Feb 03, 2026
OpenAlex FWCI
0.58739646
Sustainable Development Goals
4
QUALITY EDUCATION

5
GENDER EQUALITY

8
DECENT WORK AND ECONOMIC GROWTH

10
REDUCED INEQUALITIES

16
PEACE, JUSTICE AND STRONG INSTITUTIONS