A Benchmark Dataset for Turkish Data-To Generation

Loading...
Thumbnail Image

Date

2022

Authors

Demir, Şeniz

Journal Title

Journal ISSN

Volume Title

Publisher

Elsevier

Open Access Color

Green Open Access

Yes

OpenAIRE Downloads

OpenAIRE Views

Publicly Funded

No
Impulse
Average
Influence
Average
Popularity
Average

Research Projects

Journal Issue

Abstract

In the last decades, data-to-text (D2T) systems that directly learn from data have gained a lot of attention in natural language generation. These systems need data with high quality and large volume, but unfortunately some natural languages suffer from the lack of readily available generation datasets. This article describes our efforts to create a new Turkish dataset (Tr-D2T) that consists of meaning representation and reference sentence pairs without fine-grained word alignments. We utilize Turkish web resources and existing datasets in other languages for producing meaning representations and collect reference sentences by crowdsourcing native speakers. We particularly focus on the generation of single-sentence biographies and dining venue descriptions. In order to motivate future Turkish D2T studies, we present detailed benchmarking results of different sequence-to-sequence neural models trained on this dataset. To the best of our knowledge, this work is the first of its kind that provides preliminary findings and lessons learned from the creation of a new Turkish D2T dataset. Moreover, our work is the first extensive study that presents generation performances of transformer and recurrent neural network models from meaning representations in this morphologically-rich language.

Description

Keywords

Turkish, Neural models, Dining venue domain, Biography domain, Data-to-text generation, Crowdsourcing

Turkish CoHE Thesis Center URL

Fields of Science

0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

Citation

Demir, S., & Oktem, S. (16 July 2022). A benchmark dataset for Turkish data-to-text generation. Computer Speech & Language. pp.1-45. https://doi.org/10.1016/j.csl.2022.101433

WoS Q

Q2

Scopus Q

Q1
OpenCitations Logo
OpenCitations Citation Count
1

Source

Computer Speech & Language

Volume

77

Issue

Start Page

1

End Page

45
PlumX Metrics
Citations

Scopus : 2

Captures

Mendeley Readers : 12

SCOPUS™ Citations

2

checked on Feb 03, 2026

Web of Science™ Citations

3

checked on Feb 03, 2026

Page Views

247

checked on Feb 03, 2026

Downloads

30

checked on Feb 03, 2026

Google Scholar Logo
Google Scholar™
OpenAlex Logo
OpenAlex FWCI
0.58739646

Sustainable Development Goals

4

QUALITY EDUCATION
QUALITY EDUCATION Logo

5

GENDER EQUALITY
GENDER EQUALITY Logo

8

DECENT WORK AND ECONOMIC GROWTH
DECENT WORK AND ECONOMIC GROWTH Logo

10

REDUCED INEQUALITIES
REDUCED INEQUALITIES Logo

16

PEACE, JUSTICE AND STRONG INSTITUTIONS
PEACE, JUSTICE AND STRONG INSTITUTIONS Logo