A Benchmark Dataset for Turkish Data-to-Text Generation
| dc.contributor.author | Demir, Şeniz | |
| dc.contributor.author | Öktem, Seza | |
| dc.date.accessioned | 2022-07-22T11:32:03Z | |
| dc.date.available | 2022-07-22T11:32:03Z | |
| dc.date.issued | 2022 | |
| dc.description.abstract | In the last decades, data-to-text (D2T) systems that directly learn from data have gained a lot of attention in natural language generation. These systems need data with high quality and large volume, but unfortunately some natural languages suffer from the lack of readily available generation datasets. This article describes our efforts to create a new Turkish dataset (Tr-D2T) that consists of meaning representation and reference sentence pairs without fine-grained word alignments. We utilize Turkish web resources and existing datasets in other languages for producing meaning representations and collect reference sentences by crowdsourcing native speakers. We particularly focus on the generation of single-sentence biographies and dining venue descriptions. In order to motivate future Turkish D2T studies, we present detailed benchmarking results of different sequence-to-sequence neural models trained on this dataset. To the best of our knowledge, this work is the first of its kind that provides preliminary findings and lessons learned from the creation of a new Turkish D2T dataset. Moreover, our work is the first extensive study that presents generation performances of transformer and recurrent neural network models from meaning representations in this morphologically-rich language. | |
| dc.description.sponsorship | TUBITAK-ARDEB, Turkey [117E977] | |
| dc.description.sponsorship | Artun Burak Mecik; Batuhan Bilgin; TUBITAK-ARDEB, (117E977) | |
| dc.description.sponsorship | This work is supported by TUBITAK-ARDEB, Turkey under the grant number 117E977. The dataset is available for research purposes and non-commercial use. To obtain the dataset, you are required to send an email to the corresponding author, and agree to general terms and conditions for data usage according to TUBITAK Open Science Policy. The authors want to thank Uluc Furkan Vardar and Ilkay Tevfik Devran for implementing the XML parser and building input meaning representations, and Artun Burak Mecik, Batuhan Bilgin, and Volkan Ozer for delexicalizing the collected dataset. | |
| dc.identifier.citation | Demir, S., & Oktem, S. (16 July 2022). A benchmark dataset for Turkish data-to-text generation. Computer Speech & Language. pp.1-45. https://doi.org/10.1016/j.csl.2022.101433 | |
| dc.identifier.doi | 10.1016/j.csl.2022.101433 | |
| dc.identifier.issn | 0885-2308 | |
| dc.identifier.issn | 1095-8363 | |
| dc.identifier.scopus | 2-s2.0-85134849907 | |
| dc.identifier.uri | https://doi.org/10.1016/j.csl.2022.101433 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.11779/1807 | |
| dc.language.iso | en | |
| dc.publisher | Elsevier | |
| dc.relation.ispartof | Computer Speech & Language | |
| dc.rights | info:eu-repo/semantics/closedAccess | |
| dc.subject | Turkish | |
| dc.subject | Neural models | |
| dc.subject | Dining venue domain | |
| dc.subject | Biography domain | |
| dc.subject | Data-to-text generation | |
| dc.subject | Crowdsourcing | |
| dc.title | A Benchmark Dataset for Turkish Data-to-Text Generation | |
| dc.type | Article | |
| dspace.entity.type | Publication | |
| gdc.author.id | Şeniz Demir / 0000-0003-4897-4616 | |
| gdc.author.id | Seza Öktem / 0000-0003-2885-7359 | |
| gdc.author.id | Demir, Şeniz/0000-0003-4897-4616 | |
| gdc.author.institutional | Demir, Şeniz | |
| gdc.author.institutional | Öktem, Seza | |
| gdc.author.scopusid | 57818047100 | |
| gdc.author.scopusid | 14044928200 | |
| gdc.author.wosid | Demir, Şeniz/AAB-5451-2021 | |
| gdc.bip.impulseclass | C5 | |
| gdc.bip.influenceclass | C5 | |
| gdc.bip.popularityclass | C5 | |
| gdc.coar.access | metadata only access | |
| gdc.coar.type | text::journal::journal article | |
| gdc.collaboration.industrial | false | |
| gdc.description.department | Faculty of Engineering, Department of Computer Engineering | |
| gdc.description.departmenttemp | [Demir, Seniz] MEF Univ, Dept Comp Engn, Istanbul, Turkiye; [Oktem, Seza] MEF Univ, Dept English Language Teaching, Istanbul, Turkiye | |
| gdc.description.endpage | 45 | |
| gdc.description.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | |
| gdc.description.scopusquality | Q1 | |
| gdc.description.startpage | 1 | |
| gdc.description.volume | 77 | |
| gdc.description.woscitationindex | Science Citation Index Expanded | |
| gdc.description.wosquality | Q2 | |
| gdc.identifier.openalex | W4285606469 | |
| gdc.identifier.wos | WOS:000834597200001 | |
| gdc.index.type | WoS | |
| gdc.index.type | Scopus | |
| gdc.oaire.diamondjournal | false | |
| gdc.oaire.impulse | 2.0 | |
| gdc.oaire.influence | 2.6821256E-9 | |
| gdc.oaire.isgreen | true | |
| gdc.oaire.popularity | 3.469167E-9 | |
| gdc.oaire.publicfunded | false | |
| gdc.oaire.sciencefields | 0202 electrical engineering, electronic engineering, information engineering | |
| gdc.oaire.sciencefields | 02 engineering and technology | |
| gdc.openalex.collaboration | National | |
| gdc.openalex.fwci | 0.4138 | |
| gdc.openalex.normalizedpercentile | 0.68 | |
| gdc.opencitations.count | 2 | |
| gdc.plumx.mendeley | 12 | |
| gdc.plumx.newscount | 1 | |
| gdc.plumx.scopuscites | 2 | |
| gdc.publishedmonth | July | |
| gdc.relation.journal | Computer Speech & Language | |
| gdc.scopus.citedcount | 3 | |
| gdc.virtual.author | Demir, Şeniz | |
| gdc.wos.citedcount | 3 | |
| gdc.wos.collaboration | Not carried out with international collaboration - NO | |
| gdc.wos.documenttype | Article | |
| gdc.wos.indexdate | 2022 | |
| gdc.wos.publishedmonth | July | |
| gdc.yokperiod | YÖK - 2021-22 | |
| relation.isAuthorOfPublication | 93fa0200-13f7-446a-bdc2-118401cab062 | |
| relation.isAuthorOfPublication.latestForDiscovery | 93fa0200-13f7-446a-bdc2-118401cab062 | |
| relation.isOrgUnitOfPublication | 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3 | |
| relation.isOrgUnitOfPublication | 0d54cd31-4133-46d5-b5cc-280b2c077ac3 | |
| relation.isOrgUnitOfPublication | a6e60d5c-b0c7-474a-b49b-284dc710c078 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3 |
Files
Original bundle
- Name:
- 1-s2.0-S0885230822000614-main.pdf
- Size:
- 1.46 MB
- Format:
- Adobe Portable Document Format
- Description:
- Full Text - Article
License bundle
- Name:
- license.txt
- Size:
- 1.44 KB
- Format:
- Item-specific license agreed upon to submission
