Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11779/1794
Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Topçu, Berkay | - |
| dc.contributor.author | Demir, Şeniz | - |
| dc.date.accessioned | 2022-07-01T06:31:34Z | - |
| dc.date.available | 2022-07-01T06:31:34Z | - |
| dc.date.issued | 2022 | - |
| dc.identifier.citation | Demir, S., & Topcu, B. (June 2022). Graph-based Turkish text normalization and its impact on noisy text processing. Engineering Science and Technology, an International Journal. pp. 1-13. https://doi.org/10.1016/j.jestch.2022.101192 | en_US |
| dc.identifier.issn | 2215-0986 | - |
| dc.identifier.uri | https://doi.org/10.1016/j.jestch.2022.101192 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.11779/1794 | - |
| dc.description.abstract | User generated texts on the web are freely available and lucrative sources of data for language technology researchers. Unfortunately, these texts are often dominated by informal writing styles, and the language used in user generated content poses processing difficulties for natural language tools. The resulting performance drops and processing issues can be addressed either by adapting language tools to user generated content or by normalizing noisy texts before they are processed. In this article, we propose a Turkish text normalizer that maps non-standard words to their appropriate standard forms using a graph-based methodology and a context-tailoring approach. Our normalizer benefits from both contextual and lexical similarities between normalization pairs as identified by a graph-based subnormalizer and a transformation-based subnormalizer. The performance of our normalizer is demonstrated on a tweet dataset in the most comprehensive intrinsic and extrinsic evaluations reported so far for Turkish. We present the first graph-based solution to Turkish text normalization with a novel context-tailoring approach, which advances the state-of-the-art results by outperforming other publicly available normalizers. For the first time in the literature, we measure the extent to which the accuracy of a Turkish language processing tool is affected by normalizing noisy texts before they are processed. An analysis of these extrinsic evaluations, which focus on more than one Turkish NLP task (i.e., part-of-speech tagging and dependency parsing), reveals that Turkish language tools are not robust to noisy texts and that a normalizer leads to remarkable performance improvements when used as a preprocessing tool in this morphologically rich language. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Elsevier | en_US |
| dc.rights | info:eu-repo/semantics/openAccess | en_US |
| dc.subject | Noisy text | en_US |
| dc.subject | Graph-based representation | en_US |
| dc.subject | Turkish | en_US |
| dc.subject | Text normalization | en_US |
| dc.title | Graph-Based Turkish Text Normalization and Its Impact on Noisy Text Processing | en_US |
| dc.type | Article | en_US |
| dc.identifier.doi | 10.1016/j.jestch.2022.101192 | - |
| dc.identifier.scopus | 2-s2.0-85135925837 | en_US |
| dc.authorid | Şeniz Demir / 0000-0003-4897-4616 | - |
| dc.description.PublishedMonth | June | en_US |
| dc.description.woscitationindex | Science Citation Index Expanded | - |
| dc.identifier.wosquality | Q1 | - |
| dc.identifier.scopusquality | Q1 | - |
| dc.relation.publicationcategory | Article - International Refereed Journal - Institutional Academic Staff | en_US |
| dc.identifier.endpage | 13 | en_US |
| dc.identifier.startpage | 1 | en_US |
| dc.department | Faculty of Engineering, Department of Computer Engineering | en_US |
| dc.relation.journal | Engineering Science and Technology, an International Journal | en_US |
| dc.identifier.wos | WOS:000892526300014 | en_US |
| dc.institutionauthor | Demir, Şeniz | - |
| item.grantfulltext | open | - |
| item.fulltext | With Fulltext | - |
| item.languageiso639-1 | en | - |
| item.openairetype | Article | - |
| item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
| item.cerifentitytype | Publications | - |
| crisitem.author.dept | 02.02. Department of Computer Engineering | - |
Appears in Collections: Department of Computer Engineering Collection
Scopus Indexed Publications Collection
WoS Indexed Publications Collection
Files in This Item:

| File | Description | Size | Format |
|---|---|---|---|
| 1-s2.0-S221509862200101X-main.pdf | Full Text - Article | 1.06 MB | Adobe PDF |

Scopus citations: 5 (checked on Nov 16, 2024)
Web of Science citations: 2 (checked on Nov 16, 2024)
Page views: 54 (checked on Nov 18, 2024)
Downloads: 24 (checked on Nov 18, 2024)

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.