Computer Engineering Department Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.11779/1940

Search Results

Now showing 1 - 4 of 4
  • Article
    Ön Eğitimli Dil Modelleriyle Duygu Analizi [Sentiment Analysis with Pre-Trained Language Models]
    (İstanbul Sabahattin Zaim Üniversitesi Fen Bilimleri Enstitüsü, 2023) Demir, Şeniz; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    Sentiment analysis is one of the methods used to examine, analyze, and interpret opinions, feelings, or attitudes about a topic across various platforms. Machine learning and deep learning models are frequently used in sentiment analysis, in which texts on different topics can be classified according to their subjective content. In this study, sentiment analysis is performed on Covid-19 tweet texts using pre-trained language models. In addition to a Naive Bayes classifier, different classifiers are trained using the BERT, RoBERTa, and BERTweet language models, and the results obtained on the tweet dataset are compared. The work reported in this paper is expected to lay a foundation for future research in this area.
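The Naive Bayes baseline mentioned in the abstract above can be illustrated with a minimal sketch. This is not the author's implementation: the whitespace tokenizer, the add-one smoothing, the function names `train_nb` and `predict_nb`, and the four toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior under multinomial Naive Bayes."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from the paper's dataset.
TOY_TWEETS = [
    ("vaccine rollout is great news", "positive"),
    ("so happy the clinic reopened", "positive"),
    ("lockdown again this is terrible", "negative"),
    ("sad and tired of the pandemic", "negative"),
]
model = train_nb(TOY_TWEETS)
print(predict_nb(model, "great news today"))
```

A baseline of this kind only counts words, which is one reason to compare it against classifiers built on pre-trained language models such as BERT, RoBERTa, and BERTweet.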
  • Article
    Neural Coreference Resolution for Turkish
    (2023) Demir, Şeniz; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    Coreference resolution deals with resolving mentions of the same underlying entity in a given text. This challenging task is an indispensable aspect of text understanding and has important applications in various language processing systems such as question answering and machine translation. Although a significant number of studies are devoted to coreference resolution, the research on Turkish is scarce and mostly limited to pronoun resolution. To the best of our knowledge, this article presents the first neural Turkish coreference resolution study, in which two learning-based models are explored. Both models follow the mention-ranking approach while forming clusters of mentions. The first model uses a set of hand-crafted features, whereas the second coreference model relies on embeddings learned from large-scale pre-trained language models for capturing similarities between a mention and its candidate antecedents. Several language models trained specifically for Turkish are used to obtain mention representations, and their effectiveness is compared in experiments using automatic metrics. We argue that the results of this study shed light on the possible contributions of neural architectures to Turkish coreference resolution.
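The mention-ranking idea described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the models from the article: cosine similarity over hand-made 2-dimensional vectors stands in for a learned scoring function, and the fixed `threshold` plays the role of a "no antecedent" score. The function names and the toy mentions are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_mentions(mentions, vectors, threshold=0.8):
    """Link each mention to its best-scoring earlier mention, or start a new cluster."""
    cluster_of = {}  # mention index -> cluster index
    clusters = []
    for i, mention in enumerate(mentions):
        best_j, best_score = None, threshold
        for j in range(i):  # candidate antecedents are the earlier mentions
            score = cosine(vectors[i], vectors[j])
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            # No antecedent beats the threshold: the mention opens a new cluster.
            cluster_of[i] = len(clusters)
            clusters.append([mention])
        else:
            cluster_of[i] = cluster_of[best_j]
            clusters[cluster_of[i]].append(mention)
    return clusters


# Hypothetical mentions and toy vectors; real systems would use learned embeddings.
mentions = ["Ayşe", "the book", "she", "it"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(cluster_mentions(mentions, vectors))
```

Each mention is compared only with mentions that precede it, so clusters are built incrementally in a single left-to-right pass.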
  • Article
    Citation - WoS: 7
    Citation - Scopus: 12
    Graph-Based Turkish Text Normalization and Its Impact on Noisy Text Processing
    (Elsevier, 2022) Topçu, Berkay; Demir, Şeniz; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    User-generated texts on the web are freely available and lucrative sources of data for language technology researchers. Unfortunately, these texts are often dominated by informal writing styles, and the language used in user-generated content poses processing difficulties for natural language tools. Experienced performance drops and processing issues can be addressed either by adapting language tools to user-generated content or by normalizing noisy texts before they are processed. In this article, we propose a Turkish text normalizer that maps non-standard words to their appropriate standard forms using a graph-based methodology and a context-tailoring approach. Our normalizer benefits from both contextual and lexical similarities between normalization pairs as identified by a graph-based subnormalizer and a transformation-based subnormalizer. The performance of our normalizer is demonstrated on a tweet dataset in the most comprehensive intrinsic and extrinsic evaluations reported so far for Turkish. In this article, we present the first graph-based solution to Turkish text normalization with a novel context-tailoring approach, which advances the state-of-the-art results by outperforming other publicly available normalizers. For the first time in the literature, we measure the extent to which the accuracy of a Turkish language processing tool is affected by normalizing noisy texts before they are processed. An analysis of these extrinsic evaluations, which focus on more than one Turkish NLP task (i.e., part-of-speech tagging and dependency parsing), reveals that Turkish language tools are not robust to noisy texts and that a normalizer leads to remarkable performance improvements once used as a preprocessing tool in this morphologically rich language.
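To make the notion of mapping non-standard words to standard forms concrete, here is a deliberately simple sketch based on lexical similarity alone. It is not the graph-based or transformation-based method from the article: it swaps in Python's standard `difflib.get_close_matches` as a stand-in, and the lexicon, the `cutoff` value, and the function names are assumptions chosen for illustration.

```python
import difflib


def normalize_token(token, lexicon, cutoff=0.8):
    """Return the closest lexicon entry for a noisy token, or the token itself."""
    if token in lexicon:
        return token  # already a standard form
    matches = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def normalize_text(text, lexicon, cutoff=0.8):
    """Normalize every whitespace-separated token independently."""
    return " ".join(normalize_token(tok, lexicon, cutoff) for tok in text.split())


# Hypothetical mini-lexicon of standard Turkish word forms.
LEXICON = ["bugün", "geliyorum", "gidiyorum"]
print(normalize_text("bugün geliyorm", LEXICON))
```

Because each token is handled in isolation, this sketch ignores context entirely, which is exactly the kind of information the contextual similarity described in the abstract is meant to supply.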
  • Article
    Citation - WoS: 19
    Citation - Scopus: 28
    An Evaluation of Recent Neural Sequence Tagging Models in Turkish Named Entity Recognition
    (Elsevier, 2021) Makaroğlu, Didem; Demir, Şeniz; Aras, Gizem; Çakır, Altan; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    Named entity recognition (NER) is an extensively studied task that extracts and classifies named entities in a text. NER is crucial not only in downstream language processing applications such as relation extraction and question answering but also in large-scale big data operations such as real-time analysis of online digital media content. Recent research efforts on Turkish, a less-studied language with a morphologically rich nature, have demonstrated the effectiveness of neural architectures on well-formed texts and yielded state-of-the-art results by formulating the task as a sequence tagging problem. In this work, we empirically investigate the use of recent neural architectures (bidirectional long short-term memory (BiLSTM) and transformer-based networks) proposed for Turkish NER tagging in the same setting. Our results demonstrate that transformer-based networks, which can model long-range context, overcome the limitations of BiLSTM networks where different input features at the character, subword, and word levels are utilized. We also propose a transformer-based network with a conditional random field (CRF) layer that leads to the state-of-the-art result (95.95% F-measure) on a common dataset. Our study contributes to the literature that quantifies the impact of transfer learning on processing morphologically rich languages.
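The abstract above reports an F-measure for NER framed as sequence tagging but does not say how the score is computed. One common convention, assumed here, is entity-level exact matching over BIO tags. The sketch below follows that assumption; the function names and the toy tag sequences are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag sequence.

    An I- tag that does not continue a span of the same type is ignored.
    """
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not inside:
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def f_measure(gold_tags, pred_tags):
    """Entity-level F1: a predicted span counts only if boundaries and type both match."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction finds the person but misses the location.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
print(round(f_measure(gold, pred), 4))
```

In this toy case precision is 1.0 and recall is 0.5, so the harmonic mean is 2/3.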