Department of Computer Engineering Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.11779/1940

Search Results

Now showing 1 - 8 of 8
  • Conference Object
    Citation - WoS: 3
    Citation - Scopus: 3
    Detecting Autism From Head Movements Using Kinesics
    (Assoc Computing Machinery, 2024) Gökmen, Muhittin; Yankowitz, Lisa; Zampella, Casey J.; Schultz, Robert T.; Tunc, Birkan; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    Head movements play a crucial role in social interactions. The quantification of communicative movements such as nodding, shaking, orienting, and backchanneling is significant in behavioral and mental health research. However, automated localization of such head movements within videos remains challenging in computer vision due to their arbitrary start and end times, durations, and frequencies. In this work, we introduce a novel and efficient coding system for head movements, grounded in Birdwhistell's kinesics theory, to automatically identify basic head motion units such as nodding and shaking. Our approach first defines the smallest unit of head movement, termed a kine, based on the anatomical constraints of the neck and head. We then quantify the location, magnitude, and duration of kines within each angular component of head movement. By combining the identified kines, we define a higher-level construct, the kineme, which corresponds to basic head motion units such as nodding and shaking. We validate the proposed framework by predicting autism spectrum disorder (ASD) diagnosis from video recordings of interacting partners. We show that the multi-scale property of the proposed framework provides a significant advantage, as collapsing behavior across temporal scales reduces performance consistently. Finally, we incorporate another fundamental behavioral modality, namely speech, and show that distinguishing between speaking- and listening-time head movements significantly improves ASD classification performance.
  • Conference Object
    Citation - WoS: 1
    Citation - Scopus: 1
    Face Recognition With Local Zernike Moments Features Around Landmarks
    (IEEE, 2016) Gökmen, Muhittin; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    In this paper, a new method is proposed that extracts features from the complex Local Zernike Moments (LZM) images around facial landmarks. In this method, multiple grids of different sizes are placed on the landmarks, and Phase-Magnitude (PM) histograms are calculated in each cell of these grids. The PM histograms are calculated for every component of LZM, and the feature vectors are created by concatenating these histograms. By reducing the dimensionality of these vectors using Whitened Principal Component Analysis, more robust descriptors are constructed. Experiments on the FERET database show that the proposed method obtains state-of-the-art results. © 2016 IEEE.
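    The histogram step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a PM histogram bins each complex value by its phase and accumulates its magnitude, and the function names, the bin count, and the flat list-of-cells input are hypothetical.

```python
import cmath
import math


def phase_magnitude_histogram(values, n_bins=8):
    """Accumulate magnitudes into bins selected by phase (assumed PM definition)."""
    hist = [0.0] * n_bins
    bin_width = 2.0 * math.pi / n_bins
    for z in values:
        magnitude, phase = cmath.polar(z)      # phase lies in [-pi, pi]
        shifted = phase % (2.0 * math.pi)      # map the phase to [0, 2*pi)
        index = min(int(shifted / bin_width), n_bins - 1)
        hist[index] += magnitude
    return hist


def cell_descriptor(cells, n_bins=8):
    """Concatenate the per-cell histograms into a single feature vector."""
    descriptor = []
    for cell in cells:
        descriptor.extend(phase_magnitude_histogram(cell, n_bins))
    return descriptor
```

    Each element of `cells` stands for the complex LZM values that fall in one grid cell; the dimensionality reduction step is not covered here.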
  • Article
    Citation - WoS: 30
    Citation - Scopus: 44
    An Efficient Framework for Visible-Infrared Cross Modality Person Re-Identification
    (Elsevier, 2020) Gökmen, Muhittin; Başaran, Emrah; Kamasak, Mustafa E.; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    Visible-infrared cross-modality person re-identification (VI-ReId) is an essential task for video surveillance in poorly illuminated or dark environments. Despite many recent studies on person re-identification in the visible domain (ReId), there are few studies dealing specifically with VI-ReId. Besides challenges that are common to both ReId and VI-ReId, such as pose/illumination variations, background clutter and occlusion, VI-ReId has additional challenges because color information is not available in infrared images. As a result, the performance of VI-ReId systems is typically lower than that of ReId systems. In this work, we propose a four-stream framework to improve VI-ReId performance. We train a separate deep convolutional neural network in each stream using different representations of input images. We expect that different and complementary features can be learned from each stream. In our framework, grayscale and infrared input images are used to train the ResNet in the first stream. In the second stream, RGB and three-channel infrared images (created by repeating the infrared channel) are used. In the remaining two streams, we use local pattern maps as input images. These maps are generated using the local Zernike moments transformation. Local pattern maps are obtained from grayscale and infrared images in the third stream and from RGB and three-channel infrared images in the last stream. We improve the performance of the proposed framework by employing a re-ranking algorithm for post-processing. Our results indicate that the proposed framework outperforms the current state of the art by a large margin, improving Rank-1/mAP by 29.79%/30.91% on the SYSU-MM01 dataset and by 9.73%/16.36% on the RegDB dataset.
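    The input representations named in this abstract (grayscale images and three-channel infrared images made by repeating the infrared channel) might be prepared as in the sketch below. The function names are hypothetical, images are plain nested lists, and the grayscale weights are the common ITU-R BT.601 luma coefficients, which is an assumption; the abstract does not say how grayscale images are produced.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to one channel (BT.601 weights, assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def repeat_channel(single_channel_image):
    """Build a three-channel image by repeating the single channel three times."""
    return [[(v, v, v) for v in row] for row in single_channel_image]


# Hypothetical 1x2 visible image and 1x2 infrared image.
visible = [[(255, 0, 0), (0, 0, 255)]]
infrared = [[10, 200]]

stream1_inputs = (to_grayscale(visible), infrared)      # grayscale + infrared
stream2_inputs = (visible, repeat_channel(infrared))    # RGB + 3-channel infrared
```

    The local pattern maps used by the other two streams depend on the local Zernike moments transformation and are not sketched here.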
  • Research Project
    Turkish Natural Language Generation With Recurrent Neural Networks (Özyinelemeli Sinir Ağları ile Türkçe Doğal Dil Üretimi)
    (TÜBİTAK, 2018) Demir, Şeniz; Gökmen, Muhittin; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    Natural languages, which enable communication among people, have over time come to be used by systems and software in order to interact with people effectively and in a user-friendly way. Language-based technologies (e.g., search engines, computer-assisted tutoring systems, and dialogue systems) that, just like humans, can understand spoken or written natural language expressions and then meet users' expectations emerged from this motivation. In these studies, in addition to the nature of the problem and the difficulties in the structure of the target language, limitations in modeling how humans learn and use natural languages have affected performance. Today, although language-based technologies are widely used (e.g., Google Search and Apple Siri), the technological level reached varies with the target language. With its agglutinative and rich linguistic structure, Turkish lags behind many natural languages in terms of the technological solutions developed and the data resources produced. Moreover, work on Turkish language technologies to date has predominantly been oriented toward processing, understanding, and analyzing the language (e.g., morphological analysis of words, named entity recognition, dependency parsing, text classification, and text summarization). There are a few studies on Turkish language generation that have limited capabilities and remained at the academic level without follow-up. However, these studies did not go beyond converting content expressions, specified through linguistic theories that can be considered complex, into sentences, and they were not tested in integration with other applications. This study aims at the automatic generation of Turkish with a deep learning-based system (a language tool). The system is expected to convert content expressions given as input into comprehensible sentences that conform to the rules of Turkish.
    In this study, which is planned to be the most comprehensive Turkish language generation system in the literature, recurrent neural network architectures capable of sequence-to-sequence learning (e.g., from one word sequence to another word sequence), whose performance has been proven in many language technologies in recent years, will be used. With the flexibility these networks provide, different variants (e.g., long short-term memory and gated recurrent unit) and extensions (e.g., attention mechanism) will be tried, and the neural network architecture with the highest performance will be determined. In addition, the use of neural networks will make it possible to integrate certain factors (e.g., context information and user preferences) into the system and to examine their effects on the generation stage.
  • Conference Object
    Citation - WoS: 2
    Citation - Scopus: 2
    Facial Expression Recognition From Still Images
    (Springer International Publishing AG, 2017) Gökmen, Muhittin; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    With the development of technology, Facial Expression Recognition (FER) has become one of the important research areas in Human Computer Interaction. Changes in the movement of some facial muscles create facial expressions. By defining these changes, facial expressions can be recognized. In this study, a cascaded structure consisting of Local Zernike Moments (LZM), Local XOR Patterns (LXP) and Global Zernike Moments (GZM) methods is proposed for the FER problem. The database generally used in FER problems is the Extended Cohn-Kanade (CK+). The database consists of image sequences of 327 expressions of 118 people. Most FER systems recognize 7 classes of emotions (happiness, sadness, surprise, anger, disgust, fear and contempt), and we use the Library for Support Vector Machines (LIBSVM) classifier for multi-class classification with the leave-one-out cross-validation method. Our overall system performance is measured as 90.34% for FER.
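    The leave-one-out evaluation mentioned in this abstract can be illustrated with the sketch below. It is a simplified stand-in under stated assumptions: a nearest-neighbor rule replaces the LIBSVM classifier, a single sample is held out per round (the abstract does not say whether samples or subjects are left out), and all names are hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training sample closest to `query` (SVM stand-in)."""
    best_label, best_dist = None, float("inf")
    for x, y in zip(train_x, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(x, query))
        if dist < best_dist:
            best_label, best_dist = y, dist
    return best_label


def leave_one_out_accuracy(features, labels, predict=nearest_neighbor_predict):
    """Hold out each sample once, train on the rest, and report the hit rate."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)
```

    Any classifier with the same `predict(train_x, train_y, query)` shape can be passed in place of the nearest-neighbor rule.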
  • Article
    Citation - WoS: 13
    Citation - Scopus: 16
    Face Recognition With Patch-Based Local Walsh Transform
    (Elsevier, 2018) Uzun-Per, Meryem; Gökmen, Muhittin; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    In this paper, we present a novel dense local image representation method called Local Walsh Transform (LWT) by applying the well-known Walsh Transform (WT) to each pixel of an image. The LWT decomposes an image into multiple components, and produces LWT complex images by using the symmetrical relationship between them. Cascaded LWT (CLWT) is also a dense local image representation obtained by applying the LWT again to real and imaginary parts of LWT complex images. Applying the LWT once more to real and imaginary parts of LWT complex images increases the success rate especially on low resolution images. In order to combine the advantages of sparse and dense local image representations, we present Patch-based LWT (PLWT) and Patch-based CLWT (PCLWT) by applying the LWT and CLWT, respectively, to patches extracted around landmarks of multi-scaled face images. The extracted high dimensional features of the patches are reduced through the application of the Whitened Principal Component Analysis (WPCA). Experimental results show that both the PLWT and PCLWT are robust to illumination and expression changes, occlusion and low resolution. The state-of-the-art performance is achieved on the FERET and SCface databases, and the second best unsupervised category result is achieved on the LFW database.
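    For readers unfamiliar with the Walsh Transform that this abstract builds on, a minimal one-dimensional sketch follows. It shows the generic, unnormalized fast Walsh-Hadamard transform in natural (Hadamard) ordering; it is not the authors' LWT, and how the LWT forms pixel neighborhoods and complex images is not specified in this listing.

```python
def walsh_hadamard(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Combine blocks of size h with butterfly sums and differences.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out
```

    Applying the transform twice returns the input scaled by its length, which is a convenient sanity check.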
  • Conference Object
    Citation - WoS: 549
    Citation - Scopus: 667
    Human Semantic Parsing for Person Re-Identification
    (IEEE, 2018) Kalayeh, Mahdi M; Gökmen, Muhittin; Shah, Mubarak; Kamasak, Mustafa E; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    Person re-identification is a challenging task mainly due to factors such as background clutter, pose, illumination and camera point of view variations. These elements hinder the process of extracting robust and discriminative representations, hence preventing different identities from being successfully distinguished. To improve the representation learning, usually local features from human body parts are extracted. However, the common practice for such a process has been based on bounding box part detection. In this paper, we propose to adopt human semantic parsing which, due to its pixel-level accuracy and capability of modeling arbitrary contours, is naturally a better alternative. Our proposed SPReID integrates human semantic parsing in person re-identification and not only considerably outperforms its counter baseline, but achieves state-of-the-art performance. We also show that, by employing a simple yet effective training strategy, standard popular deep convolutional architectures such as Inception-V3 and ResNet-152, with no modification, while operating solely on full image, can dramatically outperform current state-of-the-art. Our proposed methods improve state-of-the-art person re-identification on: Market-1501 [48] by ~17% in mAP and ~6% in rank-1, CUHK03 [24] by ~4% in rank-1 and DukeMTMC-reID [50] by ~24% in mAP and ~10% in rank-1.
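    The two metrics quoted in this abstract, rank-1 and mAP, can be computed from a query-by-gallery distance matrix as in the sketch below. This is a simplified illustration with hypothetical names: benchmark protocols often also exclude gallery images taken by the query's own camera, and that filtering is omitted here.

```python
def average_precision(ranked_matches):
    """AP for one query; `ranked_matches` holds booleans, best-ranked first."""
    hits, precision_sum = 0, 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def rank1_and_map(distances, query_ids, gallery_ids):
    """Return (rank-1 accuracy, mean average precision) over all queries."""
    rank1_hits, ap_sum = 0, 0.0
    for q, row in enumerate(distances):
        # Sort gallery indices by increasing distance to this query.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        rank1_hits += int(matches[0])
        ap_sum += average_precision(matches)
    n = len(distances)
    return rank1_hits / n, ap_sum / n
```

    Here `distances[q][g]` is the distance between query `q` and gallery item `g`, and the identity lists give the person label of each image.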
  • Article
    Citation - WoS: 9
    Citation - Scopus: 11
    An Efficient Multiscale Scheme Using Local Zernike Moments for Face Recognition
    (MDPI, 2018) Gökmen, Muhittin; Kamasak, Mustafa E.; 02.02. Department of Computer Engineering; 02. Faculty of Engineering; 01. MEF University
    In this study, we propose a face recognition scheme using local Zernike moments (LZM), which can be used for both identification and verification. In this scheme, local patches around the landmarks are extracted from the complex components obtained by LZM transformation. Then, phase magnitude histograms are constructed within these patches to create descriptors for face images. An image pyramid is utilized to extract features at multiple scales, and the descriptors are constructed for each image in this pyramid. We used three different public datasets to examine the performance of the proposed method: Face Recognition Technology (FERET), Labeled Faces in the Wild (LFW), and Surveillance Cameras Face (SCface). The results revealed that the proposed method is robust against variations such as illumination, facial expression, and pose. Aside from this, it can be used for low-resolution face images acquired in uncontrolled environments or in the infrared spectrum. Experimental results show that our method outperforms state-of-the-art methods on FERET and SCface datasets.
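    The image pyramid mentioned in this abstract can be sketched as repeated downsampling of the input. The version below is only an illustration under stated assumptions: it halves the resolution at each level by averaging 2x2 blocks, whereas the abstract does not give the scale factor or the filter, and the function names are hypothetical.

```python
def downsample_2x(image):
    """Halve both dimensions by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)]
            for r in range(rows)]


def build_pyramid(image, levels):
    """Return up to `levels` images, starting with the original resolution."""
    pyramid = [image]
    for _ in range(levels - 1):
        current = pyramid[-1]
        if len(current) < 2 or len(current[0]) < 2:
            break  # too small to downsample further
        pyramid.append(downsample_2x(current))
    return pyramid
```

    A descriptor would then be computed on every level and the per-level descriptors combined, as the abstract describes.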