Industrial Engineering Department Collection
Permanent URI for this collection: https://hdl.handle.net/20.500.11779/1942
Browsing the Industrial Engineering Department Collection by Institution Author "Kırbız, Serap"
Now showing 1 - 4 of 4
Conference Object
Dialogue Enhancement Using Kernel Additive Modelling (Institute of Electrical and Electronics Engineers Inc., 2015)
Liutkus, A.; Kırbız, Serap; Cemgil, A. Taylan
Finding the right balance between the dialogue signals and the ambient sources is a major problem for sound engineers, and it is also one of the main causes of audience concerns. Listeners want to adjust the sound balance according to their personal preferences, listening environment, and hearing. In this work, a method is proposed for enhancing the dialogue signals in stereo recordings that contain more than one source. Kernel additive modelling, which has been used successfully in sound source separation, is applied to extract the dialogue and ambient sources from movie soundtracks. The separated dialogue and ambient sources can later be upmixed by the user into a personal mix. The separation performance of the proposed method is evaluated on sounds generated by mixing sources taken from dialogue-only and music-only parts of movies. It is shown that the Kernel Additive Modelling (KAM) based method can be used successfully for dialogue enhancement. © 2015 IEEE.

Article | Citation - WoS: 1
Facial Emotion Recognition Using Residual Neural Networks (2024)
Kırbız, Serap
Facial emotion recognition (FER) has been an emerging research topic in recent years. Recent automatic FER systems generally apply deep learning methods and focus on two important issues: the lack of sufficient labeled training data, and variations in images such as illumination, pose, or expression-related variations across cultures. Although Convolutional Neural Networks (CNNs) are widely used in automatic FER, they become difficult to train when the number of layers is large. Therefore, a residual technique is applied to CNNs, and the resulting architecture is called a residual neural network.
In this paper, an automatic facial emotion recognition method using residual networks with random data augmentation is proposed on a merged FER dataset consisting of 41,598 facial images of size 48 × 48 pixels from seven basic emotion classes. Experimental results show that ResNet34 with data augmentation outperforms a CNN, with a classification accuracy of 81%.

Article | Citation - WoS: 9 | Citation - Scopus: 14
MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training (IEEE, 2022)
Karamatlı, Ertuğ; Kırbız, Serap
We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model that separates the underlying sources via a challenging proxy task, without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. We also propose a self-evaluation technique, inspired by MixCycle, that estimates model performance without using any reference sources.
We show that it yields results consistent with an evaluation on reference sources (LibriMix) and with an informal listening test conducted on a real-life mixtures dataset (REAL-M).

Conference Object | Citation - WoS: 4 | Citation - Scopus: 4
Perceptual Coding-Based Informed Source Separation (2014)
Girin, Laurent; Kırbız, Serap; Ozerov, Alexey; Liutkus, Antoine
Informed Source Separation (ISS) techniques enable manipulation of the source signals that compose an audio mixture, based on a coder-decoder configuration. Provided the source signals are known at the encoder, low-bitrate side information is sent to the decoder and allows efficient source separation. Recent research has focused on a coding-based ISS framework, which has the advantage of encoding the desired audio objects while exploiting their mixture in an information-theoretic framework. Here, we show how the perceptual quality of the separated sources can be improved by inserting perceptual source coding techniques into this framework, achieving a continuum of optimal bitrate-perceptual distortion trade-offs.
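The MixPIT/MixCycle entry above builds on permutation invariant training, in which the loss is evaluated under every pairing of estimated and reference sources and the best pairing is kept. A minimal sketch of that idea, assuming plain MSE on toy lists rather than the SI-SDR loss and trained network of the paper:

```python
import itertools

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pit_mse(estimates, references):
    """Permutation invariant MSE: evaluate every ordering of the
    estimated sources against the references and keep the best.
    Illustrative sketch only; real systems differentiate through
    this min over permutations while training a network."""
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(len(references))):
        loss = sum(mse(estimates[p], references[i])
                   for i, p in enumerate(perm)) / len(references)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Toy check: the estimates arrive in swapped order, yet PIT still
# matches each estimate to the right reference.
s1 = [1.0, 0.0, 1.0]
s2 = [0.0, 2.0, 0.0]
loss, perm = pit_mse([s2, s1], [s1, s2])  # perm == (1, 0), loss == 0.0
```

The min over permutations is what removes the arbitrary output ordering of a separator from the training objective.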

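The FER abstract above motivates residual networks by the difficulty of training CNNs with many layers. A minimal sketch of the residual identity y = x + F(x) used by ResNet-style blocks, with a toy stand-in for the learned transform (the names here are illustrative, not the paper's ResNet34):

```python
def residual_block(x, transform):
    """Core residual identity: the block learns a correction F(x) and
    adds it back to the input, so an unneeded block only has to drive
    F(x) toward zero -- this is what eases training of deep stacks."""
    return [xi + fi for xi, fi in zip(x, transform(x))]

# Toy transform standing in for the conv/batch-norm/ReLU layers.
halve = lambda x: [-0.5 * xi for xi in x]
out = residual_block([2.0, 4.0], halve)  # -> [1.0, 2.0]
```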

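The dialogue-enhancement entry relies on kernel additive modelling, which alternates kernel smoothing of per-source power estimates with soft time-frequency masking. The sketch below shows only the soft (Wiener-like) masking step on toy per-bin powers; the kernel smoothing and STFT machinery are omitted, and every name is an assumption for illustration:

```python
def wiener_mask(dialogue_power, ambience_power, eps=1e-12):
    """Soft (Wiener-like) mask: each time-frequency bin is weighted by
    the fraction of its power attributed to the dialogue source.
    KAM additionally re-estimates these powers by kernel smoothing
    at every iteration; that step is omitted here."""
    return [d / (d + a + eps) for d, a in zip(dialogue_power, ambience_power)]

def apply_mask(mixture, mask):
    return [m * g for m, g in zip(mixture, mask)]

# Toy spectrum: bins where dialogue dominates keep most of the mixture.
dialogue = [9.0, 1.0, 0.0]
ambience = [1.0, 9.0, 4.0]
mix = [3.0, 3.0, 2.0]
est = apply_mask(mix, wiener_mask(dialogue, ambience))
```

The same masking idea underlies the upmixing step in the abstract: once dialogue and ambience estimates exist, the listener can rebalance them before remixing.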