Electrical and Electronics Engineering Department Collection (Elektrik Elektronik Mühendisliği Bölümü Koleksiyonu)

Permanent URI for this collection: https://hdl.handle.net/20.500.11779/1941

Citation - WoS: 1
Citation - Scopus: 1
Domain Adaptation Approaches for Acoustic Modeling
(IEEE, 2020) Arısoy, Ebru; Fakhan, Enver
In recent years, with the development of neural-network-based models, ASR systems have achieved a tremendous performance increase. However, this increase depends mostly on the amount of training data and the available computational power. In a low-resource scenario, publicly available datasets can be used to overcome data scarcity. Furthermore, using a pre-trained model and adapting it to the in-domain data can ease computational constraints. In this paper, we leverage two different publicly available datasets and investigate various acoustic model adaptation approaches. We show that a 4% word error rate can be achieved using a very limited amount of in-domain data.
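The word error rate quoted above is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring tool; the function name `word_error_rate` and whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, one substituted word in a four-word reference gives a WER of 0.25.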
Highlighting of Lecture Video Closed Captions
(IEEE, 2020) Yıldırım, Göktuğ; Öztufan, Huseyin Efe; Arısoy, Ebru
The main purpose of this study is to automatically highlight important regions of lecture video subtitles. Even though watching videos is an effective way of learning, the main disadvantage of video-based education is the limited interaction between the learner and the video. With the developed system, important regions that are automatically determined in lecture subtitles are highlighted with the aim of increasing the learner's attention to these regions. In this paper, the lecture videos are first converted into text using an automatic speech recognition system. Then continuous space representations for sentences or word sequences in the transcriptions are generated using Bidirectional Encoder Representations from Transformers (BERT). Important regions of the subtitles are selected using a clustering method based on the similarity of these representations. The developed system is applied to lecture videos, and using word sequence representations to determine the important regions of subtitles is found to give higher performance than using sentence representations. This result is encouraging for the automatic highlighting of speech recognition outputs, where sentence boundaries are not defined explicitly.
Impact of Hardware Sources on Feature Selection for Online Signature Verification
(IEEE, 2020) Ayhan, Tuba; Orak, Remzi
This work analyzes time series features gathered from a touchpad that is part of an online signature verification system. A dynamic time warping (DTW) processing unit is implemented on an FPGA for use in time series analysis. To support different feature groups, this unit can be reconfigured without altering the memory structure. Using this reconfigurable unit, features are evaluated according to the area cost that they introduce. Moreover, a method to predict the value of features for classification is introduced. In this way, the minimum requirements for implementing an online signature verification system on an FPGA are partially obtained.
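For readers unfamiliar with DTW, the textbook dynamic-programming recurrence can be sketched in software as follows. This is only an illustration of the general algorithm on one-dimensional sequences with an absolute-difference cost; it does not describe the FPGA unit in the paper, and the function name `dtw_distance` is an assumption.

```python
from math import inf


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    Assumption: local cost is the absolute difference of two samples.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because a sample may be matched to several samples of the other sequence, two signatures drawn at different speeds can still align with a small distance, which is why DTW is a common choice for comparing time series of unequal length.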
Citation - WoS: 1
Citation - Scopus: 2
Improving the Usage of Subword-Based Units for Turkish Speech Recognition
(IEEE, 2020) Çetinkaya, Gözde; Saraçlar, Murat; Arısoy, Ebru
Subword units are often utilized to achieve better performance in speech recognition because of the high number of observed words in agglutinative languages. In this study, the proper use of subword units in recognition is explored by reconsidering details such as silence modeling and position-dependent phones. A modified lexicon is implemented with finite-state transducers to represent the subword units correctly. We also experiment with different types of word boundary markers and achieve the best performance by adding a marker to both the left and right side of a subword unit. In our experiments on a Turkish broadcast news dataset, the subword models outperform word-based models and naive subword implementations. Results show that using proper subword units leads to a relative word error rate (WER) reduction of 2.4% compared with the word-level automatic speech recognition (ASR) system for Turkish.
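One way to picture two-sided boundary markers is the sketch below. It assumes a hypothetical convention in which a `+` on a side of a subword means the word continues on that side; the abstract does not specify the marker symbol or the exact scheme, so the symbol, the function names, and the example segmentation are all illustrative assumptions.

```python
MARK = "+"  # hypothetical marker symbol, not taken from the paper


def mark_subwords(pieces):
    """Attach a marker on every side where the word continues."""
    last = len(pieces) - 1
    marked = []
    for i, piece in enumerate(pieces):
        left = MARK if i > 0 else ""
        right = MARK if i < last else ""
        marked.append(left + piece + right)
    return marked


def join_subwords(tokens):
    """Rebuild words from a marked subword sequence.

    Assumption: the subword text itself never contains the marker symbol.
    """
    words, current = [], ""
    for token in tokens:
        current += token.strip(MARK)
        if not token.endswith(MARK):  # no right marker: the word ends here
            words.append(current)
            current = ""
    if current:  # keep a trailing unfinished word rather than dropping it
        words.append(current)
    return words
```

For example, `mark_subwords(["ev", "ler", "de"])` yields `["ev+", "+ler+", "+de"]`, and `join_subwords` turns that sequence back into the single word `evlerde`.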
Joint Source Separation and Classification Using Variational Autoencoders
(IEEE, 2020) Karamatlı, Ertuğ; Kırbız, Serap; Hızlı, Çağlar
In this paper, we propose a novel multi-task variational autoencoder (VAE) based approach for joint source separation and classification. The network uses a probabilistic encoder for each source to map the input data to a latent space. The latent representation is then used by a probabilistic decoder for the two tasks: source separation and source classification. Across a variety of experiments performed on various image and audio datasets, the source separation performance of our method is as good as that of the method that performs source separation under source class supervision. In addition, the proposed method does not require the class labels and can predict the labels.

Browsing Electrical and Electronics Engineering Department Collection by Journal "2020 28th Signal Processing and Communications Applications Conference (SIU)"