MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training
Date
2022
Authors
Karamatlı, E.
Kırbız, Serap
Journal Title
IEEE Signal Processing Letters
Publisher
IEEE
Open Access Color
Green Open Access
Yes
Publicly Funded
No
Abstract
We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. Also, we propose a self-evaluation technique inspired by MixCycle that estimates model performance without utilizing any reference sources. We show that it yields results consistent with an evaluation on reference sources (LibriMix) and also with an informal listening test conducted on a real-life mixtures dataset (REAL-M).
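The permutation invariant training (PIT) building block the abstract refers to can be illustrated with a minimal sketch. This is not the authors' implementation: it uses mean squared error as a stand-in for the SI-SDR objective typically used in speech separation, and all names and signals are illustrative. In the MixPIT proxy task described above, the two "references" are themselves mixtures, and the model input is their sum (a mixture of mixtures).

```python
import numpy as np

def mse(est, ref):
    # Simple per-sample distance; a stand-in for negative SI-SDR.
    return float(np.mean((est - ref) ** 2))

def pit_loss(est_a, est_b, ref_1, ref_2):
    # Evaluate both possible output-to-reference assignments and keep
    # the cheaper one, making the loss invariant to output ordering.
    return min(mse(est_a, ref_1) + mse(est_b, ref_2),
               mse(est_a, ref_2) + mse(est_b, ref_1))

rng = np.random.default_rng(0)
x1 = rng.standard_normal(16000)  # first mixture (stand-in signal)
x2 = rng.standard_normal(16000)  # second mixture
mom = x1 + x2                    # mixture of mixtures fed to the separator

# A perfect separator recovers x1 and x2, in either order, with zero loss.
loss = pit_loss(x2, x1, x1, x2)
```

In the supervised setting, `ref_1` and `ref_2` would be clean reference sources; in MixPIT they are the two training mixtures, which is what removes the need for ground-truth sources.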
Keywords
Self-supervised learning, Time-domain analysis, Unsupervised learning, Training, Source separation, Optimized production technology, Recording, Blind source separation, Deep learning, Task analysis, Signal Processing (eess.SP), FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Computer Science - Sound, Machine Learning (cs.LG), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Signal Processing, Electrical Engineering and Systems Science - Audio and Speech Processing
Fields of Science
02 engineering and technology, 0202 electrical engineering, electronic engineering, information engineering
Citation
Karamatlı, E., & Kırbız, S. (2022). MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training. IEEE Signal Processing Letters, 29, 2637-2641.
WoS Q
Q2
Scopus Q
Q1

OpenCitations Citation Count
2
Source
IEEE Signal Processing Letters
Volume
29
Start Page
2637
End Page
2641
PlumX Metrics
Citations
CrossRef : 1
Scopus : 14
Captures
Mendeley Readers : 7
SCOPUS™ Citations
14
checked on Feb 04, 2026
Web of Science™ Citations
9
checked on Feb 04, 2026
Page Views
303
checked on Feb 04, 2026
Downloads
633
checked on Feb 04, 2026