MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training

Date

2022

Authors

Karamatlı, Ertuğ

Kırbız, Serap

Publisher

IEEE

Open Access Color

Green

Green Open Access

Yes

Publicly Funded

No
Impulse

Average

Influence

Average

Popularity

Average

Abstract

We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. Also, we propose a self-evaluation technique inspired by MixCycle that estimates model performance without utilizing any reference sources. We show that it yields results consistent with an evaluation on reference sources (LibriMix) and also with an informal listening test conducted on a real-life mixtures dataset (REAL-M).
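To make the training objective concrete, the following is a minimal PyTorch sketch of the MixPIT building block described above. It is not the authors' implementation: the model interface, the tensor shapes, and the choice of negative SI-SDR as the matching loss are assumptions for illustration.

```python
import torch

def neg_si_sdr(est, ref, eps=1e-8):
    # Negative scale-invariant SDR, a standard separation loss (lower is better).
    ref = ref - ref.mean(dim=-1, keepdim=True)
    est = est - est.mean(dim=-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj
    ratio = proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps)
    return -10 * torch.log10(ratio + eps)

def mixpit_loss(model, mix1, mix2):
    # MixPIT proxy task: the model separates a *mixture of mixtures*, and its
    # two outputs are matched to the original mixtures with permutation
    # invariance -- no reference sources are used at any point.
    mom = mix1 + mix2              # mixture of mixtures, shape (batch, time)
    est = model(mom)               # assumed output shape: (batch, 2, time)
    perm_a = neg_si_sdr(est[:, 0], mix1) + neg_si_sdr(est[:, 1], mix2)
    perm_b = neg_si_sdr(est[:, 0], mix2) + neg_si_sdr(est[:, 1], mix1)
    return torch.minimum(perm_a, perm_b).mean()
```

MixCycle then applies this permutation-invariant matching in a cyclic fashion, remixing the model's own estimates so that, per the abstract, the proxy task gradually shifts from separating mixtures of mixtures to separating single mixtures.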

Keywords

Self-supervised learning, Unsupervised learning, Deep learning, Source separation, Blind source separation, Time-domain analysis, Training, Recording, Task analysis, Optimized production technology, Machine Learning (cs.LG), Sound (cs.SD), Signal Processing (eess.SP), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering

Fields of Science

02 engineering and technology, 0202 electrical engineering, electronic engineering, information engineering

Citation

Karamatlı, E., & Kırbız, S. (2022). MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training. IEEE Signal Processing Letters, 29, 2637-2641.

WoS Q

Q2

Scopus Q

Q1

OpenCitations Citation Count

2

Source

IEEE Signal Processing Letters

Volume

29

Start Page

2637

End Page

2641

PlumX Metrics

Citations

CrossRef : 1

Scopus : 14

Captures

Mendeley Readers : 7

SCOPUS™ Citations

14

checked on Feb 04, 2026

Web of Science™ Citations

9

checked on Feb 04, 2026

Page Views

303

checked on Feb 04, 2026

Downloads

633

checked on Feb 04, 2026

OpenAlex FWCI

1.16974372