MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training
| dc.contributor.author | Karamatlı, Ertuğ | |
| dc.contributor.author | Kırbız, Serap | |
| dc.date.accessioned | 2023-10-18T12:06:14Z | |
| dc.date.available | 2023-10-18T12:06:14Z | |
| dc.date.issued | 2022 | |
| dc.description.abstract | We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. Also, we propose a self-evaluation technique inspired by MixCycle that estimates model performance without utilizing any reference sources. We show that it yields results consistent with an evaluation on reference sources (LibriMix) and also with an informal listening test conducted on a real-life mixtures dataset (REAL-M). | |
| dc.identifier.citation | Karamatlı, E., & Kırbız, S. (2022). MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training. IEEE Signal Processing Letters, 29, 2637-2641. | |
| dc.identifier.doi | 10.1109/LSP.2022.3232276 | |
| dc.identifier.issn | 1070-9908 | |
| dc.identifier.issn | 1558-2361 | |
| dc.identifier.scopus | 2-s2.0-85146250664 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.11779/1989 | |
| dc.identifier.uri | https://doi.org/10.1109/LSP.2022.3232276 | |
| dc.language.iso | en | |
| dc.publisher | IEEE | |
| dc.relation.ispartof | IEEE Signal Processing Letters | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.subject | Self-supervised learning | |
| dc.subject | Time-domain analysis | |
| dc.subject | Unsupervised learning | |
| dc.subject | Training | |
| dc.subject | Source separation | |
| dc.subject | Optimized production technology | |
| dc.subject | Recording | |
| dc.subject | Blind source separation | |
| dc.subject | Deep learning | |
| dc.subject | Task analysis | |
| dc.title | MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training | |
| dc.type | Article | |
| dspace.entity.type | Publication | |
| gdc.author.institutional | Kırbız, Serap | |
| gdc.bip.impulseclass | C5 | |
| gdc.bip.influenceclass | C5 | |
| gdc.bip.popularityclass | C5 | |
| gdc.coar.access | open access | |
| gdc.coar.type | text::journal::journal article | |
| gdc.collaboration.industrial | false | |
| gdc.description.department | Faculty of Engineering, Department of Industrial Engineering | |
| gdc.description.endpage | 2641 | |
| gdc.description.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | |
| gdc.description.scopusquality | Q1 | |
| gdc.description.startpage | 2637 | |
| gdc.description.volume | 29 | |
| gdc.description.woscitationindex | Science Citation Index Expanded | |
| gdc.description.wosquality | Q2 | |
| gdc.identifier.openalex | W4312540993 | |
| gdc.identifier.wos | WOS:000910559500004 | |
| gdc.index.type | WoS | |
| gdc.index.type | Scopus | |
| gdc.oaire.diamondjournal | false | |
| gdc.oaire.impulse | 2.0 | |
| gdc.oaire.influence | 2.6886524E-9 | |
| gdc.oaire.isgreen | true | |
| gdc.oaire.keywords | Signal Processing (eess.SP) | |
| gdc.oaire.keywords | FOS: Computer and information sciences | |
| gdc.oaire.keywords | Computer Science - Machine Learning | |
| gdc.oaire.keywords | Sound (cs.SD) | |
| gdc.oaire.keywords | deep learning | |
| gdc.oaire.keywords | unsupervised learning | |
| gdc.oaire.keywords | Unsupervised learning | |
| gdc.oaire.keywords | Computer Science - Sound | |
| gdc.oaire.keywords | Machine Learning (cs.LG) | |
| gdc.oaire.keywords | Time-domain analysis | |
| gdc.oaire.keywords | Audio and Speech Processing (eess.AS) | |
| gdc.oaire.keywords | Recording | |
| gdc.oaire.keywords | Task analysis | |
| gdc.oaire.keywords | self-supervised learning | |
| gdc.oaire.keywords | FOS: Electrical engineering, electronic engineering, information engineering | |
| gdc.oaire.keywords | Training | |
| gdc.oaire.keywords | Blind source separation | |
| gdc.oaire.keywords | Source separation | |
| gdc.oaire.keywords | Electrical Engineering and Systems Science - Signal Processing | |
| gdc.oaire.keywords | Optimized production technology | |
| gdc.oaire.keywords | Electrical Engineering and Systems Science - Audio and Speech Processing | |
| gdc.oaire.popularity | 4.058928E-9 | |
| gdc.oaire.publicfunded | false | |
| gdc.oaire.sciencefields | 02 engineering and technology | |
| gdc.oaire.sciencefields | 0202 electrical engineering, electronic engineering, information engineering | |
| gdc.openalex.collaboration | National | |
| gdc.openalex.fwci | 1.16974372 | |
| gdc.openalex.normalizedpercentile | 0.73 | |
| gdc.opencitations.count | 2 | |
| gdc.plumx.crossrefcites | 1 | |
| gdc.plumx.mendeley | 7 | |
| gdc.plumx.scopuscites | 14 | |
| gdc.publishedmonth | December | |
| gdc.relation.journal | IEEE Signal Processing Letters | |
| gdc.scopus.citedcount | 14 | |
| gdc.virtual.author | Kırbız, Serap | |
| gdc.wos.citedcount | 9 | |
| gdc.wos.collaboration | Not conducted with international collaboration - NO | |
| gdc.wos.documenttype | article | |
| gdc.wos.indexdate | 2022 | |
| gdc.wos.publishedmonth | December | |
| gdc.yokperiod | YÖK - 2022-23 | |
| relation.isAuthorOfPublication | 552e4b0c-955f-4b93-925b-08cb2e6c5cc0 | |
| relation.isAuthorOfPublication.latestForDiscovery | 552e4b0c-955f-4b93-925b-08cb2e6c5cc0 | |
| relation.isOrgUnitOfPublication | de19334f-6a5b-4f7b-9410-9433c48d1e5a | |
| relation.isOrgUnitOfPublication | 0d54cd31-4133-46d5-b5cc-280b2c077ac3 | |
| relation.isOrgUnitOfPublication | a6e60d5c-b0c7-474a-b49b-284dc710c078 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | de19334f-6a5b-4f7b-9410-9433c48d1e5a |
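
The abstract describes MixPIT as a proxy task built on permutation invariant training. The following is a minimal sketch of one reading of that idea, not the authors' implementation: two mixtures are summed into a mixture of mixtures, and a two-output permutation invariant loss scores the outputs against the two original mixtures. The function name `pit_mse`, the use of mean squared error as the loss, the random stand-in signals, and the faked model outputs are all assumptions made for illustration.

```python
import numpy as np


def pit_mse(estimates, targets):
    """Two-output permutation invariant MSE (hypothetical helper).

    Scores both possible output-to-target assignments and returns the
    lower total error, so the order of the outputs does not matter.
    """
    e1, e2 = estimates
    t1, t2 = targets
    straight = np.mean((e1 - t1) ** 2) + np.mean((e2 - t2) ** 2)
    swapped = np.mean((e1 - t2) ** 2) + np.mean((e2 - t1) ** 2)
    return min(straight, swapped)


rng = np.random.default_rng(0)

# Stand-ins for two single-channel speech mixtures (assumption: random noise
# replaces real audio so the example is self-contained).
x1 = rng.standard_normal(16000)
x2 = rng.standard_normal(16000)

# Proxy-task input: a mixture of mixtures. A separation model (not shown)
# would receive `mom` and emit two output signals.
mom = x1 + x2

# Faked model outputs: the two mixtures recovered exactly, but in swapped order.
fake_outputs = (x2, x1)

# The permutation invariant loss ignores the swap.
loss = pit_mse(fake_outputs, (x1, x2))
print(loss)
```

Because the loss takes the minimum over both assignments, the model is free to emit its outputs in either order, which is the property that lets the unlabeled mixtures serve as training targets in this sketch.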
Files
Original bundle
- Name: MixCycle_Unsupervised_Speech_Separation_via_Cyclic_Mixture_Permutation_Invariant_Training.pdf
- Size: 604.92 KB
- Format: Adobe Portable Document Format
- Description: Full Text - Article
License bundle
- Name: license.txt
- Size: 0 B
- Format: Item-specific license agreed upon to submission
- Description:
