Improving the Usage of Subword-Based Units for Turkish Speech Recognition

dc.contributor.author Çetinkaya, Gözde
dc.contributor.author Saraçlar, Murat
dc.contributor.author Arısoy, Ebru
dc.date.accessioned 2021-10-09T07:26:12Z
dc.date.available 2021-10-09T07:26:12Z
dc.date.issued 2020
dc.description.abstract Subword units are often used to improve speech recognition performance in agglutinative languages, where the number of observed words is very high. This study explores the proper use of subword units in recognition by reconsidering details such as silence modeling and position-dependent phones. A modified lexicon is implemented with finite-state transducers to represent the subword units correctly. We also experiment with different types of word boundary markers and achieve the best performance by adding a marker to both the left and right sides of a subword unit. In our experiments on a Turkish broadcast news dataset, the subword models outperform both word-based models and naive subword implementations. Results show that using proper subword units yields a relative word error rate (WER) reduction of 2.4% compared with the word-level automatic speech recognition (ASR) system for Turkish.
dc.description.sponsorship Istanbul Medipol Univ
dc.identifier.citation G. Çetinkaya, E. Arısoy and M. Saraçlar, (5-7 Oct. 2020). Improving the Usage of Subword-Based Units for Turkish Speech Recognition, 2020 28th Signal Processing and Communications Applications Conference (SIU), pp. 1-4, doi: 10.1109/SIU49456.2020.9302043.
dc.identifier.doi 10.1109/SIU49456.2020.9302043
dc.identifier.isbn 9781728172064
dc.identifier.issn 2165-0608
dc.identifier.scopus 2-s2.0-85100307964
dc.identifier.uri https://doi.org/10.1109/SIU49456.2020.9302043
dc.identifier.uri https://hdl.handle.net/20.500.11779/1572
dc.language.iso tr
dc.publisher IEEE
dc.relation.ispartof 2020 28th Signal Processing and Communications Applications Conference (SIU)
dc.rights info:eu-repo/semantics/closedAccess
dc.subject Konuşma tanıma (Speech recognition)
dc.subject Language modelling
dc.subject Acoustic modelling
dc.subject Speech recognition
dc.subject Akustik modelleme (Acoustic modelling)
dc.subject Dil modelleme (Language modelling)
dc.title Improving the Usage of Subword-Based Units for Turkish Speech Recognition
dc.title.alternative Türkçe konuşma tanıma için sözcük altı birimlerin kullanımının iyileştirilmesi
dc.type Conference Object
dspace.entity.type Publication
gdc.author.id Ebru Arısoy / 0000-0002-8311-3611
gdc.author.institutional Arısoy, Ebru
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.description.department Faculty of Engineering, Department of Electrical and Electronics Engineering
gdc.description.endpage 4
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member
gdc.description.scopusquality N/A
gdc.description.startpage 1
gdc.description.woscitationindex Conference Proceedings Citation Index - Science
gdc.description.wosquality N/A
gdc.identifier.openalex W3119042470
gdc.identifier.wos WOS:000653136100017
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.5942106E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 1.652743E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 03 medical and health sciences
gdc.oaire.sciencefields 0305 other medical science
gdc.openalex.fwci 0.2937191
gdc.openalex.normalizedpercentile 0.65
gdc.opencitations.count 1
gdc.plumx.crossrefcites 1
gdc.plumx.mendeley 3
gdc.plumx.scopuscites 2
gdc.publishedmonth October
gdc.relation.journal 2020 28th Signal Processing and Communications Applications Conference (SIU)
gdc.scopus.citedcount 2
gdc.virtual.author Arısoy Saraçlar, Ebru
gdc.wos.citedcount 1
gdc.wos.collaboration Conducted without international collaboration - NO
gdc.wos.documenttype Proceedings Paper
gdc.wos.indexdate 2020
gdc.wos.publishedmonth October
gdc.yokperiod YÖK - 2020-21
relation.isAuthorOfPublication 0b895153-5793-4e46-bc2f-06a28b30f531
relation.isAuthorOfPublication.latestForDiscovery 0b895153-5793-4e46-bc2f-06a28b30f531
relation.isOrgUnitOfPublication de19334f-6a5b-4f7b-9410-9433c48d1e5a
relation.isOrgUnitOfPublication 0d54cd31-4133-46d5-b5cc-280b2c077ac3
relation.isOrgUnitOfPublication a6e60d5c-b0c7-474a-b49b-284dc710c078
relation.isOrgUnitOfPublication.latestForDiscovery de19334f-6a5b-4f7b-9410-9433c48d1e5a

Files

Original bundle

Name: Improving_the_Usage_of_Subword-Based_Units_for_Turkish_Speech_Recognition.pdf
Size: 224.35 KB
Format: Adobe Portable Document Format
Description: Proceedings Paper
