On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers

dc.contributor.author Arslan, Şuayb Şefik
dc.contributor.author Zeydan, Engin
dc.date.accessioned 2021-07-09T08:48:36Z
dc.date.available 2021-07-09T08:48:36Z
dc.date.issued 2021
dc.description.abstract It has become commonplace to observe frequent multiple disk failures in big data centers in which thousands of drives operate simultaneously. Disks are typically protected by replication or erasure coding to guarantee a predetermined reliability. However, to optimize data protection, real-life disk failure trends need to be modeled appropriately. The classical approach to modeling is to estimate the probability density function of failures using nonparametric estimation techniques such as kernel density estimation (KDE). However, these techniques are suboptimal in the absence of the true underlying density function. Moreover, insufficient data may lead to overfitting. In this article, we propose applying a set of transformations to the collected failure data to obtain almost perfect regression in the transform domain. Then, by inverse transformation, we analytically estimate the failure density through the efficient computation of moment generating functions, and hence, the density functions. Moreover, we develop a visualization platform to extract useful statistical information such as model-based mean time to failure. Our results indicate that for other heavy-tailed data, the complex Gaussian hypergeometric distribution and the classical KDE approach can perform best if the overfitting problem can be avoided and the complexity burden is overcome. On the other hand, we show that the failure distribution exhibits a less complex Argus-like distribution after performing the Box–Cox transformation, up to appropriate scaling and shifting operations.
dc.description.sponsorship Turkiye Bilimsel ve Teknolojik Arastirma Kurumu (TUBITAK) 115C111 - 119E235 / Spanish MINEC TEC2017-88373-R / Generalitat de Catalunya 2017SGR1195
dc.identifier.citation Arslan, S. S., & Zeydan, E. (2021). On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers. IEEE Transactions on Reliability, 70(2), 507–524. https://doi.org/10.1109/tr.2020.3007127
dc.identifier.doi 10.1109/TR.2020.3007127
dc.identifier.issn 1558-1721
dc.identifier.issn 0018-9529
dc.identifier.scopus 2-s2.0-85110818271
dc.identifier.uri https://hdl.handle.net/20.500.11779/1512
dc.identifier.uri https://doi.org/10.1109/TR.2020.3007127
dc.language.iso en
dc.publisher IEEE
dc.relation.ispartof IEEE Transactions on Reliability
dc.rights info:eu-repo/semantics/openAccess
dc.subject Estimation
dc.subject Kernel density estimation (kde)
dc.subject Kernel
dc.subject Reliability
dc.subject Probability density function
dc.subject Measurement
dc.subject Modeling
dc.subject Predictive models
dc.subject Hard-disk systems
dc.subject Data analytics
dc.subject Data models
dc.subject Data storage
dc.title On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers
dc.type Article
dspace.entity.type Publication
gdc.author.id Şuayb Şefik Arslan / 0000-0003-3779-0731
gdc.author.id Şuayb Şefik Arslan / K-2883-2015
gdc.author.institutional Arslan, Şuayb Şefik
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.description.department Faculty of Engineering, Department of Computer Engineering
gdc.description.endpage 524
gdc.description.issue 2
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member
gdc.description.scopusquality Q1
gdc.description.startpage 507
gdc.description.volume 70
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q1
gdc.identifier.openalex W3045668963
gdc.identifier.wos WOS:000659549200008
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.accesstype HYBRID
gdc.oaire.diamondjournal false
gdc.oaire.downloads 11
gdc.oaire.impulse 3.0
gdc.oaire.influence 2.8777756E-9
gdc.oaire.isgreen true
gdc.oaire.keywords Measurement
gdc.oaire.keywords hard-disk systems
gdc.oaire.keywords Data Storage
gdc.oaire.keywords data storage
gdc.oaire.keywords Modeling
gdc.oaire.keywords Data Analytics
gdc.oaire.keywords Data models
gdc.oaire.keywords modeling
gdc.oaire.keywords Reliability
gdc.oaire.keywords Kernel Density Estimation
gdc.oaire.keywords Predictive models
gdc.oaire.keywords Kernel
gdc.oaire.keywords Hard Disk Systems
gdc.oaire.keywords Data analytics
gdc.oaire.keywords Probability density function
gdc.oaire.keywords kernel density estimation (KDE)
gdc.oaire.keywords Kernel density estimation (KDE)
gdc.oaire.keywords Data storage
gdc.oaire.keywords Estimation
gdc.oaire.popularity 4.7283937E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 01 natural sciences
gdc.oaire.sciencefields 0101 mathematics
gdc.oaire.views 4
gdc.openalex.collaboration International
gdc.openalex.fwci 0.67812976
gdc.openalex.normalizedpercentile 0.71
gdc.opencitations.count 3
gdc.plumx.mendeley 3
gdc.plumx.scopuscites 5
gdc.publishedmonth June
gdc.relation.journal IEEE Transactions on Reliability
gdc.scopus.citedcount 5
gdc.virtual.author Arslan, Şuayb Şefik
gdc.wos.citedcount 3
gdc.wos.collaboration Conducted with international collaboration - YES
gdc.wos.documenttype Article
gdc.wos.indexdate 2021
gdc.wos.publishedmonth June
gdc.yokperiod YÖK - 2020-21
relation.isAuthorOfPublication 37152966-5384-4fd7-a0dc-34d1dd8bdc7f
relation.isAuthorOfPublication.latestForDiscovery 37152966-5384-4fd7-a0dc-34d1dd8bdc7f
relation.isOrgUnitOfPublication 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3
relation.isOrgUnitOfPublication 0d54cd31-4133-46d5-b5cc-280b2c077ac3
relation.isOrgUnitOfPublication a6e60d5c-b0c7-474a-b49b-284dc710c078
relation.isOrgUnitOfPublication.latestForDiscovery 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3

Files

Original bundle

Name: On the Distribution Modeling.pdf
Size: 1.39 MB
Format: Adobe Portable Document Format
Description: Full Text - Article

License bundle

Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission