On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers
| dc.contributor.author | Arslan, Şuayb Şefik | |
| dc.contributor.author | Zeydan, Engin | |
| dc.date.accessioned | 2021-07-09T08:48:36Z | |
| dc.date.available | 2021-07-09T08:48:36Z | |
| dc.date.issued | 2021 | |
| dc.description.abstract | It has become commonplace to observe frequent multiple disk failures in big data centers in which thousands of drives operate simultaneously. Disks are typically protected by replication or erasure coding to guarantee a predetermined reliability. However, in order to optimize data protection, real-life disk failure trends need to be modeled appropriately. The classical approach to modeling is to estimate the probability density function of failures using nonparametric estimation techniques such as kernel density estimation (KDE). However, these techniques are suboptimal in the absence of the true underlying density function. Moreover, insufficient data may lead to overfitting. In this article, we propose to apply a set of transformations to the collected failure data for almost perfect regression in the transform domain. Then, by inverse transformation, we analytically estimate the failure density through the efficient computation of moment generating functions, and hence, the density functions. Moreover, we develop a visualization platform to extract useful statistical information such as model-based mean time to failure. Our results indicate that for other heavy-tailed data, the complex Gaussian hypergeometric distribution and the classical KDE approach can perform best if the overfitting problem can be avoided and the complexity burden is overcome. On the other hand, we show that the failure distribution exhibits a less complex Argus-like distribution after performing the Box–Cox transformation up to appropriate scaling and shifting operations. | |
| dc.description.sponsorship | Turkiye Bilimsel ve Teknolojik Arastirma Kurumu (TUBITAK) 115C111 - 119E235 / Spanish MINEC TEC2017-88373-R / Generalitat de Catalunya 2017SGR1195 | |
| dc.identifier.citation | Arslan, S. S., & Zeydan, E. (2021). On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers. IEEE Transactions on Reliability, 70(2), 507–524. https://doi.org/10.1109/tr.2020.3007127 | |
| dc.identifier.doi | 10.1109/TR.2020.3007127 | |
| dc.identifier.issn | 1558-1721 | |
| dc.identifier.issn | 0018-9529 | |
| dc.identifier.scopus | 2-s2.0-85110818271 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.11779/1512 | |
| dc.identifier.uri | https://doi.org/10.1109/TR.2020.3007127 | |
| dc.language.iso | en | |
| dc.publisher | IEEE | |
| dc.relation.ispartof | IEEE Transactions on Reliability | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.subject | Estimation | |
| dc.subject | Kernel density estimation (kde) | |
| dc.subject | Kernel | |
| dc.subject | Reliability | |
| dc.subject | Probability density function | |
| dc.subject | Measurement | |
| dc.subject | Modeling | |
| dc.subject | Predictive models | |
| dc.subject | Hard-disk systems | |
| dc.subject | Data analytics | |
| dc.subject | Data models | |
| dc.subject | Data storage | |
| dc.title | On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers | |
| dc.type | Article | |
| dspace.entity.type | Publication | |
| gdc.author.id | Şuayb Şefik Arslan / 0000-0003-3779-0731 | |
| gdc.author.id | Şuayb Şefik Arslan / K-2883-2015 | |
| gdc.author.institutional | Arslan, Şuayb Şefik | |
| gdc.bip.impulseclass | C5 | |
| gdc.bip.influenceclass | C5 | |
| gdc.bip.popularityclass | C4 | |
| gdc.coar.access | open access | |
| gdc.coar.type | text::journal::journal article | |
| gdc.description.department | Faculty of Engineering, Department of Computer Engineering | |
| gdc.description.endpage | 524 | |
| gdc.description.issue | 2 | |
| gdc.description.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | |
| gdc.description.scopusquality | Q1 | |
| gdc.description.startpage | 507 | |
| gdc.description.volume | 70 | |
| gdc.description.woscitationindex | Science Citation Index Expanded | |
| gdc.description.wosquality | Q1 | |
| gdc.identifier.openalex | W3045668963 | |
| gdc.identifier.wos | WOS:000659549200008 | |
| gdc.index.type | WoS | |
| gdc.index.type | Scopus | |
| gdc.oaire.accesstype | HYBRID | |
| gdc.oaire.diamondjournal | false | |
| gdc.oaire.downloads | 11 | |
| gdc.oaire.impulse | 3.0 | |
| gdc.oaire.influence | 2.8777756E-9 | |
| gdc.oaire.isgreen | true | |
| gdc.oaire.keywords | Measurement | |
| gdc.oaire.keywords | hard-disk systems | |
| gdc.oaire.keywords | Data Storage | |
| gdc.oaire.keywords | data storage | |
| gdc.oaire.keywords | Modeling | |
| gdc.oaire.keywords | Data Analytics | |
| gdc.oaire.keywords | Data models | |
| gdc.oaire.keywords | modeling | |
| gdc.oaire.keywords | Reliability | |
| gdc.oaire.keywords | Kernel Density Estimation | |
| gdc.oaire.keywords | Predictive models | |
| gdc.oaire.keywords | Kernel | |
| gdc.oaire.keywords | Hard Disk Systems | |
| gdc.oaire.keywords | Data analytics | |
| gdc.oaire.keywords | Probability density function | |
| gdc.oaire.keywords | kernel density estimation (KDE) | |
| gdc.oaire.keywords | Kernel density estimation (KDE) | |
| gdc.oaire.keywords | Data storage | |
| gdc.oaire.keywords | Estimation | |
| gdc.oaire.popularity | 4.7283937E-9 | |
| gdc.oaire.publicfunded | false | |
| gdc.oaire.sciencefields | 01 natural sciences | |
| gdc.oaire.sciencefields | 0101 mathematics | |
| gdc.oaire.views | 4 | |
| gdc.openalex.collaboration | International | |
| gdc.openalex.fwci | 0.67812976 | |
| gdc.openalex.normalizedpercentile | 0.71 | |
| gdc.opencitations.count | 3 | |
| gdc.plumx.mendeley | 3 | |
| gdc.plumx.scopuscites | 5 | |
| gdc.publishedmonth | June | |
| gdc.relation.journal | IEEE Transactions on Reliability | |
| gdc.scopus.citedcount | 5 | |
| gdc.virtual.author | Arslan, Şuayb Şefik | |
| gdc.wos.citedcount | 3 | |
| gdc.wos.collaboration | Conducted with international collaboration - YES | |
| gdc.wos.documenttype | Article | |
| gdc.wos.indexdate | 2021 | |
| gdc.wos.publishedmonth | June | |
| gdc.yokperiod | YÖK - 2020-21 | |
| relation.isAuthorOfPublication | 37152966-5384-4fd7-a0dc-34d1dd8bdc7f | |
| relation.isAuthorOfPublication.latestForDiscovery | 37152966-5384-4fd7-a0dc-34d1dd8bdc7f | |
| relation.isOrgUnitOfPublication | 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3 | |
| relation.isOrgUnitOfPublication | 0d54cd31-4133-46d5-b5cc-280b2c077ac3 | |
| relation.isOrgUnitOfPublication | a6e60d5c-b0c7-474a-b49b-284dc710c078 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3 |
Files
Original bundle
- Name: On the Distribution Modeling.pdf
- Size: 1.39 MB
- Format: Adobe Portable Document Format
- Description: Full Text - Article
License bundle
- Name: license.txt
- Size: 1.44 KB
- Format: Item-specific license agreed upon to submission
- Description:
