Cloud2HDD: Large-Scale HDD Data Analysis on Cloud for Cloud Datacenters

dc.contributor.author Zeydan, Engin
dc.contributor.author Arslan, Şuayb Şefik
dc.date.accessioned 2020-05-31T13:51:23Z
dc.date.available 2020-05-31T13:51:23Z
dc.date.issued 2020
dc.description.abstract The main focus of this paper is to develop a distributed large-scale data analysis platform for the open-source data of the Backblaze cloud datacenter, which consists of operational hard disk drive (HDD) information collected over an observable period of 2272 days (over 74 months). To carefully analyze the intrinsic characteristics of hard disk behavior, we have exploited a large volume of data and the benefits of the Hadoop ecosystem as our big data processing engine. In other words, we have utilized a special distributed scheme on cloud for cloud HDD data, which we term Cloud2HDD. To classify the remaining lifetime of hard disk drives based on health indicators such as the built-in S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) features, we used several state-of-the-art classification algorithms and compared their accuracy, precision, and recall rates. In addition, the importance of various S.M.A.R.T. features in predicting the true remaining lifetime of HDDs is identified. For instance, our analysis results indicate that, based on a specific subset of S.M.A.R.T. features, the Random Forest Classifier (RFC) can yield up to 94% accuracy with the highest precision and recall in a reasonable time, compared with the other classification approaches, by classifying the remaining lifetime of drives into one of three different classes, namely critical, high and low ideal states.
dc.description.sponsorship TÜBİTAK, MINECO
dc.description.sponsorship Generalitat de Catalunya; TÜBİTAK, (2232-115C111); Generalitat de Catalunya, (2017SGR1195); Ministerio de Economía y Competitividad, MINECO, (5G-REFINE, TEC2017-88373-R); Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, TÜBİTAK
dc.description.sponsorship This work was partially funded by The Scientific and Technological Research Council of Turkey (TUBITAK) under the grant number 2232-115C111, Spanish MINECO under the grant number TEC2017-88373-R (5G-REFINE) and by Generalitat de Catalunya under the grant number 2017SGR1195.
dc.description.sponsorship Scientific and Technological Research Council of Turkey (TUBITAK) [2232-115C111]; Spanish MINECO [TEC2017-88373-R]; Generalitat de Catalunya [2017SGR1195]
dc.identifier.citation Zeydan, E. & Arslan, S. S. (February 01, 2020). Cloud2HDD: large-scale HDD data analysis on cloud for cloud datacenters, 23rd Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN 2020), Paris, France, IEEE, Article number: 9059482, pp. 243-249, DOI: https://doi.org/10.1109/ICIN48450.2020.9059482
dc.identifier.doi 10.1109/ICIN48450.2020.9059482
dc.identifier.isbn 9781728151281
dc.identifier.isbn 9781728151274
dc.identifier.issn 2472-8144
dc.identifier.issn 2162-3414
dc.identifier.scopus 2-s2.0-85084061181
dc.identifier.uri https://hdl.handle.net/20.500.11779/1325
dc.identifier.uri https://doi.org/10.1109/ICIN48450.2020.9059482
dc.language.iso en
dc.publisher IEEE
dc.relation.ispartof 23rd Conference on Innovation in Clouds, Internet and Networks and Workshops = ICIN 2020
dc.relation.ispartofseries Conference on Innovations in Clouds Internet and Networks
dc.rights info:eu-repo/semantics/closedAccess
dc.subject Lifetime
dc.subject Hadoop
dc.subject Cloud
dc.subject Machine learning
dc.subject Data center
dc.subject HDDs
dc.title Cloud2HDD: Large-Scale HDD Data Analysis on Cloud for Cloud Datacenters
dc.type Conference Object
dspace.entity.type Publication
gdc.author.id Şuayb Şefik Arslan / 0000-0003-3779-0731
gdc.author.id Şuayb Şefik Arslan / K-2883-2015
gdc.author.id Arslan, Suayb/0000-0003-3779-0731
gdc.author.institutional Arslan, Şuayb Şefik
gdc.author.scopusid 24315322700
gdc.author.scopusid 35955672100
gdc.author.wosid Zeydan, Engin/AAI-2467-2019
gdc.author.wosid Arslan, Suayb/K-2883-2015
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.department Faculty of Engineering, Department of Computer Engineering
gdc.description.departmenttemp [Zeydan, Engin] Ctr Technol Telecomunicac Catalunya, Barcelona 08860, Spain; [Arslan, Suayb S.] MEF Univ, Dept Comp Engn, TR-34912 Istanbul, Turkey
gdc.description.endpage 249
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member
gdc.description.scopusquality N/A
gdc.description.startpage 243
gdc.description.woscitationindex Conference Proceedings Citation Index - Science
gdc.description.wosquality N/A
gdc.identifier.openalex W3016192933
gdc.identifier.wos WOS:000569984100041
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.downloads 12
gdc.oaire.impulse 3.0
gdc.oaire.influence 2.7317726E-9
gdc.oaire.isgreen true
gdc.oaire.keywords lifetime
gdc.oaire.keywords machine learning
gdc.oaire.keywords Hadoop
gdc.oaire.keywords HDDs
gdc.oaire.keywords Cloud
gdc.oaire.popularity 5.3484066E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0211 other engineering and technologies
gdc.oaire.sciencefields 02 engineering and technology
gdc.oaire.sciencefields 0101 mathematics
gdc.oaire.sciencefields 01 natural sciences
gdc.oaire.views 4
gdc.openalex.collaboration International
gdc.openalex.fwci 1.0676
gdc.openalex.normalizedpercentile 0.83
gdc.opencitations.count 4
gdc.plumx.crossrefcites 2
gdc.plumx.mendeley 6
gdc.plumx.scopuscites 7
gdc.publishedmonth February
gdc.scopus.citedcount 7
gdc.virtual.author Arslan, Şefik Şuayb
gdc.wos.citedcount 5
gdc.wos.documenttype Proceedings Paper
gdc.wos.indexdate 2020
gdc.wos.publishedmonth February
gdc.yokperiod YÖK - 2019-20
relation.isAuthorOfPublication 37152966-5384-4fd7-a0dc-34d1dd8bdc7f
relation.isAuthorOfPublication.latestForDiscovery 37152966-5384-4fd7-a0dc-34d1dd8bdc7f
relation.isOrgUnitOfPublication 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3
relation.isOrgUnitOfPublication 0d54cd31-4133-46d5-b5cc-280b2c077ac3
relation.isOrgUnitOfPublication a6e60d5c-b0c7-474a-b49b-284dc710c078
relation.isOrgUnitOfPublication.latestForDiscovery 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3

Files

Original bundle

Name: Şefik Şuayb ARSLAN.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format
Description: Full Text - Conference Proceeding