Cloud2HDD: Large-Scale HDD Data Analysis on Cloud for Cloud Datacenters

dc.contributor.author Zeydan, Engin
dc.contributor.author Arslan, Şefik Şuayb
dc.date.accessioned 2020-05-31T13:51:23Z
dc.date.available 2020-05-31T13:51:23Z
dc.date.issued 2020
dc.description.abstract The main focus of this paper is to develop a distributed, large-scale data analysis platform for the open-source data of the Backblaze cloud datacenter, which consists of operational hard disk drive (HDD) information collected over an observable period of 2272 days (over 74 months). To carefully analyze the intrinsic characteristics of hard disk behavior, we have exploited a large volume of data and the benefits of the Hadoop ecosystem as our big data processing engine. In other words, we have utilized a special distributed scheme on cloud for cloud HDD data, which we term Cloud2HDD. To classify the remaining lifetime of hard disk drives based on health indicators such as built-in S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) features, we used several state-of-the-art classification algorithms and compared their accuracy, precision, and recall rates. In addition, the importance of various S.M.A.R.T. features in predicting the true remaining lifetime of HDDs is identified. For instance, our analysis results indicate that the Random Forest Classifier (RFC) can yield up to 94% accuracy with the highest precision and recall in a reasonable time by classifying the remaining lifetime of drives into one of three classes, namely critical, high, and low ideal states, in comparison to other classification approaches based on a specific subset of S.M.A.R.T. features.
dc.description.sponsorship TÜBİTAK, MINECO
dc.identifier.citation Zeydan, E. & Arslan, S. S. (February 01, 2020). Cloud2HDD: large-scale HDD data analysis on cloud for cloud datacenters, 23rd Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN 2020), Paris, France, IEEE, Article number: 9059482, pp. 243-249, DOI: https://doi.org/10.1109/ICIN48450.2020.9059482
dc.identifier.doi 10.1109/ICIN48450.2020.9059482
dc.identifier.isbn 9781728151281
dc.identifier.isbn 9781728151274
dc.identifier.issn 2472-8144
dc.identifier.issn 2162-3414
dc.identifier.scopus 2-s2.0-85084061181
dc.identifier.uri https://hdl.handle.net/20.500.11779/1325
dc.identifier.uri https://doi.org/10.1109/ICIN48450.2020.9059482
dc.language.iso en
dc.publisher IEEE
dc.relation.ispartof 23rd Conference on Innovation in Clouds, Internet and Networks and Workshops = ICIN 2020
dc.rights info:eu-repo/semantics/closedAccess
dc.subject Lifetime
dc.subject Hadoop
dc.subject Cloud
dc.subject Machine learning
dc.subject Data center
dc.subject HDDs
dc.title Cloud2HDD: Large-Scale HDD Data Analysis on Cloud for Cloud Datacenters
dc.type Conference Object
dspace.entity.type Publication
gdc.author.id Şuayb Şefik Arslan / 0000-0003-3779-0731
gdc.author.id Şuayb Şefik Arslan / K-2883-2015
gdc.author.institutional Arslan, Şuayb Şefik
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.description.department Faculty of Engineering, Department of Computer Engineering
gdc.description.endpage 249
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member
gdc.description.scopusquality N/A
gdc.description.startpage 243
gdc.description.woscitationindex Conference Proceedings Citation Index - Science
gdc.description.wosquality N/A
gdc.identifier.openalex W3016192933
gdc.identifier.wos WOS:000569984100041
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.downloads 12
gdc.oaire.impulse 3.0
gdc.oaire.influence 2.7317726E-9
gdc.oaire.isgreen true
gdc.oaire.keywords lifetime
gdc.oaire.keywords machine learning
gdc.oaire.keywords Hadoop
gdc.oaire.keywords HDDs
gdc.oaire.keywords Cloud
gdc.oaire.popularity 5.3484066E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0211 other engineering and technologies
gdc.oaire.sciencefields 02 engineering and technology
gdc.oaire.sciencefields 0101 mathematics
gdc.oaire.sciencefields 01 natural sciences
gdc.oaire.views 4
gdc.openalex.collaboration International
gdc.openalex.fwci 1.13229457
gdc.openalex.normalizedpercentile 0.83
gdc.opencitations.count 4
gdc.plumx.crossrefcites 2
gdc.plumx.mendeley 6
gdc.plumx.scopuscites 7
gdc.publishedmonth February
gdc.scopus.citedcount 7
gdc.virtual.author Arslan, Şefik Şuayb
gdc.wos.citedcount 5
gdc.wos.documenttype Proceedings Paper
gdc.wos.indexdate 2020
gdc.wos.publishedmonth February
gdc.yokperiod YÖK - 2019-20
relation.isAuthorOfPublication 37152966-5384-4fd7-a0dc-34d1dd8bdc7f
relation.isAuthorOfPublication.latestForDiscovery 37152966-5384-4fd7-a0dc-34d1dd8bdc7f
relation.isOrgUnitOfPublication 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3
relation.isOrgUnitOfPublication 0d54cd31-4133-46d5-b5cc-280b2c077ac3
relation.isOrgUnitOfPublication a6e60d5c-b0c7-474a-b49b-284dc710c078
relation.isOrgUnitOfPublication.latestForDiscovery 05ffa8cd-2a88-4676-8d3b-fc30eba0b7f3
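
The abstract compares classifiers by accuracy, precision, and recall over three remaining-lifetime classes. As a rough illustration only, the sketch below shows how those three metrics can be computed for a three-class problem in plain Python. The label strings, the `evaluate` helper, and the sample labels are assumptions made for this example; they are not the paper's code, data, or results.

```python
from collections import Counter

# Label strings are assumptions taken from the class names in the abstract.
CLASSES = ("critical", "high", "low")


def evaluate(y_true, y_pred):
    """Return overall accuracy plus per-class precision and recall."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter(zip(y_true, y_pred))
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    report = {"accuracy": correct / len(y_true)}

    for c in CLASSES:
        true_pos = pairs[(c, c)]
        predicted = sum(n for (_, p), n in pairs.items() if p == c)
        actual = sum(n for (t, _), n in pairs.items() if t == c)
        report[c] = {
            # Precision: share of drives predicted as c that really are c.
            "precision": true_pos / predicted if predicted else 0.0,
            # Recall: share of drives that really are c and were found.
            "recall": true_pos / actual if actual else 0.0,
        }
    return report


# Made-up labels for six drives, purely to exercise the function.
y_true = ["critical", "critical", "high", "high", "low", "low"]
y_pred = ["critical", "high", "high", "low", "low", "low"]
report = evaluate(y_true, y_pred)
```

With these made-up labels, four of the six predictions match, and the per-class entries in `report` show how a classifier can have perfect precision on one class while missing half of its members.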

Files

Original bundle

Name: Şefik Şuayb ARSLAN.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format
Description: Full Text - Conference Proceeding

License bundle

Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission
Description: