Cloud2HDD: Large-Scale HDD Data Analysis on Cloud for Cloud Datacenters

Date

2020

Authors

Arslan, Şefik Şuayb

Journal Title

Journal ISSN

Volume Title

Publisher

IEEE

Open Access Color

Green Open Access

Yes

OpenAIRE Downloads

12

OpenAIRE Views

4

Publicly Funded

No
Impulse
Average
Influence
Average
Popularity
Top 10%

Research Projects

Journal Issue

Abstract

The main focus of this paper is to develop a distributed, large-scale data analysis platform for the open-source data of the Backblaze cloud datacenter, which consists of operational hard disk drive (HDD) information collected over an observable period of 2272 days (over 74 months). To carefully analyze the intrinsic characteristics of hard disk behavior, we have exploited a large volume of data and the benefits of the Hadoop ecosystem as our big data processing engine. In other words, we have utilized a special distributed scheme on cloud for cloud HDD data, which we term Cloud2HDD. To classify the remaining lifetime of hard disk drives based on health indicators such as built-in S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) features, we used several state-of-the-art classification algorithms and compared their accuracy, precision, and recall rates. In addition, the importance of various S.M.A.R.T. features in predicting the true remaining lifetime of HDDs is identified. For instance, our analysis results indicate that the Random Forest Classifier (RFC) can yield up to 94% accuracy with the highest precision and recall in a reasonable time by classifying the remaining lifetime of drives into one of three different classes, namely critical, high and low ideal states, in comparison to other classification approaches based on a specific subset of S.M.A.R.T. features.
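
The evaluation described in the abstract (three remaining-lifetime classes, compared by accuracy, precision, and recall) can be illustrated with a minimal sketch. This is not the authors' pipeline: the day thresholds, the mapping from remaining days to the labels "critical", "high", and "low", the function names, and the sample values are all hypothetical, chosen only to show how such metrics could be computed once each drive has a true and a predicted class.

```python
from collections import Counter

# Hypothetical class boundaries in days; the abstract does not state them.
CRITICAL_MAX_DAYS = 30
HIGH_MAX_DAYS = 180
LABELS = ("critical", "high", "low")


def lifetime_class(days_remaining):
    """Map a remaining lifetime in days to one of three class labels."""
    if days_remaining <= CRITICAL_MAX_DAYS:
        return "critical"
    if days_remaining <= HIGH_MAX_DAYS:
        return "high"
    return "low"


def evaluate(y_true, y_pred, labels=LABELS):
    """Return overall accuracy and per-class precision and recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    # Count every (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    accuracy = correct / len(y_true)
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        predicted = sum(n for (_, p), n in pairs.items() if p == label)
        actual = sum(n for (t, _), n in pairs.items() if t == label)
        report[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return accuracy, report


# Made-up remaining lifetimes (days) for six drives: ground truth vs. a
# classifier's estimate. These numbers are illustrative, not Backblaze data.
true_days = [5, 20, 90, 150, 400, 700]
pred_days = [10, 45, 100, 200, 365, 900]

y_true = [lifetime_class(d) for d in true_days]
y_pred = [lifetime_class(d) for d in pred_days]
accuracy, report = evaluate(y_true, y_pred)

print(f"accuracy = {accuracy:.3f}")
for label in LABELS:
    print(label, report[label])
```

Reporting precision and recall per class, rather than accuracy alone, matters for this kind of problem because drives close to failure are typically a small minority of the fleet, so a classifier can score high accuracy while missing most of them.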

Description

Keywords

Lifetime, Hadoop, Cloud, Machine learning, Data center, HDDs

Turkish CoHE Thesis Center URL

Fields of Science

0211 other engineering and technologies, 02 engineering and technology, 0101 mathematics, 01 natural sciences

Citation

Zeydan, E. & Arslan, S. S. (February 01, 2020). Cloud2HDD: large-scale HDD data analysis on cloud for cloud datacenters, 23rd Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN 2020), Paris, France, IEEE, Article number: 9059482, pp. 243-249, DOI: https://doi.org/10.1109/ICIN48450.2020.9059482

WoS Q

N/A

Scopus Q

N/A
OpenCitations Citation Count
4

Source

23rd Conference on Innovation in Clouds, Internet and Networks and Workshops = ICIN 2020

Volume

Issue

Start Page

243

End Page

249
PlumX Metrics
Citations

CrossRef : 2

Scopus : 7

Captures

Mendeley Readers : 6

SCOPUS™ Citations

7

checked on Feb 03, 2026

Web of Science™ Citations

5

checked on Feb 03, 2026

Page Views

187

checked on Feb 03, 2026

Downloads

28

checked on Feb 03, 2026

OpenAlex FWCI
1.13229457

Sustainable Development Goals

SDG data is not available