Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/20.500.11779/1926
Browsing Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection by Institution Author "Arslan, Şuayb Şefik"
Now showing 1 - 20 of 33
Article | Citation - WoS: 5 | Citation - Scopus: 8
A Data-Assisted Reliability Model for Carrier-Assisted Cold Data Storage Systems (Elsevier, 2020)
Arslan, Şuayb Şefik; Göker, Turguy; Peng, James
Cold data storage systems are used to enable long-term digital preservation of institutional archives. The common functionality among cold and warm/hot data storage is that the data is stored on some physical medium for read-back at a later time. However, in cold storage, write and read operations are not necessarily performed in the same geographical location; hence, third-party assistance is typically utilized to bring together the medium and the drive. On the other hand, reliability modeling of such a decomposed system poses several challenges that do not necessarily exist in warm/hot storage alternatives, such as fault detection and absence of the carrier, all of which add up to data unavailability issues. In this paper, we propose a generalized non-homogeneous Markov model that encompasses the aging of the carriers in order to address the requirements of today's cold data storage systems, in which the data is encoded and spread across multiple nodes for long-term data retention. We derive useful lower/upper bounds on the overall system availability. Furthermore, collected field data is used to estimate the parameters of a Weibull distribution to accurately predict the lifetime of the carriers in an example scale-out setting.

Conference Object | Citation - WoS: 3 | Citation - Scopus: 3
A Joint Dedupe-Fountain Coded Archival Storage (2017)
Arslan, Şuayb Şefik; Göker, Turguy; Wideman, Rod
An erasure-coded archival file storage system is presented using a chunk-based deduplication mechanism and fountain codes for space/time-efficient operation. Unlike traditional archival storage, this proposal considers the deduplication operation together with correction coding in order to provide a reliable storage solution.
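As a rough illustration of the chunk-based deduplication building block used in this entry, here is a minimal sketch with a fixed (hypothetical) chunk size; the paper's actual pipeline additionally interleaves fountain coding and deliberately weaker hashing, which is not reproduced here:

```python
import hashlib

def dedupe_chunks(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and store each unique chunk once.
    Returns (store, recipe): the recipe lists chunk hashes in stream order,
    so the original byte stream can be reassembled from the store."""
    store = {}    # hash -> chunk bytes (unique chunks only)
    recipe = []   # ordered hashes describing the original stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)   # duplicate chunks are stored only once
        recipe.append(h)
    return store, recipe

def reassemble(store, recipe) -> bytes:
    """Rebuild the original stream by looking up each hash in the store."""
    return b"".join(store[h] for h in recipe)
```

With repetitive input, `len(store)` is much smaller than `len(recipe)`, which is where the space savings come from; a shorter (weaker) hash shrinks the in-memory index further at the price of collisions, which the entry above proposes to absorb with correction coding.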
The building blocks of the deduplication and fountain coding processes are judiciously interleaved to present two novel ideas: reducing the memory footprint with weaker hashing while dealing with the increased collisions through correction coding, and applying unequal error protection to deduplicated chunks for increased availability. The combination of these two ideas makes the performance of the proposed system stand out; for example, it is shown to outperform both replication-based and RAID data protection schemes. The proposed system also addresses some of the fundamental challenges of today's low-cost deduplicated data storage systems, such as hash collisions, the disk bottleneck, and RAM overflow, securing savings of up to 90% of regular RAM use.

Article
A New Benchmark Dataset for P300 Erp-Based Bci Applications (Academic Press Inc Elsevier Science, 2023)
Çakar, Tuna; Özkan, Hüseyin; Musellim, Serkan; Arslan, Suayb S.; Yağan, Mehmet; Alp, Nihan
Because of its non-invasive nature, one of the most commonly used event-related potentials in brain-computer interface (BCI) system designs is the P300 electroencephalogram (EEG) signal. The fact that the P300 response can easily be stimulated and measured is particularly important for participants with severe motor disabilities. In order to train and test P300-based BCI speller systems in more realistic high-speed settings, there is a pressing need for a large and challenging benchmark dataset. Various datasets already exist in the literature, but most of them are not publicly available, and they either have a limited number of participants or utilize relatively long stimulus durations (SD) and inter-stimulus intervals (ISI). They are also typically based on a 36-target (6 x 6) character matrix. The use of long ISI, in particular, not only reduces the speed and the information transfer rates (ITRs) but also oversimplifies P300 detection.
This leaves limited challenge for state-of-the-art machine learning and signal processing algorithms; in fact, near-perfect P300 classification accuracies are reported on the existing datasets. Therefore, a large-scale dataset with challenging settings is needed to fully exploit the recent advancements in algorithm design (machine learning and signal processing) and achieve high-performance speller results. To this end, in this article we introduce a new freely and publicly accessible P300 dataset obtained using 32-channel EEG, in the hope that it will lead to new research findings and eventually more efficient BCI designs. The introduced dataset comprises 18 participants performing a 40-target (5 x 8) cued-spelling task, with reduced SD (66.6 ms) and ISI (33.3 ms) for fast spelling. We have also processed, analyzed, and character-classified the introduced dataset, and we present the accuracy and ITR results as a benchmark. The introduced dataset and the codes of our experiments are publicly accessible at https://data.mendeley.com/datasets/vyczny2r4w.

Article | Citation - WoS: 6 | Citation - Scopus: 7
A Reliability Model for Dependent and Distributed Mds Disk Array Units (IEEE Transactions on Reliability, 2018)
Arslan, Şuayb Şefik
Archiving and systematic backup of large digital data creates rapid demand for multi-petabyte scale storage systems. As drive capacities continue to grow beyond the few-terabytes range to address the demands of today's cloud, the likelihood of multiple/simultaneous disk failures has become a reality. Among the main factors causing catastrophic system failures, correlated disk failures and network bandwidth are reported to be two common sources of performance degradation. The emerging trend is to use efficient/sophisticated erasure codes (EC) equipped with multiple parities and efficient repairs in order to meet the reliability/bandwidth requirements.
It is known that mean time to failure and repair rates reported by the disk manufacturers cannot capture life-cycle patterns of distributed storage systems. In this study, we develop failure models based on generalized Markov chains that can accurately capture correlated performance degradations with multiparity protection schemes based on modern maximum distance separable EC. Furthermore, we use the proposed model in a distributed storage scenario to quantify two example use cases: Primarily, the common sense that adding more parity disks are only meaningful if we have a decent decorrelation between the failure domains of storage systems and the reliability of generic multiple single-dimensional EC protected storage systems.Conference Object Citation - Scopus: 2A Visualization Platfom for Disk Failure Analysis(IEEE, 2018) Arslan, Şuayb Şefik; Yiğit, İbrahim Onuralp; Zeydan, EnginIt has become a norm rather than an exception to observe multiple disks malfunctioning or whole disk failures in places like big data centers where thousands of drives operate simultaneously. Data that resides on these devices is typically protected by replication or erasure coding for long-term durable storage. However, to be able to optimize data protection methods, real life disk failure trends need to be modeled. Modelling helps us build insights while in the design phase and properly optimize protection methods for a given application. In this study, we developed a visualization platform in light of disk failure data provided by BackBlaze, and extracted useful statistical information such as failure rate and model-based time to failure distributions. 
Finally, simple modeling is performed for disk failure prediction in order to raise alarms and take necessary system-wide precautions.

Conference Object | Citation - WoS: 1 | Citation - Scopus: 1
Adaptive Boosting of Dnn Ensembles for Brain-Computer Interface Spellers (IEEE, 2021)
Çatak, Yiğit; Aksoy, Can; Özkan, Hüseyin; Güney, Osman Berke; Koç, Emirhan; Arslan, Şuayb Şefik
Steady-state visual evoked potentials (SSVEP) are commonly used in brain-computer interface (BCI) applications such as spelling systems, due to their advantages over other paradigms. In this study, we develop a method for SSVEP-based BCI speller systems using a known deep neural network (DNN), which includes transfer and ensemble learning techniques. We test the performance of our method on the publicly available benchmark and BETA datasets with a leave-one-subject-out procedure. Our method consists of two stages. In the first stage, a global DNN is trained using data from all subjects except the one excluded for testing. In the second stage, the global model is fine-tuned to each subject whose data are used in training. Combining the responses of the trained DNNs with different weights for each test subject, rather than equal weights, provides better performance, as brain signals may differ significantly between individuals. To this end, the weights of the DNNs are learned with the SAMME algorithm using data belonging to the test subject. Our method significantly outperforms the canonical correlation analysis (CCA) and filter bank canonical correlation analysis (FBCCA) methods.

Article | Citation - WoS: 13 | Citation - Scopus: 21
Advancements in Distributed Ledger Technology for Internet of Things (Elsevier, 2020)
Jurdak, Raja; Arslan, Şuayb Şefik; Krishnamachari, Bhaskar; Jelitto, Jens
The Internet of Things (IoT) is paving the way for different kinds of devices to be connected and to communicate properly at a mass scale.
However, conventional mechanisms used to sustain security and privacy cannot be directly applied to IoT, whose topology is increasingly becoming decentralized. Distributed Ledger Technologies (DLT), on the other hand, comprise varying forms of decentralized data structures that provide immutability by cryptographically linking blocks of data. To build reliable, autonomous, and trusted IoT platforms, DLT has the potential to provide security, privacy, and decentralized operation while adhering to the limitations of IoT devices. The marriage of IoT and DLT technology is not very recent; in fact, many projects have focused on this interesting combination to address the challenges of smart cities, smart grids, the internet of everything, and other decentralized applications, most based on blockchain structures. In this special issue, the focus is on the new and broader technical problems associated with DLT-based security and backend platform solutions for IoT devices and applications.

Conference Object | Citation - WoS: 15 | Citation - Scopus: 40
An Overview of Blockchain Technologies: Principles, Opportunities and Challenges (IEEE, 2018)
Arslan, Şuayb Şefik; Mermer, Gültekin Berahan; Zeydan, Engin
Blockchain is a recently emerged technology with the potential to revolutionize the way our society communicates and conducts commerce. The most important advantage this technology provides is the ability to exchange value-bearing transactions without the need for a trusted central institution in settings that would otherwise require an intermediary. It can also provide data integrity, built-in authenticity, and user transparency. Blockchain can be seen as the new internet on which many innovative applications will be built.
In this work, we present an overview of current blockchain technologies, covering their general working principles, the opportunities they create, and the challenges that may be encountered in the future.

Article | Citation - WoS: 3 | Citation - Scopus: 3
Array Bp-Xor Codes for Hierarchically Distributed Matrix Multiplication (IEEE, 2021)
Arslan, Şuayb Şefik
A novel fault-tolerant computation technique based on array Belief Propagation (BP)-decodable XOR (BP-XOR) codes is proposed for distributed matrix-matrix multiplication. The proposed scheme is shown to be configurable and suited to modern hierarchical compute architectures such as Graphical Processing Units (GPUs) equipped with multiple nodes, each of which has many small independent processing units with increased core-to-core communication. The proposed scheme is shown to outperform a few of the well-known earlier strategies in terms of total end-to-end execution time in the presence of slow nodes, called stragglers. This performance advantage is due to the careful design of array codes, which distributes the encoding operation over the cluster (slave) nodes at the expense of increased master-slave communication. An interesting trade-off between end-to-end latency and total communication cost is precisely described. In addition, to address the identified problem of scaling stragglers, an asymptotic version of array BP-XOR codes based on projective geometry is proposed at the expense of some computation overhead.
A thorough latency analysis is conducted for all schemes to demonstrate that the proposed scheme achieves order-optimal computation, in both the sublinear and the linear regimes in the size of the computed product, from an end-to-end delay perspective.

Conference Object | Citation - WoS: 3 | Citation - Scopus: 3
Asymptotically Mds Array Bp-Xor Codes (2018)
Arslan, Şuayb Şefik
Belief propagation (BP) on binary erasure channels (BEC) is a low-complexity decoding algorithm that allows the recovery of message symbols through a bipartite graph pruning process. Recently, array XOR codes have attracted attention for storage systems due to their burst error recovery performance and easy arithmetic based on exclusive OR (XOR)-only logic operations. Array BP-XOR codes are a subclass of array XOR codes that can be decoded using BP under BEC. Requiring BP-decodability in addition to the Maximum Distance Separability (MDS) constraint in the code construction process is observed to put an upper bound on the achievable code block length, which makes code construction a hard problem. In this study, we introduce asymptotically MDS array BP-XOR codes as an alternative to exact MDS array BP-XOR codes, allowing easier code constructions while keeping the decoding complexity low with an asymptotically vanishing coding overhead. We finally provide a code construction method based on discrete geometry that fulfills the requirements of the class of asymptotically MDS array BP-XOR codes.

Conference Object | Citation - WoS: 2 | Citation - Scopus: 1
Average Bandwidth-Cost Vs. Storage Trade-Off for Bs-Assisted Distributed Storage Networks (IEEE, 2021)
Tengiz, Ayse Ceyda; Haytaoğlu, Elif; Pusane, Ali Emre; Arslan, Şuayb Şefik; Pourmandi, Massoud
In this study, we consider a hierarchically structured base station (BS)-assisted cellular system equipped with backend distributed data storage, in which nodes randomly arrive at and depart the cell.
We numerically motivate and characterize the fundamental trade-off between average repair bandwidth cost and storage space when BS communication costs (higher than local costs) and link capacity constraints exist and the number of failed nodes can vary dynamically. We establish the capacity region most relevant to 5G and beyond networks, which are layered by design. We hope that this study will motivate novel regenerating code constructions able to achieve the presented limits.

Conference Object | Citation - WoS: 1 | Citation - Scopus: 1
Base Station-Assisted Cooperative Network Coding for Cellular Systems With Link Constraints (IEEE, 2022)
Arslan, Suayb S.; Pourmandi, Massoud; Haytaoglu, Elif
We consider a novel distributed data storage/caching scenario in a cellular network where multiple nodes may fail or depart simultaneously. To meet reliability requirements, we allow cooperative regeneration of lost nodes with the help of base stations allocated in a set of hierarchical layers. Due to this layered structure, a symbol download from each base station has a different cost, while the link capacities between the nodes of the cellular system and the base stations are also constrained. Under such a setting, we formulate the fundamental trade-off, with closed-form expressions, between repair bandwidth cost and storage space per node. In particular, the minimum storage and the minimum bandwidth cost points are formulated.
Finally, we provide an explicit optimal code construction for the minimum storage regeneration point for a special set of system parameters.

Conference Object | Citation - WoS: 5 | Citation - Scopus: 7
Cloud2hdd: Large-Scale Hdd Data Analysis on Cloud for Cloud Datacenters (IEEE, 2020)
Zeydan, Engin; Arslan, Şefik Şuayb
The main focus of this paper is to develop a distributed large-scale data analysis platform for the open-source data of the Backblaze cloud datacenter, which consists of operational hard disk drive (HDD) information collected over an observation period of 2272 days (over 74 months). To carefully analyze the intrinsic characteristics of hard disk behavior, we have exploited a large volume of data and the benefits of the Hadoop ecosystem as our big data processing engine. In other words, we have utilized a special distributed scheme on cloud for cloud HDD data, termed Cloud2HDD. To classify the remaining lifetime of hard disk drives based on health indicators such as the in-built S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) features, we used several state-of-the-art classification algorithms and compared their accuracy, precision, and recall rates. In addition, the importance of various S.M.A.R.T. features in predicting the true remaining lifetime of HDDs is identified. For instance, our analysis results indicate that the Random Forest Classifier (RFC) can yield up to 94% accuracy with the highest precision and recall in reasonable time by classifying the remaining lifetime of drives into one of three classes, namely critical, high, and low ideal states, in comparison to other classification approaches based on a specific subset of S.M.A.R.T. features.

Article | Citation - WoS: 13 | Citation - Scopus: 18
Compress-Store on Blockchain: a Decentralized Data Processing and Immutable Storage for Multimedia Streaming (Springer, 2022)
Arslan, Şuayb Şefik; Göker, Turguy
Decentralization of data storage is a challenging problem for blockchain-based solutions, as the block size plays a key role in scalability. In addition, the specific requirements of multimedia data call for various changes to blockchain technology internals. Considering one of the most popular applications of secure multimedia streaming, i.e., video surveillance, it is not clear how to judiciously encode incentivization, immutability, and compression into a viable ecosystem. In this study, we provide a genuine scheme that achieves this encoding for a video surveillance application. The proposed scheme provides a novel integration of data compression and immutable off-chain data storage using a new consensus protocol, namely Proof-of-Work Store (PoWS), in order to enable fully useful work to be performed by the miner nodes of the network. The proposed idea is a first step towards a greener application of a blockchain-based environment to the video storage business that utilizes system resources efficiently.

Conference Object | Citation - WoS: 4 | Citation - Scopus: 4
Cost of Guessing: Applications To Data Repair (Institute of Electrical and Electronics Engineers Inc., 2020)
Arslan, Şuayb Şefik; Haytaoğlu, Elif
In this paper, we introduce the notion of the cost of guessing and provide an optimal strategy for guessing a random variable taking values in a finite set, whereby each choice may be associated with a positive finite cost value. Moreover, we derive asymptotically tight upper and lower bounds on the moments of the cost of guessing.
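To make the cost-of-guessing notion concrete: when guess i succeeds with probability p_i and costs c_i, minimizing the expected accumulated cost is an instance of weighted-completion-time scheduling, for which ordering guesses by decreasing p_i/c_i (Smith's rule) is optimal. The sketch below brute-force-checks this on hypothetical values; the paper's exact framework and bounds are more general and are not reproduced here:

```python
from itertools import permutations

def expected_cost(order, p, c):
    """Expected accumulated guessing cost for a given guess order: when the
    k-th guess is the correct one (prob p[order[k]]), the costs of all
    guesses up to and including it have been paid."""
    total, paid = 0.0, 0.0
    for idx in order:
        paid += c[idx]
        total += p[idx] * paid
    return total

def smith_order(p, c):
    """Order guesses by decreasing p/c ratio (Smith's rule)."""
    return sorted(range(len(p)), key=lambda i: -p[i] / c[i])

p = [0.5, 0.3, 0.2]   # hypothetical success probabilities
c = [2.0, 1.0, 4.0]   # hypothetical per-guess costs
best = min(expected_cost(o, p, c) for o in permutations(range(3)))
assert abs(expected_cost(smith_order(p, c), p, c) - best) < 1e-12
```

Note that the cheapest-first or most-likely-first orders are both suboptimal here; only the probability-to-cost ratio matters.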
Similar to previous studies on the standard guesswork, the established bounds on the moments quantify the accumulated cost of the guesses required to correctly identify the unknown choice, and are expressed in terms of Rényi entropy. A new random variable is introduced to bridge the cost of guessing and the standard guesswork and to establish the guessing cost exponent on the moments of optimal guessing. Furthermore, these bounds are shown to be quite useful for finding the repair latency cost in distributed data storage where sparse graph codes may be utilized.

Conference Object | Citation - WoS: 3 | Citation - Scopus: 3
Data Repair in Bs-Assisted Distributed Data Caching (IEEE, 2020)
Kaya, Erdi; Haytaoğlu, Elif; Arslan, Şuayb Şefik
In this paper, centralized and independent repair approaches based on device-to-device communication are investigated for the repair of lost nodes in a cellular network that applies distributed caching, with fault tolerance provided by erasure codes. Caching mechanisms based on Reed-Solomon codes and minimum bandwidth regenerating codes are adopted. The proposed approaches are analyzed in a simulation environment in terms of base station utilization load during the repair process. Based on the intuitive assumption that base station communication is usually more costly than device-to-device communication, the centralized repair approach demonstrates better performance than the independent repair approaches in terms of the number of symbols retrieved from the base station.
On the other hand, the centralized approach does not achieve a dramatic reduction in the number of symbols downloaded from the other devices.

Article | Citation - WoS: 4 | Citation - Scopus: 5
Data Repair-Efficient Fault Tolerance for Cellular Networks Using Ldpc Codes (IEEE, 2021)
Haytaoglu, Elif; Kaya, Erdi; Arslan, Şuayb Şefik
Base station-mobile device communication traffic has dramatically increased recently due to mobile data, which in turn heavily overloads the underlying infrastructure. To decrease Base Station (BS) interaction, intra-cell communication between local devices, known as Device-to-Device communication, is utilized for distributed data caching. Nevertheless, due to the continuous departure of existing nodes and the arrival of newcomers, missing cached data may lead to permanent data loss. In this study, we propose and analyze a class of LDPC codes for distributed data caching in cellular networks. Contrary to traditional distributed storage, a novel repair algorithm for LDPC codes is proposed, designed to exploit minimal direct BS communication. To assess the versatility of LDPC codes and establish performance comparisons with classic coding techniques, novel theoretical and experimental evaluations are derived. Essentially, theoretical/numerical results for the repair bandwidth cost in the presence of a BS are presented in a distributed caching setting.
Accordingly, when the gap between the cost of downloading a symbol from the BS and from other local network nodes is not dramatically high, we demonstrate that LDPC codes can be considered a viable fault-tolerance alternative in cellular systems with caching capabilities, for both low and high code rates.

Conference Object | Citation - WoS: 2 | Citation - Scopus: 2
Distributed Matrix Multiplication With Mds Array Bp-Xor Codes for Scaling Clusters (IEEE, 2019)
Arslan, Şuayb Şefik
This study presents a novel coded computation technique for distributed matrix-matrix product computation at a massive scale that outperforms well-known previous strategies in terms of total execution time. Our method achieves this performance by distributing the encoding operation over the cluster (slave) nodes at the expense of increased master-slave communication. The product computation is performed using MDS array Belief Propagation (BP)-decodable codes based on pure XOR operations. In addition, our scheme is configurable and suited to modern compute node architectures equipped with multiple processing units organized in a hierarchical manner. Assuming the number of backup nodes is sublinear in the size of the product, we demonstrate that the proposed scheme achieves order-optimal computation from an end-to-end latency perspective while ensuring acceptable communication requirements that can be met by today's high-speed network link infrastructures.

Article | Citation - WoS: 3 | Citation - Scopus: 3
Exact Construction of Bs-Assisted Mscr Codes With Link Constraints (IEEE Communications Letters, 2021)
Arslan, Şuayb Şefik
It is clear that 5G network resources will be consumed by heavy data traffic owing to increased mobility, slicing, and layered/distributed storage system architectures. The problem is elevated when multiple node failures must be repaired to meet service quality requirements.
Typical approaches include individual or cooperative data regeneration to efficiently utilize the available bandwidth. It is observed that storage systems of 5G and beyond technologies shall have a multi-layer architecture in which base stations (BS) are present. Moreover, communication with each layer is subject to various communication costs and link constraints. Under limited BS assistance and cooperation, the trade-off between storage per node and communication bandwidth has been established. In this trade-off, two operating points, namely minimum storage and minimum bandwidth regeneration, are particularly important. In this study, we first identify the optimal number of BS uses at the minimum storage regeneration point. An explicit code construction is subsequently provided for exact minimum storage regeneration, whereby each layer may help the repair process subject to a communication link constraint.

Article | Citation - WoS: 5 | Citation - Scopus: 7
Founsure 1.0: an Erasure Code Library With Efficient Repair and Update Features (Elsevier, 2021)
Arslan, Şuayb Şefik
Founsure is an open-source software library that implements multi-dimensional graph-based erasure coding entirely based on fast exclusive-OR (XOR) logic. Its implementation utilizes compiler optimizations and multi-threading to generate the right assembly code for a given multi-core CPU architecture with vector processing capabilities. Founsure possesses important features that shall find various applications in modern data storage, communication, and networked computer systems in which data needs protection against device, hardware, and node failures. As data sizes have reached unprecedented levels, these systems have become hungry for network bandwidth, computational resources, and power.
To address this, the proposed library provides a three-dimensional design space that trades off computational complexity, coding overhead, and data/node repair bandwidth to meet the different requirements of modern distributed data storage and processing systems. The Founsure library enables efficient encoding, decoding, repairs/rebuilds, and updates, while all required data storage and computations are distributed across the network nodes.
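Founsure's internals are considerably more elaborate (multi-dimensional graph-based codes with configurable repair and update behavior), but the XOR-only arithmetic it builds on can be illustrated with a single-parity toy erasure code. This is a sketch for intuition, not the library's API:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(shards):
    """Append one parity shard equal to the XOR of all data shards,
    tolerating the loss of any single shard."""
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]

def repair(coded, lost_index):
    """Rebuild any single lost shard by XOR-ing all the survivors."""
    survivors = [s for i, s in enumerate(coded) if i != lost_index]
    rebuilt = survivors[0]
    for s in survivors[1:]:
        rebuilt = xor_bytes(rebuilt, s)
    return rebuilt

data = [b"\x01\x02", b"\x0f\x00", b"\x10\xff"]
coded = encode(data)
assert repair(coded, 1) == data[1]   # a lost data shard is recovered
```

Because XOR is its own inverse, the same survivor-XOR operation rebuilds a lost data shard or the parity shard; libraries like the one above in the entry layer many such parity relations over a graph structure to tolerate multiple failures and localize repairs.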

