Bilgisayar Mühendisliği Bölümü Koleksiyonu (Computer Engineering Department Collection)
Permanent URI for this collection: https://hdl.handle.net/20.500.11779/1940
Browsing Bilgisayar Mühendisliği Bölümü Koleksiyonu by Institution Author "Arslan, Şuayb Şefik"
Now showing 1 - 20 of 49
Article | Citation - WoS: 5 | Citation - Scopus: 8
A Data-Assisted Reliability Model for Carrier-Assisted Cold Data Storage Systems (Elsevier, 2020)
Arslan, Şuayb Şefik; Göker, Turguy; Peng, James
Cold data storage systems are used to allow long-term digital preservation of institutional archives. The common functionality among cold and warm/hot data storage is that the data is stored on some physical medium for read-back at a later time. However, in cold storage, write and read operations are not necessarily performed in the same geographical location, so third-party assistance is typically used to bring the medium and the drive together. The reliability modeling of such a decomposed system poses several challenges that do not necessarily exist in other warm/hot storage alternatives, such as fault detection and absence of the carrier, all of which contribute to data-unavailability issues. In this paper, we propose a generalized non-homogeneous Markov model that encompasses the aging of the carriers in order to address the requirements of today's cold data storage systems, in which the data is encoded and spread across multiple nodes for long-term retention. We derive useful lower/upper bounds on the overall system availability. Furthermore, collected field data is used to estimate the parameters of a Weibull distribution to accurately predict the lifetime of the carriers in an example scale-out setting.
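The field data behind the carrier-lifetime estimate above is not public, but the estimation step itself is standard. Below is a minimal sketch of a maximum-likelihood Weibull fit on synthetic lifetimes standing in for the field measurements; the sample values and the five-year survival query are illustrative assumptions, not figures from the paper.

```python
# Minimal sketch: maximum-likelihood Weibull fit for carrier lifetimes.
# The failure times below are synthetic stand-ins for the paper's field data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
failure_hours = rng.weibull(1.5, size=500) * 40_000  # hypothetical lifetimes (hours)

# Two-parameter Weibull fit: fix the location parameter at zero.
shape, loc, scale = stats.weibull_min.fit(failure_hours, floc=0)
print(f"shape (beta) = {shape:.3f}, scale (eta) = {scale:.0f} hours")
# beta > 1 indicates wear-out (aging), the regime a carrier-aging model targets.

# Estimated probability that a carrier survives past 5 years (~43,800 hours).
print(f"P(lifetime > 5 years) = {stats.weibull_min.sf(43_800, shape, loc, scale):.3f}")
```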
Conference Object | Citation - WoS: 3 | Citation - Scopus: 3
A Joint Dedupe-Fountain Coded Archival Storage (2017)
Arslan, Şuayb Şefik; Göker, Turguy; Wideman, Rod
An erasure-coded archival file storage system is presented using a chunk-based deduplication mechanism and fountain codes for space/time-efficient operation. Unlike traditional archival storage, this proposal considers the deduplication operation together with correction coding in order to provide a reliable storage solution. The building blocks of the deduplication and fountain coding processes are judiciously interleaved to present two novel ideas: reducing the memory footprint with weaker hashing while dealing with the increased collisions using correction coding, and applying unequal error protection to deduplicated chunks for increased availability. The combination of these two ideas makes the performance of the proposed system stand out; for example, it is shown to outperform replication-based as well as RAID data protection schemes. The proposed system also addresses some of the fundamental challenges of today's low-cost deduplicated data storage systems, such as hash collisions, disk bottlenecks, and RAM overflow, securing savings of up to 90% of regular RAM use.

Article
A New Benchmark Dataset for P300 ERP-Based BCI Applications (Academic Press Inc Elsevier Science, 2023)
Çakar, Tuna; Özkan, Hüseyin; Musellim, Serkan; Arslan, Suayb S.; Yağan, Mehmet; Alp, Nihan
The P300 electroencephalogram (EEG) signal is one of the most commonly used event-related potentials in brain-computer interface (BCI) system designs because of its non-invasive nature. The fact that the P300 response can easily be stimulated and measured is particularly important for participants with severe motor disabilities. In order to train and test P300-based BCI speller systems in more realistic high-speed settings, there is a pressing need for a large and challenging benchmark dataset. Various datasets already exist in the literature, but most of them are not publicly available, and they either have a limited number of participants or use relatively long stimulus durations (SD) and inter-stimulus intervals (ISI). They are also typically based on a 36-target (6 x 6) character matrix. The use of long ISI, in particular, not only reduces the speed and the information transfer rates (ITRs) but also oversimplifies P300 detection, leaving limited challenge to state-of-the-art machine learning and signal processing algorithms; in fact, near-perfect P300 classification accuracies are reported on the existing datasets. A large-scale dataset with challenging settings is therefore needed to fully exploit recent advances in algorithm design (machine learning and signal processing) and achieve high-performance speller results. To this end, in this article we introduce a new freely and publicly accessible P300 dataset obtained using 32-channel EEG, in the hope that it will lead to new research findings and eventually more efficient BCI designs. The introduced dataset comprises 18 participants performing a 40-target (5 x 8) cued-spelling task, with reduced SD (66.6 ms) and ISI (33.3 ms) for fast spelling. We have also processed, analyzed, and character-classified the introduced dataset, and we present the accuracy and ITR results as a benchmark. The dataset and the codes of our experiments are publicly accessible at https://data.mendeley.com/datasets/vyczny2r4w.

Article | Citation - WoS: 6 | Citation - Scopus: 7
A Reliability Model for Dependent and Distributed MDS Disk Array Units (IEEE Transactions on Reliability, 2018)
Arslan, Şuayb Şefik
Archiving and systematic backup of large digital data generates rapidly growing demand for multi-petabyte-scale storage systems. As drive capacities continue to grow beyond the few-terabytes range to address the demands of today's cloud, multiple simultaneous disk failures have become a reality. Among the main factors causing catastrophic system failures, correlated disk failures and network bandwidth are reported to be two common sources of performance degradation. The emerging trend is to use efficient, sophisticated erasure codes (EC) equipped with multiple parities and efficient repairs in order to meet the reliability/bandwidth requirements. It is known that the mean time to failure and repair rates reported by disk manufacturers cannot capture the life-cycle patterns of distributed storage systems. In this study, we develop failure models based on generalized Markov chains that can accurately capture correlated performance degradations with multi-parity protection schemes based on modern maximum distance separable (MDS) EC. Furthermore, we use the proposed model in a distributed storage scenario to quantify two example use cases: the common-sense observation that adding more parity disks is only meaningful if there is sufficient decorrelation between the failure domains of the storage system, and the reliability of generic multiple single-dimensional EC-protected storage systems.
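The generalized Markov model in the MDS disk-array entry above captures correlated failures; for orientation, the following sketch computes mean time to data loss (MTTDL) for the simpler baseline such models generalize: a birth-death chain over an (n, k) MDS disk group with independent, memoryless failures and repairs. The parallel-repair assumption and the numeric rates are illustrative, not taken from the paper.

```python
# Minimal sketch: MTTDL of an (n, k) MDS-protected disk group under a
# memoryless birth-death model with independent failures -- the baseline
# that generalized (correlated) Markov models extend.
import numpy as np

def mttdl(n: int, k: int, lam: float, mu: float) -> float:
    """Mean time to data loss in hours.

    State i = number of failed disks (0..m transient, m+1 absorbing),
    with m = n - k parities. Failure rate from state i: (n - i) * lam;
    repair rate: i * mu (parallel repairs -- an assumption of this sketch).
    """
    m = n - k
    Q = np.zeros((m + 1, m + 1))        # generator over transient states only
    for i in range(m + 1):
        fail, repair = (n - i) * lam, i * mu
        if i < m:
            Q[i, i + 1] = fail          # one more disk fails
        if i > 0:
            Q[i, i - 1] = repair        # one disk finishes repair
        Q[i, i] = -(fail + repair)      # total rate of leaving state i
    # Expected time to absorption from each transient state: solve (-Q) t = 1.
    t = np.linalg.solve(-Q, np.ones(m + 1))
    return t[0]

# Example: 10 disks (8 data + 2 parity), 1e6-hour disk MTTF, 24-hour repair.
print(f"MTTDL ~ {mttdl(10, 8, 1e-6, 1/24):.3e} hours")
```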
Conference Object | Citation - Scopus: 2
A Visualization Platform for Disk Failure Analysis (IEEE, 2018)
Arslan, Şuayb Şefik; Yiğit, İbrahim Onuralp; Zeydan, Engin
It has become the norm rather than the exception to observe multiple disk malfunctions or whole-disk failures in places like big data centers, where thousands of drives operate simultaneously. Data that resides on these devices is typically protected by replication or erasure coding for long-term durable storage. However, to be able to optimize data protection methods, real-life disk failure trends need to be modeled. Modeling helps build insight during the design phase and properly optimize protection methods for a given application. In this study, we developed a visualization platform based on the disk failure data provided by Backblaze, and extracted useful statistical information such as failure rates and model-based time-to-failure distributions. Finally, simple modeling is performed for disk failure prediction, so as to raise alarms and take necessary system-wide precautions.

Conference Object | Citation - WoS: 1 | Citation - Scopus: 1
Adaptive Boosting of DNN Ensembles for Brain-Computer Interface Spellers (IEEE, 2021)
Çatak, Yiğit; Aksoy, Can; Özkan, Hüseyin; Güney, Osman Berke; Koç, Emirhan; Arslan, Şuayb Şefik
Steady-state visual evoked potentials (SSVEP) are commonly used in brain-computer interface (BCI) applications such as spelling systems, due to their advantages over other paradigms. In this study, we develop a method for SSVEP-based BCI speller systems, using a known deep neural network (DNN) together with transfer and ensemble learning techniques. We test the performance of our method on the publicly available benchmark and BETA datasets with a leave-one-subject-out procedure. Our method consists of two stages. In the first stage, a global DNN is trained using data from all subjects except the one excluded for testing. In the second stage, the global model is fine-tuned to each subject whose data are used in training. Combining the responses of the trained DNNs with different weights for each test subject, rather than equal weights, provides better performance, as brain signals may differ significantly between individuals. To this end, the DNN weights are learned with the SAMME algorithm using data belonging to the test subject. Our method significantly outperforms the canonical correlation analysis (CCA) and filter bank canonical correlation analysis (FBCCA) methods.
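The speller entry above learns per-model ensemble weights with SAMME. The sketch below shows only the SAMME weight formula for a K-class problem and a weighted hard vote over stand-in predictions; the full algorithm's sample-reweighting loop and the paper's fine-tuned DNNs are omitted, so treat this as an illustration of the combination step, not the paper's pipeline.

```python
# Minimal sketch of SAMME-style weighting for a multiclass ensemble.
# The "models" here are stand-in prediction arrays, not fine-tuned DNNs.
import numpy as np

def samme_weight(err: float, n_classes: int) -> float:
    """SAMME weight for a learner with weighted error `err` on K classes."""
    err = float(np.clip(err, 1e-10, 1 - 1e-10))
    return np.log((1 - err) / err) + np.log(n_classes - 1)

def weighted_vote(predictions: np.ndarray, weights: np.ndarray,
                  n_classes: int) -> np.ndarray:
    """Combine per-model hard predictions (models x samples) by weighted vote."""
    n_models, n_samples = predictions.shape
    scores = np.zeros((n_samples, n_classes))
    for m in range(n_models):
        scores[np.arange(n_samples), predictions[m]] += weights[m]
    return scores.argmax(axis=1)

# Toy example: 3 models, a 40-target speller, 5 test trials.
rng = np.random.default_rng(0)
preds = rng.integers(0, 40, size=(3, 5))
errs = np.array([0.30, 0.45, 0.60])   # hypothetical per-model error rates
w = np.array([samme_weight(e, 40) for e in errs])
print("weights:", np.round(w, 2))     # lower error -> larger vote weight
print("ensemble prediction:", weighted_vote(preds, w, 40))
```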
Patent
Adaptive Erasure Codes (2017)
Arslan, Şuayb Şefik; Göker, Turguy
Methods, apparatus, and other embodiments associated with adaptive use of erasure codes for distributed data storage systems are described. One example method includes accessing a message, where the message has a message size; selecting an encoding strategy as a function of the message size, data storage device failure statistics, data storage device wear periods, data storage space constraints, or overhead constraints, where the encoding strategy includes an erasure code approach; generating an encoded message using the encoding strategy; generating an encoded block, where the encoded block includes the encoded message and metadata associated with the message; and storing the encoded block in the data storage system. Example methods and apparatus may employ Reed-Solomon erasure codes or Fountain erasure codes. Example methods and apparatus may display to a user the storage capacity and durability of the data storage system.

Article | Citation - WoS: 13 | Citation - Scopus: 21
Advancements in Distributed Ledger Technology for Internet of Things (Elsevier, 2020)
Jurdak, Raja; Arslan, Şuayb Şefik; Krishnamachari, Bhaskar; Jelitto, Jens
The Internet of Things (IoT) is paving the way for different kinds of devices to be connected and to communicate properly at mass scale. However, conventional mechanisms used to sustain security and privacy cannot be applied directly to IoT, whose topology is increasingly becoming decentralized. Distributed Ledger Technologies (DLT), on the other hand, comprise varying forms of decentralized data structures that provide immutability through cryptographically linking blocks of data. To build reliable, autonomous, and trusted IoT platforms, DLT has the potential to provide security, privacy, and decentralized operation while adhering to the limitations of IoT devices. The marriage of IoT and DLT technology is not very recent; in fact, many projects have focused on this combination to address the challenges of smart cities, smart grids, the internet of everything, and other decentralized applications, most based on blockchain structures. In this special issue, the focus is on the new and broader technical problems associated with DLT-based security and backend platform solutions for IoT devices and applications.

Conference Object | Citation - WoS: 15 | Citation - Scopus: 40
An Overview of Blockchain Technologies: Principles, Opportunities and Challenges (IEEE, 2018)
Arslan, Şuayb Şefik; Mermer, Gültekin Berahan; Zeydan, Engin
Blockchain is a recently emerged technology with the potential to revolutionize the way our society communicates and conducts trade. Its most important advantage is the ability to exchange value-bearing transactions without the need for a trusted central authority in settings that would otherwise require an intermediary. It can also provide data integrity, built-in authenticity, and user transparency. Blockchain can be viewed as the new internet upon which many innovative applications will be built. In this work, we present an overview of current blockchain technologies, covering their general operating principles, the opportunities they create, and the challenges that may be encountered ahead.

Article | Citation - WoS: 3 | Citation - Scopus: 3
Array BP-XOR Codes for Hierarchically Distributed Matrix Multiplication (IEEE, 2021)
Arslan, Şuayb Şefik
A novel fault-tolerant computation technique based on array Belief Propagation (BP)-decodable XOR (BP-XOR) codes is proposed for distributed matrix-matrix multiplication. The proposed scheme is shown to be configurable and suited to modern hierarchical compute architectures such as Graphical Processing Units (GPUs) equipped with multiple nodes, each having many small independent processing units with increased core-to-core communications. The proposed scheme is shown to outperform several well-known earlier strategies in terms of total end-to-end execution time in the presence of slow nodes, called stragglers. This performance advantage is due to the careful design of array codes, which distributes the encoding operation over the cluster (slave) nodes at the expense of increased master-slave communication. An interesting trade-off between end-to-end latency and total communication cost is precisely described. In addition, to address the identified problem of scaling stragglers, an asymptotic version of array BP-XOR codes based on projection geometry is proposed at the expense of some computation overhead. A thorough latency analysis is conducted for all schemes to demonstrate that the proposed scheme achieves order-optimal computation in both the sublinear and the linear regimes in the size of the computed product, from an end-to-end delay perspective.
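The array BP-XOR entry above relies on coded redundancy to tolerate stragglers in distributed matrix multiplication. As an illustration of that idea only, the sketch below uses a simple sum-parity code over row blocks (real-valued addition standing in for the paper's XOR-based array construction): any two of the three worker results recover the full product.

```python
# Minimal sketch of erasure-coded distributed matrix multiplication.
# A simple sum-parity code stands in for array BP-XOR codes: workers
# compute A1@B, A2@B and (A1+A2)@B, and any two results recover A@B.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 3))
A1, A2 = A[:2], A[2:]                  # two row blocks of A

tasks = {                              # what each worker computes
    "w1": A1 @ B,
    "w2": A2 @ B,
    "parity": (A1 + A2) @ B,
}

# Suppose worker w2 is a straggler and never returns its result.
received = {k: v for k, v in tasks.items() if k != "w2"}

# Decode: recover the missing block from the parity worker's result.
A1B = received["w1"]
A2B = received["parity"] - A1B
C = np.vstack([A1B, A2B])

assert np.allclose(C, A @ B)           # full product recovered without w2
print("recovered A@B despite straggler; max error:", np.abs(C - A @ B).max())
```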
Patent
Artificial Intelligence Augmented Iterative Product Decoding (2023)
Arslan, Şuayb Şefik; Göker, Turguy
A method for product decoding within a data storage system includes receiving data to be decoded within a first decoder; performing a plurality of decoding iterations to decode the data utilizing a first decoder and a second decoder; and outputting fully decoded data based on the performance of the plurality of decoding iterations. Each of the plurality of decoding iterations includes (i) decoding the data with the first decoder operating at a first decoder operational mode to generate once-decoded data; (ii) sending the once-decoded data from the first decoder to the second decoder; (iii) receiving error information from the first decoder with an artificial intelligence system; (iv) selecting a second decoder operational mode based at least in part on the error information received by the artificial intelligence system; and (v) decoding the once-decoded data with the second decoder operating at the second decoder operational mode to generate twice-decoded data.

Conference Object | Citation - WoS: 3 | Citation - Scopus: 3
Asymptotically MDS Array BP-XOR Codes (2018)
Arslan, Şuayb Şefik
Belief propagation (BP) on binary erasure channels (BEC) is a low-complexity decoding algorithm that allows the recovery of message symbols through a bipartite graph pruning process. Recently, array XOR codes have attracted attention for storage systems due to their burst error recovery performance and easy arithmetic based on Exclusive OR (XOR)-only logic operations. Array BP-XOR codes are a subclass of array XOR codes that can be decoded using BP under BEC. Requiring BP-decodability in addition to the Maximum Distance Separability (MDS) constraint in the code construction process is observed to put an upper bound on the achievable code block length, making code construction a hard problem. In this study, we introduce asymptotically MDS array BP-XOR codes as an alternative to exact MDS array BP-XOR codes, allowing easier code constructions while keeping the decoding complexity low with an asymptotically vanishing coding overhead. We finally provide a code construction method based on discrete geometry that fulfills the requirements of this class of asymptotically MDS array BP-XOR codes.
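Both the product-decoding patent and the BP-XOR entries above revolve around iterative erasure filling. The toy sketch below decodes a product code whose rows and columns each carry a single parity bit, peeling one erasure at a time; this conveys the alternating row/column iteration and the BP-on-BEC peeling principle, but it is a didactic stand-in, not the patent's AI-selected decoder modes or the paper's array construction.

```python
# Toy iterative product decoding: every row and every column of the
# codeword has even parity (XOR = 0). Erasures (-1) are filled whenever
# a row or column has exactly one unknown, alternating row/column passes.
import numpy as np

def decode(word: np.ndarray, max_iters: int = 10) -> np.ndarray:
    w = word.copy()
    for _ in range(max_iters):
        progress = False
        for axis in (0, 1):                    # row pass, then column pass
            lines = w if axis == 0 else w.T    # w.T rows are views of columns
            for line in lines:
                erased = np.flatnonzero(line == -1)
                if len(erased) == 1:           # solvable by the parity check
                    line[erased[0]] = line[line != -1].sum() % 2
                    progress = True
        if not progress:                       # peeling stalled (or finished)
            break
    return w

# A valid 3x3 codeword (all row/column parities even), with three erasures.
clean = np.array([[1, 0, 1],
                  [0, 1, 1],
                  [1, 1, 0]])
rx = clean.copy()
rx[0, 0] = rx[0, 2] = rx[1, 0] = -1            # erase three symbols
out = decode(rx)
assert (out == clean).all()                    # all erasures recovered
print(out)
```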
Conference Object | Citation - WoS: 2 | Citation - Scopus: 1
Average Bandwidth-Cost vs. Storage Trade-Off for BS-Assisted Distributed Storage Networks (IEEE, 2021)
Tengiz, Ayse Ceyda; Haytaoğlu, Elif; Pusane, Ali Emre; Arslan, Şuayb Şefik; Pourmandi, Massoud
In this study, we consider a hierarchically structured base station (BS)-assisted cellular system equipped with a backend distributed data store in which nodes randomly arrive at and depart from the cell. We numerically motivate and characterize the fundamental trade-off between the average repair bandwidth cost and storage space, where BS communication is costlier than local communication, link capacities are constrained, and the number of failed nodes can vary dynamically. We establish the capacity region that is most relevant to 5G and beyond networks, which are layered by design. We hope that this study will motivate novel regenerating-code constructions that can achieve the presented limits.

Conference Object | Citation - WoS: 1 | Citation - Scopus: 1
Base Station-Assisted Cooperative Network Coding for Cellular Systems With Link Constraints (IEEE, 2022)
Arslan, Suayb S.; Pourmandi, Massoud; Haytaoglu, Elif
We consider a novel distributed data storage/caching scenario in a cellular network, where multiple nodes may fail or depart simultaneously. To meet reliability targets, we allow cooperative regeneration of lost nodes with the help of base stations allocated in a set of hierarchical layers. Due to this layered structure, a symbol download from each base station has a different cost, while the link capacities between the nodes of the cellular system and the base stations are also constrained. Under such a setting, we formulate the fundamental trade-off between repair bandwidth cost and storage space per node with closed-form expressions. In particular, the minimum storage and minimum bandwidth cost points are formulated. Finally, we provide an explicit optimal code construction for the minimum storage regeneration point for a special set of system parameters.

Conference Object | Citation - WoS: 5 | Citation - Scopus: 7
Cloud2HDD: Large-Scale HDD Data Analysis on Cloud for Cloud Datacenters (IEEE, 2020)
Zeydan, Engin; Arslan, Şuayb Şefik
The main focus of this paper is to develop a distributed large-scale data analysis platform for the open-source data of the Backblaze cloud datacenter, which consists of operational hard disk drive (HDD) information collected over an observation period of 2272 days (over 74 months). To carefully analyze the intrinsic characteristics of hard disk behavior, we exploit a large volume of data and the benefits of the Hadoop ecosystem as our big data processing engine; in other words, we utilize a special distributed scheme on cloud for cloud HDD data, termed Cloud2HDD. To classify the remaining lifetime of hard disk drives based on health indicators such as built-in S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) features, we use several state-of-the-art classification algorithms and compare their accuracy, precision, and recall rates. In addition, the importance of various S.M.A.R.T. features in predicting the true remaining lifetime of HDDs is identified. For instance, our analysis results indicate that the Random Forest Classifier (RFC) can yield up to 94% accuracy with the highest precision and recall in a reasonable time, classifying the remaining lifetime of drives into one of three classes (critical, high, and low ideal states), in comparison to other classification approaches based on a specific subset of S.M.A.R.T. features.
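The Cloud2HDD entry above classifies remaining drive lifetime into three classes with a Random Forest over S.M.A.R.T. features. The sketch below reproduces only the shape of that pipeline on synthetic stand-in features; the Backblaze data, the paper's feature subset, and the exact class semantics are not reproduced here, so the feature names, labels, and the generating rule are assumptions for illustration.

```python
# Minimal sketch of remaining-lifetime classification with a Random Forest
# over S.M.A.R.T.-like features. Features and labels are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 3000
# Hypothetical features: reallocated sectors, seek-error rate, power-on hours.
X = np.column_stack([
    rng.poisson(5, n),
    rng.exponential(1.0, n),
    rng.uniform(0, 50_000, n),
])
# Synthetic rule: more reallocated sectors and age increase failure risk;
# quantile cuts yield three classes (0 = low risk ... 2 = critical).
risk = 0.3 * X[:, 0] + 0.5 * X[:, 1] + X[:, 2] / 25_000
y = np.digitize(risk, np.quantile(risk, [0.33, 0.66]))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te),
                            target_names=["low risk", "medium risk", "critical"]))
# Feature importances mirror the paper's question of which S.M.A.R.T.
# attributes matter most for lifetime prediction.
print("feature importances:", np.round(clf.feature_importances_, 3))
```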
Article
Comparing Humans and Deep Neural Networks on Face Recognition Under Various Distance and Rotation Viewing Conditions (Journal of Vision, 2023)
Fux, Michal; Arslan, Şuayb Şefik; Jang, Hojin; Boix, Xavier; Cooper, Avi; Groth, Matt J; Sinha, Pawan
Humans possess impressive skills for recognizing faces even when the viewing conditions are challenging, such as long ranges, non-frontal regard, variable lighting, and atmospheric turbulence. We sought to characterize the effects of such viewing conditions on the face recognition performance of humans, and compared the results to those of DNNs. In an online verification-task study, we used a 100-identity face database, with images captured at five different distances (2 m, 5 m, 300 m, 650 m, and 1000 m), three pitch values (0°, i.e., straight ahead, and ±30°), and three levels of yaw (0°, 45°, and 90°). Participants were presented with 175 trials (5 distances x 7 yaw-and-pitch combinations, with 5 repetitions). Each trial included a query image, from a certain combination of range, yaw, and pitch, and five options, all frontal short-range (2 m) faces. One option was of the same identity as the query, and the rest were the most similar identities, chosen according to a DNN-derived similarity matrix. Participants ranked the top three target images most similar to the query image. The collected data reveal the functional relationship between human performance and multiple viewing parameters. Nine state-of-the-art pre-trained DNNs were tested for their face recognition performance on precisely the same stimulus set. Strikingly, DNN performance was significantly diminished by variations in range and rotated viewpoints. Even the best-performing network reported below 65% accuracy at the closest distance with a profile view of faces, with results dropping to near chance for longer ranges. The confusion matrices of the DNNs were generally consistent across networks, indicating systematic errors induced by viewing parameters. Taken together, these data not only help characterize human performance as a function of key ecologically important viewing parameters, but also enable a direct comparison of humans and DNNs in this parameter regime.

Article | Citation - WoS: 13 | Citation - Scopus: 18
Compress-Store on Blockchain: A Decentralized Data Processing and Immutable Storage for Multimedia Streaming (Springer, 2022)
Arslan, Şuayb Şefik; Göker, Turguy
Decentralization of data storage is a challenging problem for blockchain-based solutions, as the block size plays a key role in scalability. In addition, the specific requirements of multimedia data call for various changes in blockchain technology internals. Considering one of the most popular applications of secure multimedia streaming, i.e., video surveillance, it is not clear how to judiciously encode incentivization, immutability, and compression into a viable ecosystem. In this study, we provide a genuine scheme that achieves this encoding for a video surveillance application. The proposed scheme provides a novel integration of data compression and immutable off-chain data storage using a new consensus protocol, namely Proof-of-WorkStore (PoWS), in order to enable fully useful work to be performed by the miner nodes of the network. The proposed idea is a first step toward a greener application of a blockchain-based environment to the video storage business that utilizes system resources efficiently.
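The Compress-Store entry above depends on immutability obtained by cryptographically linking blocks. The following generic sketch (not the paper's PoWS consensus or its compression pipeline) shows the linking mechanism itself: each block commits to the previous block's hash, so tampering with any stored payload invalidates verification of the chain.

```python
# Minimal sketch of immutability via cryptographic linking: each block
# commits to the previous block's hash, so altering stored data breaks
# verification. Generic mechanism only; not the paper's PoWS protocol.
import hashlib
import json

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain: list, payload: str) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis pointer
    chain.append({"index": len(chain), "prev": prev, "payload": payload})

def verify(chain: list) -> bool:
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain: list = []
for segment in ("video-segment-0", "video-segment-1", "video-segment-2"):
    append(chain, segment)
print("valid before tampering:", verify(chain))   # True

chain[1]["payload"] = "tampered-segment"          # mutate stored data
print("valid after tampering: ", verify(chain))   # False: link to block 2 breaks
```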
Article
Cooperative Network Coding for Distributed Storage Using Base Stations With Link Constraints (arXiv, 2021)
Arslan, Şuayb Şefik; Pourmandi, Massoud; Haytaoğlu, Elif
In this work, we consider a novel distributed data storage/caching scenario in a cellular setting where multiple nodes may fail or depart at the same time. In order to maintain the target reliability, we allow cooperative regeneration of lost nodes with the help of base stations allocated in a set of hierarchical layers. Due to this layered structure, a symbol download from each base station has a different cost, while the link capacities connecting the nodes of the cellular system and the base stations are also limited. In this more practical and general scenario, we present the fundamental trade-off between repair bandwidth cost and storage space per node. Particularly interesting operating points on this trade-off curve are the minimum storage and the minimum bandwidth cost points, for which we provide closed-form expressions for the corresponding bandwidth cost and storage space per node. Finally, we provide an explicit optimal code construction for the minimum storage regeneration point for a given set of system parameters.

Conference Object | Citation - WoS: 4 | Citation - Scopus: 4
Cost of Guessing: Applications to Data Repair (Institute of Electrical and Electronics Engineers Inc., 2020)
Arslan, Şuayb Şefik; Haytaoğlu, Elif
In this paper, we introduce the notion of cost of guessing and provide an optimal strategy for guessing a random variable taking values on a finite set, where each choice may be associated with a positive finite cost value. Moreover, we derive asymptotically tight upper and lower bounds on the moments of the cost of guessing. As in previous studies of standard guesswork, the established bounds on the moments quantify the accumulated cost of guesses required for correctly identifying the unknown choice, and are expressed in terms of Rényi entropy. A new random variable is introduced to bridge the cost of guessing and standard guesswork, and to establish the guessing-cost exponent for the moments of optimal guessing. Furthermore, these bounds are shown to be quite useful for finding the repair latency cost of distributed data storage systems in which sparse graph codes may be utilized.

Patent
Data Deduplication With Adaptive Erasure Code Redundancy (US20160013815A1) (2016)
Arslan, Şuayb Şefik; Wideman, Roderick; Lee, Jaewook; Göker, Turguy
Example apparatus and methods combine erasure coding with data deduplication to simultaneously reduce the overall redundancy in data while increasing the redundancy of unique data. In one embodiment, an efficient representation of a data set is produced by deduplication. The efficient representation reduces duplicate data in the data set. Redundancy is then added back into the data set using erasure coding. The redundancy that is added back in protects the unique data associated with the efficient representation. How much redundancy is added back in, and what type of redundancy is added back in, may be controlled based on an attribute (e.g., value, reference count, symbol size, number of symbols) of the unique data. Decisions concerning how much and what type of redundancy to add back in may be adapted over time based, for example, on observations of the efficiency of the overall system.
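The deduplication patent above adds erasure-code redundancy back to unique data, controlled by attributes such as reference counts. The sketch below covers only the first half of that pipeline: fixed-size chunk deduplication with a hash index, tracking per-chunk reference counts as an example of the attribute such a redundancy policy could key on. The chunk size and sample data are illustrative assumptions.

```python
# Minimal sketch of fixed-size chunk deduplication with reference counts.
# Reference counts are an example of the per-chunk attribute the patent
# describes using to decide how much erasure-code redundancy to add back.
import hashlib

CHUNK = 8  # bytes; real systems use KB-scale or content-defined chunking

def dedupe(data: bytes, store: dict, refcount: dict) -> list:
    """Split data into chunks, store unique ones once, return the recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        h = hashlib.sha256(chunk).hexdigest()
        if h not in store:
            store[h] = chunk                  # unique chunk stored only once
        refcount[h] = refcount.get(h, 0) + 1  # every reference is counted
        recipe.append(h)                      # recipe reconstructs the file
    return recipe

store, refcount = {}, {}
recipe = dedupe(b"ABCDEFGHABCDEFGHABCDEFGH-tail...", store, refcount)
print("chunks referenced:", len(recipe), "| unique stored:", len(store))

# Highly referenced chunks are natural candidates for stronger protection:
hot = max(refcount, key=refcount.get)
print("most-referenced chunk", hot[:8], "... refcount =", refcount[hot])
```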

