Array BP-XOR Codes for Hierarchically Distributed Matrix Multiplication

Date

2021

Authors

Arslan, Şuayb Şefik

Publisher

IEEE

Open Access Color

BRONZE

Green Open Access

Yes

Publicly Funded

No

Impulse

Average

Influence

Average

Popularity

Average

Abstract

A novel fault-tolerant computation technique based on array Belief Propagation (BP)-decodable XOR (BP-XOR) codes is proposed for distributed matrix-matrix multiplication. The proposed scheme is shown to be configurable and suited for modern hierarchical compute architectures such as Graphics Processing Units (GPUs) equipped with multiple nodes, each of which has many small independent processing units with increased core-to-core communications. The proposed scheme is shown to outperform a few well-known earlier strategies in terms of total end-to-end execution time in the presence of slow nodes, called stragglers. This performance advantage is due to the careful design of array codes, which distributes the encoding operation over the cluster (slave) nodes at the expense of increased master-slave communication. An interesting trade-off between end-to-end latency and total communication cost is precisely described. In addition, to address an identified problem of scaling stragglers, an asymptotic version of array BP-XOR codes based on projection geometry is proposed at the expense of some computation overhead. A thorough latency analysis is conducted for all schemes to demonstrate that, from an end-to-end delay perspective, the proposed scheme achieves order-optimal computation in both the sublinear and the linear regimes in the size of the computed product.
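As a rough illustration of the general idea of straggler-tolerant coded multiplication with BP (peeling) decoding, the following minimal Python sketch may help. It is not the array BP-XOR construction from the paper: the task layout, the function names (`worker`, `peel_decode`) and the tiny matrices are all hypothetical, and ordinary addition and subtraction over the integers stand in for the XOR-style combining. Each worker multiplies a sum of row blocks of A by B, and the decoder repeatedly resolves any received result that contains exactly one unknown block.

```python
# Toy illustration only: sum-coded row blocks with a peeling (BP-style) decoder.
# The task layout is hypothetical and is NOT the paper's array BP-XOR code.

def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    """Elementwise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def worker(task, a_blocks, B):
    """Sum the row blocks of A named in `task`, then multiply the sum by B."""
    coded = None
    for i in sorted(task):
        coded = a_blocks[i] if coded is None else add(coded, a_blocks[i])
    return matmul(coded, B)

def peel_decode(received, k):
    """Recover the k block products A_i * B from coded results, or return None.

    `received` maps a frozenset of block indices to the coded product.
    """
    known = {}
    pending = dict(received)
    progress = True
    while progress and len(known) < k:
        progress = False
        for task, result in list(pending.items()):
            unknown = [i for i in task if i not in known]
            if len(unknown) > 1:
                continue  # cannot be resolved yet
            if len(unknown) == 1:
                # Subtract every already-recovered block, leaving the unknown one.
                for i in task:
                    if i in known:
                        result = sub(result, known[i])
                known[unknown[0]] = result
                progress = True
            del pending[task]
    return known if len(known) == k else None

# Hypothetical example: A split into three 1-row blocks, five coded tasks.
A_BLOCKS = [[[1, 2]], [[3, 4]], [[5, 6]]]
B = [[1, 0], [2, 1]]
TASKS = [frozenset(s) for s in ({0}, {1}, {2}, {0, 1}, {1, 2})]

# Assume the workers holding tasks {0} and {1} straggle and never reply.
received = {t: worker(t, A_BLOCKS, B) for t in TASKS[2:]}
decoded = peel_decode(received, k=3)
```

In this toy run the three surviving results are enough to recover every block product by peeling, whereas receiving only the two mixed tasks `{0, 1}` and `{1, 2}` would leave the decoder stuck and `peel_decode` would return `None`.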

Keywords

Decoding, Codes, Complexity theory, Arrays, Encoding, Task analysis, Iterative decoding, FOS: Computer and information sciences, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Information Theory, Information Theory (cs.IT), Distributed, Parallel, and Cluster Computing (cs.DC)

Fields of Science

0202 electrical engineering, electronic engineering, information engineering, 0102 computer and information sciences, 02 engineering and technology, 01 natural sciences

Citation

Arslan, S. S. (02 December 2021). Array BP-XOR Codes for Hierarchically Distributed Matrix Multiplication. IEEE Transactions on Information Theory, pp. 1–17. https://doi.org/10.1109/tit.2021.3132043

WoS Q

Q1

Scopus Q

Q2

OpenCitations Citation Count

2

Source

IEEE Transactions on Information Theory

Volume

68

Start Page

1

End Page

17

PlumX Metrics

Citations

Scopus : 3

Captures

Mendeley Readers : 1

SCOPUS™ Citations

3

checked on Feb 03, 2026

Web of Science™ Citations

3

checked on Feb 03, 2026

Page Views

258

checked on Feb 03, 2026

Downloads

27

checked on Feb 03, 2026

OpenAlex FWCI

0.49689527
