Current Research

(see here for recent work on these topics)

Coded Private Information Retrieval

Various aspects of user privacy have become an essential feature of real-world communication scenarios. Consider a library of files present in set of servers. The user wants to retrieve one file from the set of servers without revealing the identity of which file is being retrieved. This is called a Private Information Retrieval problem. A naive solution which ensures this demanded-file privacy is to download the entire file library, however this puts a huge load on the communication network. Coded Private Information Retrieval allows us to reduce the communication load on the network by enabling the servers to transmit linearly combined subfiles of the files. A number of open scenarios are there in this area which we desire to study.

Capacity Achieving Codes and Their Applications in 5G and Beyond

Reed-Muller Codes have been recently achieve capacity (the maximum possible rate with vanishingly small probability of error) on Binary-Memoryless-Symmetric Channels. We have recently identified codes called Berman Codes which also achieve capacity in the Binary-Erasure Channel. We are interested to investigate the performance of these codes under various decoding algorithms, under a variety of channels. We also are interested to see whether these codes achieve capacity on the larger class of BMS channels as well. These codes seem to perform well in short-block length communications, which make them possibly useful candidates for applications in 5G communications (and Beyond 5G) in scenarios such as URLLC (Ultra-Reliable Low-Latency Communications) and mMTC (massive Machine Type Communications).

Codes for DNA Storage

The next century is popularly being called as the "Age of biology", for that is one of the most important frontiers that has opened up ever since the discovery of the double helix structure of the DNA molecule. Further scientific developments have now enabled the storage of information in DNA. While DNA storage has the advantage of longetivity and density, current high-fidelity DNA storage techniques are remain extremely cost-ineffective. Error Correcting Codes for DNA storage can be used effectively in conjunction with low-fidelity (and thus low cost) DNA storage techniques in order to still retain the correctness of the original information.

Coded Data Rebalancing

Distributed Data Analytics plaforms consist of two components - (i) Distributed Storage System and (ii) Distributed Computation Engine. In such systems, data is usually stored in a distributed fashion in several nodes with some replication in order to reliable maintain and process the data and also provide for availability of data to multiple connecting clients. "Data skew" in such systems denotes the disparity in the storage size across different nodes. Data skew and reduction in the replication factor is created in the distributed store due to multiple reasons. For instance, nodes in such platforms often get added or removed from the system due to several reasons like node failures (creating a drop in the replication factor of partial data) or adding new servers into the systems respectively (the new node has no data compared to existing nodes). Correcting the data skew and re-instating the replication factor involves communication between the nodes, which incurs a cost. Coded communication, in which we communicate linear combinations of the data symbols, has the potential to reduce the communication load in the system by a multiplicative factor. We call this framework as Coded Data Rebalancing. We are interested to design Coded Data Rebalancing schemes and obtain fundamental limits on the multiplicative gain due to coding, for a variety of distributed settings and storage architectures.

List Decoding:

Generally, error correcting codes are uniquely decodable upto half the code's minimum distance (i.e., the decoder outputs exactly one codeword if it succeeds). List Decoding is a technique by which generalizes the unique decoding to decoding a small list of codewords, but able to decode nearly upto the information theoretic limit number of errors (minimum distance - 1). We are interested in constructing high-rate, high-error correcting capability codes, under a combinatorial list decoding constraint. These have some nice applications in communication systems on networks with restricted adversaries, coded computing, etc.

I am also working with several students in the following areas.

Quantum Error Correction and its applications

Quantum Computation holds promise in terms of improving current computational techniques and algorithms by several fold. However, implementing quantum computation in reality requires quantum error correcting codes (QECC). Several constructions of such QECCs have been studied for a long time. We are interested in advancing these further and also exploring their applications in other interesting communication scenarios like Private Information Retrieval, Network Coding, etc.

Codes for Broadcast Channels

Broadcast channels with a single source transmitting information to multiple receivers are common in today's communication network scenarios. For instance, the same video may be requested over a shared channel by multiple receivers. The receivers often have some local storage which can be used for caching some part of the information locally, so as to offset channel usage during peak times. During the peak times, channel coding can then be used to make channel usage efficient. Coded Caching and Index Coding are active areas of research in problems dealing with channel codes for broadcast channels with receiver caches. These technologies are also being considered as candidate technologies for 5G Communications.

Codes for Communication and Computation in Distributed Computing

Data analytics platforms today handle huge quantities of data and want to perform some computations on them. In order to manage the load while reducing latency of computation, this data is processed in a distributed and parallel method by distributing the data to a number of worker nodes, each of which individually process the data in parallel, and ultimately combining this sub-results into the complete result. Implementing this distributed computation technique in parallel presents many challenges, such as managing the communications between the nodes, handling worker nodes which complete their sub-jobs much slowly than others (such slow nodes are known as straggler nodes). Coding theory has shown promise in overcoming these challenges. Codes for distributed computing (CDCs) would be used to encode the data before splitting it to the worker nodes in such a way that straggler nodes do not affect the final computation latency. Similarly, the communication between the nodes of the cluster would also be coded, in order to exploit the existence of parts of data in the memory of the nodes themselves. This is known as Coded Communication for Distributed Computing. We seek to develop new and effective schemes for these classes of problems in Distributed Computing.