Data Fabric: Technologies and Applications

Speakers

Brief Outline of the Tutorial

Data Fabric serves as a comprehensive framework that integrates various database system technologies, opening extensive research opportunities in building solutions on end-to-end data management platforms. These platforms have seen significant advancements, including improved middleware, enriched ETL pipelines, generative-AI-driven data pipelines, and unified storage and compute. Technologies such as data mesh, data lakes, data warehouses, and cloud databases handle diverse data from various sources for analytics. Each plays a distinct role, and efficient management requires a unified view across them. A data fabric integrates data pipelines seamlessly, improving governance and reducing latency. The data fabric architecture manages data, query, and analytics pipelines by leveraging distributed computing capabilities, without moving data to a centralized location, and dynamically routes queries for optimal performance. Understanding the interconnections (in both technology and applications) among source systems, the data fabric, domains, and applications is crucial for getting the most out of data fabric solutions for user applications. In this tutorial, we address this gap by providing a comprehensive lesson on the background technologies of data fabric and on the metadata needed to comprehend and develop applications on top of a data fabric. We shall give examples of data fabric applications and show how metadata plays a direct role in developing them. Finally, we present the need for better comprehension of metadata through suitable case scenarios.
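
For illustration, the sketch below (a minimal Python example with hypothetical class, source, and dataset names, not part of any specific product) captures the metadata-driven routing idea mentioned above: a catalog maps datasets to the source systems that hold them, and queries are pushed down to those systems rather than the data being copied into a central store.

    # Illustrative sketch (hypothetical names): metadata-driven query routing
    # in a data fabric. Queries run at the source holding the data.
    from dataclasses import dataclass, field

    @dataclass
    class SourceSystem:
        """A registered source (e.g., a data lake, warehouse, or cloud database)."""
        name: str
        kind: str                        # "lake", "warehouse", "cloud-db", ...
        datasets: set = field(default_factory=set)

        def execute(self, query: str) -> str:
            # In a real fabric this call is federated to the remote engine;
            # here we only report where the query would run.
            return f"[{self.name}/{self.kind}] executed: {query}"

    class DataFabricRouter:
        """Routes queries using catalog metadata, without centralizing the data."""
        def __init__(self):
            self.catalog = {}            # dataset name -> SourceSystem (metadata layer)

        def register(self, source: SourceSystem) -> None:
            for ds in source.datasets:
                self.catalog[ds] = source

        def route(self, dataset: str, query: str) -> str:
            source = self.catalog.get(dataset)
            if source is None:
                raise KeyError(f"No source registered for dataset '{dataset}'")
            return source.execute(query)

    # Usage: register heterogeneous sources, then route queries by dataset.
    router = DataFabricRouter()
    router.register(SourceSystem("sales_lake", "lake", {"clickstream"}))
    router.register(SourceSystem("finance_dw", "warehouse", {"invoices"}))
    print(router.route("invoices", "SELECT SUM(amount) FROM invoices"))

The point of the sketch is only that the routing decision is driven by catalog metadata; the tutorial discusses how richer metadata supports governance and application development on top of such a fabric.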

Bio

Kamal Karlapalem is a Professor and Head of the Data Science and Analytics Center at IIIT-Hyderabad. He has worked for over three decades on specific problems of distributed relational database design, object-oriented database partitioning and allocation, and large-scale data warehouse design. He introduced the problem of the total redesign of distributed relational databases in 1992 and has worked on distributed data systems and their design over the last few decades. He is currently developing conceptual modeling frameworks to support data fabric solutions. His research interests include database systems, data visualization, data analytics, multi-agent systems, workflows, and electronic contracts.

Radha Krishna is a Professor in the Department of Computer Science and Engineering, National Institute of Technology Warangal. Prior to joining NIT Warangal, he worked at Infosys Labs, IDRBT, and the National Informatics Centre (Govt. of India), where he was associated with research projects leading to futuristic intelligent systems and analytical solutions. He holds PhDs from Osmania University and IIIT-Hyderabad. His research interests include data mining, big data, machine learning, databases, and e-contracts & services.

Satya Valluri is a Software Engineer at Databricks, USA. He is part of the query optimizer group and focuses on optimizing SQL queries for Spark and Databricks SQL. Previously, he worked at Meta Platforms, Inc., USA, where he was involved in managing a highly scalable, distributed database that stores the operational data of Meta. Before Meta, Satya worked in the Query Optimizer group at Oracle. His main areas of interest are query processing and optimization, query execution and manageability, and the debuggability of features in database systems. Satya did a postdoctoral fellowship at EPFL, Switzerland, and holds a Ph.D. from IIIT-Hyderabad.

Related work