Data Analytics @ QCRI

Institutions and industries are dealing with large-scale, heterogeneous data collected from a large number of sources. The main challenge is to use this information judiciously, within and across organizations, to make informed decisions and run operations effectively. The Data Analytics group at QCRI has built expertise in four core data analytics themes that will enable the effective use of this growing asset class.


Data Curation - Going beyond traditional ETL approaches, we are investigating various new directions, including: handling unstructured data; interleaving extraction, integration, and cleaning tasks in a more dynamic and interactive process that responds to evolving data sets and real-time decision-making constraints; and leveraging the power of human cycles to solve hard problems such as data cleaning and information integration.


Data Fusion - In dealing with Big Data, users are faced with different data sources that provide either conflicting information or mutually reinforcing information. In addition, users need to go back to the sources to really understand and use the data. We are working on issues related to continuous data source profiling, active data fusion, adaptive data fusion, and explanatory data lineage.


Measurement, Analysis, Discovery, & Visualization - In this theme, we study data as a first-class object in order to develop “generic” methods and frameworks that apply across various fields. We are addressing issues in generic data mining and machine learning techniques that are efficient and scalable.


Big Data Infrastructure - To address the scalability requirements of big data analytics, a myriad of distributed data processing platforms have appeared. While these platforms address many key challenges, they have several limitations that highlight the need for a new infrastructure. We are building an infrastructure that allows users to express any of their data analysis tasks with minimal effort, and allows developers or expert users to fine-tune the implementation of these tasks to improve performance.

See some projects and demos from our group and browse through our latest publications.