Data Analytics @ QCRI

2016

  • RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets.
    SIGMOD, San Francisco, USA, 2016.
    S. Kruse, A. Jentzsch, T. Papenbrock, Z. Kaoudi, J. Quiané-Ruiz, F. Naumann.
  • Rheem: Enabling Multi-platform Task Execution
    SIGMOD, San Francisco, USA, 2016 (demo).
    D. Agrawal, L. Ba, L. Berti-Equille, S. Chawla, A. Elmagarmid, H. Hammady, Y. Idris, Z. Kaoudi, Z. Khayyat, S. Kruse, M. Ouzzani, P. Papotti, J. Quiané-Ruiz, N. Tang, M. J. Zaki.
  • Road to Freedom in Data Analytics.
    EDBT, Bordeaux, France, 2016. (vision paper)
    D. Agrawal, S. Chawla, A. Elmagarmid, Z. Kaoudi, M. Ouzzani, P. Papotti, J. Quiané-Ruiz, N. Tang, M. J. Zaki
  • Scaling Up Truth Discovery.
    Tutorial at the International Conference on Data Engineering (ICDE 2016), Helsinki, May 2016.
    Laure Berti-Equille, Javier Borge-Holthoefer.
  • VERA: A Platform for Veracity Estimation over Web Data
    Proceedings of the 25th World Wide Web Conference (WWW 2016), Montréal, Canada, 2016 (demo). 
    Mouhamadou Lamine Ba, Laure Berti-Equille, Kushal Shah, Hossam M. Hammady.
  • Using Twitter to Understand Public Interest in Climate Change: The Case of Qatar
    Proceedings of the International Workshop on the Social Web for Environmental and Ecological Monitoring (SWEEM 2016), Cologne, Germany, May 2016.
    Sofiane Abbar, Tahar Zanouda, Laure Berti-Equille, Javier Borge-Holthoefer.
  • Veracity of Big Data: Challenges of Cross-Modal Truth Discovery
    ACM Journal of Data and Information Quality (Challenge paper).
    Laure Berti-Equille, Mouhamadou Lamine Ba.
  • Quality of Web data.
    Chapter in the 2nd Edition of the book Data Quality: Concepts, Methodologies and Techniques, Springer, 2016. 
    Monica Scannapieco, Laure Berti-Equille.  

2015

  • Messing up with BART: Error Generation for Evaluating Data Cleaning Algorithms. 
    The 42nd International Conference on Very Large Data Bases (VLDB), New Delhi, India, 2016
    P. Arocena, B. Glavic, G. Mecca, R. Miller, P. Papotti, D. Santoro
  • Veracity of Big Data. From Truth Discovery Computation Algorithms to Models of Misinformation Dynamics. 
    Tutorial of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015), Melbourne, Australia, October 2015
    Laure Berti-Equille, Javier Borge-Holthoefer
  • Veracity of Big Data. From Truth Discovery Computation Algorithms to Models of Misinformation Dynamics.
    Synthesis Lecture on Data Management, Morgan & Claypool Publishers, December 2015.
    Laure Berti-Equille, Javier Borge-Holthoefer
  • Data Veracity Estimation with Ensembling Truth Discovery Methods.
    Proceedings of 2015 IEEE Big Data Conference Workshop on Data Quality Issues, Santa Clara, US, Nov. 2015.
    Laure Berti-Equille
  • A Quality-Aware Spatial Data Warehouse for Querying Hydroecological Data.
    Computers & Geosciences, Volume 85, Part A, December 2015, Pages 126–135
    L. Berrahou, N. Lalande, E. Serrano, G. Molla, L. Berti-Equille, S. Bimonte, S. Bringay, F. Cernesson, C. Grac, D. Ienco, F. Le Ber, M. Teisseire.
  • A Masking Index for Quantifying Hidden Glitches.
    Knowledge and Information Systems, 44(2): 253-277, 2015
    Laure Berti-Equille, Ji Meng Loh, Tamraparni Dasu
  • Towards Principled Data Science Assessment - The Personal Data Science Process (PdsP). 
    Proceedings of the 17th International Conference on Enterprise Information Systems (ICEIS 2015), Volume 1, Barcelona, Spain, 27-30 April 2015, pp. 374-378.
    Ismael Caballero, Laure Berti-Equille, Mario Piattini
  • Unsupervised Quantification of Under- and Over-Segmentation for Object-Based Remote Sensing Image Analysis.
    IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(5): 1-10, May 2015
    Andres Troya-Galvis, Pierre Gancarski, Nicolas Passat, Laure Berti-Equille
  • Learning to Identify Relevant Studies for Systematic Reviews using Random Forest and External Information
    Machine Learning, Springer (To Appear)
    Madian Khabsa, Ahmed Elmagarmid, Ihab Ilyas, Hossam Hammady, and Mourad Ouzzani
  • Similarity Group-by Operators for Multi-dimensional Relational Data
    IEEE Transactions on Knowledge and Data Engineering (TKDE) (To Appear)
    Mingjie Tang, Ruby Y. Tahboub, Walid G. Aref, Mikhail Atallah, Qutaibah Malluhi, Mourad Ouzzani, Yasin N. Silva
  • Lightning Fast and Space Efficient Inequality Joins
    The 42nd International Conference on Very Large Data Bases (VLDB), New Delhi, India, 2016
    Zuhair Khayyat, William Lucia, Meghna Singh, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, and Panos Kalnis
  • AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data.
    The 42nd International Conference on Very Large Data Bases (VLDB), New Delhi, India, 2016
    Ahmed M. Aly, Ahmed R. Mahmood, Mohamed S. Hassan, Walid G. Aref, Mourad Ouzzani, Hazem Elmeleegy, and Thamir Qadah
  • Divide & Conquer-based Inclusion Dependency Discovery
    The 41st International Conference on Very Large Data Bases (VLDB), Kohala Coast, Hawai‘i, USA, 2015
    Thorsten Papenbrock, Sebastian Kruse, Jorge-Arnulfo Quiané-Ruiz, and Felix Naumann
  • KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing (A demo)
    The 41st International Conference on Very Large Data Bases (VLDB), Kohala Coast, Hawai‘i, USA, 2015
    Xu Chu, John Morcos, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Nan Tang, and Yin Ye
  • A Demonstration of AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data (A demo)
    The 41st International Conference on Very Large Data Bases (VLDB), Kohala Coast, Hawai‘i, USA, 2015
    Ahmed M. Aly, Ahmed S. Abdelhamid, Ahmed R. Mahmood, Walid G. Aref, Mohamed S. Hassan, Hazem Elmeleegy, Mourad Ouzzani
  • BigDansing: A System for Big Data Cleansing
    ACM SIGMOD Conference on Management of Data (SIGMOD), Melbourne, Australia, 2015
    Zuhair Khayyat, Ihab Ilyas, Alekh Jindal, Samuel Madden, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Si Yin
  • KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing
    ACM SIGMOD Conference on Management of Data (SIGMOD), Melbourne, Australia, 2015
    Xu Chu, John Morcos, Ihab Ilyas, Mourad Ouzzani, Paolo Papotti, Nan Tang, Yin Ye
  • Updating Graph Indices with a One-Pass Algorithm
    ACM SIGMOD Conference on Management of Data (SIGMOD), Melbourne, Australia, 2015
    Dayu Yuan, Prasenjit Mitra, Huiwen Yu, C. Lee Giles
  • DataXFormer: An Interactive Data Transformation Tool (Best Demo Award)
    ACM SIGMOD Conference on Management of Data (SIGMOD), Melbourne, Australia, 2015
    Zia Abedjan, John Morcos, Ihab Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker
  • Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases
    ACM SIGMOD Conference on Management of Data (SIGMOD), Melbourne, Australia, 2015
    Aaron Elmore, Vaibhav Arora, Rebecca Taft, Andrew Pavlo, Divyakant Agrawal, Amr El Abbadi
  • Deep Learning for the Web
    24th International Conference on World Wide Web, Florence, Italy
    K Jung, BT Zhang, P Mitra
  • Abstractive Meeting Summarization Using Dependency Graph Fusion
    24th International Conference on World Wide Web, Florence, Italy
    Siddhartha Banerjee, Prasenjit Mitra, Kazunari Sugiyama
  • Using Subjectivity Analysis to Improve Thread Retrieval in Online Forums
    Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Vienna, Austria
    Prakhar Biyani, Sumit Bhatia, Cornelia Caragea, Prasenjit Mitra
  • Query-Time Record Linkage and Fusion over Web Databases
    The 31st International Conference on Data Engineering (ICDE), Seoul, Korea, 2015
    El Kindi Rezig, Eduard Dragut, Mourad Ouzzani, Ahmed Elmagarmid
  • CliqueSquare: Flat Plans for Massively Parallel RDF Queries
    The 31st International Conference on Data Engineering (ICDE), Seoul, Korea, 2015
    François Goasdoué, Zoi Kaoudi, Ioana Manolescu, Jorge-Arnulfo Quiané-Ruiz, Stamatis Zampetakis
  • Proof Positive and Negative in Data Cleaning
    The 31st International Conference on Data Engineering (ICDE), Seoul, Korea, 2015
    Matteo Interlandi and Nan Tang
  • CliqueSquare in Action: Flat Plans for Massively Parallel RDF Queries (demo)
    The 31st International Conference on Data Engineering (ICDE), Seoul, Korea, 2015
    Benjamin Djahandideh, François Goasdoué, Zoi Kaoudi, Ioana Manolescu, Jorge-Arnulfo Quiané-Ruiz, Stamatis Zampetakis
  • AllegatorTrack: Visualizing and Explaining Truth Discovery Results from Multisource Data (demo)
    The 31st International Conference on Data Engineering (ICDE), Seoul, Korea, 2015
    Dalia Attia Waguih, Naman Goel, Hossam M. Hammady, Laure Berti-Equille
  • Cost Estimation of Spatial k-Nearest-Neighbor Operators
    18th International Conference on Extending Database Technology (EDBT), Brussels, Belgium, 2015
    Ahmed M. Aly, Walid G. Aref, Mourad Ouzzani
  • Efficient Processing of Hamming-Distance-Based Similarity-Search Queries Over MapReduce
    18th International Conference on Extending Database Technology (EDBT), Brussels, Belgium, 2015
    Mingjie Tang, Yongyang Yu, Walid G. Aref, Qutaibah M. Malluhi, Mourad Ouzzani
  • Approving Updates in Collaborative Databases
    IEEE International Conference on Cloud Engineering (IC2E), Tempe, AZ, USA, 2015
    Khaleel Mershad, Qutaibah M. Malluhi, Mourad Ouzzani, Mingjie Tang, and Walid G. Aref
  • DataXFormer: Leveraging the Web for Semantic Data Transformations
    The 7th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA, 2015
    Zia Abedjan, John Morcos, Michael Gubanov, Ihab Ilyas, Michael Stonebraker, Paolo Papotti, and Mourad Ouzzani

2014

  • Towards Dependable Data Repairing with Fixing Rules
    ACM SIGMOD Conference on Management of Data (SIGMOD), Snowbird, USA, 2014
    Jiannan Wang and Nan Tang
  • The Similarity-aware Relational Intersect Database Operator (Best Paper Award)
    7th International Conference on Similarity Search and Applications, Los Cabos, Mexico, September 2014
    Wadha Jabir Al Marri, Mingjie Tang, Qutaibah Malluhi, Mourad Ouzzani and Walid Aref
  • Big Data Cleaning
    APWeb 2014 (invited as Distinguished Lecture Series)
    Nan Tang
  • A Masking Index for Quantifying Hidden Glitches (extended version)
    Knowledge and Information Systems, Springer, July 2014.
    Laure Berti-Equille, Ji Meng Loh and Tamraparni Dasu
  • Web Data Quality: Current State and New Challenges
    Int. J. Semantic Web Inf. Syst., 10(2): 1-6, 2014
    Amrapali Zaveri, Andrea Maurino, and Laure Berti-Equille
  • Descriptive and Prescriptive Data Cleaning
    ACM SIGMOD Conference on Management of Data (SIGMOD), Snowbird, USA, 2014
    Anup K. Chalamalla, Ihab Ilyas, Mourad Ouzzani, and Paolo Papotti
  • NADEEF/ER: Generic and Interactive Entity Resolution
    ACM SIGMOD Conference on Management of Data (SIGMOD demo), Snowbird, USA, 2014
    Ahmed Elmagarmid, Ihab Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, Nan Tang and Si Yin
  • Interaction between Record Matching and Data Repairing
    ACM Journal of Data and Information Quality (JDIQ).
    Wenfei Fan, Shuai Ma, Nan Tang, and Wenyuan Yu
  • Conflict Resolution with Data Currency and Consistency
    ACM Journal of Data and Information Quality (JDIQ) (invited).
    Wenfei Fan, Shuai Ma, Nan Tang, and Wenyuan Yu
  • Towards Zero-Overhead Static and Adaptive Indexing in Hadoop
    Very Large Data Bases Journal (VLDBJ).
    Stefan Richter, Jorge-Arnulfo Quiané-Ruiz, Stefan Schuh, and Jens Dittrich
  • Discovering Denial Constraints
    The 40th International Conference on Very Large Data Bases (VLDB), Hangzhou, China, 2014
    Xu Chu, Ihab Ilyas, Paolo Papotti
  • Scalable Discovery of Unique Column Combinations
    The 40th International Conference on Very Large Data Bases (VLDB), Hangzhou, China, 2014
    Arvid Heise, Jorge-Arnulfo Quiané-Ruiz, Ziawasch Abedjan, Anja Jentzsch, Felix Naumann
  • iHUB – An Information and Collaborative Management Platform for Life Sciences
    The 23rd International World Wide Web Conference (WWW), Seoul, South Korea, 2014
    David Salt, Mourad Ouzzani, Eduard Dragut, Peter Baker, and Srivathsava Rangarajan
  • RuleMiner: Data Quality Rules Discovery
    The 30th International Conference on Data Engineering (ICDE demo), Chicago, USA, 2014
    Xu Chu, Ihab Ilyas, Paolo Papotti, Yin Ye
  • Detecting Unique Column Combinations on Dynamic Data
    The 30th International Conference on Data Engineering (ICDE), Chicago, USA, 2014
    Ziawasch Abedjan, Jorge-Arnulfo Quiané-Ruiz, Felix Naumann
  • Mapping and Cleaning
    The 30th International Conference on Data Engineering (ICDE), Chicago, USA, 2014
    Floris Geerts, Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro
  • IQ-Meter: An Evaluation Tool for Data-Transformation Systems
    The 30th International Conference on Data Engineering (ICDE demo), Chicago, USA, 2014
    Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro
  • JISC: Adaptive Stream Processing Using Just-In-Time State Completion
    17th International Conference on Extending Database Technology (EDBT), Athens, Greece, 2014
    Ahmed M. Aly, Walid G. Aref, Mourad Ouzzani, Hosam M. Mahmoud

2013

  • A Masking Index for Quantifying Hidden Glitches
    IEEE 13th Int. Conference on Data Mining (ICDM), Dallas, TX, USA, December 2013
    Laure Berti-Equille, Ji Meng Loh and Tamraparni Dasu
  • Data Quality Problems beyond Consistency and Deduplication
    In search of elegance in the theory and practice of computation: a Festschrift in honour of Peter Buneman, Edinburgh, UK, 2013. (invited)
    Wenfei Fan, Floris Geerts, Shuai Ma, Nan Tang, and Wenyuan Yu
  • HandsOn DB: Managing Data Dependencies involving Human Actions
    IEEE Transactions on Knowledge and Data Engineering (TKDE). To appear.
    Mohamed Eltabakh, Walid Aref, Ahmed Elmagarmid, Mourad Ouzzani
  • Future Locations Prediction with Uncertain Data
    The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), Prague, Czech Republic, 2013
    Disheng Qiu, Paolo Papotti, and Lorenzo Blanco.
  • Extraction and Integration of Partially Overlapping Web Sources
    The 39th International Conference on Very Large Data Bases (VLDB), Riva Del Garda, Italy, 2013
    Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, and Paolo Papotti.
  • The Llunatic Data Cleaning Framework
    The 39th International Conference on Very Large Data Bases (VLDB), Riva Del Garda, Italy, 2013
    Floris Geerts, Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro.
  • NADEEF: A Generalized Data Cleaning System
    The 39th International Conference on Very Large Data Bases (VLDB demo), Riva Del Garda, Italy, 2013
    Amr Ebaid, Ahmed Elmagarmid, Ihab Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, Nan Tang and Si Yin.
  • Author Disambiguation by Hierarchical Agglomerative Clustering with Adaptive Stopping Criterion
    The 36th Annual ACM SIGIR Conference, Dublin, Ireland, 2013
    Lei Cen, Eduard C. Dragut, Luo Si, and Mourad Ouzzani.
  • NADEEF: A Commodity Data Cleaning System
    ACM SIGMOD Conference on Management of Data (SIGMOD), New York, USA, 2013
    Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed Elmagarmid, Ihab Ilyas, Mourad Ouzzani, and Nan Tang.
  • Don't be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes
    ACM SIGMOD Conference on Management of Data (SIGMOD), New York, USA, 2013
    Mohamed Yakout, Laure Berti-Equille, and Ahmed Elmagarmid
  • Cartilage: Adding Flexibility to the Hadoop Skeleton
    ACM SIGMOD Conference on Management of Data (SIGMOD demo), New York, USA, 2013
    Alekh Jindal, Jorge-Arnulfo Quiané-Ruiz, Samuel Madden
  • Elephant, Do not Forget Everything! Efficient Processing of Growing Datasets
    IEEE Conference on Cloud Computing (IEEE Cloud), New York, USA, 2013
    Alekh Jindal, Jorge-Arnulfo Quiané-Ruiz, Samuel Madden
  • Introduction to the special issue on data quality
    Information Systems
    Mourad Ouzzani, Paolo Papotti, and Erhard Rahm.
  • Holistic Data Cleaning: Put Violations Into Context
    The 29th International Conference on Data Engineering (ICDE), Brisbane, Australia, 2013
    Xu Chu, Paolo Papotti, and Ihab Ilyas
  • On the Relative Trust between Inconsistent Data and Inaccurate Constraints
    The 29th International Conference on Data Engineering (ICDE), Brisbane, Australia, 2013
    George Beskales, Ihab Ilyas, Lukasz Golab, and Artur Galiullin
  • Inferring Data Currency and Consistency for Conflict Resolution
    The 29th International Conference on Data Engineering (ICDE), Brisbane, Australia, 2013
    Wenfei Fan, Floris Geerts, Nan Tang, and Wenyuan Yu.
  • Data Curation at Scale: The Data Tamer System
    The 6th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA, 2013
    Michael Stonebraker, Daniel Bruckner, Ihab Ilyas, George Beskales, Mitch Cherniack, Stan Zdonik, Alexander Pagan, and Shan Xu
  • WWHow! Freeing Data Storage from Cages
    The 6th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA, 2013
    Alekh Jindal, Jorge-Arnulfo Quiané-Ruiz, Jens Dittrich

2012

  • Atlas: a tool to explore interconnected ionomic, genomic and environmental data
    The 21st ACM International Conference on Information and Knowledge Management (CIKM), 2012
    Eduard C. Dragut, Mourad Ouzzani, Amgad Madkour, Nabeel Mohamed, Peter Baker, and David E. Salt.
  • What is the IQ of your data transformation system?
    The 21st ACM International Conference on Information and Knowledge Management (CIKM), 2012
    Giansalvatore Mecca, Paolo Papotti, Salvatore Raunich, and Donatello Santoro
  • Incremental Detection of Inconsistencies in Distributed Data
    IEEE Transactions on Knowledge and Data Engineering (TKDE) (Special issue: Best Papers of ICDE 2012, invited)
    Wenfei Fan, Jianzhong Li, Nan Tang, and Wenyuan Yu
  • Incremental Detection of Inconsistencies in Distributed Data
    The 28th International Conference on Data Engineering (ICDE), Washington DC, USA, 2012
    Wenfei Fan, Jianzhong Li, Nan Tang, and Wenyuan Yu
  • Spatial Queries with Two kNN Predicates
    Proceedings of the VLDB Endowment (PVLDB), 2012
    Ahmed Aly, Walid Aref, and Mourad Ouzzani
  • M3: Stream Processing on Main-Memory MapReduce
    The 28th International Conference on Data Engineering (ICDE Demo), Washington DC, USA, 2012
    Ahmed M. Aly, Asmaa Sallam, Bala M. Gnanasekaran, Long-Van Nguyen-Dinh, Walid G. Aref, Mourad Ouzzani, and Arif Ghafoor
  • High-resolution genome-wide scan of genes, gene-networks and cellular systems impacting the yeast ionome
    BMC Genomics, 2012 Nov 14;13(1):623
    Yu D, Danku JM, Baxter I, Kim S, Vatamaniuk OK, Vitek O, Ouzzani M, Salt DE.
  • Interactive web-based breastfeeding monitoring: feasibility, usability, and acceptability
    Journal of Human Lactation. 2012 Nov;28(4):468-75.
    Azza Ahmed and Mourad Ouzzani
  • Development and Assessment of an Interactive Web-Based Breastfeeding Monitoring System (LACTOR)
    Maternal and Child Health Journal 2012 Jul 12
    Azza Ahmed and Mourad Ouzzani

2011

  • Semantic Web Services for Web Databases
    Publisher: Springer; 2011 edition (October 22, 2011), ISBN-10: 1461416434
    Mourad Ouzzani and Athman Bouguettaya
  • Guided Data Repair
    Proceedings of the 37th International Conference on Very Large Data Bases (VLDB), Seattle, Washington, USA, 2011
    Mohamed Yakout, Ahmed K. Elmagarmid, Jennifer Neville, Mourad Ouzzani and Ihab F. Ilyas
  • ACConv – An Access Control Model for Conversational Web Services
    ACM Transactions on the Web. Volume 5 Issue 3, July 2011.
    Federica Paci, Massimo Mecella, Mourad Ouzzani, and Elisa Bertino