Reading materials

  • Python (refresh)
  • Relational databases (refresh)
    • Ramez Elmasri and Shamkant B Navathe, Fundamentals of Database Systems, 7th edition, 2016: chapters 3-6 and 9, section 7.1.
    • SQL tutorial
  • Parallel processing (recommended reading)
    • C. Lin, L. Snyder: Principles of Parallel Programming. Pearson/Addison-Wesley, 2008. 978-0-321-54942.
  • MapReduce and Hadoop (recommended reading)
    • Jeffrey Dean and Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. Proc. OSDI, USENIX, 2004. (A journal version appeared in CACM in 2008; it is listed under 'Machine Learning' on this page.)
    • Apache Hadoop: https://hadoop.apache.org
    • Donald Miner and Adam Shook: MapReduce Design Patterns. O'Reilly, 2012.
  • Spark (recommended reading)
  • Resource management in big-data clusters (recommended reading)
  • NoSQL data stores and techniques (recommended reading)
  • HDFS
    • (recommended reading) Shvachko et al.: The Hadoop Distributed File System. IEEE MSST 2010, pages 1-10.
    • (optional) White: Hadoop: The Definitive Guide, chapter "The Hadoop Distributed File System". 2011.
  • Dynamo (recommended reading)
  • HBase (recommended reading)
  • Hive and Shark/SparkSQL (recommended reading)
  • Machine learning (recommended reading)



  • Page responsible: BDA
    Last updated: 2017-03-15