Description
This hands-on course teaches the tools and methods used by data scientists, from researching solutions to scaling prototypes up to Spark clusters. It exposes students to the entire data science pipeline, from data acquisition to extracting valuable insights, applied to real-world problems.
Questions
Questions and discussions about the course are gathered on Mattermost: https://mattermost-dslab.epfl.ch
Virtual Machine
Lab Sessions
Week 1 - 21.02.2018 - Module 1 - Python for data scientists 1/4
Week 2 - 28.02.2018 - Module 1 - Python for data scientists 2/4
Week 3 - 07.03.2018 - Module 1 - Python for data scientists 3/4
Week 4 - 14.03.2018 - Module 1 - Python for data scientists 4/4
Week 5 - 21.03.2018 - Module 2 - Distributed computing with Hadoop 1/2
Week 6 - 28.03.2018 - Module 2 - Distributed computing with Hadoop 2/2
- Slides: week 6
- Solutions to last week’s exercises: solutions (right-click and copy the URL to import it into Zeppelin)
- Setup instructions: Instructions
Week 7 - 11.04.2018 - Module 3 - Distributed processing with Apache Spark 1/3
Week 8 - 18.04.2018 - Module 3 - Distributed processing with Apache Spark 2/3
Week 9 - 25.04.2018 - Module 3 - Distributed processing with Apache Spark 3/3
- Solutions to last week’s exercises: solutions
- Homework 3: repository - start by reading the README
Week 10 - 02.05.2018 - Module 4 - Real-time data acquisition and processing 1/2
Week 11 - 09.05.2018 - Module 4 - Real-time data acquisition and processing 2/2
- Solutions to last week’s exercises: solutions
- Homework 4: repository - start by reading the README
Week 12 - 16.05.2018 - Module 5 - Final Project 1/3
Week 13 - 23.05.2018 - Module 5 - Final Project 2/3
Week 14 - 30.05.2018 - Module 5 - Final Project 3/3