Big-Data-Platform

Imagine that you finish the course and become an "expert" in "Big Data Platforms" from @CSAalto. You work for a company, and one day you get a request to build a big data platform for the company with your team (in this course your team is you, playing different roles). You might get a description like:

“Your team has to build a big data platform for X types of data. Data will be generated/collected from N sources. We expect to have 10+ GB/day of data to be ingested into our platform. We will have to serve K thousands of requests for different types of analytics – to be determined. Our response time should be within t milliseconds. Our services should not be …”

PS: things will be added and changed.

You know that big data is characterized by many V properties (volume, velocity, variety, veracity, ...) and that a platform must facilitate different types of interactions for exchanging data and services. You face many questions about the development and operation of big data platforms and their big data pipelines: How do you design a big data platform that is resilient, elastic and responsive, and that allows different customers and applications to be integrated? Which data models do you have to select? Do you have to support batch or streaming processing? There are also very practical issues: should you use public cloud infrastructures or build your own? Which cloud companies should you rely on: Google, Amazon or Microsoft? Your story is not centered around a "narrow scope" of big data processing, like taking a lot of data, putting it into Hadoop and running ML algorithms (although even the work in such a "narrow scope" is not easy to achieve). Instead, you need to deal with the big picture of many tasks in big data platforms: designs with microservices and serverless, reactive systems patterns, big data storage and databases, complex data ingestion, and various data processing models and the algorithms atop them, to name just a few.

But of course, with the limited time of a 5-credit course, you cannot master all aspects (BTW, who could be the master of big data, given the complexity of the field?). Thus you need to build your platform atop core concepts, practice your tasks with the four assignments, explore the best skills you have in the big field of "Big Data Platforms", and let your other team members work with you to deliver the "Big Data Platform" under your lead. Build your story!