Being taught at IIT Bhilai, India in the Winter Semester of 2024.
Course Instructor: Dr. Gagan Raj Gupta
Real-life applications are complex and involve a variety of components (multiple clients, backend servers, databases, ML modules, all connected by a network). We want our applications to be intelligent, adaptive (data-driven), scalable, reliable, and performant. How do we go from the idea to the design to its implementation and successful operationalization?
This course teaches the basic principles underlying the design, implementation, and evaluation of computer systems. It provides an introduction to the fundamentals of analytic modeling techniques that are used in computer system design. Students will also learn general systems concepts that support the design goals of modularity, performance, and security. Students will apply material learned in lectures and readings to design, build, and evaluate new system components.
Objectives:
After completing this class, the students will be able to design their own distributed systems to solve real-world problems. The ability to design one's own distributed system includes an ability to argue for one's design choices.
The students will be able to evaluate and critique existing systems and their own system designs. As part of that, students will learn to recognize design choices made in existing systems.
Learning Outcomes:
The students will be able to apply the technical material taught in the lecture to new system components. This implies an ability to recognize and describe:
• How common design patterns in computer systems, such as abstraction and modularity, are used to limit complexity.
• How operating systems use virtualization and abstraction to enforce modularity.
• How reliable, usable distributed systems can be built on top of an unreliable network.
• How to measure system performance and what can be done to improve performance and scalability.
• How to design and deploy ML systems.
Pre-requisites
Undergraduate courses in Computer Networks and Operating Systems. Basic courses in data science and ML (DS250, DS200, CS550).
Class Timings and Location
Lecture Room: L102
Lecture Timings: 8:30 a.m. to 9:30 a.m. on Mondays, and 9:30 a.m. to 10:30 a.m. on Wednesdays and Fridays
Course Materials
- Google Drive Link with Lecture materials for IIT Bhilai Students: GDrive
- Canvas Link for registered Students (for Assignments and Discussions): Canvas
Textbook/Reference books:
- [MB] Mor Harchol-Balter, Performance Modeling and Design of Computer Systems: Queueing Theory in Action, Cambridge University Press, February 2013.
- [UDS] Roberto Vitillo, Understanding Distributed Systems, https://understandingdistributed.systems/ (a simplified, easy-to-follow description of essential concepts on a wide range of topics).
- [EDSI] Ville Tuulos, Effective Data Science Infrastructure: How to Make Data Scientists Productive, Manning Publications.
Grading Plan
- 2 Exams: 45%
- Weekly System Design Exercise (In class): 25% [will mimic system design interviews]
- 1 Assignment: 10%
- 1 Project: 20% [Project will include a mock interview]
The assignment is individual. The project will be an end-to-end (full-stack) application built in a team. Students are encouraged to build balanced teams covering the front-end developer, back-end developer, ML engineer, tester, and architect roles.
Detailed Schedule
- Week1: Introduction to the course, modularity, building a Campus Service Application, Building Blocks of AWS (reading assignment)
- Week2: Building Reliable and Secure Communications, Networks, API Design
- Week3: Design for Maintainability: testing, tracing, logging, metrics
- Week4: System Design Interview Preparation, High-Level Design, Back of Envelope Calculations, Detailed Design
- Week5: System Scalability and performance analysis basics
- Feb5-9: Rate Limiter (Case Study), Open and Closed Systems
- Week7: Distributed Systems Fundamentals: Process Coordination, Concurrency.
- Consistent Hashing
- Exam Week: No classes
- Week8: Reliable storage and File Systems, Key Value Store (Case Study)
- Research Papers: After exam 1, we will switch gears and study both classical and recent papers on 3 main topics: Caching, DBs, ML Systems.