GitHub - 8thlight/services-engineering: A reading list for services engineering, with a focus on cloud infrastructure services

forked from mmcgrana/services-engineering

Services Engineering Reading List

A reading list for services engineering, with a focus on cloud infrastructure services.

We welcome suggestions.

Papers

Fault Injection in Production (Allspaw)
Making Reliable Distributed Systems in the Presence of Software Errors (Armstrong)
Highly Available Transactions: Virtues and Limitations (Bailis et al.)
The Incident Command System (Bigley and Roberts)
The Chubby Lock Service for Loosely Coupled Distributed Systems (Burrows)
Bigtable: a Distributed Storage System for Structured Data (Chang et al.)
Spanner: Google’s Globally-Distributed Database (Corbett et al.)
Dynamo: Amazon’s Highly Available Key-Value Store (DeCandia et al.)
MapReduce: Simplified Data Processing on Large Clusters (Dean and Ghemawat)
The Google File System (Ghemawat et al.)
On Designing and Deploying Internet-Scale Services (Hamilton)
Kafka: A Distributed Messaging System for Log Processing (Kreps et al.)
Weathering the Unexpected (Krishnan)
The Unified Logging Infrastructure for Data Analytics at Twitter (Lee et al.)
Automatic Management of Partitioned, Replicated Search Services (Leibert et al.)
Learning to Embrace Failure (Limoncelli et al.)
Scaling Big Data Mining Infrastructure: The Twitter Experience (Lin and Ryaboy)
Dremel: Interactive Analysis of Web-Scale Datasets (Melnik et al.)
Out of the Tar Pit (Moseley and Marks)
In Search of an Understandable Consensus Algorithm (Ongaro and Ousterhout)
Failure Trends in a Large Disk Drive Population (Pinheiro et al.)
Fallacies of Distributed Computing Explained (Rotem-Gal-Oz)
F1 - The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business (Shute et al.)
Dapper, A Large Scale Distributed Systems Tracing Infrastructure (Sigelman et al.)
Resilient Distributed Datasets: a Fault-Tolerant Abstraction for In-Memory Cluster Computing (Zaharia et al.)
The Human Side of Postmortems (Zwieback)
Crew Resource Management: a Positive Change for the Fire Service
Architecture of a Database System (Hellerstein et al.)
The Art of the Propagator (Radul and Sussman)

Posts

Resilience Engineering: Part I, Part II (Allspaw)
Systems Engineering: a Great Definition (Allspaw)
Chaos Monkey Released Into The Wild (Bennett and Tseitlin)
Some Rules for Engineering and Operations (Black)
Service Level Disagreements Part I, Part II (Black)
My Philosophy on Alerting (Ewaschuk)
You Can’t Sacrifice Partition Tolerance (Hale)
Customer Trust (Hamilton)
Observations on Errors, Corrections, & Trust of Dependent Systems (Hamilton)
Life Beyond Distributed Transactions: An Apostate’s Opinion (Helland)
Notes on Distributed Systems for Young Bloods (Hodges)
The Network is Reliable (Kingsbury)
The Trouble with Clocks (Kingsbury)
Call Me Maybe: Final Thoughts (Kingsbury)
Getting Real About Distributed Systems Reliability (Kreps)
The Log: What every software engineer should know about real-time data’s unifying abstraction (Kreps)
Incident Response at Heroku (McGranaghan)
On HTTP Load Testing (Nottingham)
Observability at Twitter (Watson)
Stevey’s Google Platforms Rant (Yegge)

Presentations

Design, Lessons, and Advice from Building Distributed Systems at Google (Dean)
Service Design Best Practices (Hamilton)

Books

The Field Guide To Understanding Human Error (Dekker)
Agile Retrospectives: Making Good Teams Great (Derby et al.)
Better: A Surgeon’s Notes on Performance (Gawande)
The Checklist Manifesto: How to Get Things Right (Gawande)
High Performance Browser Networking (Grigorik)
Resilience Engineering in Practice (Hollnagel et al.)
Effective Monitoring and Alerting (Ligus)
Release It!: Design and Deploy Production-Ready Software (Nygard)
The Challenger Launch Decision (Vaughan)
Managing the Unexpected (Weick and Sutcliffe)

Research Groups

Conferences
