reliability

3 posts
Issue 20

How we write an incident postmortem

Make the postmortem a team effort led by the on-call engineer, and compile a timeline of the incident. Keep the conversation free of blame.…

Issue 18

How Shopify manages petabyte-scale MySQL backup and restore

After one initial full snapshot, use incremental snapshots for backups to reduce both storage use and recovery time. Save on storage costs by keeping only the last two copies for recovery.…

Issue 17

Site reliability engineering best practices for data pipelines

Reduce hot-spotting by balancing the workload across resources. Use autoscaling. Enforce strict access control for privacy, security, and data integrity.…