Issue 20
How we write an incident postmortem
Make the postmortem a team effort led by the on-call engineer, and compile a timeline of the incident. Keep the conversation free of blame.…
Issue 18
How Shopify manages petabyte-scale MySQL backup and restore
After one initial full snapshot, use incremental snapshots for backups to reduce both storage use and recovery time. Save on storage costs by deleting all but the last two copies kept for recovery purposes.…
Issue 17
Site reliability engineering best practices for data pipelines
Reduce hot-spotting by balancing the workload across resources. Use autoscaling. Adhere to strict access control for privacy, security, and data integrity.…