How we write an incident postmortem

Make the postmortem a team effort led by the on-call engineer, and compile a timeline of the incident. Keep the conversation free of blame.…


How Shopify manages petabyte-scale MySQL backup and restore

After one initial full snapshot, back up data with incremental snapshots to reduce both storage use and recovery time. Save on storage costs by keeping only the two most recent copies for recovery and deleting the rest.…
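
The "keep only the last two copies" rule can be illustrated with a minimal sketch. This is not Shopify's tooling: the function name `select_backups_to_delete`, the `keep` parameter, and the sample timestamps are all hypothetical, and the sketch assumes each copy can be restored on its own, so removing older copies does not break a newer one.

```python
from datetime import datetime

def select_backups_to_delete(snapshot_times, keep=2):
    """Return the snapshots that fall outside the `keep` most recent ones.

    Hypothetical helper for illustration only; assumes every snapshot
    is independently restorable.
    """
    # Sort newest first, then everything past the first `keep` entries is stale.
    newest_first = sorted(snapshot_times, reverse=True)
    return newest_first[keep:]

# Made-up sample data: four daily copies.
snapshots = [
    datetime(2024, 1, 1),
    datetime(2024, 1, 2),
    datetime(2024, 1, 3),
    datetime(2024, 1, 4),
]

to_delete = select_backups_to_delete(snapshots)
for ts in to_delete:
    print("would delete copy taken at", ts.isoformat())
```

Returning the list of stale copies, rather than deleting inside the function, keeps the policy separate from the storage calls and makes it easy to review before anything is removed.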


Site reliability engineering best practices for data pipelines

Reduce hot-spotting by balancing the workload across resources. Use autoscaling. Adhere to strict access control for privacy, security and data integrity.…