How Instagram Reels manages reliability | Jack Li (Instagram, Shopify)

Engineering Enablement by Abi Noda - A podcast by DX

Jack Li explains how his production engineering team rolled out a new incident review process, how they’ve made the case for investing in reliability, and the specific tools his team has built to improve reliability.

Discussion points:
(1:25) How Jack became interested in reliability
(3:24) Where the Instagram Reels team fits into the broader organization
(4:05) What Jack’s team focuses on
(4:55) The role of production engineering at Instagram versus Shopify
(8:32) The essence of DevOps
(10:44) Pros and cons of having product-focused teams
(13:35) How Jack’s team defines and tracks quality
(15:46) Signals the team monitors outside of systems
(18:10) Revamping Instagram Reels’ incident management process
(19:46) Making the case for improving the incident review process
(28:10) How their incident review process works
(31:55) The roles involved in an incident review
(33:40) The value of having incident reviews
(35:55) Why leaders should be part of incident reviews
(38:34) Why Jack’s team builds tools for driving reliability goals
(40:06) The types of tools Jack’s team focuses on
(43:09) What a merge queue is and why it was built at Shopify
(51:20) Using a Slack bot for ‘failed build’ alerts
(52:32) When a company should consider implementing a merge queue

Mentions and links:
Follow Jack on LinkedIn
Jack’s article from his time at Shopify about their Merge Queue
Jack’s talk on Shopify’s Merge Queue at GitHub Universe 2019