John McBride: How to Build Your Own AI Infrastructure with Kubernetes

ConTejas Code - A podcast by Tejas Kumar - Mondays


Links
- Codecrafters (sponsor): https://tej.as/codecrafters
- OpenSauced blog post: https://opensauced.pizza/blog/how-we-saved-thousands-of-dollars-deploying-low-cost-open-source-ai-technologies
- John on X: https://x.com/johncodezzz
- Tejas on X: https://x.com/tejaskumar_

Summary
John McBride discusses his experience deploying open-source AI technologies at scale with Kubernetes. He shares insights on building AI-enabled applications and the challenges of large-scale data engineering. The conversation covers Kubernetes as a platform for running compute and the decision to use TimescaleDB for storing time-series data and vectors. McBride also highlights the importance of data-intensive application design and recommends the book 'Designing Data-Intensive Applications' for further reading.

The conversation then turns to migrating from OpenAI to an open-source large language model (LLM) inference engine. The switch was driven by the need for cost optimization and the desire for more control over the infrastructure. vLLM was chosen as the inference engine for its compatibility with the OpenAI API and its performance. The migration involved deploying Kubernetes, setting up node groups with GPUs, running vLLM pods, and using a Kubernetes Service for load balancing. The conversation closes by emphasizing the importance of choosing the right level of abstraction and understanding the trade-offs involved.

Takeaways
1. Building AI-enabled applications requires solid large-scale data engineering.
2. Kubernetes is an excellent platform for serving large-scale applications.
3. TimescaleDB, built on top of Postgres, is a suitable choice for storing time-series data and vectors.
4. The book 'Designing Data-Intensive Applications' is recommended for understanding data-intensive application development.
5. Choosing the right level of abstraction is important; it depends on factors such as expertise, time constraints, and specific requirements.
6. Kubernetes can be complex and expensive, and it may not be necessary for all startups.
7. The decision to adopt Kubernetes should weigh the scale and needs of the company against the operational burden it brings.

Chapters
00:00 John McBride
03:05 Introduction and Background
07:24 Summary of the Blog Post
12:15 The Role of Kubernetes in AI-Enabled Applications
16:10 The Use of TimescaleDB for Storing Time-Series Data and Vectors
35:37 Migrating to an Open-Source LLM Inference Engine
47:35 Deploying Kubernetes and Setting Up Node Groups
55:14 Choosing vLLM as the Inference Engine
1:02:21 The Migration Process: Deploying Kubernetes and Setting Up Node Groups
1:08:02 Choosing the Right Level of Abstraction
1:24:12 Challenges in Evaluating Language Model Performance
1:31:41 Considerations for Adopting Kubernetes in Startups

Hosted on Acast. See acast.com/privacy for more information.
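The migration steps described in the episode (GPU node groups, vLLM pods, a Kubernetes Service for load balancing) could be sketched as a minimal manifest like the one below. This is an illustrative assumption, not the actual OpenSauced configuration: the node label, model name, and replica count are placeholders, and the image is vLLM's official OpenAI-compatible server image.

```yaml
# Hypothetical sketch: vLLM pods scheduled onto a GPU node group,
# fronted by a Kubernetes Service that load-balances requests across pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      nodeSelector:
        node-group: gpu                    # assumed label on the GPU node group
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest   # vLLM's OpenAI-compatible server
          args: ["--model", "mistralai/Mistral-7B-Instruct-v0.2"]  # placeholder model
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1            # one GPU per pod
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-service
spec:
  selector:
    app: vllm
  ports:
    - port: 8000
      targetPort: 8000
```

Because the Service selects the vLLM pods by label, requests to `vllm-service:8000` are distributed across replicas without any extra load-balancer component, which matches the approach described in the conversation.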
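The episode notes that vLLM was chosen partly because it exposes an OpenAI-compatible HTTP API, so existing client code mostly just needs a different base URL. A minimal standard-library sketch, assuming a hypothetical in-cluster Service endpoint and a placeholder model name:

```python
import json
from urllib import request

# Hypothetical in-cluster endpoint; vLLM's server exposes the same
# /v1/chat/completions route as the OpenAI API.
VLLM_ENDPOINT = "http://vllm-service.default.svc.cluster.local:8000/v1/chat/completions"


def build_chat_request(prompt: str,
                       model: str = "mistralai/Mistral-7B-Instruct-v0.2") -> request.Request:
    """Build the same JSON request body the OpenAI chat API expects,
    aimed at the vLLM Service instead of api.openai.com."""
    body = {
        "model": model,  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return request.Request(
        VLLM_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Summarize this repository's recent activity.")
print(req.full_url)  # sending it requires a running vLLM server
```

Since the request shape is identical, higher-level OpenAI client libraries can also be pointed at such an endpoint by overriding their base URL, which is what makes this kind of migration largely a configuration change.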
