David Garnitz on VectorFlow - Weaviate Podcast #66!
Weaviate Podcast - A podcast by Weaviate
Hey everyone! Thank you so much for watching the 66th Weaviate Podcast with David Garnitz, the creator of VectorFlow! VectorFlow (open-sourced on GitHub and linked below) is a new tool for ingesting data into vector databases such as Weaviate! There is quite an interesting end-to-end stack emerging at the ingestion layer: retrieving data from misc. sources such as Slack, Salesforce, GitHub, Google Drive, Notion, ...; chunking the text (maybe with the use of visual document layout parsers like what Unstructured is imagining); potentially extracting metadata (say, the "age" of an NBA player, as in the Evaporate-Code+ research); sending this data off to embedding model inference, and unpacking that can of worms from inference acceleration to load balancing; and finally importing the vectors themselves into Weaviate! I learned so much from this conversation. I really hope you enjoy listening, and please check out VectorFlow below!

VectorFlow: https://github.com/dgarnitz/vectorflow

Chapters
0:00 VectorFlow on GitHub!
0:52 Welcome David Garnitz!
1:17 VectorFlow, Founding Vision
2:00 Billions of Vectors in Weaviate!
4:20 End-to-end data importing
6:30 Metadata Extraction in Vector Database Flows
10:15 Vectorizing 100s of millions of billions of chunks
15:58 Fine-Tuning Embedding Models
23:50 Zero-Shot Models in Metadata and Chunking
36:36 Vector + SQL
42:45 Self-Driving Databases
49:23 Generative Feedback Loop REST API
51:38 GPT Cache
55:55 Building VectorFlow