Sam Bean, Zain Hasan, and John Trengrove on You.com and Spark
Weaviate Podcast - A podcast by Weaviate
Weaviate Podcast #32. Thank you so much for watching the Weaviate podcast! We are super excited to host Sam Bean from You.com, and to welcome Zain Hasan and John Trengrove to the Weaviate podcast for the first time! Sam begins by describing You.com, and then we dive into the Weaviate Spark Connector, which Sam played a massive role in creating. I thought this was a masterclass in Spark as a big data technology; John, Sam, and Zain are all data engineering pros, and I've never learned more about a new technology from a podcast than from this one. I really hope you enjoy listening to it; please let us know any questions or ideas you have.

Also, please see Zain's blog post, "The Details Behind the Sphere Dataset in Weaviate" - https://weaviate.io/blog/2022/12/deta.... It provides great detail on exactly how to use the Spark connector with Weaviate, in this case for a billion-scale dataset upload!

Chapters
0:00 Thanks for Watching!
0:18 Welcome Zain and John
0:28 Welcome Sam Bean, You.com!
1:48 Search Interface and Search Apps / Widgets
3:40 Searching through Specific Websites
4:00 Origin Story of You.com
6:53 How did you come across Weaviate?
8:33 Text, Image, Audio Search
10:28 What do you use Spark for?
14:20 Datasets used with Weaviate
16:14 Creating a Spark Connector to Weaviate
21:05 Adding Streaming support
22:50 Vectorizing Data at You.com
27:15 More on ONNX + Spark
29:52 Performance Questions, Spark + Weaviate
34:35 Parquet for HuggingFace Dataset Files
34:54 What is Parquet? Spark Pushdown Filters Explained
39:04 Similar to HDF5?
39:45 Spark for Extracting Ranking Features
43:25 Hybrid Search
46:48 Collecting Search Relevance Data
51:07 Thank you for watching! Thanks Sam, Zain, and John!