ANT316: Effective Data Lakes: Challenges and Design Patterns
AWS re:Invent 2018 - A podcast by AWS
Data lakes are emerging as the most common architecture built in data-driven organizations today. A data lake enables you to store unstructured, semi-structured, or fully structured raw data as well as processed data for different types of analytics, from dashboards and visualizations to big data processing, real-time analytics, and machine learning. Well-designed data lakes ensure that organizations get the most business value from their data assets. In this session, you learn about the common challenges and patterns for designing an effective data lake on the AWS Cloud, with wisdom distilled from various customer implementations. We walk through patterns to solve data lake challenges, such as real-time ingestion, choosing a partitioning strategy, file compaction techniques, database replication to your data lake, handling mutable data, machine learning integration, security patterns, and more.