Monday, November 2, 2015
01:00 PM - 04:15 PM
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and real-time analytical workloads.
By tracing the flow of data from source to output, we’ll explore the options and considerations for components, including:
- Acquisition: from internal and external data sources
- Ingestion: offline and real-time processing
- Providing data services: exposing data to applications
- Analytics: batch and interactive
- Data management: data security, lineage, metadata and quality
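To make the batch vs. real-time ingestion split above concrete, here is a minimal toy sketch in plain Python. All names are hypothetical; a production platform would use tools like Kafka for the stream path and HDFS/Spark for the batch path, as covered in the tutorial.

```python
# Toy illustration of the two ingestion paths: batch (offline, bounded
# dataset, one answer at the end) vs. real-time (state updated per event).
# Hypothetical names; not any specific framework's API.

from collections import defaultdict

def batch_ingest(records):
    """Offline path: process a complete, bounded dataset in one pass."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec["source"]] += 1
    return dict(counts)

class StreamIngestor:
    """Real-time path: update state incrementally as each event arrives."""
    def __init__(self):
        self.counts = defaultdict(int)

    def on_event(self, rec):
        self.counts[rec["source"]] += 1
        return dict(self.counts)  # current view, available immediately

events = [{"source": "web"}, {"source": "mobile"}, {"source": "web"}]

# Batch: one result after the whole dataset has been read.
print(batch_ingest(events))  # {'web': 2, 'mobile': 1}

# Streaming: a running result after every event.
stream = StreamIngestor()
for e in events:
    latest = stream.on_event(e)
print(latest)  # {'web': 2, 'mobile': 1}
```

The trade-off sketched here (complete-but-late answers vs. immediate-but-partial ones) is what drives the batch/real-time architecture choices discussed in the session.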
We’ll also give advice on:
- Tool selection
- The function of the major Hadoop components and other big data technologies such as Spark and Kafka
- Hardware sizing and cloud provisioning
- Integration with legacy systems
Founder of the pioneering data conference O'Reilly Strata, Edd is a respected voice in the worlds of data, open source and the web. Bringing together deep technical know-how with market understanding, Edd makes sense of information technology and its trajectory.
A leading expert on big data architecture and Hadoop, Stephen brings over 20 years of experience creating scalable, high-availability data and application solutions. A veteran of WalmartLabs, Sun and Yahoo!, Stephen leads data architecture and infrastructure.
An Apache Cassandra committer and PMC member, Gary specializes in building distributed systems. Recent experience includes creating an open source high-volume metrics processing pipeline and building out several geographically distributed API services in the cloud.