Monday, November 2, 2015
01:00 PM - 04:15 PM
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and real-time analytical workloads.
By tracing the flow of data from source to output, we’ll explore the options and considerations for components, including:
- Acquisition: from internal and external data sources
- Ingestion: offline and real-time processing
- Providing data services: exposing data to applications
- Analytics: batch and interactive
- Data management: data security, lineage, metadata and quality
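To make the batch vs. real-time ingestion split above concrete, here is a minimal toy sketch in plain Python. All names are hypothetical; a production platform would use tools like Kafka for the stream path and HDFS/Spark for the batch path, as covered in the tutorial.

```python
# Toy illustration of the two ingestion paths: batch (offline, bounded
# dataset, one answer at the end) vs. real-time (state updated per event).
# Hypothetical names; not any specific framework's API.

from collections import defaultdict

def batch_ingest(records):
    """Offline path: process a complete, bounded dataset in one pass."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec["source"]] += 1
    return dict(counts)

class StreamIngestor:
    """Real-time path: update state incrementally as each event arrives."""
    def __init__(self):
        self.counts = defaultdict(int)

    def on_event(self, rec):
        self.counts[rec["source"]] += 1
        return dict(self.counts)  # current view, available immediately

events = [{"source": "web"}, {"source": "mobile"}, {"source": "web"}]

# Batch: one result after the whole dataset has been read.
print(batch_ingest(events))  # {'web': 2, 'mobile': 1}

# Streaming: a running result after every event.
stream = StreamIngestor()
for e in events:
    latest = stream.on_event(e)
print(latest)  # {'web': 2, 'mobile': 1}
```

The trade-off sketched here (complete-but-late answers vs. immediate-but-partial ones) is what drives the batch/real-time architecture choices discussed in the session.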
We’ll also give advice on:
- Tool selection
- The function of the major Hadoop components and other big data technologies such as Spark and Kafka
- Hardware sizing and cloud provisioning
- Integration with legacy systems
Founder of the pioneering data conference O'Reilly Strata, Edd is a respected voice in the worlds of data, open source and the web. Bringing together deep technical know-how with market understanding, Edd makes sense of information technology and its trajectory.
A leading expert on big data architecture and Hadoop, Stephen brings over 20 years of experience creating scalable, high-availability data and application solutions. A veteran of WalmartLabs, Sun and Yahoo!, Stephen leads data architecture and infrastructure.
An Apache Cassandra committer and PMC member, Gary specializes in building distributed systems. Recent experience includes creating an open source high-volume metrics processing pipeline and building out several geographically distributed API services in the cloud.