
Google and Cloudera are bringing Dataflow to Spark

Google Cloud Dataflow lets developers create and monitor their data-processing pipelines without the hassle of managing the underlying clusters. Now the company is bringing this technology to the Apache Spark data-processing engine with help from the people at Cloudera.

Google today announced that it has teamed up with the Hadoop specialists at Cloudera to bring its Cloud Dataflow programming model to the Apache Spark data-processing engine. With Google Cloud Dataflow, developers can create and monitor data-processing pipelines without having to worry about the underlying data-processing cluster. As Google likes to stress, the service evolved out of the company’s internal tools for processing large datasets at Internet scale. Not all data-processing tasks are the same, though, and sometimes you may want to run a task in the cloud, on premises, or on a different processing engine. With Cloud Dataflow, at least in its ideal state, data analysts will be able to use the same system for creating their pipelines, no matter which underlying architecture they want to run them on.
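To make the "write once, run on any engine" idea concrete, here is a toy Python sketch. It is not the Cloud Dataflow SDK or the Spark API: `Pipeline`, `LocalRunner`, and `ChunkedRunner` are hypothetical names invented for illustration, and the "cluster" is simulated by splitting the input into partitions in a single process. The point is only that the pipeline definition never mentions the engine that executes it.

```python
from typing import Callable, Iterable, List

# Hypothetical toy model; none of these names come from Cloud Dataflow or Spark.
Step = Callable[[str], Iterable[str]]


class Pipeline:
    """An ordered list of element-wise transforms, independent of any engine."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def apply(self, step: Step) -> "Pipeline":
        # Each step maps one input element to zero or more output elements.
        self.steps.append(step)
        return self


class LocalRunner:
    """Runs every step over the whole dataset in one pass per step."""

    def run(self, pipeline: Pipeline, data: Iterable[str]) -> List[str]:
        items = list(data)
        for step in pipeline.steps:
            items = [out for item in items for out in step(item)]
        return items


class ChunkedRunner:
    """Stand-in for a cluster engine: processes contiguous partitions separately."""

    def __init__(self, partitions: int = 2) -> None:
        self.partitions = partitions

    def run(self, pipeline: Pipeline, data: Iterable[str]) -> List[str]:
        items = list(data)
        # Ceiling division so every element lands in exactly one partition.
        size = max(1, -(-len(items) // self.partitions))
        results: List[str] = []
        for start in range(0, len(items), size):
            results.extend(LocalRunner().run(pipeline, items[start:start + size]))
        return results


# The pipeline is defined once, with no reference to where it will run.
pipeline = (
    Pipeline()
    .apply(lambda line: line.split())    # split each line into words
    .apply(lambda word: [word.lower()])  # normalize case
)

lines = ["Google Cloud Dataflow", "Apache Spark"]
local_result = LocalRunner().run(pipeline, lines)
chunked_result = ChunkedRunner(partitions=2).run(pipeline, lines)
```

Both runners return the same words in the same order because every step works on one element at a time. A real engine would also have to handle grouping, windowing, and failures, which this sketch leaves out.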


Written by Michio Hasai

Michio Hasai is a social strategist and car guy. Find him on Facebook, Twitter, and Pinterest.
