Meetups are a great way to connect with like-minded individuals face to face. Hortonworks and local community groups will host several Meetups the night prior to Summit. Join us for a social and networking hour, a presentation, Q&A with the presenter, and more socializing and networking to follow.

Machine Learning and IT Meetup


Mix, mingle, and learn from IBM featured speakers on topics such as IBM’s Data Science Experience, Scalable TensorFlow Deep Learning as a Service with Docker and OpenPOWER with GPUs, and more.

First class GPU support for big-data apps on your Apache Hadoop YARN clusters – Vinod Vavilapalli & Wangda Tan, Hortonworks
GPUs are becoming a key tool for many big data applications. Deep learning and machine learning, data analytics, genome sequencing, and similar workloads all rely on GPUs for tractable performance. In many cases GPUs deliver 10x speedups, and in some reported cases up to 300x. Many modern deep-learning applications build directly on GPU libraries such as cuDNN (the CUDA Deep Neural Network library); it’s not a stretch to say that such applications cannot live without GPU support. By adding first-class support for GPUs in Apache Hadoop YARN (including configuration, discovery, scheduling, and isolation), applications running on YARN can finally leverage GPUs in a shared cluster. This talk covers how we add GPU support to YARN, how application developers can use the new feature, and how cluster administrators can facilitate elastic sharing of these powerful devices.

Getting Started with TensorFlow Deep Learning Training on OpenPOWER – Andrei Yurkevich, CTO, Altoros
The disruptive power of applications using cognitive models is enormous, bringing unprecedented value to humanity as well as open questions and cause for concern. This presentation will explain the popularity of TensorFlow, a powerful and arguably the most popular Deep Learning framework, including how it works, where it fits, and what to look out for. I’ll demonstrate how to train a TensorFlow model and how IBM Power Systems with OpenPOWER architecture make TensorFlow models even more powerful.

Improving Data Scientist Productivity with Data Science Experience – Patrick Pitre
Data Science is often hampered by the inability of data scientists to collaborate on a shared code base.  In this demonstration, I will discuss the use of composable data services and a collaborative development space to increase the speed to market of analytics using IBM’s Data Science Experience and IBM Bluemix.

Hors d’oeuvres and beverages will be served. We look forward to seeing you there!


Latest update to Apache Spark & Zeppelin, Data Science in the Cloud & Security

Update on latest Apache Spark & Zeppelin in HDP (Vinay Shukla)

Data Science with Spark and Zeppelin in the Cloud (AWS) (Robert Hryniewicz)

Apache Zeppelin deep dive (Yanbo Liang, Apache Spark Committer)

Security with Spark SQL and Ranger with Hive LLAP integration. (Dong Hyun)

Q & A – also an opportunity for feedback and feature requests


Agile Data Science 2.0: Agile and Iterative Machine Learning

Our next Cognitive Computing Meetup will be a free pre-event for the 2017 DataWorks Summit. Our speaker, Russell Jurney, will present on Agile and Iterative Machine Learning from a Data Science perspective.

We hope that you will come see his presentation and join in the discussion.

A number of other meetups and pre-event activities will be taking place in the convention center, so this should be a good networking opportunity.


Agile Data Science 2.0 (O’Reilly 2017) defines a methodology and a software stack with which to apply the methods. *The methodology* seeks to deliver data products in short sprints by going meta and putting the focus on the applied research process itself. *The stack* is one example that meets the requirements of being highly scalable and efficient to use for application developers as well as data engineers. It includes everything needed to build a full-blown predictive system: Apache Spark, Apache Kafka, Apache Airflow (incubating), MongoDB, Elasticsearch, Apache Parquet, Python/Flask, and jQuery. This talk will cover the full lifecycle of big data application development and will show how to use lessons from agile software engineering to apply data science with this full stack and build better analytics applications. The walkthrough starts with plumbing, moves on to data tables, charts and search, then interactive reports, and builds towards predictions in both batch and realtime (defining the role for each), the deployment of predictive systems, and how to iteratively improve predictions that prove valuable by building an experimental setup.

Speaker Bio

Russell Jurney is principal consultant at Data Syndrome, a product analytics consultancy dedicated to advancing the adoption of the development methodology Agile Data Science, as outlined in the book Agile Data Science 2.0 (O’Reilly, 2017). He has worked as a data scientist building data products for over a decade, starting in interactive web visualization and then moving towards full-stack data products, machine learning and artificial intelligence at companies such as Ning, LinkedIn, Hortonworks and Relato. He is a self-taught visualization software engineer, data engineer, data scientist, and writer, and most recently he is becoming a teacher. In addition to helping companies build analytics products, Data Syndrome offers live and video training courses.


[San Jose][DataWorks Summit] Reinforcement Learning, Tensorflow, OpenAI Universe

Talk 0:  Meetup Announcements and Updates
(Chris Fregly, Research Engineer @ PipelineIO)

More details coming soon…

Talk 1:  Hortonworks

More details coming soon…

Talk 2:  Title/Abstract Coming Soon
(Don Dini, Data Scientist AT&T Innovation Labs)

More details coming soon…

Talk 3:  Building Pong with Reinforcement Learning
(Francesco Mosconi, PhD and Data Scientist @ Catalit)

More details coming soon…


Model Framework to Unify Blockchain and Deep Learning @ Hadoop Summit 2017

Theme of this event:

Unifying “Deep Learning”, “Blockchain” and “Decentralized Autonomous Applications”. Attempts to elaborate a “model framework” for blockchain technologies include:

(1) application of existing oversight regimes to financial applications and other IOT based applications involving blockchain;

(2) smart contracts or self-executing transactions and interactions between humans and machines or between multiple entities which are automatically enforced by the underlying code of technology involving deep learning; and

(3) decentralized autonomous organizations that offer new forms of participatory governance and activities involving IOT based systems.

Mathematical models and rules can integrate Deep Learning and Blockchain technology.


56th Bay Area Hadoop User Group (HUG) Meetup – DataWorks / Hadoop Summit Special



6:00 – 6:30 – Network and Socialize

6:30 – 7:00 – Apache Hadoop 3 The Road Ahead

7:00 – 7:30 – Apache Hadoop YARN Containerization

7:30 – 8:00 – Apache Hadoop HDFS Storage Optimization


Session 1 (6:30 – 7:00 PM) – Apache Hadoop 3 The Road Ahead

To be announced

Speaker To be announced

Session 2 (7:00 – 7:30 PM) – Apache Hadoop YARN Containerization

To be announced

Speaker To be announced

Session 3 (7:30 – 8:00 PM) – Apache Hadoop HDFS Storage Optimization

To be announced

Speaker To be announced


Hackers vs Big Data Cybersecurity

The Yahoo breach took 2 years to investigate and report. The Google Docs phishing attack spread like wildfire, but once you’re infected, how do you know what’s been compromised?

Big data has a major role to play in modern cybersecurity: it makes it possible for security personnel to detect and investigate costly threats to the enterprise in reasonable time frames.

Join us to hear from Apache Metron Committers James Sirota and Casey Stella, who will share the latest developments in Apache Metron, a new top-level project.


• Why Cybersecurity Needs Big Data

• Anatomy of a Phishing Attack

• Intro to Apache Metron

• Demo of Apache Metron

• Q&A


From Automotive to Connected World Big Data @ DataWorks / Hadoop Summit


Michael Ger, General Manager for Industrial Manufacturing and Automotive Solutions at Hortonworks

Chris Gambino, Solution Engineer at Hortonworks

Steve Crumb, Executive Director, GENIVI Alliance


As Connected Car momentum continues to build, the art of the possible is being defined by extending the connected car into broader cross-industry “Connected World” initiatives, including Smart Cities and Smart Homes. Data processing and analysis play a key role in the connected car. In this rapidly changing environment, adherence to Open Source principles can greatly increase agility, lower cost, and drive innovation.

During this Meetup we will discuss the main challenges of adopting Apache Open Source technologies in connected vehicle use cases, including geographically distributed data, proprietary data formats, and enterprise requirements, and how they can be addressed.


Open Source Big Data in Automotive Primer

At this meetup, learn how Open Source Apache projects like Apache NiFi and Apache Hadoop can support advanced Connected World scenarios and data analytics requirements. We will give a brief overview of the solution components supporting a connected vehicle architecture, along with a real-world demonstration (created with GENIVI, Jaguar Land Rover and Hortonworks) of vehicle “command and control”, in which data from the vehicle is collected and processed, and commands are then sent back to the vehicle to take action.

GENIVI for Smart Car/Smart City Initiative

We will also discuss a real-life Smart Car/Smart City initiative underway with the City of Las Vegas, driven by GENIVI®, a non-profit industry alliance committed to driving the broad adoption of open source, open technologies for the connected car.


Open Source Data Management for Connected Vehicles

Demonstration Example (Jaguar RVI)

GENIVI Introduction City of Las Vegas Case Study

Q & A

Networking Reception

Cost: Free


Latest developments in Apache NiFi and MiNiFi

In this meetup we’ll do a brief intro to NiFi for any new or curious participants, then quickly dive into the latest developments in Apache NiFi and MiNiFi and the closely related Registry efforts. The upcoming Apache NiFi 1.2.0 release includes many powerful new features building on the tremendous momentum the community has provided over the last year.

We’ll highlight key capability areas including:

• End to end flow management with MiNiFi and NiFi

• Performance boosts in the core framework and provenance

• Powerful record reader/writer abstraction for high performance event transformation, SQL queries over data streams, and efficient serialization and deserialization to and from systems like Kafka, Hadoop, and others.

• Docker/Swarm/Compose based deployment and clustering and what this leads to for roadmap work.

• The new Registry subproject which will provide centralized management of versioned flows, extensions, and assets.

• And many other important areas.


Apache Ambari Meetup at Dataworks Summit 2017 San Jose

Hi all, the time has come for another Apache Ambari meetup. This is your chance to meet other contributors & committers, interact with the community, ask questions, and brainstorm on ideas for the project roadmap.

If you would like to participate, please add a comment and email one of the admins (Alejandro Fernandez, Yusaku Sako).

Meeting Agenda:

• “What’s New in Ambari 3.0?” by Alejandro Fernandez

• “How to do QE Automation for Ambari, lessons learned, tips” by Sunitha Velpula

• “Ambari Metric System + Grafana” by Sid Wagle & Aravindan Vijayan

• “Demo of Ambari High Availability” by Swapan Shridhar
