NetApp receives 70 billion data points of telemetry information each day from its customer’s storage systems. This telemetry data contains configuration information, performance counters, and logs. All of this data is processed using multiple Hadoop clusters, and feeds a machine learning pipeline and a data serving infrastructure that produces insights for customers via an application called Active IQ. We describe the evolution of our Hadoop infrastructure from a traditional on-premises architecture to the hybrid cloud, and lessons learned.
We’ll discuss the insights we are able to produce for our customers, and the techniques used. Finally, we describe the data management challenges with our multi-petabyte Hadoop data lake. We solved these problems by building a unified data lake on-premises and using the NetApp Data Fabric to seamlessly connect to public clouds for data science and machine learning compute resources.
Architecting a truly hybrid cloud implementation allowed NetApp to free up our data scientists to use any software on any cloud, kept the customer log data safe on NetApp Private Storage in Equinix, resulted in faster ability to innovate and release new code and provided flexibility to use any public cloud at the same time with data on NetApp in Equinix.