Software engineering practices for the data science and machine learning lifecycle

Tuesday, June 19
2:00 PM - 2:40 PM
Meeting Room 211A/B/C/D

With the advent of newer frameworks and toolkits, data scientists are more productive than ever and are proving indispensable to enterprises. A typical organization has large teams of data scientists building key analytics assets that are used daily and are an integral part of live transactions. However, the state of the industry also introduces considerable chaos and complexity. Many of the packages data scientists rely on come from open source, and even when those packages are well curated, there is a growing tendency to pick cutting-edge or unstable packages and frameworks to accelerate analytics. Different data scientists may use different runtimes, different Python or R versions, or even different versions of the same packages. Data scientists work predominantly on their laptops, which makes their environments difficult to reproduce for use by others. Since data science is now a team sport across multiple personas, including non-practitioners, traditional application developers, executives, and IT operators, how does an enterprise create a platform for productive cross-role collaboration?

Enterprises need a reliable and repeatable process, especially when the result affects their production environments. They also require a well-managed approach that enables the graduation of an asset from development through testing and staging to production. Given the pace of business today, the process also needs to be agile and flexible, even enabling an easy path to reversing a change. Compliance and audit processes require clear lineage and history as well as approval chains.

In the traditional software engineering world, this lifecycle is well understood, and best practices have been followed for ages. But what does it mean when you have non-programmers, or users who are not trained in software engineering philosophies, or who perceive all of this as "big process" roadblocks in their daily work? How do we engage them productively while still supporting enterprise requirements for reliability, tracking, and a clear continuous integration and delivery practice? In this session, the presenters will share techniques based on their user research, real-life customer interviews, and productized best practices. They also invite the audience to share their own stories and best practices to make this a lively conversation.


Sriram Srinivasan
Senior Technical Staff Member, Analytics Platform Architect
Sriram is an architect in the IBM Analytics group tasked with delivering modern cloud-native offerings such as Data Science Experience (DSX) in Private Clouds, ensuring reliability and scalability using technologies like Docker and Kubernetes. His current focus is on integrating DSX with Hadoop clusters and enabling machine and deep learning at scale. Prior to this, he worked on the delivery and operations of data services such as dashDB on IBM Bluemix. He has years of experience developing enterprise-grade relational database, warehouse, ETL, and tools offerings. Sriram started his career at Informix Software, where he worked on application server technology, web content management software, and database tooling, as well as on the Red Brick Data Warehouse suite.