Building the AI Engine for Retail in the New Era

Wednesday, May 22
11:50 AM - 12:30 PM
Marquis Salon 12

Global Market Insights forecasts the retail analytics market to surpass USD 13 billion by 2024. Traditional retailers face a serious competitive threat from online rivals. As a result, the retail industry is moving towards deep data analytics and AI to revolutionize its decision-making. As a global leader in e-commerce and technology, Alibaba has been driving the emerging trend called “New Retail”, whose core concept centers on creating a customer experience that unifies online and offline behavior with data-driven operations. As one can imagine, the “New Retail” model creates a huge amount of spatial-temporal data (e.g., user behavior, logistics trajectories, transactions). Inside Alibaba Group, TSDB is the backbone service that hosts all of this data, enabling high-concurrency storage and low-latency queries while providing intelligent analysis capabilities using AI and other data science technologies. So far, the TSDB service has scaled to thousands of physical nodes and delivers peak performance of 80 million operations per second.

In this talk, we share the design of the Intelligence Engine on the Alibaba TSDB service, which enables fast and complex analytics of large-scale retail data. We will also demonstrate our work through a successful case study in which we deployed this system to support the Fresh Hema Supermarket, a major “New Retail” platform operated by Alibaba Group. We will highlight our solutions to the major technical challenges in data cleaning, storage, and processing. Handling missing data is a key challenge in retail. For example, a missing store data point on a specific day could be caused by a data transmission error or by an actual store closure for various reasons, such as holidays, renovations, or natural disasters. How such data gaps are treated can profoundly affect the analytics results. The data cleaning module in the TSDB Intelligence Engine runs machine learning algorithms across multiple data sources to accurately diagnose the cause of missing data and automatically performs smart null-filling operations that are aligned with business expectations.
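To make the missing-data point concrete, here is a minimal Python sketch of cause-dependent gap filling. It is not the TSDB data cleaning module: the function name `fill_gaps`, the hand-written closure calendar, and the neighbour-averaging rule are all assumptions for illustration, standing in for the machine-learning diagnosis described above.

```python
from datetime import date, timedelta

def fill_gaps(sales, closures, start, end):
    """Return a complete daily series from start to end (inclusive).

    sales: dict mapping date -> observed sales (missing days are absent)
    closures: set of dates on which the store is known to be closed
              (hypothetical input; a stand-in for a diagnosed cause)
    """
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    filled = {}
    for i, day in enumerate(days):
        if day in sales:
            filled[day] = sales[day]
        elif day in closures:
            # Real closure: zero sales is the true value, not a gap.
            filled[day] = 0.0
        else:
            # Assumed transmission error: average the nearest observed
            # values before and after the gap (whichever exist).
            prev = next((sales[d] for d in reversed(days[:i]) if d in sales), None)
            nxt = next((sales[d] for d in days[i + 1:] if d in sales), None)
            known = [v for v in (prev, nxt) if v is not None]
            filled[day] = sum(known) / len(known) if known else None
    return filled

# Illustrative data: May 2 was lost in transmission, May 4 was a closure.
observed = {
    date(2019, 5, 1): 100.0,
    date(2019, 5, 3): 120.0,
    date(2019, 5, 5): 110.0,
}
closed_days = {date(2019, 5, 4)}
series = fill_gaps(observed, closed_days, date(2019, 5, 1), date(2019, 5, 5))
```

In this sketch the two gaps receive different values even though both look identical in the raw data: the closure day becomes zero, while the transmission gap is estimated from its neighbours.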

TSDB also performs a multitude of optimizations to enable fast access and computation at runtime. For example, retail analytics applications frequently deal with data aggregations across different product hierarchies, hierarchical geographic organizations, and timelines. With our customized optimization techniques, the pre-aggregation module in TSDB runs concurrent multi-level roll-ups on hundreds of financial sources along different temporal and spatial dimensions.
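As a rough illustration of a multi-level roll-up, the Python sketch below pre-aggregates amounts at every level of a single hierarchy. The function `roll_up`, the record layout, and the region/city/store hierarchy are hypothetical, and the sketch is single-threaded, whereas the pre-aggregation module described above runs concurrently.

```python
from collections import defaultdict

def roll_up(records, hierarchy):
    """Pre-aggregate amounts at every prefix level of a hierarchy.

    records: iterable of dicts holding the hierarchy keys plus "amount"
    hierarchy: keys ordered from coarsest to finest,
               e.g. ["region", "city", "store"]
    Returns a dict mapping key tuples (one per level) to summed amounts.
    """
    totals = defaultdict(float)
    for rec in records:
        path = tuple(rec[k] for k in hierarchy)
        # Add this record to its store, its city, and its region totals.
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += rec["amount"]
    return dict(totals)

# Illustrative records (made-up values).
records = [
    {"region": "East", "city": "Shanghai", "store": "S1", "amount": 10.0},
    {"region": "East", "city": "Shanghai", "store": "S2", "amount": 5.0},
    {"region": "East", "city": "Hangzhou", "store": "S3", "amount": 7.0},
]
totals = roll_up(records, ["region", "city", "store"])
```

A query for any level, such as `totals[("East", "Shanghai")]`, then becomes a dictionary lookup instead of a scan over the raw records. The same pattern could be applied along a time hierarchy (year, month, day) or a product hierarchy.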

Another major analytical challenge in retail big data applications is the low signal-to-noise ratio: the net profit margin of leading retailers typically ranges from 1% to 3%, while financial KPIs are influenced by numerous micro- and macro-economic factors. TSDB leverages a rich set of advanced time-series feature-extraction algorithms to quantify the true impact of business actions in the sea of noise. We also developed deep learning functions in the Intelligence Engine to automatically detect interesting trends in real-time data streams and provide actionable insights.
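One simple way to see why separating a business action from background noise matters is a difference-in-differences comparison, sketched below in Python. This is a textbook technique swapped in for illustration, not the feature-extraction or deep learning methods used in the Intelligence Engine; the function name and all figures are made up.

```python
from statistics import mean

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Estimate the effect of an action as the change in the treated
    stores minus the change in comparable control stores."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change

# Made-up daily sales for stores that ran a promotion and stores that did not.
effect = diff_in_diff(
    treated_before=[100, 102, 98],
    treated_after=[108, 110, 112],
    control_before=[90, 92, 88],
    control_after=[95, 97, 93],
)
```

Here the treated stores rose by 10 on average, but the control stores rose by 5 over the same period, so only half of the apparent lift is attributed to the action. The estimate rests on the assumption that both groups would otherwise have followed the same trend.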

With all the features above, the Intelligence Engine in TSDB provides a full-stack analytics solution that helps retail companies identify interesting patterns in the most fine-grained data sources and achieve higher ROI through detailed, closed-loop decision feedback in real time. We believe both technical and business audiences will gain valuable lessons and insights from our success story.


Sanjian Chen
Staff Algorithm Expert
Alibaba Group
Dr. Sanjian Chen is a data science expert with deep knowledge of scalable machine learning algorithms. He has developed cutting-edge data-driven modeling techniques and autonomous systems in both academic and industry settings. He designed data-analytics solutions that drove numerous high-impact business decisions for multiple Fortune 500 companies across several industries, including retail, banking, automotive, and telecommunications. He is currently building cloud-based AI engines for high-performance distributed database systems that support scalable data analytics in multiple business areas. Dr. Chen is a frequently invited speaker at top international conferences, including the Strata Data Conference (San Francisco, London), the IEEE Cyber-Physical Systems Week (Chicago), the IFAC Conference on Analysis and Design of Hybrid Systems (Atlanta), and the IEEE International Conference on Healthcare Informatics (Philadelphia, Dallas). Dr. Chen received his Ph.D. in Computer and Information Science from the University of Pennsylvania. He received two IEEE Best Paper Awards (IEEE RTSS 2012 and IEEE ISORC 2018). He has published over 25 papers in top journals and conferences, including 2 articles in the Proceedings of the IEEE (IF=9.1). He has served as an invited reviewer for numerous top international journals and conferences, e.g., IEEE Design & Test, IEEE Transactions on Computers, ACM Transactions on Cyber-Physical Systems, IEEE Transactions on Industrial Electronics, the IEEE RTSS conference, and the ACM HSCC conference.