Hot Water Leak Detection Using Variational Autoencoder Model

Wednesday, May 22
2:50 PM - 3:30 PM
Marquis Salon 9

At the Southern California Gas Company (SoCalGas), we need to distinguish between various types of abnormal gas usage and respond appropriately. For example, we must be able to tell a water leak apart from a gas leak. SoCalGas has built predictive models to detect such instances.

SoCalGas has very large datasets collected from almost six million residential smart meters in our region. The data collected for analysis and modeling includes gas consumption, temperature, service operation times, and meter location. Through analysis of customer usage patterns in this data, SoCalGas attempts to identify the distinct patterns of water leaks.

However, building predictive models requires very large sets of accurately labeled and cleaned data. There are practical resource limitations to a manual labeling process, especially with consumption data that can be noisy. Another modeling challenge is that the meters, which are the data sources, come from different product families and sit in different geographical locations.

For these reasons, we propose the use of a Variational Autoencoder (VAE) model as a semi-supervised learning method. Starting from a limited amount of labeled data, we can train on a much larger dataset while a) avoiding the bias of the partially labeled subset and b) reducing the noise in the larger dataset. This approach has allowed us to differentiate and predict abnormal consumption patterns caused by water leaks.
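To illustrate the semi-supervised idea, the toy sketch below (an assumption for illustration, not the production pipeline) fills missing labels by assigning each unlabeled meter the label of its nearest labeled neighbor in a latent space. The array sizes, latent codes, and label scheme are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setup: 2-D latent codes for 8 meters, only 3 of them labeled.
# Label scheme (illustrative): 0 = normal usage, 1 = suspected water leak.
latent = rng.normal(size=(8, 2))
labels = np.full(8, -1)          # -1 marks a missing label
labels[[0, 3, 5]] = [0, 1, 0]    # the small labeled subset

def fill_missing_labels(latent, labels):
    """Assign each unlabeled point the label of its nearest labeled neighbor."""
    filled = labels.copy()
    labeled_idx = np.flatnonzero(labels != -1)
    for i in np.flatnonzero(labels == -1):
        d = np.linalg.norm(latent[labeled_idx] - latent[i], axis=1)
        filled[i] = labels[labeled_idx[d.argmin()]]
    return filled

filled = fill_missing_labels(latent, labels)
```

In practice the latent codes would come from the trained VAE encoder rather than random draws, and the propagation step could be any semi-supervised classifier; nearest-neighbor assignment is used here only because it fits in a few lines.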

The VAE is known as a generative model. It upgrades the architecture of a regular autoencoder by replacing the usual deterministic encoding function Q with a probabilistic function q(z|x). A VAE model learns soft ellipsoidal regions in latent space, effectively filling the gaps where labels are missing; applying the VAE to our data lets us fill in those missing labels. In addition, the data is encoded to the latent space and decoded to reconstruct the input, and in this process, just as with a regular autoencoder, noise is reduced.
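The probabilistic encoder q(z|x) and the sampling step it implies can be sketched in a few lines of NumPy. The dimensions and the randomly initialized weights below are illustrative stand-ins, not the trained SoCalGas model; a real VAE would learn these weights by maximizing the ELBO (reconstruction term minus KL term).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 24 hourly gas readings -> 2-D latent space
INPUT_DIM, LATENT_DIM = 24, 2

# Randomly initialized linear encoder/decoder weights (stand-ins for trained ones)
W_mu = rng.normal(scale=0.1, size=(INPUT_DIM, LATENT_DIM))
W_logvar = rng.normal(scale=0.1, size=(INPUT_DIM, LATENT_DIM))
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM, INPUT_DIM))

def encode(x):
    """Probabilistic encoder q(z|x): returns the mean and log-variance of z."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z):
    """Decoder reconstructing the consumption profile from the latent code."""
    return z @ W_dec

def elbo_terms(x):
    """Return the two terms of the (negative) ELBO for one input profile."""
    mu, log_var = encode(x)
    z = reparameterize(mu, log_var)
    x_hat = decode(z)
    recon = np.mean((x - x_hat) ** 2)                             # reconstruction error
    kl = -0.5 * np.mean(1 + log_var - mu**2 - np.exp(log_var))    # KL to N(0, I)
    return recon, kl

x = rng.standard_normal(INPUT_DIM)   # one synthetic daily consumption profile
recon, kl = elbo_terms(x)
```

The KL term is what pulls the ellipsoidal posterior regions toward the standard normal prior, producing the "soft" latent-space structure described above; the reconstruction term is what drives the denoising behavior shared with a regular autoencoder.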

In experiments, we compared the VAE results with the results of our consumption-data pattern-recognition analytics and with plain autoencoder results. The VAE model predicted missing labels efficiently and also predicted properties of various areas. The performance and accuracy of the model depended on the ratio of labeled to unlabeled data.


Jay Kim
Senior Data Scientist
Highly efficient and results-oriented data scientist with strong quantitative skills, development experience, and a strong educational background, including an MSc from Imperial College London (ranked in the QS world top 10). Responsible self-starter with demonstrated experience in statistical programming languages (R, Python, SAS) and in Python for APIs. Highly capable with visualization tools such as Tableau, with a good understanding of relational (SQL) databases such as Oracle and non-relational databases such as HBase, MongoDB, and Redis. Experienced with machine learning tools such as Hadoop, Spark, H2O, Sparkling Water, PySparkling, and SAS, as well as deep learning frameworks such as Keras, TensorFlow, Theano, MXNet, and PyTorch; GPU CUDA programming; and scaling data science. Expert in predictive modeling, including XGBoost, regression, logit, probit, GBM, random forests, neural networks (generative models, GANs, VAEs, RNNs, CNNs, word2vec, etc.), Naive Bayes, k-nearest neighbors, and PCA, spanning supervised, unsupervised, semi-supervised, and reinforcement learning; and in probabilistic modeling (PyMC3, Edward, Pyro), including MCMC, HMC, NUTS, Bayesian linear regression, and variational models. Data mining skills include parsing and NLP (natural language processing), with proficiency in language modeling: topic models, text clustering, word embeddings, word2vec, GloVe, text classification, RNNs, and convolutional RNNs. Familiar with development environments such as Hadoop, cloud platforms (AWS, GCP, Azure), GPUs, and Spark. Strong communication and relationship-building skills with diverse parties; fluent in English and Korean.