Python is one of the most popular programming languages for advanced analytics, data science, machine learning, and deep learning. One of Python's greatest assets is its extensive set of libraries, such as NumPy, pandas, scikit-learn, Theano, TensorFlow, and Keras. Apache Spark has become a core component of big data processing and plays an important role in helping data scientists solve complicated problems. Integrating Spark with Python's extremely rich ecosystem is therefore both significant and in strong demand for tackling challenges in artificial intelligence. Spark 2.3 introduces several exciting features: vectorized UDFs in PySpark, which leverage Apache Arrow to provide high-performance interoperability between Spark and pandas/NumPy; an image format in DataFrame/Dataset, which improves interoperability between Spark and TensorFlow (and other deep learning libraries); and highly efficient parallel model tuning with Spark MLlib. In this talk, we'll share best practices from real use cases and hands-on experience to illustrate the power of these new features, and open further discussion on this topic.
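To make the vectorized-UDF idea concrete, here is a minimal sketch of the kind of batch-level function Spark 2.3's `pandas_udf` wraps: it receives a whole pandas Series per Arrow batch rather than one row at a time, so the computation runs as a single NumPy-backed operation. The column name, scaling formula, and `to_f` alias are illustrative assumptions, not from the talk.

```python
import pandas as pd

# Batch-level function: operates on an entire pandas Series at once,
# avoiding the per-row Python overhead of classic PySpark UDFs.
def scale_celsius_to_fahrenheit(temps: pd.Series) -> pd.Series:
    # One vectorized, NumPy-backed expression over the whole batch.
    return temps * 9.0 / 5.0 + 32.0

# With PySpark 2.3+ available, registration would look like (not executed here):
#   from pyspark.sql.functions import pandas_udf
#   from pyspark.sql.types import DoubleType
#   to_f = pandas_udf(scale_celsius_to_fahrenheit, returnType=DoubleType())
#   df.select(to_f(df["temp_c"]))

# Demonstrate the batch function directly on a small Series.
batch = pd.Series([0.0, 100.0])
print(scale_celsius_to_fahrenheit(batch).tolist())  # [32.0, 212.0]
```

Because Arrow moves each batch between the JVM and the Python worker as a columnar buffer, the UDF pays serialization cost per batch instead of per row, which is where the performance gain comes from.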