Traditionally systems have stored and managed their own metadata, just as they traditionally stored and managed their own data. A revolutionary feature of big data tools such as Apache Hadoop and Apache Kafka is the ability to store all data together, where users can bring the tools of their choice to process it.
Apache Hive's metastore can be used to share the metadata in the same way. It is already used by many SQL and SQL-like systems beyond Hive (e.g. Apache Spark, Presto, Apache Impala, and via HCatalog, Apache Pig). As data processing changes from only data in the cluster to include data in streams, the metastore needs to expand and grow to meet these use cases as well. There is work going on in the Hive community to separate out the metastore, so it can continue to serve Hive but also be used by a more diverse set of tools. This talk will discuss that work, with particular focus on adding support for storing schemas for Kafka messages.