HDFS and HBase are two different storage options in the Hadoop ecosystem, each with its own strengths and weaknesses: HDFS excels at large sequential scans, while HBase provides fast random reads and writes. Neither can serve all kinds of workloads on its own, which usually leads to complex hybrid architectures. Kudu is a versatile storage layer that fills this gap and simplifies the architecture of Big Data systems.
A large German bank is using Kudu as the storage layer to accelerate its credit processes. Within this system, Spark jobs analyse the financial transactions of millions of customers to categorize them and calculate key figures. In addition to this analytical workload, several frontend applications use the Kudu Java API to perform random reads and writes in real time.
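To illustrate the real-time access path, the following is a minimal sketch of a point write and a point read with the Kudu Java client. The master address, table name, and column names (`transactions`, `txn_id`, `category`) are hypothetical placeholders, not details of the bank's actual schema; a real deployment also needs the `kudu-client` dependency and a running cluster.

```java
import java.util.Arrays;
import org.apache.kudu.client.*;

public class KuduRandomAccess {
    public static void main(String[] args) throws KuduException {
        // Master address and table/column names are illustrative assumptions.
        try (KuduClient client =
                 new KuduClient.KuduClientBuilder("kudu-master:7051").build()) {
            KuduTable table = client.openTable("transactions");

            // Random write: upsert a single categorized transaction.
            KuduSession session = client.newSession();
            Upsert upsert = table.newUpsert();
            upsert.getRow().addLong("txn_id", 42L);
            upsert.getRow().addString("category", "groceries");
            session.apply(upsert);
            session.close();

            // Random read: point lookup via a predicate on the key column.
            KuduScanner scanner = client.newScannerBuilder(table)
                .setProjectedColumnNames(Arrays.asList("txn_id", "category"))
                .addPredicate(KuduPredicate.newComparisonPredicate(
                    table.getSchema().getColumn("txn_id"),
                    KuduPredicate.ComparisonOp.EQUAL, 42L))
                .build();
            while (scanner.hasMoreRows()) {
                for (RowResult row : scanner.nextRows()) {
                    System.out.println(row.getString("category"));
                }
            }
        }
    }
}
```

The same tables remain scannable by Spark jobs, which is what lets analytical and real-time workloads share one storage layer.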
The presentation will cover these topics:
- Business and technical requirements
- Data access patterns
- System architecture
- Kudu data modelling
- Kudu architecture for High Availability
- Experiences from development and operations