Losing Data in a Safe Way – Advanced Replication Strategies in Apache Hadoop Ozone

Wednesday, March 20
2:50 PM - 3:30 PM
Room 124-125

All distributed file systems and NoSQL databases work very well during normal operation. The big differences appear when we investigate their behaviour in an emergency. Data replication strategies and recovery algorithms are the key ingredients a distributed data store needs to keep data safe.

Recent studies have shown that random replication is not the safest choice for storing data, as it almost guarantees data loss in the common scenario of simultaneous node failures. The Copyset Replication method significantly reduces the frequency of data loss events by selecting the replica groups in a smart way.
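
To make the intuition concrete, here is a minimal sketch comparing the two placement styles on a toy cluster. It is an illustration under stated assumptions, not the algorithm presented in the talk or the Ozone implementation: the function names are hypothetical, the "copyset" scheme is the simplest possible variant (disjoint groups of nodes), and the failure model assumes exactly as many nodes fail at once as there are replicas, chosen uniformly at random.

```python
import random
from math import comb


def random_replication(nodes, r, chunks, rng):
    """Place each chunk on r nodes chosen uniformly at random.

    Returns the set of distinct replica groups (copysets) in use.
    """
    return {frozenset(rng.sample(nodes, r)) for _ in range(chunks)}


def partitioned_copysets(nodes, r):
    """Simplified copyset scheme (assumption): split the nodes into
    disjoint groups of size r; every chunk must live on one group."""
    return {frozenset(nodes[i:i + r]) for i in range(0, len(nodes) - r + 1, r)}


def loss_probability(copysets, n, r):
    """Chance that r simultaneously failing nodes, chosen uniformly
    out of n, are exactly the members of some copyset in use."""
    return len(copysets) / comb(n, r)


nodes = list(range(9))       # toy cluster of 9 nodes
rng = random.Random(0)       # fixed seed for repeatability
random_sets = random_replication(nodes, 3, 10_000, rng)
fixed_sets = partitioned_copysets(nodes, 3)

print("random replication:", loss_probability(random_sets, 9, 3))
print("partitioned copysets:", loss_probability(fixed_sets, 9, 3))
```

With 9 nodes and 3 replicas there are 84 possible node triples. The partitioned scheme uses only 3 of them, while random placement spreads across far more once the cluster holds many chunks, so a random triple failure is much more likely to wipe out all replicas of some chunk. The trade-off is that each loss event in the partitioned scheme affects more data.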

In this talk we will introduce the key elements of successful data replication and show how advanced data replication strategies can help a cluster survive outages. We will show how Apache Hadoop Ozone solves the problem with advanced techniques and present the challenges of using the Copyset algorithm together with advanced cluster topology support.

Presentation Video


Márton Elek
Lead Software Engineer
Has more than 15 years of Java experience and during these years has worked with almost all forms of Java solutions, from low-latency multithreaded applications to highly distributed enterprise applications, as a developer, architect and trainer. Currently works with the Apache big data projects and has created various types of containerized solutions for the components of the Hadoop ecosystem. Founder of the first Hungarian Java User Group and a regular speaker at meetup events and conferences. Committer on the Apache Hadoop and Apache Ratis projects, working on the Apache Hadoop Ozone project and the dockerization of Apache Hadoop.