Disaster Recovery Experience at CA-CIB: Hardening Hadoop for Critical Financial Applications


Authored by Mohamed Mehdi Ben Aissa, Credit Agricole Group Infrastructure Platform, and Abdelkrim Hadjidj, Cloudera

Introduction: Big Data at CA-CIB & CA-GIP

Nowadays, information is the main source of competitiveness and income for businesses. Having the most reliable information as soon as it is available puts you ahead of your competitors. However, this comes with a whole set of challenges. Data volume is increasing exponentially, and managing it at scale is a complex problem. Enforcing robust data governance, so that data can be used safely while remaining usable, is a key success factor for the use cases. Finally, rapidly and efficiently analyzing data to transform it into insights and a unique competitive advantage requires advanced business and technical skills.

For these reasons, Crédit Agricole CIB (CA-CIB), the Corporate and Investment Banking arm of the Crédit Agricole Group, the world’s n°13 bank measured by Tier One Capital, and now part of Crédit Agricole Group Infrastructure Platform (CA-GIP), leverages the most effective data analytics platforms and data management tools to keep its information systems efficient, consistent and up to date. This enables CA-CIB to make the best decisions, in a timely fashion, in a financial context where competition is increasingly harsh. The main use cases deployed on top of these platforms are described below.

Use Cases

Risk Management & Regulation

Before Big Data was integrated into CA-CIB Information Systems, the existing Risk Management Platforms had already shown their benefits and made it possible to tackle the significant complexity of calculating market risk indicators. However, these systems showed some limitations when handling data categorized as “Big Data”. The legacy infrastructure does not scale well with the steady and continuous increase in data volume. It only supports vertical scalability, which limits the maximum capacity and makes costs very high. In addition, new compliance and regulatory requirements that call for advanced and complex analytics have been introduced. The two main examples of these regulations are the BCBS239 principles (data dictionary, normalization, data lineage, audit trail, KPI, etc.) and the FRTB regulation (Fundamental Review of the Trading Book).

In 2015, CA-GIP implemented a new scalable Big Data Risk Management Platform based on Hortonworks Data Platform (HDP). This platform provides a high level of scalability, performance, data integrity and quality. It made it possible to rationalize the entire risk ecosystem and deprecate obsolete systems, which led to the harmonization of pricing libraries and the optimization of operational and decisional processes and costs.

Cash Management

For several years, CA-CIB has been working on Cash Management International Solutions, servicing both the International Trade and Transaction Banking product lines. In this context, the CMT project (Cash Management Transformation) focuses on transforming the operational model of cash management, covering IT, processes and organization. The core of the Cash Management Platform is a transactional backend that needs a stream processing engine supporting low latency and millions of transactions per day.

In 2017, in collaboration with Hortonworks solution engineers and support team, CA-CIB began the implementation of a modern Cash Management Platform based on streaming technologies such as Kafka, Storm and Spark Streaming. Today, thanks to the Hortonworks team, CA-GIP provides a fully scalable transactional platform that enables CA-CIB to improve and expand its cash management business.

And …

These two use cases are only the beginning of the Big Data success story at CA-GIP and CA-CIB. The story continues today with new projects that will meet new functional and technical needs: security, finance management, modernization of Business Intelligence Platforms, Log Management, Security Monitoring, etc.

DRP & Stretch Cluster at CA-CIB

Because the platform hosts many critical use cases, it has itself become critical to business continuity. In addition, the financial context remains complex and demanding, and adds several production constraints such as security, integrity and resiliency across multiple data centers. One of the big challenges during the design and implementation of our Big Data Service Offer was to ensure that all applications meet demanding SLAs (service-level agreements) in terms of RPO (Recovery Point Objective) and RTO (Recovery Time Objective).

The Recovery Point Objective describes the maximum acceptable amount of data loss, measured in time. In DR scenarios, it determines the minimum frequency at which data should be replicated between data centers. In our context, data loss is not tolerated, and the majority of our applications require an RPO of zero. Hence, synchronous replication between data centers is a must.
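
To make the link between RPO and replication frequency concrete, here is a minimal Python sketch. It rests on a simplifying assumption that is ours, not a property of any specific tool: each periodic replication run copies the state as of its start time, so the worst-case loss is one replication interval plus the duration of a copy. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(replication_interval: timedelta,
                         copy_duration: timedelta) -> timedelta:
    """Upper bound on lost data, measured in time, for periodic replication.

    Simplifying assumption: each run copies the state as of its start time,
    so just before a run completes, the remote copy still reflects the
    start of the previous run.
    """
    return replication_interval + copy_duration


def meets_rpo(rpo: timedelta,
              replication_interval: timedelta,
              copy_duration: timedelta) -> bool:
    """Return True if a periodic replication schedule satisfies the RPO."""
    return worst_case_data_loss(replication_interval, copy_duration) <= rpo


if __name__ == "__main__":
    # Illustrative figures only: a copy every 15 minutes that takes 5 minutes.
    schedule = dict(replication_interval=timedelta(minutes=15),
                    copy_duration=timedelta(minutes=5))
    print(meets_rpo(timedelta(hours=1), **schedule))
    print(meets_rpo(timedelta(0), **schedule))
```

Under this model, any schedule with a non-zero interval or copy time fails an RPO of zero, which is why a zero RPO points to synchronous replication.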

Several Disaster Recovery strategies can be used with the Hortonworks Big Data platform. Each strategy has its own pros and cons and suits a certain class of SLAs. Dual ingest and mirroring both require two clusters deployed in two different data centers. On the one hand, dual ingest consists of having data ingested and processed by both clusters at the same time. These clusters can be used in active/active or active/passive scenarios. Dual ingest can achieve very low RPO/RTO, but it requires a significant infrastructure investment and an advanced DevOps methodology that can handle cluster drift. On the other hand, mirroring uses the two clusters in active/passive mode. Data is ingested into the active cluster, then replicated to the passive cluster at a given frequency using tools like DistCp, DLM, BDR or NiFi. This strategy is easy and inexpensive to implement. However, the minimum RPO it can guarantee is higher than with other techniques because of the asynchronous nature of the replication.
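
As an illustration of the mirroring strategy, the following Python sketch builds a DistCp command line that a scheduler could run at the chosen frequency. The NameNode addresses, the path and the helper name are hypothetical; the flags used (-update, -delete, -m) are standard DistCp options, but the right set for a given cluster depends on its security and bandwidth constraints.

```python
import shlex
from typing import List


def build_distcp_command(source_uri: str, target_uri: str,
                         max_maps: int = 20) -> List[str]:
    """Build a `hadoop distcp` invocation that syncs the target with the source.

    -update copies only files that differ from the target,
    -delete removes target files that no longer exist on the source,
    -m caps the number of simultaneous map tasks used for the copy.
    """
    if max_maps < 1:
        raise ValueError("max_maps must be at least 1")
    return ["hadoop", "distcp", "-update", "-delete",
            "-m", str(max_maps), source_uri, target_uri]


if __name__ == "__main__":
    # Hypothetical NameNode addresses and path, for illustration only.
    cmd = build_distcp_command("hdfs://active-nn:8020/data/risk",
                               "hdfs://passive-nn:8020/data/risk")
    print(shlex.join(cmd))
```

The command is returned as a list so that it can be passed to `subprocess.run(cmd, check=True)` without shell quoting issues. The interval between two runs is what bounds the achievable RPO.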

Stretch clusters are fundamentally different from other multi-data-center scenarios. Only one cluster is deployed, spanning multiple data centers. As a result, a mirroring process that keeps the two data centers in sync becomes unnecessary. Data replication is handled by the native synchronous replication mechanisms of the HDP technologies (HDFS, Kafka, HBase, etc.), which keep the cluster nodes in the two data centers in sync. The advantage of this architecture is that both data centers and all servers in the cluster are used. No resources are wasted, unlike in multi-cluster architectures. However, this architecture involves advanced configuration and data placement features to make sure each data point has replicas in multiple data centers. In addition, it requires a physical infrastructure that not all companies can provide:

  • Deploy across three (3) data centers in order to implement the quorum pattern and avoid the split-brain scenario.
  • The three data centers should be within a single geographical region (typically between 10 km and 50 km apart). Network latency tends to be higher and less predictable across geographic regions, so clusters spanning geographic regions are not recommended.
  • Network: high bandwidth and low latency between sites are required (latency should not exceed 10 ms). 10 Gbps switches with sufficient port density to accommodate cross-site links, as well as redundant links connecting each site, are highly recommended.
  • Network throughput between sites will depend on how many nodes you have at each data center. Oversubscription ratios up to 4:1 are generally fine for balanced workloads.
  • Network monitoring is needed to ensure bandwidth is not the bottleneck for Hadoop. Real-world DRP tests should be performed regularly.
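
The reason for three data centers can be checked with a small Python sketch of majority quorum arithmetic. It assumes a quorum-based service whose voting members are spread across sites and which needs a strict majority to keep operating; the site names and function names are hypothetical.

```python
from typing import Mapping


def majority(total_voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return total_voters // 2 + 1


def survives_site_loss(voters_per_site: Mapping[str, int]) -> bool:
    """Return True if a majority remains after losing any single site."""
    total = sum(voters_per_site.values())
    needed = majority(total)
    return all(total - lost >= needed for lost in voters_per_site.values())


if __name__ == "__main__":
    # Three sites, one voter each: any single site can fail.
    print(survives_site_loss({"dc1": 1, "dc2": 1, "dc3": 1}))
    # Two sites: losing the site with most voters breaks the quorum.
    print(survives_site_loss({"dc1": 2, "dc2": 1}))
```

With only two sites, one of them always holds at least half of the voters, so its loss stops the service. A third site, even a small one, removes that single point of failure.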

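One possible way to approach the data placement requirement (each data point having replicas in multiple data centers) is Hadoop rack awareness, where a topology script referenced by the `net.topology.script.file.name` property maps each node address to a network location. The sketch below is an assumption about how this could be done, not a description of the CA-CIB configuration: it declares each data center as a location so that the HDFS default placement policy, which spreads replicas over more than one location when possible, spans sites. The subnets and location names are hypothetical.

```python
#!/usr/bin/env python3
import ipaddress
import sys

# Hypothetical subnets, one per data center; replace with real addressing.
DATA_CENTER_SUBNETS = {
    "/dc1": ipaddress.ip_network("10.1.0.0/16"),
    "/dc2": ipaddress.ip_network("10.2.0.0/16"),
}
# Location returned for addresses that match no known subnet.
DEFAULT_LOCATION = "/default-rack"


def resolve_location(host: str) -> str:
    """Map a node IP address to the location of its data center."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP address (for example a hostname): fall back to default.
        return DEFAULT_LOCATION
    for location, subnet in DATA_CENTER_SUBNETS.items():
        if address in subnet:
            return location
    return DEFAULT_LOCATION


if __name__ == "__main__":
    # Hadoop passes one or more addresses as arguments and expects
    # one location per argument on standard output.
    for host in sys.argv[1:]:
        print(resolve_location(host))
```

Whether this is sufficient depends on the replication factor and on the placement policy in use, so the resulting block distribution should be verified on the actual cluster.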

To deal with new regulations and customer needs, the Hortonworks Data Platform was the ideal solution for maintaining a consistent view of data across different regions, systems and infrastructures while coping with production constraints: Security, DRP, Consistency, CI/CD, etc.

If you have critical applications and would like to achieve high SLAs on your Big Data platform, join us at the next DataWorks Summit in Barcelona. We will give a talk that covers the above items in greater detail. We will present the different architecture blueprints for disaster recovery as well as their corresponding SLA objectives. Then, we will focus on the stretch cluster and discuss the advantages, drawbacks and impact of this approach on the global architecture. Finally, we will explain in concrete detail how to configure and deploy this solution and how to integrate each layer (storage layer, processing layer…) into the architecture.


Since January 1st, 2019, Crédit Agricole Group Infrastructure Platform (CA-GIP) has brought together more than 1,500 employees who previously belonged to the IT Production of Crédit Agricole Corporate & Investment Banking, Crédit Agricole Assurances, Crédit Agricole Technology & Services and SILCA, the Crédit Agricole S.A. group’s shared IT production unit. CA-GIP is a skills center that provides a wide range of service offers to Crédit Agricole group entities (especially CA-CIB in our context): management of production infrastructure, IT outsourcing, cloud solutions, Big Data, Security Operations and User Services (workstations, collaborative tools, etc.).


Crédit Agricole CIB (CA-CIB) is the Corporate and Investment Banking arm of the Crédit Agricole Group, the world’s n°13 bank measured by Tier One Capital (The Banker, July 2017). CA-CIB offers its clients a large range of products and services in capital markets, investment banking, structured finance and corporate banking in large international markets, through a network present in major countries in Europe, America, Asia Pacific and the Middle East.
