Sustainability, reliability, and a higher level of data quality with SRE

Related services
SRE Consulting | Data

Industry
Retail

Company
Xebia

The client is a relatively young but major player in big data platforms. The company holds one of the largest clusters of the open-source framework Hadoop globally with over 600 PB of client data. Its Hadoop network allows it to operate with transparency and rapid-pace development. It has a user group of over 3000 internal users who consist of many data scientists and reporting analysts. Data includes booking data and client behavior that play a large part in core business decisions. The company has recently experienced large-scale growth leading to expansion in a number of markets.

Why

Organic growth at a high scale can lead to technical debt and instability

What

Use automation to maintain consistent data flow and reliability

How

Build a scalable, reliable data platform using SRE principles

Alleviate Growth Pains

Rapid expansion and growth in a short amount of time led to a lot of issues around scaling, cost, and stability for the client. This meant new feature releases were slow to come and even the smallest changes could affect each part of the existing business ecosystem. The company came to Xebia for help with streamlining these systems and creating steps to win in product strategy and make better decisions in costs, reliability, and sustainability. To do this, implementing SRE was instrumental. As the site manager explains: “We chose a Site Reliability engineering approach to ensure consistent data flow and platform stability.”

Build a Better Process

The client is truly data-driven, providing a software framework for storage and handling data. All this is done using a large network of servers and machines as its backbone. Big data is processed in large clusters but this existing way of working proved inadequate to handle such tremendous growth. The organization, with input from Xebia’s SRE experts, opted to change this by using SRE tooling and principles. Clusters with a single purpose helped to alleviate the bogged-down effect of having all the data on one platform. Moreover, the plan to tackle the problem heads on meant using SRE principles to neatly guide the organization through the process. This resulted in 45 % fewer costs.

“The choice for single purpose clusters also healed the growth pain of trying to balance out technical debt. We wanted our new architecture to be cutting edge.”

Product Manager

SRE: Less Toil, More Gain

Automation is pivotal to running a software and data-rich client such as this organization. With SRE principles on hand, the client’s team could identify what processes were causing repetitive, unnecessary work, ultimately making that work disappear. The product had to be up and running with data sets continuously being fed into the new platform. According to the product manager: “The team really appreciated the added role of SRE and the positive effects it had on the quality of our services.” SRE provided an additional lens and focus for the team to reach its goals. It also resulted in a more sustainable product with a feedback loop that provides insight and cuts costs.

This customer story is part of Xebia's SRE Consulting portfolio.

Sustainability and reliability: How to achieve a higher level of data quality with SRE

Alleviate Growth Pains

Build a Better Process

SRE: Less Toil, More Gain

Explore related customer stories

Get in touch with us