John Lewis & Partners
Solving the data pipeline puzzle at the John Lewis Partnership
Empowering data infrastructure through a Paved Road approach
The John Lewis Partnership (JLP) needs to use data creatively to thrive in the competitive world of modern retail. The Partnership Data Platform (PDP) was built to transform the way the Partnership operates, improving reporting and access to data, allowing the retailer to work more directly with agencies, product suppliers, and third parties. This includes maximising the existing investment in world-class data technologies like Snowflake to ultimately improve the shopping experience for customers.
Equal Experts was asked to review the PDP in late 2021; our objective was to understand the shared vision for the PDP and identify what challenges might be creating roadblocks. What we found was an opportunity to create a Paved Road for teams to build and operate their own data pipelines. The Paved Road team was formed in February 2022 with a blend of Partnership and Equal Experts data engineers, data product owners and data management experts. Within 18 months, the PDP was host to 65 data products, and had generated an estimated £22.1m cost saving.
-
£22.1M
estimated cost saving
-
65
Data Products onboarding and running on the platform
-
18
months to a product mindset
About the client:
The John Lewis Partnership is the UK’s largest employee-owned organisation and is home to the mid-premium omni-channel GM retailer John Lewis and Partners and the premium grocery retailer Waitrose and Partners. The Partnership has 23.1 million customers, over 400,000 products, and 74,000 Partners (employees).
-
Industry:
Retail
-
Organisation Size:
(74,000 Partners)
-
Location:
UK
-
Service:
Data
-
Length of project
18 months
The challenge: Freeing up time to use data more effectively
A Paved Road enables teams to build and operate their own data pipelines. At the time of our review, even building a simple ingestion pipeline could take 3-6 months, because of complexities in the process and dependencies on multiple teams’ workloads.
Before we did anything, we wanted to get a clear idea of user needs, working firstly with Partners in the data product team within JLP’s Ratings and Reviews domain. The main frustration in their delivery turned out to be building the ingestion pipeline – without an effective ingestion pipeline they couldn’t even begin to consider value activities such as data sharing, transformation, reporting and training machine learning (ML) models.
Snowflake had been selected as the underlying data platform technology in 2020 on the basis of its powerful support for capabilities that were perfectly aligned with JLP’s growing data needs. Equal Experts and the Partnership worked collaboratively to leverage its capabilities to really empower data consumers.
Solution: A Paved Road to self-serving teams
The Paved Road team’s first goal was to provide a repeatable and re-runnable process for building data ingestion pipelines that would support teams to manage their data products. The intention was to enable domain teams to self-serve, providing capabilities that automated the path through these dependencies. As the work began to take shape, a JLP Data Engineer realised that our initial mix of ideas could be brought together neatly in the concept of a Data Product Definition.
The idea was to encode the operational needs of a data pipeline – e.g. service accounts, secrets, storage buckets, schemas and tables – and run this through a build and deploy pipeline to create and enable those resources.
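As a rough illustration of that idea, the sketch below shows how a definition might be expanded into the list of resources a build and deploy pipeline would then create. It is not the Partnership's actual format: the `DataProductDefinition` class, its field names, the `plan_resources` function and the naming scheme are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProductDefinition:
    """Hypothetical definition; every field name here is illustrative only."""
    name: str
    domain: str
    schemas: dict[str, list[str]] = field(default_factory=dict)  # schema -> tables
    buckets: list[str] = field(default_factory=list)
    secrets: list[str] = field(default_factory=list)


def plan_resources(definition: DataProductDefinition) -> list[tuple[str, str]]:
    """Expand a definition into an ordered list of (kind, identifier) pairs."""
    prefix = f"{definition.domain}-{definition.name}"
    # One service account per data product (an assumption for this sketch).
    plan = [("service_account", f"{prefix}-sa")]
    plan += [("secret", f"{prefix}-{s}") for s in definition.secrets]
    plan += [("storage_bucket", f"{prefix}-{b}") for b in definition.buckets]
    for schema, tables in definition.schemas.items():
        plan.append(("schema", schema))
        plan += [("table", f"{schema}.{t}") for t in tables]
    return plan


# Made-up example values, not a real JLP data product.
example = DataProductDefinition(
    name="reviews",
    domain="ratings",
    schemas={"RAW": ["REVIEWS"]},
    buckets=["landing"],
    secrets=["sftp-key"],
)

for kind, identifier in plan_resources(example):
    print(f"{kind}: {identifier}")
```

Keeping the definition as plain data and deriving the resource list from it is one way to make the whole process reviewable in source control before anything is deployed.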
Once data is ingested into the Snowflake platform using the assured process of a data pipeline deployed via the Paved Road, the Partnership team can leverage its capabilities, including:
● Built-in Security
● Automated Maintenance and Management
Creating a Data Product Definition also provided engineering tools such as source code repositories and CI/CD pipelines, enabling teams to manage their data products more efficiently. Collectively, we eliminated the need for hand-offs to other teams (and the time that takes) by automating some of the more complicated processes of building a data product. JLP data engineers could now focus more on higher-value data engineering, and less on building underlying infrastructure, allowing data product teams to self-serve.
The whole process was managed by code and repeatable. The picture below is an early visualisation of how this would work:
The Paved Road concept quickly became a product and has since gone through a number of iterations, from creating and managing database objects to automating complex security processes like key rotation and reducing the dependency on orchestrating pipelines.
Adoption from other PDP engineering teams was considerable and the team experienced a step change in both the number of Data Engineers engaging with them, and the number of data products being created on the Paved Road.
Result: New products in hours, not weeks
The team tested the Data Product Definition concept by using it to drive the creation of cloud storage buckets. These buckets are commonplace across the PDP and often form stages in an ingestion pipeline.
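To show what a repeatable, re-runnable process can look like for buckets, here is a minimal sketch that creates only the buckets that are missing, so running it twice changes nothing the second time. The `reconcile_buckets` function, the bucket names and the in-memory set standing in for a cloud provider are all assumptions for illustration, not the Paved Road implementation.

```python
def reconcile_buckets(desired: list[str], existing: set[str]) -> list[str]:
    """Create only the buckets that are missing; return the names created."""
    created = []
    for name in desired:
        if name not in existing:
            existing.add(name)  # stand-in for a real cloud API call
            created.append(name)
    return created


# Pretend one bucket already exists in the cloud account.
cloud = {"ratings-reviews-landing"}
wanted = ["ratings-reviews-landing", "ratings-reviews-archive"]

first_run = reconcile_buckets(wanted, cloud)
second_run = reconcile_buckets(wanted, cloud)

print("first run created:", first_run)
print("second run created:", second_run)
```

Comparing the desired state against what already exists, rather than issuing create calls blindly, is what makes a pipeline like this safe to re-run after a partial failure.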
Take-up was initially slow, but sometimes people will use your product in ways you hadn’t anticipated, and soon the team started getting requests from engineers who simply wanted to use the Paved Road pipeline to create repositories. Where previously these would have taken days of hand-offs, tickets and service level agreements to sort out, Paved Road tooling had it done in minutes.
The improvement in lead times wasn’t the only benefit of the Data Product Definition. It also enabled some standardisation in how repositories and other PDP concerns were managed and, because teams were using Paved Road tooling, the desired security controls were built in.
With the support of our consultants, Partners on the Paved Road team have been able to showcase the creation of a real data product to the business in under 20 minutes. This included the creation of an event-driven pipeline pushing data to Snowflake tables via an SFTP-based data source and through the Partnership’s data security stages – together with repositories and CI/CD deployment pipelines to manage the data product. None of these resources existed before creating the Data Product Definition.
Conclusion
Before the Paved Road product was launched, building a data pipeline needed a standing team, an internal and external platform team, and a security and data supplier team as a minimum. Now, JLP can build new products in hours rather than weeks, reducing average lead times and saving around £0.5m for each data product deployed to production.
As of August 2024, there are 90 data products on the Paved Road, and its capabilities have enabled teams to launch over 150 data pipelines. The work has also provided a pathway for legacy pipelines running on legacy applications to move to newer, better-supported versions of tooling. In doing so, the team also enabled a shift to a You Build It You Run It model and a mindset shift to treating data as a product. Along with the Paved Road, this has given the Partnership the foundation it needed for sustainable innovation through data engineering.
A JLP Partner said of the work:
“Paved Road has massively accelerated the process of ingesting data into the Partnership Data Platform. Data Engineers used to spend most of their time getting raw data into the platform. Now as a tenant team on Paved Road our focus has shifted to supporting our users achieving insights from the data, while also increasing our overall productivity.”