Our Thinking, Tech Focus Fri 9th July, 2021
Event driven architecture: the good, the bad, and the ugly
Using the Western cinematic epic to understand and explore event driven architecture.
In Sergio Leone’s 1966 epic, ‘The Good, the Bad and the Ugly’, three cowboys navigate a series of dramatic challenges before unearthing a bounty of gold. It might seem like a stretch at first, but this strikes us as a strangely fitting framework for event driven architecture in business today.
Why?
Firstly, if you’re yet to embrace event driven architecture, you’re like the cowboy characters. You’re already caught in a series of dramatic challenges. You just don’t know it yet.
Whatever your sector, your customers demand real-time, personally relevant interactions. From banking to retail, everyone wants ‘instant’: updates on order fulfillment and product delivery, stock and inventory levels, transaction information, and more. If you’re unable to provide easy, immediate and meaningful updates—across complex customer journeys for internal and external stakeholders alike—you’re on the cusp of becoming irrelevant. And investing more in cumbersome and outdated infrastructure like siloed ERP or monolithic systems—or pouring even more data into data lakes without making changes to the way you use that information—is simply not the answer.
In today’s customer experience, batched processing isn’t enough. You need data you can use in the moment.
If this resonates, you’re in luck. Remember, there’s a bounty of gold at the end of this story.
Event driven architecture is the new gold standard for leading enterprises
Event driven architecture is being embraced by the planet’s most successful enterprises, business leaders and technologists. Currently, 80% of the Fortune 100 use event streaming to power their business via Kafka. Just a few weeks ago, Confluent—a cloud data streaming company closely linked with event driven architecture—listed on the Nasdaq, with shares jumping 25% after IPO for a valuation of $9.1 billion USD. If ‘data is the new oil’, there’s clearly a gold rush for events.
Unfortunately, for many CEOs and CTOs, the value of event driven architectures remains buried; obscured under mounds of perceived risk, caution, and general misconception. It doesn’t need to be this way.
So, to help you find the gold, leverage the significant benefits of event driven architectures, and allay initial concerns, let’s shine a light on:
- The good: some of the key benefits you stand to gain through event driven architecture
- The bad: common preconceived risks, and strategies to mitigate them for your organisation
- The (potentially) ugly: common pitfalls to understand and avoid when approaching event driven architectures
Event Driven Architecture: the good.
What are the benefits?
Developing and deploying new services is simple and efficient.
In an event driven architecture, individualised services are configured to perform one hyper-specific task; everything is largely decoupled.
Microservices are configured to consume events and perform singular actions without interdependency. Adding an additional service—or series of services—to your existing architecture is quick, easy and effective. In many cases you can simply configure the new service to consume an existing stream of events for a new business requirement. The result? Newer features and differentiators for your end customers, faster. And greater flexibility and speed of implementation internally.
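To make that concrete, here is a minimal sketch of a new service being attached to an existing stream of events. The `EventBus` class, the `deposit_received` event name and the rewards rule are all hypothetical; the bus is an in-memory stand-in for a real event backbone, not how any particular platform works.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory stand-in for an event backbone."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Each subscriber receives its own copy; the producer knows none of them.
        for handler in list(self._subscribers[event_type]):
            handler(dict(payload))


bus = EventBus()
sms_outbox: List[str] = []
reward_points: Dict[str, int] = defaultdict(int)


def send_sms(event: dict) -> None:
    sms_outbox.append(f"Deposit of ${event['amount']} received")


def award_points(event: dict) -> None:
    # Illustrative rule: one point per whole $100 deposited.
    reward_points[event["customer_id"]] += event["amount"] // 100


bus.subscribe("deposit_received", send_sms)
# A new business requirement arrives: subscribe a second service.
# Neither the producer nor send_sms has to change.
bus.subscribe("deposit_received", award_points)

bus.publish("deposit_received", {"customer_id": "c-1", "amount": 250})
```

The point of the design is in the last few lines: the new feature is one extra subscription, with no change to anything that already exists.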
Reduce latency and redundancy.
Let’s look at an example from our work with one of Australia’s pioneering digital banks. In serving transaction information to customers via a mobile app, we configured a read-only view of every customer’s balance and transaction history. This is provisioned via a microservice called ‘account view’ and updated in real-time based on events occurring at a customer account level.
The master data is managed in the core banking platform, rather than the ‘account view’ service. However, by reproducing the information and storing it locally in ‘account view’, we return near real-time transactional information back to the customer with lightning-fast response times. The alternative? Querying the core banking platform, slowly retrieving all the information and eventually surfacing it for view.
From a redundancy perspective, the local read-only view of account information ensures customers can still access recent transaction history and details, even when services can’t connect with the core banking platform. This is just one of many possible examples.
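A read-only view like this can be sketched in a few lines. This is an illustration only, not the bank’s actual implementation: the event shape (`account_id`, `type`, `amount_cents`) and the `deposit` and `withdrawal` event types are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AccountView:
    """Local read-only projection; the core banking platform stays the master."""

    balances: Dict[str, int] = field(default_factory=dict)  # amounts in cents
    history: Dict[str, List[dict]] = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        """Update the local copy from one account-level event."""
        account = event["account_id"]
        sign = 1 if event["type"] == "deposit" else -1
        self.balances[account] = (
            self.balances.get(account, 0) + sign * event["amount_cents"]
        )
        self.history.setdefault(account, []).append(event)

    def get_balance(self, account_id: str) -> int:
        # Served from local state: no call to the core banking platform.
        return self.balances.get(account_id, 0)

    def recent_transactions(self, account_id: str, limit: int = 10) -> List[dict]:
        return self.history.get(account_id, [])[-limit:]


view = AccountView()
view.apply({"account_id": "acc-1", "type": "deposit", "amount_cents": 10000})
view.apply({"account_id": "acc-1", "type": "withdrawal", "amount_cents": 2500})
```

Because reads never touch the core platform, `get_balance` and `recent_transactions` keep answering from the last known state even if the core is unreachable.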
Infinite scalability for robust and flexible services.
In event driven architecture, multiple microservices can consume the same ‘type’ of events. For example, sending an SMS every time a customer receives a deposit. Imagine ten million customers receive deposits on a daily basis. One service, continuously hit to send messages at this scale, will inevitably struggle to perform its function.
However, you could easily deploy 50 identical microservices that all work independently but perform the same function, and ‘listen’ for the same type of ‘send SMS’ events. Whenever one service reads a ‘send SMS’ event, it is marked as ‘read’, so no other service processes it again and there is no duplication. All 50 services keep actively listening for the next opportunity to consume a ‘send SMS’ event and perform the function. That’s seamless continuity of customer experience, at scale.
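Here is a rough sketch of that ‘many identical workers, one shared stream’ idea. An in-process `queue.Queue` and threads stand in for the broker and the 50 services; the `run_workers` function and the event shape are hypothetical. Real platforms differ in the detail: Kafka, for example, shares work across a consumer group by partition and typically delivers at least once, so handlers are usually written to tolerate an occasional repeat.

```python
import queue
import threading
from typing import List, Tuple


def run_workers(events: List[dict], worker_count: int = 50) -> List[Tuple[int, str]]:
    """Share 'send SMS' events across identical workers; each event is taken once."""
    work: "queue.Queue[dict]" = queue.Queue()
    for event in events:
        work.put(event)

    sent: List[Tuple[int, str]] = []
    lock = threading.Lock()

    def worker(worker_id: int) -> None:
        while True:
            try:
                # Taking an event removes it from the queue,
                # so no other worker can pick it up.
                event = work.get_nowait()
            except queue.Empty:
                return
            with lock:
                sent.append((worker_id, event["customer_id"]))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sent
```

Scaling up or down is then a matter of changing `worker_count`; the producer of the events is unaffected.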
Decoupled and decentralized services can be ‘best-of-breed’ for their unique need.
Twenty years ago, it made sense to have everything in one place. Historically, dealing with a monolith of code, it was beneficial to provide immediate access to everything the monolith required to function. This negated the significant challenges associated with network-based communication. It removed the difficulty of querying external systems to work out interfacing, or the sourcing and processing of data.
Now, with smaller, more independent services in the cloud, decentralization becomes a more natural approach. Services can validly be distributed across clouds, or even organisations. Why struggle to create and configure one enormous monolith that attempts to do everything? A better approach is to develop services that perform one extremely specific function, then configure them with the infrastructure required to serve the business in the best way possible. This improves performance of services at an individual level, while reducing chains of interdependencies across the board. And reduced dependency means less risk for your organisation overall.
Extensibility: both of data and platforms.
This is particularly useful for any organisation using a data pipeline to capture events, without yet embracing event driven architecture. Stored events can be easily replayed for evolving business needs; a new marketing promotion or feature for banking customers who have historically deposited $5,000 each month for the past 12 months, for example. Historical events can quickly spin up multiple views of customer behaviors and profiles, creating crucial value for the future.
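As a sketch of what ‘replaying’ stored events might look like for that promotion, the function below scans historical deposit events and returns the customers who met the threshold in every month. The event fields (`type`, `customer_id`, `month`, `amount`) and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def eligible_customers(
    events: Iterable[dict], months: List[str], minimum: int = 5000
) -> Set[str]:
    """Replay stored events and keep customers who deposited at least
    `minimum` in every one of the given months."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for event in events:
        if event["type"] == "deposit":
            totals[(event["customer_id"], event["month"])] += event["amount"]

    customers = {customer for customer, _ in totals}
    return {
        customer
        for customer in customers
        if all(totals[(customer, month)] >= minimum for month in months)
    }
```

Nothing about the original events had to anticipate this question; the new view is built entirely from history that was already captured.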
From a system perspective, events make it easy to slot third party systems in and out as business requirements evolve. For example, consider triggering an event for a deposit to a banking account. This event is consumed by a banking connector service, which passes the information into the core banking service. The business then determines a need for a different core banking product and wants to replace it. All that’s required from an implementation perspective is the creation of another banking connector service, and the new core product can immediately consume the same events being passed to the legacy core banking product, without any implication or effort for the wider platform.
Event Driven Architecture: the bad.
What are some of the perceived shortcomings? Are there strategies or ways to overcome them?
“Event driven architecture is overly complex.”
The perceived issue: with so many events, producers and consumers orbiting around different business flows and processes, it all seems too complicated.
The response: the architecture is ultimately a reflection of your business domain and the complexity that is inherent within it. Regardless of how that complexity is distributed or manifest, it still exists: whether centralised as a monolith or distributed and decentralised as microservices. The benefit of decentralising and embracing events? You understand that complexity better. You have more granular visibility of the components that comprise the complexity. You have more control over each component. And they’re all easier to maintain and change. With greater control and clarity as to the most complex aspects of your business domain, you’re empowered to make decisions to change—and even potentially reduce—the overall complexity.
“Security is compromised by event driven architecture.”
The perceived issue: in a point-to-point structure, service A can only talk to service B. If multiple producers and consumers of events are constantly shifting data points on and off the queue(s), does that equate to more opportunities for compromised data?
The response: producers and consumers of events can be restricted to interact with specific queues, with very clear definitions of which services can write and consume events. You could also share keys to encrypt and decrypt data as each event goes on or off a queue. However, in most instances that approach seems prohibitively complex; it’s analogous to locking and unlocking every door in the house each time you go to the bathroom. By locking the front and back doors (or restricting services to only produce or consume certain events in relation to specific queues), you protect what’s important, with the same level of robustness seen in point-to-point.
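The ‘front and back doors’ idea boils down to a simple permission table. The sketch below is purely illustrative: the service names, queue names and `is_allowed` function are hypothetical, and in practice the event platform itself enforces rules like these rather than application code.

```python
from typing import Dict, Set, Tuple

# Hypothetical permissions: (service, queue) -> operations that service may perform.
PERMISSIONS: Dict[Tuple[str, str], Set[str]] = {
    ("payments-service", "deposits"): {"write"},
    ("account-view", "deposits"): {"read"},
    ("sms-service", "send-sms"): {"read"},
}


def is_allowed(service: str, queue_name: str, operation: str) -> bool:
    """Deny by default: anything not explicitly granted is refused."""
    return operation in PERMISSIONS.get((service, queue_name), set())
```

The default-deny lookup is the important design choice: a service that was never granted access to a queue cannot read from it or write to it.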
“Event driven architecture makes debugging more complicated.”
The perceived issue: when you’ve got a straight line of process—A-B-C-D—it’s easy to follow a path of informational flow. Conversely, when you put an event on a queue and three different services consume that same event, with another service being triggered to consume a secondary event based on one of those services, things become harder to follow. It’s difficult to pinpoint the source of an issue.
The response: there’s no extra effort, just a reframing of approach. It’s true that you can’t ‘stack-trace’ in event driven architectures, but there are other options. Remember, each microservice is ideally hyper-specialised, with one dedicated and precise function. If a service receives an event that doesn’t contain everything it needs, one of two situations has occurred:
- Either the service didn’t use the data/event it received appropriately, which means the service in question (the event consumer) is the source of a bug, or
- The service didn’t receive the necessary data configured appropriately, which means the producer of that data/event is the likely source of a bug.
The tracking process may not be linear, but there are far better defined ‘contracts of connection’ between individualised microservices.
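One way to picture a ‘contract of connection’ is as a list of fields an event must carry. In the sketch below, the contract for a ‘send SMS’ event and both function names are assumptions for illustration; the logic simply mirrors the two situations described above.

```python
from typing import Dict, List

# Hypothetical contract for a 'send SMS' event: field name -> expected type.
SEND_SMS_CONTRACT: Dict[str, type] = {
    "customer_id": str,
    "phone_number": str,
    "amount": int,
}


def contract_violations(event: dict) -> List[str]:
    """List every way the event breaks the agreed contract."""
    problems = []
    for name, expected_type in SEND_SMS_CONTRACT.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    return problems


def likely_bug_source(event: dict) -> str:
    """Where to look first when handling of this event went wrong.

    An event that breaks the contract points at its producer; a valid
    event that was still mishandled points at the consumer.
    """
    return "producer" if contract_violations(event) else "consumer"
```

Checking the event against the contract at the point of failure turns ‘where did this go wrong?’ into a question with two possible answers.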
“Distributed, highly decoupled services make monitoring a challenge.”
The perceived issue: monitoring is trickier because each service is independent. If there’s a knock-on effect, where one service passes the wrong piece of data down the line, it can be challenging to identify.
The response: there are standard tools to use for monitoring of distributed microservices in event driven architectures. For example, you have a range of ‘out of the box’ options with Confluent Cloud. Or alternatively, you can hook into AppDynamics. For cloud-based instances you can use Prometheus and Grafana. Crucially, each individual microservice should be configured with an ability to be monitored. Each service needs to be up and available. If a certain service interacts with a third-party platform, tag it as doing so; these services can and should be responsible for understanding their own status, including the status of the third party platform they interact with.
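As a small sketch of a service ‘understanding its own status’, the function below builds a health report that includes each third-party platform the service depends on. The report shape, the `health_report` name and the dependency names are hypothetical; a monitoring tool would typically poll something like this.

```python
from typing import Callable, Dict


def health_report(
    service_name: str, dependency_checks: Dict[str, Callable[[], bool]]
) -> dict:
    """Report this service's status, including each dependency it relies on."""
    dependencies = {}
    for name, check in dependency_checks.items():
        try:
            dependencies[name] = "up" if check() else "down"
        except Exception:
            # A check that raises is treated the same as a failed check.
            dependencies[name] = "down"

    all_up = all(state == "up" for state in dependencies.values())
    return {
        "service": service_name,
        "status": "up" if all_up else "degraded",
        "dependencies": dependencies,
    }
```

A service with a failing dependency reports itself as ‘degraded’ rather than ‘up’, so a knock-on problem shows up at the service closest to its cause.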
“Rollback becomes impractical, if not impossible.”
The perceived issue: if service A puts an event on a queue, and service B picks it up, carries out an action, puts it onto another queue, service C picks it up, carries out some action, and puts this subsequent event on a queue… then service D picks it up and there’s a failure, how do you roll back to ‘C’, ‘B’, and ‘A’?
The response: this is a question of design. Where there are related services, it’s worth investing time up front to determine whether you want (or need) to make them asynchronous. See below for more on the importance of investing in initial design thinking. Alternatively, acknowledging that these scenarios can take place, you can also retrospectively implement a Saga pattern, in which each step has a compensating action that undoes its work if a later step fails. Event driven architectures are inherently more flexible and extensible than a monolithic counterpart.
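To show the shape of a Saga, here is a minimal single-process sketch of the A-B-C-D scenario: each step is paired with a compensating action, and a failure triggers the compensations in reverse order. The `run_saga` function and the step format are assumptions for illustration; in a real event driven system each compensation would usually be triggered by its own event.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensation)


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; if one fails, undo the completed ones in reverse."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    log: List[str] = []

    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Walk back through what already succeeded, most recent first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))

    return True, log
```

If A, B and C succeed and D fails, the log reads done:A, done:B, done:C, failed:D, compensated:C, compensated:B, compensated:A. Nothing is ‘rolled back’ in the database sense; each service performs a new action that reverses the effect of its earlier one.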
Event Driven Architecture: the (potentially) ugly.
What are some of the general misconceptions that lead to issues when approaching an event driven architecture? How do things sometimes ‘get ugly’?
“Everything has to be asynchronous.”
Event driven architectures, by their nature, are asynchronous. The ability to process flows in non-linear and concurrent fashion is crucial to a large part of the value they deliver.
However, there’s still a valid role for synchronous, point-to-point processing, particularly in those instances where there’s a clear flow or user intent. For example, consider processing a payment and notifying a banking customer of a successful transaction. In certain scenarios, it’s worth pausing to ask: does this need to be asynchronous, or are we needlessly overcomplicating things? A point-to-point flow can still be event driven (creating events, placing them onto message platforms, etc.) without the need for over-engineering.
“There’s a one-size-fits-all approach.”
To embrace event driven architecture effectively is to invest in design thinking upfront. Events are a set of ideas and frameworks for solving sophisticated business challenges, rather than a rigid set of rules or a series of platforms. Simply adding an event backbone will not solve the issues or deliver the most value. In the same way you can’t use the engine from a semi-trailer in a Mini Cooper, you can’t just add Apache Kafka or Azure Event Hubs to your existing architecture and expect results. Take the time to look at individual problems, needs and challenges, at a micro level, and determine the best way forward. Ideally, with an experienced guide who’s done it all before with a wide range of organisations.