Understanding the performance measures of distributed serverless applications
Unlike in legacy systems, where performance is mostly measured by the speed of an operation as one big unit, in modern serverless applications, there isn’t a simple yardstick to measure performance. This is because there are different parts to an application that can be independently measured for efficiency, and each part can be unique in its criteria. In some cases, you look at each part in isolation and measure the latency; in others, you group a few or all of them to measure the end-to-end performance or the completion of a business process. Here are some examples of possible performance measures for different use cases:
End-to-end efficiency
An API that fetches the price of a given product and returns it to the frontend to display on the website should be highly performant end-to-end, as you don’t want a frustrated customer leaving your site for a competitor’s. This action crosses several layers in the architecture and chains multiple applications (possibly including external third-party applications) in the flow. Here, the end-to-end latency becomes the measure of efficiency.
Part efficiency
A customer who submits their order as the final step of a checkout flow only cares about receiving a quick acknowledgment to confirm everything is fine. The API that accepts the order details should efficiently serve the customer with a low latency. The actual processing of the order, which is hidden from the customer, still needs to be performant, but not necessarily as efficient as that API. Here, the efficiency is relative to each part.
Deliberate efficiency and inefficiency
In an event-driven architecture, you may deliberately configure one processing pipeline to be faster than another. For example, an image processing website might handle images uploaded by a premium account holder faster than an anonymous customer. Here, you work with serverless resources capable of providing the same efficiency level, but you deliberately downgrade one according to your business policy.
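The premium-versus-anonymous policy above can be sketched as a simple priority dispatcher. This is an illustrative assumption, not an AWS API: in practice you would implement the same policy with, say, separate queues or delayed delivery, but the ordering logic is the same.

```python
import heapq
import itertools

# Deliberate (in)efficiency sketch: both tiers *could* be processed at the
# same speed, but business policy dispatches premium uploads first.
# Tier names and functions here are hypothetical.
PRIORITY = {"premium": 0, "anonymous": 1}

_order = itertools.count()   # tie-breaker to keep FIFO order within a tier
_queue = []                  # min-heap of (priority, arrival order, image id)

def enqueue(image_id, tier):
    heapq.heappush(_queue, (PRIORITY[tier], next(_order), image_id))

def dispatch():
    """Return the next image to process, premium tier first."""
    if not _queue:
        return None
    _, _, image_id = heapq.heappop(_queue)
    return image_id

enqueue("img-1", "anonymous")
enqueue("img-2", "premium")
assert dispatch() == "img-2"  # premium jumps ahead despite arriving later
```

The same effect is often achieved declaratively, for example by giving the lower-priority pipeline a delivery delay or less concurrency rather than reordering events in code.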
Expected efficiency
When the end-to-end latency is not time-bound, different parts of the solution can have different expected efficiency. For example, there are situations where a microservice expects to receive a domain event as quickly as possible from the producer. However, processing of the event data may take longer or be deferred until later.
Contractual efficiency
This measure is appropriate for use cases where there is time criticality in the flow of information from the start to delivery to a consuming application and the data may become obsolete or non-processable beyond a certain period, defined by SLAs and contract agreements. With ticket booking systems, for example, there is a session validity period; the overall success depends on the customer’s actions during the session, but contractual efficiency is required of the application supporting those actions within a session.
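The session validity check behind such a contract can be sketched as a guard that a consuming application applies before processing an event. The ten-minute window and the function name below are assumptions for illustration; the real period would come from your SLA.

```python
from datetime import datetime, timedelta, timezone

# Assumed contractual validity window for a booking session (illustrative).
SESSION_TTL = timedelta(minutes=10)

def is_processable(event_time, now=None):
    """Return True if the event is still within its contractual window.

    Events older than SESSION_TTL are considered obsolete and should be
    rejected or routed to a dead-letter path rather than processed.
    """
    now = now or datetime.now(timezone.utc)
    return now - event_time <= SESSION_TTL
```

A consumer would call this guard first and short-circuit expired events, keeping the contractual measure (events handled within the session) separate from raw latency.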
A legacy system that chains multiple applications in a synchronous end-to-end call cannot be considered “efficient” in modern computing if using low latency as the only measure of efficiency. The ability to scale with demand, 24/7 availability, resilience during disruptions, and operating in a secure environment all count toward being performant and efficient.
While assessing your business use cases and applications, you may identify some that are not a perfect match for serverless. These include:
Compute-heavy, complex applications
A compute-heavy engineering application that performs structural analysis and has high memory demands running on a High-Performance Computing (HPC) EC2 instance is one such example. AWS offers HPC-optimized instances as one of the high-end instance types, and you cannot match their power and performance with the resources available to a Lambda function. A serverless application can pick up the results of the computation and continue with the flow, but that core compute part won’t be efficient with serverless.
Long-running applications
Certain data processing batch jobs at banks and insurance companies, complex biomedical research tasks, etc., can take a very long time to complete, stretching beyond the timeout limit of a Lambda function. Unless you rearchitect to split a batch into manageable chunks of extract, transform, load (ETL) tasks, you cannot achieve the required efficiency with serverless.
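The rearchitecting mentioned above usually starts with splitting the batch into chunks that each fit comfortably within a single function invocation. A minimal sketch of that chunking step (the sizes are arbitrary):

```python
def chunk(records, size):
    """Split a large batch into fixed-size chunks.

    Each chunk becomes one ETL task sized to finish well within a single
    Lambda invocation's timeout; an orchestrator (e.g., a workflow service)
    would then fan the chunks out and track completion.
    """
    for i in range(0, len(records), size):
        yield records[i:i + size]

batch = list(range(10))
tasks = list(chunk(batch, 4))
assert tasks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Chunking alone is not enough: you also need somewhere to hold intermediate state and a way to resume after partial failure, which is why this pattern is typically paired with a workflow orchestrator rather than hand-rolled retries.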
Low-level computing tasks
Highly complex low-level programs require access to the underlying operating system, processors, and networking, making them unsuitable to run as Lambda functions.
Applications that must consistently provide ultrafast response times
If you have a highly critical use case where the application is expected to respond within, say, 10 ms for almost 100% of the invocations, regardless of the frequency of invocations, you may find it challenging to meet this expectation with serverless.
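One way to see why this expectation is hard to meet is to look at tail latency rather than the average: a single slow invocation (a cold start, for instance) dominates the high percentiles. The samples below are invented to illustrate the point; the nearest-rank percentile function is a standard technique, not an AWS tool.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of latency samples (milliseconds)."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# Nine warm invocations plus one hypothetical cold start.
samples = [4, 6, 5, 7, 120, 5, 6, 4, 5, 6]

assert percentile(samples, 50) == 5    # median easily meets a 10 ms target
assert percentile(samples, 99) == 120  # the tail blows past it
```

The median comfortably meets a 10 ms target, but the 99th percentile does not, which is exactly the "almost 100% of invocations" requirement that serverless cold starts make challenging.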
Durable connections to ports using proprietary protocols
Legacy integrations that require maintaining connections to ports using nonstandard protocols are not ideal for serverless. Here, you can consider a hybrid architecture that uses container apps for such integrations and serverless to handle the downstream processing.