Serverless DevOps: Why Serverless For Operations People

This is a part of our Serverless DevOps series exploring the future role of operations when supporting a serverless infrastructure. Read the entire series to learn more about how the role of operations changes and the future work involved.

I don't want to just operate systems anymore.

So far, I’ve presented serverless only as a disruptive technology in the world of operations. It’s going to change how operations functions and how we apply our existing skills. It's also going to require us to learn new skills.

In only that light, serverless seems like something to be afraid of. But it’s not! It’s something that should be embraced with optimism because of the changes it brings.

Let’s discuss why so many are excited about the effect of serverless on operations. In this chapter, I’ll talk about what drives many of us (including myself) in this field, where that drive has been lost, and why serverless brings it back.

Getting Into Ops

I got into operations for two main reasons: I enjoyed building things and I enjoyed solving problems. Unlike many people in the operations profession, I was not responsible in most of my jobs for “operating” software built by developers. Instead, I usually worked on infrastructure and service delivery teams.

For most of my career, particularly in the beginning, my two motivators were directly linked and operations was very enjoyable. There was a problem that bugged me and I built something to alleviate that problem. I was either solving a problem of mine or solving a problem that helped the teams I served.

Over time, though, I started getting bored. While the technology available to me changed, the problems didn’t. Whether it was delivering a virtual machine on VMware or an EC2 instance in AWS, the problem was still, “How do I deliver compute to a waiting engineer?”

Similarly, whether it’s building an application packaging, deployment, and runtime platform or choosing to containerize applications with Docker, these two problems are largely the same: “How do I bundle an application and its dependencies, and deploy it to a standardized platform to run?”

I was tired of keeping up with changes to operating systems, too. The host operating system is largely a means to an end for me. I prefer to spend very little time logged into an application host. Most of what I need from a host — logs, metrics, etc. — should have been shipped to another system that I could then interact with. Changes to network device naming, increasing systemd complexity, or replacing a standard UNIX command with some new utility may benefit some people, but for me these things largely get in the way. They require engineering or effort on my part to keep up while providing little to no value.

What’s even more frustrating to me is that the problems operations engineers are asked to fix are largely similar across most organizations. This has led organizations to differentiate themselves based on their technical stack and technical solutions in order to attract talent. And that technical stack may not even be the right choice for their problems, current maturity, or scale. I see startups deploying Kubernetes to run only handfuls of containers, for example.

“We’re building a service mesh on top of Kubernetes. That’s a great reason to work here!”

At this point in my career, the trendy option of building and operating a Kubernetes platform for container management is simply not appealing work anymore. It’s just the third iteration of a problem I’ve solved more than once before. And the sheer number of operations jobs asking for the same problems to be solved means most organizations fail to stand out. (And other differentiators like culture are hard to evaluate until you already work there.)

How Does Serverless Make That Happen?

Offloading operational work to public cloud providers is highly appealing. It lets us shed undifferentiated work that we’ve been doing repeatedly across our careers and focus on serving people and providing value. While at first I was worried that serverless would eliminate all of my work as an operations person, I eventually realized that would not be the case.

Why? Because most of us work in organizations where there’s more work to be done than there is capacity to perform the work. Serverless doesn’t result in NoOps, where the need for operations goes away, but instead what we might call DifferentOps.

The effects of serverless on operations can mostly be characterized by

  • greater emphasis higher up the technical stack
  • time and effort freed up for tasks we couldn’t get to before
  • more time to solve business problems

 

Let’s briefly look at each of these and why they make serverless exciting for an operations person.

Moving up the Tech Stack

I’ve managed Linux hosts for a long time and it’s fairly routine, except for when network device naming changes or a standard decades-old command is replaced for some reason. Whether it’s working directly on hosts, building machine images, configuration management, or figuring out how to deploy hosts, I’ve done it and I’ve done it for a while, which now makes the work relatively routine and boring.

Take the infrastructure and the associated work away, and what do you have left? You still have much of the same work, but now you’re performing it against the application. If you’re monitoring and troubleshooting hosts today, then tomorrow you’re monitoring and troubleshooting applications. If you’re tracking host vulnerabilities and patching software today, then tomorrow you’re tracking and patching application and dependency vulnerabilities. And so on.

Many of us in operations are used to using tools like strace, sar, and vmstat to observe and debug what the host operating system is doing. Once the host is no longer ours to manage, that same observation and debugging moves into the application, which means that, as operations people, we’re going to have to dig into code. We’ll have to understand debuggers, profilers, and tracing. Personally, I’ve always wanted to learn those skills. And tomorrow I may be on a team with an experienced application developer who can help me.

The work is new and the work is different, but the fundamentals are the same. And while much of the work may be boring and tedious to an experienced developer, it’s work that I can tackle with the enthusiasm of a junior engineer excited to master new skills.

Assessing and Improving Software

We should acknowledge there’s always more work to be done in our infrastructure. That means the free time we gain from not doing much of the operations and infrastructure work can be used to improve the software we’re responsible for.

But many of us have a hard time picturing new work to do. That’s because we have become stuck in a rut, doing mostly the same things and solving the same problems day after day and job after job. But serverless provides an opportunity to break free from that rut.

Much of the new work we can do after service delivery is to continually assess our applications for reliability and resilience to failure. This new work starts with planning game days in your organization. These are fire drills to assess your preparedness for, and response to, failures in the environment.

A more rigorous discipline called chaos engineering is also developing, where teams take a disciplined and scientific approach to testing for failures. With chaos engineering, you form a hypothesis of what fails and how, perform controlled experiments to test whether you were correct, and then learn from the data collected and apply your new knowledge to improving a system.
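To make that hypothesis, experiment, learn loop a little more concrete, here is a minimal sketch in Python. It is only a toy simulation under stated assumptions: the dependency, the retry logic, and every name in it (call_dependency, handle_request, run_experiment) are hypothetical and do not come from any real chaos engineering tool. A real experiment would inject failures into an actual environment with a controlled blast radius.

```python
import random
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    hypothesis: str
    success_rate: float
    threshold: float

    @property
    def hypothesis_held(self) -> bool:
        # The hypothesis holds if the observed rate meets the stated threshold.
        return self.success_rate >= self.threshold


def call_dependency(failure_rate: float, rng: random.Random) -> str:
    """Hypothetical downstream call that fails at the injected rate."""
    if rng.random() < failure_rate:
        raise ConnectionError("injected failure")
    return "ok"


def handle_request(failure_rate: float, retries: int, rng: random.Random) -> bool:
    """Hypothetical system under test: retry the dependency before giving up."""
    for _ in range(retries + 1):
        try:
            call_dependency(failure_rate, rng)
            return True
        except ConnectionError:
            continue
    return False


def run_experiment(failure_rate: float, retries: int, threshold: float,
                   requests: int = 1000, seed: int = 42) -> ExperimentResult:
    """Run a controlled experiment: inject failures, then measure the outcome."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    successes = sum(handle_request(failure_rate, retries, rng)
                    for _ in range(requests))
    return ExperimentResult(
        hypothesis=(f"With {retries} retries, at least {threshold:.0%} of requests "
                    f"succeed when {failure_rate:.0%} of dependency calls fail"),
        success_rate=successes / requests,
        threshold=threshold,
    )


if __name__ == "__main__":
    result = run_experiment(failure_rate=0.3, retries=2, threshold=0.99)
    print(result.hypothesis)
    print(f"observed success rate: {result.success_rate:.1%}")
    print("hypothesis held" if result.hypothesis_held
          else "hypothesis rejected, time to improve the system")
```

The point of the sketch is the shape of the work rather than the numbers: the hypothesis is written down before the experiment runs, the failure is injected deliberately, and the result tells you whether your mental model of the system was right.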

There’s also a new push to start performing software and system tests on production instead of staging servers. The best tests of a production system are performed on that production system and not a staging environment that is mostly similar and under significantly less load. But to do that, you need to have your failure preparedness plans already in order and good knowledge of how your systems may fail.

I should point out that just going serverless isn’t going to magically make you able to perform these practices. But you should have the time to work on the people and process in your organization so that you can adopt these practices successfully. Serverless here provides an opportunity to make our organizations perform at their best.

Solving Business Problems

I’ve spent a bit of time in startups, and it’s dramatically altered my thinking about engineering and what’s really important over time. Now, I’m more interested in solving business problems and the growth of the organizations I work for.

At my first startup I was introduced to product metrics and terms like adoption, retention, and churn. Development teams released products and features, and their work drove metrics, which ultimately drove revenue. That’s what is so interesting to me about product pods and feature teams aligned around business problems. Adding a feature that delights users, fixing a bug that frustrates them, or any number of other product changes shipped had a measurable and noticeable effect on the organization.

But not a single customer really cared if we ran on-prem or in the cloud, ran in AWS or on OpenStack, or whether we were deploying Kubernetes. My work on an operations team never budged the metrics that were most important to the organization. Serverless, and the potential repositioning of operations roles in an organization, provides an opportunity to have a direct impact on the organizations we work for through an increased emphasis on solving business problems over technical problems.

Today, I’m very interested in the intersection of solving business problems and engineering through the use of product engineering concepts. You’re starting to see product engineering concepts creep into operations already.

You may have heard the phrase “product, not project” more recently. This mentality involves solving problems, and engineering, iteratively. You start by identifying a problem and delivering small solutions quickly, then you measure the success of your solution to determine whether to continue your effort or take a new approach. This kind of problem-solving and engineering is much more difficult than the two hardest problems in engineering: cache invalidation, naming things, and off-by-one errors.

Let’s also talk about judging our success. Projects are judged by benchmarks like schedule, budget, and technical correctness. An organization largely judges these on its own. A project is a success or failure because the organization declares it a success or failure.

Products are judged very differently. They’re judged on the previously mentioned metrics like adoption, retention, and churn. To be successful, a product has to solve a problem for which there is a demand and in a way that will make the user happy. Success is judged by the fickle whims of external arbiters. This makes attaining product success far harder than almost any technical problem out there. This is a whole new level of hard problems. And these are the challenges that I want to face.

You don’t have to work in a startup or product company to adopt this product-not-project mentality. Even as an internal engineering or service team you can still work this way. After you deliver a service do you follow up with users? Even if they haven’t complained? Do you check that people are using your service? Do you check that they’re satisfied? Do you check that their problem has been solved or improved? There’s so much more we can do to ensure we’re solving real business problems and having an impact on our organizations.
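As a rough illustration of what that follow-up could look like for an internal service, here is a small Python sketch that computes adoption, retention, and churn from sets of user IDs. The definitions are assumptions chosen for simplicity (organizations define these metrics in different ways), and the function name and inputs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProductMetrics:
    adoption: float   # share of eligible users active this period
    retention: float  # share of last period's users still active
    churn: float      # share of last period's users who left


def product_metrics(eligible: set[str], last_period: set[str],
                    this_period: set[str]) -> ProductMetrics:
    """Compute simple usage metrics from sets of user IDs (assumed definitions)."""
    adoption = len(this_period & eligible) / len(eligible) if eligible else 0.0
    if last_period:
        retention = len(last_period & this_period) / len(last_period)
        churn = 1.0 - retention
    else:
        # Nobody used the service last period, so nothing to retain or lose.
        retention = churn = 0.0
    return ProductMetrics(adoption, retention, churn)


if __name__ == "__main__":
    metrics = product_metrics(
        eligible={"ana", "bo", "cy", "di"},  # teams that could use the service
        last_period={"ana", "bo"},           # active last month
        this_period={"bo", "cy", "di"},      # active this month
    )
    print(metrics)
```

Even a crude measure like this turns "did anyone complain?" into "is anyone still using it?", which is a much better question to ask after you deliver a service.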

Dropping infrastructure operations work by going serverless and joining a product pod lets me focus on solving problems that are relatively unique to my organization and have a direct contribution to the growth and success of my organization. And these types of problems provide a level of difficulty and challenge that is far harder than anything I’ve experienced in my career.

I Don’t Want to Operate Systems

I’ve reached a point where I need to say this.

“I don't want to just operate systems anymore.”

I'll just come right out and say this. I didn't get into this profession to operate systems. The work of operating systems is obviously important, and still is with serverless, but I want to reduce my time spent on that work. I want to do other things that contribute to what I really enjoyed about operations. When I express this sentiment out loud I find people coming forward to express similar feelings. They’re less interested in operating systems and more interested in providing value and serving people.

Read The Serverless DevOps Book!

But wait, there's more! We've also released the Serverless DevOps series as a free downloadable book. This comprehensive 80-page book describes the future of operations as more organizations go serverless. Whether you're an individual operations engineer or managing an operations team, this book is meant for you!

 

There's still more in our Serverless DevOps series! Read the next piece in our series, The Need For Ops.
