One of our customers' cloud accounts gave me a jolt. Such a strong one, in fact, that here I am typing furiously into the night. Working on that account and seeing the final results took me aback.
They managed to bring down annual cloud costs by a cool $2 million, a saving of 60%! No, they didn't wave a magic wand. We just needed to take a closer look at their cloud architecture. That's it!
The cloud is the future. Unmatched Scalability. We've all heard these buzzwords that businesses throw around so often. You've probably heard them from many sources already, before deciding to take the plunge. But let me guess. You now find that this scalability and five-nines availability come at a price, one that's growing exponentially and spiraling out of control.
Although I do agree that the scalability of the cloud outweighs most other considerations, ballooning cloud costs are a genuine problem — especially if they're starting to affect your bottom line, or worse, your business' viability.
As the Director of Technology at Egen, I regularly interact with customer cloud accounts while managing their cloud architectures. Most report cost issues to us, and this is the most frustrating part — almost always, I've found that they could cut down on over 50% of their cloud spend with just a few simple steps.
Here are a few of the most common ways businesses overspend on the cloud:
Reserved Instances: On-Demand Instances are Colossal Money Sinks for Production Environments!
Yes, I know. Cloud services became popular on the back of on-demand, per-second billing. But an on-demand pricing model is a truly appalling way to go about using a cloud platform for production environments that run 24x7. It never ceases to amaze me that several businesses continue to rely on an on-demand billing model even after years of moving to the cloud.
Stop. Wasting. 💰💰💰💰💰.
Just switching to reserved instances will bring down your cloud costs without any additional investment or risk of disruption. I'm not kidding.
I'll illustrate the potential savings by comparing the costs of a popular AWS reserved instance vs. the same on-demand instance variant (Linux, m5.large, US East) over a time frame of 1 or 3 years according to pricing in September 2020.
On-Demand
- Hourly Rate: $0.0960
- 1-Year cost: 24 x 365 x $0.0960 = $840.96
- 3-Year cost: 3 x 24 x 365 x $0.0960 = $2,522.88
Standard 1-Year Term
- All Upfront Payment Option: $494 upfront + $0 hourly
- No Upfront Payment Option: $0 upfront + $0.060 hourly
Standard 3-Year Term
- All Upfront Payment Option: $949 upfront + $0 hourly
- No Upfront Payment Option: $0 upfront + $0.041 hourly
Savings with RIs over On-Demand
- All Upfront 1-Year Term: 41%
- No Upfront 1-Year Term: 38%
- All Upfront 3-Year Term: 62%
- No Upfront 3-Year Term: 57%
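The savings figures above are simple arithmetic. Here's a quick Python sketch that reproduces them from the September 2020 list prices quoted above:

```python
HOURS_1YR = 24 * 365          # 8,760 hours in a year
ON_DEMAND_HOURLY = 0.0960     # m5.large, Linux, US East (Sept 2020)

on_demand_1yr = HOURS_1YR * ON_DEMAND_HOURLY   # $840.96
on_demand_3yr = 3 * on_demand_1yr              # $2,522.88

def ri_savings(upfront, hourly, years):
    """Total RI cost (upfront + hourly) vs. on-demand over the same term."""
    ri_total = upfront + hourly * HOURS_1YR * years
    on_demand = ON_DEMAND_HOURLY * HOURS_1YR * years
    return 1 - ri_total / on_demand

print(f"All Upfront 1-Year: {ri_savings(494, 0.0, 1):.0%}")   # 41%
print(f"No Upfront 1-Year:  {ri_savings(0, 0.060, 1):.0%}")   # 38%
print(f"All Upfront 3-Year: {ri_savings(949, 0.0, 3):.0%}")   # 62%
print(f"No Upfront 3-Year:  {ri_savings(0, 0.041, 3):.0%}")   # 57%
```

Run the same comparison against your own instance types before committing to a term.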
Incredible savings, aren't they? Even without paying anything upfront. You'd happily wait in line for five hours on Black Friday if a retail store offered that kind of discount, wouldn't you?
Without taking on additional risk or involving your technical teams, you're already on your way to cutting your cloud costs by more than half with just a single step. All major cloud providers offer similar discounts for reserved compute provisioning, and I'd say they're well worth looking into for most organizations that run 24x7 cloud instances.
As with all things, there are bound to be trade-offs to using reserved instances. Chief among them: you can't upgrade to instances offering newer hardware or better pricing during the contracted term. AWS also provides “convertible reserved instance” plans if you're interested in upgrade flexibility.
Spot Instances: Save up to 90% on Fault-Tolerant Workloads
I love the idea behind spot instances. Cloud providers offer excess (unused) computing capacity that's lying idle at heavily-discounted rates (up to 90% in many cases). There's a caveat. Of course, there is!
Cloud providers can reclaim these spot instances at any time on very short notice, which is precisely why they're a perfect match for fault-tolerant workloads like stateless containers, batch jobs, or build pipelines.
Not everything needs to be up and running all the time. And even if something does, there are ways to provision them to mitigate those risks.
So, how much can you save to make the risk of an instance going ‘poof' worth it? Let's take our m5.large Linux instance in US East as an example. At the time I'm writing this, the AWS spot instance advisor lists a cool 56% saving over on-demand instances. Although this might seem close to the savings you'd enjoy with reserved instances, there's one key difference: you only pay for the hours you use, because spot instances are best suited for workloads that don't need 24x7 uptime — making them significantly cheaper. Discounted instance offerings and prices do change over time, so make sure you check the frequency of interruption as well.
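To make the trade-off concrete, here's a rough Python sketch comparing a 24x7 on-demand instance against a spot instance that only runs while a hypothetical six-hour nightly batch job needs it, using the ~56% discount mentioned above:

```python
ON_DEMAND_HOURLY = 0.0960              # m5.large, Linux, US East (Sept 2020)
SPOT_DISCOUNT = 0.56                   # ~56% per the AWS spot instance advisor
spot_hourly = ON_DEMAND_HOURLY * (1 - SPOT_DISCOUNT)

# Hypothetical nightly batch job that needs 6 hours of compute per day:
hours_per_month = 6 * 30
always_on_cost = 24 * 30 * ON_DEMAND_HOURLY   # on-demand, left running 24x7
spot_cost = hours_per_month * spot_hourly     # spot, only while the job runs

print(f"On-demand 24x7: ${always_on_cost:.2f}/month")             # $69.12
print(f"Spot, 6h/day:   ${spot_cost:.2f}/month")                  # $7.60
print(f"Savings:        {1 - spot_cost / always_on_cost:.0%}")    # 89%
```

The spot discount alone is 56%; the rest of the saving comes from not paying for hours you never use.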
Beware, though! Leveraging spot instances effectively requires careful setup and management if you want to mitigate the risks that accompany them. Walking full-tilt into spot instances because those savings enticed you is a perfect recipe for disaster and unexpected disruptions.
I have a guide about Spot Instances for Kubernetes for a more detailed analysis of the risks involved and how to mitigate them for business continuity.
Autoscaling Instances: Don't Pay for What You Won't Use
I've encountered numerous businesses that leave resources running on cloud instances even if they don't need them at all or only need a fraction of the computing power during a specific time frame.
Ah, if only there were a way to reduce the amount of compute power you're using or scale down when you don't need as much as you usually do. Well, it turns out, there is a way, albeit an oft-ignored one.
Come on — stop throwing your money away. Please.
All major cloud providers offer flexible Autoscaling features that will allow you to scale your computing power as you need.
If a business shuts down at 6PM, chances are, internal resources like development, QA, CI servers, and performance testing environments won't need to handle much load, if any at all. Why are they still running and adding to your cloud costs, you ask? Honestly, I don't know.
Simply scheduling to account for active work hours can do wonders for your cloud bill. You don't even need to shut those services or applications down; just scale down to account for lower usage. And scaling can be automated based on system usage/load as well, instead of a fixed schedule.
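Here's a back-of-the-envelope sketch of what work-hours-only scheduling saves; the ten-instance environment and the 8AM-6PM weekday schedule are hypothetical:

```python
# Hypothetical internal environment: dev/QA/CI instances that are only
# needed during working hours (8 AM to 6 PM, weekdays).
HOURLY_RATE = 0.0960        # m5.large on-demand rate, for illustration
INSTANCES = 10

hours_24x7 = 24 * 7         # 168 hours/week if left running
hours_business = 10 * 5     # 50 hours/week on a work-hours schedule

weekly_24x7 = INSTANCES * hours_24x7 * HOURLY_RATE
weekly_scheduled = INSTANCES * hours_business * HOURLY_RATE

print(f"Always on:       ${weekly_24x7:.2f}/week")                   # $161.28
print(f"Work hours only: ${weekly_scheduled:.2f}/week")              # $48.00
print(f"Savings:         {1 - hours_business / hours_24x7:.0%}")     # 70%
```

Note that the percentage saved is independent of the instance type: it's purely a function of how many hours you stop paying for.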
Another enormous waste of money is keeping non-production environments going 24x7, regardless of whether they're spot instances, or so much worse, on-demand instances (oh, the horror).
Reduce Cloud Sprawl: Why are We Paying For That One Again?
If you're not familiar with the term, I'll catch you up. Simply put, it refers to over-provisioned machines (excess CPU and memory allocation) and/or machines running in an organization's cloud without anyone's knowledge (aka zombie instances). What's more, in far too many cases they remain on cloud bills unnoticed for years.
Seems improbable? Unfortunately, it's all too common. Cloud Sprawl is a pervasive issue and remains a regular contributor to cloud costs.
All major cloud providers offer cost-management tools to prevent precisely such scenarios. Yes, some cost-management tools are downright lacking (I'm looking right at you, AWS…).
Nonetheless, spending time with these tools, going over each bill, and tracing resources back to the services using them will help you identify any cloud sprawl so you can address it as soon as possible. Also, set up CPU and memory monitoring for your services; you'll be surprised by how few resources they actually need. And if you're over-provisioning for peak capacity, well, keep reading.
If you find cost management and tracking tools from your cloud provider lacking, many excellent alternatives will present data to you more cleanly — allowing you to figure out any unnecessary expenses faster than ever. I'd recommend looking into services like CloudCheckr and Cloudability.
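As a sketch of that triage, here's the kind of filter you might run over utilization data exported from your monitoring tool; the instance records and the 5%/10% idle thresholds below are illustrative, not real accounts:

```python
# Sketch: flag candidates for rightsizing or shutdown from exported
# utilization metrics (instance id, avg CPU %, avg memory %, owner tag).
# The thresholds are assumptions; tune them for your own workloads.

instances = [
    {"id": "i-0a1", "avg_cpu": 2.1,  "avg_mem": 8.0,  "owner": None},
    {"id": "i-0b2", "avg_cpu": 64.0, "avg_mem": 71.0, "owner": "payments"},
    {"id": "i-0c3", "avg_cpu": 4.5,  "avg_mem": 9.5,  "owner": "data-eng"},
]

def sprawl_candidates(instances, cpu_pct=5.0, mem_pct=10.0):
    """Instances idle enough (or unowned) to question on the next bill review."""
    return [
        i for i in instances
        if (i["avg_cpu"] < cpu_pct and i["avg_mem"] < mem_pct) or i["owner"] is None
    ]

for i in sprawl_candidates(instances):
    print(f"{i['id']}: cpu={i['avg_cpu']}% mem={i['avg_mem']}% owner={i['owner']}")
```

Anything with no owner tag, or idling below both thresholds, deserves a question on the next bill review.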
Track Data Transfer Costs: Do You Like Having Money? Then Stop Moving the Swimming Pool!
You're hosting a chunk of data in a particular region on the cloud. Moving it to another region, another availability zone, or another cloud provider seems straightforward, right? It is, but there's a catch. Most cloud providers charge for transferring data, and some will outright gouge you (still looking at you, AWS).
Moving data from the cloud to other regions or to your organization's on-premises systems incurs costs. Although this one is easier said than done, make sure your cloud and application architecture is optimized to account for these costs. For example, if you're hosting an on-premises application that frequently needs to access data hosted on the cloud, you might cut down on your cloud bill by simply moving that application to the cloud as well.
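A rough sketch of how these costs add up; the per-GB rates below are illustrative assumptions, since actual pricing varies by provider, region, and volume tier:

```python
# Illustrative rates only; check your provider's current price sheet.
EGRESS_TO_INTERNET_PER_GB = 0.09    # assumed $/GB out to the internet
INTER_REGION_PER_GB = 0.02          # assumed $/GB between regions

# Hypothetical on-prem app pulling 5 TB/month from cloud storage:
monthly_gb = 5 * 1024
egress_cost = monthly_gb * EGRESS_TO_INTERNET_PER_GB

print(f"Monthly egress: ${egress_cost:,.2f}")        # $460.80
print(f"Annual egress:  ${egress_cost * 12:,.2f}")   # $5,529.60

# Moving the app into the same region as the data turns that egress into
# cheap (or free) intra-region traffic, and the bill line simply shrinks.
```

At these rates, a chatty on-prem integration quietly costs thousands a year before anyone notices.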
If you do find yourself needing to transfer huge chunks of data in or out of the cloud, explore alternative solutions to more traditional network connection services like AWS's DirectConnect or Google Cloud's Interconnect. In many instances (no pun intended) where data is of the order of petabytes, offline transfer devices like AWS Snowball and Azure Data Box do work out cheaper.
Reducing data transfer costs isn't a turn-key solution (unfortunately), and it necessitates a certain degree of creativity and agility to pull off successfully.
Serverless Services: You Don't Always Have to Pay With Time
Let's get one thing out of the way: you still need ‘servers,' you just won't need to think about them when using serverless services, as in how much computing power, memory, or storage you'll need. The services need to WORK. Reliably. And that's what they do.
Say you're hosting a database on the cloud. You need it always available to handle requests. Services like AWS Aurora Serverless let you provision an on-demand relational database that will start, shut down, and scale up or down depending on your application's requirements. Its pricing often works out much better because Aurora Serverless charges you for the capacity and I/O your database actually consumes instead of charging per hour that a fixed instance is running (you still pay for storage per month).
If you're picking out a traditional virtual machine, you'll need to account for your peak usage. Why? Well, you need your service to handle the incoming load, and hitting peak usage isn't a good reason for disruption. With serverless services, you can stop planning (and paying for) peak usage as the provider will scale up as and when required, without billing you for that compute power throughout.
Serverless services pair well with workloads that require near-constant availability but see infrequent or intermittent access. Think stateless containers, databases, data pipeline tasks, build pipelines, and so on. Save your money and stop buying instances for every task in the universe.
Hey, don't go overboard. If your workload involves constant, heavy usage, going serverless will become painfully pricey. Plan your cloud architecture carefully.
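Here's a toy break-even calculation between an always-on instance and a usage-billed serverless service; both hourly rates are made-up numbers for illustration, not real price sheets:

```python
# Hypothetical rates to illustrate the break-even, not actual pricing.
ALWAYS_ON_HOURLY = 0.10      # fixed instance, billed 24x7 no matter what
SERVERLESS_HOURLY = 0.25     # usage-based capacity, billed only while active

always_on_monthly = 24 * 30 * ALWAYS_ON_HOURLY   # $72.00 regardless of usage

for active_hours in (50, 200, 288, 400):
    serverless_monthly = active_hours * SERVERLESS_HOURLY
    cheaper = "serverless" if serverless_monthly < always_on_monthly else "always-on"
    print(f"{active_hours:>3} active h/month: "
          f"${serverless_monthly:6.2f} vs ${always_on_monthly:.2f} -> {cheaper}")
```

With these made-up rates the break-even sits at 288 active hours a month (about 40% duty cycle); below that, serverless wins, and above it, a fixed instance does. Find your own break-even before choosing.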
We recently managed to save a whopping 70% on cloud costs for an Egen customer in the food delivery space by moving them to AWS Fargate (a serverless container service). Yes, 70% by just moving to a different service. You can find additional information in my detailed guide to serverless services.
Reduce Over Reliance: Stop Relying on a Single Cloud Provider for EVERYTHING
I know, I know. Having everything within a single ecosystem is an attractive proposition, and I fully acknowledge the convenience on offer. But relying on AWS for everything down to your toilet paper is not going to end well for any of us except, well, AWS.
Cloud providers know this and entice you with convenient but proprietary solutions.
Tying yourself down to these proprietary solutions from a single provider induces vendor lock-in — just a fancier way of saying, “Hey! You're stuck”. Don't reach a point where getting out of their web (really, not a pun) is impractical and just too damn costly.
If your provider decides to raise prices or another one drops their price drastically, guess what, there's nothing you can do about it. If you want to move away from their services, untangling yourself from their proprietary solutions is a complicated task at best. If you do manage to pull it off, it's going to be one hell of a story for the grandkids. No kidding!
Scenarios like those are why you should try relying on services, products, or applications that are wholly vendor-agnostic. It won't matter whether you're using AWS, Google Cloud, or Azure; your organization can move across providers when you see significant cost savings.
Kubernetes is Kubernetes and will work (almost) the same whether you pick AWS, Azure, or Google as your provider. It won't discriminate. The same is true for other services, including relational databases like PostgreSQL: it doesn't matter if you run on AWS RDS, Azure Database for PostgreSQL, or Google Cloud SQL. Aim for that sweet spot between cloud-managed services and vendor-agnostic frameworks.
Phew, that was tiring, but I think I'm done for the day. Come on, everyone! Stop just handing cloud providers your hard-earned money (unless it's from your VCs 😛). Take concrete steps to ensure that your cloud architecture is designed to leverage cost-saving services and products effectively.
Use the guides that I've linked throughout this post to start taking action and saving money today! If you'd like to have a chat about your organization's cloud architecture and spending, head over to Egen and schedule a free 30-minute consultation with me.
Originally published at https://praveen.salitra.io on September 19, 2020.