Public Cloud Migration

Public Cloud Migration Isn't About Cost Anymore; Here's What's Driving It

For years, cost savings topped the list of reasons businesses moved to the public cloud. But in recent years, the game has changed. While lowering CapEx remains a bonus, the true motivations behind cloud migration now revolve around speed, innovation, and scalability, not just budgets.

In the Past: The Cost Factor

For many traditional businesses, the journey began with the question of how to cut the IT budget. The ability to spin up an entire environment in a few clicks (or API calls), and to choose how to pay for services (from pay-as-you-go to savings plans to Spot), was very appealing.

It was so appealing that many businesses, from small startups to large corporations, forgot to include cost in their design decisions. The result was high monthly bills. They moved their data and workloads to the public cloud, and now they are talking about cloud repatriation and a return to on-prem.

Projects that rushed to the cloud without proper planning, and without considering all aspects (security, scalability, availability, and cost), failed.

More mature organizations with experienced teams (developers, DevOps engineers, architects, and so on) can design modern architectures that combine managed services, APIs, and serverless offerings in a cost-effective way. But for most companies that are just starting out in the cloud, or that lack experienced teams, the move is a big letdown when judged on cost alone.

What Matters Now?

The Agility Factor

Agility was a big selling point for the public cloud. It allowed companies to move quickly and deliver new products and services to their customers faster. Companies of all sizes could try out new services (or features), adopt new technologies (from the early days of serverless to the latest advances in GenAI services), deploy applications to test environments, and, if the latest development proved useful to customers, roll it out at production scale. The cloud freed businesses from the limitations of traditional data centers, with their long purchase cycles and hardware locked in for years. It allowed them to test new features, recover quickly from failures, and keep iterating until they had fully functional production services that met their customers' needs.

The Scalability Factor

Scale is one of the biggest advantages hyperscale cloud providers have over most companies' data centers. A data center has physical limits, such as the number of racks you can fit inside or the amount of power available to run the infrastructure and cool it all down. A traditional data center might be enough if your business has steady workloads and only small peaks in traffic or customer demand. But scale becomes critical for companies that do business worldwide and serve customers with highly variable traffic patterns (think Black Friday or Cyber Monday).

You may run an online store that needs to grow or shrink with customer demand at different times of the year. You may have heavy batch processing at the end of the month. You might be training a large language model on customer data. In all of those situations, near-infinite scale matters, and the public cloud (with one of the hyperscale providers) is the best place to find it.

The Elasticity Factor

When building applications, the ability to add or remove resources on demand is essential. Elasticity was such a crucial benefit compared to the traditional data center because it combined (almost) infinite resources (such as compute and storage), microservices architectures (with the ability to scale specific components up or down according to load), and serverless services (FaaS, storage, databases, and so on) that respond to load automatically and manage the required resources elastically, reducing the burden of manual maintenance. Elasticity became a major reason for adopting the public cloud: the ability to switch hardware with little or no downtime, or to use the latest GPUs for a new GenAI application (or a high-performance storage service for a massive HPC cluster) and then shut it all down, and stop paying, when the work is done.
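As a concrete illustration, here is a minimal sketch (in Python with boto3, taking AWS as one example) of a target-tracking scaling policy for a containerized service; the cluster and service names, and the capacity limits, are hypothetical placeholders.

```python
# Minimal sketch: let the platform add or remove containers based on load,
# instead of provisioning for peak capacity. Names and limits are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the ECS service as a scalable target (2 to 50 tasks).
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/demo-cluster/demo-service",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=50,
)

# Keep average CPU around 60%; the service scales out under load
# and scales back in when traffic drops.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/demo-cluster/demo-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```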

The Efficiency Factor

Efficiency may not have been a top priority before because of the limits of physical hardware. But in the last ten years, more and more companies have decided to incorporate efficiency into their design decisions. The cloud allowed us to achieve almost the same goals using many different patterns, such as containers, functions as a service, APIs, event-driven architectures, and more.

At any point in time, we can stop and question decisions we made in the past. Is our current workload running on the most efficient architecture, or can we make adjustments to make it more cost-efficient, more resilient, and more responsive to customer requests? Switching to newer hardware, different storage services (or even different storage tiers), different types of databases (relational vs. NoSQL, graph vs. time-series, and so on), or moving from tightly coupled to loosely coupled architectures can sometimes make a workload run more efficiently.
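As one small example of such an adjustment, the sketch below (Python with boto3; the bucket name and day thresholds are illustrative assumptions) moves older objects to cheaper storage tiers instead of keeping everything in the standard tier.

```python
# Minimal sketch: tier colder data down to cheaper storage classes
# instead of paying standard-tier prices forever. Bucket name is a placeholder.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-cold-data",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```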


The Automation Factor

Mature organizations with many servers and applications have been using automation scripts for years to achieve fast, reproducible results, but the cloud has taken automation to a new level, where (almost) everything is exposed through APIs.

Infrastructure as Code (IaC) allows organizations to automate everything from building entire environments across multiple SDLC stages (such as Dev, Test, and Prod) to deploying across multiple availability zones and even multiple regions (when a global footprint is required).

IaC tools such as Terraform (or OpenTofu) and Pulumi, or more vendor-opinionated native alternatives (such as CloudFormation or ARM templates), allowed organizations (once they learned to write IaC) to deploy workloads in a standardized, automated way.
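As a minimal sketch of the idea, here is how a per-environment resource might be described with Pulumi's Python SDK; the stack configuration key, resource names, and tags are illustrative assumptions, not a prescribed layout.

```python
# Minimal Pulumi (Python) sketch: the same code describes Dev, Test, and Prod;
# only the stack configuration changes. Resource names are placeholders.
import pulumi
import pulumi_aws as aws

config = pulumi.Config()
env = config.require("environment")  # e.g. "dev", "test", "prod" per stack

# A private artifacts bucket, tagged with the environment it belongs to.
artifacts = aws.s3.Bucket(
    f"app-artifacts-{env}",
    acl="private",
    tags={"environment": env},
)

pulumi.export("artifacts_bucket", artifacts.id)
```

Running `pulumi up` against each stack then produces the same topology with environment-specific settings, which is exactly the repeatability that manual provisioning never offered.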

Policy as Code (HashiCorp Sentinel, AWS Service Control Policies, Azure Policy, Google Organization Policies, or Open Policy Agent) lets organizations add a layer of rules about which resources can be used and within what limits (such as allowed regions or specific instance types). This keeps security and configuration consistent across the whole organization.
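For example, a region guardrail expressed as an AWS Service Control Policy might look like the sketch below (created here via boto3; the allowed regions and policy name are illustrative assumptions, and real-world SCPs usually also exempt global services).

```python
# Minimal sketch: deny any action outside the approved regions at the
# organization level. Region list and policy name are placeholders.
import json
import boto3

organizations = boto3.client("organizations")

region_guardrail = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideApprovedRegions",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": ["eu-west-1", "us-east-1"]
                }
            },
        }
    ],
}

organizations.create_policy(
    Name="approved-regions-only",
    Description="Deny actions outside approved regions",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(region_guardrail),
)
```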

The Security Factor

When organizations' and customers' data is spread across multiple places (from on-prem data centers to SaaS applications and partners' data centers), physical location can no longer be considered a security boundary. In many cases (though unfortunately not all), services deployed on IaaS or PaaS are configured as secure by default. Deploying compute resources with public IP addresses still happens today, but it is now much harder to expose an object storage service publicly by accident (you have to configure it as a public resource explicitly).

Encryption, both in transit and at rest, is enabled by default in most cloud services. For higher assurance over who has access to private data, most hyperscale cloud providers allow customers to configure customer-managed encryption keys, so that organizations control not just the encryption keys but also the key generation process.
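As a minimal sketch of that idea (Python with boto3; the bucket name and key alias are placeholders), the snippet below creates a customer-managed key and makes it the default encryption for an object storage bucket.

```python
# Minimal sketch: create a customer-managed key and make it the bucket's
# default encryption. Bucket name and alias are placeholders.
import boto3

kms = boto3.client("kms")
s3 = boto3.client("s3")

# Customer-managed key: the organization controls its policy and rotation.
key = kms.create_key(Description="Default encryption key for customer data")
key_arn = key["KeyMetadata"]["Arn"]
kms.create_alias(AliasName="alias/customer-data", TargetKeyId=key_arn)

# Encrypt every new object in the bucket with that key by default.
s3.put_bucket_encryption(
    Bucket="example-customer-data",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": key_arn,
                }
            }
        ]
    },
)
```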

Auditing of admin activity is enabled by default (data-access auditing usually has to be enabled manually and weighed against its extra cost), and logs can be retained for as long as organizations need them, whether for incident response processes or to satisfy regulatory requirements.
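If data-access auditing is needed, it can be switched on selectively for a sensitive data store rather than everywhere, keeping the extra cost contained; a minimal sketch (the trail and bucket names are placeholders):

```python
# Minimal sketch: keep management-event auditing, and add data-event
# (object-level) auditing only for one sensitive bucket to limit extra cost.
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.put_event_selectors(
    TrailName="org-audit-trail",
    EventSelectors=[
        {
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    "Values": ["arn:aws:s3:::example-customer-data/"],
                }
            ],
        }
    ],
)
```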

Network access in the cloud is still a pain point for many organizations. The larger your cloud environment (let alone one spread across regions or multiple cloud providers), the more visibility you need, sometimes through built-in services or open-source tools, sometimes through third-party commercial solutions. Stay alert to changes, because, as in real life, if you leave the door open, someone will eventually walk in.
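One common first step toward that visibility is enabling network flow logs per environment; a minimal sketch (the VPC ID and destination bucket ARN are placeholders):

```python
# Minimal sketch: capture accepted and rejected traffic for a VPC and ship it
# to object storage for analysis. VPC ID and bucket ARN are placeholders.
import boto3

ec2 = boto3.client("ec2")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::example-network-logs",
)
```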

Summary

There are many cases where organizations will choose to keep some of their workloads on-prem (or in co-location or hosting facilities) due to high service costs (from real-time storage to expensive hardware such as GPUs), requirements for low network latency (such as connectivity to a stock exchange), or data sovereignty requirements. We will probably still see hybrid architectures for many years, but the public cloud is becoming increasingly crucial in the design and architecture decisions of organizations of all sizes.

If we stop looking at the public cloud merely as a place to lower costs (possible, but not in every use case) and instead weigh agility, scalability, elasticity, efficiency, automation, and built-in security (enabled by default), the answer to why organizations are migrating to the public cloud becomes clear.