175 Zettabytes - A Data Volume Model for Valuing the Cloud

Andy LeRoy


IDC estimates that by 2025, worldwide data creation will reach 175 Zettabytes.  As their report notes, if you could store all of that data on DVDs, the stack of DVDs would circle the earth 222 times.  It’s an astronomical number.

Additional estimates indicate that the share of data stored in the public cloud could reach 54% by 2025.

Given these numbers, can we construct revenue growth projections for the big cloud providers using data volume creation each year?

Why Does This Matter?

Investment Opportunity for Cloud Computing

In 2021, AWS, Azure, and Google Cloud brought in nearly $125 Billion in combined revenue.  

AWS: FY 2021 Revenue: $62 Billion, $18.5 Billion in Operating Income, 37% YoY Growth

Azure: 2021 (calendar year, and reported within Intelligent Cloud segment) Revenue: ~$43 Billion, 46% YoY Growth

GCP: FY 2021 (reported with GSuite) Revenue: $19 Billion, 47% YoY Growth

Each company’s cloud segment is growing at 35%+ per year, with no signs of stopping.

From Amazon 10K (FY 2021): Cash capital expenditures were $35.0 billion, and $55.4 billion in 2020 and 2021, which primarily reflect investments in additional capacity to support our fulfillment operations and in support of continued business growth in technology infrastructure (the majority of which is to support AWS), which investments we expect to continue over time.

Capex continues to grow for the cloud providers - AWS has some $30 Billion allocated on a yearly basis, up some 50% from the prior year.

AWS has $80 Billion (as of 12/21/21) in contract backlog they expect to fulfill over the next ~4 years.

How long can this growth continue, and what are the implications for the respective stock valuations?

Understanding The Direction of Our Digital World

It's not quite apples to apples, but world GDP is ~$95 Trillion.  When cloud computing reaches a trillion dollar run rate, it could be ~1% of world GDP.

Everything can now be broken down into 1’s and 0’s at some level.  At the compute layer, we invented a way to programmatically control energy and represent any type of information.  At the network layer, we have a decentralized way for anyone to ‘plug in’ to a global network and share any of this data instantaneously (the internet).  And as Matthew Ball alludes to in his Metaverse Essays, we may soon have new abstraction layers on top of the internet protocols and infrastructure.

Framework for the Metaverse — MatthewBall.vc
How much of the Metaverse is already here? When will it arrive? What does it need? And how will it grow? A Foreword to ‘THE METAVERSE PRIMER’

The metaverse, AI, self-driving cars, blockchain technology - all of these require compute and networking.  Will the public cloud capture this growth in data volume and compute?  Understanding just how big our digital world could be gives context for what life could be like in the future.

From Amazon to Zoom: What Happens in an Internet Minute In 2021?
A lot can happen in an internet minute. This stat-heavy graphic looks at the epic numbers behind the online services billions use every day.

Opportunities for Blockchain and Decentralization

What role will blockchain technologies and decentralization play in the future?

As more workloads move to the cloud, could legacy enterprise data centers find a new role playing a part in decentralized compute or file storage?  

If Cloud Computing is a trillion dollar a year market in five years, how big can a world computer (Ethereum) or decentralized storage networks and protocols (IPFS, Filecoin, Siacoin, Storj, Filebase) become?

(We’ll revisit these ideas in the future)

Forms of Data Storage and Compute

Local Devices

Data can be stored on a local machine.  Depending on what type of data this is, that device could be a phone, a laptop, an assembly line robot, a car, a refrigerator, etc.  Really anything that has a computer chip in it can store data or process data (and oftentimes transmit the data elsewhere).

Enterprise Data Centers

Enterprises build and manage their own data centers.  Depending on the size and industry of the company, these can be massive - Visa and Facebook’s data centers are just two examples of the scale these can reach.

Inside Visa’s Data Center
Visa provides a behind-the-scenes look at its top-secret, high-performance network operations.

The Public Cloud (Cloud Computing)

AWS, Azure, and GCP lead the Cloud Computing Market, which is effectively an abstraction of running your own data center.  This solves problems of capital costs, time to value, scalability, maintenance, and redundancy, and offers a number of additional benefits.

Cloud Computing - Solving a Trillion Dollar Problem
“I’m in love with this slide. We update it but it doesn’t matter”. The numbers in this slide from Amazon Web Services (AWS) VP and Distinguished engineer James Hamilton at the AWS re:Invent conference in 2016 are astonishing. In 2015, Amazon Web Services (AWS) added enough server

Data Volume Valuation Model

Can we model out expected revenue for each of the major cloud providers given the overall themes of data growth and cloud workloads?  The full valuation model and all assumptions are linked below (note these are all estimates):

Cloud Computing Valuation Model
Cloud Data: Cloud Computing Data & Market Share & Valuation Model
Metric: Data Volume Growth CAGR
Assumption: 26.9%
Notes: IDC Analysis predicts 175 Zettabytes of data created on the internet in 2025, giving a CAGR of ~27%
Links: https://www.seagate.com/files/www-content/our-story/trends/fi...

Data Volume Growth

IDC estimates that 175 Zettabytes of data will be created in 2025, which represents a CAGR of ~27%.
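As a rough check on that growth rate, here is a minimal sketch of the CAGR arithmetic. It assumes a baseline of 33 Zettabytes created in 2018, which is the starting point we understand the IDC report to use; that baseline is not stated in this article, so treat it as an assumption.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Assumption: 33 ZB created in 2018 (baseline not stated in this article).
BASELINE_ZB_2018 = 33.0
# From the article: 175 ZB created in 2025.
TARGET_ZB_2025 = 175.0

growth = cagr(BASELINE_ZB_2018, TARGET_ZB_2025, years=2025 - 2018)
print(f"Implied data volume CAGR: {growth:.1%}")
```

With a different baseline year or volume the implied rate shifts, so the 27% figure is only as good as the starting point.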

Percent of Data Retained

Of all of the data created, how much is actually retained and persisted?  In this model we estimate 2% (declining to 1.5%), a figure we backed into from past cloud performance and storage estimates.

Percent of Data Stored in Public Clouds

The share of data stored in the public cloud is expected to continue to rise, from an estimated 42% to 54% over the next five years.

Data Storage Costs

AWS/GCP/Azure have similar pricing on cloud storage.  For the model purposes, we use $240/TB/year for hot storage.

We estimate that 3.1% of data stored will be in ‘hot storage’ or more readily available.  The remainder will require less frequent access patterns, putting costs at $120/TB/year.

We also estimate that storage costs will drop 5% per year.

Lastly, we assume an overall discount of 50% off of list prices to account for aggregated enterprise deals and other long term discounts from the major cloud providers.
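The pricing assumptions above can be combined into one effective price per TB. This is a minimal sketch, not the linked spreadsheet's exact logic: the function name, the order in which the adjustments are applied, and treating the first model year as year 0 are all assumptions made for illustration.

```python
HOT_PRICE_PER_TB_YEAR = 240.0    # from the article: hot storage, $/TB/year
COLD_PRICE_PER_TB_YEAR = 120.0   # from the article: less frequently accessed storage
HOT_SHARE = 0.031                # from the article: 3.1% of stored data is hot
ANNUAL_PRICE_DECLINE = 0.05      # from the article: prices drop 5% per year
ENTERPRISE_DISCOUNT = 0.50       # from the article: 50% off list prices


def effective_price_per_tb(years_from_base: int) -> float:
    """Blended, discounted $/TB/year after a number of years of price declines.

    Assumption: year 0 is the first model year and carries no price decline.
    """
    blended_list_price = (
        HOT_SHARE * HOT_PRICE_PER_TB_YEAR
        + (1 - HOT_SHARE) * COLD_PRICE_PER_TB_YEAR
    )
    declined = blended_list_price * (1 - ANNUAL_PRICE_DECLINE) ** years_from_base
    return declined * (1 - ENTERPRISE_DISCOUNT)


for year in range(5):
    print(f"Year {year}: ${effective_price_per_tb(year):.2f} per TB per year")
```

Because hot storage is such a small share, the blended price sits very close to the cold storage price, so the hot/cold split matters far less to the result than the discount and the yearly decline.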

Data Storage Revenues

Using all of these variables we can estimate data storage revenue per cloud provider by multiplying:

Data Storage Revenues =
Data Created
x % Data Retained
x % Data Stored in the Cloud
x Data Storage Costs
x % Market Share of Cloud Provider
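The multiplication above can be sketched as a small function. The function and parameter names are hypothetical, the sketch treats the formula as a single-year product exactly as written, and the price and market share in the usage example are placeholders rather than figures from the model.

```python
TB_PER_ZB = 1e9  # 1 Zettabyte = 1,000,000,000 Terabytes (decimal units)


def storage_revenue_usd(
    data_created_zb: float,
    pct_retained: float,
    pct_in_cloud: float,
    price_per_tb_year: float,
    market_share: float,
) -> float:
    """Estimated yearly storage revenue in USD for one cloud provider."""
    stored_in_cloud_tb = data_created_zb * TB_PER_ZB * pct_retained * pct_in_cloud
    return stored_in_cloud_tb * price_per_tb_year * market_share


# Illustrative inputs: 175 ZB, 1.5% retained, and 54% in the cloud come from the
# article; the $50/TB price and 40% market share are placeholders only.
example = storage_revenue_usd(
    data_created_zb=175.0,
    pct_retained=0.015,
    pct_in_cloud=0.54,
    price_per_tb_year=50.0,
    market_share=0.40,
)
print(f"Illustrative storage revenue: ${example / 1e9:.1f} Billion")
```

Every term is a straight multiplier, so a 10% error in any single assumption moves the revenue estimate by the same 10%.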

Compute to Storage Revenue Ratio

Using estimates from the Snowflake pricing page, we estimate a ratio of 5.5 for compute to storage - meaning that for any data stored, the compute revenue (any combination of services) will be 5.5x that amount.  This could go up slightly as machine learning becomes more prevalent in future years - offset by the decreases in training costs, etc.
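To make the ratio concrete, here is a minimal sketch of how it scales a storage estimate. The function name is hypothetical, and adding storage and compute together to get a total is our reading of the ratio, not something taken from the linked spreadsheet.

```python
COMPUTE_TO_STORAGE_RATIO = 5.5  # from the article: compute revenue per $1 of storage revenue


def revenue_breakdown(storage_revenue: float, ratio: float = COMPUTE_TO_STORAGE_RATIO) -> dict:
    """Split an estimate into storage, compute, and total revenue.

    Assumption: total revenue is simply storage plus compute.
    """
    compute_revenue = storage_revenue * ratio
    return {
        "storage": storage_revenue,
        "compute": compute_revenue,
        "total": storage_revenue + compute_revenue,
    }


# Placeholder input: $10 Billion of storage revenue, for illustration only.
breakdown = revenue_breakdown(10.0)
print(breakdown)
```

Under this reading, storage is only about 15% of total revenue, so the projections are highly sensitive to the ratio itself.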

Putting all of these variables together we can project revenues for each major cloud provider.  A quick comparison shows that the DCF projections and this data model's projections are in line, so this model offers one more data point confirming the growth potential.

If we refer back to the AWS DCF Model, we can see that this translates into a $750 Billion Valuation for AWS Specifically.

Amazon Valuation Model
Amazon DCF Summary - Amazon Unlevered Free Cash Flows
Model Summary
Model Valuation Date: 12/31/2021
Amazon Revenue 5 Year CAGR: 15.5%
Amazon FCF 5 Year CAGR: 34.7%
Discount Rate (WACC): 8.0%
Terminal Value Long Term Growth Rate: 4.0%
Terminal Value EBITDA Multiple: 18
Equity Value Per Share (Growth in...

Where Does this Model Fall Short?

This is just one data model for predicting cloud valuations with broad assumptions and high level estimates - any more detail throughout would lend further credibility to the valuations.  

It would be incredible to see some of the real data and usage patterns aggregated by AWS/Azure/GCP across services, customers, and regions.  This presentation from Peter DeSantis at re:Invent 2021 offers just a glimpse into what an aggregated form of this data looks like - and the cloud companies have an incredible pool of data and insight into our digital world.

AWS re:Invent 2021 with Peter DeSantis

International Growth Specifics

Data creation and compute requirements are clustered by region and will see various growth rates.  IDC offers insight into China’s growing share of the world's data - could this skew future growth projections and pose challenges for US-based companies?

China’s Datasphere is expected to grow 30% on average over the next 7 years and will be the largest Datasphere of all regions by 2025

Application Specific Data?

The ~5 to 5.5x multiplier of compute to storage is based on just one example and some historical numbers.  This could vary by workload and application, and the aggregate could change.

Specific areas of high data volume growth, like self-driving cars, could shift growth away from the public cloud to the edge or to enterprise data centers.

New Cloud Competition?

Competition is fierce in cloud.  WSJ reported that Google was offering investments in companies to help win deals from Azure and AWS.  Can HPE, Oracle, or Alibaba gain market share?  There are only a handful of companies that can keep pace with the Capex needed to grow a $100 Billion business at a 30% rate.

Can Decentralized Technologies Take Meaningful Market Share?

It is still very early for blockchain-based technologies.  Can they continue their growth to become a dominant form of storage or compute?

The Filecoin network stored 25 Petabytes of data in 2021, up from just 1.5 Petabytes a year earlier.  This is not yet a comparison point for the major cloud providers - but if the Filecoin network can 10x each year for the next 5 years, it could store 2.5 Zettabytes by 2027, which would be meaningful market share.
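The Filecoin arithmetic is easy to check. This is a minimal sketch with a hypothetical helper name, using decimal units (1 Zettabyte = 1,000,000 Petabytes) and the 25 and 1.5 Petabyte figures quoted above.

```python
PB_PER_ZB = 1_000_000  # 1 Zettabyte = 1,000,000 Petabytes (decimal units)


def project_storage_pb(start_pb: float, yearly_multiple: float, years: int) -> float:
    """Compound a starting storage volume by a fixed yearly multiple."""
    return start_pb * yearly_multiple ** years


# From the article: 25 PB stored in 2021, growing 10x per year for 5 years.
projected_pb = project_storage_pb(25, 10, 5)
projected_zb = projected_pb / PB_PER_ZB
print(f"Projected Filecoin storage: {projected_zb} ZB")

# For comparison, the growth multiple actually observed in 2021 (1.5 PB -> 25 PB).
observed_multiple = 25 / 1.5
print(f"Observed 2021 growth multiple: {observed_multiple:.1f}x")
```

The observed multiple for 2021 was higher than 10x, but sustaining anything close to that for five consecutive years is a very aggressive assumption.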

Filecoin in 2021: Looking Back at a Year of Exponential Growth


This model offers one more point of validation as to the anticipated 20-40% YoY growth the major cloud providers could see in the next five years.  Some additional macro factors are outlined here.

Networking and Internet Fundamentals - Connecting Every Computer
Because of the internet, we are already in the metaverse. The internet is a massive infrastructure and set of protocols coordinated and decentralized among private corporations, governments, and individuals.

For Amazon and AWS specifically this article (and video) goes into further detail and has a full DCF model with revenue estimates and additional drivers.

An EBITDA multiple of 18x, a 5 year revenue CAGR of 24%, and a 5 year FCF CAGR of 19% put AWS’s valuation alone at ~$750 Billion.

It will be interesting to see how all of this plays out in the next few years, and I welcome any thoughts, ideas, or feedback on the topic.

If you enjoy this kind of content, please subscribe to the Exponential Layers newsletter and YouTube channel to get all updates on new videos and articles.
