The artificial intelligence revolution has created unprecedented demand for specialized computing hardware, transforming the semiconductor supply chain in ways few predicted. What began as an acute shortage of graphics processing units has evolved into a complex landscape where scarcity persists in some areas while potential oversupply looms in others. This investigation examines the forces reshaping AI hardware supply chains, from NVIDIA’s continued dominance to emerging competitors and the strategic implications for model training costs.
The Persistent Nature of AI Chip Constraints
Beyond Simple Scarcity: Understanding the Bottlenecks
The narrative of GPU shortage has dominated headlines since the generative AI boom of 2023, but the reality is more nuanced than simple supply and demand imbalances. While front-end wafer fabrication has largely caught up with demand, bottlenecks have migrated to less visible parts of the supply chain, creating what industry insiders call “hidden shortages.”
TSMC Chairman Mark Liu acknowledged this shift bluntly in 2024: “It is not the shortage of AI chips, it is the shortage of our packaging capacity.” This statement reveals the true nature of the constraint. Advanced packaging, once considered a low-profile, low-margin afterthought, has become the critical chokepoint determining how many AI accelerators reach the market.
The numbers tell a stark story. TSMC’s CoWoS (Chip-on-Wafer-on-Substrate) packaging capacity, essential for stacking high-bandwidth memory alongside AI processors, saw demand triple year-over-year by late 2023. By mid-2024, TSMC revealed that its advanced packaging capacity for 2024 and 2025 was fully booked by just two clients: NVIDIA and AMD. Current average delivery times for chips requiring advanced packaging stand at approximately 40 weeks, though TSMC expects to cut this in half by expanding capacity.
High-Bandwidth Memory: The Other Critical Constraint
Parallel to packaging constraints, high-bandwidth memory has emerged as another fundamental bottleneck. HBM3, the memory standard powering data-hungry AI accelerators, sits at the center of a supply crunch. SK Hynix, Samsung, and Micron operate near full capacity, yet report lead times of six to twelve months. In some cases, when coupled with specialized packaging constraints, lead times extend even further.
The pricing impact has been significant. HBM3 pricing has risen 20% to 30% year-over-year, a trend expected to persist throughout 2025 as demand continues to outpace capacity expansion. This memory will account for more than 20% of the total DRAM market value starting in 2024, potentially exceeding 30% by 2025, according to TrendForce.
The Power and Infrastructure Challenge
Perhaps the most unexpected constraint has emerged not from silicon manufacturing but from basic infrastructure. Microsoft CEO Satya Nadella revealed in November 2025 that the company faces a paradoxical problem: “I don’t have warm shells to plug into. In fact, that is my problem today. It’s not a supply issue of chips; it’s actually the fact that I don’t have warm shells to plug into.”
This statement underscores a fundamental shift in the nature of AI infrastructure constraints. “Warm shells” refers to data center facilities with all necessary ingredients, particularly power and cooling, ready for immediate deployment. As data centers scale to unprecedented sizes driven by AI demands, the bottleneck is increasingly about gigawatts of electricity rather than teraflops of compute.
Existing data centers typically range from 50 to 200 megawatts, while the AI explosion is driving construction of facilities with over a gigawatt of capacity. These massive installations, which could cost $10 billion to $25 billion five years from now compared with $1 billion to $4 billion today, require multi-year planning and construction timelines that cannot be accelerated to match chip production schedules.
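To make the scale concrete, a back-of-envelope calculation shows why gigawatt-class campuses change the planning problem. The sketch below assumes roughly 1 kW per accelerator, 50% additional IT power for host CPUs, networking, and storage, and a PUE of 1.3; all inputs are illustrative assumptions rather than measured figures.

```python
# Rough sizing: how many accelerators a facility's power budget can support.
def chips_per_facility(facility_mw: float, chip_watts: float,
                       overhead_fraction: float = 0.5, pue: float = 1.3) -> int:
    """Estimate accelerator count for a given facility power budget.

    overhead_fraction: extra IT power per chip (host CPUs, networking, storage).
    pue: power usage effectiveness (total facility power / IT power).
    """
    watts_per_chip = chip_watts * (1 + overhead_fraction) * pue
    return int(facility_mw * 1e6 / watts_per_chip)

# A traditional 100 MW hall versus a 1 GW AI campus, assuming 1 kW-class chips.
for mw in (100, 1000):
    print(f"{mw:>5} MW -> ~{chips_per_facility(mw, chip_watts=1000):,} accelerators")
# 100 MW -> ~51,282    1000 MW -> ~512,820
```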
NVIDIA’s Continued Market Dominance
The Scale of NVIDIA’s Position
NVIDIA’s dominance in AI accelerators has reached levels rarely seen in technology markets. Mizuho Securities estimates NVIDIA controls between 70% and 95% of the market for AI chips used for training and deploying models. The company’s data center revenue reached $30.7 billion in Q3 2024, representing approximately 87.7% of its total revenue of $35 billion, with net income surging 109% year-over-year to $19.3 billion.
The 2023 data center GPU market totaled $17.7 billion, with NVIDIA accounting for 65% market share, according to TechInsights. Intel held 22% and AMD 11%, with remaining companies accounting for less than 3%. Looking forward, Morgan Stanley projects that NVIDIA will consume 77% of wafers used for AI processors in 2025, up from 51% in 2024. This represents 535,000 300-millimeter wafers dedicated solely to NVIDIA’s AI products.
Perhaps most striking is NVIDIA’s gross margin of 78%, a stunningly high number for a hardware company that must manufacture and ship physical products. For comparison, rivals Intel and AMD reported gross margins in the latest quarter of 41% and 47%, respectively. This pricing power reflects not just market dominance but the depth of NVIDIA’s competitive moat.
The CUDA Ecosystem Advantage
NVIDIA’s market position rests on more than hardware performance. The CUDA software ecosystem, developed over nearly two decades, has become the de facto standard for AI development. Approximately 3.5 million AI developers worldwide code with CUDA. This creates formidable switching costs for any organization considering alternatives.
The ecosystem extends beyond software frameworks. NVIDIA has invested heavily in the broader AI landscape, with $47 billion worth of venture capital investments in AI companies from 2020 through September 2024, according to Pitchbook data. This strategic positioning creates network effects where NVIDIA chips become more valuable as more developers use them, which attracts more developers, and so on.
The Blackwell Architecture and Future Pipeline
NVIDIA’s product roadmap demonstrates the company’s commitment to maintaining its lead through continuous innovation. The Blackwell architecture, shipping in 2024 and 2025, delivers substantial performance improvements over the Hopper generation. Even the H200, Hopper’s memory-enhanced refresh, achieves up to 4.2 times speedup on large language model inference tasks compared to the H100, and Blackwell extends those gains further.
However, Blackwell’s launch has not been without challenges. Packaging remains a bottleneck, with NVIDIA reportedly selling its newest chips as fast as TSMC can assemble them. The company’s CFO has indicated supply constraints through 2025, though production is being continuously scaled up.
Looking beyond Blackwell, NVIDIA has announced plans to accelerate its product cycles from biennial updates to annual releases. This strategy, described by CEO Jensen Huang as necessary given growing competition, aims to maintain technological leadership even as rivals close the performance gap.
Geopolitical and Regulatory Pressures
NVIDIA’s dominance has attracted regulatory scrutiny. In mid-2024, U.S. regulators opened probes into the conduct of dominant AI firms including NVIDIA. Areas of inquiry include whether NVIDIA favors certain customers, its pricing strategies, and its ecosystem control. CUDA’s closed nature has been a particular point of concern for open-source advocates.
Export controls add another layer of complexity. U.S. restrictions on high-end semiconductor exports to China have forced NVIDIA to develop specialized versions of its products for that market, sacrificing some performance to meet regulatory requirements. These restrictions simultaneously protect NVIDIA’s U.S. market position while limiting its addressable market globally.
AMD and Intel: The Pursuit of Market Share
AMD’s Aggressive Challenge
Advanced Micro Devices has emerged as the most credible challenger to NVIDIA’s dominance in AI accelerators. The company’s Instinct MI300 series, launched in December 2023, has gained significant traction with major customers including Microsoft, Meta, and Oracle.
The MI300X boasts impressive specifications: 192GB of HBM3 memory with bandwidth of 5.3 TB/s, significantly exceeding NVIDIA’s H100 specifications of 80GB memory. This larger memory capacity makes the MI300X particularly well-suited for large language models and high-throughput inference tasks. AMD claims the MI300X delivers 1.6 times the performance for inference on specific LLMs compared to NVIDIA’s H100.
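The practical consequence of memory capacity is the size of model a single accelerator can serve without sharding across devices. The sketch below is a weights-only estimate; real deployments also need room for the KV cache and activations, so usable limits are lower.

```python
# Weights-only estimate of the largest model that fits in one accelerator's HBM.
def max_params_billions(hbm_gb: float, bytes_per_param: float) -> float:
    """GB of HBM divided by bytes per parameter, expressed in billions."""
    return hbm_gb / bytes_per_param  # (GB * 1e9 bytes) / bytes-per-param / 1e9 params

for name, gb in (("H100", 80), ("MI300X", 192)):
    for fmt, nbytes in (("FP16", 2), ("FP8", 1)):
        print(f"{name} at {fmt}: ~{max_params_billions(gb, nbytes):.0f}B parameters")
# H100: ~40B (FP16) / ~80B (FP8); MI300X: ~96B (FP16) / ~192B (FP8)
```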
The financial trajectory tells the story of AMD’s momentum. Wells Fargo forecasts AMD’s AI chip revenue surging from $461 million in 2023 to $2.1 billion in 2024, targeting a 4.2% market share. AMD CEO Lisa Su revealed that the MI300 series exceeded $1 billion in sales in less than two quarters, with data center GPU revenue for 2024 predicted to exceed $4.5 billion, higher than the previously estimated $3.5 billion.
AMD has outlined an aggressive annual release cycle. The MI325X, announced for Q4 2024 with shipments in Q1 2025, features enhanced memory of 256GB HBM3E with 6 TB/s bandwidth. The MI350 series, including the MI355X chip released in June 2025, delivers four times the performance of the MI300X. The MI400 series is already on the roadmap, demonstrating AMD’s commitment to sustained competition.
ROCm: Addressing the Software Gap
AMD’s biggest challenge remains software. While its hardware increasingly matches or exceeds NVIDIA’s specifications, the ROCm software ecosystem lags CUDA in maturity and ease of use: where CUDA works out of the box for most tasks, AMD’s stack often requires significant configuration.
However, ROCm includes one critical feature: the Heterogeneous-compute Interface for Portability (HIP), which allows developers to port CUDA applications to AMD GPUs with minimal code changes. The latest version, ROCm 7, introduced in 2025, brings significant performance boosts, distributed inference capabilities, and expanded support across various platforms, making it a more mature and viable commercial alternative.
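HIP’s portability promise is easiest to see from Python. PyTorch’s ROCm builds deliberately reuse the torch.cuda namespace, so framework-level “CUDA” code typically runs unmodified on AMD GPUs, while hand-written CUDA C++ kernels are ported with the hipify tools. A minimal sketch:

```python
# Framework-level code paths are identical on NVIDIA and AMD hardware:
# PyTorch's ROCm build maps the torch.cuda API onto HIP.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # "cuda" means HIP on ROCm builds
backend = "ROCm/HIP" if getattr(torch.version, "hip", None) else "CUDA"
print(f"Running on {device} ({backend} build)")

x = torch.randn(4096, 4096, device=device)
y = x @ x.T  # same call on either vendor's GPU
print(y.shape)
```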
AMD has also invested strategically in companies deploying its accelerators to drive adoption. Like NVIDIA, the company recognizes that ecosystem development requires more than just publishing APIs; it requires hands-on support and partnership with key customers.
Intel’s Cost Leadership Strategy
Intel approaches the AI accelerator market from a different angle than AMD. Rather than directly matching NVIDIA’s highest-performance offerings, Intel positions its Gaudi AI chips as cost-effective alternatives for enterprises seeking efficiency over absolute performance.
The Gaudi 3, built on a 5nm process, features 128GB of HBM2e memory and claims to be significantly faster (up to 1.7 times in training and 1.3 times in inference) and more power-efficient (40% better) than NVIDIA’s H100. Most importantly, Intel offers an AI kit with eight Gaudi 3 chips for approximately $125,000, roughly two-thirds the cost of comparable NVIDIA platforms.
Gaudi 3’s architecture includes integrated networking for scaling multiple accelerators, giving it an edge for cost-sensitive training at scale. Cloud providers could use many Gaudi 3 accelerators to train models at lower costs than equivalent H100 clusters, even if each Gaudi 3 is slightly slower than its NVIDIA counterpart.
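Translated into rough numbers, the value proposition looks like this. The sketch uses the $125,000 kit price from Intel’s announcement; the NVIDIA platform price and the relative per-chip training speed are assumptions chosen only to illustrate the cost-per-throughput comparison, not measured benchmarks.

```python
# Illustrative cost per unit of training throughput for an 8-chip system.
def cost_per_throughput(system_price: float, chips: int, rel_speed: float) -> float:
    """Dollars per H100-equivalent of aggregate training throughput."""
    return system_price / (chips * rel_speed)

gaudi3 = cost_per_throughput(125_000, chips=8, rel_speed=0.9)  # assume ~0.9x an H100 per chip
h100   = cost_per_throughput(190_000, chips=8, rel_speed=1.0)  # assumed platform price

print(f"Gaudi 3 kit:   ${gaudi3:,.0f} per H100-equivalent")    # ~ $17,361
print(f"H100 platform: ${h100:,.0f} per H100-equivalent")      # ~ $23,750
```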
However, Intel faces significant headwinds. Its sales guidance for Gaudi 3 was approximately $500 million for 2024, substantially lower than the billions AMD projects. The company has also faced internal turmoil, with CEO Pat Gelsinger’s departure in December 2024 creating uncertainty about Intel’s strategy in the AI and foundry markets.
Intel’s longer-term roadmap included Falcon Shores, a next-generation data center GPU aimed at AI and high-performance computing, scheduled for 2025. However, the company has since pivoted strategy, canceling Falcon Shores and focusing less on standalone AI accelerators. The Jaguar Shores chip, successor to Gaudi 3, remains planned for 2026, but Intel’s strategic direction continues to evolve.
Custom-Designed Chips: The Hyperscaler Revolution
The Economic Imperative for Custom Silicon
The most significant long-term threat to NVIDIA’s dominance may come not from traditional chip manufacturers but from its own biggest customers. Hyperscale cloud providers (Google, Amazon, Microsoft, and Meta) are investing billions in custom AI silicon optimized for their specific workloads.
The economics are compelling. Meta reported that using its custom chip MTIA reduces total cost of ownership by 44% compared to GPUs. These savings encompass not just chip prices but electricity, cooling costs, operating expenses, and data center space. When operating tens to hundreds of data centers globally, such efficiency gains translate to billions in cost reductions.
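A simplified version of the TCO arithmetic behind such claims is sketched below; every input (chip prices, power draw, electricity rate, per-chip operating costs) is an illustrative placeholder, not Meta’s actual figures.

```python
# Toy total-cost-of-ownership model: amortized capex plus energy plus opex.
def annual_tco(chip_price: float, lifetime_years: float, watts: float,
               usd_per_kwh: float, pue: float, opex: float) -> float:
    capex = chip_price / lifetime_years                    # straight-line amortization
    energy = watts / 1000 * 24 * 365 * pue * usd_per_kwh   # power incl. cooling overhead
    return capex + energy + opex

gpu    = annual_tco(30_000, 4, watts=700, usd_per_kwh=0.08, pue=1.3, opex=1_500)
custom = annual_tco(15_000, 4, watts=450, usd_per_kwh=0.08, pue=1.3, opex=1_200)
print(f"GPU:    ${gpu:,.0f} per chip-year")
print(f"Custom: ${custom:,.0f} per chip-year ({1 - custom/gpu:.0%} lower)")  # ~44% lower
```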
JPMorgan projected in June 2024 that custom chips designed by companies like Google, Amazon, Meta, and OpenAI will account for 45% of the AI chip market by 2028, up from 37% in 2024 and 40% in 2025. This shift represents a fundamental restructuring of the market, with the largest customers becoming competitors to their primary supplier.
Google TPU: The Pioneer and Current Leader
Google pioneered custom AI silicon with its Tensor Processing Unit, first deployed internally in 2015. The company has since released seven generations, with the latest, Ironwood (TPU v7), announced in late 2025.
TPU v7’s specifications are formidable: 4,600 teraflops (trillion floating-point operations per second) with 144GB of HBM3 memory. This performance roughly matches NVIDIA’s B200, indicating that Google has closed the gap with the industry leader using its own custom silicon.
Google relies heavily on TPUs for its own services, processing search queries, YouTube recommendations, and Google Photos analysis, and training its Gemini AI models. The company also makes TPUs available through Google Cloud Platform, though the hardware itself has remained exclusive to Google’s own infrastructure rather than being sold outright.
Recent developments suggest Google may be changing strategy. Reports in September 2024 indicated Google began physically selling TPU hardware to external cloud providers, directly competing with NVIDIA in the merchant market. DA Davidson analyst Gil Luria estimated Google’s TPU business, combined with its DeepMind AI segment, to be worth $900 billion, “arguably one of Alphabet’s most valuable businesses.”
Google’s advantage extends beyond hardware. Its XLA compiler and JAX framework are optimized specifically for TPU architecture, creating vertical integration advantages. Furthermore, Google’s decade-long head start (compared to other hyperscalers who began developing custom silicon in 2018-2019) shows in software stack maturity.
Pricing for TPUs has become increasingly aggressive. Google Cloud TPU v6e committed-use discounts go as low as $0.39 per chip-hour, cheaper than spot H100s in most regions when factoring in egress and networking costs. At scale, power efficiency creates additional advantage: TPU v6 consumes 300W compared to the H100’s 700W and the B200’s 1,000W. Across a fleet of 100,000-plus chips, that 2.3 to 3.3 times power difference amounts to tens of megawatts of continuous load, hundreds of gigawatt-hours per year.
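The fleet-level arithmetic, using the chip power figures quoted above and an assumed fleet of 100,000 chips running continuously:

```python
# Continuous power and annual energy for a 100,000-chip fleet at each chip's TDP.
FLEET = 100_000
HOURS_PER_YEAR = 24 * 365

for name, watts in (("TPU v6", 300), ("H100", 700), ("B200", 1000)):
    mw = FLEET * watts / 1e6          # megawatts of continuous load
    gwh = mw * HOURS_PER_YEAR / 1000  # gigawatt-hours per year
    print(f"{name:>7}: {mw:5.0f} MW continuous, ~{gwh:,.0f} GWh/year")
# Swapping 100k B200-class chips for 300 W parts saves ~70 MW (~613 GWh/year),
# roughly the dedicated output of a mid-size power plant.
```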
AWS Trainium and Inferentia: The Dual-Purpose Approach
Amazon Web Services took a different architectural approach, developing separate chips for training (Trainium) and inference (Inferentia). After acquiring Israeli chip startup Annapurna Labs in 2015, AWS announced Inferentia in 2018 and launched Trainium in 2022.
Trainium2, generally available in November 2024, delivers up to four times the performance of the first generation, offering 30% to 40% better price-performance than current-generation GPU-based instances according to AWS. The architecture uses custom compute chiplets with four HBM3E stacks, delivering 20.8 petaFLOPS of FP8 compute per 16-chip instance.
AWS has committed massive resources to Trainium deployment. Project Rainier, an AI supercomputer cluster in Indiana, contains approximately 500,000 Trainium2 chips. Anthropic, AWS’s strategic AI partner and $8 billion investment recipient, expects to scale to more than 1 million Trainium2 chips on AWS by the end of 2025. This represents one of the largest custom silicon deployments in history.
Trainium3, announced in December 2024 at AWS re:Invent, promises further performance improvements. The speed of AWS’s chip development cycle, particularly compared to traditional semiconductor companies, demonstrates the company’s commitment to this strategy.
Ron Diamant, Trainium’s head architect, states that Amazon’s ASIC achieves 30% to 40% better price-performance than other vendors’ hardware available in AWS, and that Trainium chips can serve both inference and training workloads effectively.
Microsoft Maia and Meta MTIA: Later Entrants
Microsoft unveiled Maia 100, its first custom AI accelerator designed to optimize large-scale AI workloads in Azure, at Hot Chips 2024. Built on TSMC’s N5 process with advanced memory and interconnect technology, Maia 100 targets high throughput and diverse data formats.
However, Microsoft faces challenges. The company’s next-generation AI chip, code-named Braga, faces delays from 2025 to 2026 due to design changes, staffing constraints, and high turnover. This could result in the chip lagging behind NVIDIA’s Blackwell in power efficiency. Microsoft’s chip development, which began in 2019, started four to six years behind Google, and this gap shows in software stack maturity.
Meta’s custom chip, MTIA (Meta Training and Inference Accelerator), is optimized specifically for Meta’s ad ranking and content recommendation systems. While competitive with TPU v5 on recommendation workloads, for general LLM inference it remains 30% to 40% less efficient than TPU v6. Reports in November 2025 indicated Meta is negotiating to purchase billions of dollars worth of Google TPUs, suggesting Meta’s internal chip cannot yet meet all its needs.
Meta acquired AI chip startup Rivos in late September 2025, gaining expertise in RISC-V-based AI inferencing chips, with commercial releases targeted for 2026. This acquisition indicates Meta’s long-term commitment despite current limitations.
OpenAI’s Chip Ambitions
OpenAI, though not a hyperscaler itself, is finalizing the design of its first AI chip with Broadcom and TSMC, using TSMC’s 3-nanometer technology. The chip team’s leadership includes veterans who designed TPUs at Google, bringing valuable expertise to the project, and the company aims to have the chip mass-produced in 2026.
This development, if successful, would give OpenAI greater control over its cost structure and reduce dependence on NVIDIA supply chains. However, the challenges of chip development, particularly for a company without semiconductor heritage, should not be underestimated.
China’s Domestic Semiconductor Push
Strategic Imperative and Export Control Impact
U.S. export controls on advanced semiconductors to China, tightened progressively since 2022, have created both a crisis and an opportunity for China’s semiconductor industry. Unable to access the most advanced NVIDIA GPUs or the latest manufacturing equipment from ASML and Applied Materials, China has accelerated development of domestic alternatives.
The restrictions have forced Chinese AI companies to stockpile older-generation chips while domestic producers work to close the technology gap. NVIDIA developed specialized versions of its products for the Chinese market, such as the A800 and H800, which meet export control requirements by limiting chip-to-chip communication speeds. However, even these chips face procurement challenges as controls tighten.
Domestic Chip Development Progress
Chinese semiconductor companies have made notable progress despite constraints. SMIC (Semiconductor Manufacturing International Corporation), China’s largest foundry, has reportedly achieved production on a 7nm-equivalent process node, though this remains significantly behind TSMC’s 3nm and upcoming 2nm processes.
Huawei, once reliant on external suppliers, has developed its Ascend series of AI processors. The Ascend 910B, announced in 2023, aims to compete with NVIDIA’s A100, though independent benchmarks suggest it remains behind in performance and power efficiency.
Other Chinese players include Biren Technology, a GPU designer that has contended with U.S. entity-list restrictions and funding pressures; Cambricon Technologies, focusing on edge AI chips; and numerous smaller startups backed by government incentives.
The Memory and Packaging Challenge
China’s semiconductor ecosystem faces particular challenges in advanced packaging and high-bandwidth memory, the same bottlenecks affecting the global industry. Without access to the most advanced packaging equipment and processes, Chinese chip designers struggle to match the integrated performance of chips produced using TSMC’s CoWoS or similar technologies.
Similarly, while China has domestic DRAM and NAND flash producers (primarily CXMT for DRAM and YMTC for NAND), producing high-bandwidth memory meeting HBM3 specifications remains elusive. This gap limits the practical performance of any AI accelerator China develops.
The packaging challenge is particularly acute because it requires not just equipment but also process knowledge accumulated over years of production. Even with access to packaging equipment (which Chinese companies increasingly develop domestically), achieving the yields and reliability standards needed for high-performance AI chips requires extensive trial and error.
Memory production faces similar challenges. While Chinese manufacturers can produce DRAM using older process technologies, HBM requires stacking multiple dies with through-silicon vias, maintaining thermal management across the stack, and achieving bandwidth specifications that push the limits of current technology. These capabilities require years of development investment and close collaboration with chip designers, relationships that export controls disrupt.
State Support and Investment Scale
China has responded to semiconductor constraints with unprecedented government support. The National Integrated Circuit Industry Investment Fund (the “Big Fund”), in its third iteration, has allocated tens of billions of dollars to semiconductor development. Provincial and municipal governments provide additional incentives including subsidized land, utilities, and tax benefits.
This support creates a different competitive dynamic than market-driven Western chip development. Chinese companies can sustain losses that would bankrupt Western startups, pursuing long-term capability development with patient capital. However, state support also creates risk of inefficiency, with multiple companies pursuing parallel development paths without market discipline forcing consolidation.
The talent challenge presents another dimension. While China produces more semiconductor engineers than any other country, the most experienced talent often trained at Western companies or universities. Export controls limiting Chinese nationals’ access to advanced chip design tools and Western research collaborations constrain talent development, though this effect plays out over years rather than immediately.
Alternative Approaches and Workarounds
Faced with constraints on the most advanced nodes, Chinese companies are exploring alternative approaches. Some focus on optimizing architectures for older nodes, achieving acceptable performance through clever design rather than leading-edge manufacturing. Others are investing in novel computing paradigms including analog computing, neuromorphic computing, and optical computing, though these remain in early research stages.
Chinese AI companies have also adapted training approaches to work with available hardware. Techniques like model quantization, sparsity, and distributed training across larger numbers of less powerful chips help overcome individual chip limitations. While less efficient than training on cutting-edge GPUs, these approaches demonstrate the industry’s adaptability to constraints.
Long-Term Implications
The bifurcation of the AI chip market into U.S.-aligned and Chinese ecosystems has profound implications. In the near term, restrictions slow China’s AI development, providing strategic advantage to U.S. technology leadership. However, restrictions also create strong incentives for China to develop fully domestic supply chains.
If successful, this could result in parallel, incompatible technology ecosystems. Chinese AI models would be trained on Chinese chips using Chinese frameworks and software, potentially creating a significant divergence in AI development paths. The economic implications extend beyond semiconductors to the entire AI industry.
Chinese chip development focuses heavily on inference rather than training, recognizing that inference workloads are more tolerant of chip limitations and represent the larger market opportunity as AI deployment scales. This strategic focus could allow China to compete effectively in AI deployment even if its training capabilities lag frontier Western systems.
Some analysts argue that export controls, while slowing Chinese development, may ultimately prove counterproductive by forcing China to build capabilities it would otherwise have outsourced. The timeframe for China to achieve parity remains highly uncertain, with estimates ranging from five to fifteen years depending on assumptions about technology access and investment levels.
The geopolitical implications extend beyond technology. As China develops indigenous capability, it gains influence over countries and companies in its sphere. Nations facing Western export restrictions may turn to Chinese alternatives, creating parallel technology ecosystems with different standards, different security models, and potentially different values embedded in AI systems.
Implications for Model Training Costs
The Historical Cost Trajectory
AI model training costs have followed a dramatic trajectory over the past decade. Training GPT-3 in 2020 cost approximately $4.6 million in compute resources. GPT-4, released in 2023, reportedly required $78 million in hardware costs alone, according to industry estimates, though OpenAI has not confirmed exact figures.
Google’s Gemini Ultra, the most expensive model trained as of 2024, cost $191 million. These escalating costs reflect both the increasing size of models (from billions to trillions of parameters) and the hardware required to train them at reasonable timescales.
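These headline figures follow from a standard back-of-envelope model: dense transformer training takes roughly 6 FLOPs per parameter per token, divided by sustained hardware throughput and multiplied by an hourly rate. The sketch below reproduces the GPT-3 estimate; the throughput, utilization, and price inputs are assumptions typical of V100-era hardware.

```python
# Back-of-envelope training cost: (6 * params * tokens) FLOPs at sustained throughput.
def training_cost(params: float, tokens: float, peak_flops: float,
                  utilization: float, usd_per_chip_hour: float) -> float:
    total_flops = 6 * params * tokens  # dense-transformer rule of thumb
    chip_hours = total_flops / (peak_flops * utilization) / 3600
    return chip_hours * usd_per_chip_hour

# GPT-3 scale: 175B params, 300B tokens, 125 TFLOPS peak FP16, ~30% utilization, ~$2/hr.
print(f"${training_cost(175e9, 300e9, 125e12, 0.30, 2.0):,.0f}")  # ~$4.7M, near the $4.6M estimate
```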
The Supply Constraint Impact on Costs
GPU shortages and extended lead times have created a premium market for AI compute capacity. H100 cards, with list prices ranging from $25,000 to $40,000, have traded on secondary markets for considerably higher amounts during supply crunches. Some AI startups report paying 50% to 100% premiums over list price to secure immediate capacity.
Cloud compute costs have risen correspondingly. Major cloud providers allocate capacity based on long-term contracts and strategic relationships, leaving startups and smaller organizations scrambling for spot capacity at premium prices. This dynamic has created a two-tier market where well-funded players with strong cloud provider relationships access compute at reasonable rates while others face significant cost barriers.
The Custom Silicon Impact
The emergence of custom silicon from hyperscalers fundamentally changes the cost equation for companies with access to these alternatives. Anthropic’s use of AWS Trainium and Google TPUs rather than NVIDIA GPUs potentially reduces training costs by 30% to 44%, depending on workload characteristics.
However, these savings come with significant switching costs. Migrating codebases from CUDA to alternative frameworks requires substantial engineering effort. Organizations must rewrite model training code, adjust hyperparameters, and extensively test on new hardware. For many organizations, particularly smaller ones without dedicated platform engineering teams, these switching costs exceed the potential savings.
The Inference Cost Revolution
While training costs capture headlines, inference costs increasingly dominate AI economics as models move into production. A model trained once but deployed at scale may handle millions or billions of inference requests, making per-query costs crucial.
Custom ASICs designed specifically for inference offer dramatic cost advantages. Google TPU v6e pricing, at $0.39 per chip-hour under committed-use discounts, enables inference costs substantially below NVIDIA GPU-based solutions. At 100,000 inference requests per hour, cost differences of even fractions of a cent per request translate to millions in annual savings.
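The per-request arithmetic is simple but decisive at scale. Using the request volume from the text and assumed per-request cost differences:

```python
# Annual savings from shaving fractions of a cent off each inference request.
REQUESTS_PER_HOUR = 100_000
HOURS_PER_YEAR = 24 * 365

for delta_cents in (0.1, 0.5, 1.0):
    annual = REQUESTS_PER_HOUR * HOURS_PER_YEAR * delta_cents / 100
    print(f"Saving {delta_cents:.1f} cent/request -> ${annual:,.0f}/year")
# 0.1c -> $876,000   0.5c -> $4,380,000   1.0c -> $8,760,000
```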
The shift toward inference-optimized hardware represents a fundamental market transition. As AI deployment scales, the economics increasingly favor specialized inference accelerators over general-purpose GPUs. This trend benefits custom silicon providers and creates opportunities for startups like Groq, which claims 800-plus tokens per second inference speeds with specialized architectures.
Looking Forward: Cost Projections
Several countervailing forces will shape training costs over the next five years. On the side of cost increases:
Increasing model size, as frontier labs push toward more capable systems, demands more compute. Models with trillions of parameters require thousands of GPUs or equivalent accelerators training for months.
Data scale continues growing, with some estimates suggesting GPT-5 could require 10 times the compute of GPT-4. As models exhaust publicly available text data, synthetic data generation and multimodal training add computational overhead.
Quality demands, particularly reducing hallucinations and improving reasoning, require more sophisticated training techniques like reinforcement learning from human feedback, which increases compute requirements per model improvement.
On the side of cost decreases:
Hardware improvements through successive chip generations provide consistent efficiency gains. NVIDIA’s roadmap shows 2 to 4 times performance per generation. Custom silicon shows similar or better improvement curves.
Algorithmic advances continue reducing compute requirements for given capability levels. Techniques like mixture of experts, better quantization methods, and improved training procedures increase efficiency.
Competitive pressure drives down prices as NVIDIA faces real competition for the first time. AMD, Intel, and custom silicon create price discipline absent when NVIDIA held monopolistic market share.
Infrastructure optimization, including better data center designs, liquid cooling, and power management, reduces operational overhead per compute unit delivered.
The net effect likely produces relatively stable or slightly declining costs per unit of model capability, even as absolute costs for frontier models continue rising. Organizations training models at scale will find costs manageable if they can access competitive hardware supply chains. Smaller organizations may face continuing challenges accessing sufficient compute at reasonable prices.
The Regulatory and Geopolitical Dimension
Export Controls and Supply Chain Fragmentation
The semiconductor supply chain has become increasingly entangled with geopolitical competition. U.S. export controls targeting China represent the most visible intervention, but they’re part of a broader trend of governments viewing semiconductor capability as strategic infrastructure.
The restrictions create complex compliance challenges for multinational chip companies. NVIDIA must design different product variants for different markets. Cloud providers must ensure that compute capacity isn’t resold or accessed by entities subject to controls. These compliance costs add friction to the global market.
Regional Concentration Risks
The concentration of advanced chip manufacturing in Taiwan creates systemic risk. TSMC manufactures 92% of advanced AI chips, making the semiconductor supply chain extraordinarily vulnerable to geopolitical disruption. A 7.4-magnitude earthquake in Taiwan in April 2024 temporarily halted output at several key facilities, highlighting the fragility of this concentration.
Similarly, SK Hynix provides 62% of high-bandwidth memory globally. These chokepoints affect the entire industry, not just individual companies. Any significant disruption, whether from natural disaster, political instability, or conflict, would create immediate global shortages.
Government Investment and Industrial Policy
Multiple governments have responded with massive investments in domestic semiconductor capabilities. The U.S. CHIPS Act provides $52 billion in subsidies and incentives. The EU’s Chips Act allocates €43 billion. Japan, South Korea, and India have launched similar programs.
These investments aim to diversify manufacturing capacity and reduce dependence on concentrated supply chains. However, building advanced fabrication facilities requires four to five years from groundbreaking to volume production, meaning these investments won’t materially affect supply for years.
Intel’s $20 billion investment in Ohio facilities, while significant, won’t produce advanced logic chips until 2027 or later. TSMC’s Arizona facility faces construction delays and workforce challenges, with full production now expected later than originally planned.
Strategic Recommendations for Operations and Strategy Leaders
Demand Forecasting and Capacity Planning
Organizations dependent on AI compute must treat GPU and accelerator capacity as strategic infrastructure requiring long-term planning. The days of purchasing compute on demand are giving way to a world where capacity must be secured months or years in advance.
Forward-thinking organizations are signing multi-year purchase agreements with cloud providers or chip manufacturers, committing to future capacity in exchange for price certainty and supply guarantees. These agreements require accurate forecasting of AI workload growth, a challenging task given the rapid evolution of AI capabilities and applications.
Organizations should develop scenario-based capacity models that account for different growth trajectories and model architecture evolutions. Having contingency plans for accessing alternative compute sources, whether through different cloud providers, different chip architectures, or different training approaches, provides resilience against supply disruptions.
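A scenario model need not be elaborate to be useful. As a sketch, compounding a current fleet under diverging growth assumptions (all numbers below are placeholders) already frames the reservation decision:

```python
# Project accelerator demand under several growth scenarios to size reservations.
def project_demand(current_chips: int, annual_growth: float, years: int) -> list[int]:
    """Compound the fleet forward year by year."""
    fleet, out = float(current_chips), []
    for _ in range(years):
        fleet *= 1 + annual_growth
        out.append(round(fleet))
    return out

for name, growth in (("conservative", 0.3), ("baseline", 0.8), ("aggressive", 1.5)):
    print(f"{name:>12}: {project_demand(1_000, growth, years=3)}")
# Reserve toward the baseline; keep options (cloud bursts, alternative
# architectures) that cover the gap to the aggressive path.
```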
Architectural Flexibility
The era of NVIDIA-only infrastructure may be ending. Organizations that can architect their AI systems to work across multiple accelerator types gain significant strategic flexibility. This requires investment in abstraction layers that can target different hardware backends.
PyTorch and TensorFlow provide some level of hardware abstraction, but achieving optimal performance on different accelerators often requires hardware-specific optimizations. Organizations should invest in platform engineering capabilities that can port and optimize models across NVIDIA GPUs, AMD GPUs, and potentially custom silicon from cloud providers.
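In practice, the abstraction layer can start small: isolate device selection behind one function so training code never hard-codes a vendor. The sketch below assumes PyTorch, whose torch.cuda API covers both NVIDIA CUDA and AMD ROCm builds, plus the optional torch_xla package on TPU hosts.

```python
# One seam for hardware choice; everything downstream stays device-agnostic.
import torch

def pick_device() -> torch.device:
    try:
        import torch_xla.core.xla_model as xm  # present only on TPU images
        return xm.xla_device()
    except ImportError:
        pass
    if torch.cuda.is_available():              # NVIDIA or AMD, same API
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(1024, 1024).to(device)
print(f"Training on: {device}")
```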
This flexibility provides negotiating leverage with suppliers and resilience against supply chain disruptions. It also positions organizations to take advantage of price competition as the market diversifies.
Cloud Strategy Reconsideration
The emergence of hyperscaler custom silicon changes cloud strategy calculus. Organizations heavily invested in AWS, Google Cloud, or Azure may find compelling economics in using each provider’s custom chips for appropriate workloads.
However, this creates vendor lock-in at the infrastructure level. Code optimized for Trainium doesn’t easily port to TPUs or back to NVIDIA GPUs. Organizations must carefully weigh cost savings against strategic flexibility.
Multi-cloud strategies, while complex and costly to implement, provide options. Training models on one cloud while serving inference on another becomes feasible if organizations invest in the necessary abstractions and tooling.
Supplier Relationship Management
In a capacity-constrained market, relationships with chip suppliers and cloud providers carry strategic value. Organizations should cultivate direct relationships with NVIDIA, AMD, and cloud hyperscalers, clearly communicating capacity requirements and growth trajectories.
For larger organizations, reserved capacity agreements or strategic partnerships provide supply certainty. Smaller organizations might consider consortium purchasing or working through specialized brokers who aggregate demand.
Understanding the supplier’s perspective helps in negotiations. NVIDIA, for instance, prioritizes customers who can provide visibility into long-term demand and who are willing to commit to capacity in advance. Demonstrating these characteristics improves access to constrained supply.
Make-versus-Buy Analysis for Custom Silicon
For the largest organizations, the question of developing custom silicon deserves serious analysis. The hyperscalers have demonstrated that custom chips can deliver substantial cost advantages at sufficient scale.
However, the barriers to entry remain formidable. Chip development costs start at tens of millions of dollars and require multi-year commitments. Organizations need sufficient scale to justify these investments, access to the necessary expertise (often requiring acquisitions or aggressive hiring), and strategic clarity about their long-term AI infrastructure needs.
Most organizations will conclude that custom silicon doesn’t make economic sense. But for companies with truly massive AI workloads, potentially including large financial institutions, telecommunications companies, or major retailers, the analysis merits consideration.
Conclusion: Navigating the Transition
The AI hardware supply chain stands at an inflection point. The acute shortages of 2023 and 2024 are slowly easing as packaging capacity expands and memory production scales. Yet calling this a transition from scarcity to glut oversimplifies a complex, segmented market.
NVIDIA’s dominance will persist in high-performance training workloads, sustained by CUDA’s ecosystem advantages and continuous innovation through annual product releases. The company’s structural advantages, built over two decades, create switching costs no challenger has overcome. However, NVIDIA faces real competition for the first time, forcing price discipline and innovation that ultimately benefits customers.
AMD has established itself as a credible alternative for organizations willing to invest in migrating from CUDA to ROCm. Intel’s cost-leadership strategy targets a different market segment, potentially creating a tiered market where different chips serve different use cases and budget constraints.
The most profound disruption comes from hyperscaler custom silicon. As these chips mature and potentially become available beyond their original developers’ walls, they could capture 15% to 25% of the market by 2030, primarily in inference workloads where their economic advantages are most compelling.
For operations and strategy leaders, the implications are clear. AI infrastructure requires the same strategic planning as any other critical resource. Organizations must develop capacity forecasting capabilities, maintain architectural flexibility, cultivate supplier relationships, and continuously reassess their infrastructure strategies as the market evolves.
The transition from GPU scarcity isn’t to GPU glut but to a more diverse, competitive, and complex market. Organizations that navigate this transition strategically, avoiding both the complacency of assuming abundant supply and the panic of hoarding capacity they don’t need, will find themselves well-positioned for the AI-driven future. Those that treat AI infrastructure as tactical procurement will find themselves consistently supply-constrained, paying premium prices, and unable to execute on their AI strategies.
The semiconductor supply chain’s evolution over the next five years will shape which organizations can afford to compete in AI-intensive markets. The winners won’t necessarily be those with the largest budgets but those with the most sophisticated understanding of the supply chain dynamics and the strategic agility to adapt as conditions change.
Sources
- Bain & Company. Prepare for the Coming AI Chip Shortage. https://www.bain.com/insights/prepare-for-the-coming-ai-chip-shortage-tech-report-2024/
- Medium (elongated_musk). (2025, April 26). How the Chip Shortage Never Really Ended. https://medium.com/@Elongated_musk/how-the-chip-shortage-never-really-ended-fbcc663aa3bd
- Wccftech. (2023, September 8). NVIDIA’s AI GPU Shortage Could Last Till 2025 Due To Supply Constraints, Says TSMC. https://wccftech.com/nvidia-ai-gpu-shortage-could-last-till-2025-due-to-supply-constraints-says-tsmc/
- Sourceability. AI demand sparks memory supply chain strain. https://sourceability.com/post/ai-chip-shortages-deepen-amid-tariff-risks
- Tom’s Hardware. (2024, November 21). Nvidia warns of gaming GPU shortage this quarter, recovery in early 2025. https://www.tomshardware.com/pc-components/gpus/nvidia-warns-of-gaming-gpu-shortage-this-quarter-recovery-in-early-2025-chipmaker-rakes-in-record-profits-as-net-income-soars-by-109-percent-yoy
- CNBC. (2025, December 2). A ‘seismic’ Nvidia shift, AI chip shortages and how it’s threatening to hike gadget prices. https://www.cnbc.com/2025/12/02/nvidia-shift-ai-chip-shortages-threatening-to-hike-gadget-prices.html
- Eidosmedia. Why is there a chip shortage in AI? https://www.eidosmedia.com/updater/technology/the-global-impact-of-gpu-chip-shortage-on-generative-ai-models
- TechRepublic. (2024, September 27). AI Surge Could Trigger Global Chip Shortage by 2026. https://www.techrepublic.com/article/ai-chip-shortage-global-supply-crisis/
- Computerworld. (2025, May 30). AI chip shortages continue, but there may be an end in sight. https://www.computerworld.com/article/2098937/ai-chip-shortages-continue-but-there-may-be-an-end-in-sight.html
- Tom’s Hardware. (2025, November 2). Microsoft CEO says the company doesn’t have enough electricity to install all the AI GPUs in its inventory. https://www.tomshardware.com/tech-industry/artificial-intelligence/microsoft-ceo-says-the-company-doesnt-have-enough-electricity-to-install-all-the-ai-gpus-in-its-inventory-you-may-actually-have-a-bunch-of-chips-sitting-in-inventory-that-i-cant-plug-in
- PatentPC. (2025, December). The AI Chip Market Explosion: Key Stats on Nvidia, AMD, and Intel’s AI Dominance. https://patentpc.com/blog/the-ai-chip-market-explosion-key-stats-on-nvidia-amd-and-intels-ai-dominance
- AI Multiple Research. Top 20+ AI Chip Makers: NVIDIA & Its Competitors. https://research.aimultiple.com/ai-chip-makers/
- Yahoo Finance. (2025, October 20). Nvidia’s Big Tech customers might also be its biggest competitive threat. https://finance.yahoo.com/news/nvidias-big-tech-customers-might-also-be-its-biggest-competitive-threat-153032596.html
- CNBC. (2024, June 2). Nvidia dominates the AI chip market, but there’s more competition than ever. https://www.cnbc.com/2024/06/02/nvidia-dominates-the-ai-chip-market-but-theres-rising-competition-.html
- TechInsights. Data-Center AI Chip Market – Q1 2024 Update. https://www.techinsights.com/blog/data-center-ai-chip-market-q1-2024-update
- SQ Magazine. (2025, October 7). AI Chip Statistics 2025: Funding, Startups & Industry Giants. https://sqmagazine.co.uk/ai-chip-statistics/
- Tom’s Hardware. (2025, February 19). Nvidia to consume 77% of wafers used for AI processors in 2025: Report. https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidia-to-consume-77-percent-of-wafers-used-for-ai-processors-in-2025-report
- TS2 Tech. (2025, June 27). NVIDIA 2025: Dominating the AI Boom – Company Overview, Key Segments, Competition, and Future Outlook. https://ts2.tech/en/nvidia-2025-dominating-the-ai-boom-company-overview-key-segments-competition-and-future-outlook/
- Octopart. NVIDIA Holds 80% AI Chip Market Share: Who’s the Next AI Chip Supplier? https://octopart.com/pulse/p/nvidia-holds-80-ai-chip-market-share-whos-next-ai-chip-supplier
- Technology Magazine. (2024, November 8). How Nvidia’s AI Made It the World’s Most Valuable Firm. https://technologymagazine.com/articles/how-nvidias-ai-made-it-the-worlds-most-valuable-firm
- Dolphin Studios. (2025, March 24). Comparing the NVIDIA H100, AMD MI300, and others. https://dolphinstudios.co/comparing-the-ai-chips-nvidia-h100-amd-mi300/
- FinancialContent. (2025, November 10). AMD Ignites AI Chip Wars: A Bold Challenge to Nvidia’s Dominance. https://markets.financialcontent.com/stocks/article/tokenring-2025-11-10-amd-ignites-ai-chip-wars-a-bold-challenge-to-nvidias-dominance
- Vertu. (2025, May 30). 10 Leading AI Hardware Companies Shaping 2025. https://vertu.com/ai-tools/top-ai-hardware-companies-2025/
- TechTarget. 10 top AI hardware and chip-making companies in 2025. https://www.techtarget.com/searchdatacenter/tip/Top-AI-hardware-companies
- SunTzu Recruit. (2024, August 9). AI Chip Market: The Rivalry Between Nvidia, Intel, and AMD. https://www.suntzurecruit.com/2024/08/09/ai-chip-market-the-rivalry-between-nvidia-intel-and-amd/
- Accio. Trending GPU Chips 2025: Top Picks for Gaming & AI. https://www.accio.com/business/trending-gpu-chips
- FinancialContent. (2025, September 9). AI Chip Wars Intensify: NVIDIA’s Dominance Challenged by Aggressive Rivals and Hyperscalers’ Custom Silicon Push. https://markets.financialcontent.com/stocks/article/marketminute-2025-9-9-ai-chip-wars-intensify-nvidias-dominance-challenged-by-aggressive-rivals-and-hyperscalers-custom-silicon-push
- CNBC. (2025, November 21). Nvidia Blackwell, Google TPUs, AWS Trainium: Comparing top AI chips. https://www.cnbc.com/2025/11/21/nvidia-gpus-google-tpus-aws-trainium-comparing-the-top-ai-chips.html
- Uncover Alpha. (2025). The chip made for the AI inference era – the Google TPU. https://www.uncoveralpha.com/p/the-chip-made-for-the-ai-inference
- MLQ.ai. AI Chips & Accelerators. https://mlq.ai/research/ai-chips/
- Tom’s Hardware. (2025). Google TPUs garner attention as AI chip alternative, but are only a minor threat to Nvidia’s dominance. https://www.tomshardware.com/tech-industry/semiconductors/nvidia-responds-as-meta-explores-switch-to-google-tpus
- CNBC Video. (2025, November 21). Nvidia GPUs, Google TPUs, AWS Trainium: Comparing the top AI chips. https://www.cnbc.com/video/2025/11/21/nvidia-gpus-google-tpus-aws-trainium-comparing-the-top-ai-chips.html
- FinancialContent. (2025, November 6). The Dawn of a New Era: Hyperscalers Forge Their Own AI Silicon Revolution. https://markets.financialcontent.com/stocks/article/tokenring-2025-11-6-the-dawn-of-a-new-era-hyperscalers-forge-their-own-ai-silicon-revolution
- AI News Hub. (2025). Nvidia to Google TPU Migration 2025: The $6.32B Inference Cost Crisis. https://www.ainewshub.org/post/nvidia-vs-google-tpu-2025-cost-comparison
- Bebooja. (2025). Google TPU to Replace GPUs? Will Nvidia’s Dominance End? https://www.bebooja.com/en/blog/market/2025-bigtech-custom-chips-2025-part1
- Data Center Frontier. Inside Anthropic’s Multi-Cloud AI Factory: How AWS Trainium and Google TPUs Shape Its Next Phase. https://www.datacenterfrontier.com/machine-learning/article/55335703/inside-anthropics-multi-cloud-ai-factory-how-aws-trainium-and-google-tpus-shape-its-next-phase
- Futurum Group. (2024, May 16). The Future of AI Infrastructure: Unpacking Google’s Trillium TPUs. https://futurumgroup.com/insights/the-future-of-ai-infrastructure-unpacking-googles-trillium-tpus/
