GitHub Copilot's Token Billing: The Enshittification Everyone Saw Coming
GitHub killed unlimited premium requests and moved to token billing on June 1, 2026. Here's the real math, why it happened, and what you should do now.
GitHub just announced something they’re calling a “sustainable business model.” What they mean is: they’re done absorbing the burn.
On June 1, 2026, GitHub Copilot moves from premium request counting to token-based billing. Your monthly subscription—whether it’s $10, $39, or per-seat enterprise—doesn’t change. But what you actually get for that money is about to shrink by 80 to 90 percent, depending on which model you were using.
This isn’t a technical improvement. It’s a retreat from a promise that was never sustainable to begin with, dressed up in product language that makes it sound inevitable. It’s not. It’s a choice. GitHub chose to stop paying for your AI agent’s wandering, and now you pay for every token it burns.
The pattern is familiar. Every SaaS company follows it: launch at unsustainable pricing to gain users, build critical adoption, then tighten the economics. The only question was never if, but when and how painfully.
We’re about to find out.
What GitHub Actually Announced
Let me translate the announcement from LinkedIn-speak to engineering-speak.
What they said: “We’re moving to a sustainable usage-based model aligned with actual consumption.”
What that means: Premium request counting was hiding how much this actually costs. Copilot agents can run for hours, reading codebases, iterating, second-guessing themselves, burning tokens like a jet burns fuel. The old model capped the damage. You got 300 requests per month, and whether you used them efficiently or inefficiently, the bill was the same. GitHub ate the variance.
Token-based billing exposes that variance. Every time the agent reads an unrelated file because your grep search was too broad, you pay. Every time it enters a thinking loop and argues with itself for 50 iterations, you pay. Every time it retries a task because it hallucinated, you pay.
What they said: “Code completions and Next Edit suggestions remain included and don’t consume credits.”
What that means: The only free part is the stuff that was already free and not very useful. The moment you open the chat, ask the agent anything, or let it do actual work, you’re on the meter.
What they said: “Fallback experiences will no longer be available.”
What that means: Today, when you run out of premium requests, Copilot falls back to GPT-4o Mini, which is slower but free (in terms of request accounting). After June 1, when you run out of credits, you get nothing. The work stops. Your agent pauses. You either pay more or close the chat.
What they said: “Model multipliers will increase for annual plan subscribers.”
What that means: If you bought an annual plan, you just got your deal retroactively worse. Claude Sonnet 4.6 goes from a 3x multiplier to a 27x multiplier. GPT-5.4 goes from 1x to 6x. Neither is a typo: that's nine times the previous cost for the same Claude model, and six times for the same GPT model.
The Math That Actually Matters
Let’s get concrete.
Under the old system (request-based):
- Copilot Pro: $10/month → 300 premium requests
- Copilot Pro+: $39/month → 500 premium requests (approximately)
A “request” wasn’t clearly defined. Sometimes it was a chat message. Sometimes it was a multi-file operation. But the point was: you got a fixed bucket, and it lasted roughly a month if you weren’t insane.
Under the new system (token-based):
- Copilot Pro: $10/month → $10 in AI Credits
- Credits consumed = tokens used × model rate
Here’s where it gets ugly.
Let’s say you’re running a multi-file task with Claude Sonnet 4.6 (a popular model for complex work). You give the agent:
- Your question: ~100 tokens
- A codebase dump: ~50,000 tokens (a medium-sized file tree)
- The agent responds: ~5,000 tokens output
- The model multiplier for Sonnet 4.6 on Copilot: 27x (new rates)
Cost per turn: (~50,100 input tokens + ~5,000 output tokens) × 27 × ~$0.000002 (an assumed blended base rate) ≈ $3
One turn costs you roughly three dollars. A task with three or four iterations burns your entire month's credit. Under the old system, that same task was one of your 300 requests, at no marginal cost.
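If you want to sanity-check numbers like these against your own workload, the arithmetic is simple enough to script. A minimal sketch, assuming the 27x multiplier from the example and a blended base rate of about $0.000002 per token (an assumption; GitHub hasn't published exact per-token rates):

```python
# Sketch: per-turn credit burn under token billing. The 27x multiplier and
# the ~$0.000002-per-token blended base rate are assumptions taken from the
# example above, not published GitHub rates.

BASE_RATE = 0.000002  # assumed blended $ per token, before the multiplier

def cost_per_turn(input_tokens: int, output_tokens: int, multiplier: float) -> float:
    """Dollars of credit one agent turn consumes."""
    return (input_tokens + output_tokens) * multiplier * BASE_RATE

def turns_on_budget(budget: float, per_turn: float) -> int:
    """How many turns a monthly credit budget survives."""
    return int(budget // per_turn)

# The scenario above: ~50K-token codebase dump plus question, 5K output, 27x.
turn = cost_per_turn(50_100, 5_000, 27)
print(f"per turn: ${turn:.2f}")
print(f"turns on $10: {turns_on_budget(10, turn)}")
```

Plug in your own context sizes and multipliers to see how fast a budget disappears.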
Multiply that across a team. You wanted to scale AI agents across fifty developers. You just quadrupled your bill and made the experience actively worse because now every dev is managing their token burn mentally.
Why This Happened
GitHub isn’t stupid. They knew this would be unpopular. They moved anyway. Why?
Because the math underneath forced their hand.
Copilot today isn’t the autocomplete tool from 2022. It’s an agent system that can run multi-hour coding sessions, spawn subagents, hit APIs, read entire repositories into context, and iterate autonomously. That’s expensive. A single $10/month user running heavy agent work could cost GitHub 50 or 100 dollars in infrastructure.
Microsoft (GitHub’s parent) committed to making LLMs competitive with human developers on real work. They built the agent framework, the integration with VS Code, the MCP protocol. All of that is real engineering, real hardware, real burn. Millions of dollars per month.
They expected to monetize it later. Later is now.
The only question was how to extract that money without losing all the users they gained by pricing it irrationally cheap. Token-based billing solves that: it looks like a “fair” pricing model that just “charges for what you use.” It’s not. It’s a deliberate compression of what you get for the same price, designed to look inevitable instead of greedy.
GitHub also has competitive pressure. Anthropic's latest Sonnet releases beat OpenAI's models on many coding tasks, and both vendors charge similar API rates. If GitHub is reselling access through Copilot, it needs to hit some margin target to justify the infrastructure. Token-based billing lets them set that margin explicitly.
It’s rational. It’s also ugly.
What They Won’t Tell You
The announcement has specific language that’s easy to miss. Let me highlight the traps.
Trap 1: “Plan prices aren’t changing”
Correct. The subscription price is the same. But the value is cut by 80-90 percent for heavy users. GitHub could have said “value per dollar is being reduced,” but they didn’t. Technically accurate, completely misleading.
Trap 2: “Full control over what you spend”
This is the cruelest line. You have “control” in the sense that you can set a budget cap and prevent overages. But if you cap spend at $10/month and you need to iterate on a complex task, the agent just stops. “Full control” in practice means “we’re gatekeeping your work.”
Compare this to the old system: overages didn’t exist. You ran out of requests and fell back to a slower model. You could still work. Now you can’t.
Trap 3: “Pooled included usage across a business”
This sounds nice until you look at what pooling actually buys. Under request-based billing, unused requests simply expired; nobody tracked them. Pooled credits mean accounting can see exactly which team, and which individual, burned how much. More visibility for finance, more leverage for management to restrict your access.
Trap 4: No more fallback models for individuals
The old system: run out of requests → fall back to GPT-4o Mini.
The new system: run out of credits → blocked.
For enterprise customers, GitHub added a “promotional” inclusion: $70/month in free credits instead of $39 worth. They’re bribing you to accept a worse deal because it’s slightly better than it could have been.
Trap 5: Annual plans are getting massacred
If you bought an annual subscription, congratulations: you're locked into a worse deal than new customers get. Your model multipliers jump as much as 9x on June 1. GitHub has effectively doubled or tripled the real cost of a subscription for people mid-contract.
What This Means in Practice
You’re a developer using Copilot as your daily agent. You run 3-4 coding tasks per day, each involving 3-5 agent iterations. Let’s do the math.
Old system (request-based):
- 300 requests/month
- 4 tasks × 20 work days = 80 tasks/month
- 4 iterations per task = 320 iterations/month
- Some iterations were “light” (chat questions), some were “heavy” (multi-file operations)
- Average: you hit your limit around day 25-27
- Cost to you: $10/month
New system (token-based):
- $10 in credits
- Heavy iteration = roughly $0.50-3 in tokens, depending on model and context size
- 4 tasks/day × 4 iterations = 16 iterations/day
- 16 iterations × 20 days = 320 iterations/month
- If 20% are heavy (mostly at the cheap end of that range) and 80% are light (pennies each): ~$25-45 in token costs
- Cost to you: $10 + $15-35 in overages = $25-45/month minimum
You just got a two-to-four-times price increase if you want to keep your productivity the same.
Now scale that to a team of fifty developers. What was a $500/month bill is now $1,250-2,250/month. Engineering managers will be asked to justify this to finance. Most will reduce Copilot access.
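The back-of-envelope above is easy to parameterize for your own team. A sketch; every dollar figure plugged in is an assumption, so swap in your own observed per-iteration burn:

```python
# Sketch: monthly-spend projection under token billing. Every dollar figure
# here is an assumption; replace them with your team's observed burn.

def monthly_cost(tasks_per_day: int, iters_per_task: int, work_days: int,
                 heavy_share: float, heavy_cost: float, light_cost: float) -> float:
    """Projected monthly token spend for one developer, in dollars."""
    iters = tasks_per_day * iters_per_task * work_days
    heavy = iters * heavy_share
    light = iters - heavy
    return heavy * heavy_cost + light * light_cost

# Illustrative inputs: 4 tasks/day, 4 iterations each, 20 work days,
# 20% heavy iterations at $0.50 each, light ones at $0.02.
per_dev = monthly_cost(4, 4, 20, heavy_share=0.2, heavy_cost=0.50, light_cost=0.02)
print(f"per developer: ${per_dev:.2f}/month")
print(f"team of 50:    ${50 * per_dev:,.2f}/month")
```

Run it with your own numbers before the June bill does it for you.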
Your Real Options
Don’t panic into rage-quitting. But do understand what your real choices are right now.
Option 1: Accept the cost increase and manage your prompts
You can make this work if you optimize hard. Stop dumping entire codebases into context. Write precise, narrow prompts. Use plan mode instead of agent mode to reduce iterations. Stop asking the agent to “figure it out” and instead give it step-by-step instructions.
This works. I know developers doing this. The tradeoff: you’re now doing more of the thinking work, and the AI is doing less of the creative exploration. You’re using it as a tool, not a partner. For some tasks, that’s fine. For exploratory architecture work, it’s painful.
Cost: $10-30/month depending on discipline.
Effort: High. You’re constantly aware of token burn.
Option 2: Switch to OpenAI API directly with your own client
You pay OpenAI directly for what you use. GPT-4o runs about $0.005 per 1K input tokens, $0.015 per 1K output tokens. A heavy iteration costs $0.05-0.15. You get unlimited iterations and full control over the model selection.
Tools like Continue (VS Code extension) let you point at any OpenAI API key and run local agent loops. You wire your own API billing and you’re not dependent on GitHub’s infrastructure or their credit pool.
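Going direct is less work than it sounds. A minimal sketch using the official `openai` Python package (v1+), with the GPT-4o launch-era rates quoted above as assumptions; verify current numbers against OpenAI's pricing page:

```python
# Sketch: one direct GPT-4o call plus a cost check, assuming the official
# openai Python package (v1+) and the launch-era rates quoted above.
# Rates change; verify against OpenAI's pricing page.

INPUT_RATE = 0.005 / 1000   # assumed $ per input token
OUTPUT_RATE = 0.015 / 1000  # assumed $ per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the assumed GPT-4o rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

def ask_gpt4o(prompt: str) -> tuple[str, float]:
    """One chat completion; returns the answer and its estimated cost."""
    from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    cost = estimate_cost(resp.usage.prompt_tokens, resp.usage.completion_tokens)
    return resp.choices[0].message.content, cost

# Example (requires a funded API key):
#   answer, cost = ask_gpt4o("Refactor this function to be iterative: ...")
```

The cost check after every call is the point: you see the meter instead of GitHub seeing it for you.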
Advantages: Lower cost if you’re disciplined, full control, no fallback models or credit limits.
Disadvantages: You manage your own billing, you manage your own agent framework, if you screw up the API setup you’re debugging infrastructure, not shipping features.
Cost: $5-20/month if you’re disciplined, unlimited if you’re not.
Option 3: Local models with Ollama or LM Studio
Run an open model like Qwen 3.6 or a quantized Llama locally (Claude is closed-weight and can't be self-hosted). You need a GPU (a Mac with 64GB unified memory works, an AMD 7900 XTX works, an RTX 4090 works). You get unlimited usage, zero inference cost, full privacy.
Advantages: No cloud bill, no GitHub dependency, models run offline, total control.
Disadvantages: Models are smaller and slower, code completion takes 5-10 seconds instead of 1, you're managing your own LLM infrastructure, and multi-file operations are harder to orchestrate.
Cost: One-time GPU hardware ($500-2000), then $0.
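If you go the local route, you don't even need a client library: Ollama exposes a plain HTTP API on port 11434. A sketch using only the standard library; the model name `qwen2.5-coder` is just an example of something you'd `ollama pull` first:

```python
# Sketch: talking to a local Ollama server over its plain HTTP API
# (default port 11434), standard library only. "qwen2.5-coder" is an
# example model name; `ollama pull` whatever you actually run.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"

def build_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/chat endpoint, non-streaming."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask_local(model: str, prompt: str) -> str:
    """Send one chat turn to the local server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Example (requires `ollama serve` running):
#   print(ask_local("qwen2.5-coder", "Write a binary search in Python."))
```

No token meter anywhere in that loop, which is the whole appeal.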
Option 4: Claude API directly (via Anthropic)
Claude is available via direct API from Anthropic. Call it from your editor, your own scripts, or your own agent loop, and manage your own credits. You pay for tokens, nothing else. No per-request pricing, no monthly fees, no surprise multipliers.
Advantages: Competitive pricing, excellent model quality, you own the relationship with the model provider.
Disadvantages: You lose some of the VS Code integration, you lose GitHub's agent framework (you use Claude's own agent tooling instead), and setup is more technical.
Cost: $5-20/month for typical developer usage.
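A minimal sketch of the direct-Anthropic path via the Messages API, using the official `anthropic` Python package. The model name and the $3/$15-per-million-token rates are assumptions; check Anthropic's current docs for both:

```python
# Sketch: the direct-Anthropic path via the Messages API, assuming the
# official anthropic Python package and an ANTHROPIC_API_KEY in the
# environment. The model name and the $3/$15-per-million rates are
# assumptions; check Anthropic's current docs for both.

IN_PER_M, OUT_PER_M = 3.00, 15.00  # assumed $ per million tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one Messages API call at the assumed rates."""
    return input_tokens / 1e6 * IN_PER_M + output_tokens / 1e6 * OUT_PER_M

def ask_claude(prompt: str) -> tuple[str, float]:
    """One Messages API call; returns the reply text and its estimated cost."""
    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # example name; use whatever release is current
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    cost = call_cost(msg.usage.input_tokens, msg.usage.output_tokens)
    return msg.content[0].text, cost

# Example (requires a funded API key):
#   review, cost = ask_claude("Review this diff for bugs: ...")
```

Note that the same 55K-token turn from the Copilot example above costs cents here, not dollars, because there is no multiplier in between.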
Option 5: Hybrid approach
Use Copilot for autocomplete and light chat (free). Use Claude API or local Ollama for heavy agent work. Use OpenAI’s API for quick tasks when you need GPT-4o specifically. Mix and match based on task type.
This is what I’d recommend for most developers. You’re not all-in on any vendor, you’re paying for exactly what you need, and you’re keeping your optionality.
Cost: $20-40/month total across all tools, a much better ROI than Copilot Pro+ at $39 alone.
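The hybrid setup can be as simple as a routing table. A sketch; the backend names and rules are illustrative, to be tuned against your own cost and quality data:

```python
# Sketch: the hybrid setup as a routing table. Backend names and rules are
# illustrative; tune them against your own cost and quality data.

ROUTES = {
    "autocomplete": "copilot",     # still included in the subscription
    "quick_chat":   "copilot",
    "agent_task":   "claude_api",  # heavy multi-file work, paid per token
    "offline":      "ollama",      # private or network-free work
}

def pick_backend(task_type: str) -> str:
    """Cheapest acceptable backend for a task; default to the free local one."""
    return ROUTES.get(task_type, "ollama")
```

The design choice that matters is the default: falling back to local means an unclassified task costs you nothing instead of burning credits.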
The Deeper Problem
Here’s what GitHub doesn’t want you to think about: this is textbook enshittification.
Enshittification happens in three phases:
- Hook the user: Platform launches at unsustainable pricing, builds user base, creates switching costs and workflow dependency.
- Exploit the lock-in: Platform raises prices or reduces quality, betting users won’t leave because they’re now dependent.
- Monetize the extracted value: Platform extracts maximum revenue from users until competition emerges or users finally leave.
GitHub is in phase 2 right now. They successfully got millions of developers hooked on Copilot as an integral part of their daily coding. Many of you have muscle memory around it, workflows built on it, teams standardized on it. Now they know you’re unlikely to all migrate at once.
The token-based model is just the instrument. The real value extraction is this: GitHub now has perfect visibility into what you do. They can see which models you use, which tasks you run, how much you’re willing to pay. They can adjust pricing per team, per user, per geography if they want. They have economics data on millions of developers’ work patterns. That data is valuable.
Eventually, phase 3 happens. A competitor (Claude native integration, local models, OpenAI native integration) takes over. GitHub loses market share. They’ve extracted their money. They move on.
You can see this coming because it’s the same cycle every infrastructure company has gone through. AWS, GitHub itself (through GitHub Enterprise), Slack. Launch cheap, lock in, extract, lose to a fresher competitor with better pricing.
The question isn’t whether this will happen. It’s whether you’re going to be caught in phase 3 with no other options, or if you’ll start hedging now.
What To Do Right Now
Don’t rage-cancel. That might feel good but it’s reactive.
Instead:
- Audit your actual Copilot usage. Log into GitHub's billing page on May 1 when the preview bill is available. See your projected costs. If it's under $15/month, probably fine to stay. If it's $30+, seriously consider alternatives.
- Build a hybrid setup. Continue using Copilot for autocomplete and quick chat. Set up an API key for Claude or OpenAI for heavy agent work. Use VS Code extensions like Continue to orchestrate multiple model sources.
- Test local models. Spend a weekend setting up Ollama or LM Studio with Qwen 3.6 or a quantized Llama. See if the speed tradeoff is acceptable for your use case. You'll know quickly whether "good enough" is really good enough.
- Don't let sunk cost bias trap you. You've been paying for Copilot Pro at $10/month and feeling like it's cheap. That price anchoring is part of what GitHub is betting on. Calculate your total 2024-2026 spend. That's water under the bridge. Your decision now should be based on what's rational for the next 12 months, not what you've already paid.
- Talk to your team. If you're leading developers, flag this to engineering leadership now. Don't wait for June when people are discovering $50/month bills. Run the projections for your team. Budget for it or make a plan to switch tools. Don't let this surprise you.
- Keep your options warm. Start a small pet project using the Claude API directly, or Ollama locally. When you inevitably need to make a switch, you won't be starting from zero.
The pattern isn’t new, but the speed is. GitHub didn’t need to move this fast. They chose to. That tells you everything about the pressure they’re under and how badly they want to establish token-based pricing before enough of you leave. Don’t reward that by staying complacent.