Copilot using excessive tokens as Claude 4 'reasons' to waste even more!

It appears Claude 4 in Copilot is now using excessive tokens because it is reasoning more. I wondered why usage limits were being hit so quickly! A bit naughty, but typical of Anthropic! Their WHOLE business model seems to be: waste tokens, waste more tokens, waste even more tokens, then waste some more on top...

:rage:

And even when it's not reasoning, it uses more tokens...

Summarizing conversation history...

Again and again and again (tenfold more than Gemini or any other model).

I'd say Claude 3.7 uses about 40% less reasoning than Claude 4 and gets the same, if not better, results (especially with JavaScript), if anyone is interested. We have also updated our LLM instructions to respond only minimally (i.e. just acknowledge "Yes, that is done" rather than giving us the dog's danglies about everything Claude just did, all eighteen paragraphs of it in Biblical fashion, chapter and verse, yada-yada).
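For anyone wanting to try the same thing: Copilot reads repository-level custom instructions from `.github/copilot-instructions.md`. A minimal sketch of what such a file could look like (the exact wording below is our own, not an official template):

```markdown
<!-- .github/copilot-instructions.md — example wording only; adjust to taste -->
- Keep responses minimal. Confirm completed work with one short sentence
  (e.g. "Done: renamed the function"), not a recap of every change.
- Do not restate the plan or summarize the diff unless explicitly asked.
- Only elaborate when something failed or a decision is needed from the user.
```

In our experience, shorter responses also mean less text gets fed back into the context on the next turn, so the savings compound.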


Yup, I am starting to wonder whether this isn't done deliberately to force users to subscribe to Claude Max. Warp handles context compression very well; I never have to compress my context, ever! However, the issue remains the same: it spits out large responses that consume A LOT of tokens. And reasoning? Well, you don't really have much control over it. Claude Code does let you set the reasoning level, but inside Copilot, Warp, Windsurf, etc.? You're shit out of luck!

Vibe coders are by far their biggest market, and since the Pro plan alone cannot one-shot a complete SaaS application, they're forced to go with the Max plan; the Pro plan on Opus literally gives you 10 prompts before you hit the rate limit. :joy:
