400 prompt token count of 100701 exceeds the limit of 90000

Made my first AI request of the day and that's the error I was met with. Any ideas anyone?
Ask mode; Claude Sonnet 3.7
Gemini 2.5 went through

I have no idea how tokens work, but maybe your request was too long? Just guessing here.

68 words, 410 characters - and Gemini responded without hesitation - so I'm betting it's Claude.
But that whole token counting has got me lost too... :grin:

Let's cheat and use AI to explain it...

Token Calculation by Model

1. OpenAI GPT-4 / GPT-4o

  • Tokenizer: Uses tiktoken, a byte pair encoding (BPE) tokenizer.
  • Token size:
    • 1 token ≈ 4 characters (English), or about 0.75 words.
    • Emojis, punctuation, and whitespace are all tokenized.
  • How it's calculated: You can use tiktoken to tokenize and count tokens (see the sketch after this list).
  • Example:
    "I love AI." β†’ 5 tokens: ["I", " love", " AI", ".", ""]

2. Anthropic Claude (Opus, Sonnet, Haiku)

  • Tokenizer: Custom tokenizer, similar to SentencePiece.
  • Token size:
    • 1 token ≈ 3–4 characters on average.
  • Notable Feature: Claude's tokenizer is, in some respects, more efficient for English than OpenAI's, so the same sentence can end up as slightly fewer tokens.
  • How it's calculated: Anthropic has not open-sourced its tokenizer, but third-party tools estimate similar token counts to OpenAI's (a rough rule-of-thumb sketch follows below).
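
Since the tokenizer itself is not public, a local estimate can only be a rule of thumb. A sketch of that, using the midpoint of the 3–4 characters-per-token range above (my own heuristic, not an Anthropic number):

```python
# Rough heuristic only: estimate Claude token usage from character count.
# Claude's real tokenizer is not public, so this just applies the
# "1 token is roughly 3-4 characters" rule of thumb.
def estimate_claude_tokens(text: str, chars_per_token: float = 3.5) -> int:
    return max(1, round(len(text) / chars_per_token))

print(estimate_claude_tokens("I love AI."))  # ~3 for this 10-character string
```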

3. Google Gemini (formerly Bard)

  • Tokenizer: Based on SentencePiece, often with BPE or Unigram LM.
  • Token size:
    • 1 token ≈ 3–4 characters (varies significantly by language).
  • Multilingual support: Highly optimized for multilingual input.
  • How it's calculated: Use Google’s open-source SentencePiece models to replicate (see the sketch below).
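
A sketch of replicating that with the sentencepiece package; the model file name below is hypothetical, and you would need an actual SentencePiece model released by Google (for example the Gemma tokenizer), since Gemini's own production tokenizer is not distributed as far as I know:

```python
# Sketch: counting tokens with a SentencePiece model (pip install sentencepiece).
# "gemma_tokenizer.model" is a placeholder file name, not a real download path.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="gemma_tokenizer.model")

text = "I love AI."
pieces = sp.encode(text, out_type=str)

print(len(pieces), pieces)  # token count and the pieces themselves
```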

Let's break down what 100,701 tokens roughly equates to.

Since token-to-word ratios vary by language and complexity, here's a range:

| Language Complexity | Words per Token | Estimated Words for 100,701 Tokens |
| --- | --- | --- |
| Simple English (average) | ~0.75 | ~75,525 words |
| Complex English (technical/legal) | ~0.66 | ~66,460 words |

So, 100,701 tokens ≈ 66,000–75,000 words.
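
The arithmetic behind that range is just the token count multiplied by the words-per-token ratio:

```python
# Back-of-the-envelope check of the word estimates above.
tokens = 100_701

complex_english = tokens * 0.66  # ~66,463 words
simple_english = tokens * 0.75   # ~75,526 words

print(f"{complex_english:,.0f} to {simple_english:,.0f} words")
```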


:globe_with_meridians: Web Page Size Estimate

1. By Word Count

Based on common webpage word counts:

| Page Type | Avg. Word Count | 100,701 Tokens Can Fit... |
| --- | --- | --- |
| Blog Post | ~1,000 | ~66–75 pages |
| Marketing Landing Page | ~500 | ~132–150 pages |
| Technical Docs Page | ~2,000 | ~33–38 pages |

2. By Character Count / File Size

  • 1 token ≈ 4 characters → 100,701 tokens ≈ 402,804 characters
  • That’s about 400 KB of raw text (uncompressed).
  • In terms of HTML page size, with basic styling/markup, that would be about 500–700 KB (a quick arithmetic check follows below).
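
The quick arithmetic check, assuming plain ASCII text at one byte per character:

```python
# Back-of-the-envelope size estimate for 100,701 tokens as raw text.
# Assumes ~4 characters per token and 1 byte per character (plain ASCII).
tokens = 100_701
chars = tokens * 4        # 402,804 characters
kilobytes = chars / 1024  # ~393 KB, i.e. roughly 400 KB uncompressed

print(f"{chars:,} characters, about {kilobytes:.0f} KB of raw text")
```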

For comparison:
A typical full-featured web article (with light media and CSS) is 200–500 KB.

:brain: Summary:

  • 100,701 tokens = about 66,000–75,000 words
  • That's like:
    • A short novel (e.g., The Great Gatsby)
    • A manual or guidebook
    • 60–75 blog posts
    • A medium-sized website's text content

LOL - showoff -

but how do I get an error on the first paragraph of the day?

Hahahahaha! I asked AI!

:rofl:

Ask Claude. I'm sure it will apologise in a very nice way.

:innocent:

In all seriousness, I don't know, Jimed. Seems very strange though.


See also my explanation about the context size and token usage, as we supply Wappler-specific knowledge to the AI models:

So on larger pages it is advisable to use models with a larger context size.

We will also soon be adding Google Gemini as a direct provider, which goes up to a 1M-token context window and should be plenty.
