You can increase the context length in LM Studio for Qwen models (up to 1 million tokens for Qwen2.5-7B-Instruct-1M / Qwen2.5-14B-Instruct-1M, and up to 128K tokens for Qwen2.5-7B-Instruct). A quick search should show you how, @AdrianoLuiz. Here's a breakdown:
1. Native Context Support
The standard Qwen2.5-Instruct variants natively support up to 32K tokens.
However, the Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M models support up to 1 million tokens, a huge increase achieved with techniques such as Dual Chunk Attention (DCA) and length extrapolation.
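If you want to confirm these advertised limits yourself, a minimal sketch (assuming the transformers library is installed and reading max_position_embeddings from the public Qwen repos on the Hugging Face Hub):

```python
from transformers import AutoConfig

# Read the advertised maximum context length from each model's config.
for repo in ("Qwen/Qwen2.5-7B-Instruct", "Qwen/Qwen2.5-7B-Instruct-1M"):
    cfg = AutoConfig.from_pretrained(repo)
    print(f"{repo}: {cfg.max_position_embeddings} tokens")
```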
2. Enabling Extended Context via YaRN
For Qwen2.5-7B-Instruct, you can extend the context window from 32K to 128K tokens by enabling YaRN (a RoPE scaling technique).
Note that this is currently static YaRN (available via vLLM): it extends the context, but because the scaling factor is applied even to short inputs, it may slightly reduce performance on short-context tasks.
Dynamic YaRN (which adjusts the scaling based on input length) is currently only available via the Alibaba Cloud Model Studio API.
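As a minimal sketch of the static approach (assuming a local copy of the model files; the rope_scaling block follows the Qwen2.5 model card), you add a rope_scaling entry to the model's config.json before serving:

```python
import json

cfg_path = "Qwen2.5-7B-Instruct/config.json"  # illustrative local path

with open(cfg_path) as f:
    cfg = json.load(f)

# Static YaRN: factor 4.0 stretches the 32,768-token base window to ~131,072.
cfg["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```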
3. LM Studio Capabilities
LM Studio lets you set the context length when loading a model. For example, with the lmstudio Python SDK:

```python
import lmstudio as lms

# Request an 8,192-token context window at load time.
model = lms.llm("qwen2.5-7b-instruct", config={
    "contextLength": 8192,
    # other load-time settings...
})
```
You can set contextLength, ropeFrequencyBase, ropeFrequencyScale, and other advanced load parameters the same way.
However, some users have hit errors when loading models with very large context lengths, even when the model nominally supports them: the KV cache must fit in memory alongside the weights, so the practical limit depends on your hardware.
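One way to find a stable value is to retry the load with progressively smaller windows. A minimal sketch (the model key and candidate sizes are illustrative):

```python
import lmstudio as lms

# Try progressively smaller context windows until one loads successfully.
for ctx in (131072, 65536, 32768, 16384):
    try:
        model = lms.llm("qwen2.5-7b-instruct", config={"contextLength": ctx})
        print(f"Loaded with contextLength={ctx}")
        break
    except Exception as exc:  # load/allocation failures surface as exceptions
        print(f"contextLength={ctx} failed: {exc}")
```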
Summary Table
| Model / Setting | Native Context Support | Extended Support |
| --- | --- | --- |
| Qwen2.5-7B-Instruct | 32K tokens | Up to 128K via static YaRN |
| Qwen2.5-7B-Instruct-1M / 14B-Instruct-1M | Up to 1M tokens | Fully supported via DCA, etc. |
| LM Studio configuration | Depends on hardware | Customizable via contextLength, ropeFrequencyScale, etc. |
What You Can Do
Check the model's context length in LM Studio with model.get_context_length() (Python SDK) before running long prompts; see the sketch after this list.
If you're using a 1M-capable model, you can indeed set contextLength up to 1,000,000 in your load config, assuming your hardware can handle it (Qwen's published requirements are roughly 120 GB of GPU memory for 7B-Instruct-1M and 320 GB for 14B-Instruct-1M, totaled across GPUs).
If you're on the standard Qwen2.5-7B-Instruct and want to push beyond 32K tokens, you can enable YaRN via the model config (see the sketch in section 2), with the caveat that it may hurt short-context performance.
Test different configurations and monitor memory consumption. If loading fails at very high values, lower the context length until it loads reliably.
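For the first check above, a minimal sketch with the lmstudio Python SDK (the model key is illustrative):

```python
import lmstudio as lms

# Load (or attach to) the model and report its active context window.
model = lms.llm("qwen2.5-7b-instruct")
ctx = model.get_context_length()
print(f"Context window: {ctx} tokens")
```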