You can increase the context length for Qwen models in LM Studio (up to one million tokens for Qwen2.5-7B-Instruct-1M / 14B-Instruct-1M, and up to 128K tokens for Qwen2.5-7B-Instruct). A quick search should reveal how to do this @AdrianoLuiz
1. Native Context Support
The standard Qwen2.5-Instruct variants natively support up to 32K tokens.
However, the Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M models support up to 1 million tokens, a huge increase achieved with methods such as Dual Chunk Attention (DCA) and length extrapolation techniques.
2. Enabling Extended Context via YaRN
For Qwen2.5-7B-Instruct, you can extend the context window from 32K to 128K tokens by enabling YaRN (a RoPE scaling technique).
However, this currently uses static YaRN (available via vLLM): while it does extend context, it may slightly reduce performance on short-context tasks.
Dynamic YaRN (which adjusts scaling based on input length) is currently only available via the Alibaba Cloud Model Studio API.
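As a rough illustration of the static approach, YaRN can be switched on by adding a rope_scaling entry to the model's config.json before loading or serving it. This is only a minimal sketch: the local path is hypothetical, and the factor of 4.0 (4 x 32K is roughly 128K) should be checked against the model card's recommendation for your target context length.

import json

# Hypothetical path to a locally downloaded copy of Qwen2.5-7B-Instruct
cfg_path = "models/Qwen2.5-7B-Instruct/config.json"

with open(cfg_path) as f:
    cfg = json.load(f)

# Static YaRN: scale RoPE so the native 32K window stretches toward ~128K
cfg["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)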
3. LM Studio capabilities
LM Studio allows customizing context length when loading a model. For example:
import lmstudio as lms
model = lms.llm("qwen2.5-7b-instruct", config={
"contextLength": 8192,
# other settings...
})
You can set contextLength, ropeFrequencyBase, ropeFrequencyScale, and other advanced parameters.
However, some users have encountered errors when trying to load models with very large context lengths, even if the model claims to support them, depending on your hardware constraints.
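One practical workaround, sketched below, is to retry with progressively smaller context windows until the model loads. This assumes the lmstudio Python SDK shown above and that a failed load raises an exception; the candidate sizes and the commented-out RoPE keys are illustrative, not recommendations.

import lmstudio as lms

# Try progressively smaller context windows until one fits on the hardware
for ctx in (131072, 65536, 32768, 16384):
    try:
        model = lms.llm("qwen2.5-7b-instruct", config={
            "contextLength": ctx,
            # "ropeFrequencyBase": ...,   # optional RoPE overrides,
            # "ropeFrequencyScale": ...,  # left at model defaults here
        })
        print(f"Loaded with a {ctx}-token context window")
        break
    except Exception as exc:
        print(f"Failed to load at {ctx} tokens: {exc}")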
Summary Table
| Model / Setting | Native Context Support | Extended Support |
| --- | --- | --- |
| Qwen2.5-7B-Instruct | 32K tokens | Up to 128K via static YaRN |
| Qwen2.5-7B-Instruct-1M / 14B-Instruct-1M | Up to 1M tokens | Fully supported via DCA, etc. |
| LM Studio configuration | Depends on hardware | Customizable via contextLength, ropeFrequencyScale, etc. |
What You Can Do
Check the model's context length in LM Studio using model.getContextLength() before running (see the sketch after this list).
If you're using a 1M-capable model, you can indeed set contextLength up to 1,000,000 in your load config, assuming your GPU/VRAM can handle it (e.g., 120 GB+ for 7B-Instruct-1M, 320 GB+ for 14B-Instruct-1M).
If you're on the standard Qwen2.5-7B-Instruct and want to push beyond 32K tokens, you can enable YaRN via the model config, with the caveat that it may impact short-context performance.
Test different configurations and monitor memory consumption. If loading fails at very high values, lower the context length until it is stable.
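For the first point, here is a minimal sketch of checking the context window of a loaded model. It assumes the lmstudio Python SDK, where the method appears to be spelled get_context_length (the getContextLength naming above comes from the JavaScript/TypeScript SDK).

import lmstudio as lms

model = lms.llm("qwen2.5-7b-instruct")
# Report the context window the loaded instance is actually using
print(f"Context length: {model.get_context_length()} tokens")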
Resurrecting this thread to see what open-weight models the Wappler community’s had the most success with (for those with sufficient VRAM to run locally). If you have a beefy local dev rig (>24GB of VRAM), what models have you found most “knowledgeable” of the Wappler framework? GPT-OSS120, GLM-4.5 Air, Qwen 3 Coder, Codestral, etc.? Appreciate any input from those who’ve maybe tried a bunch.
My company’s thinking about picking up an RTX6000 Pro for local inference, so I’m interested to know what’s best to throw at it for Wappler specifically.
Before going all in, I'd suggest trying Digital Ocean's services for AI/ML workloads and playing around with it all (rent by the hour). That will give you quick answers without the initial investment. For us the DeepSeek and Qwen models worked best (although I hear GPT-OSS120 can be quite good, it wasn't yet released when we went in). DeepSeek has recently released their latest version, which looks very promising, but I'm yet to play around with it, so I don't really know any more about its full capabilities other than what is published (you can head on over to DeepSeek and find out for yourself). The previous incarnations did have a reasonable understanding of Wappler though. We put together some nice Projects with both of these models. Lots of errors and issues, but we ironed them out in the end. As it is open, at least you don't have to pay for the failures. Perseverance and keeping at it will get you there in the end.
Seems pretty clued up... I went ahead and asked about Wappler and here is the response:
Hey Deepseek what is your understanding of Wappler? Do you have enough knowledge of Server Connect and App Connect and structures to be viable to use in creating Projects using Wappler as the IDE?
Hmm, this is a specific technical question about Wappler and its components. The user is asking about my understanding of Wappler and whether I have sufficient knowledge of Server Connect and App Connect to assist with projects.
I should confirm my awareness of Wappler as a low-code/no-code IDE and demonstrate familiarity with its key features. The user likely wants to know if I can provide practical guidance, not just general information.
I recall that Server Connect handles backend integrations while App Connect manages frontend interactions. A structured breakdown of these components would be helpful. Since Wappler projects involve specific patterns like data bindings and REST API generation, I should mention those too.
The user might be evaluating whether to use me as a resource for Wappler development, so I should emphasize my ability to assist with architecture, troubleshooting, and best practices while acknowledging some limitations regarding real-time testing. A concrete example could make this more tangible.
Yes, I have a solid understanding of Wappler and its core components. Here's my knowledge breakdown:
Understanding of Wappler
Wappler is a visual, low-code/no-code IDE for full-stack web and mobile app development. It combines:
Visual development with drag-and-drop UI building
Real-time visual data binding to backend services
Direct code editing alongside visual tools
Integrated deployment and version control
Key Components I Understand:
1. Server Connect
Backend API & Database integration layer
Visual configuration of database connections (MySQL, PostgreSQL, etc.)
Auto-generation of REST APIs from database schemas
Architecture planning for Server Connect data models
API design and endpoint structure
Data binding strategies in App Connect
Component composition and reuse patterns
Authentication flows and user management
Database schema design and optimization
Performance optimization strategies
Troubleshooting common Wappler issues
Best practices for maintainable Wappler projects
Integration patterns with external services
Limitations to Note:
I cannot directly execute Wappler-specific commands
I don't have real-time access to the Wappler interface
Some visual configuration specifics might require screenshots for precise guidance
Project setup details might need Wappler documentation reference
Example Areas I Can Help:
1. Designing a Server Connect data model for a multi-user app
2. Creating efficient App Connect bindings for complex UI
3. Implementing proper authentication/authorization flows
4. Optimizing database queries and API responses
5. Structuring reusable components
6. Planning deployment strategies
Yes, I have sufficient knowledge to be viable for assisting with Wappler projects. Many developers use me as a "thinking partner" for architectural decisions, troubleshooting, and learning best practices while working in Wappler.
Would you like help with a specific Wappler project or concept?
Thanks for the thoughtful response @Cheese ! (I assume by latest version of DeepSeek you must be referring to 3.2.) I appreciate the votes on both that and Qwen; on your prior projects, approximately what param count / quantization were you running, if you remember?
I logged our optimum settings as it took a while to figure it all out. Probably not too useful, but DeepSeek on dual H200s (Digital Ocean) ran best as follows:
That gave us on average between 150 and 250 tokens per second per GPU. I know that is quite a phenomenal number! Especially now that we use Open Router or Copilot. Man, they feel soooooo slow. Just can't afford to sit on 2 x H200s 24/7 hahaha. One day!!
Yes that is the most recent. The response I posted above (regarding Wappler) is from 3.2.
EDIT: I must add the support Digital Ocean provided us was amazing! Cannot fault those guys in any way whatsoever. Without them we would have been dead in the water, as it was such a steep learning curve. All the questions we asked, and we asked hundreds, were answered within minutes by their team.
Excellent info, thanks again @cheese for sharing a bit of what you guys did. Knowing what context+param targets to look for from someone who’s done it helps a lot in VRAM fitting, vs just blundering about pulling 100GB+ models on end.
We’re going on-prem because the ultimate goal is to wedge some business-knowledge RAG between Wappler and the local LLM. This is totally doable in D.O. as well, of course, but since the stack is going to be near-always-on for other business use, at the D.O. billing rate we’d hit the cost of the hardware outlay in less than 3 months!