How to run a local LLM with LM Studio inside Wappler?

Hi,

I’d like to integrate a local Large Language Model running in LM Studio with Wappler’s built-in AI features (AI Tools / AI Chat).

Instead of using OpenAI or other cloud providers, I want Wappler to send prompts directly to a model hosted locally via LM Studio’s API.

Has anyone done this before?

  • Which API endpoint from LM Studio should be used in Wappler’s AI provider settings?
  • Do I need to configure a custom AI Provider in Wappler for this?
  • Any tips for handling authentication, model selection, or streaming responses?
  • Are there CORS or HTTPS issues when connecting Wappler to a local LLM server?

If you have experience connecting Wappler to a local LLM, please share the setup steps.

Thanks!

1 Like

In your Wappler options, under AI, set the Chat Provider to Custom.

Then set the Base URL to the address where the LM Studio server is running, e.g. http://localhost:1234/v1.

LM Studio as a Local LLM API Server | LM Studio Docs
OpenAI Compatibility API | LM Studio Docs
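
If you want to sanity-check the endpoint outside Wappler first, here is a minimal sketch using the OpenAI Python client pointed at LM Studio's local server. The model name is an assumption (use whatever identifier LM Studio shows for the model you have loaded), and the API key is a placeholder, as the local server should not require authentication by default.

# Quick check that the LM Studio server answers OpenAI-style requests before
# pointing Wappler's Custom AI provider at it. Requires `pip install openai`
# and a model already loaded in LM Studio.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # placeholder; the local server ignores the key by default
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # assumption: replace with your loaded model's identifier
    messages=[{"role": "user", "content": "Say hello from LM Studio."}],
)
print(response.choices[0].message.content)

If that prints a reply, the same Base URL should work in Wappler's Custom provider settings; streamed responses go through the same /v1/chat/completions endpoint.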

2 Likes

Thanks @patrick :star_struck:

Hi Patrick, look at this! Which model do you recommend using in Wappler on a MacBook M4 Pro with 24 GB of RAM?

You can increase the context length in LM Studio for Qwen models (up to one million tokens with Qwen2.5-7B-Instruct-1M / 14B-Instruct-1M, and 128K tokens with Qwen2.5-7B-Instruct). A quick search should reveal how to do this, @AdrianoLuiz

Can You Increase the Context Length?

1. Model-dependent support

  • Some Qwen2.5-Instruct models (such as the standard Instruct variants) typically support up to 32K tokens natively.
  • However, the Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M models support up to 1 million tokens, a huge increase, using methods such as Dual Chunk Attention and length extrapolation. (Sources: Hugging Face, Bind AI IDE, Qwen, Simon Willison's Weblog)

2. Enabling Extended Context via YaRN

  • For Qwen2.5-7B-Instruct, you can extend the context window from 32K to 128K tokens by enabling YaRN, a RoPE scaling technique (see the sketch after this list). (Source: GitHub)
  • However, this currently uses static YaRN (available via vLLM): it does extend the context, but it may slightly reduce performance on short-context tasks. (Source: GitHub)
  • Dynamic YaRN (which adjusts scaling based on input length) is currently only available via the Alibaba Cloud Model Studio API. (Source: GitHub)
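
For reference, the Qwen2.5 model card describes enabling static YaRN by adding a rope_scaling block to the model's config.json; this applies when serving the original Hugging Face weights (for example via vLLM or Transformers) rather than a GGUF build inside LM Studio. A minimal sketch, assuming config.json sits in the local model folder and a factor of 4.0 to scale the native 32K window to roughly 128K:

import json

# Hedged sketch: add the static-YaRN rope_scaling block described in the
# Qwen2.5 model card to a local copy of the model's config.json.
with open("config.json") as f:
    cfg = json.load(f)

cfg["rope_scaling"] = {
    "factor": 4.0,                              # 32768 * 4 is roughly 131072 tokens
    "original_max_position_embeddings": 32768,  # the model's native window
    "type": "yarn",
}

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)

As noted above, this is static scaling, so short-context quality may dip slightly.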

3. LM Studio capabilities

  • LM Studio allows customizing context length when loading a model. For example:
import lmstudio as lms
model = lms.llm("qwen2.5-7b-instruct", config={
    "contextLength": 8192,
    # other settings...
})
  • You can set contextLength, ropeFrequencyBase, ropeFrequencyScale, and other advanced parameters. (Source: LM Studio docs)
  • However, some users have encountered errors when trying to load models with very large context lengths, even if the model claims to support them, depending on your hardware constraints. (Source: GitHub)

Summary Table

Model / Setting | Native Context Support | Extended Support
Qwen2.5-7B-Instruct | 32K tokens | Up to 128K via static YaRN
Qwen2.5-7B-Instruct-1M / 14B-Instruct-1M | Up to 1M tokens | Fully supported via DCA, etc.
LM Studio configuration | Depends on hardware | Customizable via contextLength, ropeFrequencyScale, etc.

What You Can Do

  1. Check the model's context length in LM Studio using model.getContextLength() before running. (Source: LM Studio docs)
  2. If you're using a 1M-capable model, you can indeed set contextLength up to 1,000,000 in your load config, assuming your GPU/VRAM can handle it (e.g. 120 GB+ for 7B-Instruct-1M, 320 GB+ for 14B-Instruct-1M). (Sources: Simon Willison's Weblog, Bind AI IDE)
  3. If you're on the standard Qwen2.5-7B-Instruct and want to push beyond 32K tokens, you can enable YaRN via the model config, with the caution that it may impact short-context performance. (Source: GitHub)
  4. Test different configurations and monitor memory consumption. If loading fails at very high values, lower the context length until it is stable (see the sketch after this list).
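
A minimal sketch of that probing approach, reusing the lmstudio Python SDK call shown earlier (the model name is an example, and the exact exception types raised on a failed load may vary by SDK version):

import lmstudio as lms

# Hedged sketch: try progressively smaller context lengths until the model loads.
candidate_lengths = [1_000_000, 262_144, 131_072, 65_536, 32_768]

for ctx in candidate_lengths:
    try:
        model = lms.llm("qwen2.5-7b-instruct-1m", config={"contextLength": ctx})
        print(f"Loaded successfully with contextLength={ctx}")
        break
    except Exception as exc:  # loading can fail when VRAM/RAM is insufficient
        print(f"contextLength={ctx} failed: {exc}")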
2 Likes

thanks

1 Like

Resurrecting this thread to see which open-weight models the Wappler community has had the most success with (for those who have sufficient VRAM to run locally). If you have a beefy local dev rig (>24 GB of VRAM), which models have you found most “knowledgeable” about the Wappler framework? GPT-OSS-120B, GLM-4.5 Air, Qwen3 Coder, Codestral, etc.? I'd appreciate any input from those who've maybe tried a bunch.

My company's thinking about picking up an RTX 6000 Pro for local inference, so I'm interested to know what's best to throw at it for Wappler specifically.

Before going all in, I'd suggest trying Digital Ocean's services for AI/ML workloads and playing around with it all (rent by the hour). It will give you quick answers without the initial investment.

For us the DeepSeek and Qwen models worked best (although I hear GPT-OSS-120B can be quite good, that model had not yet been released when we went in on it). DeepSeek has recently released their latest version, which looks very promising, but I have yet to play around with it, so I don't really know any more about its full capabilities other than what is published (you can head on over to DeepSeek and find out for yourself). The previous incarnations did have a reasonable understanding of Wappler, though. We put together some nice projects with both of these models. Lots of errors and issues, but we ironed them out in the end. As the models are open, at least you don't have to pay for the failures. Perseverance and keeping at it will get you there in the end.

Seems pretty clued up... I went ahead and asked about Wappler and here is the response:


Hey Deepseek what is your understanding of Wappler? Do you have enough knowledge of Server Connect and App Connect and structures to be viable to use in creating Projects using Wappler as the IDE?


Hmm, this is a specific technical question about Wappler and its components. The user is asking about my understanding of Wappler and whether I have sufficient knowledge of Server Connect and App Connect to assist with projects.

I should confirm my awareness of Wappler as a low-code/no-code IDE and demonstrate familiarity with its key features. The user likely wants to know if I can provide practical guidance, not just general information.

I recall that Server Connect handles backend integrations while App Connect manages frontend interactions. A structured breakdown of these components would be helpful. Since Wappler projects involve specific patterns like data bindings and REST API generation, I should mention those too.

The user might be evaluating whether to use me as a resource for Wappler development, so I should emphasize my ability to assist with architecture, troubleshooting, and best practices while acknowledging some limitations regarding real-time testing. A concrete example could make this more tangible.

Yes, I have a solid understanding of Wappler and its core components. Here's my knowledge breakdown:

Understanding of Wappler

Wappler is a visual, low-code/no-code IDE for full-stack web and mobile app development. It combines:

  • Visual development with drag-and-drop UI building
  • Real-time visual data binding to backend services
  • Direct code editing alongside visual tools
  • Integrated deployment and version control

Key Components I Understand:

1. Server Connect

  • Backend API & Database integration layer
  • Visual configuration of database connections (MySQL, PostgreSQL, etc.)
  • Auto-generation of REST APIs from database schemas
  • Model definitions and relationships
  • Authentication/authorization setup (user management, roles, permissions)
  • Server actions for custom business logic
  • File storage and asset management configuration
  • Cron jobs and scheduled tasks
  • Webhook handling

2. App Connect

  • Frontend data binding and state management
  • Visual binding of UI elements to data sources
  • Component properties and event handling
  • Form handling with validation
  • Dynamic content loading and filtering
  • Real-time updates via WebSockets
  • Mobile app capabilities (with Capacitor integration)
  • Offline functionality and data sync

3. Project Structures

  • Standard Wappler project organization
  • Resource Manager for assets and dependencies
  • Component libraries and custom components
  • Page templates and layouts
  • Theme system with CSS variables
  • Build configurations for different environments
  • Deployment targets (Docker, shared hosting, cloud services)

How I Can Assist with Wappler Projects:

I can help with:

  • Architecture planning for Server Connect data models
  • API design and endpoint structure
  • Data binding strategies in App Connect
  • Component composition and reuse patterns
  • Authentication flows and user management
  • Database schema design and optimization
  • Performance optimization strategies
  • Troubleshooting common Wappler issues
  • Best practices for maintainable Wappler projects
  • Integration patterns with external services

Limitations to Note:

  • I cannot directly execute Wappler-specific commands
  • I don't have real-time access to the Wappler interface
  • Some visual configuration specifics might require screenshots for precise guidance
  • Project setup details might need Wappler documentation reference

Example Areas I Can Help:

  1. Designing a Server Connect data model for a multi-user app
  2. Creating efficient App Connect bindings for complex UI
  3. Implementing proper authentication/authorization flows
  4. Optimizing database queries and API responses
  5. Structuring reusable components
  6. Planning deployment strategies

Yes, I have sufficient knowledge to be viable for assisting with Wappler projects. Many developers use me as a "thinking partner" for architectural decisions, troubleshooting, and learning best practices while working in Wappler.

Would you like help with a specific Wappler project or concept?


Thanks for the thoughtful response @Cheese! (I assume by latest version of DeepSeek you must be referring to 3.2.) I appreciate the votes on both that and Qwen. On your prior projects, approximately what parameter count / quantization were you running, if you remember?

I logged our optimum settings, as it took a while to figure it all out. Probably not too useful, but DeepSeek on dual H200s (Digital Ocean) ran best as follows:

Parameters: 350/400B models.
Quantization: 4-bit (AWQ/GPTQ).
Context: 32,000-64,000 tokens.

That gave us, on average, between 150 and 250 tokens per second per GPU. I know that is quite a phenomenal number! Especially now that we use OpenRouter or Copilot, man, they feel soooooo slow. We just can't afford to sit on 2 x H200s 24/7, hahaha. One day!!
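
For anyone trying to translate those settings into hardware, a rough back-of-the-envelope sketch (my own assumptions, not figures from the post above: roughly 141 GB of HBM per H200, 0.5 bytes per parameter at 4-bit, and KV cache/runtime overhead ignored):

# Hedged, approximate arithmetic only: do the 4-bit quantized weights of a model
# of a given size fit in aggregate GPU memory? KV cache and overhead are ignored.
def weights_gb(params_billion: float, bits: int = 4) -> float:
    return params_billion * 1e9 * (bits / 8) / 1e9  # bytes per parameter, converted to GB

H200_GB = 141  # approximate HBM3e capacity of a single H200
for params in (350, 400):
    need = weights_gb(params)
    print(f"{params}B @ 4-bit ~ {need:.0f} GB of weights; fits on 2 x H200 ({2 * H200_GB} GB): {need <= 2 * H200_GB}")

On those assumptions the weights alone come to roughly 175-200 GB, which leaves headroom on a 282 GB pair of H200s for the 32,000-64,000 token context mentioned above.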

Yes that is the most recent. The response I posted above (regarding Wappler) is from 3.2.

EDIT:
I must add that the support Digital Ocean provided us was amazing! I cannot fault those guys in any way whatsoever. Without them we would have been dead in the water, as it was such a steep learning curve. All the questions we asked, and we asked hundreds, were answered within minutes by their team.

1 Like

Excellent info, thanks again @cheese for sharing a bit of what you guys did. Knowing what context and parameter targets to look for from someone who's done it helps a lot with VRAM fitting, versus just blundering about pulling 100 GB+ models one after another.

We're going on-prem because the ultimate goal is to wedge some business-knowledge RAG between Wappler and the local LLM. This is totally doable in D.O. as well, of course, but since the stack is going to be near-always-on for other business use, at the D.O. billing rate we'd hit the cost of the hardware outlay in less than 3 months!

Thanks again!

1 Like