I installed and configured one of the NIVIDA Spark boxes. Loaded several different LLMs on it and was able to successfully communicate with it through wappler API calls.
Might want to post what your api call looks like? You know is your authentication configured correctly? Any firewall blocking issues. It could be all kinds of issues. We just need a little more information on what you are doing exactly.
I ran the following on the server:
python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-72B-Instruct-AWQ --dtype float16 --quantization awq_marlin --gpu-memory-utilization 0.90 --max-model-len 32768 --host 10.80.3.3 --port 8000
btw, If the path to the AI server is specified incorrectly, Wappler freezes at startup (and sends a report about this to the server after crash). I had to manually edit the options.json file to get Wappler to start.
When the LLM use an Open AI compatible API it should work fine.
I believe the server you've setup is not configured correctly, you should set those 2 required options from the error message. I have no experience with vLLM, you should check their documentation on how to configure.
I fixed the VLLM, so now I can type “Hi” into the Wappler AI assistant, but after that, Wappler tries to send more than 32,768 tokens to the model, and the model crashes. I increased the model's input token limit to 65,535, and it started working. However, the model returned an error; though it did warn right away that it only works with 32k tokens. Now it’s clear why Wappler burned through $20 in ChatGPT in 15 minutes
I have an RTX PRO 6000 with 96 GB of memory. Is this card powerful enough to run the LLM for Wappler? Or do I need 8 B300s?
I’ll keep experimenting and let you know when something works.
The initial message includes the system prompt which contains all the base knowledge and guidelines for the open project that the LLM should know. If the LLM backend does the caching correctly the follow up calls should cost a lot less and with a local model it doesn't cost anything except GPU power and memory. The needed GPU depends on the LLM model that is being used.
I'm not able to reproduce the issue, when having an invalid url/path to the AI server it doesn't freeze or crash.
I looked up the error reports that are send to sentry and noticed that you had several crashes with Out of Memory error. I didn't see in the data that much memory was used, I believe it was around 500Mb and enough free memory. Also didn't notice any loops or so in the stack trace. Can't say what exactly caused it, I've counted 4 of these errors and they seem to happen when the V8 engine is doing Garbage Collection.
Will do some more testing but not sure if the Out Of Memory crashes and incorrect AI url are related.
I tried again. I didn't run LLM at all. In Wappler, I entered the wrong port in the URL— I put 8080 instead of 8000. I closed Wappler. I opened it again. Wappler froze and started using 16% of the CPU; memory usage jumped from 2 GB to about 7 GB in roughly 2–3 minutes, and Wappler threw an error and sent you a report. Not a maximum memory was used. Still 30% free.
I set the port back to the correct one, and Wappler started running. I'm not sure if vLLM is working right now because it's currently downloading a new model, and I don't think it should be responding to API requests, but I'm not sure.
I tried to talk to the AI agent, but it returned a connection error. The LLM is definitely not running. It turns out that in this case, Wappler freezes if the port in the URL is incorrect in the settings.
Checked error reports and have a new error now, it seems that in your case it indeed comes in a loop with the AI manager. I will try to fix it for the next update.
Qwen3-32B worked with Wappler, but when I gave it a sufficiently large task, the LLM crashed. Next, I used Qwen3-coder-30B-A3B-Instruct and gave it the same task (an 18-KB text description of a 10-page website). After that, Wappler wrote:
"I'll create a modern website structure for the IT business company based on your requirements. Let me start by examining the current project structure and then implement the requested pages.
First, let me check what files already exist in the project:"
and then went silent. There were no further requests to LMM.
No errors, but nothing is happening.
However, such a large request using ChatGPT wasn’t the best idea; I’ll continue exploring how this can be used....
Is there anywhere I can view the log of how Wappler communicates with the LLM? So I can understand why it stopped working....
UPD:
I tried talking to the AI again:
"- Are you sleeping?
I'm awake and ready to help you create the website structure. Let me first check what files already exist in your project to understand the current structure."
Wappler AI loaded the LLM for literally 10 seconds and went back to sleep....
UPD2:
At least for now, I can chat with Wappler AI; it apologizes and says, “I'll get to work now,” but nothing happens....
Idk, no errors, everything is ok but nothing happend. I am new in LLM, will think how to check or fix this on llm side. But if some tool are not working properly better to have some message from Wappler too….
for prompt “create index page“ Wappler write:
”Failed to create index.ejs,
Failed to edit index.ejs
Failed to read index.ejs”
This is happend if index.ejs opened.