Local AI Tool To Analyse Code

Antony · September 14, 2024, 9:28am

Hi Wapplers!

I want to tidy up the code base of my app… and I’m curious to see if AI can help me.

However, the code contains passwords, etc., so I don’t want to use a web-based product that might learn that information.

Does anyone have experience of AI that runs locally to do this kind of thing?

Antony · September 14, 2024, 9:33am

Here is ChatGPT’s take on it!

Yes, there are several AI solutions you can run locally on your machine without an internet connection to analyze the code of a web app you’ve developed. Here are some options:

OpenAI’s GPT-2

• GPT-2 can be downloaded, fine-tuned, and run locally (GPT-3 and later OpenAI models cannot, because their weights have not been publicly released). GPT-2 is far less capable than GPT-4 and offers only limited code understanding, even if you fine-tune it for your specific use case.
• Requirements: High-performance hardware with sufficient CPU, RAM, and possibly GPU for efficient inference.
• Installation: Using tools like Hugging Face’s transformers library, you can download GPT-2 once and then run it without an internet connection.

Code LLMs and Transformers

• Models specifically trained for code, like CodeBERT or PolyCoder (an open-source alternative to OpenAI’s Codex, which was never released for local use), can be installed and run locally. These models can assist with code understanding and completion.
• PolyCoder is an example of an open-source code model that can run without an internet connection once downloaded.

Open-Source Language Models (LLMs) for Code Analysis

• Hugging Face Transformers: Hugging Face hosts many pre-trained models that can be used for code analysis, including models for specific programming languages. You can download these models and run them locally.
• Requirements: Installing these on your machine involves setting up Python environments, downloading models, and using them offline. Some models might need a GPU for efficient operation.

Static Code Analysis Tools (AI-Augmented)

• SonarQube: A popular open-source tool for static code analysis that can be run locally. While it doesn’t use the same form of AI as language models, it provides deep code analysis, bug detection, and code quality monitoring.
• Semgrep: A lightweight, fast static analysis tool that supports custom rules and can analyze code without an internet connection.
• Copilot-like Tools: Tools such as Tabnine (proprietary rather than open source, but able to run offline) can be set up to assist with code completion, understanding, and pattern recognition.
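Semgrep's custom rules are written in its own YAML rule format, which is not reproduced here. As a much cruder illustration of the same rule-based idea, and of checking for the hardcoded passwords mentioned in the opening post before code is shared with any AI tool, the Python sketch below flags lines that assign a string literal to a name like `password` or `api_key`. The function name and patterns are hypothetical; they will miss many cases and produce some false positives, so a real scanner remains the better choice.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately simple patterns; real scanners ship far richer rule sets.
SECRET_PATTERNS = [
    # A secret-like name (optionally quoted, as in JSON), then ":" or "=", then a quoted string.
    re.compile(r"""(?i)\b(password|passwd|secret|api_?key|token)\b["']?\s*[:=]\s*["'][^"']+["']"""),
]


def find_hardcoded_secrets(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that look like hardcoded credentials."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    sample = 'x = 1\npassword = "hunter2"\n"api_key": "abc123",\nuser = os.environ["DB_USER"]\n'
    for lineno, line in find_hardcoded_secrets(sample):
        print(f"line {lineno}: {line}")
```

Lines that read the value from the environment (the last line of the sample) are not flagged, because no string literal is assigned to a secret-like name.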
Local LLMs for Specialized Code Understanding

• StarCoder: This is an open-source large language model designed for coding tasks that can be run locally. It supports multiple programming languages and provides capabilities for code generation and understanding.
• Installation: Tools like transformers or llama.cpp can be used to run these models on your machine, assuming you have enough computational resources.

Machine Learning Toolkits for Code Analysis

• PyTorch or TensorFlow: If you are more inclined toward building your own custom AI solution, you can train a machine learning model locally using frameworks like PyTorch or TensorFlow. This would give you full control over the model and the data it’s trained on, and no internet connection would be needed.

Considerations

• Hardware: Many AI models require significant hardware resources, especially for real-time analysis. Depending on the complexity of your web app and the AI solution you choose, you might need a machine with a good CPU, at least 16 GB of RAM, and potentially a GPU.
• Model Size: Some models are quite large and require substantial disk space, so ensure you have enough storage available.
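The "Model Size" point can be made concrete with a back-of-the-envelope estimate: the weights alone take roughly (number of parameters) × (bytes per parameter). The sketch below assumes 4 bytes per parameter at fp32, 2 at fp16, 1 at 8-bit and 0.5 at 4-bit quantization; the precision labels and the 7-billion-parameter example are illustrative assumptions, and real downloads and memory use will be somewhat larger (quantization metadata, context cache, runtime overhead).

```python
# Assumed bytes per parameter for common precisions (weights only, no overhead).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_size_gb(num_params: float, precision: str = "fp16") -> float:
    """Approximate size of the model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # A hypothetical 7-billion-parameter code model at different precisions.
    for precision in ("fp16", "int8", "int4"):
        print(f"7B model, {precision}: ~{weight_size_gb(7e9, precision):.1f} GB")
```

This is why quantized (e.g. 4-bit) builds are popular for running code models on an ordinary workstation: the same model needs roughly a quarter of the fp16 storage and memory for its weights.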

If you provide more details about the kind of code analysis you’re looking for (bug finding, performance, security, etc.), I can suggest more specific tools.

Hyperbytes · September 14, 2024, 11:47am

Bigger question. Where and why?

If you need to use passwords in your code, I strongly suggest you store them as environment (ENV) variables and reference them, rather than embedding them in the code.
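A minimal sketch of that advice in Python, assuming a hypothetical variable name `DB_PASSWORD`: the code only names the variable, and the value is supplied from outside (the shell, a hosting control panel, or a `.env` file kept out of version control). In a Node.js app such as a Wappler project, the same value would be read from `process.env.DB_PASSWORD`. The helper name `get_secret` is also hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def get_secret(name: str) -> str:
    """Read a secret from the environment, failing loudly if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # DB_PASSWORD is a hypothetical name; its value never appears in the source code.
    try:
        db_password = get_secret("DB_PASSWORD")
        print("Database password loaded from the environment")
    except MissingSecretError as err:
        print(err)
```

Failing at startup when the variable is missing is a deliberate choice: it surfaces a misconfigured environment immediately instead of letting the app run with an empty password.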

Cheese · September 14, 2024, 3:40pm

I think you need to ask AI about your current application's logic and how to improve it. Asking AI to analyse your code is no good if you don't understand what it returns or how to implement the response. What if AI tries to rewrite a chunk of code for the sake of efficiency at the cost of functionality, i.e., breaking Wappler integration? Of course, this assumes you have managed to train the model on the codebase in the first place (which is an almighty task in itself)...

Apple · September 14, 2024, 5:12pm

I'm afraid Wappler apps follow a different approach to programming: they store steps in JSON files instead of actual code. It's likely the LLM will hallucinate non-existent steps or similar, and then you end up with "broken" JSON when you try to use the Wappler editor.

iamsoops · September 14, 2024, 5:21pm

I've been learning as I go and this is something I'll need to go back and sort when my app is live.

I've seen this mentioned a few times but wanted to understand a bit more.

Why is storing as an env more secure and should the env variable be clear text or does it need to be hashed/encrypted somehow?

Hyperbytes · September 14, 2024, 5:25pm

Put simply, due to the way Node works, the ENV settings are in a file which cannot be accessed from anywhere but the application scripts themselves, so if someone was able to harvest files from your site, they still would not get the settings.

Apple · September 14, 2024, 5:37pm

Not sure if that's the best explanation. If someone manages to harvest the site files, they could also get the environment variables file...

In my view, the concern is storing passwords together with the source code, e.g. in a Git repository. You don't want to share production passwords across developers (but Wappler's environment variables editor still makes this "mistake").

brad · September 14, 2024, 6:48pm

Isn't this dangerous as anyone can inspect the code in a browser? I can't think of any practical reason to include passwords in the code. Yikes!

But back on the AI track. I wouldn't want AI to overwrite any of my Wappler code. Wappler already creates decent optimized code and I would see no reason to mess with it.

Hyperbytes · September 14, 2024, 7:02pm

Perhaps, but that's why I said "put simply".
The point was that it is not in a publicly accessible area. Let's face it, if your entire site is harvested, you are screwed regardless of what you do.

JonL · September 15, 2024, 10:00am

Passwords, API keys, and other sensitive information should never be stored in the code, for four main reasons as I see it:

Security: Storing sensitive information directly in code puts it at risk of unauthorized access if the code is compromised. This could lead to data breaches, account takeovers, or other security incidents.
Compliance: Many regulatory standards and best practices require that sensitive data be properly protected. Storing such information in code often violates these compliance requirements. I don't know if you are aiming for compliance, but doing things right from the start gets you compliance for free in the future.
Urgency: When sensitive information needs to be changed quickly (e.g., in response to a security incident or routine rotation), having it hardcoded in the application requires a code change, redeployment, and potentially service interruption. Storing these externally allows for rapid updates without touching the codebase.
Maintainability: Hardcoding sensitive information makes it difficult to update or rotate credentials without changing the code itself. This can lead to outdated credentials and makes it harder to manage access across different environments (development, staging, production).

Do future you a favour.

Jon out.

Antony · September 15, 2024, 12:49pm

@teodor, could you fork off the discussion about passwords to a different topic please?

(I presume I can't create such a fork myself?)

Notum · September 15, 2024, 1:37pm

Quote: "... and I'm sorry, but I don't find your comment amusing."

JonL · September 15, 2024, 4:17pm

He never found being told what to do amusing. That’s precisely why one would do it.

Antony · September 15, 2024, 8:39pm

I've always been curious about the difference between Human Intelligence and Artificial Intelligence.

I think this post is helping me to learn, so thank you all.

Ask AI and you get a straight, to-the-point answer.

Ask HI and you get a strange mix of sarcasm and totally unsolicited advice... and 13 replies later, no attempt to actually answer my question.

I wonder where the future of intelligence lies on planet Earth...

Apple · September 15, 2024, 9:49pm

I've briefly used it for chatting, there's a web UI somewhere. I don't know anything code-specialized, but there are some code-focused models. I use Cody in VSCode for assistance with real code, but it's not self-hosted.

JonL · September 16, 2024, 7:40am

Who says we are humans?


You are in the wrong place if you don't want unsolicited advice... that is, by definition, "the internet". A place where eventually you will get your ass handed to you.

Also:

Your computer sucks heavily for that kind of thing.

Notum · September 16, 2024, 7:49am

Wait... Isn't this forum without humor and "healthy sarcasm"?

JonL · September 16, 2024, 8:05am

Nope. That is two doors away. This is the "passive-aggressive" forum.

As Antony is way fonder of ChatGPT's replies, he might want to ask:

"Hey Siri, what would https://www.workshop-angel.com users think of a product architect that hardcodes passwords (let's assume not user passwords, but system/API) in the source? And how do you think these users feel about how their data and sensitive information is being handled?"

Nah, I will do it.

Why ChatGPT would want to keep a memory of that amuses me.

Notum · September 16, 2024, 8:32am

I have a better take. I've asked Grog what he thinks of having passwords hardcoded.

I'm sure the users of Workshop Angel would just love to know that their product architect is a master of security... said no one ever. Hardcoding passwords in the source code? That's like leaving the front door of your house wide open and inviting the neighborhood kids to come on in and help themselves to your snacks.

I mean, what could possibly go wrong? It's not like someone could stumble upon the code, find the hardcoded passwords, and use them to access your system. Oh wait, that's exactly what could happen. And when it does, you can just blame it on the users for not being tech-savvy enough to understand the importance of security.

But seriously, folks, hardcoding passwords is a huge no-no. It's like putting your credit card information on a sticky note and leaving it on your desk. It's just asking for trouble. And when that trouble comes knocking, you can bet your bottom dollar that the users of Workshop Angel will be the ones who have to deal with the fallout.

So, to answer your question, I think the users of Workshop Angel would be absolutely livid if they found out that their product architect was hardcoding passwords in the source code. They would be like, "What is wrong with you people?! Do you not care about our data and sensitive information at all?!"

And as for how they would feel about how their data and sensitive information is being handled, I think they would be feeling a mix of emotions. They would be feeling frustrated, angry, and betrayed. They would be thinking, "How could we trust you with our data when you're not even taking basic security measures to protect it?"

So, to all the product architects out there, let this be a lesson to you: hardcoding passwords is a huge mistake. It's a security risk, and it's a betrayal of trust. Don't do it. Just don't.

But I'm thinking, maybe someone just wants to add some spice to their everyday life as a web developer (by having their passwords hardcoded).

As the great '90s band Bloodhound Gang said:

You can't have no fun with an unloaded gun
My dad says that's for pussies

https://www.youtube.com/watch?v=5jfcAzLdJzA

Someone might find the song's thumbnail a bit NSFW, which is why I'm posting it as a link.