Local LLMs
AI tools aren’t just for billion-dollar companies anymore. You can now run large language models (LLMs) and image generators entirely on your own machine - no cloud required, no API keys, no monthly fees.
That unlocks a ton of possibilities: automation, data processing, scripting help, private knowledge bases, or just tinkering. This page breaks down how local LLMs work, how they compare to cloud-based tools like OpenAI, and how to get started without burning cash or GPU cycles you don’t have.
Why Run AI Locally?
Cloud LLMs like OpenAI’s GPT-4 are powerful and shockingly cheap (more on that below). But running a model locally has a few big advantages:
- Zero cost per query – Once it’s downloaded, it’s free forever
- Full privacy – Your prompts and data never leave your machine
- Offline access – Use it even without internet access
- Unlimited usage – Run hundreds or thousands of interactions a day
You don’t need a data center or $4,000 GPU. If you’ve got an M1-M4 Mac or a midrange NVIDIA GPU (RTX 3060 or better), you’re good to go.
Ollama: Your LLM Engine
Ollama is a command-line tool and background service that makes running local LLMs dead simple.
What It Does
- Downloads and runs open models like LLaMA 3, Mistral, or Phi-3
- Optimizes them for your CPU or GPU
- Exposes a local HTTP API you can script against (see the example below)
- Gives you simple CLI access (`ollama run llama3`)
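That local HTTP API is what makes scripting easy. Here’s a minimal sketch of a one-shot request - it assumes Ollama is running on its default port, you’ve already pulled `llama3`, and `jq` is installed for pulling the answer out of the JSON:

```bash
# One-shot generation against the local Ollama API
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Write a one-line description of cron.",
  "stream": false
}' | jq -r '.response'
```

Setting `"stream": false` returns a single JSON object instead of a stream of tokens, which is much easier to handle in scripts.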
Why It’s So Good
- Runs on macOS, Windows (WSL2), and Linux
- You can switch between models easily (`ollama run`, `ollama pull`)
- Supports chat history, model customization, and embeddings
Most 7B models need ~8–12GB RAM and around 4–6GB of disk space. You don’t need a monster PC, just a modern machine with some breathing room.
GPU acceleration is “all or nothing” though. If your GPU has 12GB of VRAM, you can’t GPU-accelerate a model that needs 16GB - even if you have 32GB of system RAM to spare. The entire model has to fit in VRAM; otherwise Ollama loads it into system RAM and runs it on the CPU instead, which is much slower.
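You can check which way it went with `ollama ps` once a model is loaded (the exact column layout may vary between Ollama versions, so treat this as something to verify on your install):

```bash
# List loaded models and see whether they landed on GPU or CPU
ollama ps
# The PROCESSOR column shows "100% GPU" when the model fits in VRAM,
# and "100% CPU" when Ollama fell back to system RAM.
```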
OpenWebUI: A Local ChatGPT UI
OpenWebUI gives you a web interface - just like ChatGPT - but it connects to your local Ollama models instead of a cloud service.
Features
- Chat history
- Code blocks, markdown rendering
- Model switcher
- System prompts / personas
You can run it with Docker or natively, and it runs on http://localhost:3000 by default. Super easy to use, especially if you want a visual layer on top of Ollama.
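If you go the Docker route, the invocation below is a sketch based on the OpenWebUI README at the time of writing - double-check the project’s docs for the current image name and flags:

```bash
# Run OpenWebUI in a container, pointed at Ollama on the host
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

The `--add-host` flag is what lets the container reach the Ollama service running on your host machine.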
Local Image Generation with Stable Diffusion
LLMs aren’t just for text. You can also run Stable Diffusion locally for image generation.
- On Mac: DiffusionBee (click-and-run GUI)
- Cross-platform: Automatic1111 Web UI
- NVIDIA GPU recommended, but Apple Silicon works too
Use this to:
- Create blog headers
- Generate icons and thumbnails
- Visualize ideas without touching Midjourney or DALL·E
Compare: Cloud vs Local
Let’s break down when to use each:
| Feature | Cloud LLM (OpenAI, Claude, etc.) | Local LLM (Ollama) |
| --- | --- | --- |
| Cost | Cheap ($0.001–$0.01/query) | Free after install |
| Privacy | Data leaves your machine | 100% private |
| Setup time | None | 10–30 mins + download |
| Speed | Fast | Slower on lower-end hardware |
| Best for | Real-time tasks, one-offs | Batch jobs, automation, private use |
| Model power | Best-in-class (GPT-4, Claude 3) | Good enough (LLaMA 3, Mistral, etc.) |
Example Use Cases for Local LLMs
Nightly Automation
- Process logs
- Clean up data
- Summarize reports
- Generate code or documentation
Run it all through a script that hits your Ollama API while you sleep. You don’t have to think about how much it’s going to cost, because it’s all local.
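Here’s a minimal sketch of what that nightly job could look like. The log path, model name, and output location are all placeholders, and it assumes Ollama is on its default port with `jq` available for safe JSON escaping:

```bash
#!/usr/bin/env bash
# summarize-logs.sh - hypothetical nightly job: summarize recent logs locally
LOGS=$(tail -n 200 /var/log/syslog)

# Build the JSON payload with jq so the raw log text is safely escaped,
# then send it to the local Ollama API and save the plain-text summary.
jq -n --arg prompt "Summarize these logs and flag anything unusual: $LOGS" \
  '{model: "llama3", prompt: $prompt, stream: false}' \
  | curl -s http://localhost:11434/api/generate -d @- \
  | jq -r '.response' > /tmp/log-summary.txt
```

Schedule it with cron (something like `0 3 * * * /path/to/summarize-logs.sh`) and the summary is waiting for you in the morning.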
Private Prompting
- Feed it sensitive data you don’t want in the cloud
- Local knowledge base with embeddings (see the sketch after this list)
- Air-gapped security environments
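The embeddings bullet maps to Ollama’s embeddings endpoint. A quick sketch - the model name here is just one example of an embedding-capable model you’d pull first:

```bash
# Get an embedding vector for a piece of text from a local model
curl -s http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Internal runbook: how we rotate the VPN certificates"
}' | jq '.embedding | length'   # prints the vector's dimension
```

Store those vectors in SQLite or a local vector database and you have a searchable knowledge base that never touches the cloud.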
Controversial Prompting
- Public LLMs have content filtering that won’t let you ask certain questions
- If you’re doing cybersecurity or Red Team work, for example, you can ask “hacking” questions that would never be allowed on public LLMs
- Ollama has “uncensored” models that allow those questions, or you can override the content filters on models you run locally and create custom, uncensored variants
Dev Productivity
- Use local Copilot-style tools with Vim, Emacs, or VS Code. Ollama has coding models.
- Generate test cases, docstrings, or config boilerplate (see the example below)
- Write CLI helpers and bash completions
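For quick one-offs, you can pass a whole file into `ollama run` straight from the shell. A sketch - `codellama` is one coding model available through Ollama, but any coding model you’ve pulled works, and the filename is a placeholder:

```bash
# Ask a local coding model to document a file, passing it in with the prompt
ollama run codellama "Write a concise docstring for each function in this file: $(cat utils.py)"
```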
OpenAI: Still Worth Using?
Absolutely. If you’re building something public, or just need higher quality output without worrying about compute or model tuning, OpenAI is still a great choice.
To really set expectations: locally you’ll mostly be running 7b models - unless you’ve spent $2,000+ on a higher-end GPU. OpenAI’s models are more like the 70b or 175b class, which are much more powerful. A higher parameter count doesn’t mean the model knows more facts; it’s really an indicator of how “smart” the model is - how well it understands your prompt and how useful the responses are.
- GPT-3.5 is free on chat.openai.com
- GPT-4-turbo is $0.01–$0.03 per 1,000 tokens (still fractions of a penny for a typical query)
- No setup required
- The best model quality today (as of 2024)
You can use both. Local for automation and tinkering. Cloud for polish and production.
Getting Started
If you want to try this stuff today:
- Install Ollama → ollama.com/download
- (Optional) Install OpenWebUI → GitHub link
- Try `ollama run llama3` to start a local chat
- Create a shell script that hits http://localhost:11434/api/generate
- (Optional) Try DiffusionBee or Automatic1111 for local image generation
You can wire these tools into shell scripts, VS Code tasks, cron jobs, or anything else in your workflow. They’re just APIs and local services. No mystery.
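For instance, a tiny wrapper script (the name and model are placeholders) turns any local model into an everyday command-line utility:

```bash
#!/usr/bin/env bash
# ask - query a local model from anywhere in your shell: ask "your question"
jq -n --arg prompt "$1" '{model: "llama3", prompt: $prompt, stream: false}' \
  | curl -s http://localhost:11434/api/generate -d @- \
  | jq -r '.response'
```

Drop it in your PATH, `chmod +x` it, and `ask "explain this awk one-liner"` works offline, for free, forever.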
Summary
Local LLMs give you AI power without the cost. Whether you’re automating your homelab, summarizing logs, or generating images, you don’t need cloud services to do serious AI work.
- Use Ollama for fast, free, local text generation
- Add OpenWebUI if you want a web-based chat UI
- Run Stable Diffusion locally for images and thumbnails
- Choose local for privacy, cost, and automation; choose cloud for speed and polish
- These tools are free, powerful, and just getting better
Just download, run, and start building.