Local LLMs

How to run AI models on your own machine instead of relying on cloud services. Great for privacy, offline work, or just experimenting without API costs. We’ll show you what works and what’s more trouble than it’s worth.

AI tools aren’t just for billion-dollar companies anymore. You can now run large language models (LLMs) and image generators entirely on your own machine - no cloud required, no API keys, no monthly fees.

That unlocks a ton of possibilities: automation, data processing, scripting help, private knowledge bases, or just tinkering. This page breaks down how local LLMs work, how they compare to cloud-based tools like OpenAI, and how to get started without burning cash or GPU cycles you don’t have.


Why Run AI Locally?

Cloud LLMs like OpenAI’s GPT-4 are powerful and shockingly cheap (more on that below). But running a model locally has a few big advantages:

  • Zero cost per query – Once it’s downloaded, it’s free forever
  • Full privacy – Your prompts and data never leave your machine
  • Offline access – Use it even without internet access
  • Unlimited usage – Run hundreds or thousands of interactions a day

You don’t need a data center or $4,000 GPU. If you’ve got an M1-M4 Mac or a midrange NVIDIA GPU (RTX 3060 or better), you’re good to go.


Ollama: Your LLM Engine

Ollama is a command-line tool and background service that makes running LLMs locally dead simple.

What It Does

  • Downloads and runs open models like LLaMA 3, Mistral, or Phi-3
  • Optimizes them for your CPU or GPU
  • Exposes a local HTTP API (you can script against it)
  • Gives you simple CLI access (ollama run llama3)
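
If you’ve never touched it, the CLI is about as simple as it gets (model names are examples; browse the Ollama library for what’s available):

    # Download a model (one-time, a few GB)
    ollama pull llama3

    # Start an interactive chat in your terminal
    ollama run llama3

    # See which models you have installed
    ollama list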

Why It’s So Good

  • Runs on macOS, Windows (WSL2), and Linux
  • You can switch between models easily (ollama run or ollama pull)
  • Supports chat history, model customization, and embeddings

What About RAM and Disk?

Most 7B models need ~8–12GB RAM and around 4–6GB of disk space. You don’t need a monster PC, just a modern machine with some breathing room.

GPU memory is “all or nothing”, though. If your GPU has 12GB of VRAM, you can’t run a model that needs 16GB, even with 32GB of system RAM to spare. The whole model has to fit in VRAM; otherwise Ollama loads it into system RAM and runs it on the CPU instead, which is much slower.
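
If you’re not sure where a model ended up, newer Ollama releases can show you while it’s loaded (the exact output columns may vary by version):

    # Show loaded models and whether they're running on the GPU or the CPU
    ollama ps
    # Check the PROCESSOR column: "100% GPU" is what you want;
    # "100% CPU" means the model didn't fit in VRAM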


OpenWebUI: A Local ChatGPT UI

OpenWebUI gives you a web interface - just like ChatGPT - but it connects to your local Ollama models instead of a cloud service.

Features

  • Chat history
  • Code blocks, markdown rendering
  • Model switcher
  • System prompts / personas

You can run it with Docker or natively, and it serves at http://localhost:3000 by default. Super easy to use, especially if you want a visual layer on top of Ollama.
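
One common way to start it is with Docker; a rough sketch, assuming Ollama is already running on the same machine (check the OpenWebUI README for the current image name and flags):

    # Run OpenWebUI in a container and expose it at http://localhost:3000
    docker run -d -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -v open-webui:/app/backend/data \
      --name open-webui \
      ghcr.io/open-webui/open-webui:main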


Local Image Generation with Stable Diffusion

LLMs aren’t just for text. You can also run Stable Diffusion locally for image generation.

Use this to:

  • Create blog headers
  • Generate icons and thumbnails
  • Visualize ideas without touching Midjourney or DALL·E

Compare: Cloud vs Local

Let’s break down when to use each:

Feature      | Cloud LLM (OpenAI, Claude, etc)  | Local LLM (Ollama)
Cost         | Cheap ($0.001–$0.01/query)       | Free after install
Privacy      | Data leaves your machine         | 100% private
Setup time   | None                             | 10–30 mins + download
Speed        | Fast                             | Slower on lower-end hardware
Best for     | Real-time tasks, one-offs        | Batch jobs, automation, private use
Model power  | Best-in-class (GPT-4, Claude 3)  | Good-enough (LLaMA 3, Mistral, etc)

Example Use Cases for Local LLMs

Nightly Automation

  • Process logs
  • Clean up data
  • Summarize reports
  • Generate code or documentation

Run it all through a script that hits your Ollama API while you sleep. You don’t have to think about how much it’s going to cost, because it’s all local.
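
A minimal sketch of that kind of script, assuming Ollama is running on its default port and jq is installed (the log path, model, and output file are placeholders):

    #!/usr/bin/env bash
    # Nightly log summary via the local Ollama API - no cloud, no per-query cost
    LOG_FILE="/var/log/myapp/app.log"   # placeholder path
    MODEL="llama3"                      # any model you've pulled

    PROMPT="Summarize the errors and warnings in this log:
    $(tail -n 200 "$LOG_FILE")"

    curl -s http://localhost:11434/api/generate \
      -d "$(jq -n --arg m "$MODEL" --arg p "$PROMPT" '{model: $m, prompt: $p, stream: false}')" \
      | jq -r '.response' > /tmp/nightly-log-summary.txt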

Private Prompting

  • Feed it sensitive data you don’t want in the cloud
  • Build a local knowledge base with embeddings (see the sketch after this list)
  • Air-gapped security environments
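
For the embeddings piece, Ollama exposes a local endpoint you can script against. A small sketch, assuming you’ve pulled a dedicated embedding model first (nomic-embed-text is one option from the Ollama library):

    # Pull an embedding model (one-time)
    ollama pull nomic-embed-text

    # Turn a chunk of text into a vector, entirely offline
    curl -s http://localhost:11434/api/embeddings \
      -d '{"model": "nomic-embed-text", "prompt": "Quarterly revenue grew 12% in Q3."}' \
      | jq '.embedding | length'   # prints the vector dimension

Store the vectors wherever you like (SQLite, a vector database, flat files) and you’ve got a private semantic search layer.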

Controversial Prompting

  • Public LLMs have content filtering that won’t let you ask certain questions
  • If you’re doing cybersecurity or red-team work, for example, you can ask “hacking” questions that would never be allowed on public LLMs
  • Ollama has “uncensored” models that will answer those questions, or you can override a model’s default behavior and create your own custom, uncensored variants locally (sketch below)
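
The mechanics of a custom model are just a Modelfile layered on top of a base model you already have. A minimal sketch (the name and system prompt are illustrative; how much a base model will actually answer still depends on how it was trained):

    # Write a Modelfile that gives a base model your own system prompt
    echo 'FROM llama3' > Modelfile
    echo 'SYSTEM "You are a security research assistant supporting authorized red-team engagements."' >> Modelfile

    # Build and run the custom model
    ollama create redteam-assistant -f Modelfile
    ollama run redteam-assistant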

Dev Productivity

  • Use local Copilot-style tools with Vim, Emacs, or VS Code. Ollama has coding models.
  • Generate test cases, docstrings, or config boilerplate (example below)
  • Write CLI helpers and bash completions
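
One-shot prompts from the shell cover a lot of this. A small sketch, assuming you’ve pulled a coding model (codellama is one example, and utils.py is a placeholder file):

    # Ask a local coding model to draft docstrings for an existing file
    ollama run codellama "Add docstrings to every function in this Python file:
    $(cat utils.py)"

Because it’s non-interactive, you can wrap it in a git hook, a Makefile target, or an editor keybinding.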

OpenAI: Still Worth Using?

Absolutely. If you’re building something public, or just need higher quality output without worrying about compute or model tuning, OpenAI is still a great choice.

To really set expectations: unless you’ve spent $2,000+ on a higher-end GPU, you’ll likely be limited to running 7B models locally. OpenAI’s models are more like 70B or 175B models, which are much more powerful. A higher parameter count doesn’t just mean the model “knows” more facts; it’s really an indicator of how “smart” the model is, in terms of how well it understands your prompt and how useful the responses are.

  • GPT-3.5 is free on chat.openai.com
  • GPT-4-turbo is $0.01–$0.03 per 1,000 tokens (still just pennies for a typical request)
  • No setup required
  • The best model quality today (as of 2024)

You can use both. Local for automation and tinkering. Cloud for polish and production.


Getting Started

If you want to try this stuff today:

  1. Install Ollama - ollama.com/download
  2. (Optional) Install OpenWebUI - github.com/open-webui/open-webui
  3. Try ollama run llama3 to start a local chat
  4. Create a shell script that hits http://localhost:11434/api/generate
  5. (Optional) Try DiffusionBee or Automatic1111 for local image generation

Use Your Tools

You can wire these tools into shell scripts, VS Code tasks, cron jobs, or anything else in your workflow. They’re just APIs and local services. No mystery.
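
For example, a crontab entry that runs the nightly summary script from earlier (the path is a placeholder for wherever you keep your scripts):

    # Run the local summarization script every night at 2:30 AM
    30 2 * * * /home/you/bin/nightly-log-summary.sh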


Summary

Local LLMs give you AI power without the cost. Whether you’re automating your homelab, summarizing logs, or generating images, you don’t need cloud services to do serious AI work.

  • Use Ollama for fast, free, local text generation
  • Add OpenWebUI if you want a web-based chat UI
  • Run Stable Diffusion locally for images and thumbnails
  • Choose local for privacy, cost, and automation; choose cloud for speed and polish
  • These tools are free, powerful, and just getting better

Just download, run, and start building.