Local LLMs

How to run AI models on your own machine instead of relying on cloud services. Great for privacy, offline work, or just experimenting without API costs. We’ll show you what works and what’s more trouble than it’s worth.

AI tools aren’t just for billion-dollar companies anymore. You can now run large language models (LLMs) and image generators entirely on your own machine - no cloud required, no API keys, no monthly fees.

That unlocks a ton of possibilities: automation, data processing, scripting help, private knowledge bases, or just tinkering. This page breaks down how local LLMs work, how they compare to cloud-based tools like OpenAI, and how to get started without burning cash or GPU cycles you don’t have.


Why Run AI Locally?

Cloud LLMs like OpenAI’s GPT-4 are powerful and shockingly cheap (more on that below). But running a model locally has a few big advantages:

  • Zero cost per query – Once it’s downloaded, it’s free forever
  • Full privacy – Your prompts and data never leave your machine
  • Offline access – Use it even without internet access
  • Unlimited usage – Run hundreds or thousands of interactions a day

You don’t need a data center or $4,000 GPU. If you’ve got an M1-M4 Mac or a midrange NVIDIA GPU (RTX 3060 or better), you’re good to go.


Ollama: Your LLM Engine

Ollama is a command-line tool and background service that makes running LLMs locally dead simple.

What It Does

  • Downloads and runs open models like LLaMA 3, Mistral, or Phi-3
  • Optimizes them for your CPU or GPU
  • Exposes a local HTTP API (you can script against it)
  • Gives you simple CLI access (ollama run llama3)
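
If you’ve never touched it, the CLI is about as simple as it gets (model names are examples; browse the Ollama library for what’s available):

    # Download a model (one-time, a few GB)
    ollama pull llama3

    # Start an interactive chat in your terminal
    ollama run llama3

    # See which models you have installed
    ollama list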

Why It’s So Good

  • Runs on macOS, Windows (WSL2), and Linux
  • You can switch between models easily (ollama run or ollama pull)
  • Supports chat history, model customization, and embeddings

What About RAM and Disk?

Most 7B models need ~8–12GB RAM and around 4–6GB of disk space. You don’t need a monster PC, just a modern machine with some breathing room.

GPU memory is “all or nothing”, though. If your GPU has 12GB of VRAM, you can’t run a model that needs 16GB, even with 32GB of system RAM to spare. The whole model has to fit in VRAM; otherwise Ollama loads it into system RAM and runs it on the CPU instead, which is much slower.
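
If you’re not sure where a model ended up, newer Ollama releases can show you while it’s loaded (the exact output columns may vary by version):

    # Show loaded models and whether they're running on the GPU or the CPU
    ollama ps
    # Check the PROCESSOR column: "100% GPU" is what you want;
    # "100% CPU" means the model didn't fit in VRAM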


OpenWebUI: A Local ChatGPT UI

OpenWebUI gives you a web interface - just like ChatGPT - but it connects to your local Ollama models instead of a cloud service.

Features

  • Chat history
  • Code blocks, markdown rendering
  • Model switcher
  • System prompts / personas

You can run it with Docker or natively, and it serves at http://localhost:3000 by default. Super easy to use, especially if you want a visual layer on top of Ollama.
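
One common way to start it is with Docker; a rough sketch, assuming Ollama is already running on the same machine (check the OpenWebUI README for the current image name and flags):

    # Run OpenWebUI in a container and expose it at http://localhost:3000
    docker run -d -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -v open-webui:/app/backend/data \
      --name open-webui \
      ghcr.io/open-webui/open-webui:main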


Local Image Generation with Stable Diffusion

LLMs aren’t just for text. You can also run Stable Diffusion locally for image generation.

Use this to:

  • Create blog headers
  • Generate icons and thumbnails
  • Visualize ideas without touching Midjourney or DALL·E

Compare: Cloud vs Local

Let’s break down when to use each:

Feature      | Cloud LLM (OpenAI, Claude, etc)  | Local LLM (Ollama)
Cost         | Cheap ($0.001–$0.01/query)       | Free after install
Privacy      | Data leaves your machine         | 100% private
Setup time   | None                             | 10–30 mins + download
Speed        | Fast                             | Slower on lower-end hardware
Best for     | Real-time tasks, one-offs        | Batch jobs, automation, private use
Model power  | Best-in-class (GPT-4, Claude 3)  | Good-enough (LLaMA 3, Mistral, etc)

Example Use Cases for Local LLMs

Nightly Automation

  • Process logs
  • Clean up data
  • Summarize reports
  • Generate code or documentation

Run it all through a script that hits your Ollama API while you sleep. You don’t have to think about how much it’s going to cost, because it’s all local.
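
A minimal sketch of that kind of script, assuming Ollama is running on its default port and jq is installed (the log path, model, and output file are placeholders):

    #!/usr/bin/env bash
    # Nightly log summary via the local Ollama API - no cloud, no per-query cost
    LOG_FILE="/var/log/myapp/app.log"   # placeholder path
    MODEL="llama3"                      # any model you've pulled

    PROMPT="Summarize the errors and warnings in this log:
    $(tail -n 200 "$LOG_FILE")"

    curl -s http://localhost:11434/api/generate \
      -d "$(jq -n --arg m "$MODEL" --arg p "$PROMPT" '{model: $m, prompt: $p, stream: false}')" \
      | jq -r '.response' > /tmp/nightly-log-summary.txt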

Private Prompting

  • Feed it sensitive data you don’t want in the cloud
  • Build a local knowledge base with embeddings (see the sketch after this list)
  • Air-gapped security environments
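
For the embeddings piece, Ollama exposes a local endpoint you can script against. A small sketch, assuming you’ve pulled a dedicated embedding model first (nomic-embed-text is one option from the Ollama library):

    # Pull an embedding model (one-time)
    ollama pull nomic-embed-text

    # Turn a chunk of text into a vector, entirely offline
    curl -s http://localhost:11434/api/embeddings \
      -d '{"model": "nomic-embed-text", "prompt": "Quarterly revenue grew 12% in Q3."}' \
      | jq '.embedding | length'   # prints the vector dimension

Store the vectors wherever you like (SQLite, a vector database, flat files) and you’ve got a private semantic search layer.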

Controversial Prompting

  • Public LLMs have content filtering that won’t let you ask certain questions
  • If you’re doing cybersecurity or red-team work, for example, you can ask “hacking” questions that would never be allowed on public LLMs
  • Ollama has “uncensored” models that will answer those questions, or you can override a model’s default behavior and create your own custom, uncensored variants locally (sketch below)
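
The mechanics of a custom model are just a Modelfile layered on top of a base model you already have. A minimal sketch (the name and system prompt are illustrative; how much a base model will actually answer still depends on how it was trained):

    # Write a Modelfile that gives a base model your own system prompt
    echo 'FROM llama3' > Modelfile
    echo 'SYSTEM "You are a security research assistant supporting authorized red-team engagements."' >> Modelfile

    # Build and run the custom model
    ollama create redteam-assistant -f Modelfile
    ollama run redteam-assistant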

Dev Productivity

  • Use local Copilot-style tools with Vim, Emacs, or VS Code. Ollama has coding models.
  • Generate test cases, docstrings, or config boilerplate (example below)
  • Write CLI helpers and bash completions
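
One-shot prompts from the shell cover a lot of this. A small sketch, assuming you’ve pulled a coding model (codellama is one example, and utils.py is a placeholder file):

    # Ask a local coding model to draft docstrings for an existing file
    ollama run codellama "Add docstrings to every function in this Python file:
    $(cat utils.py)"

Because it’s non-interactive, you can wrap it in a git hook, a Makefile target, or an editor keybinding.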

OpenAI: Still Worth Using?

Absolutely. If you’re building something public, or just need higher quality output without worrying about compute or model tuning, OpenAI is still a great choice.

To really set expectations: unless you’ve spent $2,000+ on a higher-end GPU, you’ll likely be limited to running 7B models locally. OpenAI’s models are more like 70B or 175B models, which are much more powerful. A higher parameter count doesn’t just mean the model “knows” more facts; it’s really an indicator of how “smart” the model is, in terms of how well it understands your prompt and how useful the responses are.

  • GPT-3.5 is free on chat.openai.com
  • GPT-4-turbo is $0.01–$0.03 per 1,000 tokens (still just pennies for a typical request)
  • No setup required
  • The best model quality today (as of 2024)

You can use both. Local for automation and tinkering. Cloud for polish and production.


Getting Started

If you want to try this stuff today:

  1. Install Ollama - ollama.com/download
  2. (Optional) Install OpenWebUI - github.com/open-webui/open-webui
  3. Try ollama run llama3 to start a local chat
  4. Create a shell script that hits http://localhost:11434/api/generate
  5. (Optional) Try DiffusionBee or Automatic1111 for local image generation

Use Your Tools

You can wire these tools into shell scripts, VS Code tasks, cron jobs, or anything else in your workflow. They’re just APIs and local services. No mystery.
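
For example, a crontab entry that runs the nightly summary script from earlier (the path is a placeholder for wherever you keep your scripts):

    # Run the local summarization script every night at 2:30 AM
    30 2 * * * /home/you/bin/nightly-log-summary.sh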


Summary

Local LLMs give you AI power without the cost. Whether you’re automating your homelab, summarizing logs, or generating images, you don’t need cloud services to do serious AI work.

  • Use Ollama for fast, free, local text generation
  • Add OpenWebUI if you want a web-based chat UI
  • Run Stable Diffusion locally for images and thumbnails
  • Choose local for privacy, cost, and automation; choose cloud for speed and polish
  • These tools are free, powerful, and just getting better

Just download, run, and start building.