00

Why this matters

The assistants most people use are a window to someone else’s computer. Every question you type travels to a company, is answered there, and on the free tiers is kept and used to train the next model. A local model turns that around.

A capable AI can run on the laptop in front of you. You install a small program, download a model once, and after that it answers from your own machine. There is no account to make, no subscription to pay, and nothing to send. The chat that on a cloud service would be logged and learned from happens in your own memory and is gone when you close it.

This is the clearest meeting point of the two halves of this site, the new AI tools and the older idea of holding your own things. The same instinct that keeps passwords on a key in your pocket keeps a model on your own disk. What you ask it is yours, and the companion guide on keeping cloud chats off the training set becomes a setting you no longer need, because there is no server to reach.

There is one honest catch, and the next section deals with it first. A model is a large thing to run, so this asks a little more of your hardware than opening a website does.

What this is, in one line

Install one free program, download a model once, and from then on you have an assistant that runs on your machine and sends nothing anywhere.

01

Will your laptop run it

This is the part to settle before you start. A model runs in your computer’s memory, so the question is how much memory you have, and that decides how large a model you can hold.

The rule of thumb is simple. A model squeezed down to a compact form needs roughly its own size in memory, a little less. A model of three billion parameters wants about two to three gigabytes free, and one of seven or eight billion about five to six. So a laptop with 8 GB of memory comfortably runs a small model, 16 GB is room for a mid-sized one, and 32 GB or more opens up the larger models.

Two things help. A recent Apple Silicon Mac shares its memory between the processor and the graphics chip, which suits these models well. A separate graphics card with its own memory makes replies much faster. Without one, the model runs on the main processor instead, which works and is slower, so the smaller models are the happier choice there.

What quantised means

You will see models offered in versions labelled with a Q and a number, such as Q4. Quantising squeezes the model’s internal numbers into fewer bits so it fits in less memory, for a small loss of accuracy. A four-bit build is the usual sweet spot, and the tools below pick a reasonable default, so this is good to recognise rather than something to agonise over.

02

Set it up

There are two good tools, and the right one depends on your comfort with a terminal. Ollama is a few words typed into a black window and is the quickest to script. LM Studio is an ordinary app with a chat window and never needs the terminal. Pick one.

The terminal way: Ollama

1
Install Ollama

Download it for your system

Go to ollama.com and get the version for your system. Windows and Mac have an installer to double-click. On Linux, the site gives a single line to paste into a terminal. Ollama runs on Windows, macOS and Linux.

2
Run a model

One command downloads and starts it

Open a terminal and run the line below. The first time, it downloads the model, a couple of gigabytes, then drops you into a chat where you type a question and press Enter.

Terminal
ollama run llama3.2

That model, Llama 3.2, is a small one that suits most laptops. After the download, the same command starts it instantly.

3
Everyday commands

Leave, list, add and remove

Type `/bye` to leave the chat. The handful of commands below cover the rest. Browse the model library for others, and match the size to your memory using the previous section.

Terminal
ollama list              # models you have
ollama pull mistral      # download another
ollama rm llama3.2       # delete one

The no-terminal way: LM Studio

1
Install LM Studio

Download the app

Get LM Studio for your system and install it like any other application. It is free for home and work use, and runs on Windows, Mac and Linux.

2
Get a model

Search and download from inside the app

Open the search tab, type a model name such as Llama 3.2 or Qwen 3, and pick a version sized for your machine. The app marks which ones will fit your memory. Click to download.

3
Chat

Load it and start typing

Open the chat tab, load the model you downloaded, and type. Everything runs on your machine, and the model answers in the window like any chat app.

4
Optional: serve it

Let other apps use your model

LM Studio can run a local server so other programs can use the model, in the same shape the big providers use. Turn it on only if you want a separate tool to talk to your local model. Most people never need this.

Which model to start with

On a laptop with 8 GB of memory, start with a three-billion model such as Llama 3.2. With 16 GB, step up to an eight-billion model such as Llama 3.1, Qwen 3 or Mistral, which answer noticeably better. Try one, and adjust by how quickly it replies.

03

Prove it is private

The whole promise is that nothing leaves your machine. You can check it yourself in a few seconds, rather than take it on trust.

Once the model is downloaded, turn off your wifi, or pull out the network cable, and ask the model a question. It answers exactly as before. With no connection, there is nowhere for your words to go, so the reply is proof that the work is happening on your own computer and nowhere else.

The one moment it needs the internet

Downloading a model is the only step that needs a connection. After that first pull, the model lives on your disk and runs offline. You can even copy the files to a machine that has never been online and run it there.

04

What it is good for

A laptop-sized model is a capable everyday tool, not a match for the largest cloud systems. Knowing where it shines keeps the experience a good one.

It is well suited to drafting and rewriting text, summarising a document you paste in, brainstorming, explaining ideas, and helping with code. Because it runs locally, it is the natural choice for anything private or sensitive, a medical question, a draft of something personal, work you are not allowed to send to a third party, and for working with no connection at all.

The limits are worth stating plainly. A model small enough for a laptop is less capable than the biggest cloud models, so expect good help rather than the last word. It has no live access to the web unless you add a tool for that, so it does not know today’s news. Very long documents strain memory. The honest summary is that you are buying privacy and independence, and paying for them with some raw capability. For a great many tasks, that is a fair trade.

05

If something breaks

SymptomWhat to try
replies are slowUse a smaller model, close other heavy apps, and prefer a Q4 build. On a machine with no separate graphics card, a three-billion model is the comfortable floor.
it runs out of memory or crashesThe model is too big for your memory. Drop to a smaller size, or a more compressed build with a lower Q number.
the answers feel weakStep up to a larger model if your memory allows, or choose one tuned for your task, such as a coding model.
it will not use my graphics cardLook in the tool’s settings for a GPU option. Ollama and LM Studio detect most cards, but some need it switched on by hand.
I am low on disk spaceModels are large files. Remove ones you do not use with `ollama rm` or LM Studio’s model manager.
will it work with no internetYes, once the model is downloaded. Only that first download needs a connection.
06

Quick reference

WantDo
The simple terminal wayInstall Ollama, then run ollama run llama3.2.
The no-terminal wayInstall LM Studio, search a model, download, chat.
A model for 8 GB of memoryA three-billion model such as Llama 3.2.
A model for 16 GB of memoryAn eight-billion model such as Llama 3.1, Qwen 3 or Mistral.
Prove it is privateTurn the wifi off and ask it something.
Use it from other appsTurn on LM Studio’s local server.
Free up spaceRemove a model with ollama rm or in LM Studio.
07

Common questions

The questions people ask before they download their first model.

Is it really free?

Yes. Ollama and LM Studio are free, and the models are open weights you download once. There is no account and no subscription. The only cost is the disk space the models take and a machine capable enough to run them.

How good is it compared to ChatGPT or Claude?

A model sized for a laptop is not as sharp as the largest cloud models, and it is fair to expect that. A current model of seven or eight billion parameters is still useful for drafting, summarising and everyday questions. You trade some capability for privacy and for the ability to work offline.

What hardware do I need?

Eight gigabytes of memory runs a small model, sixteen is comfortable for a mid-sized one, and more lets you run larger models. Apple Silicon Macs handle this well because they share memory between processor and graphics. A dedicated graphics card makes it faster but is not required.

Does it work offline?

Yes, after the one-time download of the model. With the model on disk it answers with the wifi switched off, which is the clearest proof that nothing is being sent anywhere.

Does it train on my chats or send them anywhere?

No. The model does not learn from your conversations, and nothing leaves the machine. This is the opposite of the cloud assistants and their training settings, and it needs no settings to switch off because there is no server to send to.

What does quantised mean, and which should I pick?

It means the model's numbers are squeezed into fewer bits so the model fits in less memory, for a small loss of accuracy. A four-bit build, often labelled Q4, is the usual sweet spot of size against quality. The tools pick a sensible default, so you rarely choose by hand.

Which model should I start with?

On a modest laptop, a three-billion model such as Llama 3.2. With sixteen gigabytes of memory, an eight-billion model such as Llama 3.1, Qwen 3 or Mistral. Try one, see how it runs, and step up or down from there.

Can other apps use it?

Yes. Both LM Studio and Ollama can expose the model to other programs through a local server that speaks the same shape as the big providers' interfaces. A tool that expects a cloud model can often be pointed at yours instead, with nothing leaving your machine.