BrowserOS works great with local models for Chat Mode. Run models completely offline — your data never leaves your machine.

Context Length

Ollama defaults to 4,096 tokens of context, which is too low for BrowserOS. Below roughly 15K tokens the context overflows and the agent gets stuck in a loop, constantly trying to recover; only Chat Mode works at all at low context lengths. Set at least 15,000–20,000 tokens for local models to function properly.
Set context length when starting Ollama:
OLLAMA_CONTEXT_LENGTH=20000 ollama serve
Increasing context length uses more VRAM. Run ollama ps to check your current allocation. See the Ollama context length docs for more details.
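To get a feel for the VRAM cost, here is a back-of-the-envelope sketch of the KV-cache size at a 20K-token context for a hypothetical 4B-class model (36 layers, 8 KV heads, head dimension 128, fp16 cache — all illustrative assumptions, not figures from Ollama or BrowserOS):

```shell
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim
#                  * context_tokens * bytes_per_element (fp16 = 2)
echo $(( 2 * 36 * 8 * 128 * 20000 * 2 ))   # → 2949120000 bytes (~2.7 GiB)
```

That is roughly 2.7 GiB on top of the model weights themselves, which is why it's worth re-checking ollama ps after raising the limit.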

Setup

Ollama is the easiest way to run models locally.
1. Install Ollama

   Download from ollama.com and install it.

2. Pull a model

   ollama pull qwen/qwen3-4b

3. Start Ollama with higher context

   OLLAMA_CONTEXT_LENGTH=20000 ollama serve

4. Configure in BrowserOS

   1. Go to chrome://browseros/settings
   2. Click USE on the Ollama card
   3. Set Model ID to qwen/qwen3-4b
   4. Set Context Window to 20000
   5. Click Save Ollama in BrowserOS
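Once the steps above are done, you can smoke-test the server from a terminal. This sketch assumes Ollama is running on its default port (11434) and that you pulled qwen/qwen3-4b as in step 2; the endpoints are Ollama's standard HTTP API, and num_ctx sets the context size per request:

```shell
# List the models the server knows about:
curl -s http://localhost:11434/api/tags

# Ask for a short completion with an explicit context size:
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen/qwen3-4b",
  "prompt": "Say hello in five words.",
  "stream": false,
  "options": { "num_ctx": 20000 }
}'
```

If the second call returns a JSON response with a non-empty "response" field, BrowserOS should be able to reach the model too.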

Model Recommendations

Pick a model based on your available RAM/VRAM. Smaller models are faster but less capable.
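Not sure which tier fits your machine? A quick way to check total RAM (Linux shown; on macOS, `sysctl -n hw.memsize` prints the same figure in bytes):

```shell
# Total memory in GB (Linux):
free -g | awk '/^Mem:/ {print $2 " GB total"}'
```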

Lightweight (under 5 GB)

Good for machines with 8 GB RAM. Fast responses, suitable for simple chat tasks.
| Model | Publisher | Params | Quant | Size |
| --- | --- | --- | --- | --- |
| qwen/qwen3-4b | Qwen | 4B | 4bit | 2.28 GB |
| mistralai/ministral-3-3b | Mistral | 3B | Q4_K_M | 2.99 GB |
| deepseek-r1-distill-qwen-7b | lmstudio-community | 7B | Q4_K_M | 4.68 GB |
| deepseek-r1-distill-llama-8b | lmstudio-community | 8B | Q4_K_M | 4.92 GB |

Mid-range (10–15 GB)

Needs 16+ GB RAM. Better reasoning, handles longer conversations well.
| Model | Publisher | Params | Quant | Size |
| --- | --- | --- | --- | --- |
| openai/gpt-oss-20b | OpenAI | 20B | MXFP4 | 12.11 GB |
| mistralai/magistral-small | Mistral | 23.6B | 4bit | 13.28 GB |
| mistralai/devstral-small-2-2512 | Mistral | 24B | 4bit | 14.12 GB |

Heavy (60+ GB)

For workstations with 64+ GB RAM. Closest to cloud model quality.
| Model | Publisher | Params | Quant | Size |
| --- | --- | --- | --- | --- |
| openai/gpt-oss-120b | OpenAI | 120B | MXFP4 | 63.39 GB |
Start with qwen/qwen3-4b if you’re unsure — it’s small, fast, and surprisingly capable for its size.