# Bring Your Local Model

> Run AI models locally with Ollama or LM Studio for free, private, offline use

BrowserOS works well with local models in Chat Mode. Models run completely offline, so your data never leaves your machine.

## Context Length

<Warning>
  **Ollama defaults to 4,096 tokens of context, which is too low for BrowserOS.** Below about 15,000 tokens, the context overflows and the agent gets stuck in a loop trying to recover, so only Chat Mode works at low context lengths. Set the context length to at least **15,000–20,000 tokens** for local models to work properly.
</Warning>

Set context length when starting Ollama:

```bash theme={null}
OLLAMA_CONTEXT_LENGTH=20000 ollama serve
```

<Info>
  Increasing context length uses more VRAM. Run `ollama ps` to check your current allocation. See the [Ollama context length docs](https://docs.ollama.com/context-length) for more details.
</Info>
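
As a quick sanity check before launching, the sketch below compares a context length against the 15,000-token floor described above. The `context_status` function and the `MIN_CONTEXT` variable are hypothetical names introduced for illustration; they are not part of Ollama or BrowserOS.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a context length against the floor in this guide.
MIN_CONTEXT=15000   # below this, the guide says the agent loops on context overflow

context_status() {
  local ctx="${1:-4096}"   # fall back to the Ollama default cited above
  if [ "$ctx" -lt "$MIN_CONTEXT" ]; then
    echo "too-low"
  else
    echo "ok"
  fi
}

# Report on the value that `ollama serve` would pick up from the environment.
echo "context ${OLLAMA_CONTEXT_LENGTH:-4096}: $(context_status "${OLLAMA_CONTEXT_LENGTH:-4096}")"
```

If the script reports `too-low`, export a larger `OLLAMA_CONTEXT_LENGTH` before running `ollama serve`.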

***

## Setup

<Tabs>
  <Tab title="Ollama" icon="terminal">
    The easiest way to run models locally.

    <Steps>
      <Step title="Install Ollama">
        Download from [ollama.com](https://ollama.com) and install it.
      </Step>

      <Step title="Pull a model">
        ```bash theme={null}
        ollama pull qwen/qwen3-4b
        ```
      </Step>

      <Step title="Start Ollama with higher context">
        ```bash theme={null}
        OLLAMA_CONTEXT_LENGTH=20000 ollama serve
        ```
      </Step>

      <Step title="Configure in BrowserOS">
        1. Go to `chrome://browseros/settings`
        2. Click **USE** on the Ollama card
        3. Set **Model ID** to `qwen/qwen3-4b`
        4. Set **Context Window** to `20000`
        5. Click **Save**

        <img src="https://mintcdn.com/browseros/SQ44qH9ZJeym2Wd9/images/byollm--ollama-config.png?fit=max&auto=format&n=SQ44qH9ZJeym2Wd9&q=85&s=b97afe7be4d0bd97e6b7d9125fef7a0b" alt="Ollama in BrowserOS" width="4460" height="2930" data-path="images/byollm--ollama-config.png" />
      </Step>
    </Steps>
  </Tab>

  <Tab title="LM Studio" icon="desktop">
    A good choice if you prefer a GUI over the terminal.

    <Steps>
      <Step title="Install LM Studio">
        Download from [lmstudio.ai](https://lmstudio.ai) and install it.
      </Step>

      <Step title="Load a model">
        Open LM Studio, go to the **Developer** tab, and load a model. LM Studio serves the model from a local server at `http://localhost:1234/v1/`.

        <img src="https://mintcdn.com/browseros/SQ44qH9ZJeym2Wd9/images/setting-up-lm-studio/lmstudio-step1.png?fit=max&auto=format&n=SQ44qH9ZJeym2Wd9&q=85&s=60501d94cbae2985b217046644c64756" alt="LM Studio" width="1818" height="1527" data-path="images/setting-up-lm-studio/lmstudio-step1.png" />
      </Step>

      <Step title="Configure in BrowserOS">
        1. Go to `chrome://browseros/settings`
        2. Click **USE** on the **OpenAI Compatible** card
        3. Set **Base URL** to `http://localhost:1234/v1/`
        4. Set **Model ID** to the model you loaded
        5. Set **Context Window** to at least `20000`
        6. Click **Save**

        <img src="https://mintcdn.com/browseros/SQ44qH9ZJeym2Wd9/images/byollm--lmstudio-config.png?fit=max&auto=format&n=SQ44qH9ZJeym2Wd9&q=85&s=1dcca18d017584dc75e9e68ff5d3cefc" alt="LM Studio in BrowserOS" width="4460" height="2930" data-path="images/byollm--lmstudio-config.png" />
      </Step>
    </Steps>
  </Tab>
</Tabs>
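
Before saving the provider in BrowserOS, it can help to confirm that the local server is actually answering. The sketch below is a minimal check that assumes `curl` is installed and that the server exposes the usual OpenAI-compatible `/models` route under its base URL. The `server_status` function is a hypothetical helper, not a BrowserOS or LM Studio command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether an OpenAI-compatible server answers at a base URL.
server_status() {
  local base_url="${1%/}"   # strip one trailing slash so the path joins cleanly
  # Assumption: the server lists its models at <base_url>/models.
  if curl -fsS --max-time 3 "$base_url/models" >/dev/null 2>&1; then
    echo "up"
  else
    echo "down"
  fi
}

# Base URL taken from the LM Studio steps above.
echo "LM Studio: $(server_status http://localhost:1234/v1/)"
```

A result of `down` usually means the server is not running or is listening on a different port than the one entered in **Base URL**.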

***

## Recommended Models

Pick a model based on your available RAM/VRAM. Smaller models are faster but less capable.

### Lightweight (under 5 GB)

Good for machines with 8 GB RAM. Fast responses, suitable for simple chat tasks.

| Model                          | Publisher          | Params | Quant    | Size    |
| ------------------------------ | ------------------ | ------ | -------- | ------- |
| `qwen/qwen3-4b`                | Qwen               | 4B     | 4bit     | 2.28 GB |
| `mistralai/ministral-3-3b`     | Mistral            | 3B     | Q4\_K\_M | 2.99 GB |
| `deepseek-r1-distill-qwen-7b`  | lmstudio-community | 7B     | Q4\_K\_M | 4.68 GB |
| `deepseek-r1-distill-llama-8b` | lmstudio-community | 8B     | Q4\_K\_M | 4.92 GB |

### Mid-range (10–15 GB)

Needs 16+ GB RAM. Better reasoning, handles longer conversations well.

| Model                             | Publisher | Params | Quant | Size     |
| --------------------------------- | --------- | ------ | ----- | -------- |
| `openai/gpt-oss-20b`              | OpenAI    | 20B    | MXFP4 | 12.11 GB |
| `mistralai/magistral-small`       | Mistral   | 23.6B  | 4bit  | 13.28 GB |
| `mistralai/devstral-small-2-2512` | Mistral   | 24B    | 4bit  | 14.12 GB |

### Heavy (60+ GB)

For workstations with 64+ GB RAM. Closest to cloud model quality.

| Model                 | Publisher | Params | Quant | Size     |
| --------------------- | --------- | ------ | ----- | -------- |
| `openai/gpt-oss-120b` | OpenAI    | 120B   | MXFP4 | 63.39 GB |

<Tip>
  Start with `qwen/qwen3-4b` if you're unsure. It's small and fast, and capable for its size.
</Tip>
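
The tiers above can be sketched as a small lookup from installed RAM to a starting model. This is only an illustration: the `recommend_model` function is hypothetical, picking the first model listed in each tier is an arbitrary choice, and treating machines with less than 8 GB as Lightweight is an assumption the tables do not cover.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map installed RAM (in GB) to a model from the tiers in this guide.
recommend_model() {
  local ram_gb="$1"
  if [ "$ram_gb" -ge 64 ]; then
    echo "openai/gpt-oss-120b"   # Heavy tier: 64+ GB RAM
  elif [ "$ram_gb" -ge 16 ]; then
    echo "openai/gpt-oss-20b"    # Mid-range tier: 16+ GB RAM
  else
    echo "qwen/qwen3-4b"         # Lightweight tier, also the suggested starting point
  fi
}

recommend_model 8
recommend_model 32
```

The RAM thresholds are the ones stated for each tier. Remember that a larger context window also increases memory use, so leave headroom beyond the model's file size.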
