# open-computer-use

**Repository Path**: RexHuang936/open-computer-use

## Basic Information

- **Project Name**: open-computer-use
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-04
- **Last Updated**: 2025-10-08

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Open Computer Use

A secure cloud Linux computer powered by [E2B Desktop Sandbox](https://github.com/e2b-dev/desktop/) and controlled by open-source LLMs.

https://github.com/user-attachments/assets/3837c4f6-45cb-43f2-9d51-a45f742424d4

## Features

- Uses [E2B](https://e2b.dev) for secure [Desktop Sandbox](https://github.com/e2b-dev/desktop)
- Operates the computer via the keyboard, mouse, and shell commands
- Supports 10+ LLMs, [OS-Atlas](https://osatlas.github.io/)/[ShowUI](https://github.com/showlab/ShowUI) and [any other models you want to integrate](#llm-support)!
- Live streams the display of the sandbox on the client computer
- User can pause and prompt the agent at any time
- Uses Ubuntu, but designed to work with any operating system

## Design

![Open Computer Use Architecture](./assets/architecture.png#gh-dark-mode-only)
![Open Computer Use Architecture](./assets/architecture-light.png#gh-light-mode-only)

The details of the design are laid out in this article: [How I taught an AI to use a computer](https://blog.jamesmurdza.com/how-i-taught-an-ai-to-use-a-computer)

## LLM support

Open Computer Use is designed to make it easy to swap in and out new LLMs. The LLMs used by the agent are specified in [config.py](/os_computer_use/config.py) like this:

```
grounding_model = providers.OSAtlasProvider()
vision_model = providers.GroqProvider("llama3.2")
action_model = providers.GroqProvider("llama3.3")
```

The providers are imported from [providers.py](/os_computer_use/providers.py) and include:

- Fireworks, OpenRouter, Llama API:
  - Llama 3.2 (vision only), Llama 3.3 (action only)
- Groq:
  - Llama 3.2 (vision + action), Llama 3.3 (action only)
- DeepSeek:
  - DeepSeek (action only)
- Google:
  - Gemini 2.0 Flash (vision + action)
- OpenAI:
  - GPT-4o and GPT-4o mini (vision + action)
- Anthropic:
  - Claude (vision + action)
- HuggingFace Spaces:
  - OS-Atlas (grounding)
  - ShowUI (grounding)
- Moonshot
- Mistral AI (Pixtral for vision, Mistral Large for actions)

If you add a new model or provider, please [make a PR](../../pulls) to this repository with the updated providers.py!

## Get started

### Prerequisites

- Python 3.10 or later
- [git](https://git-scm.com/)
- [E2B API key](https://e2b.dev/dashboard?tab=keys)
- API key for an LLM provider (see above)

### 1. Install the prerequisites

In your terminal:

```sh
brew install poetry ffmpeg
```

### 2. Clone the repository

In your terminal:

```sh
git clone https://github.com/e2b-dev/open-computer-use/
```

### 3. Set the environment variables

Enter the project directory:

```
cd open-computer-use
```

Create a `.env` file in `open-computer-use` and set the following:

```sh
# Get your API key here: https://e2b.dev/
E2B_API_KEY="your-e2b-api-key"
```

Additionally, add API key(s) for any LLM providers you're using:
```
# You only need the API key for the provider(s) selected in config.py:
# Hugging Face Spaces do not require an API key.
FIREWORKS_API_KEY=...
OPENROUTER_API_KEY=...
LLAMA_API_KEY=...
GROQ_API_KEY=...
GEMINI_API_KEY=...
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
MOONSHOT_API_KEY=...
# Required: Provide your Hugging Face token to bypass Gradio rate limits.
HF_TOKEN=...
```

### 4. Start the web interface

Run the following command to start the agent:

```sh
poetry install
```

```sh
poetry run start
```

The agent will open and prompt you for its first instruction.

To start the agent with a specified prompt, run:

```sh
poetry run start --prompt "use the web browser to get the current weather in sf"
```

The display stream should be visible a few seconds after the Python program starts.