
How to Install & Run DeepSeek R1 Locally with GUI on Windows, Linux, and macOS | Step-by-Step Guide


What is the DeepSeek-R1 model?

DeepSeek-R1 is an advanced open-source artificial intelligence model developed by the Chinese startup DeepSeek. It is designed to excel in complex reasoning tasks, including mathematics, coding, and logical problem-solving. Notably, DeepSeek-R1 achieves performance comparable to leading models like OpenAI's o1, but with significantly lower development costs and computational requirements.

Significance of DeepSeek-R1:

  • Cost Efficiency: Developed with a budget of less than $6 million, DeepSeek-R1 challenges the high-cost approaches of competitors, making advanced AI more accessible.

  • Open-Source Accessibility: By open-sourcing DeepSeek-R1, DeepSeek promotes transparency and collaboration, allowing researchers and developers worldwide to study, modify, and enhance the model.

  • Technological Impact: The model's emergence has prompted a reevaluation of AI development strategies, emphasizing efficiency and innovation over sheer computational power.

Advantages of Running DeepSeek-R1 Locally:

  • Data Privacy: Processing data on local machines ensures that sensitive information remains secure, mitigating risks associated with transmitting data to external servers.

  • Customization: Running the model locally allows for tailored modifications to meet specific project requirements, facilitating experimentation and optimization.

  • Reduced Latency: Local deployment eliminates the need for internet-based API calls, resulting in faster response times crucial for real-time applications.

  • Cost Savings: Operating the model on local hardware can reduce expenses related to cloud-based services and data transfer.


Key Considerations for Running DeepSeek-R1 Locally

Before proceeding, keep the following DeepSeek-R1 models and their corresponding sizes in mind:

Parameters    Size
1.5B          1.1 GB
7B            4.7 GB
8B            4.9 GB
14B           9.0 GB
32B           20 GB
70B           43 GB
671B          404 GB

When running DeepSeek-R1 locally on your computer, you should consider the following factors:

1. Hardware Requirements

  • VRAM (GPU Memory):

    • The different model sizes range from 1.1GB (1.5B model) to 404GB (671B model).

    • If you have a consumer-grade GPU (e.g., RTX 3060, 3070, 4080), you should opt for the 7B model (4.7GB VRAM required) or the 8B model (4.9GB VRAM required).

    • Larger models (14B, 32B, 70B) require more powerful GPUs with at least 10GB of VRAM (see the quick hardware check after this list).

  • CPU and RAM:

    • A powerful CPU (e.g., AMD Ryzen 9 / Intel i9) is recommended for inference.

    • You will need at least double the VRAM size in system RAM. For example, if you run the 7B model (4.7GB VRAM), you should have at least 16GB RAM for smooth performance.

  • Storage:

    • Ensure you have enough disk space. The 7B model alone requires 4.7GB, while the larger models (70B+) need hundreds of gigabytes.

    • An NVMe SSD is preferable for faster model loading.
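
Before picking a model, you can quickly check what your machine offers with a few standard commands (nvidia-smi also works on Windows; free is Linux-specific):

# GPU model and total/used VRAM
nvidia-smi
# Total and available system RAM
free -h
# Free disk space per filesystem
df -h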

2. Software Requirements

  • CUDA / ROCm (for GPU Acceleration)

    • If you have an NVIDIA GPU, install the latest CUDA and cuDNN.

      • When you install the NVIDIA GeForce driver, it typically includes the CUDA runtime libraries, i.e., the components needed to run CUDA-accelerated applications.
    • For AMD GPUs, you will need ROCm.

    • If you use CPU-only inference, performance will be significantly slower.

3. Model Selection

  • Choose a model that balances performance and hardware limitations:

    • 1.5B: Very lightweight, suitable for older GPUs or CPU-only.

    • 7B / 8B: Good for mid-range GPUs with 6GB+ VRAM.

    • 14B+: Requires high-end GPUs (e.g., RTX 3090, A100, H100).

4. Optimization & Performance

  • Quantization: Reducing precision (e.g., 4-bit or 8-bit GGUF quantization) lowers VRAM usage.

  • Batch Size / Context Length: Adjust to balance response quality and speed.

  • Multi-GPU: If you have multiple GPUs, some inference frameworks support model sharding.
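
Once Ollama and a model are installed (covered in the next section), you can inspect these properties directly. The show command prints the model's parameter count, context length, and quantization level; the default library tags typically ship 4-bit quantized weights, which is why the sizes in the table above are far smaller than full-precision weights would be:

ollama show deepseek-r1:8b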


Ollama and DeepSeek Installation

Before starting the installation process, make sure that if you are using Windows, your NVIDIA graphics driver is up to date. You can download and install the latest driver here: https://www.nvidia.com/en-us/geforce/drivers/

To find out whether Ollama supports your GPU, visit: https://github.com/ollama/ollama/blob/main/docs/gpu.md

First, install Ollama and let it run in the background. On Linux, run:

curl -fsSL https://ollama.com/install.sh | sh

On Windows and macOS, download the installer from https://ollama.com/download instead.
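
On Linux, the install script registers Ollama as a systemd service that starts automatically. A quick way to confirm it is running (assuming a systemd-based distribution):

systemctl status ollama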

Next, download and install the model version that best fits your needs based on the explanation above.

To do this:

  1. Visit Ollama's DeepSeek-R1 Library: https://ollama.com/library/deepseek-r1

  2. Choose your preferred model version (e.g., 8B).

  3. Copy the provided command and paste it into your terminal.

This installation process is the same for Windows, macOS, and Linux.
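
For example, if you pick the 8B version, the command you copy from the library page is:

ollama run deepseek-r1:8b

The first run downloads the model; after that it drops you straight into an interactive chat session.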

At this stage, you can start using DeepSeek-R1 directly from the command line. However, to create a more ChatGPT-like experience, we will install AnythingLLM for an enhanced user interface.


AnythingLLM Installation and Configuration

Configure AnythingLLM:

  • Download and install AnythingLLM from https://anythingllm.com/, then open it.

  • Follow the configuration steps below to set it up properly.

This setup will enhance your experience by providing a ChatGPT-like interface for interacting with DeepSeek-R1 locally. 🚀

  1. Go to Settings.

  2. Configure your LLM provider (select Ollama and your DeepSeek-R1 model), then go back to the main page.

  3. Create a new workspace, give it a name, and save it.

  4. Open the settings of your workspace and configure them to your needs.

  5. Choose the default thread or a new thread to start a conversation.
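
Under the hood, AnythingLLM talks to the local Ollama server over HTTP (port 11434 by default). If your model does not show up in the provider settings, verify that the endpoint is reachable and the model is installed:

# Lists all locally installed models as JSON
curl http://localhost:11434/api/tags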

Congratulations! You now have a powerful OpenAI o1-like model running locally on your machine! 🚀


AnythingLLM Alternative

If you're looking for an alternative to AnythingLLM, you can also use LM Studio. One key advantage of LM Studio is that you can install models directly from within the app, eliminating the need for manual downloads or additional setup.

How to Get Started with LM Studio

  1. Install LM Studio – Download and install the software from https://lmstudio.ai/

  2. Search for Your Model – Use the built-in search feature to find DeepSeek R1 or any other model.

  3. Install & Run – Click to install the model directly from the app and start chatting instantly.

This makes LM Studio a convenient and user-friendly option for running local AI models with minimal hassle. 🚀

Then start asking questions.


Ollama Commands

To see the version of Ollama installed on your system:

ollama -v

To see a list of installed models with Ollama:

ollama list
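
If you need to free up disk space, a downloaded model can be removed again with:

ollama rm deepseek-r1:8b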

To see how DeepSeek is performing on your system:

# First, run DeepSeek in the terminal:
ollama run deepseek-r1:8b --verbose
# To exit chat mode:
/bye

Ask a question and check the stats at the end.

To determine if the model is running on your CPU or GPU, use the following command:

ollama ps

Output:

NAME              ID              SIZE      PROCESSOR          UNTIL
deepseek-r1:8b    28f8fd6cdc67    6.3 GB    26%/74% CPU/GPU    28 seconds from now
💡 The larger the model, the more VRAM it requires. If your GPU runs out of available memory, the system may offload part of the workload to the CPU, resulting in slower performance.

To make sure that your system has detected your GPU, you can check the server.log file located at C:\Users\<username>\AppData\Local\Ollama (on Windows).
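
On Linux, where the install script sets Ollama up as a systemd service, the equivalent logs are available via journalctl:

# Jump to the end of the Ollama service log
journalctl -e -u ollama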

Feel free to drop any questions in the comments section; I'm happy to help! 😊
