DIY AI: Running Models on a Gaming Laptop for Beginners!

When DeepSeek AI burst onto the scene a week or two ago, it shook up the industry by proving that large language models can be made more efficient. In fact, it’s possible to get the full DeepSeek model running on hardware that a mere mortal could acquire for a few thousand bucks. This shift raises an interesting question: can useful AI models now run locally on consumer-grade computers without relying on cloud-based data centers?

In my latest video, we take a look at running some “distilled” open-source versions of DeepSeek and Meta’s Llama large language models. I’m surprised how far the quality of locally run AI has come in such a short period of time.

To find out, I tested a distilled version of the DeepSeek model on a Lenovo Legion 5 laptop equipped with an Nvidia RTX 3070 GPU and 8GB of VRAM. The goal was to see whether local AI could generate useful results at a reasonable speed.

The setup process was straightforward. After downloading and installing Nvidia’s CUDA toolkit to enable GPU acceleration, I installed Ollama, a command-line tool for downloading and running many of the available models. From there, it was just a matter of selecting and downloading an appropriate AI model. Since the full DeepSeek model requires an impractical 404GB of memory, I opted for the distilled 8B version, which uses 4.9GB of video memory.
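For anyone following along, the Ollama side of the setup boils down to a couple of terminal commands. This is a minimal sketch assuming the standard install script and the model tags Ollama currently publishes; tags change over time, so check the Ollama model library for the latest names.

```bash
# Install Ollama (Linux one-liner; Windows and macOS users can grab the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Download and start the distilled 8B DeepSeek-R1 model (roughly a 4.9GB pull)
ollama run deepseek-r1:8b

# The full-size model has a tag too, but it pulls hundreds of gigabytes -- not laptop material
# ollama pull deepseek-r1:671b
```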

With everything in place, I launched the model and checked that it was using the GPU correctly. The first test was a basic interaction in the command line. The DeepSeek model responded quickly and even displayed its thought process before generating a reply, which is a unique feature compared to traditional locally hosted chatbots. Performance-wise, it was surprisingly snappy for a locally run AI.
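If you want to confirm the model really is running on the GPU rather than quietly falling back to the CPU, two quick checks do the job (a rough sketch; the exact output format varies by version):

```bash
# Ask Ollama what's loaded and where it's running -- the PROCESSOR column should say "100% GPU"
ollama ps

# Watch VRAM usage and GPU utilization climb while the model generates a reply
nvidia-smi
```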

To gauge the model’s practical utility, I compared it to Meta’s open-source Llama model, selecting a similarly sized 8B variant. The two were comparable in speed, but their responses differed: DeepSeek’s output was structured and fairly coherent, while Llama’s responses felt more refined in certain cases.
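Swapping models in Ollama is trivial, which makes this kind of side-by-side comparison easy. As a sketch, the tag below assumes the Llama 3.1 8B build, and the prompt is just a throwaway example of my own:

```bash
# Grab an 8B Llama build (tag assumes Llama 3.1; substitute whichever variant you prefer)
ollama pull llama3.1:8b

# Fire the same one-off prompt at both models for a quick comparison
ollama run deepseek-r1:8b "Explain what VRAM is to a beginner in two sentences."
ollama run llama3.1:8b "Explain what VRAM is to a beginner in two sentences."
```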

To take things further, I integrated Open WebUI, which provides a ChatGPT-style interface for easier interaction. This required installing Docker, but once set up, it significantly improved usability.
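For reference, pointing Open WebUI at a local Ollama instance is essentially one docker command. This follows the project’s quick-start invocation as I understand it; double-check the Open WebUI README, since the recommended flags change from release to release:

```bash
# Run Open WebUI in a container and let it reach the Ollama server on the host machine
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

# Then browse to http://localhost:3000 and pick a model from the dropdown
```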

Next, I tested both models with a programming task—creating a simple Space Invaders game in a single HTML file. DeepSeek struggled, generating a mix of JavaScript and Python code that didn’t function correctly. Even when prompted differently, the results were inconsistent. The larger 14B version of DeepSeek running on my more powerful gaming PC did slightly better but still failed to produce a playable game. The Llama model performed marginally better, generating a somewhat functional version, but it was still far from the quality produced by cloud-based AI models like ChatGPT, which created a polished and working game on the first attempt.
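For the curious, the coding test can be reproduced with a prompt along these lines (paraphrased for illustration rather than quoted verbatim; the 14B tag is the one Ollama publishes for the larger distill):

```bash
# Same prompt, different model sizes -- only the tag changes
ollama run deepseek-r1:8b "Write a simple Space Invaders game as a single self-contained HTML file, using only HTML, CSS and JavaScript with no external libraries."
ollama run deepseek-r1:14b "Write a simple Space Invaders game as a single self-contained HTML file, using only HTML, CSS and JavaScript with no external libraries."
```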

For a different type of challenge, I had the models generate a blog post based on a video transcript. Initially, DeepSeek only provided an outline instead of a full narrative. After refining the prompt, it did produce something usable, though still less polished than ChatGPT’s output. Llama performed slightly better in this task, generating a clearer and more structured narrative after a nudge to get it out of its outlining mindset.
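The transcript test can also be run straight from the command line. Here’s a rough sketch, with transcript.txt standing in for wherever your transcript actually lives and the wording of the prompt being my own:

```bash
# Embed the transcript in the prompt itself (the filename is just a placeholder)
ollama run llama3.1:8b "Rewrite the following video transcript as a blog post written in full prose, not an outline: $(cat transcript.txt)"
```

For longer transcripts it’s easier to paste the text into Open WebUI, which is where the chat-style interface really earns its keep.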

While local AI models aren’t yet on par with their cloud-based counterparts, the rapid improvements in efficiency suggest that practical, high-quality AI could soon run on everyday devices. Now that DeepSeek is pushing the industry to focus on optimization, it’s likely that smaller, more specialized models will become increasingly viable for local use.

For now, running AI on consumer hardware remains a work in progress. Still, it’s come a long way from where it was just a year ago, so it’ll be exciting to see what happens next.