The easiest way to run an LLM locally on your Mac
Last Update: Jun 7, 2024
I wrote a book! Check out A Quick Guide to Coding with AI.
Learn how to use Generative AI coding tools as a force multiplier for your career.
Use my code mlmorgan3 to get 50% off (Until Sept 27th).
I’ve written about running LLMs (large language models) on your local machine for a while now. I play with this sort of thing nearly every day. So, I’m always looking for cool things to do in this space and easy ways to introduce others (like you) to the world of LLMs. This is my latest installment.
While I’ve been using Ollama a ton lately, I saw this new product called LMstudio come up and thought I’d give it a shot. I’ll install it and try it out. This is my impression of LM Studio. But there’s a twist!
I’m doing this on my trusty old Mac Mini! Usually, I do LLM work with my Digital Storm PC, which runs Windows 11 and Arch Linux, with an NVidia 4090. It runs local models really fast. But I’ve wanted to try this stuff on my M1 Mac for a while now. I decided to try out LM Studio on it.
Will it work? Will it be fast? Let’s find out!
Installing LM Studio on Mac
This installation process couldn’t be any easier. I went to the LM Studio website and clicked the download button.
Then, of course, you just drag the app to your applications folder.
The first screen that comes up is the LM Studio home screen, and it’s pretty cool. It has a bunch of models listed, and you can click on them to see more information about them.
You can select a model and download it.
Running a Model Under an Inference Server
There’s a cool option here to run it as an inference server and write code to talk to it.
Running as an “inference server” loads up the model with an interface with minimal overhead. That way, you can talk directly to the model with an API, and it allows customizable interactions.
It even provides the code to run in several languages if you want to connect to it.
Running a Model as a Chat Server
You can run a chat server if you’re more familiar with things like ChatGPT.
Custom Options
You can use a few configuration options to work with these models. These options include preset styles and an option to use the Apple Metal GPU. Cool!
I found running the Microsoft Phi 2 model to be very responsive and generate clean results quickly. It has “real-time” chat speed and generates large amounts of texts fairly quick.
How does it run a 7B Model?
LM Studio has a nice home screen here that lists a bunch of models. Let’s try a 7B model. I run these routinely on my Windows machine with an RTX 4090, and I don’t think my M1 will get anywhere close, but it’s certainly worth a try.
I loaded it up and found it to be surprisingly fast. This is too slow for a chat model you’d run on a web page, for instance, if you wanted to simulate chatting with a real person. But it’s certainly useable for question-answer type prompting or code generation. It’s not too bad at all!
Let’s throw a our programming question at it.
It generated some cool code, pretty fast. I will experiment with this more in the coming days, it’s a really neat interface.
Should you Try This on Your Mac?
I was surprised by two things: The installation is SO easy. It’s as easy as installing any other application, and you can get going in minutes.
I was also surprised at how fast it is on the M1 Mac. I can only imagine it’s much quicker on M2 or M3 machine. It’s nowhere near as fast as these models run on my 4090, but they’re much faster than I expected them to be.
Whether you’re a seasoned LLM expert or just a little curious about LLMs and Generative AI, this is a great product to try out. Heck, it’s free, so why not?
You can download LM Studio here for Mac, Linux, and Windows.
Now go make some cool stuff!
Questions? Comments? Let me know!