Chatting With LLaMA, Offline, Using a MacBook M1 Pro (32GB RAM)

by fdivitto in Workshop > Science

How to run Meta's LLaMA 2 model locally, fully offline, on a MacBook Pro M1.

Supplies

MacBook M1 Pro, 32 GB RAM

Request the LLaMA 2 Download Link

Fill in Meta's request form here:

https://ai.meta.com/llama/

Then wait for an email from Meta containing the download link and instructions.

Get the Downloader

Clone Meta's llama repository, which contains the download script:

git clone https://github.com/facebookresearch/llama

Download the Language Model

Enter the downloader folder and run the download script:

cd llama
sh ./download.sh


The downloader prompts for the download link from Meta's email and for the model to download (e.g. "7B").
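
Before converting, it can help to confirm that the download finished. The sketch below is only a convenience check, not part of Meta's tooling: check_llama_download is a hypothetical helper, and the list of files it looks for (tokenizer.model in the downloader folder, plus params.json and consolidated.00.pth in the model folder) is an assumption about what a 7B download contains. Adjust it if your download looks different.

```shell
# Hypothetical helper: report which expected files are missing.
# Assumption: a 7B download consists of the three files listed below.
check_llama_download() {
    repo_dir="$1"    # folder created by git clone (for example ./llama)
    model_dir="$2"   # model subfolder inside it (for example llama-2-7b)
    missing=0
    for f in "$repo_dir/tokenizer.model" \
             "$repo_dir/$model_dir/params.json" \
             "$repo_dir/$model_dir/consolidated.00.pth"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "download looks complete"
    fi
    # Exit status 0 means every file was found, 1 means something is missing.
    return "$missing"
}
```

From inside the downloader folder, run it as: check_llama_download . llama-2-7b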

Convert the Model to HF (HuggingFace)

Rename the model folder so that its name is just the model size. For example, if you downloaded llama-2-7b, rename it to "7B":

mv llama-2-7b 7B


Install FastChat (fschat), which is the inference engine for the language model:

pip3 install fschat


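Download the script that converts the LLaMA weights to the HF format (macOS does not ship wget; install it first, or use curl -L -O with the same URL):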

wget https://github.com/huggingface/transformers/raw/main/src/transformers/models/llama/convert_llama_weights_to_hf.py


Convert the model:

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
python3 convert_llama_weights_to_hf.py --input_dir ./ --model_size 7B --output_dir ./hf_7B
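
A quick way to see whether the conversion produced a usable folder is to look for config.json and at least one weights file. This is a rough sketch under assumptions: check_hf_model is a hypothetical helper, and the weight file extensions (.bin or .safetensors) depend on the transformers version that ran the conversion.

```shell
# Hypothetical helper: sanity-check a converted HF model folder.
# Assumption: the folder holds config.json plus .bin or .safetensors weights.
check_hf_model() {
    hf_dir="$1"
    if [ ! -f "$hf_dir/config.json" ]; then
        echo "no config.json in $hf_dir"
        return 1
    fi
    # find + grep avoids shell globs, which zsh rejects when nothing matches.
    if ! find "$hf_dir" -maxdepth 1 -type f \
            \( -name '*.bin' -o -name '*.safetensors' \) | grep -q .; then
        echo "no weight files in $hf_dir"
        return 1
    fi
    echo "converted model looks usable"
}
```

After the conversion above, run it as: check_hf_model ./hf_7B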

Run the Model - Chat!!

Testing LLaMA 2 with a MacBook M1 Pro 32 GB

It is time to chat. Just run:

python3 -m fastchat.serve.cli --model-path ./hf_7B --device mps --load-8bit --style rich
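
To get a feel for what --load-8bit saves, the memory taken by the weights alone can be estimated as parameters times bytes per parameter. The helper below is a back-of-the-envelope sketch: estimate_weights_gb is a hypothetical name, "7B" is treated as exactly 7 billion parameters, and the result ignores everything else the process needs while chatting.

```shell
# Hypothetical helper: rough size of the model weights in GB (10^9 bytes).
# Arguments: parameter count in billions, bits per parameter.
# Integer arithmetic, so fractional results are truncated.
estimate_weights_gb() {
    params_billion="$1"
    bits="$2"
    echo $(( params_billion * bits / 8 ))
}

estimate_weights_gb 7 16   # prints 14 (16-bit weights)
estimate_weights_gb 7 8    # prints 7  (8-bit weights, as with --load-8bit)
```

Under these assumptions, 8-bit loading roughly halves the space taken by the 7B weights, which leaves more of the 32 GB free for the rest of the system.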