Chatting With LLaMA, Offline, Using a MacBook M1 Pro (32GB RAM)
by fdivitto in Workshop > Science
How to run Meta's LLaMA 2 language model locally, without an internet connection, on a MacBook with an M1 Pro chip.
Supplies
MacBook M1 Pro, 32 GB RAM
Request the LLaMA 2 Download Link
Fill out the Meta request form here:
and wait for an email from Meta with the download link and instructions.
Get the Downloader
Clone Meta's llama repository, which contains the download script:
git clone https://github.com/facebookresearch/llama
Download the Language Model
Enter the downloader folder (the clone creates a folder named "llama") and run the download script:
cd llama
sh ./download.sh
The script asks which model to download (e.g. "7B") and for the download link from Meta's email.
Convert the Model to HF (HuggingFace)
Rename the model folder so that its name is just the model size. For example, if you downloaded llama-2-7b, rename it to "7B":
mv llama-2-7b 7B
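Before converting, it can help to confirm that the folder layout is what the conversion script is likely to expect. The following is a minimal sketch, not part of the original procedure: check_llama_layout is a hypothetical helper, and the file names it looks for (tokenizer.model, params.json, consolidated.*.pth) are assumptions based on what the downloader produced at the time of writing.

```shell
# Hypothetical helper: check the folder layout before converting.
# Assumed layout (may differ with other downloader versions):
#   <root>/tokenizer.model
#   <root>/<size>/params.json
#   <root>/<size>/consolidated.*.pth
check_llama_layout() {
    root="$1"
    size="$2"
    status=0

    # Single files that are assumed to be required.
    for f in "$root/tokenizer.model" "$root/$size/params.json"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done

    # At least one weight shard should be present.
    found=0
    for shard in "$root/$size"/consolidated.*.pth; do
        if [ -f "$shard" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "missing: $root/$size/consolidated.*.pth"
        status=1
    fi

    return "$status"
}
```

From the downloader folder, a call such as check_llama_layout . 7B prints any missing files and returns a non-zero status if something is absent.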
Install FastChat (the fschat package), which serves as the inference engine for the language model:
pip3 install fschat
Download the script that converts the LLaMA weights to HF format:
wget https://github.com/huggingface/transformers/raw/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
Convert the model:
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
python3 convert_llama_weights_to_hf.py --input_dir ./ --model_size 7B --output_dir ./hf_7B
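Once the conversion finishes, a quick check of the output folder can catch a failed or partial run before starting the chat. This is only a sketch: check_hf_dir is a hypothetical helper, and it assumes the converted model contains a config.json plus at least one weight file ending in .bin or .safetensors (which of the two appears depends on the transformers version).

```shell
# Hypothetical helper: sanity-check a converted HF model folder.
# Assumes config.json and *.bin or *.safetensors weight files.
check_hf_dir() {
    dir="$1"

    if [ ! -f "$dir/config.json" ]; then
        echo "missing: $dir/config.json"
        return 1
    fi

    # Succeed as soon as one weight file is found.
    for w in "$dir"/*.bin "$dir"/*.safetensors; do
        if [ -f "$w" ]; then
            return 0
        fi
    done

    echo "no weight files (*.bin or *.safetensors) in $dir"
    return 1
}
```

For the conversion above, check_hf_dir ./hf_7B should return success without printing anything.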
Run the Model - Chat!!
It is time to chat. Just run:
python3 -m fastchat.serve.cli --model-path ./hf_7B --device mps --load-8bit --style rich