Several developers have quantized the Code Llama language models to 4-bit using MLX, allowing them to run very fast on Apple Silicon. Quantized versions of the 70B, 13B, and 7B models have been uploaded to the 🤗 MLX Community, including Python and Instruct variants of the 7B and 13B models, with the remaining 13B upload nearly complete. Because 4-bit quantization sharply reduces memory use, commenters expect the models to run well even on phones and other small devices, joking that the different sizes could power a new Siri: the big one for college essays, the small one for fast answers, the medium one for restaurant recommendations.
Just finished uploading 4 MLX models. Quantized Code Llama 7B and 13B, Python and Instruct! Link here https://t.co/T8yag84Qgs https://t.co/dFn25QGs5H
Apple AI: 4-bit quantization means they run fast on phones and other small computers. And three different models: big to do college essays, small to answer you fast, medium to find you a restaurant for dinner. New Siri! https://t.co/rxlhNrHj7k
4-bit quantized Code Llama models already in the 🤗 MLX Community! {70, 13, 7}B models here: https://t.co/dUgErUXnM3 1. pip install mlx-lm 2. python -m mlx_lm.generate --model mlx-community/CodeLlama-13b-Python-4bit --prompt "write a quick sort in C++" Thanks to…
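The two-step recipe in the post above can be written out as shell commands (a sketch, assuming an Apple Silicon Mac with Python installed; the model weights are fetched from the Hugging Face Hub on first run):

```shell
# Step 1: install the mlx-lm package.
pip install mlx-lm

# Step 2: generate code with a 4-bit quantized Code Llama model.
# The weights (several GB) are downloaded automatically on first use.
python -m mlx_lm.generate \
  --model mlx-community/CodeLlama-13b-Python-4bit \
  --prompt "write a quick sort in C++"
```

Swapping the `--model` argument for another repo in the mlx-community collection (e.g. the 7B or 70B variants) should work the same way.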
I've just quantized CodeLlama 70b Instruct to 4-bit with MLX, you can now run this model super fast on Apple Silicon. Here's the link to the model! https://t.co/rYhattJNvn
"Run this model super fast on Apple Silicon." https://t.co/lmP1tvA2RS
I've just quantized CodeLlama 7b Python to 4-bit with MLX, meaning you can now run this model super fast on Apple Silicon. Here's the link to the model! https://t.co/Uenhv6rNpD By the end of the day, my goal is to add all the new models. The 13B one is almost done!
This is how fast @ollama-hosted Code Llama 70B writes the game Snake in Python. Probably the biggest language model I've ever run on my MacBook Pro! What a beast! https://t.co/m21nLnLKyW