Developers are reporting significant speed-ups with Apple's MLX framework for machine learning on Apple Silicon. Benchmarks below include nearly 4x faster CLIP inference on an M1 MacBook Pro compared to a T4 GPU, and MLX CLIP on M1 running in 3 ms versus 16 ms on a V100. Memory efficiency remains a concern: one Mamba port uses 37 GB of RAM for a 2.8B-parameter model.
I benchmarked CLIP on a V100 -> 16 ms. MLX CLIP on my M1 -> 3 ms. MLX CLIP has some improvements (e.g. to GELU), but I didn't expect this big a difference. Can someone else benchmark as well to confirm? https://t.co/1AekQvmzlD
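For anyone who wants to reproduce or sanity-check these timings, a minimal, framework-agnostic harness (warmup runs plus median; the `benchmark` helper and the stand-in workload are hypothetical, not from the original posts) might look like this. Note that for lazy frameworks like MLX you must force evaluation (e.g. `mx.eval(out)`) inside the timed function, or you will time graph construction rather than the actual compute:

```python
import time
import statistics

def benchmark(fn, warmup=3, runs=20):
    """Time fn(): warm up first (caches, JIT/compile effects),
    then return the median wall time of several runs in milliseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)

# Stand-in workload; swap in the CLIP forward pass (with evaluation
# forced, e.g. mx.eval) to reproduce the 3 ms vs 16 ms comparison.
ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"{ms:.2f} ms (median of 20 runs)")
```

Using the median rather than the mean keeps one-off scheduler hiccups from skewing the result.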
Found a Mamba port for MLX that works on Apple Silicon: https://t.co/mti0fMpc9t It seems memory-inefficient (uses 37 GB of RAM for the 2.8B model), but it works.
Happy Valentine's Day, MLX! Functions compiled with "mx.compile" are now integrated in mlx-benchmark. Here are the results on an M1 Pro. It would be nice to integrate benchmarks from other M chips @awnihannun @ivanfioravanti @digitalix https://t.co/S7Opsn9t4o
Apple MLX compile! mx.compile improvements, mlx 0.2.0, MLX vs MPS:
- PReLU vs compiled_PReLU: +21% vs +54%
- SeLU vs compiled_SeLU: -43% vs +89%
Thanks @awnihannun for the hint! https://t.co/9t5YnIDToX https://t.co/nL0wUz1FmB
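The gains above come from wrapping a function in `mx.compile`, which traces it once and fuses its elementwise operations into fewer kernels. A minimal sketch, assuming MLX >= 0.2 on Apple Silicon (the code falls back gracefully elsewhere, and the scalar `selu_ref` is only a CPU sanity check, not part of MLX):

```python
import math

# SELU constants (Klambauer et al.), used only for a scalar reference.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu_ref(x):
    """Scalar SELU, handy for sanity-checking an array result."""
    return SCALE * (x if x > 0 else ALPHA * (math.exp(x) - 1.0))

try:
    import mlx.core as mx
    import mlx.nn as nn

    # mx.compile fuses the elementwise ops of SELU; this fusion is
    # where the compiled_SeLU speed-up reported above comes from.
    compiled_selu = mx.compile(nn.selu)

    x = mx.random.normal((1024, 1024))
    y = compiled_selu(x)
    mx.eval(y)  # MLX is lazy: force evaluation before timing/reading
except ImportError:
    pass  # MLX requires Apple Silicon; the reference above still runs
```

Timing `nn.selu` against `compiled_selu` with a warmup-then-median harness should show the compiled version ahead, since SELU's exp, scale, and select steps get fused.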
I converted CLIP checkpoints to MLX and they're quite fast. Find them here:
- https://t.co/eItPFFm4v7
- https://t.co/T9EHIT7xF8
- https://t.co/ltdyxs04ev
Benchmarks are appreciated.
I converted CLIP checkpoints to MLX and they're quite fast. MBP M1 MLX vs T4 results in nearly a 4x speed-up; I think I'll just develop on local. Find them here: https://t.co/T9EHIT7xF8 https://t.co/eItPFFm4v7 https://t.co/ltdyxs04ev