XDA Developers · 5 min read · 3 hours ago

I switched my local LLM setup to Ollama's new MLX engine, and my Mac suddenly feels twice as fast

Ollama's new MLX engine speeds up local LLM inference on Apple Silicon Macs. The update makes better use of Apple's unified memory architecture, combines GPU operations into larger Metal kernels via MLX's JIT compiler, and improves GPU-backed token sampling. Together these changes yield roughly 20% higher output speed than the previous Q4_K_M implementation. Output quality also improves through support for NVIDIA's NVFP4 quantization format, which cuts quality loss by about half compared to Q4_K_M at similar memory usage. For agent workflows, a redesigned snapshot-based caching system replaces traditional prefix caching: coding assistants such as Claude Code and Aider can resume from saved model states instead of rebuilding context on every tool call, which reduces latency in multi-agent setups.
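A figure like "roughly 20% higher output speed" can be checked on your own machine. The sketch below is one way to do that, not the article's method: it assumes a local Ollama server on the default port and relies on the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama's `/api/generate` endpoint returns in non-streaming mode. The helper names are made up for illustration.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: unchanged port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Output speed from Ollama's eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def percent_faster(new_tps: float, old_tps: float) -> float:
    """Relative speedup of new_tps over old_tps, in percent."""
    return (new_tps / old_tps - 1.0) * 100.0


def generate(model: str, prompt: str) -> dict:
    """Send one non-streaming generation request and return the JSON response."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

To compare two builds of the same model, call `generate()` once per model tag with an identical prompt, pass each response to `tokens_per_second()`, and feed the two results to `percent_faster()`. Which tags correspond to the MLX and Q4_K_M variants is not stated in the summary, so fill those in from your own `ollama list` output.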


#data-science #ollama #local-ai

Source: https://www.xda-developers.com/ollama-new-mlx-engine-local-llm-mac-twice-fast. 8sync News only summarizes and links to articles; copyright in the content belongs to the author and the original source.

Recommended for you

freeCodeCamp · 10 min · 15 hours ago · AI

How to Build a Personal AI Web Research Agent with Ollama and Qwen

A step-by-step guide to building a local AI web research agent with Ollama, the Qwen3.5:4b model, and Python. The agent takes a research query, fetches the top 5 web results through Ollama's web search API, extracts the text with BeautifulSoup, and then summarizes it with a Qwen model running locally. Results are saved as timestamped Markdown files, and everything runs on-device with no API fees and no privacy compromises.

Developers who want to automate web research efficiently, cut costs, and keep personal data secure should read this article to build a personal AI system that runs on their own device.
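Two steps of the pipeline summarized above, text extraction and the timestamped Markdown report, can be sketched without any external services. This is an illustrative sketch, not the guide's code: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs with no extra packages, leaves out the search and summarization calls entirely, and uses hypothetical function names and report layout.

```python
import re
from datetime import datetime
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the page's visible text as one space-separated string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def report_filename(query: str, now: datetime) -> str:
    """Build a timestamped, filesystem-safe Markdown filename (hypothetical scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-")
    return f"{now:%Y%m%d-%H%M%S}-{slug}.md"


def save_report(query, summary, sources, out_dir, now=None) -> Path:
    """Write the summary and its source URLs to a timestamped Markdown file."""
    now = now or datetime.now()
    lines = [f"# {query}", "", summary, "", "## Sources", ""]
    lines += [f"- {url}" for url in sources]
    path = Path(out_dir) / report_filename(query, now)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

In a full agent, the text returned by `extract_text()` for each search result would be passed to the local model for summarization, and the model's answer would become the `summary` argument of `save_report()`.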

#python
