R-bloggers · 0 comments · 4 min read · 2 hours ago

Running local LLMs on your NPU from R with Foundry Local and ellmer

A walkthrough of running local LLMs on the Neural Processing Unit (NPU) of a Surface Pro 11 using Microsoft's Foundry Local and the R package ellmer. Because Ollama and LM Studio don't natively support NPU inference, the author adapted Microsoft's Python getting-started guide into R. The code starts the Foundry service, downloads and loads a model (Qwen2.5-0.5B), discovers the dynamically assigned endpoint, resolves the model ID via the REST API, and then connects ellmer's chat_openai_compatible() to the local OpenAI-compatible endpoint to send prompts.
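The sequence described above could look roughly like the sketch below. It is not the original author's code. The helper names (extract_endpoint, resolve_model_id, connect_chat) are hypothetical, as are the sample status line and sample model IDs. The alias "qwen2.5-0.5b", the /v1/models route, and the shape of the `foundry service status` output are assumptions that should be checked against your Foundry Local install. Only the two pure helpers run when the block is sourced; connect_chat() is defined but not called, since it needs Foundry Local, jsonlite, and ellmer.

```r
# Sketch only: helper names are hypothetical and the sample inputs are made up.

# Pull the first http(s)://host:port URL out of the text the service prints.
# Foundry Local picks its port dynamically, so it cannot be hard-coded.
extract_endpoint <- function(status_text) {
  text <- paste(status_text, collapse = "\n")
  hit <- regmatches(text, regexpr("https?://[A-Za-z0-9.-]+:[0-9]+", text))
  if (length(hit) == 0) stop("No endpoint URL found in service output")
  hit[[1]]
}

# Return the first model ID that contains the alias (case-insensitive,
# literal match so the dots in the alias are not treated as wildcards).
resolve_model_id <- function(model_ids, alias) {
  hits <- model_ids[grepl(tolower(alias), tolower(model_ids), fixed = TRUE)]
  if (length(hits) == 0) stop("No model ID matches alias: ", alias)
  hits[[1]]
}

# The full flow, left uncalled. Assumes the `foundry` CLI is on the PATH
# and that the endpoint exposes an OpenAI-style model list at /v1/models.
connect_chat <- function(alias = "qwen2.5-0.5b") {
  system2("foundry", c("service", "start"))
  system2("foundry", c("model", "download", alias))
  system2("foundry", c("model", "load", alias))

  status <- system2("foundry", c("service", "status"), stdout = TRUE)
  base_url <- paste0(extract_endpoint(status), "/v1")

  # Assumed response shape: list(data = data.frame(id = ...)).
  models <- jsonlite::fromJSON(paste0(base_url, "/models"))
  model_id <- resolve_model_id(models$data$id, alias)

  # Depending on the ellmer version, a placeholder API key or credentials
  # argument may also be needed even though the endpoint is local.
  ellmer::chat_openai_compatible(base_url = base_url, model = model_id)
}

# Exercise the pure helpers on made-up inputs.
sample_status <- "Service is running on http://127.0.0.1:5273/openai/status"
sample_ids <- c("phi-sample-cpu", "Qwen2.5-0.5B-sample-npu")
endpoint <- extract_endpoint(sample_status)
model_id <- resolve_model_id(sample_ids, "qwen2.5-0.5b")
```

With a working install, calling chat <- connect_chat() and then chat$chat("Hello") would send a prompt to the local model. Looking up the model ID at runtime, rather than hard-coding it, matters because the ID the service reports can differ from the alias used to download the model.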


#r #local-ai

Source: https://www.r-bloggers.com/2026/06/running-local-llms-on-your-npu-from-r-with-foundry-local-and-ellmer. 8sync News only summarizes and links to articles; copyright of the content belongs to the author and the original source.
