Topic

#data-science

The latest programming news on data-science, summarized in Vietnamese by AI.

XDA Developers 14 min · 21 hours ago · AI

My 7-year-old GPU runs local AI perfectly, and I don't need my cloud subscriptions anymore

MoE models and quantization make it possible to run AI locally on an older GPU with 8 GB of VRAM, such as the RTX 2070 Super, and to replace cloud subscriptions with models like Qwen3-Coder 8B or Gemma 4 E4B. Tools such as Ollama (command line) and LM Studio (GUI) make deployment easy, but token generation speed, context window size, and tool-calling support still need attention.

If you want to cut costs and improve performance for everyday AI applications while keeping quality high, this article shows how to run AI models on an older GPU with MoE and quantization so you can work effectively without depending on the cloud.
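To make the quantization point concrete, here is a rough back-of-the-envelope sketch of how bit width changes the memory needed for model weights. It assumes an 8-billion-parameter model (taken from the "8B" in the model name above) and counts weights only; the KV cache, activations, and the per-block scale factors that real quantization formats store are ignored, so actual usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed parameter count: 8 billion. Bit widths are common illustrative choices.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(8e9, bits):.1f} GB")
```

Under these assumptions, only the 4-bit variant leaves headroom for context on an 8 GB card, which is why context window size is worth checking before choosing a model.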

#data-science #llm #ollama

Community Picks 214 min · 2 years ago · AI

Which programming language to use for coding interviews

The choice of programming language (Python, Java) for a coding interview has a large effect on performance, but what matters most is how familiar you are with the language. Learn a new language only if the role requires that specific expertise; otherwise, do not learn one just for interviews.

For technical job seekers, knowing the languages commonly used in interviews, such as Python or Java, helps you solve problems quickly and confidently and avoids losing time learning a new language when a real interview comes up.

#data-science #data-structures

Community Picks 144 min · 2 years ago · AI

Google Consent Mode v2

Google Consent Mode is a key feature for managing user consent for cookies and data. Version 2 adds new parameters such as ad_user_data and ad_personalization so that advertising can be optimized in line with privacy regulations. It can be implemented through Google Tag Manager, an SDK, or direct changes to the source code.

Developers should read this to understand how to integrate Google Consent Mode v2 into their projects in order to comply with GDPR, improve the user experience, and avoid penalties for privacy violations.

#data-science #advertising

InfluxData 010 min · 13 hours ago · AI

How Mumu Migrated From Prometheus to InfluxDB and Tripled Their Metric Coverage

Mumu migrated from Prometheus to InfluxDB 3 in 3 months, more than tripling its monitored metrics from 150 to 560 thanks to a push-based model that fits its needs better. The migration used a dual-write strategy through Vector, relied on AI to generate 80% of the configuration from InfluxDB's OpenAPI specification, and validated results with side-by-side comparisons in Grafana. The main benefits are lower deployment cost (adding a metric takes a single HTTP call), SQL support for ad hoc queries, and broader observability into areas such as CI/CD and per-developer environments.

Developers should read this to understand how moving from Prometheus to InfluxDB makes it easier to track discrete push-based events, lowers deployment cost, and opens up SQL queries for new use cases such as CI/CD and user activity analysis.
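The "one HTTP call per metric" idea can be sketched as follows. This is a minimal illustration, not Mumu's pipeline (which, per the summary, writes through Vector): it formats a point in InfluxDB line protocol and builds, without sending, a write request. The measurement, tag, and field names, the host and port, the database name, the token, and the `/api/v3/write_lp` path are all assumptions made for the example; check the InfluxDB 3 documentation for the endpoint your deployment exposes.

```python
import urllib.request


def _escape(value: str, chars: str) -> str:
    """Backslash-escape the characters that line protocol treats as delimiters."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp.

    Only float fields are handled here, to keep the sketch short.
    """
    head = _escape(measurement, ", ")
    tag_part = "".join(
        f",{_escape(k, ',= ')}={_escape(v, ',= ')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ',= ')}={float(v)}" for k, v in fields.items()
    )
    return f"{head}{tag_part} {field_part} {timestamp_ns}"


def build_write_request(base_url, database, token, line):
    """Build (but do not send) a POST carrying one line-protocol record."""
    url = f"{base_url}/api/v3/write_lp?db={database}"  # assumed endpoint path
    return urllib.request.Request(
        url,
        data=line.encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical CI/CD metric; every name and value below is illustrative.
line = to_line_protocol(
    "ci_build",
    {"repo": "api", "pipeline": "deploy"},
    {"duration_s": 42.5},
    1700000000000000000,
)
request = build_write_request("http://localhost:8181", "metrics", "EXAMPLE_TOKEN", line)
print(line)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Because the client pushes the point itself, a short-lived job such as a CI run can report a metric without exposing a scrape endpoint.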

#data-science #observability

BleepingComputer 06 min · 20 hours ago · AI

Google releases new privacy controls for activity history, personalization

Google is adding new privacy controls that separate activity history (Search Services History) from personalization (Personalized Recommendations) for Search and Google Play, instead of combining them as before. By default, Google stores media (images, audio, and video from Google Lens and voice search) in Search Services History when Web & App Activity is turned on, but users can turn this setting off separately or delete individual saved items. The new settings will roll out gradually over the next few days.

Developers should read this to understand how Google handles user data and privacy in its apps, which can help them build more effective data protection into their own products.

#data-science

databricks 05 min · 1 day ago

Databricks positioned highest in execution and furthest in vision for the second consecutive year in Gartner Magic Quadrant

Databricks announces it has been named a Leader in the 2026 Gartner Magic Quadrant for AI Platforms for Data Science and Machine Learning, holding the highest position in Ability to Execute and furthest in Completeness of Vision for the second consecutive year. The post highlights Databricks' unified platform philosophy combining lakehouse, Lakebase, Agent Bricks, and Unity Catalog to deliver governed, production-grade agentic applications. Key capabilities include centralized governance via Unity AI Gateway, support for frontier and open-source models, and tools for both developers and business users to build AI agents grounded in enterprise data.

#machine-learning #data-science

ByteByteGo 015 min · 1 day ago

Large Language Models vs Small Language Models

Small and large language models share transformer foundations but diverge sharply based on three constraints: deployment target, inference economics, and training budget. Architecturally, small models use grouped-query attention, sliding window attention, and shared KV caches to minimize memory footprint. Training-wise, they rely on high-quality synthetic data (e.g., Phi family), knowledge distillation from larger teachers (e.g., Gemma 2), and deliberate overtraining beyond compute-optimal ratios. Deployment involves quantization and hardware-specific tuning for devices like Apple's Neural Engine or NVIDIA Jetson. Small models have real gaps in generalization, multi-step reasoning, and world knowledge. Production systems increasingly compose both model classes using routing (small model handles easy requests, escalates hard ones), guardrails (small models filter input/output), and speculative decoding (small model drafts tokens, large model verifies). The key design insight is to start from constraints rather than benchmarks.
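The speculative decoding pattern mentioned above (small model drafts, large model verifies) can be illustrated with a toy sketch. It assumes greedy decoding only, represents each "model" as a plain function from a token list to the next token, and verifies draft tokens one at a time; production systems verify a whole draft in one batched forward pass and use a probabilistic acceptance rule when sampling. All function names here are hypothetical.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: generate max_new tokens with a single model."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], max_new: int, k: int = 4
) -> List[int]:
    """Draft up to k tokens with the small model, keep the prefix the target agrees with."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The draft model proposes a short continuation.
        ctx = list(out)
        proposal = []
        for _ in range(min(k, goal - len(out))):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2. The target model checks each proposed position in order.
        for token in proposal:
            expected = target(out)
            out.append(expected)  # always keep the target's choice
            if expected != token:
                break  # first disagreement: discard the rest of the draft
    return out


# Toy models: the target counts upward mod 10; the draft is wrong after a 4.
def target_model(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


result = speculative_decode(target_model, draft_model, [0], max_new=8)
print(result == greedy_decode(target_model, [0], max_new=8))
```

The property worth noticing is that the output matches what the target model would produce on its own; in this greedy setting the draft model only affects how much work can be verified per step, not the result.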

#data-science #llm

MIT Technology Review 020 min · 1 day ago

Heads in the game

MIT Sports Lab, co-founded in 2015, has become a key technology partner for major sports organizations. The lab played a central role in validating FIFA's semi-automated offside technology (SAOT) used at the 2022 World Cup, processing over 108,900 skeletal data points per second to ensure accuracy. Beyond soccer, the lab developed an Expected Action Value (EAV) metric for the NBA to quantify player decision-making quality, helped Adidas optimize 3D-printed midsole designs using biomechanical models, and conducted a COVID-19 stadium attendance analysis for the NFL. The lab bridges academic research and industry needs, connecting MIT students and faculty with professional sports organizations.

#data-science #computer-vision