Medium · 0 Hot · 0 comments · 5 min read · 1 hour ago

Why Forecasting Infectious Diseases Is Harder Than Predicting Sales (And What Data Scientists Can Learn From It)

Forecasting infectious disease activity differs fundamentally from typical time-series problems such as retail sales. Key lessons include: model performance varies significantly by country because the underlying epidemiology differs; strongly seasonal diseases are paradoxically harder to forecast because peak timing and peak magnitude must both be correct; standard metrics like RMSE can mask errors that are critical in public health contexts; domain-informed feature engineering often outperforms switching algorithms; and anomaly detection complements forecasting by catching unexpected divergences. The broader takeaway for data scientists is to shift focus from "which model" to questions about validation strategy, meaningful metrics, adaptive pipelines, and knowing when a model is wrong.
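The point about RMSE masking critical errors can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the original article: the synthetic bell-shaped weekly curve, the helper names (`epidemic_curve`, `peak_week`, `peak_timing_error`), and the two toy forecasts. One forecast has the right shape but peaks three weeks late; the other peaks in the right week but is uniformly too high.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length series."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def peak_week(series):
    """Index of the week with the highest value."""
    return max(range(len(series)), key=series.__getitem__)


def peak_timing_error(actual, forecast):
    """Absolute difference, in weeks, between forecast and actual peak."""
    return abs(peak_week(forecast) - peak_week(actual))


def epidemic_curve(peak, weeks=52, height=100.0, width=4.0):
    """Synthetic bell-shaped weekly case curve (illustrative only)."""
    return [
        height * math.exp(-((t - peak) ** 2) / (2 * width ** 2))
        for t in range(weeks)
    ]


# Hypothetical season: cases peak in week 26.
actual = epidemic_curve(peak=26)

# Forecast A: correct shape, but the peak arrives three weeks late.
late_forecast = epidemic_curve(peak=29)

# Forecast B: correct peak week, but every week is overestimated by 25 cases.
biased_forecast = [a + 25.0 for a in actual]

for name, forecast in [("late peak", late_forecast), ("biased", biased_forecast)]:
    print(
        f"{name}: RMSE={rmse(actual, forecast):.1f}, "
        f"peak timing error={peak_timing_error(actual, forecast)} weeks"
    )
```

On this toy data the late-peak forecast gets the lower RMSE even though it misplaces the peak by three weeks, which may matter more than a constant bias when interventions are scheduled around the peak. Reporting a timing metric alongside RMSE is one way to surface that kind of error.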


#machine-learning #feature-engineering #time-series-forecasting

Source: https://medium.com/@shrutikaushik15/why-forecasting-infectious-diseases-is-harder-than-predicting-sales-and-what-data-scientists-can-a5092d2a9d62. 8sync News only summarizes and links to the article; copyright of the content belongs to the author and the original source.

