Towards Data Science · 15 min read
Beyond the Straight Line: Choosing Between OLS, Interaction Terms, and Tweedie Regression
A practical guide comparing three regression approaches for zero-inflated, heavy-tailed data: standard OLS, OLS with interaction terms, and Tweedie regression. Using the French Motor Third-Party Liability Claims dataset, the author shows that OLS fails on zero-inflated insurance data and that interaction terms provide only marginal improvement, whereas Tweedie regression reduces MAE by ~35%. A bonus two-step zero-inflated model, which combines a LightGBM classifier for claim occurrence with a Tweedie regression for claim severity, cuts MAE by a further 21%, reaching 87.79 versus 174.17 for OLS.
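The two-step idea can be illustrated with a minimal, dependency-free sketch. It assumes the two stages are combined by multiplying the predicted claim probability by the predicted severity, which is a common choice but is not confirmed by the summary above. The helper names `expected_cost` and `mean_absolute_error` are hypothetical, and the numbers are made-up stand-ins for the outputs of a classifier and a severity model, not values from the article's dataset.

```python
from typing import Sequence


def expected_cost(p_claim: float, severity: float) -> float:
    """Combine the two stages: P(claim) times expected cost given a claim.

    Assumption: the stages are multiplied together; the article may combine
    them differently (for example by thresholding the classifier).
    """
    if not 0.0 <= p_claim <= 1.0:
        raise ValueError("p_claim must be a probability in [0, 1]")
    return p_claim * severity


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for three policies (illustrative only).
p_claims = [0.02, 0.10, 0.50]          # stage 1: stand-in for classifier probabilities
severities = [1000.0, 2000.0, 800.0]   # stage 2: stand-in for severity predictions
actual = [0.0, 0.0, 900.0]             # observed claim amounts, mostly zero

predictions = [expected_cost(p, s) for p, s in zip(p_claims, severities)]
mae = mean_absolute_error(actual, predictions)
print(predictions, mae)
```

Multiplying by the claim probability pulls predictions toward zero for policies that are unlikely to claim, which is one plausible reason a two-step model could lower MAE on data where most observed amounts are exactly zero.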