What Really Makes Cars Pollute? A Data Science Deep Dive into CO₂ Emissions

Friday, June 12, 2026, by Sai Bhargav Rallapalli
Last updated on June 14, 2026 by Editorial Team. Author: Sai Bhargav Rallapalli. Originally published on Towards AI.

How I built a 98.8% accurate prediction model and discovered that the “cleanest” fuel is hiding a dirty secret

When the Global Automotive Council wants to reduce vehicle emissions, where does it start? Does it target fuel types? Engine sizes? Vehicle classes? The answer is not as straightforward as you’d think, and the data tells a story that contradicts common intuition.

The article walks through a CO₂ emissions data science workflow. Starting with a dataset of 7,000+ vehicles, the author removes duplicate rows and checks the distribution of the target variable, then addresses multicollinearity by dropping redundant fuel-consumption columns, using the variance inflation factor as a diagnostic and Ridge regression for stability. They argue against removing high-emission outliers, because those “top 1%” vehicles are the category policymakers most need to regulate.

The core result overturns the raw fuel-type averages and is an instance of Simpson’s Paradox: ethanol (E85) appears worst when emissions are simply averaged by fuel type, but once the model controls for engine size and fuel consumption, ethanol is the cleanest fuel in the dataset. Its benefit is “hidden” because it is used in larger, higher-consuming engines.

The author describes building a scikit-learn pipeline with one-hot encoding and evaluating its performance (very high R², low error), then shows that the model’s weaknesses are concentrated in rare alternative-fuel categories. The article closes with policy recommendations that tie fuel mandates to vehicle and engine constraints and call for targeted action against super-emitters.
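To make the Simpson’s Paradox reversal described above concrete, here is a minimal sketch. It is not the author’s code or data: the eight rows are hypothetical numbers invented for illustration, the names `ROWS` and `mean_by` are made up, and a simple comparison within engine-size classes stands in for the article’s regression model, which controls for engine size and fuel consumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows: (fuel, engine_class, co2_g_per_km).
# These numbers are invented for illustration; they are NOT the article's data.
ROWS = [
    ("gasoline", "small", 150),
    ("gasoline", "small", 160),
    ("gasoline", "small", 170),
    ("gasoline", "large", 300),
    ("ethanol", "small", 140),
    ("ethanol", "large", 270),
    ("ethanol", "large", 280),
    ("ethanol", "large", 290),
]


def mean_by(rows, key):
    """Average the CO2 value of the rows in each group defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {group: mean(values) for group, values in groups.items()}


# Raw averages by fuel type: ethanol looks worse because most of its
# rows come from large engines.
raw = mean_by(ROWS, key=lambda r: r[0])

# Averages within each engine class: ethanol is lower in both classes.
stratified = mean_by(ROWS, key=lambda r: (r[1], r[0]))

print("raw averages:", raw)
for (engine_class, fuel), value in sorted(stratified.items()):
    print(f"{engine_class:>5} engines, {fuel:<8}: {value}")
```

In this toy data the raw average is higher for ethanol, yet within both the small and the large engine class ethanol emits less than gasoline. The reversal comes entirely from ethanol rows being concentrated in large engines, which is the mechanism the article attributes to E85.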