Part 13 — Design the Recommender System

Last Updated on June 22, 2026 by Editorial Team
Author(s): Utkarsh Mittal
Originally published on Towards AI. The full article can be read for free on Medium.

Part 12: https://medium.com/p/75cf0a345156

The article explains how to design a production recommender system, working through a real end-to-end scenario with concrete latency, data, and training considerations. It argues that business objectives differ from what can be labeled directly, so the central task is ranking (not simple classification) against measurable proxy signals. It outlines what the system must do and must never do under tight latency constraints, why scale forces a two-stage architecture (fast retrieval followed by richer ranking), and how cold start and sparse feedback shape the training data.

It covers how labels are constructed, how negative examples and time-based splits are chosen to avoid bias and leakage, how feature stores prevent training-serving skew, and how two-tower retrieval with dot-product scoring and softmax training works in practice. It then discusses ranking, from baselines such as LightGBM to richer wide & deep models, along with calibration and multi-task refinements, and evaluation with Recall@K for retrieval and NDCG@K for ranking, plus error slicing.

Finally, it walks through the live 200 ms execution pipeline, operational optimizations (quantization, batching, caching, fallbacks, shadow mode), online evaluation methods (A/B bucketing, interleaving), and the four major ways such systems “rot”: monitoring gaps, popularity spirals, offline-online mismatch, and training-serving skew. It closes with what distinguishes mid-level, senior, and staff engineering work on recommender systems.
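Label construction, negatives, and time-based splits can be illustrated with a minimal sketch. It assumes a hypothetical `Event` log of (user, item, timestamp) interactions, treats every logged interaction as a positive, and samples uniformly random unseen items as negatives. That sampling scheme is one common choice, not necessarily the one the article uses. The split is by time rather than at random, so nothing from the evaluation period leaks into training.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    user_id: str
    item_id: str
    timestamp: int  # e.g. Unix seconds of the interaction (hypothetical schema)


def time_split(events: List[Event], cutoff: int) -> Tuple[List[Event], List[Event]]:
    """Train on everything strictly before `cutoff`, evaluate on the rest.

    Unlike a random split, no interaction from the evaluation period can leak
    into training, which mirrors how the model is used in production.
    """
    train = [e for e in events if e.timestamp < cutoff]
    test = [e for e in events if e.timestamp >= cutoff]
    return train, test


def build_examples(
    events: List[Event],
    catalog: List[str],
    negatives_per_positive: int,
    rng: random.Random,
) -> List[Tuple[str, str, int]]:
    """Label each observed interaction 1 and pair it with randomly sampled
    items the user has not interacted with in `events`, labelled 0."""
    seen: Dict[str, Set[str]] = {}
    for e in events:
        seen.setdefault(e.user_id, set()).add(e.item_id)

    examples: List[Tuple[str, str, int]] = []
    for e in events:
        examples.append((e.user_id, e.item_id, 1))
        # Assumption: unobserved items are negatives, although
        # "not interacted with" is not the same as "disliked".
        candidates = [item for item in catalog if item not in seen[e.user_id]]
        k = min(negatives_per_positive, len(candidates))
        for item in rng.sample(candidates, k):
            examples.append((e.user_id, item, 0))
    return examples


events = [
    Event("u1", "i1", 100), Event("u1", "i2", 200),
    Event("u2", "i3", 150), Event("u2", "i1", 300),
]
train, test = time_split(events, cutoff=250)
examples = build_examples(train, ["i1", "i2", "i3", "i4"],
                          negatives_per_positive=2, rng=random.Random(0))
```

Only the training events are passed to `build_examples`, so the "seen" set is built without peeking at the evaluation period.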
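Two-tower retrieval scores a user against an item with a dot product of their embeddings, and one common way to train it with a softmax is the in-batch variant, where each user's clicked item is the positive and the other items in the batch act as negatives. The sketch assumes the tower outputs are already computed (the towers themselves are omitted), and the `temperature` parameter and the brute-force `retrieve_top_k` helper are illustrative choices rather than details from the article.

```python
import heapq
import math
from typing import Dict, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def in_batch_softmax_loss(user_embs: List[Vector], item_embs: List[Vector],
                          temperature: float = 1.0) -> float:
    """Mean cross-entropy over a batch of (user, clicked item) pairs.

    Row i treats item i as the positive and every other item in the batch
    as a negative, with the softmax taken over dot-product scores.
    """
    total = 0.0
    for i, u in enumerate(user_embs):
        logits = [dot(u, v) / temperature for v in item_embs]
        m = max(logits)  # subtract the max before exponentiating for stability
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[i]  # equals -log softmax(logits)[i]
    return total / len(user_embs)


def retrieve_top_k(user_emb: Vector, item_index: Dict[str, Vector], k: int) -> List[str]:
    """Brute-force retrieval by dot product. Production systems usually
    replace this with an approximate nearest-neighbour index."""
    best = heapq.nlargest(k, item_index.items(), key=lambda kv: dot(user_emb, kv[1]))
    return [item_id for item_id, _ in best]


users = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_softmax_loss(users, [[1.0, 0.0], [0.0, 1.0]], temperature=0.1)
swapped = in_batch_softmax_loss(users, [[0.0, 1.0], [1.0, 0.0]], temperature=0.1)
top = retrieve_top_k([1.0, 0.1], {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [0.7, 0.7]}, k=2)
```

When each user's embedding points toward its own clicked item the loss is close to zero, and swapping the items makes it large, which is the signal that pulls matching pairs together during training.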
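Recall@K and NDCG@K, the offline metrics named for retrieval and ranking, can be computed as in the sketch below. It assumes binary relevance (an item is either relevant or not) and defines Recall@K as hits divided by the total number of relevant items. Some teams normalize by `min(len(relevant), k)` instead, and graded relevance would use `2**rel - 1` gains, so treat these definitions as one reasonable convention.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A hit at 0-based position i is discounted by log2(i + 2), so rank 1 has weight 1.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "z"}
recall = recall_at_k(ranked, relevant, 5)  # 2 of 3 relevant items retrieved
ndcg = ndcg_at_k(ranked, relevant, 5)
```

Recall@K only asks whether relevant items made it into the candidate set, which fits the retrieval stage; NDCG@K also rewards placing them near the top, which is what the ranking stage is for.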
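A/B bucketing is usually done by hashing a stable user ID so that the same user always lands in the same variant. The sketch assumes SHA-256 salted with a hypothetical experiment name (`"ranker-v2"`) and 100 buckets; the article's actual bucketing scheme is not specified here.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, num_buckets: int = 100) -> int:
    """Deterministically map a user to one of `num_buckets` buckets.

    Salting the hash with the experiment name keeps a user's bucket stable
    within an experiment but uncorrelated across experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % num_buckets


def variant(user_id: str, experiment: str, treatment_pct: int) -> str:
    """Send the first `treatment_pct` of 100 buckets to treatment."""
    return "treatment" if assign_bucket(user_id, experiment) < treatment_pct else "control"


share = sum(variant(f"user-{n}", "ranker-v2", 50) == "treatment"
            for n in range(10_000)) / 10_000
```

Because assignment is a pure function of the user ID and experiment name, no assignment table has to be stored, and ramping up treatment from, say, 10% to 50% keeps every user who was already in treatment there.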