Part 20: Data Manipulation in Multi-Dimensional Aggregation
Last Updated on April 17, 2026 by Editorial Team. Author(s): Raj kumar. Originally published on Towards AI.

When financial analysts need to segment customer profitability across product lines and regions, or when risk managers aggregate exposure metrics across multiple hierarchies, they rely on advanced grouping techniques that go far beyond basic `sum()` and `mean()` operations. Part 20 explores the sophisticated aggregation patterns that transform raw transactional data into actionable business intelligence.

This article demonstrates production-grade grouping strategies used in banking analytics, risk management systems, and operational reporting pipelines. You will see how to apply multiple aggregations simultaneously, create custom aggregation functions, implement rolling and expanding window calculations, and construct multi-level aggregations with proper unstacking.

## The Business Context: Why Advanced Aggregation Matters

Consider a commercial bank analyzing credit card transaction data. A basic GROUP BY reveals average transaction amounts per customer. But real business questions demand more:

- What is the range (max minus min) of transaction amounts per merchant category?
- How do 30-day rolling averages compare to overall means for fraud detection?
- How can sum, mean, median, and standard deviation be calculated simultaneously across multiple dimensions?

These questions require aggregation techniques that combine multiple operations, apply custom logic, and handle hierarchical grouping structures. The patterns you learn here apply directly to business intelligence dashboards, automated reporting systems, and analytical data pipelines.

## 1. Multiple Aggregations on Different Columns

The most common production requirement is calculating different metrics across different columns in a single operation. Rather than running separate `groupby` statements and merging the results, pandas lets you pass `agg()` a dictionary mapping each column to its aggregation functions.
```python
import pandas as pd
import numpy as np

# Transaction data for a payment processor
data = {
    'merchant_category': ['Retail', 'Retail', 'Dining', 'Dining', 'Travel',
                          'Travel', 'Retail', 'Dining', 'Travel', 'Retail'],
    'transaction_amount': [125.50, 89.30, 45.20, 67.80, 320.00,
                           155.75, 210.40, 52.30, 189.60, 178.90],
    'processing_fee': [3.77, 2.68, 1.36, 2.03, 9.60,
                       4.67, 6.31, 1.57, 5.69, 5.37],
    'transaction_count': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

# Multiple aggregations across different columns in a single call
result = df.groupby('merchant_category').agg({
    'transaction_amount': ['mean', 'median'],
    'processing_fee': ['min', 'max']
})
print("Multiple Aggregations by Merchant Category:")
print(result)
```

Output:

```
Multiple Aggregations by Merchant Category:
                  transaction_amount         processing_fee      
                                mean median            min   max
merchant_category                                               
Dining                     55.100000   52.3           1.36  2.03
Retail                    151.025000  152.2           2.68  6.31
Travel                    221.783333  189.6           5.69  9.60
```

This pattern appears throughout revenue analytics dashboards. The finance team needs average transaction values alongside median values (which are less sensitive to outliers), while the operations team monitors the range of processing fees to identify anomalies. A single aggregation call produces all metrics efficiently.

Notice the hierarchical column structure in the output. The outer level contains the original column names, while the inner level contains the aggregation function names. This structure becomes important when you need to flatten or manipulate results for downstream systems.

## 2. Custom Aggregation Functions

Standard aggregations cover 80% of use cases. The remaining 20% require business-specific logic. Lambda functions and named custom functions let you implement domain-specific calculations that would be impossible with built-in methods alone.
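Before turning to custom functions, one practical note on the hierarchical columns produced in the previous section: downstream systems such as BI tools or SQL tables usually expect flat column names. A common approach (one option among several) is to join the two levels with an underscore. A minimal sketch on the same payment-processor data:

```python
import pandas as pd

# Same payment-processor data as in the earlier example
df = pd.DataFrame({
    'merchant_category': ['Retail', 'Retail', 'Dining', 'Dining', 'Travel',
                          'Travel', 'Retail', 'Dining', 'Travel', 'Retail'],
    'transaction_amount': [125.50, 89.30, 45.20, 67.80, 320.00,
                           155.75, 210.40, 52.30, 189.60, 178.90],
    'processing_fee': [3.77, 2.68, 1.36, 2.03, 9.60,
                       4.67, 6.31, 1.57, 5.69, 5.37],
})

result = df.groupby('merchant_category').agg({
    'transaction_amount': ['mean', 'median'],
    'processing_fee': ['min', 'max'],
})

# Join each (column, function) pair into a flat name like 'transaction_amount_mean'
result.columns = ['_'.join(col) for col in result.columns]
print(result.columns.tolist())
# ['transaction_amount_mean', 'transaction_amount_median',
#  'processing_fee_min', 'processing_fee_max']
```

The flattened frame can then be written to a database or CSV without any MultiIndex surprises.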
```python
# Same transaction data as above
df = pd.DataFrame(data)

# Custom aggregation: calculate the range (spread between max and min)
result = df.groupby('merchant_category').agg({
    'transaction_amount': lambda x: x.max() - x.min()
})
print("\nTransaction Amount Range by Category:")
print(result)
```

Output:

```
Transaction Amount Range by Category:
                   transaction_amount
merchant_category                    
Dining                          22.60
Retail                         121.10
Travel                         164.25
```

The range calculation is critical in risk management. A merchant category with high transaction variance requires different fraud detection thresholds than a category with consistent transaction sizes. Banks use this metric to calibrate their anomaly detection algorithms.

You can also define named functions for more complex logic that requires multiple operations or conditional branching.

```python
def weighted_average(series):
    """Calculate average with additional business logic."""
    if len(series) < 2:
        return series.mean()
    # Weight recent transactions more heavily
    weights = np.linspace(0.5, 1.5, len(series))
    return np.average(series, weights=weights)

result = df.groupby('merchant_category').agg({
    'transaction_amount': weighted_average
})
print("\nWeighted Average Transaction Amount:")
print(result)
```

Output:

```
Weighted Average Transaction Amount:
                   transaction_amount
merchant_category                    
Dining                      56.283333
Retail                     162.745833
Travel                     200.050000
```

Named functions improve code readability and allow you to add documentation explaining the business rationale. When an analyst reviews this aggregation six months later, the function name and docstring make the logic immediately clear.

## 3. Rolling Window Aggregations

Time-series analysis requires comparing current metrics against recent historical patterns. Rolling windows calculate aggregations over a sliding subset of data, essential for trend analysis, moving averages, and anomaly detection systems.
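One more option for custom aggregations, worth a brief aside before moving to windows: pandas also supports named aggregation, where each keyword argument to `agg()` is an `output_column=(input_column, function)` pair. This yields flat, readable column names directly, with no hierarchical columns to clean up. A minimal sketch on the same data (the output names `avg_amount` and `amount_range` are illustrative choices, not anything prescribed by pandas):

```python
import pandas as pd

df = pd.DataFrame({
    'merchant_category': ['Retail', 'Retail', 'Dining', 'Dining', 'Travel',
                          'Travel', 'Retail', 'Dining', 'Travel', 'Retail'],
    'transaction_amount': [125.50, 89.30, 45.20, 67.80, 320.00,
                           155.75, 210.40, 52.30, 189.60, 178.90],
})

def amount_range(series):
    """Spread between the largest and smallest transaction."""
    return series.max() - series.min()

# Named aggregation: keyword = (column, function) produces flat output columns
result = df.groupby('merchant_category').agg(
    avg_amount=('transaction_amount', 'mean'),
    amount_range=('transaction_amount', amount_range),
)
print(result)
```

Built-in and custom functions mix freely in this form, which makes it a convenient default when results feed downstream systems.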
```python
# Time-series transaction data
dates = pd.date_range('2024-01-01', periods=10, freq='D')
ts_data = {
    'date': dates,
    'category': ['Electronics'] * 10,
    'daily_revenue': [1200, 1350, 1180, 1420, 1390,
                      1510, 1280, 1450, 1380, 1520]
}
df_ts = pd.DataFrame(ts_data)
df_ts = df_ts.set_index('date')

# Rolling 3-day average per category
df_ts['rolling_avg'] = (
    df_ts.groupby('category')['daily_revenue']
         .rolling(window=3)
         .mean()
         .reset_index(level=0, drop=True)
)
print("\nRolling 3-Day Average Revenue:")
print(df_ts[['category', 'daily_revenue', 'rolling_avg']])
```

Output:

```
Rolling 3-Day Average Revenue:
               category  daily_revenue  rolling_avg
date                                               
2024-01-01  Electronics           1200          NaN
2024-01-02  Electronics           1350          NaN
2024-01-03  Electronics           1180  1243.333333
2024-01-04  Electronics           1420  1316.666667
2024-01-05  Electronics           1390  1330.000000
2024-01-06  Electronics           1510  1440.000000
2024-01-07  Electronics           1280  1393.333333
2024-01-08  Electronics           1450  1413.333333
2024-01-09  Electronics           1380  1370.000000
2024-01-10  Electronics           1520  1450.000000
```

The first two rows show NaN values because a 3-day window requires three data points. This is expected behavior. In production systems, you decide whether to forward-fill these nulls, drop them, or relax the requirement with the `min_periods` parameter.

Rolling averages smooth out daily volatility, revealing underlying trends. Revenue operations teams use these calculations to distinguish between normal fluctuations and meaningful changes requiring investigation. The window size (3 days here) is a business decision based on your data's characteristics and the analysis timeframe.

## 4. Expanding Window Aggregations

While rolling windows maintain a constant size, expanding windows grow progressively from the start of the dataset. This technique calculates cumulative metrics and running totals, critical for year-to-date reporting and cumulative performance tracking.
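The `min_periods` option mentioned in the previous section can be sketched quickly before moving on. With `min_periods=1`, the window emits a partial average over whatever data is available instead of NaN for the first rows (single-category data here for brevity):

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=10, freq='D')
df_ts = pd.DataFrame({
    'daily_revenue': [1200, 1350, 1180, 1420, 1390,
                      1510, 1280, 1450, 1380, 1520],
}, index=dates)

# min_periods=1: produce a value as soon as one observation is in the window,
# so the first two rows average over 1 and 2 points instead of showing NaN
df_ts['rolling_avg'] = (
    df_ts['daily_revenue'].rolling(window=3, min_periods=1).mean()
)
print(df_ts.head(3))
# day 1 averages one point (1200.0), day 2 averages two (1275.0),
# and from day 3 onward the full 3-day window applies (1243.33...)
```

Whether partial-window averages are acceptable is a business decision: they remove the NaN gap but are noisier than full-window values.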
```python
# Same time-series data
df_ts = pd.DataFrame(ts_data)
df_ts = df_ts.set_index('date')

# Expanding (cumulative) sum per category
df_ts['cumulative_sum'] = (
    df_ts.groupby('category')['daily_revenue']
         .expanding()
         .sum()
         .reset_index(level=0, drop=True)
)
print("\nExpanding Cumulative Revenue:")
print(df_ts[['category', 'daily_revenue', 'cumulative_sum']])
```

Output:

```
Expanding Cumulative Revenue:
               category  daily_revenue  cumulative_sum
date                                                  
2024-01-01  Electronics           1200          1200.0
2024-01-02  Electronics           1350          2550.0
2024-01-03  Electronics           1180          3730.0
2024-01-04  Electronics           1420          5150.0
2024-01-05  Electronics           1390          6540.0
2024-01-06  Electronics           1510          8050.0
2024-01-07  Electronics           1280          9330.0
2024-01-08  Electronics           1450         10780.0
2024-01-09  Electronics           1380         12160.0
2024-01-10  Electronics           1520         13680.0
```

Every row shows the total revenue from the beginning of the period through that date. Financial reporting systems use this pattern for year-to-date revenue, quarter-to-date expenses, and month-to-date transaction counts. The expanding window eliminates the need for complex SQL window functions or manual cumulative sum calculations.

You can apply any aggregation function to expanding windows, not just sum. Expanding means and standard deviations help calibrate control […]
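To make the calibration idea concrete, here is a minimal sketch of expanding means and standard deviations used to build a simple control band. The ±3σ band is a common control-chart convention, not anything specific to pandas, and the single-series setup is a simplification of the grouped example above:

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=10, freq='D')
revenue = pd.Series([1200, 1350, 1180, 1420, 1390,
                     1510, 1280, 1450, 1380, 1520], index=dates)

# Running mean and running sample standard deviation from day one onward;
# std needs at least two observations, hence min_periods=2
expanding_mean = revenue.expanding().mean()
expanding_std = revenue.expanding(min_periods=2).std()

# A simple control band: flag days where revenue leaves mean +/- 3*std
bands = pd.DataFrame({
    'revenue': revenue,
    'mean': expanding_mean,
    'upper': expanding_mean + 3 * expanding_std,
    'lower': expanding_mean - 3 * expanding_std,
})
print(bands.tail(3))
```

Because the window grows with the data, the band stabilizes as history accumulates, which is exactly the behavior you want when calibrating thresholds against all data seen so far.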
