Portfolio Project · Python · Analysis & Forecasting

Revenue Anomaly Detector

A statistical anomaly detection script that flags accounts whose revenue deviates from their own historical pattern — catching sudden drops, billing-error spikes, and slow erosion before they show up in a quarterly report.

Python Pandas NumPy Matplotlib Simulated Data · No real client info
30
Accounts Analyzed
720
Revenue Records
48
Anomalies Flagged
6.7%
Exception Rate
4
High-Priority Accounts
Portfolio simulation notice: All account names and revenue figures in this case study are entirely fictitious, generated programmatically for portfolio demonstration purposes. No real client, billing, or proprietary revenue data has been used or referenced.

The Case Study

Five-Step Walkthrough

1
The Problem
Why a backward-looking audit isn't enough

The Excel and SQL projects in this portfolio both look backward — they verify invoices that have already been billed and reconcile payments that have already been collected. But by the time an exception report flags a problem, the revenue impact has often compounded for several billing cycles.

Revenue Operations teams need a forward-looking signal: which accounts are behaving differently from their own normal pattern right now, before the deviation becomes a quarter-end surprise. A single month-over-month comparison isn't enough — a 15% dip might be totally normal for a seasonal account and alarming for a steady one. The detection logic needs to be relative to each account's own history, not a flat company-wide threshold.

  • Flat thresholds ("alert if revenue drops >10%") produce false positives on naturally volatile accounts and miss real problems on stable ones
  • Manual review of 30+ accounts across 24 months of history doesn't scale
  • Slow erosion — a few percent decline every month — rarely trips a single-month alert but compounds into serious revenue loss
  • No way to rank which anomalies deserve attention first when several flag in the same week
2
The Simulated Dataset
30 accounts, 24 months, 5 seeded scenarios

generate_data.py builds 720 rows of monthly revenue across 30 fictitious SaaS accounts spanning July 2024 through June 2026. Each account starts with its own baseline MRR, its own gentle growth or decline trend, and its own natural month-to-month noise — so the simulated data doesn't look artificially uniform.

Five deliberate anomalies are seeded on top of that baseline, each shaped differently so the detection logic has to handle more than one pattern type:

  • Apex Holdings — sudden churn-style drop to ~30% of prior revenue, staying low (a customer that's leaving)
  • Lighthouse Media — one-month spike to nearly 3x normal revenue, then reverting (a billing error, not real growth)
  • Northgate Partners — slow erosion across 12 consecutive months, a few percent at a time (easy to miss month-to-month)
  • Sterling Capital — sharp one-month spike followed by a gradual climb to a genuinely higher baseline (a real expansion, not an error)
  • Crestline Industries — sudden drop followed by partial recovery the next month (a service disruption that got partially resolved)
3
Detection Logic
Rolling baseline, z-score, and erosion check

The core method: for every account, compute a trailing 6-month rolling mean and rolling standard deviation, shifted forward one month so the current month is being compared against its own prior baseline — not a baseline that includes itself.

Rolling Baseline
rolling_mean = revenue.rolling(6, min_periods=5).mean().shift(1)
Each account's own trailing 6 months become its personal baseline — a volatile account and a stable account each get judged against their own history, not a single company-wide rule.
Std Floor (stability fix)
rolling_std = rolling_std.clip(lower = rolling_mean * 0.03)
Without this floor, a few unusually flat trailing months produce a near-zero standard deviation, which sends the z-score to extreme and meaningless values. Flooring it at 3% of the rolling mean keeps the score stable.
Z-Score Classification
z = (revenue - rolling_mean) / rolling_std
flag if |z| ≥ 2.5
Months that land more than 2.5 standard deviations from the account's own baseline are flagged as a point anomaly — classified as SPIKE if above baseline, DROP if below.
Erosion Check
erosion = revenue.pct_change(periods=6) ≤ -30%
A secondary check that catches slow decline a single-month z-score would miss — if revenue has fallen 30%+ over the trailing 6 months even without one dramatic month, it's flagged as EROSION.
Severity Score
severity = |z-score| or |erosion %| × 100
Point anomalies and erosion are measured on different scales (standard deviations vs. percentage decline). The severity score normalizes both onto one comparable number so every anomaly type can be ranked together.
Priority Tiering
HIGH / MEDIUM / LOW based on severity thresholds per type
Converts the raw severity score into an executive-friendly triage label, so a RevOps manager can scan the report and know immediately which accounts need a conversation this week.
4
Code & Output Preview
The classification function and ranked report

The classification logic that turns raw z-scores and erosion flags into a labeled anomaly type:

Python — pandas
def classify(row):
    # Point anomalies (single dramatic month) take priority over erosion
    # since a sudden drop or spike is more urgent than a gradual trend
    if row["point_anomaly"] and row["z_score"] > 0:
        return "SPIKE"
    elif row["point_anomaly"] and row["z_score"] < 0:
        return "DROP"
    elif row["erosion_flag"]:
        return "EROSION"
    else:
        return "NORMAL"

g["anomaly_type"] = g.apply(classify, axis=1)

# Severity score unifies point anomalies and erosion onto one scale
g["severity_score"] = np.where(
    g["anomaly_type"] == "EROSION",
    g["erosion_pct"].abs() * 100,
    g["z_score"].abs().fillna(0)
)

Top of the ranked anomaly report

accountmonthrevenuerolling_meanpct_vs_baselinetypeseverity
Apex Holdings2026-04$8,954.90$24,896.35-64.0%EROSION73.09
Apex Holdings2026-03$9,594.54$28,832.55-66.7%EROSION71.11
Lighthouse Media2025-10$115,321.82$45,438.23+153.8%SPIKE31.21
Sterling Capital2025-04$50,884.28$29,359.53+73.3%SPIKE21.32
Apex Holdings2026-02$11,193.63$32,642.42-65.7%DROP20.77
Crestline Industries2026-01$6,543.81$14,552.17-55.0%DROP18.34
5
Results & Visualization
Chart of the top flagged accounts

The script outputs a four-panel chart of the top flagged accounts, showing each account's revenue line against its own rolling baseline with anomalies marked at the point of deviation. Gradient fills and glowing markers make the deviation points immediately visible without needing to read the underlying numbers.

Revenue
Rolling Baseline
SPIKE
DROP
EROSION
Revenue vs Rolling Baseline — Apex Holdings
Month-over-Month % Change

Apex Holdings shows the churn-style drop staying low across three months. Lighthouse Media shows the single-month billing-error spike reverting immediately after. Crestline Industries shows a sharp drop with partial recovery. Sterling Capital shows a spike followed by a genuinely higher sustained baseline — the model correctly stops flagging it as anomalous once the new baseline catches up.

Outcomes

What the Detection Run Produced

6.7%
Of all 720 account-months flagged — a clean signal, not noise
4 of 5
Seeded anomaly scenarios surfaced at the very top of the ranked report
3 types
SPIKE, DROP, and EROSION all detected by one unified scoring method
Self-correcting
Sterling Capital's new higher baseline stopped triggering once the rolling window caught up — no manual threshold reset needed

Technical Skills

Tools & Techniques Used

Python & Data Logic

Pandas NumPy Matplotlib Rolling Window Statistics Z-Score Anomaly Detection groupby + apply Custom Classification Functions Vectorized Conditional Logic (np.where) Gradient Fill Visualization Theme-Aware Chart Rendering

Business & Revenue Ops Domains

Revenue Anomaly Detection Churn Signal Identification Billing Error Detection Trend Erosion Analysis Severity Scoring Executive Triage / Priority Tiers Forward-Looking Forecasting Signal

Download the script

Run it against your own monthly revenue export — swap the CSV, the detection logic doesn't change.