Project 02 · Base SAS + SAS EG + SQL Server + Power BI · ⭐ Featured

Policy Renewal Prediction & Retention Analysis

A logistic regression model built in Base SAS (PROC LOGISTIC) that scores all active policies 60 days before renewal, identifying at-risk accounts so retention teams can intervene proactively improving renewal rate by 9.4%.

Base SAS SAS EG PROC LOGISTIC SQL Server Power BI Predictive Modelling
↑ 9.4%
Renewal Rate Improvement
87%
Model AUC Accuracy
60 days
Early Warning Window
3 yrs
Historical Training Data

Business Problem

The retention team had no early-warning system for policies at risk of lapsing. Outreach was entirely reactive agents only contacted customers after a lapse had already occurred. This resulted in avoidable churn and revenue loss on policies that could have been retained with a timely intervention.

The goal was a scoring model that flagged at-risk policies at least 60 days before renewal, with enough precision to prioritise agent outreach effectively.

Solution Overview

  • Extracted 3 years of policy history using complex SQL joins across 5 normalised tables (Policy, Customer, Payment, Claims, Product)
  • Performed EDA in SAS EG identified 11 predictive features: payment behaviour, claims frequency, tenure, premium change %, product mix, and prior lapse history
  • Built logistic regression model using PROC LOGISTIC; validated with 5-fold cross-validation (AUC = 0.87, KS = 0.52)
  • Developed SAS macro to run weekly batch scoring on Monday mornings; scored output written to SQL Server staging table
  • Power BI dashboard consumed scored data, showing agents their top 50 at-risk accounts with recommended actions and lapse probability
  • Presented model methodology and business findings to leadership via PowerPoint

Process Walkthrough

1

Data Extraction (SQL)

Multi-table SQL query pulling 3 years of policy records (180K policies), joined to payment schedules, claims, customer demographics, and product data.

2

EDA & Feature Engineering (SAS EG)

PROC FREQ, PROC UNIVARIATE, PROC MEANS for univariate analysis. Created derived features: missed_payments_6m, claims_ratio, tenure_band, premium_increase_pct.

3

Model Training (Base SAS)

PROC LOGISTIC with stepwise variable selection, OUTROC / ROC statement for AUC. 5-fold cross-validation via SAS macro loop. Final model: 8 variables, AUC = 0.87.

4

Batch Scoring Automation

SAS macro scores all policies with renewal dates 50–70 days out. Output table includes lapse probability, risk band (High/Medium/Low), and top 3 risk drivers per policy.

5

Power BI Dashboard

Agents see their territory's at-risk list sorted by lapse probability. Dashboard filters by risk band, product line, and premium tier. Managers see team-level retention KPIs.

Model Performance & Data Insights

Feature Importance (Odds Ratio)

Lapse Rate by Risk Band

Retention Rate: Before vs After Model (Monthly)

Score Distribution – Lapsed vs Retained Policies

Code Samples

/* Logistic Regression – Policy Lapse Prediction */
/* Step 1: Create model training dataset */
DATA work.model_train;
  SET policy.features_2021_2023;
  WHERE renewal_date BETWEEN '01JAN2021'd AND '31DEC2023'd
    AND is_holdout = 0;
RUN;

/* Step 2: Logistic regression with stepwise selection */
PROC LOGISTIC data=work.model_train
              outmodel=work.lapse_model
              plots=(roc oddsratio);
  CLASS product_line tenure_band / param=ref;
  MODEL lapsed(event='1') =
        missed_payments_6m
        premium_increase_pct
        claims_freq_tenure
        cust_tenure_yrs
        products_held
        prior_lapse_flag
        product_line
        tenure_band
  / selection=stepwise slentry=0.05 slstay=0.05
    outroc=work.roc_data
    ctable pprob=(0.3 0.4 0.5);
  OUTPUT out=work.scored_train
         predicted=lapse_prob
         xbeta=log_odds;
RUN;

/* Step 3: Validate AUC */
PROC SQL;
  SELECT MAX(c_hat) AS AUC
  FROM   work.roc_data;
QUIT;
/* Weekly batch scoring macro */
/* Runs every Monday 05:00 via SAS Scheduler */
%MACRO score_renewal_risk(run_date=);

  /* 1. Pull policies renewing in 50-70 days */
  PROC SQL;
    CONNECT TO sqlserver
      (server="SQLSRV01" database="PolicyDB");
    CREATE TABLE work.to_score AS
    SELECT * FROM connection to sqlserver (
      SELECT * FROM dbo.vw_PolicyFeatures
      WHERE  renewal_date
             BETWEEN DATEADD(day,50,'&run_date')
                 AND DATEADD(day,70,'&run_date')
    );
    DISCONNECT FROM sqlserver;
  QUIT;

  /* 2. Score using saved model */
  PROC LOGISTIC inmodel=work.lapse_model;
    SCORE data=work.to_score
          out=work.scored_output
          predicted;
  RUN;

  /* 3. Assign risk bands */
  DATA work.scored_final;
    SET work.scored_output;
    LENGTH risk_band $6;
    IF      P_1 >= 0.65 THEN risk_band = 'HIGH';
    ELSE IF P_1 >= 0.40 THEN risk_band = 'MEDIUM';
    ELSE                     risk_band = 'LOW';
    score_date = today();
  RUN;

  /* 4. Write to SQL Server for Power BI */
  PROC SQL;
    CONNECT TO sqlserver
      (server="SQLSRV01" database="PolicyDB");
    EXECUTE(
      TRUNCATE TABLE dbo.lapse_scores
    ) BY sqlserver;
  QUIT;
  PROC APPEND base=sqlsrv.lapse_scores
              data=work.scored_final force; RUN;

%MEND score_renewal_risk;

/* Invoke macro */
%score_renewal_risk(run_date=%sysfunc(today(),date9.));
-- Feature extraction view for model training
CREATE VIEW dbo.vw_PolicyFeatures AS
SELECT
    p.PolicyID,
    p.CustomerID,
    p.RenewalDate,
    p.ProductLine,
    -- Missed payments in last 6 months
    SUM(CASE WHEN pay.PaymentStatus = 'MISSED'
              AND  pay.DueDate >= DATEADD(month,-6,GETDATE())
         THEN 1 ELSE 0 END)        AS missed_payments_6m,
    -- Premium change at last renewal
    ROUND(
      (p.CurrentPremium - p.PriorPremium)
      / NULLIF(p.PriorPremium,0) * 100, 2
    )                                   AS premium_increase_pct,
    -- Claims frequency
    COUNT(c.ClaimID) * 1.0
      / NULLIF(p.TenureYears,0)         AS claims_freq_tenure,
    p.TenureYears                       AS cust_tenure_yrs,
    pc.ProductsHeld                     AS products_held,
    p.PriorLapseFlag                    AS prior_lapse_flag,
    p.Lapsed                            AS lapsed  -- target variable
FROM       dbo.Policies   p
LEFT JOIN  dbo.Payments   pay ON p.PolicyID = pay.PolicyID
LEFT JOIN  dbo.Claims     c   ON p.PolicyID = c.PolicyID
INNER JOIN dbo.ProductCnt pc  ON p.CustomerID = pc.CustomerID
GROUP BY
    p.PolicyID, p.CustomerID, p.RenewalDate,
    p.ProductLine, p.CurrentPremium, p.PriorPremium,
    p.TenureYears, pc.ProductsHeld,
    p.PriorLapseFlag, p.Lapsed;

Sample Model Output At-Risk Policy List (Agent View)

Policy IDRenewal DateLapse ProbabilityRisk BandTop Risk DriverPremiumAgent Action
POL-8492129 Apr 202682%HIGH3 missed payments₹24,800Priority Call
POL-6103302 May 202674%HIGH+28% premium increase₹18,200Retention Offer
POL-5524705 May 202661%HIGHPrior lapse history₹31,500Priority Call
POL-7281908 May 202648%MEDIUMSingle product holder₹12,400Cross-sell Call
POL-3904211 May 202641%MEDIUM2 missed payments₹9,600Email Campaign