Project 04 · T-SQL + SAS PROC CLUSTER + Power BI

Customer Segmentation & Marketing Analytics

Segmented 500K policyholders into 6 behavioural clusters using SQL-based RFM scoring and SAS statistical clustering. A Power BI executive dashboard enables campaign targeting by segment, driving a 31% improvement in campaign response rates.

T-SQL SAS PROC CLUSTER SAS PROC FASTCLUS RFM Analysis Power BI Marketing Analytics
↑ 31%
Campaign Response Rate
500K
Customers Analysed
6
Customer Segments
↓ 55%
Opt-out Rate

Business Problem

The marketing team ran identical batch campaigns to all 500K policyholders, achieving only a 3.1% response rate and generating significant opt-out complaints. There was no customer segmentation, propensity scoring, or personalised messaging in place.

The goal was to identify meaningful customer groups and enable targeted campaigns with the right message to the right segment improving ROI and reducing churn from unwanted communications.

Solution Overview

  • Built an RFM (Recency, Frequency, Monetary) scoring model using T-SQL window functions on 500K policyholder records from the data warehouse
  • Applied SAS PROC FASTCLUS (k-means) to identify K=6 optimal clusters, validated with PROC CLUSTER dendrograms and Calinski-Harabasz index
  • Named and profiled each segment: Champions, Loyalists, At-Risk, Price-Sensitive, New Customers, Dormant
  • Built a Power BI marketing dashboard: segment filter, campaign ROI tracker, conversion funnel, and demographic overlay
  • Presented segment profiles and campaign recommendations to the CMO via PowerPoint
  • Campaign results tracked monthly segment-targeted campaigns showed 31% higher response rate vs prior baseline

Process Walkthrough

1

RFM Scoring (T-SQL)

Calculated Recency (days since last payment), Frequency (number of active policies), Monetary (total lifetime premium) using SQL window functions. Each dimension scored 1–5 using NTILE(5).

2

Cluster Analysis (SAS EG)

Exported RFM scores to SAS. Used PROC FASTCLUS with K=6 (determined by elbow method on PROC CLUSTER). Iterated with different seeds to ensure stability.

3

Segment Profiling

PROC MEANS and PROC FREQ analysed cluster characteristics. Added demographic overlays (age band, location, product mix) to create detailed segment personas for the marketing team.

4

Power BI Dashboard

Segment assignment table joined to campaign response logs. Dashboard shows: segment breakdown, response rate by segment over time, campaign ROI, and conversion funnel from contact to purchase.

5

Stakeholder Presentation

Presented 6 segment personas with recommended campaign strategies to the CMO and Marketing leadership. Findings led to a new segment-based targeting framework adopted company-wide.

Segmentation Analysis & Campaign Results

Customer Distribution by Segment

Campaign Response Rate by Segment (%)

Average Premium Value by Segment (₹)

Overall Response Rate: Before vs After Segmentation

Code Samples

-- RFM Scoring using T-SQL Window Functions
WITH rfm_base AS (
    SELECT
        c.CustomerID,
        c.CustomerName,
        c.Region,
        -- Recency: days since last payment
        DATEDIFF(day,
            MAX(p.PaymentDate),
            CAST(GETDATE() AS DATE)
        )                                   AS recency_days,
        -- Frequency: active policy count
        COUNT(DISTINCT pol.PolicyID)        AS frequency,
        -- Monetary: total lifetime premium
        SUM(pol.AnnualPremium)              AS monetary
    FROM       dbo.Customers  c
    INNER JOIN dbo.Policies   pol ON c.CustomerID = pol.CustomerID
    LEFT JOIN  dbo.Payments   p   ON pol.PolicyID = p.PolicyID
    WHERE  pol.PolicyStatus IN ('ACTIVE', 'LAPSED')
    GROUP BY c.CustomerID, c.CustomerName, c.Region
),
rfm_scored AS (
    SELECT *,
        -- Lower recency = more recent = higher score
        6 - NTILE(5) OVER (ORDER BY recency_days DESC) AS R_score,
        NTILE(5) OVER (ORDER BY frequency  ASC)                AS F_score,
        NTILE(5) OVER (ORDER BY monetary   ASC)                AS M_score
    FROM rfm_base
)
SELECT
    CustomerID, CustomerName, Region,
    recency_days, frequency, monetary,
    R_score, F_score, M_score,
    (R_score + F_score + M_score)       AS rfm_total,
    -- Composite segment label
    CASE
        WHEN R_score >= 4 AND F_score >= 4 AND M_score >= 4
        THEN 'Champions'
        WHEN R_score >= 3 AND F_score >= 3
        THEN 'Loyalists'
        WHEN R_score <= 2 AND F_score >= 3
        THEN 'At-Risk'
        WHEN M_score <= 2
        THEN 'Price-Sensitive'
        WHEN recency_days <= 365 AND F_score = 1
        THEN 'New Customer'
        ELSE 'Dormant'
    END                                 AS rfm_segment
FROM rfm_scored
ORDER BY rfm_total DESC;
/* K-means clustering on RFM scores */

/* Step 1: Determine optimal K using pseudo-R² */
PROC CLUSTER data=work.rfm_normalised
             method=ward outtree=work.tree
             plots=all;
  VAR r_norm f_norm m_norm;
RUN;

/* Step 2: K=6 final clustering */
PROC FASTCLUS data=work.rfm_normalised maxclusters=6
              maxiter=100 converge=0.001
              out=work.clustered
              outstat=work.cluster_stats;
  VAR r_norm f_norm m_norm;
RUN;

/* Step 3: Profile each cluster */
PROC MEANS data=work.clustered mean std min max;
  CLASS cluster;
  VAR   recency_days frequency monetary;
  OUTPUT OUT=work.cluster_profile mean= std= / autoname;
RUN;

/* Step 4: Assign segment labels */
PROC SQL;
  SELECT
      cluster,
      COUNT(*)                              AS n_customers,
      ROUND(MEAN(recency_days),1)          AS avg_recency,
      ROUND(MEAN(frequency),1)             AS avg_freq,
      ROUND(MEAN(monetary),0)              AS avg_monetary
  FROM  work.clustered
  GROUP BY cluster
  ORDER BY avg_monetary DESC;
QUIT;
-- Campaign ROI by Segment
Campaign ROI % =
DIVIDE(
    SUMX(
        Campaigns,
        Campaigns[Revenue_Generated] - Campaigns[Campaign_Cost]
    ),
    SUM( Campaigns[Campaign_Cost] ),
    0
) * 100

-- Response Rate
Response Rate % =
DIVIDE(
    COUNTROWS( FILTER( CampaignResults, CampaignResults[Responded] = TRUE ) ),
    COUNTROWS( CampaignResults ),
    0
)

-- Segment Size %
Segment Share % =
DIVIDE(
    COUNTROWS( Customers ),
    CALCULATE( COUNTROWS( Customers ), ALL( Customers[Segment] ) ),
    0
)

-- YoY Retention by Segment
YoY Retention Change =
VAR CY = CALCULATE( [Retention Rate], DATESYTD( DateTable[Date] ) )
VAR PY = CALCULATE( [Retention Rate],
           SAMEPERIODLASTYEAR( DateTable[Date] ) )
RETURN CY - PY

Segment Profile Summary

SegmentCustomersAvg Premium ₹Avg Tenure (yrs)Response RateCampaign Strategy
🏆 Champions48,20042,6008.464%Loyalty rewards, referral programme
💙 Loyalists112,40028,2006.151%Upsell, multi-policy discounts
⚠️ At-Risk76,80024,8004.838%Win-back offer, personalised call
💰 Price-Sensitive98,6009,4002.922%Value bundle, premium reduction
🆕 New Customers91,20015,1000.729%Onboarding nurture, education
😴 Dormant72,8007,2005.28%Low-cost reactivation email