Project 04 · T-SQL + SAS PROC CLUSTER + Power BI
Segmented 500K policyholders into 6 behavioural clusters using SQL-based RFM scoring and SAS statistical clustering. A Power BI executive dashboard enables campaign targeting by segment, driving a 31% improvement in campaign response rates.
The marketing team ran identical batch campaigns to all 500K policyholders, achieving only a 3.1% response rate and generating significant opt-out complaints. There was no customer segmentation, propensity scoring, or personalised messaging in place.
The goal was to identify meaningful customer groups and enable targeted campaigns with the right message to the right segment improving ROI and reducing churn from unwanted communications.
Calculated Recency (days since last payment), Frequency (number of active policies), Monetary (total lifetime premium) using SQL window functions. Each dimension scored 1–5 using NTILE(5).
Exported RFM scores to SAS. Used PROC FASTCLUS with K=6 (determined by elbow method on PROC CLUSTER). Iterated with different seeds to ensure stability.
PROC MEANS and PROC FREQ analysed cluster characteristics. Added demographic overlays (age band, location, product mix) to create detailed segment personas for the marketing team.
Segment assignment table joined to campaign response logs. Dashboard shows: segment breakdown, response rate by segment over time, campaign ROI, and conversion funnel from contact to purchase.
Presented 6 segment personas with recommended campaign strategies to the CMO and Marketing leadership. Findings led to a new segment-based targeting framework adopted company-wide.
-- RFM Scoring using T-SQL Window Functions
WITH rfm_base AS (
SELECT
c.CustomerID,
c.CustomerName,
c.Region,
-- Recency: days since last payment
DATEDIFF(day,
MAX(p.PaymentDate),
CAST(GETDATE() AS DATE)
) AS recency_days,
-- Frequency: active policy count
COUNT(DISTINCT pol.PolicyID) AS frequency,
-- Monetary: total lifetime premium
SUM(pol.AnnualPremium) AS monetary
FROM dbo.Customers c
INNER JOIN dbo.Policies pol ON c.CustomerID = pol.CustomerID
LEFT JOIN dbo.Payments p ON pol.PolicyID = p.PolicyID
WHERE pol.PolicyStatus IN ('ACTIVE', 'LAPSED')
GROUP BY c.CustomerID, c.CustomerName, c.Region
),
rfm_scored AS (
SELECT *,
-- Lower recency = more recent = higher score
6 - NTILE(5) OVER (ORDER BY recency_days DESC) AS R_score,
NTILE(5) OVER (ORDER BY frequency ASC) AS F_score,
NTILE(5) OVER (ORDER BY monetary ASC) AS M_score
FROM rfm_base
)
SELECT
CustomerID, CustomerName, Region,
recency_days, frequency, monetary,
R_score, F_score, M_score,
(R_score + F_score + M_score) AS rfm_total,
-- Composite segment label
CASE
WHEN R_score >= 4 AND F_score >= 4 AND M_score >= 4
THEN 'Champions'
WHEN R_score >= 3 AND F_score >= 3
THEN 'Loyalists'
WHEN R_score <= 2 AND F_score >= 3
THEN 'At-Risk'
WHEN M_score <= 2
THEN 'Price-Sensitive'
WHEN recency_days <= 365 AND F_score = 1
THEN 'New Customer'
ELSE 'Dormant'
END AS rfm_segment
FROM rfm_scored
ORDER BY rfm_total DESC;
/* K-means clustering on RFM scores */
/* Step 1: Determine optimal K using pseudo-R² */
PROC CLUSTER data=work.rfm_normalised
method=ward outtree=work.tree
plots=all;
VAR r_norm f_norm m_norm;
RUN;
/* Step 2: K=6 final clustering */
PROC FASTCLUS data=work.rfm_normalised maxclusters=6
maxiter=100 converge=0.001
out=work.clustered
outstat=work.cluster_stats;
VAR r_norm f_norm m_norm;
RUN;
/* Step 3: Profile each cluster */
PROC MEANS data=work.clustered mean std min max;
CLASS cluster;
VAR recency_days frequency monetary;
OUTPUT OUT=work.cluster_profile mean= std= / autoname;
RUN;
/* Step 4: Assign segment labels */
PROC SQL;
SELECT
cluster,
COUNT(*) AS n_customers,
ROUND(MEAN(recency_days),1) AS avg_recency,
ROUND(MEAN(frequency),1) AS avg_freq,
ROUND(MEAN(monetary),0) AS avg_monetary
FROM work.clustered
GROUP BY cluster
ORDER BY avg_monetary DESC;
QUIT;
-- Campaign ROI by Segment
Campaign ROI % =
DIVIDE(
SUMX(
Campaigns,
Campaigns[Revenue_Generated] - Campaigns[Campaign_Cost]
),
SUM( Campaigns[Campaign_Cost] ),
0
) * 100
-- Response Rate
Response Rate % =
DIVIDE(
COUNTROWS( FILTER( CampaignResults, CampaignResults[Responded] = TRUE ) ),
COUNTROWS( CampaignResults ),
0
)
-- Segment Size %
Segment Share % =
DIVIDE(
COUNTROWS( Customers ),
CALCULATE( COUNTROWS( Customers ), ALL( Customers[Segment] ) ),
0
)
-- YoY Retention by Segment
YoY Retention Change =
VAR CY = CALCULATE( [Retention Rate], DATESYTD( DateTable[Date] ) )
VAR PY = CALCULATE( [Retention Rate],
SAMEPERIODLASTYEAR( DateTable[Date] ) )
RETURN CY - PY
| Segment | Customers | Avg Premium ₹ | Avg Tenure (yrs) | Response Rate | Campaign Strategy |
|---|---|---|---|---|---|
| 🏆 Champions | 48,200 | 42,600 | 8.4 | 64% | Loyalty rewards, referral programme |
| 💙 Loyalists | 112,400 | 28,200 | 6.1 | 51% | Upsell, multi-policy discounts |
| ⚠️ At-Risk | 76,800 | 24,800 | 4.8 | 38% | Win-back offer, personalised call |
| 💰 Price-Sensitive | 98,600 | 9,400 | 2.9 | 22% | Value bundle, premium reduction |
| 🆕 New Customers | 91,200 | 15,100 | 0.7 | 29% | Onboarding nurture, education |
| 😴 Dormant | 72,800 | 7,200 | 5.2 | 8% | Low-cost reactivation email |