Project 05 · Base SAS + T-SQL + SSIS + Power BI
Designed and deployed a rule-based fraud scoring engine using SAS data steps and PROC SQL against 850+ claims per month. An SSIS batch pipeline automates nightly scoring; a Power BI operations dashboard gives investigators real-time fraud alerts preventing ₹2.4 Cr in fraudulent payouts in the first year.
The Special Investigations Unit (SIU) relied entirely on manual claim review and escalation by adjusters to flag suspicious claims. With 850+ claim submissions per month, only <5% received in-depth investigation and confirmed fraud accounted for an estimated 3.8% of total payouts.
The objective was to build an automated, data-driven fraud scoring system that flagged high-risk claims for investigator review before payment was released, and to reduce both fraud losses and false positive investigations.
Held workshops with SIU investigators to codify their tacit fraud knowledge into 18 explicit, testable rules. Each rule was assigned a weight (1–5) reflecting historical fraud-confirmation rates from the prior 3-year claims dataset.
Coded each rule in Base SAS data steps with indicator flags (0/1). PROC SQL aggregates rule weights into a total score and assigns tier labels. Parameterised thresholds so SIU can adjust cut-offs without code changes.
SSIS package: Extract new claims from SQL DW → Export to SAS-readable flat file → Execute Base SAS scoring job → Import scored output back to SQL Server staging → Trigger Power BI dataset refresh via REST API.
Five report pages: Alert Queue (priority list for today), Score Distribution, Rule Hit Analysis, Investigator Workload, Trend Analysis. RLS restricts each investigator to their assigned region's claims.
Monthly review of rule performance precision, recall, confirmed fraud rate per rule. Removed 2 low-precision rules, added 3 new network-linkage rules. False positive rate dropped from 28% → 16.5% post-tuning.
/* ─── Fraud Scoring Rule Engine ─── */
%LET high_threshold = 65;
%LET mid_threshold = 35;
DATA work.scored_claims;
SET work.claims_features;
/* Initialise score */
fraud_score = 0;
/* ── Category 1: Claimant Behaviour ── */
/* Rule 1: Multiple claims within 90 days */
IF claims_last_90d >= 2 THEN fraud_score + 15;
/* Rule 2: Address change within 30 days of claim */
IF addr_change_before_claim_days <= 30 THEN fraud_score + 10;
/* Rule 3: Claim filed same day as policy inception */
IF inception_to_claim_days <= 1 THEN fraud_score + 20;
/* ── Category 2: Provider Patterns ── */
/* Rule 4: Provider billed amount > 2x market benchmark */
IF provider_rate_ratio >= 2.0 THEN fraud_score + 18;
/* Rule 5: Provider involved in prior confirmed fraud */
IF provider_prior_fraud_flag = 1 THEN fraud_score + 25;
/* ── Category 3: Claim Characteristics ── */
/* Rule 6: Claimed amount exceeds policy sub-limit */
IF claim_to_sublimit_ratio >= 1.5 THEN fraud_score + 12;
/* Rule 7: Accident occurred on a Monday (high indicator) */
IF accident_weekday = 'MON' THEN fraud_score + 5;
/* ── Category 4: Network Linkage ── */
/* Rule 8: Shared phone with another active claim */
IF shared_phone_active_claim = 1 THEN fraud_score + 22;
/* Normalise to 0-100 */
fraud_score = MIN(fraud_score, 100);
/* Assign tier */
IF fraud_score >= &high_threshold THEN fraud_tier = 'HIGH';
ELSE IF fraud_score >= &mid_threshold THEN fraud_tier = 'MEDIUM';
ELSE fraud_tier = 'LOW';
/* Readable rationale string */
fraud_rationale = CATX('; ',
IFC(claims_last_90d >= 2, 'Multiple claims in 90d', ''),
IFC(provider_prior_fraud_flag=1, 'Provider fraud history', ''),
IFC(shared_phone_active_claim=1, 'Shared phone linkage', '')
);
RUN;
-- Build feature view for SAS scoring engine
CREATE OR ALTER VIEW dbo.vw_ClaimFraudFeatures
AS
WITH claim_history AS (
SELECT
c.ClaimantID,
COUNT(*) OVER (
PARTITION BY c.ClaimantID
ORDER BY c.ClaimDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_claims,
COUNT(*) FILTER (WHERE
c2.ClaimDate >= DATEADD(day, -90, c.ClaimDate)
) OVER (PARTITION BY c.ClaimantID) AS claims_last_90d
FROM dbo.Claims c
LEFT JOIN dbo.Claims c2
ON c.ClaimantID = c2.ClaimantID
),
provider_benchmark AS (
SELECT
ProviderID,
AVG(BilledAmount) AS avg_provider_bill,
pb.benchmark_amount,
AVG(BilledAmount) /
NULLIF(pb.benchmark_amount, 0) AS provider_rate_ratio
FROM dbo.Claims clm
INNER JOIN dbo.ProviderBenchmarks pb USING (ProviderID, ClaimType)
GROUP BY ProviderID, pb.benchmark_amount
),
network AS (
SELECT
c1.ClaimID,
CASE WHEN
EXISTS (
SELECT 1 FROM dbo.Claims c2
WHERE c2.ClaimantPhone = c1.ClaimantPhone
AND c2.ClaimID <> c1.ClaimID
AND c2.ClaimStatus = 'ACTIVE'
) THEN 1 ELSE 0 END AS shared_phone_active_claim
FROM dbo.Claims c1
)
SELECT
c.ClaimID, c.PolicyID, c.ClaimantID, c.ClaimDate,
ch.claims_last_90d,
pb.provider_rate_ratio,
pf.prior_fraud_confirmed AS provider_prior_fraud_flag,
DATEDIFF(day, pol.InceptionDate, c.ClaimDate) AS inception_to_claim_days,
n.shared_phone_active_claim,
DATENAME(dw, c.AccidentDate) AS accident_weekday,
c.ClaimAmount / NULLIF(pol.SubLimit, 0) AS claim_to_sublimit_ratio
FROM dbo.Claims c
INNER JOIN dbo.Policies pol ON c.PolicyID = pol.PolicyID
INNER JOIN claim_history ch ON c.ClaimantID = ch.ClaimantID
LEFT JOIN provider_benchmark pb ON c.ProviderID = pb.ProviderID
LEFT JOIN dbo.ProviderFraudList pf ON c.ProviderID = pf.ProviderID
INNER JOIN network n ON c.ClaimID = n.ClaimID
WHERE c.ClaimStatus IN ('PENDING', 'UNDER_REVIEW');
-- Row-Level Security: restrict to investigator's region
Investigator Security Filter =
USERPRINCIPALNAME() = RELATED( Investigators[Email] )
|| ISEMPTY(
FILTER( Investigators, Investigators[Role] = "SIU_MANAGER"
&& Investigators[Email] = USERPRINCIPALNAME() )
) = FALSE
-- High-risk claim count in current period
High Risk Claims =
CALCULATE(
COUNTROWS( FraudScores ),
FraudScores[FraudTier] = "HIGH"
)
-- Confirmed fraud rate %
Confirmed Fraud Rate =
DIVIDE(
CALCULATE( COUNTROWS( FraudScores ),
FraudScores[Outcome] = "CONFIRMED_FRAUD" ),
CALCULATE( COUNTROWS( FraudScores ),
FraudScores[FraudTier] IN { "HIGH", "MEDIUM" } ),
0
) * 100
-- Rolling False Positive Rate (3-month)
FP Rate (3M Rolling) =
VAR flaggedWindow =
CALCULATE( COUNTROWS(FraudScores),
FraudScores[FraudTier] <> "LOW",
DATESINPERIOD( DateTable[Date], LASTDATE(DateTable[Date]), -3, MONTH) )
VAR fpWindow =
CALCULATE( COUNTROWS(FraudScores),
FraudScores[Outcome] = "CLEARED",
DATESINPERIOD( DateTable[Date], LASTDATE(DateTable[Date]), -3, MONTH) )
RETURN DIVIDE( fpWindow, flaggedWindow, 0 ) * 100
| Rule | Category | Monthly Hits | Precision % | Confirmed Fraud % | Status |
|---|---|---|---|---|---|
| Provider prior fraud flag | Provider | 38 | 82% | 74% | Active |
| Shared phone linkage | Network | 55 | 71% | 63% | Active |
| Claim on inception day | Claimant | 12 | 78% | 71% | Active |
| Multiple claims in 90 days | Claimant | 74 | 54% | 46% | Active |
| Provider billing > 2x benchmark | Provider | 49 | 61% | 52% | Active |
| Claim amount > 1.5x sub-limit | Characteristic | 31 | 48% | 38% | Active |
| Address change 30d before claim | Claimant | 22 | 44% | 35% | Active |
| Weekend accident spike rule | Characteristic | 110 | 19% | 14% | Tuned ↓ |