Project 05 · Base SAS + T-SQL + SSIS + Power BI

Insurance Fraud Detection Analytics

Designed and deployed a rule-based fraud scoring engine using SAS data steps and PROC SQL against 850+ claims per month. An SSIS batch pipeline automates nightly scoring; a Power BI operations dashboard gives investigators real-time fraud alerts preventing ₹2.4 Cr in fraudulent payouts in the first year.

Base SAS PROC SQL T-SQL SSIS Power BI Fraud Analytics Rule Engine
₹2.4 Cr
Fraud Prevented (Yr 1)
↓ 41%
False Positive Rate
18
Detection Rules
850+
Claims Scored / Month

Business Problem

The Special Investigations Unit (SIU) relied entirely on manual claim review and escalation by adjusters to flag suspicious claims. With 850+ claim submissions per month, only <5% received in-depth investigation and confirmed fraud accounted for an estimated 3.8% of total payouts.

The objective was to build an automated, data-driven fraud scoring system that flagged high-risk claims for investigator review before payment was released, and to reduce both fraud losses and false positive investigations.

Solution Overview

  • Designed 18 rule-based fraud indicators across four categories: claimant behaviour, provider patterns, claim characteristic anomalies, and network linkage signals
  • Implemented the rule engine in Base SAS (data step + PROC SQL) each rule adds a weighted score; total fraud probability score 0–100 triggers three tiers: Low / Medium / High
  • SSIS package runs nightly at 02:00, extracts new claims from the DW, executes the SAS scoring batch job, and writes scores to the SQL staging table for Power BI refresh
  • Power BI SIU dashboard shows: fraud score distribution, alert queue by tier, rule hit frequency, adjuster performance, and monthly trend
  • Every flagged claim includes a "fraud rationale card" readable by investigators without data expertise
  • After 12 months, confirmed fraud prevention tracked at ₹2.4 Cr; false-positive rate fell from 28% to 16.5% after rule tuning iterations

Process Walkthrough

1

Rule Design Workshop

Held workshops with SIU investigators to codify their tacit fraud knowledge into 18 explicit, testable rules. Each rule was assigned a weight (1–5) reflecting historical fraud-confirmation rates from the prior 3-year claims dataset.

2

SAS Rule Engine

Coded each rule in Base SAS data steps with indicator flags (0/1). PROC SQL aggregates rule weights into a total score and assigns tier labels. Parameterised thresholds so SIU can adjust cut-offs without code changes.

3

SSIS Automation

SSIS package: Extract new claims from SQL DW → Export to SAS-readable flat file → Execute Base SAS scoring job → Import scored output back to SQL Server staging → Trigger Power BI dataset refresh via REST API.

4

Power BI SIU Dashboard

Five report pages: Alert Queue (priority list for today), Score Distribution, Rule Hit Analysis, Investigator Workload, Trend Analysis. RLS restricts each investigator to their assigned region's claims.

5

Monitoring & Rule Tuning

Monthly review of rule performance precision, recall, confirmed fraud rate per rule. Removed 2 low-precision rules, added 3 new network-linkage rules. False positive rate dropped from 28% → 16.5% post-tuning.

Fraud Analytics Dashboard Insights

Fraud Score Distribution (%)

Monthly Flagged vs Confirmed Fraud Claims

Top 8 Rule Hit Frequency (Monthly Avg)

False Positive Rate Before vs After Rule Tuning

Code Samples

/* ─── Fraud Scoring Rule Engine ─── */

%LET high_threshold = 65;
%LET mid_threshold  = 35;

DATA work.scored_claims;
    SET  work.claims_features;
    
    /* Initialise score */
    fraud_score = 0;
    
    /* ── Category 1: Claimant Behaviour ── */
    /* Rule 1: Multiple claims within 90 days */
    IF claims_last_90d >= 2                THEN fraud_score + 15;
    
    /* Rule 2: Address change within 30 days of claim */
    IF addr_change_before_claim_days <= 30  THEN fraud_score + 10;
    
    /* Rule 3: Claim filed same day as policy inception */
    IF inception_to_claim_days <= 1        THEN fraud_score + 20;
    
    /* ── Category 2: Provider Patterns ── */
    /* Rule 4: Provider billed amount > 2x market benchmark */
    IF provider_rate_ratio >= 2.0           THEN fraud_score + 18;
    
    /* Rule 5: Provider involved in prior confirmed fraud */
    IF provider_prior_fraud_flag = 1        THEN fraud_score + 25;
    
    /* ── Category 3: Claim Characteristics ── */
    /* Rule 6: Claimed amount exceeds policy sub-limit */
    IF claim_to_sublimit_ratio >= 1.5       THEN fraud_score + 12;
    
    /* Rule 7: Accident occurred on a Monday (high indicator) */
    IF accident_weekday = 'MON'             THEN fraud_score + 5;
    
    /* ── Category 4: Network Linkage ── */
    /* Rule 8: Shared phone with another active claim */
    IF shared_phone_active_claim = 1        THEN fraud_score + 22;
    
    /* Normalise to 0-100 */
    fraud_score = MIN(fraud_score, 100);
    
    /* Assign tier */
    IF      fraud_score >= &high_threshold  THEN fraud_tier = 'HIGH';
    ELSE IF fraud_score >= &mid_threshold   THEN fraud_tier = 'MEDIUM';
    ELSE                                        fraud_tier = 'LOW';
    
    /* Readable rationale string */
    fraud_rationale = CATX('; ',
        IFC(claims_last_90d >= 2,        'Multiple claims in 90d',   ''),
        IFC(provider_prior_fraud_flag=1, 'Provider fraud history',   ''),
        IFC(shared_phone_active_claim=1, 'Shared phone linkage',     '')
    );
RUN;
-- Build feature view for SAS scoring engine
CREATE OR ALTER VIEW dbo.vw_ClaimFraudFeatures
AS
WITH claim_history AS (
    SELECT
        c.ClaimantID,
        COUNT(*) OVER (
            PARTITION BY c.ClaimantID
            ORDER BY c.ClaimDate
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        )                                             AS cumulative_claims,
        COUNT(*) FILTER (WHERE
            c2.ClaimDate >= DATEADD(day, -90, c.ClaimDate)
        ) OVER (PARTITION BY c.ClaimantID)           AS claims_last_90d
    FROM dbo.Claims c
    LEFT JOIN dbo.Claims c2
        ON c.ClaimantID = c2.ClaimantID
),
provider_benchmark AS (
    SELECT
        ProviderID,
        AVG(BilledAmount)    AS avg_provider_bill,
        pb.benchmark_amount,
        AVG(BilledAmount) /
            NULLIF(pb.benchmark_amount, 0) AS provider_rate_ratio
    FROM dbo.Claims clm
    INNER JOIN dbo.ProviderBenchmarks pb USING (ProviderID, ClaimType)
    GROUP BY ProviderID, pb.benchmark_amount
),
network AS (
    SELECT
        c1.ClaimID,
        CASE WHEN
            EXISTS (
                SELECT 1 FROM dbo.Claims c2
                WHERE  c2.ClaimantPhone = c1.ClaimantPhone
                  AND  c2.ClaimID <> c1.ClaimID
                  AND  c2.ClaimStatus = 'ACTIVE'
            ) THEN 1 ELSE 0 END       AS shared_phone_active_claim
    FROM dbo.Claims c1
)
SELECT
    c.ClaimID, c.PolicyID, c.ClaimantID, c.ClaimDate,
    ch.claims_last_90d,
    pb.provider_rate_ratio,
    pf.prior_fraud_confirmed                      AS provider_prior_fraud_flag,
    DATEDIFF(day, pol.InceptionDate, c.ClaimDate) AS inception_to_claim_days,
    n.shared_phone_active_claim,
    DATENAME(dw, c.AccidentDate)                 AS accident_weekday,
    c.ClaimAmount / NULLIF(pol.SubLimit, 0)      AS claim_to_sublimit_ratio
FROM            dbo.Claims              c
INNER JOIN      dbo.Policies           pol ON c.PolicyID = pol.PolicyID
INNER JOIN      claim_history          ch  ON c.ClaimantID = ch.ClaimantID
LEFT  JOIN      provider_benchmark     pb  ON c.ProviderID = pb.ProviderID
LEFT  JOIN      dbo.ProviderFraudList  pf  ON c.ProviderID = pf.ProviderID
INNER JOIN      network                n   ON c.ClaimID   = n.ClaimID
WHERE c.ClaimStatus IN ('PENDING', 'UNDER_REVIEW');
-- Row-Level Security: restrict to investigator's region
Investigator Security Filter =
USERPRINCIPALNAME() = RELATED( Investigators[Email] )
  || ISEMPTY(
       FILTER( Investigators, Investigators[Role] = "SIU_MANAGER"
               && Investigators[Email] = USERPRINCIPALNAME() )
     ) = FALSE

-- High-risk claim count in current period
High Risk Claims =
CALCULATE(
    COUNTROWS( FraudScores ),
    FraudScores[FraudTier] = "HIGH"
)

-- Confirmed fraud rate %
Confirmed Fraud Rate =
DIVIDE(
    CALCULATE( COUNTROWS( FraudScores ),
                FraudScores[Outcome] = "CONFIRMED_FRAUD" ),
    CALCULATE( COUNTROWS( FraudScores ),
                FraudScores[FraudTier] IN { "HIGH", "MEDIUM" } ),
    0
) * 100

-- Rolling False Positive Rate (3-month)
FP Rate (3M Rolling) =
VAR flaggedWindow =
    CALCULATE( COUNTROWS(FraudScores),
                FraudScores[FraudTier] <> "LOW",
                DATESINPERIOD( DateTable[Date], LASTDATE(DateTable[Date]), -3, MONTH) )
VAR fpWindow =
    CALCULATE( COUNTROWS(FraudScores),
                FraudScores[Outcome] = "CLEARED",
                DATESINPERIOD( DateTable[Date], LASTDATE(DateTable[Date]), -3, MONTH) )
RETURN DIVIDE( fpWindow, flaggedWindow, 0 ) * 100

Top Fraud Detection Rules Performance Summary

RuleCategoryMonthly HitsPrecision %Confirmed Fraud %Status
Provider prior fraud flagProvider3882%74%Active
Shared phone linkageNetwork5571%63%Active
Claim on inception dayClaimant1278%71%Active
Multiple claims in 90 daysClaimant7454%46%Active
Provider billing > 2x benchmarkProvider4961%52%Active
Claim amount > 1.5x sub-limitCharacteristic3148%38%Active
Address change 30d before claimClaimant2244%35%Active
Weekend accident spike ruleCharacteristic11019%14%Tuned ↓