
Enterprise data pipeline work that turns high-volume AI call data into real-time reporting and operational insight.
Mon Oct 28 2024
Full-stack platform engineer
ETL architecture, analytics APIs, dashboard data models, and operational reporting.
Powers reporting workflows over hundreds of thousands of daily calls for enterprise financial-services customers.
During my time at Salient, I've architected and built ETL pipelines that process hundreds of thousands of calls daily from major US banks. These systems transform raw communication data into actionable business intelligence, powering real-time analytics dashboards used by enterprise clients.
The challenge? Processing massive volumes of unstructured data (call transcripts, metadata, customer interactions) and transforming it into structured, queryable datasets that business teams can use for decision-making.
Key Components:
Real-world Scale:
When you're dealing with financial institutions processing thousands of customer calls daily, every piece of data matters. The raw call data comes in various formats:
Here's how we transform raw call data into business insights:
Raw Input:
{
  "call_id": "call_12345",
  "duration": 420,
  "transcript": "Hello, I need help with my account...",
  "sentiment_score": 0.7,
  "resolution": "resolved",
  "timestamp": "2024-10-28T10:30:00Z"
}
Transformed Output (one aggregate row per hour, summarizing every call in that hour):
{
  "hour_bucket": "2024-10-28T10:00:00Z",
  "total_calls": 127,
  "avg_duration": 380,
  "resolution_rate": 0.85,
  "avg_sentiment": 0.72,
  "escalation_rate": 0.12
}
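A minimal Python sketch of the aggregation step, assuming records shaped like the raw input above. The function names (`hour_bucket`, `aggregate_hourly`) are hypothetical, and so is the escalation rule: the raw example only shows `"resolution": "resolved"`, so this sketch assumes escalated calls carry `"resolution": "escalated"`. It groups calls by the hour their timestamp falls in, then reduces each group to the summary fields.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hour_bucket(ts: str) -> str:
    """Truncate an ISO-8601 UTC timestamp ("...Z") to the start of its hour."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt.replace(minute=0, second=0, microsecond=0).strftime("%Y-%m-%dT%H:%M:%SZ")


def aggregate_hourly(calls: list[dict]) -> list[dict]:
    """Reduce raw call records to one summary row per hour."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for call in calls:
        buckets[hour_bucket(call["timestamp"])].append(call)

    summaries = []
    for bucket, rows in sorted(buckets.items()):
        n = len(rows)
        summaries.append({
            "hour_bucket": bucket,
            "total_calls": n,
            "avg_duration": round(mean(r["duration"] for r in rows)),
            "resolution_rate": round(sum(r["resolution"] == "resolved" for r in rows) / n, 2),
            "avg_sentiment": round(mean(r["sentiment_score"] for r in rows), 2),
            # Assumption: escalated calls are marked resolution == "escalated".
            "escalation_rate": round(sum(r["resolution"] == "escalated" for r in rows) / n, 2),
        })
    return summaries
```

The transcript itself is not used here; in a fuller pipeline, fields like `sentiment_score` would presumably be derived from it upstream.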
Instead of processing each record individually, we batch data into 5-minute windows. This strikes a practical balance between freshness and efficiency: dashboards stay within minutes of live traffic, while each job works on records in bulk rather than one at a time.
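One way to express 5-minute batching is with fixed, non-overlapping (tumbling) windows. The sketch below is illustrative, not the production code: the helper names and the `"ts"` record key are hypothetical, and the rule that only fully elapsed windows are handed to the ETL job is an assumption about scheduling.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)  # batch size from the text
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, window: timedelta = WINDOW) -> datetime:
    """Floor a timezone-aware datetime to the start of its fixed-size window."""
    whole_windows = (ts - EPOCH) // window  # int: complete windows since the epoch
    return EPOCH + whole_windows * window


def batch_by_window(records: list[dict], window: timedelta = WINDOW) -> dict[datetime, list[dict]]:
    """Group records into tumbling windows keyed by each window's start time."""
    batches: dict[datetime, list[dict]] = defaultdict(list)
    for rec in records:
        batches[window_start(rec["ts"], window)].append(rec)
    return dict(batches)


def closed_batches(batches: dict[datetime, list[dict]], now: datetime,
                   window: timedelta = WINDOW) -> dict[datetime, list[dict]]:
    """Hypothetical scheduling rule: only windows that have fully elapsed are processed."""
    return {start: recs for start, recs in batches.items() if start + window <= now}
```

Waiting for a window to close means each batch is complete when it is processed, at the cost of up to one window of extra latency.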
Every ETL job is idempotent: re-running it over the same window produces the same result rather than duplicating rows. That is critical for financial data, where accuracy is non-negotiable.
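A common way to make a load step safe to re-run is to key each output row by its time bucket and overwrite on conflict. This minimal sketch uses Python's built-in `sqlite3` purely for illustration; the table name `hourly_summary` and the `load_summary` helper are hypothetical, and the source does not say which database or write strategy is actually used.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS hourly_summary (
    hour_bucket     TEXT PRIMARY KEY,  -- one row per hour; acts as the idempotency key
    total_calls     INTEGER NOT NULL,
    avg_duration    REAL NOT NULL,
    resolution_rate REAL NOT NULL
)
"""


def load_summary(conn: sqlite3.Connection, summary: dict) -> None:
    """Write one hourly summary. Re-running for the same hour replaces the
    existing row instead of inserting a duplicate, so retries are safe."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(
            "INSERT OR REPLACE INTO hourly_summary "
            "(hour_bucket, total_calls, avg_duration, resolution_rate) "
            "VALUES (:hour_bucket, :total_calls, :avg_duration, :resolution_rate)",
            summary,
        )
```

Because the summary for a window is computed deterministically from that window's raw data, loading it once or five times leaves the table in the same state.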
Built with flexible schemas that can adapt as business requirements change without breaking existing pipelines.
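The text does not say how schema flexibility is achieved. One common approach, shown here as an assumption rather than the actual design, is a tolerant reader: required fields are enforced, optional fields get defaults, and unknown fields are preserved instead of causing a failure. `CallRecord` and `parse_call` are hypothetical names; the field names mirror the raw input example.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class CallRecord:
    # Required: a payload missing any of these raises TypeError and is rejected.
    call_id: str
    duration: int
    timestamp: str
    # Optional: defaults let older or partial payloads still parse.
    transcript: str = ""
    sentiment_score: Optional[float] = None
    resolution: str = "unknown"
    # Fields this version does not know about are kept rather than dropped.
    extra: dict[str, Any] = field(default_factory=dict)


KNOWN_FIELDS = {f.name for f in fields(CallRecord)} - {"extra"}


def parse_call(raw: dict[str, Any]) -> CallRecord:
    """Split a raw payload into known fields and pass-through extras."""
    known = {k: v for k, v in raw.items() if k in KNOWN_FIELDS}
    extra = {k: v for k, v in raw.items() if k not in KNOWN_FIELDS}
    return CallRecord(**known, extra=extra)
```

When upstream adds a new field, existing jobs keep running and the value sits in `extra` until a later version promotes it to a typed field.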
Comprehensive observability with custom metrics tracking data quality, processing latency, and system health.
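As a rough illustration of per-batch data-quality and latency metrics, the sketch below counts records that fail transformation instead of aborting the batch, times the batch, and turns both into alerts. `BatchMetrics`, `run_batch`, `health_alerts`, and both thresholds are hypothetical; real thresholds would come from the team's own SLOs, and a production system would export these values to a metrics backend rather than return them.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class BatchMetrics:
    records_in: int = 0
    records_rejected: int = 0
    latency_seconds: float = 0.0

    @property
    def rejection_rate(self) -> float:
        return self.records_rejected / self.records_in if self.records_in else 0.0


def run_batch(records: list[dict], transform: Callable[[dict], Any]) -> tuple[list[Any], BatchMetrics]:
    """Apply transform to each record, counting failures rather than aborting."""
    metrics = BatchMetrics(records_in=len(records))
    start = time.perf_counter()
    output = []
    for rec in records:
        try:
            output.append(transform(rec))
        except (KeyError, TypeError, ValueError):
            metrics.records_rejected += 1  # data-quality signal
    metrics.latency_seconds = time.perf_counter() - start  # processing latency
    return output, metrics


def health_alerts(m: BatchMetrics, max_rejection_rate: float = 0.01,
                  max_latency_s: float = 60.0) -> list[str]:
    """Compare a batch's metrics against hypothetical thresholds."""
    alerts = []
    if m.rejection_rate > max_rejection_rate:
        alerts.append(f"rejection rate {m.rejection_rate:.1%} above {max_rejection_rate:.1%}")
    if m.latency_seconds > max_latency_s:
        alerts.append(f"batch took {m.latency_seconds:.1f}s")
    return alerts
```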
This ETL system powers dashboards used by:
The data pipeline processes millions of data points daily, providing real-time insights that directly impact business decisions for some of the largest financial institutions in the US.
Building ETL pipelines at this scale taught me the importance of: