
System Architecture Guide: Real-Time Lead Scoring Pipeline


System Overview

This document outlines the architecture my team and I have designed for a real-time lead scoring system. We are building this guide to serve as the single source of truth for its implementation, operation, and maintenance. The primary objective is to move beyond static, batch-processed scores and provide the sales team with dynamic scores that reflect user actions within minutes. Traditional lead scoring models, which often refresh only daily or weekly, leave a critical gap between a prospect’s demonstrated intent and our sales team’s ability to act on it. This system closes that gap.

The system ingests user data from multiple sources via Segment, our Customer Data Platform (CDP). This includes front-end web activity, form submissions, and engagement data from our email marketing platform. All this data is centralized in our Snowflake analytics database, which serves as the computational engine for the scoring model. Once scores are calculated, we use a Reverse ETL service to push the results back to our Salesforce CRM.

This architecture ensures that our sales organization can act on the most engaged leads immediately. When a prospect visits our pricing page or attends a webinar, their score will update in Salesforce in near real-time, triggering an alert for the assigned sales representative. This immediacy is critical for increasing the velocity of our sales pipeline and ensuring we connect with potential customers at the peak of their interest.

Prerequisites and System Requirements

Before my team and I begin the implementation, we must ensure all foundational components are in place. Failure to meet these requirements will cause significant delays and potential rework. We have established the following five prerequisites as mandatory for project kickoff.

First, we require administrative access to our core systems: Segment, Snowflake, and Salesforce. This level of access is non-negotiable. It is necessary for creating API credentials, installing new integrations, defining data schemas, and configuring the automation workflows that are central to this system's function.

Second, we must have a clearly defined, agreed-upon scoring model. We will not write a single line of code until the business logic is finalized with sales and marketing leadership. We must document which user actions correspond to specific point values. This includes both positive actions like page views and form submissions, and negative actions such as long periods of inactivity. This documentation serves as our blueprint for the SQL transformation logic. For further reading, guides on B2B lead scoring best practices provide useful background.

Third, Segment must be properly instrumented on all web properties. This means the Analytics.js library is correctly installed and configured to capture all required user traits and events. My team will conduct an audit to confirm that we are tracking high-value activities such as Pricing Page Viewed, Demo Requested, and Webinar Attended as distinct events. A comprehensive tracking plan is the foundation of an accurate scoring model; Segment's Analytics.js quickstart guide covers the installation details.

Fourth, all necessary service accounts and API credentials must be provisioned. This is a critical security and operational step. We will create a dedicated user and role in Snowflake specifically for the Segment data loading process, following the official Segment-to-Snowflake documentation. Additionally, we will create a dedicated API user in Salesforce for our Reverse ETL service. This practice of using dedicated service accounts isolates permissions and makes future troubleshooting much simpler.

Finally, a consistent user identifier must be present across all data sources. The integrity of our scoring model depends on our ability to stitch together a user's journey. We have standardized on a lowercased email address as this identifier. It must be consistently passed as the userId in all Segment identify calls. This ensures that a user's web activity, email clicks, and CRM record can all be joined accurately in the analytics database.
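Since every join in the pipeline hinges on this identifier, the normalization rule is worth pinning down in code. A minimal Python sketch (the function name is ours for illustration, not part of any SDK):

```python
def normalize_user_id(email: str) -> str:
    """Normalize an email address for use as the Segment userId.

    Trimming whitespace and lowercasing keeps the identifier consistent
    across web, email, and CRM sources, so joins in the warehouse succeed.
    """
    return email.strip().lower()

# The same normalized value would be passed to every identify call, e.g.:
# analytics.identify(normalize_user_id("Jane.Doe@Example.com"), {...})
```

Applying this rule at the point of collection, rather than only in the warehouse, prevents case-mismatched duplicates from ever entering the pipeline.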

Implementation: An 8-Step Execution Plan

I have broken the implementation of this system into eight distinct steps. My team will execute these steps sequentially to ensure a smooth and logical build-out, from defining the business rules to activating the final automation in our CRM.

Step 1: Formalize the Scoring Logic

Before any technical work begins, I will work directly with marketing and sales stakeholders to finalize the points system. This is a business-critical decision. We will define explicit positive scores for engagement events, such as +20 points for webinar attendance, +15 for a pricing page visit, and +30 for a demo request. We will also implement score degradation to ensure pipeline freshness. For instance, we will apply a negative score of -10 after 30 days without any engagement. This decay mechanism prevents stale leads from cluttering the queue and allows sales to focus on currently active prospects. The complete model will be documented and signed off on before proceeding. Industry guides on lead scoring best practices and on defining the lead scoring process provide useful background.
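As a sanity check before the model is translated into SQL, the point logic above can be sketched in Python. The event names and point values mirror the examples in this step; the decay parameters are the ones stated, and everything else is an illustrative assumption:

```python
from datetime import date

# Point values from the agreed scoring model (Step 1).
EVENT_POINTS = {
    "Webinar Attended": 20,
    "Pricing Page Viewed": 15,
    "Demo Requested": 30,
}

INACTIVITY_PENALTY = -10      # applied after the inactivity window elapses
INACTIVITY_WINDOW_DAYS = 30

def calculate_score(events: list, last_activity: date, today: date) -> int:
    """Sum event points, then apply decay if the lead has gone quiet."""
    score = sum(EVENT_POINTS.get(e, 0) for e in events)
    if (today - last_activity).days > INACTIVITY_WINDOW_DAYS:
        score += INACTIVITY_PENALTY
    return score
```

A table like `EVENT_POINTS` is exactly what the stakeholder sign-off should produce: a flat mapping of event name to points that the SQL transformation can encode without interpretation.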

Step 2: Configure CDP Data Ingestion

Next, my team will verify that all necessary data sources are correctly configured in Segment. This involves auditing our website tracking via the Segment Analytics.js library and confirming that our email events are flowing correctly from Mailchimp via a configured webhook source. We will use Segment's real-time event debugger to inspect the payload of incoming track and identify calls, ensuring they conform to our tracking plan and contain the necessary properties for our scoring model. A clean, structured data stream is essential for the downstream SQL model to function correctly.
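The conformance check we perform by eye in the Segment debugger can also be sketched as a small validator. The tracking-plan contents and required properties below are illustrative assumptions, not our actual plan:

```python
# Required top-level fields per event, per our (hypothetical) tracking plan.
TRACKING_PLAN = {
    "Pricing Page Viewed": {"userId", "timestamp"},
    "Demo Requested": {"userId", "timestamp", "company"},
}

def conforms_to_plan(event_name: str, payload: dict) -> bool:
    """Return True if the event is in the plan and carries every required field."""
    required = TRACKING_PLAN.get(event_name)
    if required is None:
        return False  # event name not in the tracking plan at all
    return required.issubset(payload.keys())
```

Running a check like this over a sample of warehouse rows is a cheap way to catch instrumentation drift before it corrupts the scoring model.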

Step 3: Establish the Data Warehouse Sink

With data flowing cleanly into Segment, we will configure the Snowflake destination. My team will set up Segment to forward all incoming event data and user profiles to a dedicated SEGMENT_EVENTS schema within our Snowflake database. As specified in the Segment documentation, this configuration requires creating a specific user and role within Snowflake that has the necessary permissions to create tables and load data. This one-time setup establishes the raw data foundation upon which our entire scoring model will be built.

Step 4: Construct the Scoring Model in SQL

This is where the raw data becomes actionable intelligence. I will write a SQL transformation using dbt to calculate the lead scores. Critically, this will be built as a dbt incremental model. This approach is vastly more efficient and cost-effective than running a full table scan on every run. The model will only process new user events that have arrived since its previous run, dramatically reducing computation time. The SQL logic will join user identify calls with their track events, sum the point values based on the logic from Step 1, and materialize the final scores into a clean, queryable view called dim_lead_scores.
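The model itself will be SQL, but the incremental mechanics are easy to get wrong, so here is a Python sketch of the equivalent logic: only events received after the previous run's high-water mark are processed, and their points are added to the existing scores. All names here are illustrative, not the actual dbt model:

```python
def incremental_update(scores: dict, events: list, event_points: dict,
                       last_run_at: int) -> dict:
    """Apply only newly arrived events to the existing score table.

    This mirrors a dbt incremental model's filter on the previous run's
    high-water mark: rows already scored are skipped, so each run's cost
    scales with new data rather than total history.
    """
    for e in events:
        if e["received_at"] <= last_run_at:
            continue  # already scored on a previous run
        user = e["userId"]
        scores[user] = scores.get(user, 0) + event_points.get(e["name"], 0)
    return scores
```

In the real model, the high-water mark would come from the target table itself (e.g. its max load timestamp) rather than being passed in by hand.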

Step 5: Deploy a Reverse ETL Service

Once our scores are calculated and available in Snowflake, we need a mechanism to push them back into our operational systems. For this, we will connect Hightouch, a Reverse ETL service, to our Snowflake analytics database. The configuration is straightforward. We will provide Hightouch with credentials to our Snowflake instance, granting it read-only access to the dim_lead_scores view. This adheres to the principle of least privilege and ensures the service can only read the data it needs to sync.

Step 6: Configure the CRM Data Sync

Within Hightouch, I will create a new sync job that targets our Salesforce destination. This job's source will be a simple query: SELECT * FROM dim_lead_scores. I will configure the sync to run every 15 minutes. This frequency represents a balance between our goal of near real-time updates and the need to respect Salesforce's rolling 24-hour API limit. Syncing too frequently on a large dataset could exhaust our API quota, disrupting other critical business operations. The 15-minute interval ensures timeliness without creating undue system strain.
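The 15-minute cadence can be justified with back-of-envelope arithmetic. All of the volumes below are illustrative assumptions, not measurements from our environment:

```python
import math

SYNCS_PER_DAY = 24 * 60 // 15        # 96 sync runs at a 15-minute cadence
CHANGED_RECORDS_PER_SYNC = 500       # assumed average changed rows per run
RECORDS_PER_API_CALL = 200           # assumed batch size per Salesforce call

# Total daily Salesforce API calls consumed by this sync alone.
calls_per_day = SYNCS_PER_DAY * math.ceil(
    CHANGED_RECORDS_PER_SYNC / RECORDS_PER_API_CALL
)
```

Under these assumptions the sync consumes a few hundred calls per day, a small fraction of a typical rolling 24-hour quota, while keeping score freshness under 15 minutes. Halving the interval roughly doubles the consumption, which is why we did not push the cadence further.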

Step 7: Map Fields and Activate the Sync

With the sync configured, the final step in Hightouch is to map the data fields. I will map the calculated_score column from our Snowflake view to the custom Real_Time_Lead_Score__c field we have created on the Lead and Contact objects in Salesforce. The sync will be configured to use the user's email address as the primary external ID for matching records. This allows Hightouch to perform an update operation: if a matching record exists, it will be updated with the new score; if not, no action will be taken and no new record will be created. We will use the official Hightouch documentation for Salesforce as a guide for this mapping process.
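The matching behavior above can be sketched as follows. This models the update-only semantics we want, with hypothetical record shapes; the real matching happens inside the Reverse ETL service:

```python
def sync_scores(crm_records: dict, scores: dict) -> dict:
    """Update-only sync keyed on (normalized) email.

    Records with a matching email get the new score; emails with no
    matching CRM record are skipped rather than inserted, so the sync
    never creates leads on its own.
    """
    for email, score in scores.items():
        if email in crm_records:
            crm_records[email]["Real_Time_Lead_Score__c"] = score
    return crm_records
```

The update-only choice is deliberate: lead creation stays with the existing marketing workflows, and this pipeline only enriches records that already exist.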

Step 8: Build Automation in the CRM

Finally, we will operationalize the score within Salesforce. The CRM administrator on my team will build a record-triggered Flow. The Flow will trigger whenever a Lead or Contact record is updated and the Real_Time_Lead_Score__c field's value crosses our 'sales-ready' threshold of 75 points. Because this automation needs to create a related Task record assigned to the lead owner, it must be configured as an 'Actions and Related Records' flow that runs after the record is saved. This final step connects the entire data pipeline to a concrete sales action, notifying the team to engage with a highly qualified lead immediately.
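The Flow itself is configured declaratively in Salesforce, but its entry condition amounts to an upward threshold crossing, which can be sketched as:

```python
SALES_READY_THRESHOLD = 75

def should_create_task(old_score: int, new_score: int) -> bool:
    """Fire only when the score crosses the threshold upward.

    Checking the prior value prevents re-alerting the rep on every
    subsequent score update for a lead that is already above 75.
    """
    return old_score < SALES_READY_THRESHOLD <= new_score
```

In Flow terms, this is the difference between "field value is above 75" and "field value changed and is now above 75"; only the latter keeps the task queue free of duplicates.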

System Components and Tools

To build this system, my team and I have selected a specific set of tools, each chosen for its capabilities and role within the modern data stack.

  • Customer Data Platform (CDP): We will use Segment. Its robust developer tools, extensive catalog of integrations, and reliable data pipelines make it the ideal choice for collecting and routing customer data from all our sources.

  • Analytics Database: Our choice is Snowflake. Its architecture, which separates storage and compute, provides cost-effective scalability. This allows us to handle large volumes of event data for our modeling without incurring prohibitive costs.

  • Reverse ETL Service: We have selected Hightouch. It provides a visual data mapper that simplifies the process of configuring syncs to Salesforce. Furthermore, its detailed sync logs are crucial for operational monitoring and troubleshooting, giving us clear visibility into the health of our data pipeline.

  • Customer Relationship Management (CRM): The destination system is Salesforce Sales Cloud. It is the established system of record for our sales organization, and enriching it with real-time scores directly within the sales team's primary workspace is the ultimate goal of this project.

Troubleshooting Common Issues

Even in a well-designed system, issues can arise. I have outlined the primary failure modes we anticipate and the precise, step-by-step diagnostic process my team will follow to resolve them.

Issue: Scores in Salesforce are stale or not updating.

  • Solution: My first step will be to check the Hightouch sync history for the specific Salesforce sync job. I will look for any API errors, particularly messages like API_LIMIT_EXCEEDED, which would indicate a problem with our Salesforce API consumption. If the Hightouch logs show no sync errors, the issue lies upstream. I will then inspect the run history of the dbt incremental model in our dbt Cloud dashboard. A failed or delayed dbt run is the most likely culprit if the Reverse ETL service is functioning correctly. This process follows the data flow backward to isolate the point of failure.

Issue: Scores are not being attached to the correct lead records.

  • Solution: This is almost always an identifier mismatch problem. I will first verify that our dbt transformation logic is explicitly lowercasing the user's email address before it is materialized in the final scoring view. Case-sensitivity issues are a common source of failed joins. Second, I will double-check the configuration in Hightouch. I will confirm the sync is correctly set to use the email field as the external ID for matching records in Salesforce. These two checks cover the most common causes of record-matching failures.

Issue: The calculated score value appears incorrect.

  • Solution: When the score itself is wrong, the problem is in the transformation logic. To debug this, I will isolate a specific userId that has an incorrect score. I will query the raw SEGMENT_EVENTS tables in Snowflake for that user, pulling all their associated track events. Then, I will manually calculate what their score should be based on the documented scoring logic. Finally, I will compare my manual calculation to the output of my dbt model's compiled SQL, which can be found in the target/ directory of our dbt project. This systematic process allows me to determine if the issue is with the incoming raw data or a flaw in the SQL transformation itself. Debugging dbt errors is a well-documented process my team is prepared to handle.
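That comparison step can be sketched as a small helper (the function name and messages are ours, for illustration):

```python
def diagnose_score(raw_events: list, event_points: dict, model_score: int) -> str:
    """Recompute one user's score from raw warehouse events and compare it
    with the model's output to localize the fault."""
    expected = sum(event_points.get(e, 0) for e in raw_events)
    if expected == model_score:
        # The transformation agrees with the raw data, so the problem is
        # upstream: events are missing or mis-tracked before the warehouse.
        return "model matches raw data: investigate event capture upstream"
    return "model diverges from raw data: inspect the compiled SQL transformation"
```

Either outcome narrows the search to one half of the pipeline, which is the whole point of the manual recalculation.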

Expected Results and Success Metrics

Upon successful completion of this project, we expect to see several direct, measurable outcomes that will confirm the system is delivering on its promise. These results bridge the technical implementation with tangible business value.

The primary technical result will be a custom field, Real_Time_Lead_Score__c, that is populated and actively maintained on all Lead and Contact records in Salesforce. This field will serve as the single source of truth for a lead's current engagement level. We are holding ourselves to a strict performance standard: the end-to-end latency between a user's action on our website and the corresponding score update in Salesforce will be consistently under 15 minutes.

From an operational standpoint, sales team members will receive automated Salesforce task notifications for leads that cross the 'sales-ready' score threshold of 75. This automation eliminates the need for manual list-checking and enables immediate, informed follow-up when a lead is most receptive.

Ultimately, we will measure the success of this system by tracking two key performance indicators (KPIs) that directly impact revenue. First, we will measure the 'time to first contact' for sales-qualified leads, which we expect to decrease significantly. Second, we will track our 'lead to opportunity' conversion rate. By enabling our sales team to focus on the most engaged leads at the right time, we anticipate a measurable increase in this critical conversion metric. These KPIs, as outlined in lead scoring best practices, are the ultimate arbiters of the project's success.
