
Marketing Campaign Analytics: Multi-Touch Attribution Modeling and Insight Development

This project focuses on optimizing marketing campaigns using multi-touch attribution modeling. We constructed analytical relationship clusters and identified patterns and trends across a wide swath of marketing campaign performance data to answer the following business questions.


Primary Question: What features most accurately predict campaign success, measured by CTR (Click-Through Rate) and ROI?


Secondary Questions:

  • Which audience segments respond most to CTAs?

  • What channels drive the highest ROI?

  • How can we cluster audiences for personalized targeting?

Project Summary

What We Tested

We built machine learning models to understand what makes a marketing campaign successful. Specifically, what factors most accurately predict Click-Through Rate (CTR) and Return on Investment (ROI)? We tested whether different audience segments, campaign traits (like cost and engagement), and marketing channels (like Email or YouTube) could help us predict and improve performance.

What We Did

We used two machine learning models, Random Forest and XGBoost, to analyze over 330,000 rows of campaign data. These models looked for patterns among features such as Clicks, Engagement Score, Acquisition Cost, and Target Audience to predict outcomes like CTR. We also ran a cluster analysis to group audience segments with similar behaviors and performance profiles, helping us personalize future strategies.
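In outline, that workflow looks like the hedged R sketch below (the file name comes from the Data Wrangling section further down; the 80/20 split, seed, and cleaning shortcuts are assumptions, not the project's exact steps):

```r
library(dplyr)

# Load the campaign data (file name from the Data Wrangling section below).
campaigns <- read.csv("marketing_campaign_dataset.csv", stringsAsFactors = FALSE)

campaigns <- campaigns %>%
  # Acquisition_Cost is stored as text in the raw file; strip the symbols.
  mutate(Acquisition_Cost = readr::parse_number(as.character(Acquisition_Cost)),
         # Primary target: CTR = Clicks / Impressions.
         CTR = Clicks / Impressions) %>%
  na.omit()  # drop rows with missing values (detailed in Data Wrangling)

# Hold out 20% of rows for evaluation (split ratio and seed are assumptions).
set.seed(42)
train_idx <- sample(seq_len(nrow(campaigns)), size = floor(0.8 * nrow(campaigns)))
train <- campaigns[train_idx, ]
test  <- campaigns[-train_idx, ]
```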

Why We Did It

Marketing teams often struggle to know what works and why. By predicting performance and uncovering which features drive results, we give business teams data-backed recommendations on:


  • Which audiences to target

  • What types of campaigns to run

  • How to allocate marketing budget efficiently

  • Where to optimize creative and messaging

 

This helps turn raw data into strategic insights that improve future marketing performance, boost ROI, and reduce wasted spend.


About The Data

Our data is a marketing performance dataset representing real-world scenarios in which businesses want to understand, analyze, and optimize marketing spend based on conversions and campaign effectiveness, measured across 14 unique qualitative and quantitative variables.

Modeling Approach


Random Forest (via ranger)

Random Forest is an ensemble of decision trees, well suited to structured tabular data with both numeric and categorical variables. We selected the ranger implementation for its:


  • Speed: It is highly optimized for large datasets and allows for parallel processing.

  • Robustness: It handles noise, missing values, and irrelevant features better than simpler linear models.

  • Interpretability: Variable importance metrics provide clear insights into feature influence, which is crucial for marketing attribution discussions.

 

Use Case Fit: Random Forest works well when the signal is nonlinear and subtle. That makes it ideal for modeling complex user behavior and campaign performance, as in our analytics data, where there may be interaction effects between audiences, channels, and content types.
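A minimal ranger fit along these lines, assuming the train/test frames from the earlier sketch (the formula and hyperparameters below are assumptions, not the project's final specification):

```r
library(dplyr)
library(ranger)

# ranger prefers factors for categorical predictors, so convert characters first.
train <- train %>% mutate(across(where(is.character), as.factor))
test  <- test  %>% mutate(across(where(is.character), as.factor))

rf_fit <- ranger(
  CTR ~ Clicks + Impressions + Engagement_Score + Acquisition_Cost +
    Channel_Used + Target_Audience,
  data        = train,
  num.trees   = 500,            # assumed forest size; tune as needed
  importance  = "permutation",  # per-feature importance for attribution
  num.threads = 4               # ranger parallelizes across threads
)

rf_pred <- predict(rf_fit, data = test)$predictions
head(sort(rf_fit$variable.importance, decreasing = TRUE))
```

Permutation importance here is the same kind of score summarized in the Feature Importance section further down; impurity-based importance is a cheaper alternative if runtime matters.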

XGBoost

XGBoost is a gradient boosting framework that builds an ensemble of weak learners (trees) in a stage-wise manner, optimizing for predictive performance. We included XGBoost for its:


  • Superior accuracy on structured prediction tasks.

  • Built-in feature importance scoring, which helps with interpreting business drivers.

  • Tolerance for unbalanced data and robustness to multicollinearity.


Use Case Fit: XGBoost excels at predictive tasks in marketing where micro-patterns in user engagement (e.g., subtle differences in channel preference or demographic responsiveness) need to be captured in order to build predictive recommendations that optimize future campaigns.
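A comparable XGBoost sketch, again with assumed hyperparameters. xgboost needs an all-numeric matrix, so categorical predictors are one-hot encoded first:

```r
library(xgboost)

# One-hot encode categorical predictors into a numeric design matrix.
X_train <- model.matrix(
  ~ Clicks + Impressions + Engagement_Score + Acquisition_Cost +
    Channel_Used + Target_Audience - 1,
  data = train
)
dtrain <- xgb.DMatrix(data = X_train, label = train$CTR)

xgb_fit <- xgb.train(
  params = list(
    objective = "reg:squarederror",  # continuous target (CTR)
    eta       = 0.1,                 # learning rate (assumed)
    max_depth = 6                    # tree depth (assumed)
  ),
  data    = dtrain,
  nrounds = 200
)

# Built-in feature importance for interpreting business drivers.
xgb.importance(model = xgb_fit)
```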

Other Models: Reasons for Rejection / Limited Use

  • Linear Regression: Interpretable, but insufficient given the nonlinearity and interaction terms in the data.

  • KNN (K-Nearest Neighbors): Computationally expensive at scale, with long runtimes and limited explainability for this business use case.

  • Logistic Regression: Designed for classification, so not suited to continuous targets like CTR.

  • Deep Learning (e.g., Neural Nets): Overkill for structured data; poor interpretability for business users.

  • Support Vector Regression: Difficult to scale efficiently to 338K rows and less interpretable.


Data Wrangling

  • Source: marketing_campaign_dataset.csv (338K rows, 16 columns)

  • Key Preprocessing Steps (sketched in R after this list):

    • Converted Acquisition_Cost from character to numeric.

    • Removed rows with NA in key columns (Conversion_Rate, Clicks, Engagement_Score, etc.).

    • Scaled numeric variables.

    • Encoded categorical variables as needed.
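A hedged sketch of those steps (the currency formatting and the choice of readr/dplyr helpers are assumptions):

```r
library(dplyr)
library(readr)

raw <- read_csv("marketing_campaign_dataset.csv")

clean <- raw %>%
  # Acquisition_Cost arrives as text (e.g., "$1,234.56"; exact formatting assumed);
  # parse_number() strips the currency symbols and commas.
  mutate(Acquisition_Cost = parse_number(as.character(Acquisition_Cost))) %>%
  # Drop rows with NA in the key modeling columns.
  filter(!is.na(Conversion_Rate), !is.na(Clicks), !is.na(Engagement_Score)) %>%
  # Compute the ratio target before scaling, then scale the numeric
  # predictors and convert remaining character columns to factors.
  mutate(
    CTR = Clicks / Impressions,
    across(c(Clicks, Impressions, Engagement_Score, Acquisition_Cost),
           ~ as.numeric(scale(.x))),
    across(where(is.character), as.factor)
  )
```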

Variable Selection

  • Top Predictors Chosen for Modeling:

    • Clicks

    • Impressions

    • Engagement_Score

    • Acquisition_Cost

    • Channel_Used_Email (binary dummy; see the encoding sketch after this list)

    • Target_Audience_Men 18-24 (binary dummy)

  • Rationale:

    • Clicks and Impressions are direct components of CTR.

    • Engagement score acts as a behavioral proxy.

    • Acquisition cost signals economic efficiency.
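The two binary dummies above can be derived from the raw categorical columns. A minimal sketch, assuming Channel_Used and Target_Audience are the underlying columns and "Email" / "Men 18-24" are level labels in the data:

```r
library(dplyr)

# Build the two binary indicators used as predictors; level labels
# ("Email", "Men 18-24") are assumed from the dataset description.
model_df <- clean %>%
  mutate(
    Channel_Used_Email          = as.integer(Channel_Used == "Email"),
    `Target_Audience_Men 18-24` = as.integer(Target_Audience == "Men 18-24")
  )
```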


Feature Importance (From varImp())

  • Clicks and Impressions were dominant indicators of CTR — not surprising, since CTR = Clicks / Impressions.

  • Engagement_Score was also highly influential, showing a strong correlation between how engaged users were and their likelihood to click on a CTA.

  • Among categorical features:

    • Email campaigns performed better than other channels on average.

    • The “Men 18-24” segment had one of the highest average CTRs, indicating stronger responsiveness to certain campaign types.

    • YouTube and Google Ads also showed above-average performance, but slightly less efficient ROI per click.
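A hedged way to reproduce this kind of importance table with varImp() is to train ranger through caret so varImp() can read the fitted model; the cross-validation and tuning settings below are illustrative, not the project's configuration:

```r
library(caret)

ctrl <- trainControl(method = "cv", number = 3)

rf_caret <- train(
  CTR ~ Clicks + Impressions + Engagement_Score + Acquisition_Cost +
    Channel_Used_Email + `Target_Audience_Men 18-24`,
  data       = model_df,
  method     = "ranger",
  importance = "permutation",  # required so varImp() has scores to report
  trControl  = ctrl
)

varImp(rf_caret)         # scaled importance scores, largest first
plot(varImp(rf_caret))   # quick visual for the attribution discussion
```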

Derived Features that Added Value

  • CTR as a computed target gave better insight than raw Clicks or Impressions.

  • ROI per Click added context on spend efficiency.


  • Duration_Days and Is_Weekend helped detect temporal trends in engagement (all four derived features are sketched below).
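A minimal sketch of those derived features, computed on the raw frame before scaling (the Date and Duration column names and formats are assumptions):

```r
library(dplyr)
library(lubridate)
library(readr)

# Derived features on the raw (unscaled) frame.
features <- raw %>%
  mutate(
    CTR           = Clicks / Impressions,
    ROI_per_Click = ROI / pmax(Clicks, 1),                  # avoid divide-by-zero
    Duration_Days = parse_number(as.character(Duration)),   # e.g., "30 days" -> 30 (column name assumed)
    Is_Weekend    = wday(ymd(Date)) %in% c(1, 7)            # Sunday = 1, Saturday = 7 (date format assumed)
  )
```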


Cluster Model Output Interpretations

Cluster 1: Low Engagement, Average ROI
  • Engagement is the lowest of all clusters (-0.947) — indicating users in this segment are disengaged.

  • ROI is average (5.01) despite the low engagement, suggesting that cost efficiency or click volume still balances out performance.

  • Clicks are slightly below average.

  • Strategy: Focus on re-engagement tactics — perhaps the content isn't resonating, or delivery timing is off. A/B test messaging or try new creatives.


Cluster 2: High ROI, High Clicks
  • ROI is the highest at 6.65.

  • Clicks are also the highest (+0.0158) among all segments.

  • Engagement is neutral.

  • Strategy: This is your star cluster — the campaigns here are efficient and drive strong traffic. Consider increasing budget or replicating the campaign style to similar audiences.


Cluster 3: Lowest ROI and Lowest Clicks
  • ROI is the lowest at 3.35.

  • Clicks are slightly below average.

  • Engagement is low, but not as bad as Cluster 1.

  • Strategy: These campaigns are underperforming on both engagement and returns. Consider:

    • Retargeting with better creatives

    • Dropping unresponsive audience segments

    • Reallocating spend away from this group


Cluster 4: High Engagement, Average ROI
  • Engagement is the highest (+0.956) of all clusters.

  • ROI is average (5.01).

  • Clicks are above average but modest.

  • Strategy: These users are engaged but not converting into ROI effectively. You likely need to refine CTAs, optimize landing pages, or shorten the conversion funnel to capture value.
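The four profiles above are the kind of output a k-means segmentation produces. A minimal sketch with k = 4, assuming Engagement_Score, Clicks, and ROI as the clustering inputs (the project's exact feature set may differ):

```r
library(dplyr)

# k-means with k = 4 to match the four segments above; the clustering
# inputs (Engagement_Score, Clicks, ROI) are an assumed feature set.
set.seed(123)
seg <- raw %>%
  filter(!is.na(Engagement_Score), !is.na(Clicks), !is.na(ROI))
cluster_input <- scale(seg[, c("Engagement_Score", "Clicks", "ROI")])
km <- kmeans(cluster_input, centers = 4, nstart = 25)

# Profile each segment: scaled engagement/clicks plus raw ROI, mirroring
# the summaries quoted above.
seg %>%
  mutate(
    cluster           = km$cluster,
    Engagement_scaled = cluster_input[, "Engagement_Score"],
    Clicks_scaled     = cluster_input[, "Clicks"]
  ) %>%
  group_by(cluster) %>%
  summarise(
    Engagement = mean(Engagement_scaled),
    Clicks     = mean(Clicks_scaled),
    ROI        = mean(ROI),
    .groups    = "drop"
  )
```

Setting nstart to several random initializations helps k-means avoid poor local optima on a dataset of this size.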


Insights And Interpretations

What Did We Learn?

  • Clicks and Engagement Drive Success
    Campaigns with higher click volumes and stronger engagement (likes, shares, comments) consistently led to higher CTR. Impressions alone had no predictive power—visibility isn’t enough; user interaction matters.


  • Top Audience Segments Are Clear
    Men aged 18–24 and Tech Enthusiasts stood out with the highest predicted CTRs and engagement scores. These segments are highly responsive to compelling CTAs and digital content.


  • Email and Website Deliver Results
    While most channels performed similarly on CTR, Email and Website consistently yielded better ROI. Instagram showed weaker ROI despite similar engagement rates, suggesting it may be less efficient for conversion.

  • Cluster 2 Is the MVP
    One audience cluster stood out: high clicks, high ROI, strong performance across the board. This group is your top priority for scaling. On the other hand, Cluster 3 had low performance on all fronts—these campaigns likely need rework or divestment.


  • Low R², But High Value
    Although the models explained very little variance in CTR (low R²), they still identified clear patterns and directional signals. This suggests more granular data (like session behavior or creative type) is needed to improve future models.
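One way to quantify that point on the held-out set, using the test frame and Random Forest predictions from the earlier sketches (R² and RMSE computed directly from their definitions):

```r
# Out-of-sample fit for the Random Forest predictions (rf_pred, test come
# from the sketches above). Low R² can coexist with useful directional signal.
ss_res    <- sum((test$CTR - rf_pred)^2)
ss_tot    <- sum((test$CTR - mean(test$CTR))^2)
r_squared <- 1 - ss_res / ss_tot
rmse      <- sqrt(mean((test$CTR - rf_pred)^2))

c(R2 = r_squared, RMSE = rmse)
```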

Strategic Recommendations

Creative & Messaging

  • Run A/B tests on CTAs, headlines, and ad visuals to drive more clicks.

  • Prioritize content that encourages engagement—polls, visuals, and interactive formats.

Audience Targeting

  • Double down on Men 18–24 and Tech Enthusiasts.

  • Avoid generic “All Ages” segments; personalization clearly pays off.

Channel Strategy

  • Prioritize Email and Website for performance-driven campaigns.

  • Re-evaluate lower ROI platforms like Instagram unless brand awareness is the goal.

Budget Allocation

  • Funnel investment toward high-CTR, low-cost segments (Cluster 2).

  • Cut or redesign underperforming campaigns targeting Cluster 3.

Measurement & Attribution

  • Move beyond impressions as a core KPI—track CTR, engagement, and ROI.

  • Begin incorporating user-level data (e.g., time of day, device type, referral source) to improve attribution accuracy.

Next Steps

1. Build Cluster-Specific Campaign Playbooks
Tailor creatives, CTAs, and channel mixes to each audience cluster.

 

2. Deploy in a Dashboard
Visualize model predictions, audience segments, and ROI patterns in a Tableau or Shiny dashboard for internal use (a minimal Shiny sketch appears after this list).


3. Expand the Dataset
Integrate behavioral and creative-level metadata—these likely explain the missing variance in CTR.


4. Experiment with Uplift Models
Test incrementality to identify which users respond because of the campaign, rather than relying on correlation alone.


5. Track Real Campaign Outcomes
Use model predictions as benchmarks and compare them to actual campaign CTRs to validate impact over time.
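For step 2, a minimal Shiny sketch of what an internal dashboard could look like; the predictions file and its columns (cluster, predicted_ctr, roi, channel) are hypothetical placeholders, not project outputs:

```r
library(shiny)
library(ggplot2)

# Hypothetical scored-campaign export: one row per campaign with columns
# cluster, predicted_ctr, roi, channel (names are placeholders).
scored <- read.csv("campaign_predictions.csv")

ui <- fluidPage(
  titlePanel("Campaign Performance Explorer"),
  sidebarLayout(
    sidebarPanel(
      selectInput("cluster", "Audience cluster",
                  choices = sort(unique(scored$cluster)))
    ),
    mainPanel(plotOutput("roi_plot"))
  )
)

server <- function(input, output, session) {
  output$roi_plot <- renderPlot({
    seg <- scored[scored$cluster == input$cluster, ]
    ggplot(seg, aes(x = predicted_ctr, y = roi, colour = channel)) +
      geom_point(alpha = 0.4) +
      labs(x = "Predicted CTR", y = "ROI")
  })
}

shinyApp(ui, server)
```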

