This is not an officially supported Google product.
This folder contains Python Notebook Templates for building an end-to-end LTV Modeling solution using datasets like GA360, Firebase or CRM (client provided), and for using the LTV predictions to design, activate and measure the impact of Google media campaigns. These Notebook Templates rely on the Python modules open sourced in gps_building_blocks.
An LTV Model predicts the value (revenue, profit, subscription period, etc.) of a customer during a well-defined time window in the future (called the prediction window), based on historical data captured during a well-defined time window in the past (called the lookback window).
- The predicted LTV values can be used to make data-driven decisions, such as improving remarketing or acquisition campaigns, in order to optimize key business objectives (sales, revenue, etc.).
- The insights extracted from an LTV model help to understand the key ‘drivers’ that are highly correlated with customer value, some of which can lead the business to take useful actions (called actionable insights).
In order to build an LTV Model, one needs to prepare a dataset (CRM, GA360 or Firebase) containing customers’ past data such as demographics and transactional or browsing behaviour over time. It is recommended to use at least one year of data (ideally 2 years) to capture seasonal patterns of user behaviour.
Example:
A GA360 dataset exported to BigQuery that captures customers’ online behaviour on a designated website can be used together with CRM data for LTV Modeling. The dataset format and BigQuery export instructions of a public GA360 dataset can be found here.
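As a quick way to get familiar with the export format, the public GA360 sample dataset in BigQuery can be inspected from Python. The sketch below is only an illustration: it assumes the google-cloud-bigquery client is installed and authenticated, and the selected fields are just a small subset of the export schema.

```python
# A minimal sketch that inspects the public GA360 sample export in BigQuery.
from google.cloud import bigquery

client = bigquery.Client()  # uses the default project of the environment

query = """
SELECT
  fullVisitorId,
  date,
  totals.visits AS visits,
  totals.transactions AS transactions,
  totals.totalTransactionRevenue AS total_transaction_revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170131'
LIMIT 10
"""

# Returns a small pandas DataFrame with one row per session, which gives a
# quick look at the raw format used as input to the notebooks.
print(client.query(query).to_dataframe())
```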
In order to run the notebooks, the user needs to have the following permissions (IAM roles) on GCP:
- BigQuery User
- Notebooks Legacy Admin
- AutoML Editor
- Service Usage Admin
- Service Account User
For more details refer to https://cloud.google.com/iam/docs/understanding-roles.
How is LTV Modeling formulated as an ML problem?
LTV Modeling is generally formulated as a regression ML problem. For each user id, a corresponding LTV score is generated, which can be used to rank or segment the customers for more personalized marketing, as explained towards the end of this document.
Assuming the data (CRM, Firebase or GA360) is already available as a BigQuery table, the following steps are involved in building and activating an end-to-end LTV Modeling solution to drive a Google Marketing use case.
- Data audit and exploratory data analysis - 01.eda.ipynb.
- ML data preparation - 02.ml_data_preparation.ipynb.
- ML data preprocessing - 03.ml_data_preprocessing.ipynb.
- ML model training - 04.model_training.ipynb.
- ML model evaluation and diagnostics - 05.model_evaluation_and_diagnostics.ipynb.
- Media experiment design - 06.media_experiment_design.ipynb.
- Batch scoring - 07.batch_scoring.ipynb.
- Audience generation - 08.audience_generation.ipynb.
- Audience upload - 09.audience_upload.ipynb.
- Post-campaign Analysis - 10.post_campaign_analysis.ipynb.
- Automated scoring and media activation - [Notebook - WIP].
- Cleanup BigQuery artifacts - 12.cleanup.ipynb.
The following sections provide details of each step.
Notebook: 01.eda.ipynb.
This step analyses the original input data at a high level to make sure that the variables and values needed to create an ML model for the business problem at hand are available for the required time period.
This involves the following steps:
- Extraction of the dataset schema and field descriptions
- Exploration of the data size and duration of the GA data tables
Notebook: 02.ml_data_preparation.ipynb.
Creation of an ML dataset from customer behaviour data such as GA360, Firebase or CRM involves:
- First creating a single data snapshot of users with respect to a given
calendar date d. This snapshot consists of:
- Instances: for example, all the users who have done some action in a website up until the date d.
- Features: for each selected instance, aggregated behavior in a well-defined time period in the past from day d called lookback window.
- Labels: for each selected instance, the value we would like to predict (e.g. net profit) in a well-defined time period into the future from the day d called prediction window.
- Second, generating a series of such snapshots over time to capture recency, frequency and changing behaviour of users, seasonality and other trends/events over time. This is vital in a period like Covid-19 to capture changing user behaviour, also known as concept drift. Also, with multiple snapshots, we are able to generate more data for the ML model from limited original data.
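As a concrete illustration of a single snapshot, the following sketch (with made-up dates and window sizes; the real values are parameters of the solution) computes the lookback and prediction windows for one snapshot date d.

```python
from datetime import date, timedelta

# Hypothetical example: one snapshot date d with a 90-day lookback window and
# a 90-day prediction window.
snapshot_date = date(2017, 6, 1)
lookback_window_days = 90
prediction_window_days = 90

lookback_start = snapshot_date - timedelta(days=lookback_window_days)
prediction_end = snapshot_date + timedelta(days=prediction_window_days)

# Features are aggregated from behaviour in [lookback_start, snapshot_date),
# and the label (e.g. net profit) is computed in [snapshot_date, prediction_end).
print(f'Lookback window:   {lookback_start} .. {snapshot_date}')
print(f'Prediction window: {snapshot_date} .. {prediction_end}')
```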
The quality of an ML model greatly depends on the quality of the input data. Since different datasets contain different data issues such as missing, incorrect and inconsistent data, it is vital to do a deep data exploration and select the most consistent and meaningful variables/raw data to create an ML dataset.
ML Windowing Pipeline (MLWP) and ML Data Visualizer modules can be used in order to create an accurate and rich ML dataset efficiently.
Creation of ML datasets with multiple snapshots usually takes about 80% of the project time (the major bottleneck of the ML model building process). Using MLWP together with the ML Data Visualizer module can help to reduce this time from 2-3 weeks to 1-2 days.
MLWP creates an ML dataset by taking multiple data snapshots over time very quickly. It has been built to run on BigQuery, and the input data is expected to be available as a BigQuery table. The developer simply specifies the time-related inputs (e.g. starting and ending dates of the snapshots and sizes of the lookback, prediction and sliding windows), variable names and aggregate functions to generate features and labels.
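The sketch below shows roughly how an MLWP run is parameterised from Python. It is only an approximation: the module path, function name and parameter keys are based on gps_building_blocks and should be checked against the library's documentation, and all project, dataset, table names and dates are placeholders.

```python
# A hedged sketch of running MLWP end to end; verify function and parameter
# names against the gps_building_blocks documentation before using.
from gps_building_blocks.ml.data_prep.ml_windowing_pipeline import (
    ml_windowing_pipeline)

params = {
    # Placeholder GCP / BigQuery locations.
    'project_id': 'my-gcp-project',
    'dataset_id': 'ltv_dataset',
    'analytics_table': 'my-gcp-project.source_dataset.ga_sessions_*',
    # Time-related inputs: snapshot range, sliding interval and window sizes.
    'snapshot_start_date': '2017-01-01',
    'snapshot_end_date': '2017-12-31',
    'slide_interval_in_days': 7,
    'lookback_window_size_in_days': 90,
    'prediction_window_size_in_days': 90,
}

# Runs data extraction, windowing and feature generation, and writes the
# resulting instance, fact and feature tables to BigQuery.
ml_windowing_pipeline.run_end_to_end_pipeline(params)
```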
The Data Visualizer module automatically generates plots visualizing the quality and consistency of the raw input data to help select the right variables for the ML dataset, and also visualizes the generated ML data to further identify issues such as label leakage.
This step consists of running the following sub-steps, as implemented in 02.ml_data_preparation.ipynb:
- 2.1: MLWP Data Extraction Pipeline: extracts and formats the original data from the BigQuery table into several temporary tables for further processing.
- 2.2: MLWP Exploration Pipeline: outputs facts (original variables converted into user_id, time_stamp, variable and value format, called facts) and ML instances into BigQuery tables for data exploration and analysis.
- 2.3.1: Data Viz Visualize Instances: generates the following plots on ML
instances:
- plots with the number of total instances, number of positive instances and proportion of positive instances for each snapshot. These plots are helpful to understand how the label is distributed over time, any seasonality and trends, and whether there are any inconsistencies. Based on this we can drop specific periods of snapshots having any data issues and consider what additional features to add to capture the seasonality or any trends of the label over time.
- class-specific distribution plots for the days_since_first_activity (corresponds to tenure) and days_since_latest_activity (corresponds to recency) features in the Instance table. From these plots, we can determine a reasonable lookback window period and reason about whether it’s worth using only customers having a particular history and recency for modeling.
- 2.3.2: Data Viz Visualize Facts: generates plots of numerical and categorical fact variables, which can be used to explore their validity and distribution over time. Based on that, we can make decisions such as which fact variables (and which levels of categorical fact variables) to use to generate features in the remaining steps.
- 2.4: MLWP Data Windowing Pipeline: segments the user data into multiple, potentially overlapping time windows, with each window containing a lookback window and a prediction window.
- 2.5: Run Feature Generation Pipeline: generates features from the windows of data computed in the previous step and outputs to a table in BigQuery.
- 2.6: Data Viz Visualize Features: visualizes the statistics calculated from the Features table in BigQuery. The plots include class-specific distribution plots of numerical and categorical features, which can be used to explore the validity of the features and potentially identify issues such as label leakage, and the distribution of the features over time helping to understand the consistency. Based on these insights the developer can select only the valid features for training the model.
Notebook: 03.ml_data_preprocessing.ipynb.
This notebook demonstrates the preparation of an already created ML dataset for model development. It is vital to split machine learning datasets in such a way that the model performance can be tuned and fairly assessed. This notebook shows an example of dividing a dataset into an out-of-time TEST dataset (including selected full snapshot/s) and a DEVELOPMENT dataset (randomly splitting the rest of the snapshots into TRAIN, VALIDATION and TEST). Those names are designed to be used directly in the AUTOML DATA_SPLIT_COL.
Once we are happy with the extracted ML-ready dataset, it can be separated into Training, Validation and Testing datasets as shown in the following diagram:
The ML examples extracted from the model development period are divided into Training (e.g. 80% of instances), Validation (e.g. 10% of instances) and Testing (in-time) (e.g. 10% of instances) partitions, such that each of these partitions contains a mutually exclusive random subset of instances. We call this the Model Development dataset. In addition to the in-time Testing dataset, an out-of-time Testing dataset is also created to specifically test the model performance in recent times.
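A minimal pandas sketch of such a split is shown below. The snapshot column name (snapshot_ts), the out-of-time cut-off date and the 80/10/10 proportions are illustrative assumptions only.

```python
import numpy as np
import pandas as pd


def assign_data_split(df: pd.DataFrame,
                      out_of_time_start: str,
                      seed: int = 42) -> pd.DataFrame:
  """Adds a data_split column with TRAIN / VALIDATION / TEST values."""
  df = df.copy()
  rng = np.random.default_rng(seed)

  # Snapshots on or after the cut-off date form the out-of-time TEST set.
  out_of_time = pd.to_datetime(df['snapshot_ts']) >= pd.Timestamp(out_of_time_start)
  df.loc[out_of_time, 'data_split'] = 'TEST'

  # The remaining (model development) instances are split randomly 80/10/10
  # into TRAIN / VALIDATION / TEST (the in-time test partition).
  dev_index = df[~out_of_time].index
  random_values = rng.random(len(dev_index))
  df.loc[dev_index, 'data_split'] = np.select(
      [random_values < 0.8, random_values < 0.9],
      ['TRAIN', 'VALIDATION'],
      default='TEST')
  return df


# Example usage with a hypothetical ML instance table already loaded into ml_df:
# ml_df = assign_data_split(ml_df, out_of_time_start='2017-10-01')
```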
Notebook: 04.model_training.ipynb.
Creating an ML model can generally be a complicated, iterative and long process due to the algorithm selection and hyper-parameter tuning steps. However, in this solution we use the GCP BQML tool with the AutoML models option (by default), which uses GCP AutoML to automatically handle feature preprocessing and hyper-parameter tuning for multiple algorithms in parallel, in order to train the best model with respect to a given performance metric such as RMSE.
We use the Model Development dataset created in the previous step to develop the model with the Training, Validation and Testing partitions. AutoML uses the Training and Validation partitions for hyper-parameter tuning and algorithm selection, and reports the final results on the Testing partition, with the final model trained on the whole Model Development dataset.
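For orientation, training an AutoML regressor through BigQuery ML from Python can look roughly like the sketch below. All project, dataset, table and column names are placeholders, only a minimal set of OPTIONS is shown, and the way the data_split column from the previous step is wired in should follow the notebook and the BQML/AutoML documentation.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder names throughout; adapt the OPTIONS (e.g. budget_hours and the
# data split configuration) to the project before running.
create_model_sql = """
CREATE OR REPLACE MODEL `my-gcp-project.ltv_dataset.ltv_automl_model`
OPTIONS(
  model_type = 'AUTOML_REGRESSOR',
  input_label_cols = ['label'],
  budget_hours = 1.0
) AS
SELECT * EXCEPT(user_id, snapshot_ts)
FROM `my-gcp-project.ltv_dataset.ml_features`
"""

# Training runs as a BigQuery job; result() blocks until the job completes.
client.query(create_model_sql).result()
```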
Notebook: 05.model_evaluation_and_diagnostics.ipynb.
It is important to diagnose a model thoroughly to make sure it’s reasonable (a single performance metric such as RMSE does not give the full picture of the model) and to generate new business insights for the client. This step uses the Regression Diagnostics module to generate a variety of stats and plots helping to understand the performance for different LTV segments, diagnose the model for issues such as label leakages and generate business insights.
The main diagnostics generated by this step are as follows:
- Mean squared error: a risk metric corresponding to the expected value of the squared error, calculated by taking the mean of the squared errors.
- Root mean squared error: the square root of the mean squared error.
- Mean absolute error: a risk metric corresponding to the expected value of the absolute error, calculated by taking the mean of the absolute errors.
- Mean absolute percentage error: an evaluation metric for regression problems, sensitive to relative errors, calculated by taking the mean of the absolute percentage errors.
- R-squared: coefficient of determination, representing the proportion of variance that has been explained by the independent variables in the model.
- Pearson correlation: a correlation metric between actual and predicted labels.
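For reference, these metrics can be reproduced from arrays of actual and predicted LTV values with scikit-learn and SciPy. The sketch below is illustrative only (the values are made up); the notebook itself uses the Regression Diagnostics module.

```python
import numpy as np
from scipy import stats
from sklearn import metrics

# Made-up actual and predicted LTV values, for illustration only.
y_true = np.array([10.0, 2.0, 35.5, 4.2, 120.0])
y_pred = np.array([12.3, 1.5, 30.0, 3.8, 100.0])

mse = metrics.mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = metrics.mean_absolute_error(y_true, y_pred)
mape = metrics.mean_absolute_percentage_error(y_true, y_pred)
r2 = metrics.r2_score(y_true, y_pred)
pearson_corr, _ = stats.pearsonr(y_true, y_pred)

print(f'MSE={mse:.2f} RMSE={rmse:.2f} MAE={mae:.2f} '
      f'MAPE={mape:.2f} R2={r2:.2f} Pearson={pearson_corr:.2f}')
```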
Notebook: 06.media_experiment_design.ipynb.
This step helps to design a statistically sound media experiment to activate the developed LTV model in order to optimize a Google media campaign.
One way to use the output from an LTV Model to optimize marketing is to first define different audience groups based on the predicted LTV values (such as High, Medium and Low LTV groups) and then test the same or different marketing strategies with those groups. This approach is particularly useful for understanding how different LTV groups respond to remarketing campaigns.
This step estimates the statistical sample sizes required for the different groups (bins) of predicted LTV, based on different combinations of the expected minimum uplift/effect size, statistical power and statistical confidence level specified as input parameters, using a statistical T-test.
Expected output: a Pandas DataFrame containing the statistical sample size for each bin, for each combination of minimum uplift_percentage, statistical power and statistical confidence level.
Based on the estimated sample sizes and the available group sizes, one can decide which setting (expected minimum uplift/effect size at a given statistical power and confidence level) to select for the experiment. The selected sample sizes can then be used to set up Test and Control cohorts from each LTV group to implement the media experiment.
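As a simplified illustration of the underlying calculation, the sketch below uses statsmodels' power analysis for a two-sample T-test over a grid of example settings. Note that it works with standardised (Cohen's d) effect sizes rather than the uplift percentages used by the notebook, and all input values are placeholders.

```python
import itertools

import pandas as pd
from statsmodels.stats.power import TTestIndPower

# Example grid of experiment settings (placeholder values).
effect_sizes = [0.05, 0.10, 0.20]   # standardised (Cohen's d) effect sizes
powers = [0.8, 0.9]
confidence_levels = [0.90, 0.95]

rows = []
for effect_size, power, confidence in itertools.product(
    effect_sizes, powers, confidence_levels):
  # Required sample size per group (Test or Control) for a two-sample T-test.
  n_per_group = TTestIndPower().solve_power(
      effect_size=effect_size, power=power, alpha=1 - confidence)
  rows.append({
      'effect_size': effect_size,
      'power': power,
      'confidence_level': confidence,
      'sample_size_per_group': int(round(n_per_group)),
  })

print(pd.DataFrame(rows))
```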
Another way to use the output from an LTV Model to optimize marketing is to target the top X% of users having the highest predicted LTV in a remarketing campaign, or in an acquisition campaign with the similar-audience strategy.
This step estimates the statistical sample sizes required for different cumulative groups (bins) of the predicted LTV (top X%, top 2X% and so on), based on different combinations of the expected minimum uplift/effect size, statistical power and statistical confidence level specified as input parameters, using a statistical T-test.
Expected output: a Pandas DataFrame containing the statistical sample size for each cumulative bin, for each combination of minimum uplift_percentage, statistical power and statistical confidence level.
Based on the estimated sample sizes and the available group sizes, one can decide which setting (which top X% of users with the expected minimum uplift/effect size at a given statistical power and confidence level) to select for the experiment. The selected sample size can then be used to set up Test and Control cohorts from the top X% to implement the media experiment.
Notebook: 07.batch_scoring.ipynb.
In this step we use the developed LTV model to score new ML instances to predict customers' LTV. This step consists of the following sub-steps:
- Create the ML instances and features for scoring using the MLWP
- Score the created instances using the developed model and upload the predictions to a table in GCP BigQuery
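For illustration, batch scoring with the trained BQML model and writing the predictions back to BigQuery can look like the sketch below; the model, table and column names are placeholders (the predicted column name depends on the label column used at training time).

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder model and table names; predictions are written to a new table.
scoring_sql = """
CREATE OR REPLACE TABLE `my-gcp-project.ltv_dataset.ltv_predictions` AS
SELECT
  user_id,
  predicted_label AS predicted_ltv
FROM ML.PREDICT(
  MODEL `my-gcp-project.ltv_dataset.ltv_automl_model`,
  TABLE `my-gcp-project.ltv_dataset.scoring_features`)
"""

client.query(scoring_sql).result()
```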
Notebook: 08.audience_generation.ipynb.
This step generates audiences for a remarketing use case based on the predicted LTV values. It relies on the sample size calculations from the 06.media_experiment_design.ipynb notebook to create the Test and Control audiences, which are written to a new BigQuery table. This data can then be uploaded via Measurement Protocol to GA and used for activation with the Google Ads products, as demonstrated in the 09.audience_upload.ipynb notebook.
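A simplified pandas sketch of forming such Test and Control groups from the scored users is shown below. The column name, the top fraction and the group sizes are hypothetical; in the notebook the group sizes come from the media experiment design step and the result is written to BigQuery.

```python
import pandas as pd


def build_top_ltv_audience(predictions: pd.DataFrame,
                           top_fraction: float,
                           test_size: int,
                           control_size: int,
                           seed: int = 42) -> pd.DataFrame:
  """Splits the top-LTV users into TEST and CONTROL groups."""
  # Keep the top X% of users by predicted LTV (hypothetical column name).
  top_n = int(len(predictions) * top_fraction)
  top_users = predictions.nlargest(top_n, 'predicted_ltv')

  # Shuffle and assign the required numbers of users to TEST and CONTROL;
  # any remaining top users stay outside the experiment.
  shuffled = top_users.sample(frac=1.0, random_state=seed).reset_index(drop=True)
  shuffled['group'] = 'UNUSED'
  shuffled.loc[:test_size - 1, 'group'] = 'TEST'
  shuffled.loc[test_size:test_size + control_size - 1, 'group'] = 'CONTROL'
  return shuffled


# Example usage with group sizes taken from the experiment design step:
# audience = build_top_ltv_audience(predictions_df, top_fraction=0.2,
#                                   test_size=5000, control_size=5000)
```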
Notebook: 09.audience_upload.ipynb.
This step uses GMP and Google Ads Connector to upload the created LTV audience into Google Marketing Platform.
The uploaded audiences can be activated in acquisition or remarketing (in this case) campaigns in the selected Google Ads product (Search, Display and YouTube). The control groups are generated and saved in BigQuery to be used at the analysis stage.
Notebook: 10.post_campaign_analysis.ipynb.
This step analyses the results of a media campaign executed using the LTV audiences, that is, it compares conversion rates between the Test and Control audience groups using appropriate statistical significance tests.
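As one example of such a test, the conversion rates of the Test and Control groups can be compared with a two-proportion z-test from statsmodels; the counts below are made up, and the actual notebook may use a different statistical test.

```python
from statsmodels.stats.proportion import proportions_ztest

# Made-up campaign results: conversions and group sizes for Test and Control.
conversions = [420, 350]      # [test_conversions, control_conversions]
group_sizes = [10000, 10000]  # [test_size, control_size]

# Two-sided z-test for the difference between the two conversion rates.
z_stat, p_value = proportions_ztest(count=conversions, nobs=group_sizes)

test_rate = conversions[0] / group_sizes[0]
control_rate = conversions[1] / group_sizes[1]
print(f'Test rate={test_rate:.2%}, Control rate={control_rate:.2%}, '
      f'z={z_stat:.2f}, p-value={p_value:.4f}')
```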
TODO(): Add content when the Notebook is submitted.
Notebook: 12.cleanup.ipynb.
This step helps to clean up the interim tables generated while executing notebooks 01 to 09.