A containerized experimentation platform built to monitor, in real time, online controlled experiments learned under contextual bandit policies. Received an Honorable Mention in the 2023 Docker AI/ML Hackathon.
View Demo · View Devpost Submission · Report Bug · Request Feature
The Containerized Online Bandit Experimentation (COBE) Platform monitors, in real time, the performance of online controlled experiments learned under contextual bandit policies. It seeks to answer questions that standard A/B testing cannot resolve, including the following:
- What if the chosen variation during the rollout phase of the experimentation process degrades in performance over time?
- Will personalizing the choice of variation for each user successfully optimize the targeted metric?
- Is there a faster way to identify better performing variations at a lower opportunity cost?
Companies with an experimentation-first culture can benefit greatly from online controlled experiments that adjust and optimize future decisions based on the data collected from each observation. For example, Stitch Fix supports multi-armed bandits in its experimentation platform: data scientists implement their own reward models and plug them into the allocation engine through a dedicated microservice for each bandit experiment.
Inspired by Stitch Fix's case study, we built the COBE Platform using
stateDiagram
classDef platform font-family: courier, font-size:16px, fill:transparent, stroke-width:2px
classDef container font-family: courier, font-size:12px, fill:transparent
classDef actor font-family: courier, font-size:12px
classDef none font-family: none, font-size:none
direction LR
Users --> LoadBalancer:::container
Dev --> CobePlatform:::platform
state CobePlatform {
direction LR
LoadBalancer --> WebControl
LoadBalancer --> WebTreatment
WebControl --> PolicyLearner
WebTreatment --> PolicyLearner
PolicyLearner --> LoadBalancer
}
WebControl:::container --> Users:::actor
WebTreatment:::container --> Users
Users --> PolicyLearner:::container
WebControl --> WandB:::platform
WebTreatment --> WandB
PolicyLearner --> WandB
WandB --> Dev:::actor
Dev --> PolicyLearner
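The feedback loop in the diagram (the load balancer routes a user to a variant, the variant reports a reward, and the policy learner sends updated probabilities back to the load balancer) can be sketched in a few lines of Python. This is a minimal illustration only, not the platform's implementation: the `EpsilonGreedyLearner` class, the epsilon-greedy policy, the `technical` and `casual` contexts, and the click rates are all assumptions made for the example.

```python
import random

VARIANTS = ("control", "treatment")


class EpsilonGreedyLearner:
    """Toy stand-in for the PolicyLearner container (hypothetical)."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        # Maps (context, variant) -> (reward_sum, observation_count).
        self.stats = {}

    def mean(self, context, variant):
        total, count = self.stats.get((context, variant), (0.0, 0))
        return total / count if count else 0.0

    def probabilities(self, context):
        """Routing probabilities the load balancer would use for this context."""
        best = max(VARIANTS, key=lambda v: self.mean(context, v))
        explore = self.epsilon / len(VARIANTS)
        return {
            v: explore + (1 - self.epsilon if v == best else 0.0)
            for v in VARIANTS
        }

    def update(self, context, variant, reward):
        total, count = self.stats.get((context, variant), (0.0, 0))
        self.stats[(context, variant)] = (total + reward, count + 1)


def simulate(n_users=2000, seed=0):
    """Run the route -> reward -> update loop with made-up click rates."""
    rng = random.Random(seed)
    learner = EpsilonGreedyLearner()
    # Assumed click-through rates, chosen only for illustration.
    click_rate = {
        ("technical", "control"): 0.6,
        ("technical", "treatment"): 0.3,
        ("casual", "control"): 0.3,
        ("casual", "treatment"): 0.5,
    }
    for _ in range(n_users):
        context = rng.choice(["technical", "casual"])
        probs = learner.probabilities(context)
        variant = rng.choices(VARIANTS, weights=[probs[v] for v in VARIANTS])[0]
        reward = 1 if rng.random() < click_rate[(context, variant)] else 0
        learner.update(context, variant, reward)
    return learner
```

Calling `simulate().probabilities("technical")` returns the routing weights for one context after the simulated traffic. Epsilon-greedy was chosen here only because it is short; any contextual bandit policy could take its place.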
- Create and sign in to your Weights and Biases account.
- Locate your API key, copy it, and add it to the .env file under the environment variable WANDB_API_KEY.
- Install Docker Desktop.
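Before building the containers, it can help to confirm that the key actually made it into the .env file. The sketch below is a hypothetical helper, not part of the repository: it assumes the file uses plain KEY=VALUE lines and only checks that WANDB_API_KEY is present and non-empty.

```python
from pathlib import Path


def read_env(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_wandb_key(path=".env"):
    """Return True if the file exists and sets a non-empty WANDB_API_KEY."""
    if not Path(path).is_file():
        return False
    return bool(read_env(path).get("WANDB_API_KEY"))
```

Running `has_wandb_key()` from the repository root should return `True` once the key has been added.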
- Clone the repository to your local environment.
  git clone https://github.com/wirrywoo/cobe-platform.git
- Go into the cobe-platform main directory and build the containers.
  cd cobe-platform; docker compose up -d
- Open a browser and go to http://127.0.0.1/cobe-platform-demo/?seed=1 to see the control group and http://127.0.0.1/cobe-platform-demo/?seed=3 to see the treatment group. Screenshots of the control and treatment versions of the landing page are included below for reference.
- Clicking the Sign Me Up! button registers reward = 1 in the logs of the respective Docker container and records the reward in a Weights and Biases project named cobe-platform. Conversely, navigating away or refreshing the page without clicking the button registers reward = 0 in the same locations.
- To observe contextual bandits in action, reference the following Google Colab notebook, which simulates hundreds of users interacting with the COBE Platform.
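The browser steps above can also be checked from a script. The following sketch is an assumption-laden helper rather than anything shipped with the repository: `demo_url`, `is_up`, and `reward_for` are hypothetical names, the base URL is taken from the steps above, and only seeds 1 (control) and 3 (treatment) are documented, so the mapping for other seeds is not assumed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1/cobe-platform-demo/"


def demo_url(seed):
    """Build the landing page URL for a given seed."""
    return f"{BASE_URL}?{urlencode({'seed': seed})}"


def is_up(seed, timeout=5):
    """Return True if the landing page answers with HTTP 200, else False."""
    try:
        with urlopen(demo_url(seed), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, HTTP errors, and timeouts.
        return False


def reward_for(clicked_sign_up):
    """Reward rule described above: 1 for a click, 0 otherwise."""
    return 1 if clicked_sign_up else 0
```

With the containers running, `is_up(1)` and `is_up(3)` give a quick smoke check of the control and treatment pages.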
Landing Page for Control Group (with Docker Logo):
Landing Page for Treatment Group (without Docker Logo):
Average Reward Performance of Control vs. Treatment Variations
The provided Google Colab notebook generates synthetic data from a cost function that the policy does not observe. Under this simulation, the control variant (landing page with Docker logo) outperforms the treatment variant (landing page without Docker logo) in average reward as the number of observations grows.
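The quantity plotted here, average reward per variant as observations accumulate, can be computed with a simple running mean. The function below is an illustrative sketch; its name and the `(variant, reward)` input format are assumptions, not the notebook's code.

```python
from collections import defaultdict


def running_average_reward(observations):
    """Return, per variant, the average reward after each of its observations.

    `observations` is an iterable of (variant, reward) pairs in arrival order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    curves = defaultdict(list)
    for variant, reward in observations:
        totals[variant] += reward
        counts[variant] += 1
        curves[variant].append(totals[variant] / counts[variant])
    return dict(curves)
```

Each returned list is one curve: its last element is the variant's overall average reward, and earlier elements show how that estimate evolved.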
Updating NGINX Probabilities from CB Learning
For a fixed user, the probabilities of directing that user to one of the landing pages converge as more users interact with both control and treatment versions of the landing page. Using myself as an example in the provided Google Colab notebook, the policy recommends that the control version of the landing page should be shown to all users similar to me in terms of click activity and technical skills.
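How the learned probabilities reach NGINX is not described here, so the following is only one hypothetical way to do it: render a set of routing probabilities as an NGINX `split_clients` block. The function name, the `$variant` variable, and the choice of `$request_id` as the hashing key are assumptions for illustration. The sketch also renders a single set of probabilities, whereas a contextual policy would produce a different set for each user context.

```python
def render_split_clients(probabilities, key="${request_id}", variable="$variant"):
    """Render routing probabilities as an NGINX split_clients block.

    The last variant receives the "*" bucket, which takes the remaining share.
    """
    names = list(probabilities)
    total = sum(probabilities.values())
    if not names or total <= 0:
        raise ValueError("probabilities must be non-empty with a positive sum")
    lines = [f'split_clients "{key}" {variable} {{']
    for name in names[:-1]:
        percent = 100 * probabilities[name] / total
        lines.append(f"    {percent:.2f}% {name};")
    lines.append(f"    * {names[-1]};")
    lines.append("}")
    return "\n".join(lines)
```

For example, `render_split_clients({"control": 0.95, "treatment": 0.05})` produces a block that sends roughly 95% of requests to the control variant and the remainder to treatment.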
Distributed under the MIT License. See LICENSE.txt for more information.
Wilson Cheung - Personal Website
Project Link: https://github.com/wirrywoo/cobe-platform