There are some questions when i use the Ax #2342

cyrilmyself · 2024-04-10T02:37:39Z

1、when i use Ax‘s bayesian optimization to search the good parameters，i found a set good parameters like {"actionShortUnInterestW":13.260257,"actionFollowShortInterestW":7.287298,"actionShortInterestW":7.512222}，and the metric behaved good like {"metric1": +2.33%, "metric2":+1.58%, "metric3":+1.88%}，but when i set the same parameters in more flow，the result is like {"metric1": +0.01%, "metric2":-0.02%, "metric3":+0.02%} ，not good as the before；

2、when i use Ax‘s bayesian optimization in A/B Testing，Ax produce new parameters，is the flow need to change to avoid the carry over

mpolson64 · 2024-04-11T20:39:02Z

Hi, thanks for reaching out.

To your first question: is it possible there is noise in the system you're trying to optimize? Or could there be some nonstationarity in your readings (ie the output changes over time in a way that is not related to your parameterization)? Both of these make it more difficult for Bayesian optimization to perform well. Our methods do their best to estimate noise internally and optimize for the true value, but sometimes there is simply too much noise for BO. You can use the function interact_cross_validation_plotly to get a plot that should show how well the Ax's model is performing on your data.

To the second question could you elaborate what you mean by changing the flow and "carry over"?

cyrilmyself · 2024-04-12T04:13:57Z

@mpolson64 thank you for your replying

For the second question, changing the flow is described as below:
firstly, i use Ax's bayesian optimization to produce a set of parameters in a A/B testing with a group of users
secondly, i set the parameters in a A/B testing with another group of users

carry over means :
firstly i set parameters in A/B testing with a group of users, after a while i change the parameters in A/B testing with the same group of users, carry over means the influence on users of the first parameters will last for a while even i change the parameters;

maor096 · 2024-04-12T09:32:22Z

The same problem we encountered，We are using AB experiments for hyperparameter tuning, where there are 3 experimental groups, 3 optimization goals, and 1 constraint. Specific information can be found in the JSON file below. Currently, we have encountered the following issues: in the 15th and 16th rounds, we found some promising hyperparameter combinations, for example {"read_quality_factor":1, "duration_factor":0.5, "pos_interaction_factor":0.2, "score_read_factor":1}, with target effects of {'a':+0.98%, 'b':+0.68%, 'c':+1.49%, 'd':+0.67%}, where the p-value ranges from 0.005 to 0.08. However, when we conduct large-scale AB experiments with these promising hyperparameter combinations, we often encounter situations where the effects cannot be replicated. We would like to inquire about the following two questions:
1、Does Facebook's hyperparameter tuning AB experiment encounter similar issues? We have already used CUPED to reduce the variance of the experimental data for each round . What optimization suggestions do you have for similar issues?
2、For each experimental group, the same batch of users is used every time when deploying hyperparameters. We suspect that the inability to replicate the experimental effects may be related to carry over. Does Facebook's hyperparameter tuning AB experiment reshuffle the experimental users when deploying hyperparameters?"
snapshot.json

eytan · 2024-04-12T14:17:09Z

Hi all, I would definitely recommend “reshuffling” (or simply creating a new experiment) for each batch. Otherwise you have carryover effects. Variance reduction is always a good idea. We use regression adjustment using pre-treatment covariants along the lines of CUPED for most AB tests. Second, 3 arms per batch is probably inefficient / problematic. Typically we use at least 8, but sometimes as many as 64. For 3 parameters though maybe 5 could be OK. The GP borrows strength across conditions so you can make the allocations smaller than you normally would if you wanted to have an appropriately powered AB test. Note that AB tests cause some non stationary, in that treatment effects change over time. I recommend making sure each batch runs for enough time to “settle down”, and using the same number of days per batch. There is more sophisticated adjustment procedure that we use at Meta. if you send me an email (which you can find at http://eytan.GitHub.io) I can send you a preprint that explains the considerations and procedures in more detail. Best, E

…

On Fri, Apr 12, 2024 at 5:32 AM maor096 ***@***.***> wrote: The same problem we encountered，We are using AB experiments for hyperparameter tuning, where there are 3 experimental groups, 3 optimization goals, and 1 constraint. Specific information can be found in the JSON file below. Currently, we have encountered the following issues: in the 15th and 16th rounds, we found some promising hyperparameter combinations, for example {"read_quality_factor":1, "duration_factor":0.5, "pos_interaction_factor":0.2, "score_read_factor":1}, with target effects of {'a':+0.98%, 'b':+0.68%, 'c':+1.49%, 'd':+0.67%}, where the p-value ranges from 0.005 to 0.08. However, when we conduct large-scale AB experiments with these promising hyperparameter combinations, we often encounter situations where the effects cannot be replicated. We would like to inquire about the following two questions: 1、Does Facebook's hyperparameter tuning AB experiment encounter similar issues? We have already used CUPED to reduce the variance of the experimental data for each round . What optimization suggestions do you have for similar issues? 2、For each experimental group, the same batch of users is used every time when deploying hyperparameters. We suspect that the inability to replicate the experimental effects may be related to carry over. Does Facebook's hyperparameter tuning AB experiment reshuffle the experimental users when deploying hyperparameters?" — Reply to this email directly, view it on GitHub <#2342 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAW34KJT5PCRCMW7JGL6NTY46S3ZAVCNFSM6AAAAABF7TWSPWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANJRGM4TOMJXHE> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

maor096 · 2024-04-15T02:46:06Z

Hi all, I would definitely recommend “reshuffling” (or simply creating a new experiment) for each batch. Otherwise you have carryover effects. Variance reduction is always a good idea. We use regression adjustment using pre-treatment covariants along the lines of CUPED for most AB tests. Second, 3 arms per batch is probably inefficient / problematic. Typically we use at least 8, but sometimes as many as 64. For 3 parameters though maybe 5 could be OK. The GP borrows strength across conditions so you can make the allocations smaller than you normally would if you wanted to have an appropriately powered AB test. Note that AB tests cause some non stationary, in that treatment effects change over time. I recommend making sure each batch runs for enough time to “settle down”, and using the same number of days per batch. There is more sophisticated adjustment procedure that we use at Meta. if you send me an email (which you can find at http://eytan.GitHub.io) I can send you a preprint that explains the considerations and procedures in more detail. Best, E
…
On Fri, Apr 12, 2024 at 5:32 AM maor096 @.> wrote: The same problem we encountered，We are using AB experiments for hyperparameter tuning, where there are 3 experimental groups, 3 optimization goals, and 1 constraint. Specific information can be found in the JSON file below. Currently, we have encountered the following issues: in the 15th and 16th rounds, we found some promising hyperparameter combinations, for example {"read_quality_factor":1, "duration_factor":0.5, "pos_interaction_factor":0.2, "score_read_factor":1}, with target effects of {'a':+0.98%, 'b':+0.68%, 'c':+1.49%, 'd':+0.67%}, where the p-value ranges from 0.005 to 0.08. However, when we conduct large-scale AB experiments with these promising hyperparameter combinations, we often encounter situations where the effects cannot be replicated. We would like to inquire about the following two questions: 1、Does Facebook's hyperparameter tuning AB experiment encounter similar issues? We have already used CUPED to reduce the variance of the experimental data for each round . What optimization suggestions do you have for similar issues? 2、For each experimental group, the same batch of users is used every time when deploying hyperparameters. We suspect that the inability to replicate the experimental effects may be related to carry over. Does Facebook's hyperparameter tuning AB experiment reshuffle the experimental users when deploying hyperparameters?" — Reply to this email directly, view it on GitHub <#2342 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAW34KJT5PCRCMW7JGL6NTY46S3ZAVCNFSM6AAAAABF7TWSPWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANJRGM4TOMJXHE . You are receiving this because you are subscribed to this thread.Message ID: @.>

Thank you for your suggestion

maor096 · 2024-04-16T12:14:23Z

@eytan hi eytan
I have sent an email to you based on the information at http://eytan.GitHub.io, looking forward to your reply. Thank you very much.

cyrilmyself · 2024-04-25T03:49:10Z

hi，@eytan，i also want the preprint that explains the considerations and procedures that you use in Meta，can you send me by email.
My email address is [email protected].
I am really looking forward to your reply

mpolson64 self-assigned this Apr 11, 2024

mpolson64 added the question Further information is requested label Apr 11, 2024

mgrange1998 mentioned this issue May 7, 2024

Looking for suggestions for the objectives data #2435

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

There are some questions when i use the Ax #2342

There are some questions when i use the Ax #2342

cyrilmyself commented Apr 10, 2024 •

edited

mpolson64 commented Apr 11, 2024

cyrilmyself commented Apr 12, 2024

maor096 commented Apr 12, 2024 •

edited

eytan commented Apr 12, 2024 via email

maor096 commented Apr 15, 2024

maor096 commented Apr 16, 2024

cyrilmyself commented Apr 25, 2024

There are some questions when i use the Ax #2342

There are some questions when i use the Ax #2342

Comments

cyrilmyself commented Apr 10, 2024 • edited

mpolson64 commented Apr 11, 2024

cyrilmyself commented Apr 12, 2024

maor096 commented Apr 12, 2024 • edited

eytan commented Apr 12, 2024 via email

maor096 commented Apr 15, 2024

maor096 commented Apr 16, 2024

cyrilmyself commented Apr 25, 2024

cyrilmyself commented Apr 10, 2024 •

edited

maor096 commented Apr 12, 2024 •

edited