Early Exit NDT Rollout #395

Open
7 tasks done
cristinaleonr opened this issue Sep 14, 2023 · 6 comments

Comments

@cristinaleonr
Contributor

cristinaleonr commented Sep 14, 2023

Objective:

Monitor the phased rollout of early-exit NDT.

Steps:

  • Release ndt-server changes.
  • Release Locate probability changes set to 1%.
  • Release client changes.
  • Confirm metrics/analyses look as expected and increase Locate probability to 10%.
  • Confirm metrics/analyses look as expected and increase Locate probability to 30%.
  • Confirm metrics/analyses look as expected and increase Locate probability to 50%.
  • Confirm metrics/analyses look as expected and increase Locate probability to 90%.
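
The staged probabilities above can be sketched as a per-request coin flip. This is a minimal illustration only: `use_early_exit`, `next_stage`, and `ROLLOUT_STAGES` are hypothetical names, and the sketch assumes the probability is applied independently to each request, which the issue does not spell out.

```python
import random

# Rollout stages taken from the steps above, expressed as probabilities.
ROLLOUT_STAGES = [0.01, 0.10, 0.30, 0.50, 0.90]


def use_early_exit(probability: float, rng: random.Random) -> bool:
    """Return True when a request should get the early-exit variant.

    Assumption: each request is decided independently with the
    configured probability.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return rng.random() < probability


def next_stage(current: float) -> float:
    """Return the next rollout probability, or the current one at the last stage."""
    idx = ROLLOUT_STAGES.index(current)
    return ROLLOUT_STAGES[min(idx + 1, len(ROLLOUT_STAGES) - 1)]
```

With a seeded generator, the share of requests selected at the 10% stage should land close to 10% over a large number of draws, which is a quick sanity check that the configured probability matches observed traffic.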

Rollout dashboard:

https://grafana.mlab-oti.measurementlab.net/d/W8JPPzzIz/ndt-early-exit?orgId=1

Criteria before progressing:

  • Download performance and bytes sent graphs match expectations.
  • No increase in client- or server-side errors.
  • Global test rates are unaffected.
  • No alerts are firing due to the rollout.

Rollback criteria:

  • If there is any change to the global test rates or any alerts that fire due to the rollout, we should immediately roll back.
  • If there is any increase in client- or server-side errors, we should immediately roll back.
  • If any of the dashboard panels do not match expectations, we should consider rolling back.
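
The progression and rollback criteria above can be read as a small decision table. The sketch below encodes that reading; `RolloutChecks`, `Decision`, and `decide` are hypothetical names, and the checks are assumed to be filled in by a person looking at the dashboard rather than derived automatically.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROGRESS = "increase Locate probability"
    CONSIDER_ROLLBACK = "consider rolling back"
    ROLLBACK = "roll back immediately"


@dataclass
class RolloutChecks:
    panels_match_expectations: bool  # download performance and bytes-sent graphs
    errors_increased: bool           # client- or server-side errors
    global_test_rate_changed: bool
    rollout_alerts_firing: bool      # only alerts caused by the rollout


def decide(checks: RolloutChecks) -> Decision:
    """Apply the rollback criteria first, then the progression criteria."""
    if checks.global_test_rate_changed or checks.rollout_alerts_firing:
        return Decision.ROLLBACK
    if checks.errors_increased:
        return Decision.ROLLBACK
    if not checks.panels_match_expectations:
        return Decision.CONSIDER_ROLLBACK
    return Decision.PROGRESS
```

The 30% check later in this thread treats a rise in measurer-closed-early errors as expected, so `errors_increased` is presumably meant to cover unexpected increases only; that interpretation is an assumption of this sketch.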

Things we expect to change:

  • Number of bytes sent per test.
@cristinaleonr
Contributor Author

Checks for 1% of traffic:

  • Download performance closely matches the full test.
    Screenshot 2023-09-19 2 45 15 PM

  • Bytes sent top out at about 250MB.
    Screenshot 2023-09-19 2 46 30 PM

  • Stable number of client-side errors over the past week.

  • Stable number of ndt7 download errors.
    Screenshot 2023-09-19 3 10 35 PM

  • Global NDT test rate also stable.
    Screenshot 2023-09-19 3 13 14 PM

  • Alerts currently firing are unrelated to the launch:

    • GardenerFailureRateTooHighOrMissing: related to the pipeline.
    • DataQuality_TooManyNdtS2cTestsWithDiscards: specific to MIA and DFW.
    • PlatformCluster_IPv6AtSiteDownForTooLong: specific to TUN01.
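
The 250MB ceiling in the bytes-sent check is what a byte budget on the download would produce. The sketch below is a back-of-the-envelope model, not the ndt-server implementation: `simulate_download` is a hypothetical helper, the cap is taken as 250,000,000 bytes, and the 10-second full-test duration and constant sending rate are assumptions.

```python
def simulate_download(rate_mbps: float,
                      byte_cap: int = 250_000_000,
                      max_duration_s: float = 10.0) -> tuple[int, float]:
    """Return (bytes_sent, duration_s) for a download at a constant rate.

    The transfer stops at whichever comes first: the byte cap (early exit)
    or the assumed full-test duration.
    """
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    bytes_per_second = rate_mbps * 1_000_000 / 8
    time_to_cap = byte_cap / bytes_per_second
    duration = min(time_to_cap, max_duration_s)
    bytes_sent = min(byte_cap, int(bytes_per_second * duration))
    return bytes_sent, duration
```

Under these assumed numbers, a 100 Mbit/s connection sends 125 MB in the full 10 seconds and never reaches the cap, while a 1000 Mbit/s connection reaches it after 2 seconds. The cap only binds for rates above 200 Mbit/s, so slower connections would be unaffected in this model.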

@cristinaleonr
Contributor Author

Checks for 10% of traffic:

  • Download performance (last 7 days) closely matches the full test.
    Screenshot 2023-09-26 1 22 24 PM

  • Bytes sent (last 7 days) top out at about 250MB.
    Screenshot 2023-09-26 1 23 31 PM

  • Stable number of client-side errors over the past week.

  • Stable number of ndt7 download errors (last 15 days).
    Screenshot 2023-09-26 2 23 10 PM

  • Global NDT test rate also stable (last 15 days).
    Screenshot 2023-09-26 2 25 53 PM

  • Alerts currently firing are unrelated to the launch:

    • SwitchUnpingableAtSite: specific to lim03.
    • DataQuality_TooManyNdtS2cTestsWithDiscards: specific to MIA and DFW.
    • PlatformCluster_IPv6AtSiteDownForTooLong: specific to TUN01.

@cristinaleonr
Contributor Author

Checks for 30% of traffic:

  • Download performance (last 7 days) closely matches the full test. Slight variations around 200-300 Mbps.
    Screenshot 2023-10-03 3 33 09 PM

  • Bytes sent (last 7 days) top out at about 250MB.
    Screenshot 2023-10-03 3 35 11 PM

  • Stable number of client-side errors over the past week.

  • Increasing number of measurer-closed-early errors for ndt7+wss downloads (expected). Other ndt7 errors seem stable (last 15 days).
    Screenshot 2023-10-03 3 41 18 PM

  • Global NDT test rate also stable (last 15 days).
    Screenshot 2023-10-03 3 42 58 PM

@cristinaleonr
Contributor Author

cristinaleonr commented Oct 12, 2023

Granular performance comparisons at the site/metro/AS level show comparable ndt7 download performance for early-exit ("short") and full tests.


@cristinaleonr
Contributor Author

Checks for 50% of traffic:

  • Download performance (last 7 days) closely matches the full test.
    Screenshot 2023-11-10 10 46 51 AM

  • Bytes sent (last 7 days) top out at about 250MB.
    Screenshot 2023-11-10 10 48 51 AM

  • Stable number of client-side errors over the past week.

  • Stable number of ndt7 download errors (last 15 days).
    Screenshot 2023-11-10 10 57 44 AM

  • Global NDT test rate stable (last 15 days).
    Screenshot 2023-11-10 11 00 19 AM

  • Alerts currently firing are unrelated to the launch:

    • DataQuality_TooManyNdtS2cTestsWithDiscards: specific to BOM and DFW.
    • GardenerFailureRateTooHighOrMissing: specific to the pipeline.

@cristinaleonr
Contributor Author

Granular performance comparisons (CDFs/geometric distance) at the site/metro/AS level show comparable ndt7 download performance for early-exit ("short") and full tests.
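
One way to put a number on how closely two CDFs agree is the largest vertical gap between them. The thread does not say which distance measure was used for these comparisons, so the sketch below substitutes the two-sample Kolmogorov-Smirnov statistic as a stand-in; `ecdf` and `max_cdf_gap` are hypothetical helper names, and the inputs are assumed to be per-test download rates for one site, metro, or AS.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Return the empirical CDF of a sample as a function of x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x: float) -> float:
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / n

    return cdf


def max_cdf_gap(short: Sequence[float], full: Sequence[float]) -> float:
    """Largest vertical gap between two empirical CDFs (the KS statistic).

    Both CDFs are right-continuous step functions, so the maximum gap
    is reached at one of the observed values.
    """
    if not short or not full:
        raise ValueError("both samples must be non-empty")
    f_short, f_full = ecdf(short), ecdf(full)
    points = sorted(set(short) | set(full))
    return max(abs(f_short(x) - f_full(x)) for x in points)
```

A gap near 0 means the early-exit ("short") and full-test distributions are nearly indistinguishable for that group, while a gap near 1 means they barely overlap. Any threshold for "comparable" would have to be chosen separately.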

