xuj18/BioMe_weight_project

Weight trajectory classification in a biobank setting

Note: this R script contains code to classify different types of weight trajectory. It is not a standalone program that runs on its own. Feel free to copy and modify the script to fit your own dataset.

In addition, the PheWAS summary statistics for the weight trajectories are included in the tar.gz file.

Please cite this project (see the Citation section below) if you use any of the R code for weight trajectory classification in your cohort/biobank, or if you use the weight trajectory PheWAS summary statistics in your study.

You can plug in your own dataset (replacing the your_data dataset, which is a placeholder) to identify the weight trajectory of each participant in your study cohort.

Your dataset should be organized in long format, with one row per individual per calendar year (e.g., ID, annual weight value, calendar year), as in the example below.

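| Individual_ID | AnnualKG | YOMeasure | Other variables |
| --- | --- | --- | --- |
| ID1 | 58 | 2010 | etc |
| ID1 | 64 | 2012 | etc |
| ID1 | 60 | 2013 | etc |
| ID2 | 80 | 2007 | etc |
| ID2 | 70 | 2008 | etc |
| ID2 | 60 | 2009 | etc |
| ID2 | 65 | 2010 | etc |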
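As a rough illustration of the long format, the Python sketch below (not part of the repository's R script) groups the example rows above into one ordered list of annual weights per individual. The function name `build_trajectories` is hypothetical, and the "Other variables" column is omitted.

```python
from collections import defaultdict

# Example rows from the table above: (Individual_ID, AnnualKG, YOMeasure)
your_data = [
    ("ID1", 58, 2010), ("ID1", 64, 2012), ("ID1", 60, 2013),
    ("ID2", 80, 2007), ("ID2", 70, 2008), ("ID2", 60, 2009), ("ID2", 65, 2010),
]

def build_trajectories(rows):
    """Return {individual_id: [annual weights ordered by calendar year]}."""
    by_id = defaultdict(list)
    for individual_id, annual_kg, year in rows:
        by_id[individual_id].append((year, annual_kg))
    # Sorting the (year, weight) pairs orders each trajectory by calendar year
    return {i: [kg for _, kg in sorted(pairs)] for i, pairs in by_id.items()}

trajectories = build_trajectories(your_data)
print(trajectories)
```

Each list starts with the first annual weight (the baseline) and ends with the last annual weight, which is the shape the definitions below work with.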

Stable weight trajectory

Definition: The maximum weight change (gain or loss) from the first annual weight is < 5% (or < 10%). The cutoff was selected based on previous evidence that a weight change of 5% or more could be clinically relevant [1,2,3].

Location in the R script: lines 4-21
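The stable weight definition could be sketched in Python as follows. This is an illustration under stated assumptions, not the repository's R code: the function name `is_stable` is hypothetical, and the change is assumed to be the absolute difference from the first annual weight, divided by the first annual weight.

```python
def is_stable(weights, cutoff=0.05):
    """weights: annual weights for one individual, ordered by calendar year."""
    first = weights[0]
    # Assumption: "maximum weight change" is the largest absolute
    # deviation from the first annual weight, as a fraction of it
    max_change = max(abs(w - first) / first for w in weights)
    return max_change < cutoff

print(is_stable([70, 71, 69, 72]))       # small fluctuations only
print(is_stable([70, 74], cutoff=0.10))  # same check with the 10% cutoff
```

Passing `cutoff=0.10` switches from the 5% to the 10% threshold.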

Figure illustration

The figures below give a cartoon illustration and a real example of what is classified as stable weight trajectory in the BioMe Biobank using the 5% cutoff (could be changed to 10%).

The x axis represents time and the y axis represents the percentage weight change over time. [The x axis label can be ignored.]

[Figure: stable_weight_illustration]

[Figure: supl figure2 stable]

Weight loss trajectory

Definition:

  1. The net weight loss from the first annual weight to the last annual weight was > 0
  2. The maximum weight loss from baseline was ≥ 5% (or 10%)
  3. Overall, the individual had more weight loss than weight gain over time:
    (i) The maximum weight gain from baseline was < 5% (or 10%)
    (ii) The amount of maximum weight gain from baseline was < 45% of the overall weight change magnitude (maximum annual weight - minimum annual weight)

An individual who meets all three criteria is classified as having a weight loss trajectory. Criterion 3 is met if either (i) or (ii) holds.

Location in the R script: lines 27-66
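The three weight loss criteria can be expressed as a short Python sketch. This is not the repository's R code: `is_weight_loss` and its parameters are hypothetical names, baseline is taken to be the first annual weight, and criterion 3(ii) is assumed to compare the maximum gain in kg against 45% of the (maximum - minimum) range in kg.

```python
def is_weight_loss(weights, cutoff=0.05, gain_share=0.45):
    """weights: annual weights for one individual, ordered by calendar year."""
    first, last = weights[0], weights[-1]
    highest, lowest = max(weights), min(weights)

    net_loss = (first - last) > 0                    # criterion 1
    max_loss = (first - lowest) / first              # fraction of baseline
    max_gain = (highest - first) / first             # fraction of baseline
    enough_loss = max_loss >= cutoff                 # criterion 2
    small_gain = max_gain < cutoff                   # criterion 3(i)
    # criterion 3(ii): gain in kg vs. 45% of the overall range in kg (assumption)
    minor_gain = (highest - first) < gain_share * (highest - lowest)

    return net_loss and enough_loss and (small_gain or minor_gain)

print(is_weight_loss([80, 70, 60, 65]))  # meets (1), (2) and (3.i)
print(is_weight_loss([80, 86, 70, 72]))  # meets (1), (2) and (3.ii)
```

The first call uses the ID2 rows from the example dataset; the second uses made-up weights chosen so that only 3(ii) is satisfied.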

The figures below give a cartoon illustration and a real example of what is classified as weight loss trajectory in the BioMe Biobank using the 5% cutoff (could be changed to 10%).

The example on the left meets criteria (1), (2) and (3.i), while the example on the right meets criteria (1), (2) and (3.ii). [The x axis label can be ignored.]

[Figure: weight_loss_illustration]

Below is an example of weight loss trajectory in the BioMe Biobank.

[Figure: supl figure2 loss]

Weight gain trajectory

Definition:

  1. The net weight gain from the first annual weight to the last annual weight was > 0
  2. The maximum weight gain from baseline was ≥ 5% (or 10%)
  3. Overall, the individual had more weight gain than weight loss over time:
    (i) The maximum weight loss from baseline was < 5% (or 10%)
    (ii) The amount of maximum weight loss from baseline was < 45% of the overall weight change magnitude (maximum annual weight - minimum annual weight)

An individual who meets all three criteria is classified as having a weight gain trajectory. Criterion 3 is met if either (i) or (ii) holds.

Location in the R script: lines 72-94
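The weight gain criteria mirror the weight loss criteria, as in the Python sketch below. It is an illustration rather than the repository's R code: `is_weight_gain` is a hypothetical name, baseline is taken to be the first annual weight, and criterion 3(ii) is assumed to compare kg against kg.

```python
def is_weight_gain(weights, cutoff=0.05, loss_share=0.45):
    """weights: annual weights for one individual, ordered by calendar year."""
    first, last = weights[0], weights[-1]
    highest, lowest = max(weights), min(weights)

    net_gain = (last - first) > 0                    # criterion 1
    max_gain = (highest - first) / first             # fraction of baseline
    max_loss = (first - lowest) / first              # fraction of baseline
    enough_gain = max_gain >= cutoff                 # criterion 2
    small_loss = max_loss < cutoff                   # criterion 3(i)
    # criterion 3(ii): loss in kg vs. 45% of the overall range in kg (assumption)
    minor_loss = (first - lowest) < loss_share * (highest - lowest)

    return net_gain and enough_gain and (small_loss or minor_loss)

print(is_weight_gain([58, 64, 60]))      # meets (1), (2) and (3.i)
print(is_weight_gain([60, 55, 70, 68]))  # meets (1), (2) and (3.ii)
```

The first call uses the ID1 rows from the example dataset; the second uses made-up weights chosen so that only 3(ii) is satisfied.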

The figures below give a cartoon illustration and a real example of what is classified as weight gain trajectory in the BioMe Biobank using the 5% cutoff (could be changed to 10%).

The example on the left meets criteria (1), (2) and (3.i), while the example on the right meets criteria (1), (2) and (3.ii). [The x axis label can be ignored.]

[Figure: weight_gain_illustration]

Below is an example of weight gain trajectory in the BioMe Biobank.

[Figure: supl figure2 gain]

Weight cycle trajectory

Definition: a weight cycle trajectory is identified using two approaches.

  1. Local maximum/minimum approach based on inflection points (R script lines 105-349)
  2. Global maximum/minimum approach based on the maximum and minimum annual weights per individual (R script lines 353-773)
    1. When the maximum and minimum annual weights are not both at the two ends of the weight trajectory (i.e., the first and last annual weights) (R script lines 545-680)
    2. When the maximum and minimum annual weights are both at the two ends of the weight trajectory (i.e., the first and last annual weights) (R script lines 682-773)

Location in the R script: lines 100-794

Below is an example that meets the local weight cycle definition (there is at least one weight gain and one weight loss of ≥ 5% between inflection points). Inflection points were identified from the slopes (i.e., a positive slope followed by a negative slope, or vice versa).

[Figure: local_weight_cycle_illustration]
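The local approach can be sketched in Python as below. This is a simplified illustration, not the repository's R code: the function names are hypothetical, each change is assumed to be measured relative to the weight at the start of the segment, and plateaus (equal consecutive annual weights) are not handled here.

```python
def inflection_points(weights):
    """Indices where a positive slope is followed by a negative one, or vice versa."""
    points = []
    for i in range(1, len(weights) - 1):
        before = weights[i] - weights[i - 1]
        after = weights[i + 1] - weights[i]
        if before * after < 0:  # slopes have opposite signs
            points.append(i)
    return points

def is_local_cycle(weights, cutoff=0.05):
    # Segments run between inflection points, plus the first and last annual weights
    idx = [0] + inflection_points(weights) + [len(weights) - 1]
    # Assumption: change is relative to the weight at the start of each segment
    changes = [(weights[b] - weights[a]) / weights[a] for a, b in zip(idx, idx[1:])]
    has_gain = any(c >= cutoff for c in changes)
    has_loss = any(c <= -cutoff for c in changes)
    return has_gain and has_loss

print(inflection_points([70, 75, 70, 76]))
print(is_local_cycle([70, 75, 70, 76]))
```

A trajectory qualifies only when at least one segment gains by the cutoff or more and at least one segment loses by the cutoff or more.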

Below is an example that meets the first global weight cycle definition (e.g., the maximum weight is not at the two ends of the weight trajectory).

  1. Percent change from the first annual weight to the maximum annual weight ≥ 5%
  2. Percent change from the maximum annual weight to the minimum annual weight ≤ -5%
  3. This does not meet the local weight cycle criteria, as none of the weight changes between inflection points reached 5%

[Figure: global_weight_cycle_illustration_1]
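The two numeric conditions in this example can be checked with a small Python sketch. It covers only the illustrated pattern (a rise to the maximum followed by a fall to the minimum), not every case handled in the R script. The function name is hypothetical, and each percent change is assumed to be measured relative to the starting weight of the comparison.

```python
def meets_first_global_example(weights, cutoff=0.05):
    """weights: annual weights for one individual, ordered by calendar year."""
    i_max = weights.index(max(weights))
    i_min = weights.index(min(weights))
    # Change from the first annual weight up to the maximum annual weight
    rise = (weights[i_max] - weights[0]) / weights[0]
    # Change from the maximum annual weight down to the minimum annual weight
    fall = (weights[i_min] - weights[i_max]) / weights[i_max]
    # Assumption: the maximum occurs before the minimum, as in the illustration
    return i_max < i_min and rise >= cutoff and fall <= -cutoff

# Made-up weights: no single change between turning points reaches 5%,
# but the overall rise and fall both do
example = [70, 72, 71.5, 74, 73.5, 75, 73, 73.5, 71, 71.2, 69.8]
print(meets_first_global_example(example))
```

The made-up trajectory rises and falls in small steps, which is why a global comparison picks it up when a segment-by-segment comparison would not.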

Below is an example that meets the second global weight cycle definition (i.e., the maximum and minimum annual weights are at the two ends of the weight trajectory).

  1. Percent change from the minimum annual weight to the second maximum annual weight ≥ 5%
  2. Percent change from the second maximum annual weight to the second minimum annual weight ≤ -5%
  3. Percent change from the second minimum annual weight to the maximum annual weight ≥ 5%
  4. This does not meet the local weight cycle criteria because, even though some weight gains between inflection points reached 5%, none of the weight losses between inflection points reached 5%.

[Figure: global_weight_cycle_illustration_2]
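A Python sketch of the three numeric conditions in this example follows. It covers only the illustrated pattern (minimum first, maximum last) and is not the repository's R code. The function name is hypothetical, "second maximum" and "second minimum" are assumed to mean the second largest and second smallest annual weights, and each percent change is assumed to be relative to the starting weight of the comparison.

```python
def meets_second_global_example(weights, cutoff=0.05):
    """weights: annual weights for one individual, ordered by calendar year."""
    if len(weights) < 4:
        return False
    ranked = sorted(weights)
    w_min, second_min = ranked[0], ranked[1]
    second_max, w_max = ranked[-2], ranked[-1]
    # Illustrated pattern only: minimum at the start, maximum at the end
    if weights[0] != w_min or weights[-1] != w_max:
        return False
    # Assumption: the second maximum occurs before the second minimum
    if weights.index(second_max) > weights.index(second_min):
        return False

    def change(start, end):
        return (end - start) / start

    return (change(w_min, second_max) >= cutoff
            and change(second_max, second_min) <= -cutoff
            and change(second_min, w_max) >= cutoff)

# Made-up weights: the fall from 70 to 65 happens in steps smaller than 5%
print(meets_second_global_example([60, 70, 67, 67.5, 65, 72]))
```

In the made-up trajectory the individual losses between turning points (70 to 67, 67.5 to 65) each stay under 5%, while the overall fall from the second maximum to the second minimum exceeds it.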

Sensitivity, specificity, and accuracy of this method to identify weight trajectory in a biobank setting

We drew a random set of 100 participants as a validation set and manually checked each weight trajectory plot to assess whether it had been classified correctly.

| Trajectory | Accuracy | Sensitivity | Specificity |
| --- | --- | --- | --- |
| Weight loss | 98% | 97.2% | 98.4% |
| Weight gain | 99% | 97.3% | 100% |
| Weight cycle | 98% | 98% | 98% |
| Stable weight | 100% | 100% | 100% |
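For reference, the three metrics are computed per trajectory type from the agreement between the algorithm's label and the manual check. The Python sketch below uses the standard definitions with made-up labels. The function name and inputs are hypothetical and are not the validation data.

```python
def validation_metrics(predicted, actual):
    """predicted/actual: lists of booleans for one trajectory type.

    Assumes at least one actual positive and one actual negative.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Made-up labels for five participants (algorithm vs. manual review)
algorithm = [True, True, False, False, True]
manual = [True, False, False, False, True]
metrics = validation_metrics(algorithm, manual)
print(metrics)
```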

QC before identifying the weight trajectory for each individual

[Figure: weight_QC_update]

Additional notes

How to calculate weight changes in between inflection points if there is a plateau?

We kept the first plateau point as an inflection point candidate and removed the second plateau point. If the first plateau point did not satisfy the inflection point definition (a positive slope followed by a negative slope, or vice versa), we removed it from the set of inflection points and recalculated the weight changes between the new set of inflection points, along with the first and last annual weights.

[Figure: removal_plateau_points]
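One way to picture the plateau handling is the Python sketch below. It is an approximation rather than the repository's R code: the function names are hypothetical, a plateau is assumed to mean exactly equal consecutive annual weights, and all repeated points of a longer plateau are dropped, not just the second.

```python
def drop_repeated_plateau_points(weights):
    """Keep the first point of each plateau and drop the repeats that follow it."""
    kept = [weights[0]]
    for w in weights[1:]:
        if w != kept[-1]:
            kept.append(w)
    return kept

def inflection_values(weights):
    """Weights at which the slope changes sign, after plateau points are collapsed."""
    w = drop_repeated_plateau_points(weights)
    return [
        w[i]
        for i in range(1, len(w) - 1)
        if (w[i] - w[i - 1]) * (w[i + 1] - w[i]) < 0
    ]

print(inflection_values([60, 65, 65, 60]))  # plateau at a peak: kept as inflection
print(inflection_values([60, 65, 65, 70]))  # plateau on the way up: not an inflection
```

In the second call the kept plateau point sits between two positive slopes, so it fails the inflection point definition and is left out, matching the rule described above.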

Why do we use annual weight to identify weight trajectory pattern?

We constructed each weight trajectory from annual weights rather than from every single weight measure in the electronic health records. This minimizes the influence of weight outliers that could bias the identification of the weight trajectory (e.g., implausible weight values for an individual caused by human or technical errors, such as typos or scale problems).

Annual weight is calculated as the average weight in one calendar year per individual, and we found that it captured the overall weight trajectory pattern over the years well.
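The annual weight calculation amounts to a grouped mean, sketched in Python below. This is not the repository's R code: the function name, the input layout, and the sample measures are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def annual_weights(measures):
    """measures: iterable of (individual_id, calendar_year, weight_kg) tuples.

    Returns {(individual_id, calendar_year): mean weight in that year}.
    """
    grouped = defaultdict(list)
    for individual_id, year, kg in measures:
        grouped[(individual_id, year)].append(kg)
    return {key: mean(values) for key, values in sorted(grouped.items())}

# Made-up raw measures: two in 2010 and one in 2012 for the same individual
raw = [("ID1", 2010, 57.0), ("ID1", 2010, 59.0), ("ID1", 2012, 64.0)]
annual = annual_weights(raw)
print(annual)
```

Averaging within a calendar year dampens the effect of a single implausible measure, which is the motivation given above.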

Below is an example of one individual who had 496 weight measures in total, whose weight trajectory is captured well using annual weights.

[Figure: single_weight_vs_annual_weight]

References

  1. Stevens J, Truesdale KP, McClain JE, Cai J. The definition of weight maintenance. International Journal of Obesity 2006; 30: 391–9.
  2. Blair SN, Shaten J, Brownell K, Collins G, Lissner L. Body weight change, all-cause mortality, and cause-specific mortality in the Multiple Risk Factor Intervention Trial. Ann Intern Med 1993; 119: 749–57.
  3. French SA, Folsom AR, Jeffery RW, Zheng W, Mink PJ, Baxter JE. Weight variability and incident disease in older women: the Iowa Women’s Health Study. International Journal of Obesity 1997; 21: 217–23.

Citation

  1. Please cite the reference below if you use this method to model your own longitudinal weight trajectory, or the longitudinal trajectory of any other quantitative trait.

Xu J, Johnson JS, Signer R, Eating Disorders Working Group of the Psychiatric Genomics Consortium, Birgegård A, Jordan J, Kennedy MA, Landén M, Maguire SL, Martin NG, Mortensen PB, Petersen LV, Thornton LM, Bulik CM, Huckins LM. Exploring the clinical and genetic associations of adult weight trajectories using electronic health records in a racially diverse biobank: a phenome-wide and polygenic risk study. Lancet Digital Health 2022; 4(8):e604-e614.
