Note: this R script contains code for classifying different types of weight trajectory. It is not a standalone program that runs by itself, so feel free to copy and modify the script to fit your own dataset.
In addition, the PheWAS summary statistics for the weight trajectories are included in the tar.gz file.
Please cite this project (manuscript in preparation, link to the preprint will be provided once it is ready) if you use any of the R code for weight trajectory classification in your cohort/biobank, or if you use the weight trajectory PheWAS summary statistics in your study.
You can plug in your own dataset (replace the your_data dataset, which is a placeholder) to identify the weight trajectory of each participant in your study cohort.
Your dataset should be organized in long format, with one row per individual per year (e.g., ID, annual weight value, calendar year, as in the example below).
Individual_ID | AnnualKG | YOMeasure | Other variables |
---|---|---|---|
ID1 | 58 | 2010 | etc |
ID1 | 64 | 2012 | etc |
ID1 | 60 | 2013 | etc |
ID2 | 80 | 2007 | etc |
ID2 | 70 | 2008 | etc |
ID2 | 60 | 2009 | etc |
ID2 | 65 | 2010 | etc |
Definition: the maximum weight change from the first annual weight was < 5% (or 10%). The cutoff was selected based on previous evidence that a weight change of 5% or more could be clinically relevant [1,2,3].
Location in the R script: lines 4-21
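The stable weight definition can be sketched in a few lines of base R. This is a minimal illustration, not the code from the script: the helper names `pct_from_first` and `is_stable` and the individual `ID3` are hypothetical, while the column names follow the example table above.

```r
# Hypothetical long-format data using the column names from the example table
your_data <- data.frame(
  Individual_ID = c("ID1", "ID1", "ID1", "ID3", "ID3", "ID3"),
  AnnualKG      = c(58, 64, 60, 70, 71.5, 69),
  YOMeasure     = c(2010, 2012, 2013, 2008, 2009, 2010)
)

# Percent change of each annual weight relative to the first annual weight
pct_from_first <- function(kg) (kg - kg[1]) / kg[1] * 100

# Stable: the largest absolute change from the first annual weight stays below the cutoff
is_stable <- function(kg, cutoff = 5) {
  max(abs(pct_from_first(kg))) < cutoff
}

# Sort by individual and year so that kg[1] really is the first annual weight
your_data <- your_data[order(your_data$Individual_ID, your_data$YOMeasure), ]
stable <- tapply(your_data$AnnualKG, your_data$Individual_ID, is_stable)
print(stable)
```

Passing `cutoff = 10` switches to the 10% threshold.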
Figure illustration
The figures below give a cartoon illustration and a real example of what is classified as stable weight trajectory in the BioMe Biobank using the 5% cutoff (could be changed to 10%).
The x axis represents time and the y axis is the weight change in percentage over time. [The x axis label could be ignored.]
Definition:
- (1) The net weight loss from the first annual weight to the last annual weight was > 0
- (2) The maximum weight loss from baseline was ≥ 5% (or 10%)
- (3) Overall the individual had more weight loss than weight gain over time, meaning either of the following:
  - (i) The maximum weight gain from baseline was < 5% (or 10%)
  - (ii) The maximum weight gain from baseline was < 45% of the overall weight change magnitude (maximum annual weight - minimum annual weight)
An individual who meets all three criteria had a weight loss trajectory. The third criterion is met if either (i) or (ii) holds.
Location in the R script: lines 27-66
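As a rough sketch of how the three weight loss criteria combine, the function below checks them for a single individual's annual weights, ordered by year. The function name `is_weight_loss` and the example vectors are hypothetical, and the sketch only tests the criteria listed above; it does not reproduce how the script handles individuals who might also fit another trajectory.

```r
# kg: annual weights of ONE individual, already ordered by calendar year
is_weight_loss <- function(kg, cutoff = 5) {
  first <- kg[1]
  last  <- kg[length(kg)]

  max_loss_kg <- max(0, first - min(kg))  # largest drop below the first annual weight
  max_gain_kg <- max(0, max(kg) - first)  # largest rise above the first annual weight
  range_kg    <- max(kg) - min(kg)        # overall weight change magnitude

  crit1   <- last < first                          # (1) net weight loss > 0
  crit2   <- max_loss_kg / first * 100 >= cutoff   # (2) maximum loss >= cutoff
  crit3i  <- max_gain_kg / first * 100 < cutoff    # (3.i) maximum gain < cutoff
  crit3ii <- max_gain_kg < 0.45 * range_kg         # (3.ii) gain < 45% of the magnitude

  crit1 && crit2 && (crit3i || crit3ii)
}

is_weight_loss(c(80, 78, 74, 72))  # steady loss, satisfies (3.i)
is_weight_loss(c(80, 85, 70, 68))  # early gain above 5%, but satisfies (3.ii)
is_weight_loss(c(80, 90, 78, 79))  # maximum loss below 5%, fails (2)
```

Criterion (3.ii) compares two quantities in kilograms; expressing both as a percentage of the first annual weight gives the same result.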
The figures below give a cartoon illustration and a real example of what is classified as weight loss trajectory in the BioMe Biobank using the 5% cutoff (could be changed to 10%).
The example on the left meets criteria (1), (2) and (3.i), while the example on the right meets criteria (1), (2) and (3.ii). [The x axis label could be ignored.]
Below is an example of weight loss trajectory in the BioMe Biobank.
Definition:
- (1) The net weight gain from the first annual weight to the last annual weight was > 0
- (2) The maximum weight gain from baseline was ≥ 5% (or 10%)
- (3) Overall the individual had more weight gain than weight loss over time, meaning either of the following:
  - (i) The maximum weight loss from baseline was < 5% (or 10%)
  - (ii) The maximum weight loss from baseline was < 45% of the overall weight change magnitude (maximum annual weight - minimum annual weight)
An individual who meets all three criteria had a weight gain trajectory. The third criterion is met if either (i) or (ii) holds.
Location in the R script: lines 72-94
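The weight gain criteria mirror the weight loss ones, so the sketch below focuses on applying such a check to every individual in a long-format dataset. The function name `is_weight_gain`, the individuals A, B and C, and their weights are hypothetical; only the column names come from the example table.

```r
# kg: annual weights of ONE individual, already ordered by calendar year
is_weight_gain <- function(kg, cutoff = 5) {
  first <- kg[1]
  last  <- kg[length(kg)]
  max_gain_kg <- max(0, max(kg) - first)
  max_loss_kg <- max(0, first - min(kg))
  range_kg    <- max(kg) - min(kg)

  (last > first) &&                               # (1) net weight gain > 0
    (max_gain_kg / first * 100 >= cutoff) &&      # (2) maximum gain >= cutoff
    ((max_loss_kg / first * 100 < cutoff) ||      # (3.i) maximum loss < cutoff
       (max_loss_kg < 0.45 * range_kg))           # (3.ii) loss < 45% of the magnitude
}

# Hypothetical data; the rows for B are deliberately out of order
your_data <- data.frame(
  Individual_ID = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  AnnualKG      = c(60, 63, 66, 66, 70, 78, 80, 70, 72, 71),
  YOMeasure     = c(2011, 2012, 2013, 2009, 2008, 2010, 2011, 2015, 2016, 2017)
)

# Order by individual and year, then classify each individual separately
your_data <- your_data[order(your_data$Individual_ID, your_data$YOMeasure), ]
gain <- vapply(split(your_data$AnnualKG, your_data$Individual_ID),
               is_weight_gain, logical(1))
print(gain)
```

Sorting before splitting matters because both the net change and the baseline depend on which row is treated as the first annual weight.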
The figures below give a cartoon illustration and a real example of what is classified as weight gain trajectory in the BioMe Biobank using the 5% cutoff (could be changed to 10%).
The example on the left meets criteria (1), (2) and (3.i), while the example on the right meets criteria (1), (2) and (3.ii). [The x axis label could be ignored.]
Below is an example of weight gain trajectory in the BioMe Biobank.
Definition:
- Local maximum/minimum approach based on inflection points (R script lines 105-349)
- Global maximum/minimum approach based on the maximum and minimum annual weights per individual (R script lines 353-773)
  - When the maximum and minimum annual weights are not both at the two ends of the weight trajectory (i.e., the first and last annual weights) (R script lines 545-680)
  - When the maximum and minimum annual weights are both at the two ends of the weight trajectory (i.e., the first and last annual weights) (R script lines 682-773)
Location in the R script: lines 100-794
Below is an example that meets the local weight cycle definition (there is at least one weight gain and one weight loss change ≥ 5% between inflection points). The inflection point was identified based on the slopes (i.e., a positive slope followed by a negative slope, or vice versa).
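The local approach can be sketched as follows. This is an illustration under stated assumptions rather than the script's implementation: the function name `local_cycle` is hypothetical, each change is expressed as a percentage of the weight at the earlier point, the first and last annual weights are included as end points, and plateaus (equal consecutive weights) are assumed to have been handled beforehand.

```r
# kg: annual weights of ONE individual, ordered by calendar year, without plateaus
local_cycle <- function(kg, cutoff = 5) {
  slope_sign <- sign(diff(kg))

  # An interior point is an inflection point when the slope before it and the
  # slope after it have opposite signs
  flips <- which(slope_sign[-1] * slope_sign[-length(slope_sign)] < 0) + 1

  # Percent changes between consecutive points of the set
  # {first weight, inflection points, last weight}
  anchors <- kg[c(1, flips, length(kg))]
  pct <- diff(anchors) / anchors[-length(anchors)] * 100

  # Weight cycle: at least one gain AND at least one loss reaching the cutoff
  any(pct >= cutoff) && any(pct <= -cutoff)
}

local_cycle(c(70, 75, 68, 74))      # gain of 7.1% followed by a loss of 9.3%
local_cycle(c(70, 72, 71, 73, 72))  # all changes between inflection points are small
local_cycle(c(70, 74, 78, 82))      # no inflection point, gain only
```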
Below is an example that meets the first global weight cycle definition (e.g., maximum weight is not at the two ends of the weight trajectory).
- Change from the first annual weight to the maximum annual weight ≥ 5%
- Change from the maximum annual weight to the minimum annual weight ≤ -5%
- This does not meet the local weight cycle criteria because none of the weight changes between inflection points reached 5%
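The check illustrated by this example can be sketched as below. It covers only the ordering shown here (first annual weight, then the maximum, then the minimum); the function name `global_cycle_max_then_min` and the requirement that the minimum comes after the maximum are assumptions for illustration, and the script handles more configurations than this.

```r
# kg: annual weights of ONE individual, ordered by calendar year
global_cycle_max_then_min <- function(kg, cutoff = 5) {
  i_max <- which.max(kg)  # position of the maximum annual weight
  i_min <- which.min(kg)  # position of the minimum annual weight

  gain_pct <- (kg[i_max] - kg[1]) / kg[1] * 100          # first -> maximum
  loss_pct <- (kg[i_min] - kg[i_max]) / kg[i_max] * 100  # maximum -> minimum

  # Assumption: the minimum has to occur after the maximum
  (i_max < i_min) && (gain_pct >= cutoff) && (loss_pct <= -cutoff)
}

# Hypothetical trajectory that rises and falls gradually: no single change
# between inflection points reaches 5%, but the overall rise and fall both do
kg <- c(70, 71.5, 70.8, 73, 72.5, 75, 74, 72, 72.6, 69.5)
global_cycle_max_then_min(kg)
```

Here the weight rises about 7.1% from the first annual weight to the maximum and then falls about 7.3% to the minimum.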
And below is an example that meets the second global weight cycle definition (i.e., the maximum and minimum annual weights are at the two ends of the weight trajectory).
- Change from the minimum annual weight to the second maximum annual weight ≥ 5%
- Change from the second maximum annual weight to the second minimum annual weight ≤ -5%
- Change from the second minimum annual weight to the maximum annual weight ≥ 5%
- This does not meet the local weight cycle criteria because, although some weight gains between inflection points reached 5%, none of the weight losses between inflection points did.
Sensitivity, specificity, and accuracy of this method to identify weight trajectory in a biobank setting
We drew a random set of 100 participants as a validation set and manually checked each weight trajectory plot to assess whether it had been classified correctly.
Trajectory | Accuracy | Sensitivity | Specificity |
---|---|---|---|
Weight loss | 98% | 97.2% | 98.4% |
Weight gain | 99% | 97.3% | 100% |
Weight cycle | 98% | 98% | 98% |
Stable weight | 100% | 100% | 100% |
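For readers who want to run the same kind of validation on their own cohort, the three metrics can be computed from the counts of a manual review as sketched below. The function name `classification_metrics` is hypothetical, and the counts in the example are illustrative values chosen to be consistent with the weight loss row for 100 participants; the actual confusion matrix is not reported here.

```r
# tp/fn: individuals who truly have the trajectory, classified correctly/incorrectly
# tn/fp: individuals who truly do not have it, classified correctly/incorrectly
classification_metrics <- function(tp, fp, tn, fn) {
  c(
    accuracy    = (tp + tn) / (tp + fp + tn + fn),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp)
  )
}

# Illustrative counts only (not taken from the study)
m <- classification_metrics(tp = 35, fp = 1, tn = 63, fn = 1)
print(round(100 * m, 1))
```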
We kept the first plateau point as an inflection point candidate and removed the second plateau point. If the first plateau point turned out not to satisfy the inflection point definition (a positive slope followed by a negative slope, or vice versa), we removed it from the set of inflection points and recalculated the weight changes between the remaining inflection points, along with the first and last annual weights.
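A minimal sketch of the plateau handling, assuming a plateau means consecutive annual weights that are exactly equal (the function name `drop_repeated_plateau_points` is hypothetical). It keeps the first point of each plateau and drops the repeats, so that the usual slope-sign test can then decide whether the remaining point is an inflection point.

```r
# kg: annual weights of ONE individual, ordered by calendar year
drop_repeated_plateau_points <- function(kg) {
  # Keep a value only if it differs from the previous one; the first is always kept
  keep <- c(TRUE, diff(kg) != 0)
  kg[keep]
}

peak_plateau <- drop_repeated_plateau_points(c(70, 75, 75, 72))
step_plateau <- drop_repeated_plateau_points(c(70, 75, 75, 80))

sign(diff(peak_plateau))  #  1 -1: slope flips, so the plateau point is an inflection point
sign(diff(step_plateau))  #  1  1: no flip, so the plateau point is dropped from the set
```

If calendar years are tracked alongside the weights, the same `keep` index would need to be applied to them as well.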
We used annual weights to construct the weight trajectory, instead of every single weight measure in the electronic health records for each individual, to minimize the influence of weight outliers that could bias the identification of the weight trajectory (e.g., implausible weight values caused by human or technical errors such as typos or scale problems).
Annual weight is calculated as an average weight in one calendar year per individual, and we found that it did a decent job of capturing the overall weight trajectory pattern over years.
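The averaging step can be sketched with base R's `aggregate()`. The raw data frame and its `MeasureDate` and `KG` columns are hypothetical; the output columns follow the long-format example table (`Individual_ID`, `AnnualKG`, `YOMeasure`).

```r
# Hypothetical raw weight measures, several per individual per year
raw <- data.frame(
  Individual_ID = c("ID1", "ID1", "ID1", "ID1", "ID2", "ID2"),
  MeasureDate   = as.Date(c("2010-01-15", "2010-06-20", "2010-11-02",
                            "2011-03-10", "2010-02-01", "2010-09-30")),
  KG            = c(58, 60, 59, 64, 80, 78)
)

# Calendar year of each measure
raw$YOMeasure <- as.integer(format(raw$MeasureDate, "%Y"))

# Average weight per individual per calendar year
annual <- aggregate(KG ~ Individual_ID + YOMeasure, data = raw, FUN = mean)
names(annual)[names(annual) == "KG"] <- "AnnualKG"

# Order by individual and year, ready for trajectory classification
annual <- annual[order(annual$Individual_ID, annual$YOMeasure), ]
print(annual)
```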
Below is an example of one individual who had 496 weight measures in total; the weight trajectory is captured well using annual weights.
1. Stevens J, Truesdale KP, McClain JE, Cai J. The definition of weight maintenance. International Journal of Obesity 2006; 30: 391–9.
2. Blair SN, Shaten J, Brownell K, Collins G, Lissner L. Body weight change, all-cause mortality, and cause-specific mortality in the Multiple Risk Factor Intervention Trial. Ann Intern Med 1993; 119: 749–57.
3. French SA, Folsom AR, Jeffery RW, Zheng W, Mink PJ, Baxter JE. Weight variability and incident disease in older women: the Iowa Women’s Health Study. International Journal of Obesity 1997; 21: 217–23.
Please cite the reference below if you use this method to model your own longitudinal weight trajectories, or the longitudinal trajectory of any other quantitative trait.
Xu J, Johnson JS, Signer R, Eating Disorders Working Group of the Psychiatric Genomics Consortium, Birgegård A, Jordan J, Kennedy MA, Landén M, Maguire SL, Martin NG, Mortensen PB, Petersen LV, Thornton LM, Bulik CM, Huckins LM. Exploring the clinical and genetic associations of adult weight trajectories using electronic health records in a racially diverse biobank: a phenome-wide and polygenic risk study. Lancet Digital Health 2022; 4(8):e604-e614.