Future Year Population
Developing the Synthetic Population control files and seed data for a base or existing year is a relatively straightforward exercise. Near-current data (usually just two years back) is available from census.gov, and a user just needs to associate the census data of interest with the appropriate zones or geographies for the Synthetic Population (there's a little more to it than that, but at a high level it's a fairly simple process). For future years, there is no official data set to use. Therefore, a methodology and process need to be established to develop the control files and seed data for future years.
One foundational element to the future year population is... the population (total). In Oregon, the Population Research Center (PRC) at Portland State University (PSU) is responsible for forecasting the population into the future for all counties, cities and MPOs in Oregon. The future forecasts provided are used as specific population controls for regions within the larger model boundary. Unfortunately, model boundaries usually do not perfectly align with political boundaries where the population estimates are available. Therefore, the model region as a whole might not have an official total population, but the county population totals along with other jurisdiction forecasts provide the basis for the future year population totals for the full area that the model covers.
PRC provides those future year populations in various time steps and by age and gender demographics. Because PRC is a trusted source for both total population and population by age and gender, ODOT uses that additional information to properly "age" model populations into the future. However, it's not as simple as just providing new age controls to the Population Synthesizer. All the controls need to be consistent. As an example, older households generally have fewer jobs, fewer persons, and fewer children. If only the age control is updated, but the total number of children stays the same, the controls are not consistent, and the population synthesizer will do a poor job of matching all the controls provided. Therefore, the process of aging the population into the future has several steps, which are covered further in the following section.
Similar to the population totals, 10-year forecasts of employment by region within Oregon can be obtained from the Oregon Employment Department (OED). Those projections, while not required to be used, provide the best information available for how the region-wide job market will grow and change. Therefore, similar to the population controls, occupation controls at the region level are also prepared and used as controls as the population is aged. The process to develop future year employment controls is documented here.
As introduced above, building a future population should include aging the population. Aging the population requires work to ensure that all the population synthesis controls are consistent. Here are the steps that have been applied for the ABM to ensure that the controls are consistent and will produce the intended synthetic population output for the future year ABM:
- Base year controls are provided including base year weighted PUMS seed
- An iterative proportional fitting (IPF) function is applied to the original PUMS weights to create a starting point where the PUMS seed weights produce the ABM's base year age and occupation totals
- New (Future) age distribution is provided
- The IPF function is used again, this time to adjust the PUMS weights from the base year age and occupation totals to the future year age and occupation totals
- The adjusted PUMS weights now will summarize (total) to the future household age and occupation totals. The benefit of the re-weighted PUMS records is the ability to tabulate any demographic summaries (workers, children, income, household size) for the local area's residents, re-weighted to align with an aged population for the area. This provides a method to assess how the aged population could change other demographics and a consistent way to project those other census-level demographic controls.
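The tabulation described in the last step can be sketched as a weighted share calculation. This is a minimal illustration, not the ABM's actual script: the record layout, the field names (`workers`, `raw_wgt`, `aged_wgt`), and the weights are all hypothetical.

```python
from collections import defaultdict


def weighted_shares(records, weight_field, category_field):
    """Share of total household weight falling in each category value."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[category_field]] += rec[weight_field]
    grand_total = sum(totals.values())
    return {value: total / grand_total for value, total in totals.items()}


# Hypothetical household records: one raw PUMS weight and one aged weight each.
households = [
    {"hhid": 1, "workers": 0, "raw_wgt": 20.0, "aged_wgt": 30.0},
    {"hhid": 2, "workers": 1, "raw_wgt": 20.0, "aged_wgt": 15.0},
    {"hhid": 3, "workers": 2, "raw_wgt": 10.0, "aged_wgt": 5.0},
]

base_shares = weighted_shares(households, "raw_wgt", "workers")
aged_shares = weighted_shares(households, "aged_wgt", "workers")
for workers in sorted(base_shares):
    print(workers, round(base_shares[workers], 2), round(aged_shares[workers], 2))
```

The same function can be pointed at any household attribute (income bin, household size, presence of children) to see how the aged weights move its distribution.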
Each of these steps is now discussed in greater detail.
It should also be noted that setting the future occupation mix is done in the same step as aging the population. Testing with the "aging" process resulted in a new distribution of occupations that varied from both the base year distribution and the proposed future occupation distribution provided by the OED projections (noted above). Population synthesis controls could be used to achieve the desired occupation mix, but matching all of the controls proved difficult or problematic. The solution was to adjust the PUMS weights for both the future age distribution and the desired occupation distribution at the same time. This allowed the population synthesizer to achieve the controls with minimal deviation, but does cause a much larger spread in adjustments to PUMS weights (as is shown / discussed below). It remains an open question whether the adjustment (deviation from original record weights) is better made in the "pre-processor" steps laid out on this page, or in the balancing that occurs in the tool itself. At this time the approach is to re-weight PUMS data for the future based on both prescribed age and occupation distributions as a pre-processor to the population synthesizer (as is described on this page).
The process begins with reading in base/existing year level controls, including the raw PUMS data. The PUMS data is then processed (adding ABM-specific fields, like the 6 occupation types) in exactly the way that it is processed prior to population synthesis, giving access to all the fields used to control the population synthesis. Part of processing also includes filtering down to just the PUMAs for the region.
While not required, the process has some initial review steps and plots for understanding the different characteristics of the older population (greater than 65 years old) and those less than 65 years old. This review and plotting is done with the original PUMS weights before any adjusting is done, and provides some insight into trends to expect. Specifically, the following distributions are different for older (greater than 65) populations:
- Household Type shifts to mobile home from multifamily and duplex (single family is roughly the same).
- Household size shifts to 1 and 2 person households
- Household income decreases
- The number of zero worker households increases
- The number of zero children households increases
- Jobs decrease across the board, but specifically blue collar and "Natural Resources, Construction, and Maintenance" occupations get the biggest decreases
This may be considered an optional step. Overall the process re-weights PUMS data to the future year age and occupation distributions. This process could be applied directly to the original (raw) PUMS weights. However, the original PUMS weights don't align perfectly with the base year distributions that have been specified. This could be because PUMAs don't perfectly nest within the model boundary, the model age distribution is established separately from the census data, and/or there are ranges of uncertainty around any of the ACS data within the PUMS records. All of those contribute to the slight differences between the age distribution found in the PUMS data using the original PUMS weights and the age distribution input to the population synthesizer for the base year. Because of these differences, the first step is to apply the IPF process to shift the PUMS weights to match the base year age and occupation totals, before the IPF is used to shift the PUMS weights further to match the future year age and occupation totals.
In either case (for the base year age distribution or the future year age distribution) the steps to IPF (adjust) the PUMS weights are the same:
- An Age category field is added to the PUMS records to bin each person record into the 12 age bins used in the ABM's population synthesis. (Occupation categories were already added when the PUMS records were processed for use in the Syn Pop).
- A table is created where each row is an individual PUMS household ID and the columns are the 12 age categories and then the 6 occupation bins. So every household record has a count of how many persons fall in each age and occupation bin.
HHID | AGE1 | AGE2 | AGE3 | AGE4 | AGE5 | AGE6 | AGE7 | AGE8 | AGE9 | AGE10 | AGE11 | AGE12 | OCCP1 | OCCP2 | OCCP3 | OCCP4 | OCCP5 | OCCP6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20 | 1 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
49 | 0 | 3 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
66 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
78 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
86 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- The starting household weights are saved out.
- The IPF control is set, which is the total number of person records per bin (meaning that the weighted column sums of the table above need to match the control given).
- A while loop is then run. In that while loop the change needed for each of the 12 age categories plus 6 occupation categories is calculated as a factor. Because a given household weight can apply to multiple age and occupation categories, the factor for each age/occ category is repeated for every person in the household and then an average is taken. For household ID 20 in the table above this would mean that the factor for Age1 would be included once, Age2 twice, Age7 twice, Occ4 once, and Occ5 once. So the factor applied to HHID 20 would be = (Age1_Factor + Age2_Factor + Age2_Factor + Age7_Factor + Age7_Factor + Occ4_Factor + Occ5_Factor) / 7
- The while loop is iterated until, for each of the 12 age and 6 occupation categories, the adjusted household weights times the persons in that category sum to the global control total within a set tolerance, or until a maximum number of iterations is hit. For the base year it takes 41 iterations for PUMA 800 and 78 iterations for PUMA 900 (Jackson County) of this loop for all 12 age and 6 occupation category totals to come within 0.01% of the total number of people specified per age/occ category.
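The re-weighting loop described in the steps above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the project's actual script: the function names, the starting weights, and the target weights used to build the controls are hypothetical, while the person counts are copied from the six sample households in the table above. Note that averaging the category factors per household, as described, differs from textbook IPF, which rescales one margin at a time.

```python
# Category order follows the table above: 12 age bins, then 6 occupation bins.
CATEGORIES = [f"AGE{i}" for i in range(1, 13)] + [f"OCCP{i}" for i in range(1, 7)]

# Person counts per household, copied from the sample rows in the table above.
COUNTS = {
    20: [1, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
    49: [0, 3, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    66: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    78: [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    86: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    98: [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0],
}


def category_totals(weights, counts):
    """Weighted person total per category: sum of household weight * count."""
    totals = [0.0] * len(CATEGORIES)
    for hhid, row in counts.items():
        for j, n in enumerate(row):
            totals[j] += weights[hhid] * n
    return totals


def household_factor(row, factors):
    """Average of the category factors, each repeated once per person counted."""
    n = sum(row)
    if n == 0:
        return 1.0
    return sum(c * f for c, f in zip(row, factors)) / n


def max_rel_error(totals, controls):
    """Largest relative gap between a category total and its control."""
    return max(abs(t - c) / c if c > 0 else abs(t) for t, c in zip(totals, controls))


def reweight(start_weights, counts, controls, tol=1e-4, max_iter=500):
    """Adjust household weights until all totals are within tol (0.01%) of controls."""
    weights = dict(start_weights)  # keep the starting weights untouched
    iterations = 0
    while True:
        totals = category_totals(weights, counts)
        if max_rel_error(totals, controls) <= tol or iterations >= max_iter:
            return weights, iterations
        # Factor per category; empty categories get a neutral factor of 1.
        factors = [c / t if t > 0 else 1.0 for c, t in zip(controls, totals)]
        for hhid, row in counts.items():
            weights[hhid] *= household_factor(row, factors)
        iterations += 1


# Hypothetical starting weights and controls. The controls are built from
# made-up target weights so that a consistent solution is known to exist.
start = {hhid: 10.0 for hhid in COUNTS}
target = {20: 12.0, 49: 8.0, 66: 15.0, 78: 9.0, 86: 11.0, 98: 13.0}
controls = category_totals(target, COUNTS)

new_weights, n_iter = reweight(start, COUNTS, controls)
print(n_iter, {hhid: round(w, 2) for hhid, w in new_weights.items()})
```

Running `reweight` once against base year controls, and then again against future year controls starting from the weights it returned, would mirror the two-stage process described on this page.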
This step is a simple reading in of the desired future year totals by age/occ category, providing the IPF with the controls needed to re-balance the PUMS weights to the future year (aged) populations by age/occ category.
This process is identical to the steps for the base year, except now the starting point is the weights that were adjusted for the base year age and occupation totals, as opposed to raw PUMS weights, and the controls are now the future age and occupation totals. In this SOABM example, it took 99 iterations for PUMA 800 and 78 iterations for PUMA 900 (Jackson County) of the IPF while loop for all 12 age and 6 occupation category totals to come within 0.01% of the total number of people specified per age/occ category.
The adjusted (aged) PUMS data for the model area provides the relationship needed to grow the future population controls in a way that is consistent with the demographic area for the region. The following summarizes how each of the population synthesizer controls for the ABM were impacted by using the aged PUMS weights.
As discussed in the introduction, the population total by age is the one "known" input to the future population synthesis. The total number of households is informed by the population; it is an outcome of the population input. After using the aged PUMS weights, the 2045 SOABM average household size was determined to be 2.2 (345,220/156,752) across the three PUMAs covered by SOABM (which were converted to the original two PUMA system to accommodate UGB level controls). This was a decrease from the 2010 model region average household size of 2.46 (262,899/106,873) and the 2017 regional average household size of 2.32 (267,852/115,275). For SOABM, 345,220 persons were generated for 2045, with a corresponding 157,600 households. This total number of households was then distributed across the MAZs (as informed by land use allocation processes and consultation with local partners). The MAZ totals and the aged PUMS data were then used to proportion the total future households across the various household-level demographics at the MAZ and TAZ level, as discussed in the following sections.
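The household size figures quoted above can be checked with a few lines of arithmetic. The sketch below only uses the population and household totals stated in the paragraph; the variable names are illustrative.

```python
# Population and household totals as quoted in the text above.
population = {2010: 262_899, 2017: 267_852, 2045: 345_220}
households = {2010: 106_873, 2017: 115_275, 2045: 156_752}

avg_hh_size = {year: population[year] / households[year] for year in population}
for year, size in avg_hh_size.items():
    print(year, round(size, 2))

# The text also quotes 157,600 households for 2045; that total gives
# essentially the same rounded household size.
alt_2045_size = population[2045] / 157_600
print(round(alt_2045_size, 2))
```

Both 2045 household totals quoted in the text round to an average household size of about 2.2.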
It should be further noted that the existing year demographic trends by MAZ and TAZ are used as the starting point. The region-wide trends from the aged PUMS analysis are then used (along with the future year total households by MAZ) to adjust the existing year demographic trends into the future. The adjustments are made consistently across all zones in the region.
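One way such a zone-level adjustment could work is sketched below. The page does not spell out the exact calculation, so the ratio-scaling rule here is an assumption made for illustration: each zone's existing share is scaled by the region-wide ratio of future share to existing share, renormalized, and applied to the zone's future household total. The regional shares are the 2017 and 2045 housing type percentages reported on this page; the MAZ shares, the household total, and the function name are hypothetical.

```python
# Region-wide housing type shares reported on this page (2017 and 2045).
REGION_2017 = {"SF_HH": 0.69, "DUPLEX_HH": 0.04, "MF_HH": 0.15, "MH_HH": 0.13}
REGION_2045 = {"SF_HH": 0.62, "DUPLEX_HH": 0.05, "MF_HH": 0.17, "MH_HH": 0.15}


def adjust_zone(zone_shares, region_base, region_future, future_households):
    """Assumed rule: scale zone shares by the regional trend, then renormalize."""
    scaled = {
        k: share * region_future[k] / region_base[k]
        for k, share in zone_shares.items()
    }
    total = sum(scaled.values())
    return {k: future_households * v / total for k, v in scaled.items()}


# Hypothetical MAZ: existing-year shares and a future household total.
maz_shares = {"SF_HH": 0.80, "DUPLEX_HH": 0.00, "MF_HH": 0.10, "MH_HH": 0.10}
maz_future = adjust_zone(maz_shares, REGION_2017, REGION_2045, 200)
print({k: round(v, 1) for k, v in maz_future.items()})
```

Under this assumed rule every zone receives the same relative adjustment, which is one reading of adjustments being "made consistently across all zones", and a zone with no households of a given type stays at zero for that type.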
After the aged PUMS data trends were applied the overall housing type by MAZ shifted as follows:
Year | SF_HH | DUPLEX_HH | MF_HH | MH_HH |
---|---|---|---|---|
Base (2010) | 73% | 3% | 14% | 9% |
Current (2017) | 69% | 4% | 15% | 13% |
Future (2045) | 62% | 5% | 17% | 15% |
This table shows a shift away from single family to other housing types, specifically mobile home.
Year | HHINC1 | HHINC2 | HHINC3 | HHINC4 | HHINC5 | HHINC6 | HHINC7 | HHINC8 |
---|---|---|---|---|---|---|---|---|
Base (2010) | 15% | 13% | 14% | 16% | 19% | 11% | 8% | 4% |
Current (2017) | 15% | 13% | 14% | 16% | 19% | 11% | 8% | 4% |
Future (2045) | 17% | 14% | 13% | 16% | 18% | 9% | 8% | 5% |
This table shows that the aged PUMS data shifts overall income slightly lower.
Year | HHWCHILD | HHWOCHILD |
---|---|---|
Base (2010) | 29% | 71% |
Current (2017) | 29% | 71% |
Future (2045) | 23% | 77% |
This table shows that the aged PUMS data shifts overall to more households without children.
Year | HHWORK0 | HHWORK1 | HHWORK2 | HHWORK3 |
---|---|---|---|---|
Base (2010) | 37% | 35% | 25% | 3% |
Current (2017) | 37% | 35% | 25% | 3% |
Future (2045) | 41% | 32% | 23% | 4% |
This table shows that the aged PUMS data shifts to fewer workers overall, with a large shift to zero worker households. This represents an overall slight shift (reduction) to about 41% of the total population having a full time occupation (~143,000 workers). In 2017, approximately 42% of the population had a full time job.
If the population is only aged (without an occupation control), the overall number of workers decreases. Additionally, the shifts in the above demographics all follow the same trend but are slightly more pronounced. In the configuration described on this page, both age and occupation are controlled for. The result is that the older part of the population that is still working gets higher weights. Because occupation is controlled directly, the table below is not an artifact of the "aging" process, like the tables above. Instead it is just a reflection of the controls provided. It is still presented here to show how the occupational makeup of the region is assumed to shift over time, which is not very much, but there are slight differences (Management-OCCP1 and Blue Collar-OCCP3 slightly increase while White Collar-OCCP2 slightly decreases).
Year | OCCP1 | OCCP2 | OCCP3 | OCCP4 | OCCP5 | OCCP6 |
---|---|---|---|---|---|---|
Base (2010) | 23% | 19% | 12% | 25% | 9% | 12% |
Current (2017) | 23% | 19% | 12% | 25% | 9% | 12% |
Future (2045) | 24% | 17% | 13% | 25% | 9% | 12% |
As part of the review of the results, the extent to which the PUMS weights were adjusted was also reviewed. The following figure has a black line depicting the unaltered PUMS weight for each Southern Oregon (PUMAs 901, 902, and 800, noting 901 and 902 were combined to the 900 PUMA) PUMS household record used, ordered from smallest (left) to largest (right). The red scatter dots around the black line show how each PUMS household weight was altered by the IPF in order to achieve the 2017 age distribution for the model area. The blue dots represent the further adjustment needed to achieve the 2045 age/occupation distribution using the existing PUMS data. One can see that there are two distinct upper adjustment lines in the blue dots. These are likely older households that get used more often via the weights to achieve the older age distribution, and then again older households that still have members in the workforce. The larger population (the majority population) is shifted down, but not as much, as the weight adjustment is distributed over a larger record (population) sample.
An important note for both of the following plots: the weights have been normalized so that each set of weights produces the same number of total households. For an apples-to-apples comparison, all the weights were normalized to the original household total of the three PUMA region so that all the weights sum to the same total, and no overall skewing to a higher or lower total population is present in the comparisons.
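The normalization described here amounts to rescaling each set of weights to a common total. A minimal sketch, with a hypothetical function name and made-up weights:

```python
def normalize_weights(weights, target_total):
    """Rescale weights so they sum to target_total, keeping their proportions."""
    scale = target_total / sum(weights)
    return [w * scale for w in weights]


# Hypothetical example: adjusted weights rescaled to the original household total.
original = [10.0, 15.0, 25.0]   # sums to 50 households
adjusted = [12.0, 18.0, 30.0]   # sums to 60 households before normalization
adjusted_norm = normalize_weights(adjusted, sum(original))
print([round(w, 2) for w in adjusted_norm])
```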
The second plot below shows how the percent change is established when adjusting the weights to 2017 and then increases when adjusting the weights to 2045. The 2017 percent change is within a 100% change, noting that there seems to be a lack of records with very little change (few households clustered around 0% change). On the positive side, that shows that all households are being used in the adjustment to achieve the age and occupation controls. On the downside, it would be preferable if the 5% PUMS sample didn't need to be adjusted to the level shown in the red bars just to achieve the existing year inputs. When pushing the PUMS weights to 2045, the older households are pushed to change by as much as 200%. The younger households are decreased by up to ~100% to compensate for the aging.
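The percent change per household record is a simple comparison of the normalized weights. The sketch below uses hypothetical weights that already sum to the same total; it also illustrates why decreases are bounded at -100% (a weight cannot fall below zero) while increases can exceed 100%.

```python
def percent_change(original, adjusted):
    """Percent change of each adjusted weight relative to its original weight."""
    return [100.0 * (a - o) / o for o, a in zip(original, adjusted)]


# Hypothetical weights for three household records; both sets sum to 70.
original_wgt = [10.0, 20.0, 40.0]
adjusted_wgt = [30.0, 10.0, 30.0]
changes = percent_change(original_wgt, adjusted_wgt)
print(changes)
```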
Overall the adjustments to the PUMS weights seem reasonable given that the weights are being adjusted on two dimensions: the population is aged while the relative percentage of persons in the workforce is assumed to stay nearly unchanged (only a 1% reduction for the region, 42% to 41%).