
Adhering To Population Estimates

Alex Bettinardi edited this page Sep 19, 2019 · 12 revisions

Introduction

In the state of Oregon, OAR Chapter 660 Division 32, Population Forecasts, specifies that the population forecasts from the Population Research Center (PRC) at Portland State University need to be used in most transportation planning related efforts. Therefore, an important part of developing the ABM's synthetic population input is ensuring that the synthetic population adheres to the PRC population forecasts and estimates to the extent possible.

As is summarized on the population synthesizer page, the ABM's total synthetic population is made up of:

  • The General Population (GP),
  • The Group Quarters (GQ) Population, and
  • A Visitor Population.

As is documented on PRC's page:

"PSU’s Population Research Center produces annual population estimates for Oregon and its counties and incorporated cities using the most recent available data. These estimates are based on fluctuations in the numbers of housing units, persons residing in group quarter facilities, births and deaths, students enrolled in public school, persons employed, Medicare enrollees, State and Federal tax exemptions, Oregon driver license holders as well as counts in other administrative data that are symptomatic of population change."

Therefore, the PRC population totals by jurisdiction include both the general population and group quarters, but not the visitor population. This definition is important. GP and GQ are developed separately, but work must be done to ensure that their population totals by region align with the PRC values. To do this the following work flow for developing a complete synthetic population input for the ABM has been specified.

Population Synthesis Development Work Flow

Here is the quick overview work flow for developing the ABM's synthetic population:

  1. Develop the Population controls (from PRC) by jurisdiction by gender by age and PUMA.
  2. Generate the Group Quarters Population
  3. Create a new set of General Population controls where GQ totals have been removed
  4. Generate the General Population
  5. Verify that the population controls were met within a set tolerance (currently ODOT has specified that the GP output must be within 5% of the population totals by jurisdiction)
  6. Generate the Visitor Population based on General Population inputs (can be done out of order)
  7. Combine GQ, GP, and visitor populations for the complete input to the ABM.

Each of these steps is described in greater detail below.

Develop Population Controls

There are several controls related to population that are used in generating the synthetic population for the ABM. However, OAR 660 only specifies that published population forecasts by jurisdiction need to be used. The PopulationSim tool is designed to take a variety of control information, and PRC provides county-level projections of population by age and gender, which are two important aspects of creating a realistic synthetic population for the region. Therefore, the guidance captured here is that population by jurisdiction, age, and gender should be used in developing the controls related to population. This is done at two levels of geography. At the UGB level, PRC provides UGB-specific population numbers at 5-year intervals (which can be interpolated as needed). These are the controls that are required to be met. In testing with PopulationSim, ODOT determined that it was not very effective to control on the number of people alone; the tool reacts better to specific types of people (fields in the dataset, besides just the count of records). So gender information (male / female splits available at the county level) was applied to split the total population by UGB into male and female population by UGB, resulting in a UGB-level control similar to the example below.

UGB_NAME       UGB   MALE  FEMALE
Ashland          1  11478   12139
Central Point    2  13876   14677
Eagle Point      3   7083    7492
Gold Hill        4    677     715
Jacksonville     5   2095    2216
Medford          6  53921   57029
Phoenix          7   2946    3117
Rogue River      8   1711    1810
Talent          10   4156    4395
Jackson Co      11  29258   30944
Grants Pass     12  26390   27438
Josephine Co    13  14540   15118
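The interpolation and gender split described above can be sketched roughly as follows. This is only an illustration: the forecast values, the male share, and all object names (`forecast`, `modelYear`, `maleShare`, `ugbControl`) are made up for the example and are not PRC figures or part of the ODOT scripts.

```r
# Hypothetical 5-year UGB forecasts (illustrative numbers, not PRC data)
forecast <- data.frame(YEAR = c(2015, 2020, 2025),
                       POP  = c(23000, 23600, 24400))

# Linearly interpolate between forecast years to get the model-year total
modelYear <- 2017
popTotal <- approx(forecast$YEAR, forecast$POP, xout = modelYear)$y

# Hypothetical county-level male share, applied to the UGB total
maleShare <- 0.486
male   <- round(popTotal * maleShare)
female <- round(popTotal) - male   # remainder, so MALE + FEMALE equals the total

# One row of a UGB-level control table
ugbControl <- data.frame(UGB_NAME = "Example UGB", UGB = 1,
                         MALE = male, FEMALE = female)
```

Assigning the remainder to one gender, rather than rounding both independently, keeps the two columns summing to the interpolated total.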

At the county level, PRC also provides population by age by forecast year. This information is used to tabulate persons by PUMA by age group as another population control. These two population controls (at the PUMA and UGB levels) make up the population controls provided to the PopulationSim tool for the ABM.

PUMA AGE1 AGE2 AGE3 AGE4 AGE5 AGE6 AGE7 AGE8 AGE9 AGE10 AGE11 AGE12
800 4515 5898 2609 1699 4993 7133 8137 10721 12702 11257 8640 5181
901 8239 10383 4573 3051 10074 14268 14875 18328 19268 16385 12328 7731
902 7219 9097 4006 2674 8826 12502 13033 16059 16883 14357 10802 6774
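A minimal sketch of tabulating persons into the twelve age-group columns, assuming the same age breaks that the GQ tabulation script on this page passes to `cut()`, and assuming counts by single year of age are available for the PUMA. The input counts and the names `ages`, `counts`, and `pumaControl` are invented for illustration.

```r
# Hypothetical single-year-of-age counts for one PUMA (illustrative only)
ages   <- 0:100
counts <- rep(100, length(ages))

# Age-group breaks as used in the GQ tabulation script on this page
breaks <- c(-1, 5, 12, 15, 17, 24, 34, 44, 54, 64, 74, 84, 99999)
ageGroup <- cut(ages, breaks = breaks, labels = paste0("AGE", 1:12))

# Sum persons within each age group to form one row of the PUMA control
pumaControl <- tapply(counts, ageGroup, sum)
```

If PRC age data come in 5-year groups rather than single years, some allocation across the break points would be needed first; that step is not shown here.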

Generate Group Quarters Population

The population controls above are provided by PRC as a total permanent population that includes both group quarters (non-institutionalized) and the general population. The controls do not apply to either GQ or GP individually, and running GQ and GP together is a very difficult accounting exercise and is not advised. Therefore, the next step after generating the controls is to generate the group quarters synthetic population records, which are basically just a listing of persons by MAZ for the region. Given the limited number of controls, these population estimates are achieved nearly perfectly. The output is then used to inform the next step of revising the population controls for the general population PopulationSim run.

Create Revised Population Controls

With a solution from the Group Quarters run, the exact number of GQ persons by UGB and gender, as well as by PUMA and age, can be easily tabulated. This tabulation can then be subtracted from the overall population controls developed from PRC to create new control totals for just the general population. See the example R script below.

# script to tabulate GQ output and subtract it from total population controls for the GP run

# read GQ output
gqPer <- read.csv("PopSim_GQ_SOABM_2017/output/synthetic_persons.csv",as.is=T)

# read CW (to UGB)
cw <- read.csv("geo_cross_walk.csv",row.names=1,as.is=T)

# add UGB to gq
gqPer$UGB <- cw[as.character(gqPer$MAZ),"UGB_NAME"]

# tabulate ugb totals
ugbGQ <- table(gqPer$UGB, c("MALE","FEMALE")[gqPer$SEX])

# tabulate puma totals
pumaGQ <- table(gqPer$PUMA,cut(gqPer$AGEP,breaks=c(-1,5,12,15,17,24,34,44,54,64,74,84,99999),labels=paste0("AGE",1:12)))

# read in the Full Controls
ugb <- read.csv("ugbData.csv",as.is=T)
puma <- read.csv("metaData.csv",as.is=T)

# write out updated controls for just the general population (minus the GQ)
# only UGBs that appear in the GQ tabulation need to be adjusted
hasGQ <- ugb$UGB_NAME %in% rownames(ugbGQ)
sexCols <- c("MALE","FEMALE")
ugbOut <- ugb
ugbOut[hasGQ,sexCols] <- ugb[hasGQ,sexCols] - ugbGQ[ugb$UGB_NAME[hasGQ],sexCols]
write.csv(ugbOut, "Populationsim_SOABM_GP_UGB/data/ugbData.csv",row.names=F)

pumaOut <- puma
rownames(pumaOut) <- puma$PUMA
pumaOut[rownames(pumaGQ), colnames(pumaGQ)] <- pumaOut[rownames(pumaGQ), colnames(pumaGQ)] - pumaGQ
write.csv(pumaOut, "Populationsim_SOABM_GP_UGB/data/metaData.csv",row.names=F) 
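Before running the general population synthesis, it may be worth confirming that the subtraction did not produce negative controls and that the GQ and revised GP totals add back to the original control. The sketch below uses small made-up tables; the names `full`, `gq`, and `gp` are illustrative stand-ins for the full controls, the GQ tabulation, and the revised controls.

```r
# Toy stand-ins for the full controls, the GQ tabulation, and the revised controls
sexCols <- c("MALE", "FEMALE")
full <- data.frame(UGB_NAME = c("A", "B"), MALE = c(100, 50), FEMALE = c(110, 60))
gq   <- data.frame(UGB_NAME = c("A", "B"), MALE = c(5, 0),    FEMALE = c(3, 0))
gp   <- full
gp[, sexCols] <- full[, sexCols] - gq[, sexCols]

# No control should go negative after removing GQ
anyNegative <- any(gp[, sexCols] < 0)

# GP + GQ should reproduce the original total
totalsMatch <- sum(gp[, sexCols]) + sum(gq[, sexCols]) == sum(full[, sexCols])
```

A negative value would suggest that the GQ run placed more persons in a geography than the PRC control allows, which would need to be resolved before the GP run.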

Generate the General Population

With population controls for just the general population, the general population synthesis can now be run. In ODOT's experience, the household controls at the MAZ and TAZ level are developed without including GQ, so there should be no reason to revisit those. But it is important to verify that the household controls for the general population were indeed developed without including the group quarters units / population.

Verify Population Controls have been Met

After the general population synthesis is complete, the resulting synthetic population output needs to be compared against the PRC UGB controls. This can be done for either the total population (GQ + GP) or just the GP portion, since the GQ population was removed from the controls through code in a way that is assumed to be error-proof. ODOT has specified that the PopulationSim output must be within 5% of the total population control at the UGB level. The following is example code that can be used to verify that this has been met.

# First read persons output
per <- read.csv("output/synthetic_persons.csv",as.is=T)

# read the ugb totals that need to be achieved
ugb <- read.csv("data/ugbData.csv",as.is=T,row.names=1)
# Add total population column
ugb$Total <- ugb$MALE + ugb$FEMALE

# read in crosswalk table
cw <- read.csv("data/geo_cross_walk.csv",as.is=T,row.names=1)

# add UGB field to the persons record
per$UGB <- cw[as.character(per$MAZ),"UGB_NAME"]
perUGB <-table(per$UGB)

# plot result
png("PRC_UGB_Compare.png")

par(mar=c(7,4,4,2)+0.1)
plot(ugb$Total, main="Comparison of PRC UGB Population Controls vs Results", ylab="Total Population", xlab="",axes=F, ylim=range(ugb$Total)+(c(-.1,.1)*max(ugb$Total)))
box()
axis(1,1:nrow(ugb),rownames(ugb), las=2)
axis(2)
arrows(1:nrow(ugb),ugb$Total,1:nrow(ugb),perUGB[rownames(ugb)], col="red",angle=90, length=nrow(ugb)/100)
text(1:nrow(ugb),((perUGB[rownames(ugb)]+ugb$Total)/2)+ifelse(abs(perUGB[rownames(ugb)]-ugb$Total)<5000,5000,0),round(perUGB[rownames(ugb)]-ugb$Total))
text(1:nrow(ugb),((perUGB[rownames(ugb)]+ugb$Total)/2)-5000,paste0(round(100*(perUGB[rownames(ugb)]-ugb$Total)/ugb$Total),"%"))
dev.off()

# analytics if there are issues 
# comparing assumed UGB average household size by UGB from the population control versus the average household size from the TAZ level control.

hhUGB <-tapply(per$household_id,per$UGB, function(x) length(unique(x)))
perUGB/hhUGB

# read in taz data
taz <- read.csv("data/tazData.csv",as.is=T,row.names=1)
tazUGBcw <- tapply(cw$UGB_NAME, cw$TAZ, unique)
Check <- unlist(lapply(tazUGBcw,length))
# report the TAZ ids (not the counts) that map to more than one UGB
if(any(Check>1)) print(paste("The following TAZs have more than one UGB:", paste(names(tazUGBcw)[Check>1], collapse=", ")))
taz$UGB <- tazUGBcw[rownames(taz)]

HHs <- tapply(rowSums(taz[,paste0("HHSIZE",1:4)]),taz$UGB,sum)

ugb$HHs <- HHs[rownames(ugb)]
ugb$avgHHsize <-  ugb$Total/ugb$HHs 

HHs <- tapply(rowSums(sweep(taz[,paste0("HHSIZE",1:4)],2,c(1:3,4.5),"*")),taz$UGB,sum)/HHs

ugb$TAZinputAvgHHsize <- HHs[rownames(ugb)]

write.csv(ugb,"Ugb_HHsize_Analysis.csv")
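In addition to the plot, the 5% tolerance can be checked numerically. The following is a small sketch with made-up totals; in practice `control` would correspond to `ugb$Total` and `synth` to `perUGB[rownames(ugb)]` from the script above. The names `control`, `synth`, `tolerance`, and `outOfTol` are illustrative.

```r
# Toy control totals and synthesized totals by UGB (illustrative values)
control <- c(A = 1000, B = 2000, C = 500)
synth   <- c(A = 1030, B = 1880, C = 505)

# Percent difference of the synthetic population relative to the control
pctDiff <- 100 * (synth[names(control)] - control) / control

# Flag UGBs that fall outside the 5% tolerance
tolerance <- 5
outOfTol <- names(pctDiff)[abs(pctDiff) > tolerance]
```

An empty `outOfTol` would indicate that every UGB meets the tolerance; any names it contains point to the UGBs to investigate with the household size analytics above.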

Generate the Visitor Population

There is a detailed process to create the visitor population. That process likely builds off of the number of households by MAZ. If it does (check that documentation to be sure), and the number of general population households was altered in the previous step, then the visitor population needs to be re-run here using the new household information from the general population. If either the visitor population is not impacted by total households at the MAZ level, or MAZ-level households have not been changed, then the visitor population can be created at an earlier step. Assuming that it has been created, the modeler can move on to the final step.
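One way to decide whether the visitor population needs to be re-run is to compare household counts by MAZ between the household table the visitor model was built from and the newly synthesized one. This is a sketch on toy data, assuming both tables carry `household_id` and `MAZ` columns as in the PopulationSim output used elsewhere on this page; `hhOld`, `hhNew`, and `changedMaz` are hypothetical names.

```r
# Toy household tables from two runs (illustrative only)
hhOld <- data.frame(household_id = 1:5, MAZ = c(101, 101, 102, 103, 103))
hhNew <- data.frame(household_id = 1:6, MAZ = c(101, 101, 102, 102, 103, 103))

# Household counts by MAZ over the union of MAZs present in either run
mazIds <- sort(unique(c(hhOld$MAZ, hhNew$MAZ)))
oldCnt <- table(factor(hhOld$MAZ, levels = mazIds))
newCnt <- table(factor(hhNew$MAZ, levels = mazIds))

# MAZs whose household count changed between the two runs
changedMaz <- mazIds[as.vector(oldCnt) != as.vector(newCnt)]
```

A non-empty `changedMaz` suggests the visitor population should be regenerated if it depends on MAZ-level households.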

Combine GQ, GP, and Visitor Populations

As a final step, the Group Quarters, General Population, and Visitor Populations need to be combined into a single table to input into the ABM. The following script is an example of what that combining process might look like, along with post-processing steps that might be needed to get the population synthesis output to align with the synthetic population fields and definitions that the ABM assumes.

# A script to combine the GQ population with the General Population to create
# a complete Syn Pop input for the ABM.

# set general population directory
gpDir <- "Populationsim_SOABM_GP_UGB/output"

# set group quarters directory
gqDir <- "PopSim_GQ_SOABM_2017/output"

# set combined output directory
outDir <- "Populationsim_SOABM_2045_Complete"

# set visitor directory
visDir <- "VistorModel"

####################
# Household table
####################

# read in household tables
hh <- read.csv(paste0(gpDir,"/synthetic_households.csv"),as.is=T)
hhgq <- read.csv(paste0(gqDir,"/synthetic_households.csv"),as.is=T)

# add required fields
if(is.null(hhgq$WGTP)) hhgq$WGTP <- hhgq$GQWGTP 
hhgq$TAZ <- as.integer(as.numeric(substring(hhgq$MAZ,1,nchar(hhgq$MAZ)-2)))

hh$GQFLAG <- 0
hh$GQTYPE<- 0

# define required fields
hhCols <-        c("PUMA","taz","maz", "WGTP","serialno","gqflag","gqtype","htype","nwrkrs_esr","hhincadj","hhchild","np","hincp","ten","bld","adjinc","veh","hht","type","npf","hupac","hhid")
names(hhCols) <- c("PUMA","TAZ","MAZ", "WGTP","SERIALNO","GQFLAG","GQTYPE","HTYPE","NWESR",     "HHINCADJ","hhchild","NP","HINCP","TEN","BLD","ADJINC","VEH","HHT","TYPE","NPF","HUPAC","hhid")

# update hhid field
rownames(hh) <- hh$household_id
rownames(hhgq) <- hhgq$household_id
hh$hhid <- 1:nrow(hh)
hhgq$hhid <- (1:nrow(hhgq))+nrow(hh)


###########################
# Persons table
###########################

# read in persons tables
per <- read.csv(paste0(gpDir,"/synthetic_persons.csv"),as.is=T)
pergq <- read.csv(paste0(gqDir,"/synthetic_persons.csv"),as.is=T)

# add required fields
# the person weight field selected below is lowercase "wgtp", so test for that name
if(is.null(pergq$wgtp)) pergq$wgtp <- pergq$gqwgtp
pergq$TAZ <- as.integer(as.numeric(substring(pergq$MAZ,1,nchar(pergq$MAZ)-2)))
if(is.null(pergq$GQTYPE)) pergq$GQTYPE <- hhgq[as.character(pergq$household_id),"GQTYPE"]
pergq$majoruni <-0
pergq$hhid <- hhgq[as.character(pergq$household_id),"hhid"]

# convert SCHG to ABM specified definitions:
# https://github.com/RSGInc/SOABM/wiki/Running-the-Population-Synthesizer#person-file-format
schg <- c(-8,1,2,rep(3,4),rep(4,4),rep(5,4),6,7)
names(schg) <- c(-8, 1:16)
pergq$SCHG <- schg[as.character(pergq$SCHG)]

per$GQFLAG <- 0
per$GQTYPE<- 0
per$majoruni <-0
per$hhid <- hh[as.character(per$household_id),"hhid"]

# define required fields
perCols <-        c("PUMA","taz","maz", "WGTP","serialno","sporder","employed","soc","occp","gqflag","gqtype","agep","sex","wkhp","esr","schg","wkw","mil","schl","majoruni","hhid")
names(perCols) <- c("PUMA","TAZ","MAZ", "wgtp","SERIALNO","per_num","employed","soc","OCCP","GQFLAG","GQTYPE","AGEP","SEX","WKHP","ESR","SCHG","WKW","MIL","SCHL","majoruni","hhid")

######################################
# combine Person and Household tables
######################################

# combine general population and group quarter households
hh <- rbind(hh[,names(hhCols)],hhgq[,names(hhCols)])
names(hh) <- hhCols

# combine general population and group quarter persons
per <- rbind(per[,names(perCols)],pergq[,names(perCols)])
names(per) <- perCols

# order person table
per <- per[order(per$hhid,per$sporder),]

# clean up NA fields
hh$hincp[is.na(hh$hincp)] <- -8
hh$ten[is.na(hh$ten)] <- -8
hh$bld[is.na(hh$bld)] <- -8
hh$veh[is.na(hh$veh)] <- -8
hh$hht[is.na(hh$hht)] <- -8
hh$npf[is.na(hh$npf)] <- -8
hh$hupac[is.na(hh$hupac)] <- -8

per$wkhp[is.na(per$wkhp)] <- -8
per$esr[is.na(per$esr)] <- -8
per$schg[is.na(per$schg)] <- -8
per$wkw[is.na(per$wkw)] <- -8
per$mil[is.na(per$mil)] <- -8
per$schl[is.na(per$schl)] <- -8

# Add a visitor tag (0 = resident)
hh$visitor_flag <- 0
per$visitor_flag <- 0

# write out resident hh and person tables
write.csv(hh,paste0(outDir,"/households_residents.csv"), row.names=F)
write.csv(per,paste0(outDir,"/persons_sorted_uni_residents.csv"),row.names=F)

                           
###########################################
# Add in Visitors, renumber and writeout
###########################################

# read in household tables
hhVis <- read.csv(paste0(visDir,"/traveler_households.csv"),as.is=T)
rownames(hhVis) <- hhVis$hhid

# read in persons tables
perVis <- read.csv(paste0(visDir,"/traveler_persons.csv"),as.is=T)

# Add a visitor tag (1 = visitor)
hhVis$visitor_flag <- 1
perVis$visitor_flag <- 1

# update hhid field to start at the end (bottom) of the GP+GQ household table
hhVis$hhid <- (1:nrow(hhVis))+nrow(hh)

# move revised household number over
perVis$hhid <-  hhVis[as.character(perVis$hhid),"hhid"]

# update schg field
perVis$schg <- schg[as.character(perVis$schg)]
rm(schg)

# combine general population and group quarter households with visitor (traveler) households
hh <- rbind(hh,hhVis[,names(hh)])

# combine general population and group quarter persons with visitor (traveler) persons
per <- rbind(per,perVis[,names(per)])

# order person table
per <- per[order(per$hhid,per$sporder),]

# adding auto operating cost variables to the household table 6-11-19 AB
hh$fuelcost <- 12.4 # in 2010 cents per mile
hh$maintaincost <- 5.6 # in 2010 cents per mile

# convert SCHL to ABM specified definitions:
# https://github.com/RSGInc/SOABM/wiki/Running-the-Population-Synthesizer#person-file-format
schl <- c(-8,1,rep(2,6),rep(3,2),rep(4,2),5:8,rep(9,2),10:16)
names(schl) <- c(-8, 1:24)
per$schl <- schl[as.character(per$schl)]
rm(schl)

# Trim TAZ field string if needed (for UGB crosswalk issues) 7-23-19 AB
flag <- nchar(hh$taz)
if(any(flag>4)) hh[flag>4,"taz"] <- substring(hh[flag>4,"taz"],1,flag[flag>4]-2)
rm(flag)

# write out the complete hh and person tables
write.csv(hh,paste0(outDir,"/households.csv"), row.names=F)
write.csv(per,paste0(outDir,"/persons_sorted_uni.csv"),row.names=F)
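After combining, a few integrity checks on the final tables can catch renumbering mistakes before the files are handed to the ABM. The sketch below runs on toy tables; `hhAll` and `perAll` are hypothetical stand-ins for the final `hh` and `per` tables, and the specific checks are suggestions rather than ABM requirements.

```r
# Toy combined tables (illustrative only)
hhAll  <- data.frame(hhid = 1:4, visitor_flag = c(0, 0, 0, 1))
perAll <- data.frame(hhid = c(1, 1, 2, 3, 4, 4), sporder = c(1, 2, 1, 1, 1, 2),
                     visitor_flag = c(0, 0, 0, 0, 1, 1))

# Household ids should be unique and run from 1 to the number of households
idsOk <- !anyDuplicated(hhAll$hhid) &&
  all(sort(hhAll$hhid) == seq_len(nrow(hhAll)))

# Every person should reference an existing household,
# and (assumed here) every household should have at least one person
personsOk <- all(perAll$hhid %in% hhAll$hhid) &&
  all(hhAll$hhid %in% perAll$hhid)

# Visitor flags should agree between each person and that person's household
flagsOk <- all(perAll$visitor_flag ==
                 hhAll$visitor_flag[match(perAll$hhid, hhAll$hhid)])
```

Any `NA` in `perAll$hhid` would also show up as a failed `personsOk`, which is a common symptom of a household id lookup that did not match.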