Project Meeting 2020.10.27

Technical Call

Continue discussion on TVPB implementation
- Bill and Joel talked: the memory needs for TVPB are vectors for maz-tap and matrices for tap-tap, so the memory implications are smaller than discussed last week
- A primary motivation for TVPB is to use the more refined spatial system for better modeling of non-motorized travel
- MTC has a liberal maz-tap ratio that has not really been optimized on the network side
- So max walk of 3 miles is generous but trimmed to 1.2 miles in ct-ramp
- MTC has the same TVPB code as SANDAG
- Important to include the tapLines file, which lists lines served by TAP in order to prune the maz-tap possibilities list. Taps further away without new service are dropped from consideration
- TM2 transit network is pretty disaggregate since it was based on GTFS and therefore has route variations (skipped stops) and so lots of routes are retained
- The transit routes are currently being rebuilt to make the routes more planning-like (i.e. abstract)
- For efficiency, maz-tap and tap-tap utilities are calculated just once and on-demand (and could be pre-calculated)
- Code then loops across possible access and egress tap pairs, adds the already calculated path utility components, ranks the tap-pairs, and selects the best N
- If transit is selected, it then makes a choice from the best N
- It uses generic utilities for ranking and then re-calcs person specific utilities in mode choice for just the N best
- We might have a formal write-up on the design from when we spec'd this out with Dave
- Follow-up - here are the TM2 design papers; I don't see a really useful doc
- In terms of pruning the possible paths, the tapLines idea and also skipping tap-tap pairs with IVT==0 can be done
- The pre-exponentiation of utilities was done in the old SANDAG version but not in the current version for TM2, SANDAG, CMAP since we introduced the person specific calculations
- If we pre-define market segments (which by the way we have done for asim) then we could exponentiate
- What we need is a speedy ranking procedure, good logic to skip non-relevant tap pairs, and pre-calculating utility components
- Doyle's understanding of the problem is in line with Joel's
- We have implemented the tapLines functionality
- There is no current max walk distance, but we could add this as a setting (say 1.2 miles like Marin)
- We're not currently caching maz-tap calculations since they are very small and already super fast
- But if the maz-tap calculations get more complex, then we may want to cache
- tap-tap utility is being computed on demand and saved to a cache for future use
- Asim code is working in chunks and so we need to de-duplicate calculations within chunks; this is now working
- You can also retain your cache for a subsequent run if desired
- We're not doing person specific utilities, just using pre-defined market (demographic) segments, as spec'd
- We've not implemented the optimization of skipping tap pairs with no IVT
- CT-RAMP had a UEC feature where it skipped the 2+ alternatives if an expression NA'd alt 1 and the expression applies to all alts
- Maybe we add a tap-tap utility filter expression file; like a constraint matrix in EMME
- This would be a good generic improvement that applies to all activitysim expression solving
- TM2 has a pre-processor to turn off duplicate tap pairs across skim sets as well
- Testing on both Marin and SF county since they have different maz-tap ratios
- Currently implemented optimizations with runtimes
  - Test example - 18 minutes
  - plus Remove redundant calcs within chunks - 7 minutes
  - plus tap-tap caching - 4 minutes
  - plus tap line pruning - 45 seconds
  - All together - 23 seconds
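The within-chunk de-duplication and on-demand tap-tap caching listed above can be sketched as follows. This is a minimal illustration in plain Python, not the ActivitySim implementation: `TapTapCache`, `toy_utility`, and the tap ids are hypothetical names and values made up for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

TapPair = Tuple[int, int]  # (origin tap, destination tap)


class TapTapCache:
    """Hypothetical on-demand cache of tap-tap utilities."""

    def __init__(self, compute: Callable[[TapPair], float]) -> None:
        self._compute = compute
        self._cache: Dict[TapPair, float] = {}
        self.n_computed = 0  # number of actual utility evaluations

    def utilities(self, chunk: Iterable[TapPair]) -> List[float]:
        pairs = list(chunk)
        # De-duplicate within the chunk, then compute only the pairs
        # that earlier chunks have not already cached.
        for pair in set(pairs):
            if pair not in self._cache:
                self._cache[pair] = self._compute(pair)
                self.n_computed += 1
        # Return one utility per requested row, in the original order.
        return [self._cache[pair] for pair in pairs]


def toy_utility(pair: TapPair) -> float:
    """Stand-in for the real (expensive) tap-tap utility expressions."""
    otap, dtap = pair
    return -0.1 * abs(otap - dtap)


cache = TapTapCache(toy_utility)
first = cache.utilities([(1, 5), (1, 5), (2, 3)])  # 3 rows, 2 unique pairs
second = cache.utilities([(2, 3), (4, 4)])         # (2, 3) is already cached
print(cache.n_computed)
```

Five rows are requested across the two chunks, but the utility function is only evaluated for the three unique pairs.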
- Arrow/feather being used for the caching
- Saves a lot of memory and runs super fast
- The question now is how to implement a dynamic and growing cache across multiple processes
- Memory requirements / performance / synchronization across processes and the need to avoid blockages
- Can store tables in memmapped way on disk to free up RAM
- May work for other shared data - skims and shadow prices
- Basically use arrow for shared memory objects
- Arrow will likely be the replacement for the pandas backend
- pandas uses numpy as a backend today
- Here's the key article from the pandas creator that spawned arrow; thanks Stefan
- arrow includes things like native support for null values, better support for columns of different types, etc; it's basically a better pandas backend
- arrow is in-memory like pandas, for tables, but has no helper functions
- feather is the file format; it is super smart and uses memory mapping
- arrow arrays are 1D, so to wrap with numpy you reshape on demand
- This would add new dependencies to activitysim, which we need to be mindful of
- Next steps
  - ct-ramp behavior has been replicated
  - figure out how to multiprocess and share / update data
  - maybe arrow/feather
  - maybe replace string operations with factors for more efficient data storage in numpy
    - factors not supported in pandas hdf5 storage so would need to wrap I/O currently
    - need to know universe of factors when creating
    - factors have better support in arrow
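The factor idea in the next steps above amounts to storing small integer codes in place of repeated strings. A minimal sketch in plain Python follows, assuming a made-up set of mode labels; `encode_factors`, `decode_factors`, and `MODES` are hypothetical names. In pandas the analogous built-in type is `Categorical`.

```python
from array import array
from typing import Dict, List, Sequence


def encode_factors(values: Sequence[str], categories: Sequence[str]) -> array:
    """Map each string to its integer code.

    The universe of categories must be known up front; a value outside
    it raises KeyError.
    """
    code_of: Dict[str, int] = {cat: i for i, cat in enumerate(categories)}
    # 'B' stores unsigned 1-byte codes, enough for up to 256 categories.
    return array("B", (code_of[v] for v in values))


def decode_factors(codes: array, categories: Sequence[str]) -> List[str]:
    """Recover the original strings from the integer codes."""
    return [categories[c] for c in codes]


MODES = ["WALK", "DRIVE", "WALK_TRANSIT"]  # hypothetical category universe
trip_modes = ["WALK", "WALK_TRANSIT", "WALK", "DRIVE"]

codes = encode_factors(trip_modes, MODES)
print(codes.tolist())    # integer codes, one per row
print(codes.itemsize)    # bytes used per row
print(decode_factors(codes, MODES) == trip_modes)
```

Each row costs one byte regardless of the label length, which is the storage saving the notes refer to; the price is that the category list has to be fixed before encoding.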
- We're at the point where we're trying the few possible good ideas based on the abstract architecture design
- Jeff Newman says using arrow for skims really works
- He treated them as a column and reshaped with I/O
- Can't compress on disk so files same size as in memory
- Here's Newman's prototype; thanks Jeff
- Going with an on-demand approach since geographic organization doesn't really work
- TM2 disaggregate accessibilities will eventually use this code as well
- Basic idea - create a small carefully controlled synpop that covers the markets, run the models to get the destination choice logsums, and then use these instead of the aggregate accessibilities
- This is beyond this exercise, but it's a good idea since it means consistent mode choice models for accessibility and actual mode choice, and it is planned for semcog
- Jeff to soon share example with me so I can start comparing results to Marin TM2 and Jeff can continue with performance tuning
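The tap-pair ranking discussed in this item (loop over access and egress tap pairs, add the already calculated utility components, skip pairs with IVT == 0, keep the best N) can be sketched as below. This is an illustration under stated assumptions rather than the CT-RAMP or ActivitySim code: the function name, the dictionary inputs, and all tap ids and utility values are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

TapPair = Tuple[int, int]  # (access tap, egress tap)


def best_n_tap_pairs(
    access_util: Dict[int, float],        # maz-tap utility by access tap
    egress_util: Dict[int, float],        # tap-maz utility by egress tap
    tap_tap_util: Dict[TapPair, float],   # pre-calculated tap-tap utility
    tap_tap_ivt: Dict[TapPair, float],    # in-vehicle time by tap pair
    n: int,
) -> List[Tuple[float, TapPair]]:
    """Rank tap pairs by total generic utility and keep the best n."""
    candidates: List[Tuple[float, TapPair]] = []
    for atap, a_util in access_util.items():
        for etap, e_util in egress_util.items():
            pair = (atap, etap)
            # Skip pairs with no in-vehicle time, i.e. no transit path.
            if tap_tap_ivt.get(pair, 0.0) == 0.0:
                continue
            total = a_util + tap_tap_util[pair] + e_util
            candidates.append((total, pair))
    # Highest utility first.
    return heapq.nlargest(n, candidates)


access_util = {10: -1.0, 11: -2.0}
egress_util = {20: -0.5, 21: -1.5}
tap_tap_util = {(10, 20): -3.0, (10, 21): -1.0, (11, 20): -0.5, (11, 21): -4.0}
tap_tap_ivt = {(10, 20): 12.0, (10, 21): 8.0, (11, 20): 0.0, (11, 21): 15.0}

best = best_n_tap_pairs(access_util, egress_util, tap_tap_util, tap_tap_ivt, n=2)
print(best)  # [(-3.5, (10, 21)), (-4.5, (10, 20))]
```

Pair (11, 20) has the best raw tap-tap utility but is dropped because its IVT is zero. In the design described above, the mode choice step would then choose among the returned N pairs.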
Discuss CDAP larch integration progress
- EDB larch reader working and notebook drafted
- Doyle to update the cdap coefficient files and code so we have named coefficients as opposed to just values
- Do this after TVPB is in a good place
- Will do our best to transform duplicate values into one coefficient so estimation is more stable
- Joel to share an example of the CDAP model since it's complicated; thanks Joel for sharing the slides via email
- Still need to write out the updated coefficient file; waiting on Doyle to update the format and then will implement
- Now turn to the non-mandatory tour frequency model, which is the only model that implements the interaction_simulate EDB
Discuss ARC progress, questions, etc.
- Everything stood up, including new trip scheduling choice submodel, trip departure choice submodel, and cbd parking location submodel
- ARC model is running from start to finish!
- The trip departure choice model is very slow at this point (it builds many alternatives); still working on performance
- Need to create test cases for all three models for contribution
- Have code and docs done
- PSRC's RAM issues were actually chunk size related
- ARC is running slower than expected; maybe due to chunk size?
- The adaptive chunker should help here; it's in the multi-zone branch
Joel to join next week to discuss telecommuting
