Project Meeting 2018.11.16

Ben Stabler edited this page Nov 16, 2018 · 14 revisions

Multiprocessing

  • Work to date
    • Lots of improvements for debugging/tracing/logging/etc. while we continue to optimize/understand runtime performance
    • Can now run in separate instances (on separate machines if desired) and coalesce into one pipeline file
    • New stride run option to run partitions of the households - for example, slice households into 5 samples and run only the first
    • A single 1/5th stride run with multiprocessing takes 52 minutes and uses 1/5th of the CPUs and 1/5th of the RAM
    • This doesn't scale on one machine, since running two strides at once takes 72 minutes
    • But we could run 5 strides simultaneously on separate machines for a complete run in 52 minutes (this would need a distributed management setup and runner, which is not scoped)
    • Good article about pandas memory usage issues
  • Next steps
    • Maybe try a different low-level C shared library, like OpenBLAS instead of MKL
    • NumPy is doing some memory management outside of Python garbage collection, which looks suspicious
    • Working on running a few big cloud-based runs using our Azure DevOps account
    • Next, try two strides at once on Linux, since it may behave differently
    • Maybe try single server but separate virtual machines
  • Wrapping up
    • Current best 100% sample run is 130 minutes with 20 processors on a single server
    • Existing TM1 runs at MTC are ~4 to 5 hours for a 50% sample
    • Working toward a deployment recommendations memo based on our findings
    • We're really targeting two setups - a single server at an agency and/or cloud-based
    • Plan to wrap up the task by the end of next week
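
The stride option noted under "Work to date" can be illustrated with a minimal sketch. This is not the project's actual implementation: the function name stride_slice, the household ids, and the partition count are hypothetical, and it assumes a stride simply takes every Nth household starting from a given offset. It shows that 5 strides cover each household exactly once, which is what lets separate runs be coalesced afterwards.

```python
def stride_slice(household_ids, num_strides, stride_index):
    """Return every num_strides-th household, starting at stride_index.

    Hypothetical helper for illustration only; names and behavior are assumed.
    """
    if not 0 <= stride_index < num_strides:
        raise ValueError("stride_index must be in [0, num_strides)")
    return household_ids[stride_index::num_strides]


# Made-up household ids; a real run would take these from the households table.
household_ids = list(range(1, 24))  # 23 households
num_strides = 5

# One partition per stride; each could be run as a separate instance.
partitions = [
    stride_slice(household_ids, num_strides, i) for i in range(num_strides)
]

# Coalescing the partitions recovers every household exactly once.
coalesced = sorted(hh for part in partitions for hh in part)
assert coalesced == household_ids

print(partitions[0])  # the "first 1/5th" of the households
```

Because the household count rarely divides evenly, the partitions can differ in size by one household, so per-stride runtimes should be close but not identical.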