
Runtime overhead seen with boost units #281

Open
louis-langholtz opened this issue Jan 20, 2018 · 2 comments


@louis-langholtz
Owner

louis-langholtz commented Jan 20, 2018

Expected/Desired Behavior or Experience:

Using boost units adds no runtime overhead. I.e. benchmark times don't increase when boost units are enabled.
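To make the expectation concrete, here is a minimal sketch of the properties a zero-overhead strongly typed wrapper should have. `Meters` is a hypothetical stand-in written for illustration; it is not the Boost.Units quantity type and not a type from this library. Identical layout is a necessary condition for zero overhead, not a sufficient one: the generated code can still differ even when these checks pass.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a strongly typed length (NOT the Boost.Units type).
struct Meters {
    float value;
};

// Arithmetic and comparison simply forward to the underlying float.
constexpr Meters operator+(Meters a, Meters b) noexcept
{
    return Meters{a.value + b.value};
}

constexpr bool operator<(Meters a, Meters b) noexcept
{
    return a.value < b.value;
}

// Layout checks: the wrapper should cost nothing in storage or copying.
static_assert(sizeof(Meters) == sizeof(float), "no storage overhead");
static_assert(alignof(Meters) == alignof(float), "same alignment as float");
static_assert(std::is_trivially_copyable<Meters>::value, "copies like a float");

// The operations are usable at compile time, just like on raw floats.
static_assert((Meters{1.0f} + Meters{2.0f}).value == 3.0f, "addition forwards");
static_assert(Meters{1.0f} < Meters{2.0f}, "comparison forwards");
```

If a real units type passes checks like these and benchmarks still slow down, the difference is more likely in how the optimizer handles the wrapped operations than in the data layout.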

Actual Behavior:

Enabling boost units (for strongly typed physical units) increases the runtime of tests such as World.TilesComesToRest by roughly 10%.

A more detailed write-up is available from the Run-time overhead with boost.units question on StackOverflow.

An online conversation about this issue can be found in the Add constexpr support conversation. Some interesting responses were posted on Jul 2, 2017 and Jul 15, 2017.

Steps to Reproduce the Actual Behavior:

Build the library and its UnitTests and Benchmark dependents both with and without boost units support. For instructions for building with boost units support enabled, see the PhysicalUnits.md document.
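For readers unfamiliar with this kind of dual build, the following is a rough sketch of how one code base can compile with either a plain `float` or a strongly typed length. The macro `EXAMPLE_USE_STRONG_UNITS` and the helpers `MakeLength`, `StripUnit`, and `IsShorter` are hypothetical names chosen for this illustration; the actual switch and instructions are the ones in PhysicalUnits.md.

```cpp
#include <cassert>

#if defined(EXAMPLE_USE_STRONG_UNITS)  // hypothetical macro, for illustration only
// Strongly typed build: Length is a distinct type wrapping a float.
struct Length {
    float value;
};
constexpr Length MakeLength(float v) noexcept { return Length{v}; }
constexpr float StripUnit(Length v) noexcept { return v.value; }
#else
// Plain build: Length is just an alias for float.
using Length = float;
constexpr Length MakeLength(float v) noexcept { return v; }
constexpr float StripUnit(Length v) noexcept { return v; }
#endif

// Code written against Length compiles unchanged in both configurations.
constexpr bool IsShorter(Length a, Length b) noexcept
{
    return StripUnit(a) < StripUnit(b);
}
```

Compiling the same benchmark once with and once without the macro defined gives the two builds whose timings can then be compared.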

@louis-langholtz louis-langholtz added Enhancement For suggestions or changes that enhance any part of the project and isn't a bug. Help Wanted For things that other people are encouraged to help with. labels Jan 20, 2018
@louis-langholtz
Owner Author

louis-langholtz commented Jan 20, 2018

Some of the simpler benchmark tests that show the difference:

  • AabbTestOverlap/1000
  • AABB/1000
  • MaxSepBetweenRel4x4/10
  • ManifoldForTwoSquares1

Maybe an even more direct demonstration of the problem is this code.

louis-langholtz added a commit that referenced this issue Jan 21, 2018
Adds more benchmarks oriented toward solving the mystery of why boost units appears to be slowing down simulations.
@louis-langholtz
Owner Author

With the previously mentioned merge (of PR #282), and the library built with boost units enabled, I see benchmark results like:

Run on (8 X 2600 MHz CPU s)
2018-01-21 12:32:56
---------------------------------------------------------------------------
Benchmark                                    Time           CPU Iterations
---------------------------------------------------------------------------
FloatAdd/1000                              607 ns        607 ns    1157197
FloatMul/1000                              597 ns        596 ns    1176886
FloatMulAdd/1000                           641 ns        641 ns    1056668
FloatDiv/1000                             2009 ns       2009 ns     353482
FloatSqrt/1000                            2059 ns       2057 ns     350398
FloatSin/1000                             6628 ns       6626 ns     110511
FloatCos/1000                             6696 ns       6695 ns     101064
FloatSinCos/1000                          6734 ns       6733 ns     103246
FloatAtan2/1000                           6843 ns       6840 ns     103228
FloatHypot/1000                           3958 ns       3958 ns     175468
FloatFma/1000                             3683 ns       3682 ns     184650
DoubleAdd/1000                             585 ns        585 ns    1194030
DoubleMul/1000                             574 ns        574 ns    1221619
DoubleMulAdd/1000                          642 ns        641 ns    1105077
DoubleDiv/1000                            4036 ns       4034 ns     174365
DoubleSqrt/1000                           4118 ns       4116 ns     173155
DoubleSin/1000                           11537 ns      11533 ns      60512
DoubleCos/1000                           11726 ns      11725 ns      59921
DoubleSinCos/1000                        13019 ns      13018 ns      52414
DoubleAtan2/1000                         25635 ns      25633 ns      27688
DoubleHypot/1000                          4881 ns       4879 ns     143379
DoubleFma/1000                           14443 ns      14441 ns      48534
AlmostEqual1/1000                         3134 ns       3134 ns     216157
AlmostEqual2/1000                         1255 ns       1255 ns     530395
AlmostEqual3/1000                         1199 ns       1199 ns     587179
DiffSignsViaSignbit/1000                   897 ns        897 ns     795644
DiffSignsViaMul/1000                       873 ns        873 ns     818484
ModuloViaTrunc/1000                       1999 ns       1997 ns     345993
ModuloViaFmod/1000                        6546 ns       6544 ns     108948
DotProduct/1000                            904 ns        904 ns     796504
CrossProduct/1000                          723 ns        723 ns     966397
LengthSquaredViaDotProduct/1000            870 ns        870 ns     819336
GetMagnitudeSquared/1000                   889 ns        888 ns     820652
GetMagnitude/1000                         1994 ns       1994 ns     348991
GetUnitVec1/1000                          4216 ns       4215 ns     164132
GetUnitVec2/1000                          4101 ns       4101 ns     169024
UnitVectorFromVector/1000                 4061 ns       4061 ns     174162
UnitVectorFromVectorAndBack/1000          4128 ns       4126 ns     168013
UnitVecFromAngle/1000                     9003 ns       9002 ns      79533
LessLength/1000                            671 ns        671 ns    1039964
LessFloat/1000                             679 ns        679 ns     947034
LessDouble/1000                            677 ns        677 ns     979761
LessEqualLength/1000                       625 ns        625 ns    1147202
LessEqualFloat/1000                        622 ns        622 ns    1127396
LessEqualDouble/1000                       628 ns        628 ns    1108156
LesserLength/1000                          927 ns        927 ns     776846
LesserFloat/1000                           579 ns        579 ns    1221726
LesserDouble/1000                          583 ns        583 ns    1200871
LesserEqualLength/1000                     881 ns        881 ns     822678
LesserEqualFloat/1000                     1148 ns       1148 ns     608495
LesserEqualDouble/1000                    1172 ns       1171 ns     604464
MinLength/1000                             933 ns        933 ns     774371
MinFloat/1000                             1799 ns       1799 ns     378995
MinDouble/1000                            1299 ns       1299 ns     519959
IntervalIsIntersecting/1000               5561 ns       5561 ns     127405
LengthIntervalIsIntersecting/1000         4809 ns       4809 ns     143815
AabbTestOverlap/1000                     10868 ns      10865 ns      61869
AabbContains/1000                         6167 ns       6167 ns     113541
AABB/1000                                11455 ns      11455 ns      58974
MaxSepBetweenRel4x4/10                     641 ns        641 ns    1123163
MaxSepBetweenRel4x4/100                   6216 ns       6216 ns     116463
MaxSepBetweenRel4x4/1000                 67592 ns      67558 ns      10540
MaxSepBetweenRel4x4/10000               749035 ns     748231 ns        964
MaxSepBetweenRelSquaresNoStop/10           697 ns        697 ns     957553
MaxSepBetweenRelSquaresNoStop/100         7443 ns       7443 ns      96106
MaxSepBetweenRelSquaresNoStop/1000       94483 ns      94477 ns       7210
MaxSepBetweenRelSquaresNoStop/10000     998937 ns     998652 ns        713
MaxSepBetweenRelSquares/10                 742 ns        741 ns     927472
MaxSepBetweenRelSquares/100               7729 ns       7729 ns      85201
MaxSepBetweenRelSquares/1000            100670 ns     100658 ns       7146
MaxSepBetweenRelSquares/10000          1020148 ns    1019823 ns        695
ConstructAndAssignVC                        39 ns         39 ns   17979098
SolveVC                                     43 ns         43 ns   16471828
ManifoldForTwoSquares1                     178 ns        178 ns    3941597
ManifoldForTwoSquares2                     180 ns        180 ns    3893583
AsyncFutureDeferred                        268 ns        268 ns    2575869
AsyncFutureAsync                         22004 ns      19657 ns      34682
ThreadCreateAndDestroy                   24553 ns      19026 ns      32921
MultiThreadQD                            10497 ns       5560 ns     142886
MultiThreadQDE                            9606 ns       5133 ns     100000
MultiThreadQDA                             156 ns        156 ns    4528986
MultiThreadQDAQ                          19661 ns      19651 ns      25750
WorldStep                                   63 ns         63 ns   11379893
WorldStepWithStatsStatic/0                  69 ns         69 ns   10339581
WorldStepWithStatsStatic/1                  74 ns         74 ns    9778858
WorldStepWithStatsStatic/10                101 ns        101 ns    6875620
WorldStepWithStatsStatic/100               338 ns        338 ns    2079811
WorldStepWithStatsStatic/1000             5657 ns       5655 ns     123265
WorldStepWithStatsStatic/10000          118882 ns     118877 ns       5914
DropDisks/0                                 63 ns         63 ns   10518881
DropDisks/1                                928 ns        928 ns     810767
DropDisks/10                              9619 ns       9616 ns      71049
DropDisks/100                           105524 ns     105469 ns       6616
DropDisks/1000                         1112393 ns    1111481 ns        649
DropDisks/10000                       11108205 ns   11107292 ns         65
TumblerAdd100SquaresPlus100Steps      51297440 ns   51293231 ns         13
TumblerAdd200SquaresPlus200Steps     233531778 ns  233512667 ns          3
AddPairStressTestPlayRho400/0          3168093 ns    3166944 ns        215
AddPairStressTestPlayRho400/10         1398341 ns    1397823 ns        504
AddPairStressTestPlayRho400/15         1563222 ns    1562324 ns        451
AddPairStressTestPlayRho400/16        22040693 ns   22038387 ns         31
AddPairStressTestPlayRho400/17        52010058 ns   51975357 ns         14
AddPairStressTestPlayRho400/18        81004567 ns   80961429 ns          7
AddPairStressTestPlayRho400/19        28283196 ns   28280960 ns         25
AddPairStressTestPlayRho400/20        22965133 ns   22963033 ns         30
AddPairStressTestPlayRho400/30         7675704 ns    7674924 ns         92
TilesRestPlayRho/12                   30269734 ns   30268957 ns         23
TilesRestPlayRho/20                  161508536 ns  161498750 ns          4
TilesRestPlayRho/36                 1965060073 ns 1964324000 ns          1
Program ended with exit code: 0

The particularly interesting part of these results is:

LessLength/1000                            671 ns        671 ns    1039964
LessFloat/1000                             679 ns        679 ns     947034
LessDouble/1000                            677 ns        677 ns     979761
LessEqualLength/1000                       625 ns        625 ns    1147202
LessEqualFloat/1000                        622 ns        622 ns    1127396
LessEqualDouble/1000                       628 ns        628 ns    1108156
LesserLength/1000                          927 ns        927 ns     776846
LesserFloat/1000                           579 ns        579 ns    1221726
LesserDouble/1000                          583 ns        583 ns    1200871
LesserEqualLength/1000                     881 ns        881 ns     822678
LesserEqualFloat/1000                     1148 ns       1148 ns     608495
LesserEqualDouble/1000                    1172 ns       1171 ns     604464
MinLength/1000                             933 ns        933 ns     774371
MinFloat/1000                             1799 ns       1799 ns     378995
MinDouble/1000                            1299 ns       1299 ns     519959

Everything looks fine up to LesserLength/1000, where the Length variant takes 927 ns against 579 ns for float and 583 ns for double.

What's most bizarre to me is the timing of the LesserEqual* and Min* tests. It makes no sense to me that the Length variants would actually be faster than the float and double variants, but that's what this output shows: 881 ns versus 1148 ns and 1172 ns for LesserEqual*, and 933 ns versus 1799 ns and 1299 ns for Min*. How is that possible? This partly contradicts the larger-scope benchmark tests like TilesComesToRest, unless this library is dominated by Lesser-like operations (rather than LesserEqual-like and Min-like operations).
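One way to cross-check numbers like these outside the benchmark framework is a small standalone timing harness. The sketch below is an assumption-laden illustration, not the code behind the results above: `Length` is a hypothetical wrapper rather than the Boost.Units type, and `Lesser` is assumed to mean "return the smaller of two values via `<`", which may not match what the Lesser* benchmarks actually do. The checksum exists so the compiler cannot discard the work being timed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical strongly typed wrapper (NOT the Boost.Units type).
struct Length {
    float value;
};
inline bool operator<(Length a, Length b) noexcept { return a.value < b.value; }

// Assumed meaning of "Lesser": the smaller of two values, chosen via operator<.
template <typename T>
inline T Lesser(T a, T b) { return (a < b) ? a : b; }

// Deterministic pseudo-random inputs so both variants see the same data.
inline std::vector<float> MakeInputs(std::size_t n, unsigned seed = 1u)
{
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-100.0f, 100.0f);
    std::vector<float> out(n);
    for (auto& v : out) v = dist(gen);
    return out;
}

inline std::vector<Length> ToLengths(const std::vector<float>& raw)
{
    std::vector<Length> out;
    out.reserve(raw.size());
    for (float f : raw) out.push_back(Length{f});
    return out;
}

struct Timing {
    double nsPerPass;  // average wall-clock nanoseconds per pass over the data
    float checksum;    // sum of all results; keeps the loop from being optimized away
};

// Applies op to each adjacent pair, reps times, and reports the average time.
template <typename T, typename Op>
Timing TimePairs(const std::vector<T>& vals, int reps, Op op)
{
    float sum = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        for (std::size_t i = 0; i + 1 < vals.size(); ++i) {
            sum += op(vals[i], vals[i + 1]);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return Timing{elapsed.count() / reps, sum};
}
```

Calling `TimePairs` once on the `float` vector with `Lesser(a, b)` and once on the `Length` vector with `Lesser(a, b).value` should give equal checksums, and the two `nsPerPass` values can then be compared. A hand-rolled timer like this is noisier than a proper benchmark framework, so it is best used to confirm the direction of a difference and to produce a small binary whose disassembly can be inspected for each variant.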

@louis-langholtz louis-langholtz self-assigned this Jan 21, 2018