Quick implementation of "is_nearly_double" #246

Open
wants to merge 7 commits into base: master

Conversation

sampotter

Here's a quick implementation of is_nearly_double for issue #245. Some things that jump out at me:

  1. The function assert_that_double_ is hard-coded to assume that the user has specified a relative error in terms of significant figures. To emit a more helpful message on test failure when "nearly" is being used, I switched on the constraint name. This seems like a bad way to go, so I figured I would get @thoni56's input on how best to handle it.
  2. I didn't use the existing tools for double precision testing. It would be good for this new functionality to be backwards compatible so that existing double tests don't break. There already appears to be a relative tolerance (in the form of significant_figures) and absolute_tolerance (I wasn't sure how the default value of DBL_MIN/1e-8 was chosen).
  3. I think it should be OK to replace significant_figures and absolute_tolerance with double_relative_tolerance and double_absolute_tolerance: when setting the number of significant figures, just set double_relative_tolerance = pow(10, 1 - significant_figures). This is handy, too, because then there's no need to call log10 when calling accuracy. (A small sketch of this mapping follows this list.)
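
(To make the mapping in item 3 concrete, here is a small standalone sketch; double_relative_tolerance is the parameter I'm proposing above, not something that exists in Cgreen today.)

#include <math.h>
#include <stdio.h>

// Proposed parameter from item 3 above (not existing Cgreen API).
static double double_relative_tolerance;

static void significant_figures_for_assert_double_are(int significant_figures) {
  // e.g. 2 significant figures -> 1e-1, 3 -> 1e-2, ...
  double_relative_tolerance = pow(10.0, 1 - significant_figures);
}

int main(void) {
  for (int figures = 2; figures <= 5; ++figures) {
    significant_figures_for_assert_double_are(figures);
    printf("%d significant figures -> relative tolerance %g\n",
           figures, double_relative_tolerance);
  }
  return 0;
}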

Anyway, let me know what you think of what I've thrown together so far and if I have the right idea. I don't know much about the design of Cgreen, so please feel free to clue me in. :-)

@thoni56
Contributor

thoni56 commented Dec 18, 2020

I like the "nearly" twist in the naming! Following the "natural reading" principle of Cgreen constraints should probably be is_nearly_equal_to_double(), though.

  1. I agree we should have messages that are as nice as possible, so I'll have to play around with it a bit. Maybe we need to split that into two explicit texts...
  2. Not sure about this. The double handling is mostly the work of @d-meiser, perhaps he remembers something from 2017 ;-) Or has any other comments on this?
  3. That sounds like a good idea. You sound hesitant? How can we make sure?

@sampotter
Author

sampotter commented Dec 18, 2020

I can't say I'm familiar with what "natural reading" is. If the idea is that you should be able to read an assert fluently from left to right, like in an English sentence, then assert_that_double(x, is_nearly_double(y)) seems fine to me: "x is nearly y" is a simpler and more fluent way of saying "x is nearly equal to y". (This is all IMO, of course.)

  1. This seemed fishy to me, so I looked into it. Generally, it doesn't make sense to use tolerances smaller than machine epsilon for a given floating point type (~2.2e-16 for double), and DBL_MIN/1e-8 is actually ~2.2e-300. It would make more sense to set absolute_tolerance initially to something like machine epsilon or a bit larger; e.g., absolute_tolerance = 1e-15 is already quite stringent. The test abs_diff < 2.2e-300 behaves almost exactly like abs_diff <= 0 (or abs_diff == 0). IEEE 754 rounds each intermediate operation to the nearest representable value, and since 2.2e-300 is much smaller than machine epsilon, it will very rarely be the case that the rounding error accumulated by a sequence of run-of-the-mill floating point operations is smaller than 2.2e-300 (indeed, it will probably be larger than machine epsilon!).

  2. I think it's the right way to go. I just don't want to step on your toes, since I haven't contributed to Cgreen before. :-)

@thoni56
Contributor

thoni56 commented Dec 19, 2020

Thanks @sampotter, for the comments.

The example you gave was exactly what I meant, but I'm not a native English speaker, so my language sense might be off when it comes to "natural reading"; your comment seems reasonable.

  1. Thanks for that write-up; it would probably be worth adding to the documentation about doubles. I'm not sure I understand the implications, though. Do you have a concrete suggestion? Or is it "to set absolute_tolerance initially to something like machine epsilon or a bit larger"?
  2. I agree with that, so no problem. No toes were hurt ;-)

@sampotter
Author

sampotter commented Dec 20, 2020

  1. The implication is that by setting absolute_tolerance to something incredibly tiny like 2.2e-300, the effect for most doubles that aren't also extremely tiny is going to be the same as asserting that two doubles are exactly equal. My concrete suggestion would be to roll the existing absolute_tolerance and my new double_absolute_tolerance parameter into one, using the name double_absolute_tolerance, with a default value of 1e-13. IIRC, the default absolute tolerances for other unit testing libraries are larger (I think Google Test uses 1e-12, but it's been a while since I've looked).

Edit: I want to make sure what I'm trying to say about the current value of absolute_tolerance is easy to understand. A simpler way of thinking about it is to consider the definition of machine epsilon: it is the difference between 1.0 and the next representable floating point number above 1.0. For doubles, it's ~2.2e-16 (defined as DBL_EPSILON in float.h). So, if you're estimating a value that should equal 1.0 using a function f(), then checking fabs(f() - 1.0) <= 2.2e-300 is effectively the same as requiring fabs(f() - 1.0) == 0. Why? Because if the implementation of f() accumulates any nonzero floating point round-off error (which we expect to happen in practice), it is exceedingly likely that fabs(f() - 1.0) >= DBL_EPSILON, and DBL_EPSILON is roughly 284 orders of magnitude larger than 2.2e-300!

When we look at relative errors, we're trying to normalize the error so that the magnitude of the quantities being compared is on the order of 1.0, making it sensible to compare with machine epsilon or some constant multiple thereof. To come up with a more precise tolerance, you need to know more about what you're testing. For example, the expected round-off error from summing N numbers in sequence is ≈ ε sqrt(N), where ε is machine epsilon. So, a default tolerance of 1e-13 is similar to the expected error from summing ~200,000 numbers in sequence. It isn't particularly easy to choose a one-size-fits-all default floating point tolerance...
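
To make the last two paragraphs concrete, here's a tiny standalone demo (not part of the PR) showing that ordinary round-off error dwarfs the current default absolute tolerance:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
  double sum = 0.0;
  for (int i = 0; i < 10; ++i) sum += 0.1; // "should" be exactly 1.0

  double err = fabs(sum - 1.0);
  printf("round-off error     = %g\n", err);            // ~1.1e-16
  printf("DBL_EPSILON         = %g\n", DBL_EPSILON);     // ~2.2e-16
  printf("DBL_MIN / 1e-8      = %g\n", DBL_MIN / 1e-8);  // ~2.2e-300
  printf("err < DBL_MIN/1e-8? %s\n", err < DBL_MIN / 1e-8 ? "yes" : "no"); // no: behaves like err == 0
  printf("err <= 1e-13?       %s\n", err <= 1e-13 ? "yes" : "no");         // yes: a sane absolute tolerance
  return 0;
}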

Anyway, I'll make these changes shortly and add them to this PR for you to take a look at.

(The reason I'd like to use the double_*_tolerance naming convention is to allow space for single precision float tests.)

@thoni56 added this to the 1.4.0 milestone Dec 21, 2020
@coveralls

coveralls commented Dec 24, 2020

Coverage Status

Coverage decreased (-0.6%) to 94.983% when pulling 5580833 on sampotter:master into 9314651 on cgreen-devs:master.

@thoni56
Contributor

thoni56 commented Jan 25, 2021

Hi!

Any update on this? No rush, just want to see if there is progress...

@sampotter
Author

Hi Thomas, sorry for the delay! Should be able to reply more quickly now.

I went ahead and simplified src/constraint.c a bit by replacing the significant_figures and absolute_tolerance variables with double_absolute_tolerance and double_relative_tolerance like I proposed above.

What should be done next? There are evidently some failing tests. Not sure if they're failing because of changes I introduced.

@thoni56
Contributor

thoni56 commented Jan 28, 2021

Looking into the Travis output, I can actually see only one test that fails (we have a build matrix in Travis for gcc, g++ and dynamic/static linking, so that's 4 Travis jobs).

/home/travis/build/cgreen-devs/cgreen/tests/constraint_tests.c:162: Failure: constraint_tests -> matching_doubles_respects_significant_figure_setting 
	Expected [compare_double_constraint(want_337, 339.0)] to [be false]

I haven't looked at the code but the above indicates that there is something in that test that relates to significant figures ;-) It probably tests the double constraints with a particular number of significant figures. So for backwards compatibility we probably do want that test to pass as-is. (Again, I haven't looked at the test code yet.)

This is the code of the test:

Ensure(Constraint, matching_doubles_respects_significant_figure_setting) {
    Constraint *want_337 = create_equal_to_double_constraint(337.0, "height");

    significant_figures_for_assert_double_are(2);
    assert_that(compare_double_constraint(want_337, 339.0), is_true);

    significant_figures_for_assert_double_are(3);
    assert_that(compare_double_constraint(want_337, 339.0), is_false);

    destroy_constraint(want_337);
}

It obviously tests that significant figures work as expected: 337 and 339 are equal with only 2 significant figures, but different with 3.

(Again, the exact mathematics here is beyond me. Hope you can make some sense of that.)

@sampotter
Author

Ah, thanks. I missed that because of the many output lines reading "highly_nested_test_suite": No assertions.

How can I debug this test? From ./build, I tried running cgreen-runner tests/libconstraint_tests.dylib, but all these tests seem to pass. I'd like to run the test under lldb so I can step through it. (I'm on macOS using clang and lldb from the command line.)

(Thanks for bearing with me while I figure out how this project is organized!)

@thoni56
Contributor

thoni56 commented Jan 28, 2021

No problem. Being a CMake project, things are a little more complicated than you really want them to be. (Sometimes I wonder if the portability benefits are worth it....)

TIP: I most often use make unit at the top level, it has been built to hide all the complexities.

But that doesn't help when you want to debug. You are using the correct command. To be really sure, ensure that you are using the cgreen-runner in the project, and not some other version that you have installed (build/tools/cgreen-runner ...).

I checked out the PR and tried exactly that command on Cygwin and WSL. On Cygwin it passed, but on WSL it failed, which maybe makes sense in one way, since Travis is on Linux. But on Cygwin make unit showed that another test (one that is not in make test) failed:

constraint_messages_tests.c: Failure: ConstraintMessage -> for_is_greater_than_double_with_accuracy
       Expected [1.0] to [be greater than double] [1.0 + 1.0e-3 + DBL_EPSILON] within [4] significant figures
               actual value:                   [1.000000]
               expected value:                 [1.001000]

(The "messages" tests run tests and matches output to some golden output, to ensure that user level messages are not trashed, but again these are not run with make test.)

That indicates that even when the constraints test passes, some other tests related to doubles might fail. And that usually indicates reading/writing outside memory, buffer overruns, or some other horrible thing. So let's valgrind it: nope, no problem.

So the only thing that I can think of is

I think it should be OK to replace significant_figures and absolute_tolerance with double_relative_tolerance and double_absolute_tolerance: when setting the number of significant figures, just set double_relative_tolerance = pow(10, 1 - significant_figures). This is handy, too, because then there's no need to call log10 when calling accuracy.

Maybe there is something different in the libraries that makes that statement not hold on all platforms?

To actually debug you can probably adapt what the cgreen-debug script does into lldb equivalent commands. The core of it being

bp=${2//:/__}
echo break $bp > .cgreen-debug-commands
echo run $1 $2 >> .cgreen-debug-commands
if command -v cgdb ; then
    debugger=cgdb
else
    debugger=gdb
fi
$debugger -ex "set breakpoint pending on" cgreen-runner --command=.cgreen-debug-commands
rm .cgreen-debug-commands

$1 and $2 are the library and the test function respectively, like

cgreen-debug libconstraint_tests.so Constraint\:matching_doubles_respects_significant_figure_setting

The first couple of lines convert the Cgreen-style test name to the linkage name and dump a break and a run command into a temporary file; then gdb or cgdb is asked to run those commands.

The "set breakpoint pending on" instructs gdb to set breakpoints even though that code is not loaded yet. (It isn't when the cgreen-runner starts.)

@thoni56
Contributor

thoni56 commented Mar 15, 2021

So, did you get a handle on the problem here? Again, if it is a mathematical problem (rounding?) then it's over my head...

@sampotter
Author

Sorry for the hold up, simply did not have time to get to this recently...

The failures all seemed to be because I tried to remove the accuracy function in constraints.c and reimplement those checks in my style. It is still not completely clear to me how accuracy works and since this isn't important for me I don't want to get into it right now. So I think my is_nearly functions will just have to coexist for now. Maybe this can be revisited later.

Looks like my new commit is passing. Not sure how important the decrease in coverage is to you.

If you'd like me to add a new section to the docs, please give me some basic instructions and I should be able to take care of this soon.

@thoni56
Contributor

thoni56 commented Mar 16, 2021

Sorry for the hold up, simply did not have time to get to this recently...

No problem. Life.

The failures all seemed to be because I tried to remove the accuracy function in constraints.c and reimplement those checks in my style. It is still not completely clear to me how accuracy works and since this isn't important for me I don't want to get into it right now. So I think my is_nearly functions will just have to coexist for now. Maybe this can be revisited later.

That's probably right for now, maybe we need a mathematician ;-)

Looks like my new commit is passing. Not sure how important the decrease in coverage is to you.

Passing is good. Coverage decrease as such is not a problem per se, but I'd really like a few tests that

  • show how to set the tolerance and what that means, as "documentation"
  • exercise some boundary condition(s)

If you'd like me to add a new section to the docs, please give me some basic instructions and I should be able to take care of this soon.

Good, thanks. The docs/guide are written in AsciiDoc (which is like Markdown's bigger brother) and reside in the doc subdirectory. In section 4.1.1 ...

  • the new constraint needs to be added of course
  • but it also needs some text or section describing the difference between accuracy and tolerance.

I also remembered that you had a question in your first comment, about switching on the name of the constraint. (Maybe this should go in code comments...)

I think you actually should use the type field with a new type enum instead. I think these are the steps (without any promise about being complete, or correct ;-); there's a rough sketch after the list:

  • new type enum in constraint.h
  • select on the type field rather than the name inside assert_that_double_()
  • add the enum to is_double_comparing() in constraint.c
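
Roughly something like this (a standalone illustration only; all the names below are hypothetical and the real declarations in constraint.h/constraint.c will differ):

/* Rough, standalone illustration -- every name here is hypothetical. */
#include <stdbool.h>
#include <stdio.h>

typedef enum {
    DOUBLE_COMPARER_CONSTRAINT,        /* existing "significant figures" style */
    NEARLY_DOUBLE_COMPARER_CONSTRAINT  /* new type for the "nearly" constraints */
} ConstraintType;

typedef struct {
    ConstraintType type;
    const char *name;
} Constraint;

/* is_double_comparing() would accept the new type as well */
static bool is_double_comparing(const Constraint *constraint) {
    return constraint->type == DOUBLE_COMPARER_CONSTRAINT
        || constraint->type == NEARLY_DOUBLE_COMPARER_CONSTRAINT;
}

/* assert_that_double_() would select the failure message on the type field */
static void report(const Constraint *constraint) {
    if (constraint->type == NEARLY_DOUBLE_COMPARER_CONSTRAINT)
        printf("expected ... within the configured tolerance\n");
    else
        printf("expected ... within N significant figures\n");
}

int main(void) {
    Constraint nearly = { NEARLY_DOUBLE_COMPARER_CONSTRAINT, "is nearly" };
    if (is_double_comparing(&nearly))
        report(&nearly);
    return 0;
}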

Maybe I can help out with some of these.

@sampotter
Author

Thanks for the detailed instructions, much appreciated! I'll start looking into these tasks as I'm able this week.

Re: getting a mathematician: that's basically what I am... I looked into a good way to reliably count the number of significant figures and couldn't come up with something that I had much confidence in (because of edge cases relating to floating point arithmetic). That accuracy function is what does that, and it passes all of the tests, but it wasn't clear to me that it works perfectly. I doubt it's particularly complicated, but I remain unconvinced that comparing floating point numbers by counting the number of shared leading digits is a good feature for cgreen to provide... If it were up to me, I'd probably just remove it.

@thoni56
Contributor

thoni56 commented Mar 17, 2021

Thanks for the detailed instructions, much appreciated! I'll start looking into these tasks as I'm able this week.

No problem. Again, if you think I can help, just say the word.

Re: getting a mathematician: that's basically what I am... I looked into a good way to reliably count the number of significant figures and couldn't come up with something that I had much confidence in (because of edge cases relating to floating point arithmetic). That accuracy function is what does that, and it passes all of the tests, but it wasn't clear to me that it works perfectly. I doubt it's particularly complicated, but I remain unconvinced that comparing floating point numbers by counting the number of shared leading digits is a good feature for cgreen to provide... If it were up to me, I'd probably just remove it.

I did realise that you were much more knowledgeable than me in this area. Sorry for not picking up on the mathematician bit ;-)

I hear what you are saying, I think. From a "maths" perspective accuracy is the right way to go. And I buy that, so your contribution is very valuable.

But I guess that to @d-meiser, who did the original double implementation, the significant_digits was natural, so there are possibly scenarios where this is right/ok. For this, and also for backwards compatibility, we should keep both. It is unfortunate that your unification did not work, and I assume that is due to some weird machine/library difference in rounding, since it seems to vary across platforms, which is a bummer.

I also want to bring up the naming of the user-level macro again. After some time I realized what bugged me with nearly_double initially, and that is that just reading it, it sounds like nearly_double(1) should be close to 2, since that is the double of 1.

Coming from is_nearly(1) (which is really good!) and adding the double to indicate "type" (which unfortunately is necessary in C) one gets is_nearly_double()...

Looking around I see that both Jasmine and Hamcrest use some variation of "is close to", while FluentAssertions use "BeApproximately" (yuck... ;-).

Also, the "double" type indicator (and "_string") for is_equal_to() is forced on us because we can't do type-based overloading of functions in C, not even with macro-magic. It is not there for any other reason, if it was possible to avoid, we would.

So, thinking that we don't actually need the typing information here (an int will be automatically type converted into a double, and there is no "int-typed" version of nearly() that we want to "overload") a good option is actually just is_nearly(). Possibly is_close_to() would be more "standard". Do I sound hesitant? I am, since I don't do tests in double precision, so I don't know what would feel the most natural and fluent.

Last point, do we need is_not_nearly() and the other predicates, greater etc.? If so, that might influence the naming decision also.

@sampotter
Author

No problem. Again, if you think I can help, just say the word.

Sure, will do. Thanks.

I did realised that you were much more knowledgeable than me in the area. Sorry for not picking up on the mathematician bit ;-)

Of course, it's not a problem at all. I'm not so egotistical that it would offend me (only a little egotistical). ;-)

I hear what you are saying, I think. From a "maths" perspective accuracy is the right way to go. And I buy that, so your contribution is very valuable.

But I guess that to @d-meiser, who did the original double implementation, the significant_digits was natural, so there are possibly scenarios where this is right/ok. For this, and also for backwards compatibility, we should keep both. It is unfortunate that your unification did not work, and I assume that is due to some weird machine/library difference in rounding, since it seems to vary across platforms, which is a bummer.

Well, it's two things. Let me clarify.

Issue #1: computing significant digits is of limited practical utility. Anyone who thinks they need it for something probably doesn't. I can see maybe an argument for it when comparing amounts of currency, but outside of that it is basically always better to use one of these normalized "relative error" tests, like I added. You can think of these tests as being the continuous version of the discrete "significant digits" test. Because of this, they're more natural and robust when dealing with floating point numbers.
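
(For concreteness, the usual shape of such a check looks something like the following; this is a sketch, not necessarily the exact formula in my branch.)

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

// Combined relative/absolute tolerance check: the relative term handles
// large magnitudes, the absolute term handles values near zero.
static bool nearly(double x, double y, double rel_tol, double abs_tol) {
  double diff = fabs(x - y);
  double scale = fmax(fabs(x), fabs(y));
  return diff <= fmax(abs_tol, rel_tol * scale);
}

int main(void) {
  printf("%d\n", nearly(1.0e6, 1.0e6 + 0.5, 1e-6, 1e-13)); // 1: within relative tolerance
  printf("%d\n", nearly(0.0, 1.0e-14, 1e-6, 1e-13));       // 1: absolute tolerance near zero
  printf("%d\n", nearly(1.0, 1.001, 1e-6, 1e-13));         // 0: outside both tolerances
  return 0;
}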

Issue #2: I don't have a lot of confidence in the current implementation of "sig digits". What's checked in currently passes all the tests, but I have a hunch that if I did some aggressive testing I'd find plenty of corner cases that fail. Part of my lack of confidence comes from not knowing the provenance of accuracy. Maybe accuracy has a simple explanation and we can indeed be 100% confident that it works in 100% of cases. But I can't assume that myself; and, given the vagaries of floating point arithmetic, my default assumption is that there are probably some weird corner cases where it fails.

(Incidentally, we can assume that floating point behavior should be consistent across platforms. This should be guaranteed by the C standard's conformance to the IEEE 754 standard. If things vary across platforms, it's probably a compiler bug.)

Now, as far as Issue #1 goes: whatever. ;-) This isn't my library and it doesn't bother me if there's a set of test functions in it that I don't think make a lot of sense and won't use myself. If people depend on them now, I won't assume I know their situation better than they do!

As for Issue #2: I think it would be good to have an implementation of "check sig digits" that we can be confident in. I spent about an hour trying to come up with an approach based just on floating point arithmetic, but because of rounding wasn't able to find something that worked robustly. I came up with a bit of a more heavyweight solution instead. Let me know what you think:

// digits.c

#include <math.h>
#include <stdio.h>

// Base 10 version of frexp from math.h
double frexp10(double x, int *exp) {
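  // Note: assumes x != 0 (log10(0) would be -infinity)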
  *exp = floor(log10(fabs(x)));
  return x/pow(10.0, (double)*exp);
}

int get_sig_digits(double x, double y) {
  if (x == y) return 16; // max digits in double mantissa

  // Get (m)antissa and (e)xponent of x and y
  int ex, ey;
  double mx = frexp10(x, &ex);
  double my = frexp10(y, &ey);

  if (ex != ey) return 0; // no shared digits if diff. exps

  // A number if "positive" if >= 0 --- should be enough for the comparisons below
  int sx = mx >= 0, sy = my >= 0;

  if (sx != sy) return 0; // no shared digits if diff. signs

  // Make mx and my nonnegative for snprintf (don't want leading '-' char)
  if (!sx) mx *= -1;
  if (!sy) my *= -1;

  // Print out normalized mantissas of x and y
  char xstr[19], ystr[19]; // room for 18 characters plus the NUL terminator
  snprintf(xstr, sizeof(xstr), "%1.16f", mx);
  snprintf(ystr, sizeof(ystr), "%1.16f", my);

  // Count common leading digits (skip the '.' at index 1 and stop at the end
  // of the fixed-width strings so identical strings can't run past the buffer)
  int d = xstr[0] == ystr[0];
  int c = 2;
  while (c < (int)sizeof(xstr) - 1 && xstr[c] == ystr[c]) {
    ++d;
    ++c;
  }

  return d;
}

int main() {
  // Test cases...
  double pairs[][2] = {
    {0.1, -0.1},
    {1.0012, 1.0012},
    {1.0100, 1.0012},
    {7543.20001, 75},
    {-987.4, -987.0004},
    {1010, 1012},
    {0.1, -0.1},
    {-1.2, -1.19},
    {20.2, 20.3},
    {-21.1, -21.13},
    {-21.1, -21.11},
    {-21.1, -21.09},
    {-21.1, -21.07},
    {-21.1, -21.05},
    {-21.1, -21.03},
    {-21.1, -21.01},
  };

  int num_pairs = sizeof(pairs)/sizeof(double[2]);

  double x, y;
  int d;
  for (int i = 0; i < num_pairs; ++i) {
    x = pairs[i][0];
    y = pairs[i][1];
    d = get_sig_digits(x, y);
    printf("x = %f, y = %f: sig digits = %d\n", x, y, d);
  }
}

Compiled on macOS using cc digits.c -o digits. Output:

sfp@Samuels-MBP ~ % ./digits
x = 0.100000, y = -0.100000: sig digits = 0
x = 1.001200, y = 1.001200: sig digits = 16
x = 1.010000, y = 1.001200: sig digits = 2
x = 7543.200010, y = 75.000000: sig digits = 0
x = -987.400000, y = -987.000400: sig digits = 3
x = 1010.000000, y = 1012.000000: sig digits = 3
x = 0.100000, y = -0.100000: sig digits = 0
x = -1.200000, y = -1.190000: sig digits = 1
x = 20.200000, y = 20.300000: sig digits = 2
x = -21.100000, y = -21.130000: sig digits = 3
x = -21.100000, y = -21.110000: sig digits = 4
x = -21.100000, y = -21.090000: sig digits = 2
x = -21.100000, y = -21.070000: sig digits = 2
x = -21.100000, y = -21.050000: sig digits = 2
x = -21.100000, y = -21.030000: sig digits = 2
x = -21.100000, y = -21.010000: sig digits = 2

Note the line where x = -21.1 and y = -21.11. This returns 4, which is correct, since -21.11 isn't exactly representable in double precision---the nearest double is -21.109999..., which shares four leading digits with the nearest double to -21.1 (-21.100000000000001...). Not 100% sure get_sig_digits above is totally correct, but this is the best I've come up with so far.

I get the feeling you're more of a C hacker than I am. Maybe you can think of some clever ways to optimize this, or some corner cases I missed. :-)

I also want to bring up the naming of the user-level macro again. After some time I realized what bugged me with nearly_double initially, and that is that just reading it, it sounds like nearly_double(1) should be close to 2, since that is the double of 1.

Coming from is_nearly(1) (which is really good!) and adding the double to indicate "type" (which unfortunately is necessary in C) one gets is_nearly_double()...

I agree, that's unfortunate... I've been using these macros in my own branch for a library that I've been developing for a little while now and have basically learned to just ignore the type in each of these lines. I guess there's a bit of a tension between, on the one hand, wanting to provide fluent tests that indicate the type, and needing to accommodate C's weak type system on the other. E.g., I don't think this would be an issue as much in C++. (I see you mentioned this in your reply, as well.)

Looking around I see that both Jasmine and Hamcrest use some variation of "is close to", while FluentAssertions use "BeApproximately" (yuck... ;-).

Also, the "double" type indicator (and "_string") for is_equal_to() is forced on us because we can't do type-based overloading of functions in C, not even with macro-magic. It is not there for any other reason, if it was possible to avoid, we would.

So, thinking that we don't actually need the typing information here (an int will be automatically type converted into a double, and there is no "int-typed" version of nearly() that we want to "overload") a good option is actually just is_nearly(). Possibly is_close_to() would be more "standard". Do I sound hesitant? I am, since I don't do tests in double precision, so I don't know what would feel the most natural and fluent.

Last point, do we need is_not_nearly() and the other predicates, greater etc.? If so, that might influence the naming decision also.

To make it a bit more explicit for the user, how about this option? Add functions to be called in BeforeEach called something like do_single_precision_floating_point_tests and do_double_precision_floating_point_tests (I don't have a strong opinion about the name of these...). Combine this with your idea to just use the much more ideal is_nearly, and then just have it switch the implementation in the background. E.g.:

BeforeEach (blah) {
  do_single_precision_floating_point_tests();
  absolute_tolerance_is(1e-6);
}

Ensure (blah, ...) {
  ...
  float x = 1.01, y = 1.02;
  assert_that(x, is_not_nearly(y));
  ...
}

Incidentally, a cool side benefit of here would be that cgreen would then be in a position to tell the user what they're doing doesn't make sense. If the library knows it's going to do single precision tests, then it can alert the user that it passed a relative tolerance that's too small. For example, relative_tolerance_is(1e-12) doesn't make sense for single precision, since it's smaller than single precision machine epsilon.
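
(A rough sketch of what I mean; none of these names exist in cgreen today, they're just the ones we've been tossing around in this thread.)

#include <float.h>
#include <stdio.h>

// Hypothetical precision switch and tolerance setter from the discussion above.
typedef enum { SINGLE_PRECISION, DOUBLE_PRECISION } FloatingPointPrecision;

static FloatingPointPrecision floating_point_precision = DOUBLE_PRECISION;
static double relative_tolerance = 1e-13;

void cgreen_floating_point_precision_is(FloatingPointPrecision precision) {
  floating_point_precision = precision;
}

void relative_tolerance_is(double tolerance) {
  // Warn when the requested tolerance is below machine epsilon for the selected precision.
  double eps = floating_point_precision == SINGLE_PRECISION ? FLT_EPSILON : DBL_EPSILON;
  if (tolerance < eps)
    fprintf(stderr, "warning: relative tolerance %g is below machine epsilon (%g) "
            "for the selected precision\n", tolerance, eps);
  relative_tolerance = tolerance;
}

int main(void) {
  cgreen_floating_point_precision_is(SINGLE_PRECISION);
  relative_tolerance_is(1e-12); // triggers the warning: FLT_EPSILON is ~1.2e-7
  printf("relative tolerance is now %g\n", relative_tolerance);
  return 0;
}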

@sampotter
Author

Last point, do we need is_not_nearly() and the other predicates, greater etc.? If so, that might influence the naming decision also.

I think is_not_nearly makes sense to me, since it's just the negation of is_nearly. On the other hand, the "approximate less than/greater than" tests are a little weird. I've never seen anything like those and can't imagine a situation where I'd need them. It seems simpler to expect the user to handle that kind of thing themselves. E.g., compute diff = x - y and check if diff > eps themselves.

@thoni56
Contributor

thoni56 commented Mar 19, 2021

As for Issue #2: I think it would be good to have an implementation of "check sig digits" that we can be confident in. I spent about an hour trying to come up with an approach based just on floating point arithmetic, but because of rounding wasn't able to find something that worked robustly. I came up with a bit of a more heavyweight solution instead. Let me know what you think:

At first I didn't really follow here, but I think I now understand (by reading the code ;-) that you aimed at exploring whether significant figures actually "work" in various situations, right?

If so, not being fluent in floating point arithmetic, I'm not sure I can help with the actual core issue. I initially thought you could just take any floating point number, say 1.001, and add the next decimal step (1e-4) to it, which should give 1.0011; those are equal to 4 digits but not to 5. Then continue with 1.0001 for 5 & 6, and so on. But again, I'm not sure I even understand where to look, this is probably only one of many vectors in the problem, and again, rounding... Just throwing around ideas.

@thoni56
Contributor

thoni56 commented Mar 19, 2021

To make it a bit more explicit for the user, how about this option? Add functions to be called in BeforeEach called something like do_single_precision_floating_point_tests and do_double_precision_floating_point_tests (I don't have a strong opinion about the name of these...). Combine this with your idea to just use the much more ideal is_nearly, and then just have it switch the implementation in the background. E.g.:

This is a really good idea! Not only can we check some things, like you mentioned, but we also get a way to stepwise deprecate the "significant digits approach" if we need/want/should:

  • implement the switching functions and make "significant digits" the default
  • promote the "accuracy approach"
  • next major release deprecate "significant digits" by making "accuracy" the default

(I see that I think of the two "methods" as "significant digits" and "accuracy" respectively, but you talk about them as "single precision" and "double precision", am I (again) miss(understand)ing something here? I feel really thick here...)

@thoni56
Contributor

thoni56 commented Jun 21, 2021

I'm checking in on this to see where we are ;-)

I've pondered this off and on for a while and my position (today ;-) is that

  1. The contribution that you are offering here is very valuable
  2. It is even, probably, more "correct" than the current "significant figures" approach
  3. It should become the recommended approach for floating point comparisons
  4. As such it should be as mathematically correct and sound as possible
  5. This might mean that the current "significant figures" approach is less accurate and mathematically precise, maybe even contains bugs ;-), but so be it
  6. I like the idea of keeping them both, particularly for backwards compatibility, and selecting with a switch (see note below)
  7. Names of constraints should be is_nearly_equal_to() and is_not_nearly_equal_to().

NOTE: I'm still a bit confused about your comment

Add functions to be called in BeforeEach called something like do_single_precision_floating_point_tests() and do_double_precision_floating_point_tests()

because the previous implementation is all about "double" too. I'd like to understand the reasoning about "single" and "double" a bit better, in particular why you consider the "significant figures" approach to be "single precision", before making a final decision on those names. But assuming that is well-founded, then I'd like to have a function with an argument to switch implementation, similar to cgreen_mocks_are() and significant_digits_for_double_asserts_are(), so (again assuming "single precision" and "double precision" are names that would work for distinguishing between the two):

cgreen_floating_points_are(DOUBLE_PRECISION);

I'm trying to not sound rigid, but I'd like this to move forward and I don't have the knowledge to continue from this point (yet). But I think part of the stall is lack of decisions, so that's what I tried to provide better this time.

If you have any disagreements, I'm happy to take them and discuss further.

@sampotter
Author

Sorry about the silence on my end. So that you're aware, I'm in the middle of finishing up my PhD dissertation and have no extra time to spare at the moment. This is 100% of where the delay is coming from on my end. I'll be done by August or so, and then I'll have some more time to help out here. I'm using my own fork of cgreen with the improved floating point tests on a daily basis, and it's worked quite nicely for me. I'm very happy with cgreen and plan to continue using it, so don't worry about me disappearing. ;-)

Re: your most recent response:

I agree completely with points 1-6. For 7, I still prefer is_nearly and is_not_nearly, but these can easily be defined on a per-project basis (my main complaint with cgreen is that I find some of the names to be pretty verbose).

For the rest of your response, I can't remember all the details of our previous conversations, but I'll try to summarize what I was thinking in case it helps:

  • I don't think that the significant figures approach is inherently single precision (sorry for any confusion).
  • Both the proposed set of "relative tolerance" ("is_nearly") tests and the "significant figures" tests can be made to work assuming single- or double-precision floats.
  • Appending "_double" and "_float" (or "_single") to everything is pretty unpleasant. An alternative would be to select which floating point type to "back" the implementation of "is_nearly", etc. in BeforeEach using something like your proposed cgreen_floating_points_are(DOUBLE_PRECISION) (although probably better would be cgreen_floating_point_precision_is(...)).

So I think we're basically in agreement about everything, apart from some naming details?

(BTW, even though I don't have time to code here now, I can continue discussing this, so I should reply rather quickly)

@thoni56
Contributor

thoni56 commented Jun 21, 2021

Thanks for the update, and the collection of thoughts!

Yes, it seems like we mostly agree. So I'll let you focus on your PhD and we'll touch base in August!

@matthargett
Contributor

@sampotter hopefully your PhD program went well. Here's a ping in case you have some time over your holidays :)

@sampotter
Author

sampotter commented Dec 26, 2021

@matthargett @thoni56

Thanks for the ping. PhD successfully defended and now fully engaged in a postdoc, so quite busy.

I have time now to help now, and would like to get this off my backburner (!), but I'm frankly lost as to where to start. I haven't thought about this in several months and the details are no longer at my fingertips. From what I can tell, is_nearly_double in my branch works correctly, and also passes the Travis tests.

If someone could feed me a single, self-contained task to focus on, I'd be happy to take care of it.

Happy holidays!

@thoni56
Contributor

thoni56 commented Dec 26, 2021

@matthargett @thoni56

Thanks for the ping. PhD successfully defended and now fully engaged in a postdoc, so quite busy.

Congratulations!

I have time now to help now, and would like to get this off my backburner (!), but I'm frankly lost as to where to start. I haven't thought about this in several months and the details are no longer at my fingertips. From what I can tell, is_nearly_double in my branch works correctly, and also passes the Travis tests.

If someone could feed me a single, self-contained task to focus on, I'd be happy to take care of it.

I think your implementation of double assertions has great merit, and I'd like to see that as the future default (my only reservation is that I have not had any use for doubles, and thus no experience with them, and am no "mathematician", so I feel a bit hesitant to make this call, but I trust you on this).

To get there we need an implementation that can exist side by side with the current one, and allowing programmatic selection during run-time is the way to go. Then we can decide to switch the default at some breaking point and then deprecate the "old" style.

So, that would pretty much point to your post on Jun 21. I'm not sure if that is "single, self-contained" enough, but consider it a Christmas wish ;-)

Happy holidays!

Thanks! And to you too!
