Explore smaller float code with uint32_t #221
Hello, I would like to use nanoprintf with a C compiler without 64-bit integers (long long is restricted to 32 bits). Thanks
I'll take a shot at it sometime in the next few days; the tricky part is that breaking the float up into three integers (integer portion, fractional portion as an integer, and number of leading zeroes after the decimal point) currently requires 64 bits of working space. Simply changing all of the uint64_t to uint32_t is the first step, along with changing 64 to 32 for the "max bits we can handle for a fractional part", but then some of the tests fail for fractional numbers that have no mantissa.
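To make the space requirement concrete, here is an illustrative sketch of such a three-way split (my own reconstruction, not nanoprintf's actual code, which avoids float operations entirely): with a 52-bit mantissa plus the implicit leading bit, the decomposed pieces need 64-bit working space even for modest values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: split a non-negative double with a small integer
 * part into its integer part and its fraction scaled to 9 decimal
 * digits. The 64-bit headroom is what the discussion above is about. */
static void split_double(double d, uint64_t *int_part, uint64_t *frac_digits) {
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);                /* raw IEEE-754 bits */
  int exp = (int)((bits >> 52) & 0x7FF) - 1023;  /* unbiased exponent */
  uint64_t man = (bits & 0xFFFFFFFFFFFFFULL) | (1ULL << 52); /* implicit 1. */
  *int_part = (exp >= 0 && exp <= 52) ? (man >> (52 - exp)) : 0;
  *frac_digits = (uint64_t)((d - (double)*int_part) * 1000000000.0);
}
```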
Yes, I did all of this and got failures as well.
One nice thing about the current approach is that it doesn't actually do any floating point operations. Unfortunately, the 64-bit variables are very hard to get rid of. I did sketch out a quick-and-dirty replacement here https://github.com/charlesnicholson/nanoprintf/pull/243/files that uses floating point operations but only uint32_t. It's not quite as precise, but it did pass all but 4 of the conformance tests, so it's almost certainly "good enough". Could I ask anyone interested to try the PR branch and give feedback? If it's satisfactory, I wouldn't mind adding another compile-time configuration macro to pick the version; it's a reasonable tradeoff.
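The general shape of such a float-ops-plus-uint32_t approach can be sketched like this (my own illustration of the idea, not the PR's actual code): repeatedly multiply the fraction by 10 and peel off the integer digit. Precision drifts after a handful of digits, which matches the "good enough" tradeoff described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Extract n decimal fraction digits of f using float math and only
 * 32-bit integers. Assumes 0 <= f < 2^32. Illustrative sketch only. */
static void ftoa_frac32(float f, char *out, unsigned n) {
  float frac = f - (float)(uint32_t)f;
  unsigned i;
  for (i = 0; i < n; ++i) {
    uint32_t d;
    frac *= 10.0f;          /* shift one decimal digit left of the point */
    d = (uint32_t)frac;     /* that digit */
    out[i] = (char)('0' + d);
    frac -= (float)d;       /* keep only the remaining fraction */
  }
  out[n] = '\0';
}
```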
That's great! |
It's a garbage implementation, but if it works "well enough" I think it's straightforward to add rounding, and it can use the powers-of-ten table trick to get rid of loops. I'm not excited by how much larger it is than the u64 version, but if it allows the project to support more architectures, that's probably best for the users :)
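For readers unfamiliar with the powers-of-ten table trick mentioned above, here is a minimal sketch (names are mine, not nanoprintf's): instead of repeatedly multiplying or dividing by 10 to find a number's decimal magnitude, compare it against a precomputed table.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t const pow10_tbl[10] = {
  1u, 10u, 100u, 1000u, 10000u, 100000u,
  1000000u, 10000000u, 100000000u, 1000000000u
};

/* Number of decimal digits in v, found by table comparison rather than
 * iterative division by 10. */
static unsigned digits_u32(uint32_t v) {
  unsigned d = 10;
  while (d > 1 && v < pow10_tbl[d - 1]) { --d; }
  return d;
}
```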
Hello, I tested the 32-bit version. I needed to change the in-body variable declarations because my compiler is pure C (an old HCS12 CodeWarrior) and fix some types unknown to the compiler (like uintptr), and it works as expected. I could upload my pure C "fix" version if needed. Regards
I'd love to see it, if you want to open it up as a PR. (Unsure, without looking at it first, which specific branch it would land into main from, but if you're opening a PR anyway please add yourself to the contributors file!)
Hello (again), guys! After doing #255, I started looking at this issue. After some thinking and tinkering with a calculator, an idea came to me for an algorithm which holds a 28-bit value but moves left/right through the bits/digits by doing multiplication/division iteratively and dynamically. For now I have implemented some experimental code just for the fractional part to see what it's practically capable of. Here are some output examples of the standard
Pros:
Cons:
The dynamic nature of the error can be seen in the last two examples, which print the value 2/3 and the same value divided by 10^21. The first one prints eight '6' digits, while the second prints just seven. But in my opinion the precision is pretty good anyway. So, are we willing to make such a sacrifice?
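The digit-extraction core that a 28-bit window enables can be sketched as follows (a reconstruction of the general idea, not @Okarss's actual code): hold the fraction as 28-bit fixed point and multiply by 10 to peel off one decimal digit per step; 28 bits leaves exactly enough headroom for the multiply in a uint32_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAC_BITS 28u  /* 10 * (2^28 - 1) still fits in a uint32_t */

/* frac is the fractional value scaled by 2^FRAC_BITS; emit n digits. */
static void frac_to_dec(uint32_t frac, char *out, unsigned n) {
  unsigned i;
  for (i = 0; i < n; ++i) {
    frac *= 10u;
    out[i] = (char)('0' + (frac >> FRAC_BITS)); /* digit above the point */
    frac &= (1u << FRAC_BITS) - 1u;             /* keep the remainder */
  }
  out[n] = '\0';
}
```

The "moving window" part of the algorithm would rescale this accumulator dynamically for very large or small exponents, which is where the error behavior described above comes from.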
Hello again! It's great to hear from you; this sounds very interesting and promising! It aligns very much with the float philosophy I've had for nanoprintf, which is roughly "code size and compatibility are more important than perfect round-trip float/double support", especially if there's a compile-time flag that gives users the option to trade speed/space for more precision. To use nanoprintf in the first place, users clearly need to know what they're getting into, so I don't mind burdening them with cognitive overhead like this; it's easy to experiment and see what works best for a given codebase. Do you have any initial estimates of the impact (positive or negative) on the binary size? Now that Actions automatically run on every PR, it's hopefully easier to do experiments (check out the "Size Report" job at the very bottom of the action run). As long as it's in the same ballpark, I'd be happy to incorporate this, and please update the readme.md to credit yourself with the details!
Then it seems that I have captured the philosophy as it was intended. :) At this point the code is just a very crude experiment in Visual Studio, completely separate from nanoprintf, so I cannot give exact numbers, but I can give an estimate based on the amount and logic of the code. The 64-bit version should be about the same both size- and speed-wise. The 32-bit version should be both smaller and faster by a significant amount. For the latter, the only 64-bit operations left will be a few bitwise operations when extracting the bits from the original

By the way, I did a funny experiment for the integer part by moving the decimal separator to the left and then printing the whole number as a fraction with a single loop. It is probably the smallest implementation possible, but, because the precision has an impact on the right side, the result is this:
Not only does it "damage" the integer part, but also the nicely printable fractional part. It will do this when the integer part is larger than '9'. In many cases rounding can come in and save the day, but I guess this is not something people will accept, and I don't like it either. As unintuitive as it seems, for the "moving window" approach the integer part is trickier than the fractional part. So, now the plan is to implement a symmetric algorithm which prints the integer part nicely up to about 2^32 and then starts stuffing zeros to the right for larger values. Then I have to integrate rounding and process special values, and then there is a question about the limits. As the algorithm itself has no range limits and even the 32-bit version can provide 32+28 (integer+fraction) bits of precision for numbers around the decimal separator, we should probably reconsider the 32-character limit. Maybe it should also be made a compile-time parameter?
Hello! Sorry for the delay; I'm traveling for work. This all looks very interesting, and I'm not sure increasing the stack usage matters that much for this: if you have an FPU, you can probably afford another 29 bytes of stack usage :) I'm eager to learn more!
Here is some news on the progress. I implemented the integer part, rounding and special values as planned. By the way, the current nanoprintf implementation doesn't handle subnormal numbers correctly. It turns out that the C standard doesn't define the sizes of
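For reference on the subnormal remark: a subnormal double has an all-zero exponent field with a nonzero mantissa, so its implicit leading mantissa bit is 0 rather than 1; a decomposition that unconditionally ORs in the implicit 1 mishandles these values. A small detection check (illustrative, not nanoprintf code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* True iff d is an IEEE-754 subnormal (exponent field zero, mantissa
 * nonzero). Zero itself is not subnormal. */
static int is_subnormal(double d) {
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  return ((bits >> 52) & 0x7FF) == 0 && (bits & 0xFFFFFFFFFFFFFULL) != 0;
}
```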
For now, numbers which are not represented exactly are always lower than the real values. That is because the dynamic scaling internally preserves at least 29.678 bits (for the 32-bit version), but truncates values instead of rounding while processing. I'm looking into whether this can also be improved with a small amount of code. I also tried compiling it for Cortex-M7 with GCC and Os optimization, and the size of the new
Whoa, this is very cool! I'm eager to see it!
Did the first integration into nanoprintf. I'm still working on it, but the published code is pretty complete and decent, so everybody feel free to try it:
Compared to the current v0.4.0 release, the total FLASH size of my test build on Cortex-M7 with GCC and Os optimization is reduced by just 56 B for the 64-bit version, but by 948 B for the 32-bit version. Also, compared to the 32-bit version, the 16-bit and 8-bit versions increase the size by 48 B and 40 B respectively, which makes sense for a 32-bit CPU. Note that the size of the
Getting rid of the built-in 64-bit stuff is huge! I'll play with it when I'm not working; thanks again for figuring this out :)
Added internal rounding to the dynamic scaling process, which improves precision, especially for very large/small values. Mathematically the code is now complete. There is just one thing I don't like: handling of the fractional part when the integer part exceeds the integer range. Here is an example (actual value : printed value) with 32-bit integers:
The fact that 4294967297.0 gets printed as a smaller number than 4294967296.5 seems unacceptable to me. This is a consequence of the fact that currently I'm printing the fractional part first. Initially that decision was driven by the fact that it saves the rounding (of the printed characters) code from going over the integer part. Now it seems that it could be more efficient to print the integer part first and readjust the rounding code so that it can go over all digits if necessary. And, in case the integer part exceeds the integer range, ignore the fractional part completely and just print zeros instead. At least it will print 4294967300.0 in all of the lines in the example, except for the first one, of course. Does this sound like the right approach?
Heya @Okarss - sorry I let this languish :( I think extreme-value errata like this are totally fine, as long as they're documented. Nanoprintf is pretty clearly comfortable trading float precision for implementation size efficiency, so I think clearly explaining the limitations is plenty sufficient. I agree with what you're saying about printing a number smaller than the input this close to the positive limit, and I think what you've done makes fine sense. :)
Also, I'm excited to see how it does on the CI jobs! Feel free to open a PR whenever you'd like. When you do, could I ask you to rewrite the float portion of the docs as well, give yourself front-and-center credit for it, and explain (or link to) your algorithm, please?
Swapped the order of processing the integer and fraction parts. In the previous example, if the integer part exceeds the integer range, it will now print 4294967300.0 and the fraction part will not be processed at all. Though, if the actual value is 4294967295.96875, it will still print 4294967296.0 because of the rounding code. Another interesting thing is the default integer type/size for processing. Currently it is set to
As the library is named "nano..." and size is the highest priority, I'm inclined to leave the default to the

Overall it seems that now the code is ready and I'm nearing a PR. Then I will also update the documentation and add a commit for that. By the way, the algorithm is basically the same as Wojciech Muła's idea, but evolved and extended further. It solves the range problem by implementing scaling which keeps just the most significant bits of the mantissa in the process. The idea just came to me naturally. Only now did I actually search online for this approach, and I cannot find anything. Unbelievable...
That is interesting. I agree with your suggestion about

I'm really excited to see it! Novel work is very exciting in general, and the floating point stuff has been the weakest part of npf, I think. :)
The code is in the same GitHub branch to which I gave a link in an earlier post - there are new commits. The integer type is also the same one I showed in that post; now I'm just reevaluating the choice of the default definition for it. So in total there are still the same two new optional parameters. One can also see the new npf_ftoa_rev() function there. :)
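Without having seen npf_ftoa_rev() itself, the general reversed-emission idea its name suggests can be sketched like this (my own illustration): generate digits least-significant-first, which requires no up-front digit count, then print the buffer backwards.

```c
#include <assert.h>
#include <stdint.h>

/* Write the decimal digits of v into buf in reverse order (least
 * significant first); returns the digit count. The caller prints the
 * buffer back-to-front. */
static unsigned utoa_rev(uint32_t v, char *buf) {
  unsigned n = 0;
  do {
    buf[n++] = (char)('0' + (v % 10u));
    v /= 10u;
  } while (v);
  return n;
}
```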
Right now the code that splits floats up into
[integer portion | number of leading 0's after decimal | decimal portion]
stores the decomposed integer + decimal portions as uint64_t. This is very, very expensive (in time and space) on 8-bit platforms like the eZ80. Nanoprintf captures full floating-point precision, but truncates instead of rounding as a performance optimization.
Explore a compile-time flag that intentionally sacrifices precision for size/speed by storing the integer + decimal portions as uint32_t's instead.
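One possible shape for such a compile-time switch (the macro and typedef names here are purely hypothetical, not existing nanoprintf configuration):

```c
#include <assert.h>
#include <stdint.h>

/* NPF_FLOAT_SPLIT_32 is a hypothetical flag, NOT an actual nanoprintf
 * macro: when defined, the float decomposition would use 32-bit
 * working integers, trading precision for size/speed. */
#ifdef NPF_FLOAT_SPLIT_32
typedef uint32_t npf_fsplit_t;  /* smaller and faster, less precise */
#else
typedef uint64_t npf_fsplit_t;  /* current full-precision behavior */
#endif
```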
cc @mateoconlechuga