hashAggregates: group by sum decimal issue #1554

ssathi · 2020-12-08T06:02:50Z

Aggregating negative decimal values produces incorrect results.

table:

charge_item_name (string)	charge_type (string)	charge_amount (decimal(16,2)
WHT	tax	-300.00
WHT	tax	-300.00
WHT	tax	-300.00
WHT	tax	-240.00
WHT	tax	-300.00
WHT	tax	-300.00
WHT	tax	-300.00
WHT	tax	-300.00
WHT	tax	-300.00
WHT	tax	-300.00
WHT	tax	-300.00
WHT	tax	-200.00
WHT	tax	-300.00
WHT	tax	-300.00

Spark SQL query:

SELECT itemchargingtype, itemdescription, count(*) AS `count`, sum(itemamount) AS `sum`
FROM `transaction_items`
where itemdescription = 'WHT'
GROUP BY itemdescription, itemchargingtype

result:

charge_type	charge_item_name	count	sum
tax	WHT	14	1,202.88

After disabling
snappydata.sql.hashAggregateSize=-1
snappydata.sql.useOptimizedHashAggregateForSingleKey=false

it produces correct values.

result:

charge_type	charge_item_name	count	sum
tax	WHT	14	-4040.00

The text was updated successfully, but these errors were encountered:

sumwale · 2020-12-08T12:20:39Z

Thanks for reporting.

Tested this and it looks to be a bug in the new BufferHashMap based implementation that reduces memory overhead for large aggregates and DISTINCT. For now you can switch to using the older implementation "set snappydata.sql.optimizedHashAggregate=false" which is as fast (and in many cases faster) than the newer one albeit may fail for very large aggregation/DISTINCT results. If this works for your use-cases it is much better than turning it off completely.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hashAggregates: group by sum decimal issue #1554

hashAggregates: group by sum decimal issue #1554

ssathi commented Dec 8, 2020 •

edited

sumwale commented Dec 8, 2020

hashAggregates: group by sum decimal issue #1554

hashAggregates: group by sum decimal issue #1554

Comments

ssathi commented Dec 8, 2020 • edited

sumwale commented Dec 8, 2020

ssathi commented Dec 8, 2020 •

edited