
Releases: apache/druid

Druid 0.6.105 - Stable

07 Jun 04:58

Updating

When updating Druid with no downtime, we highly recommend updating historical nodes and real-time nodes before updating the broker layer. Changes in queries are typically compatible with an old broker version and a new historical node version, but not vice versa. Our recommended rolling update process is:

  1. indexing service/real-time nodes
  2. historical nodes (with a wait between each node; the wait time corresponds to how long it takes a historical node to restart and load all locally cached segments)
  3. broker nodes
  4. coordinator nodes

Release Notes

  • Historical nodes can now use and maintain a local cache (disabled by default). The cache can be either heap-based or memcached-based. This allows historical nodes to merge results locally and relieves much of the memory pressure seen on brokers when pulling a large number of results from the cache. Cache population is now also done asynchronously.
  • Experimental router node. We’ve been experimenting with a fully asynchronous router node that can route queries to different brokers depending on the actual query. Currently, the router node makes decisions about which broker to talk to based on rules from the coordinator node. It is our goal to at some point merge the router and broker logic and move towards hierarchical brokers.
  • Post aggregation optimization. We’ve optimized the calculation of post aggregations (previously, post aggregations were computed more often than necessary). In initial benchmarks, this yields a 20%-30% improvement for queries that involve post aggregations.
  • Support hyperUnique in groupBys. We’ve fixed a reported problem where groupBys would report incorrect results when using complex metrics (especially hyperUnique).
  • Support for dimension extraction functions in groupBy queries.
  • Persist and persist-n-merge threads now no longer block each other during real-time ingestion. We added a parameter for throttling real-time ingestion a few months ago, and what we’ve seen is that very high ingestion rates that lead to a high number of intermediate persists can be blocked while waiting for a hand-off operation to complete. This behavior has now been improved. You are also now able to set maxPendingPersists in the plumber.
  • hyperUnique performance optimizations: ~30-50% faster aggregations
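As an illustration of the groupBy improvements above (hyperUnique aggregation plus a dimension extraction function), here is a sketch of a query body. The dataSource, dimension, and metric names are invented for illustration, and the exact spec syntax in the 0.6.x line may differ slightly from later documentation:

```json
{
  "queryType": "groupBy",
  "dataSource": "events",
  "granularity": "all",
  "intervals": ["2014-01-01/2014-02-01"],
  "dimensions": [
    {
      "type": "extraction",
      "dimension": "url",
      "outputName": "domain",
      "dimExtractionFn": {"type": "regex", "expr": "https?://([^/]+)"}
    }
  ],
  "aggregations": [
    {"type": "hyperUnique", "name": "unique_users", "fieldName": "user_unique"}
  ]
}
```

Here `user_unique` is assumed to be a column ingested as a hyperUnique complex metric; the regex extraction groups URLs by host.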

Miscellaneous other things

  • Fix integer overflow in hash based partitions
  • Support for arbitrary JSON objects in query context
  • Request logs now include query timing statistics
  • Hadoop 2.3 support by default
  • Update to Jetty 9
  • Do not require valid database connections for testing
  • Gracefully handle NaN / Infinity returned by compute nodes
  • Better error reporting for cases where the ChainedExecutionQueryRunner throws NPEs
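The "arbitrary JSON objects in query context" item above can be sketched as follows. The `timeout` and `priority` keys are standard context parameters; `myTag` and all dataSource/metric names are invented to show that arbitrary objects are accepted:

```json
{
  "queryType": "timeseries",
  "dataSource": "events",
  "intervals": ["2014-01-01/2014-01-02"],
  "granularity": "hour",
  "aggregations": [{"type": "count", "name": "rows"}],
  "context": {
    "timeout": 60000,
    "priority": 1,
    "myTag": {"team": "analytics"}
  }
}
```

The context object is passed through with the query, which is useful for tagging requests in the (now timing-annotated) request logs.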

Extensions:

  • HDFS Storage should now work better with Cloudera CDH4
  • S3 Storage: object ACLs now consistently default to "bucket owner full control"

Druid 0.6.73 - Stable

18 Jun 20:37

We are pleased to announce a new Druid stable, version 0.6.73. New features include:

A production tested dimension cardinality estimation module

We recently open sourced our HyperLogLog module, described in bit.ly/1fIEpjM and bit.ly/1ebLnNI. Documentation has been added on how to use this module as an aggregator and as part of post aggregators.
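A sketch of using the module both as an aggregator and inside a post aggregator (field names such as `user_unique` and `event_count` are invented; `event_count` is assumed to be produced by a separate sum aggregator in the same query):

```json
{
  "aggregations": [
    {"type": "hyperUnique", "name": "unique_users", "fieldName": "user_unique"}
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "avg_events_per_user",
      "fn": "/",
      "fields": [
        {"type": "fieldAccess", "fieldName": "event_count"},
        {"type": "hyperUniqueCardinality", "fieldName": "unique_users"}
      ]
    }
  ]
}
```

The `hyperUniqueCardinality` post aggregator reads the estimated cardinality out of the hyperUnique aggregate so it can participate in arithmetic.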

Hash-based partitioning

We recently introduced a new sharding format for batch indexing. We use the HyperLogLog module to estimate the size of a data set and create partitions based on this size. In our tests, partitioning via this hash based method is both faster and leads to more evenly partitioned segments.
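Hash-based partitioning can be requested in a batch indexing config with a fragment along these lines (a sketch; the `targetPartitionSize` value is illustrative, and key spellings may vary across 0.6.x versions):

```json
{
  "partitionsSpec": {
    "type": "hashed",
    "targetPartitionSize": 5000000
  }
}
```

The indexer first estimates the total row count with the HyperLogLog module, then chooses a shard count so that each segment lands near the target size.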

Cross-tier replication

We can now replicate segments across different tiers. This means that you can create a “hot” tier that loads a single copy of the data on more powerful hardware and a “cold” tier that loads another copy of the data on less powerful hardware. This can lead to significant reductions in infrastructure costs.
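Cross-tier replication is driven by coordinator load rules. A sketch of a rule set that keeps two copies of the last month on a "hot" tier and one copy of the last year on a "cold" tier (tier names are illustrative, and the exact rule field names in 0.6.x may differ from later documentation):

```json
[
  {"type": "loadByPeriod", "period": "P1M", "tier": "hot", "replicants": 2},
  {"type": "loadByPeriod", "period": "P1Y", "tier": "cold", "replicants": 1},
  {"type": "dropForever"}
]
```

Rules are evaluated top to bottom for each segment, so recent data matches the hot-tier rule first while older data falls through to the cold tier.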

Nested GroupBy Queries

Thanks to an awesome contribution from Yuval Oren et al., we can do multi-level aggregation with groupBys. More info here: https://groups.google.com/forum/#!topic/druid-development/8oL28iuC4Gw
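A nested groupBy is expressed by using a query as the dataSource of an outer groupBy. A sketch (all dataSource, dimension, and metric names are invented): the inner query counts events per user per day, and the outer query sums those counts:

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "events",
      "granularity": "day",
      "dimensions": ["user_id"],
      "aggregations": [{"type": "count", "name": "event_count"}],
      "intervals": ["2014-01-01/2014-02-01"]
    }
  },
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    {"type": "longSum", "name": "total_events", "fieldName": "event_count"}
  ],
  "intervals": ["2014-01-01/2014-02-01"]
}
```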

GroupBy memory improvements

We’ve made improvements as to how multi-threaded groupBy queries utilize memory. This should help reduce memory pressure on nodes with concurrent, expensive groupBy queries.

Real-time ingestion stability improvements

We’ve seen some stability issues with real-time ingestion with a high number of concurrent persists and have added smarter throttling to handle this type of workload.

Additional features

  • multi-data center distribution (experimental)
  • request tracing
  • restore tasks (to restore archived segments)
  • memcached stability improvements
  • indexing service stability improvements
  • smarter autoscaling in the indexing service
  • numerous bug fixes
  • new documentation for production configurations

Things on our plate

  • Reducing CPU usage on the broker nodes when interacting with the cache (we are seeing query bottlenecks when merging too many results from memcached)
  • Having historical nodes populate memcached (so bySegment results are no longer returned and historical nodes can do their own local merging)
  • Consolidating batch and real-time ingestion schemas so we can move towards a simpler data ingestion model
  • Scaling groupBys with off-heap result merging
  • Improving real-time ingestion stability and performance by moving to more off-heap data structures
  • Autoscaling and sharding the real-time ingestion pipeline
  • Evaluating append only style updates for streaming data (https://github.com/metamx/druid/issues/418)

Druid 0.6.52 - Stable

18 Jun 20:40

[maven-release-plugin]  copy for tag druid-0.6.52