Apache Pinot Release 1.2.0

@ankitsultana ankitsultana released this 31 Jul 20:31
· 811 commits to master since this release

What Changed

This release brings several improvements and bug fixes to the Multistage Engine, Upserts and Compaction, along with many smaller features and general bug fixes.

Multistage Engine Improvements

Features

New Window Functions: LEAD, LAG, FIRST_VALUE, LAST_VALUE #12878 #13340

  • LEAD allows you to access values after the current row in a frame.
  • LAG allows you to access values before the current row in a frame.
  • FIRST_VALUE and LAST_VALUE return the value from the first and the last row of the frame, respectively.
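
To make the semantics of these four functions concrete, here is a minimal Python sketch that emulates them over the already-ordered values of a single partition. It is an illustration only, not Pinot code: the helper names are hypothetical, the frame is assumed to span the whole partition, and out-of-range rows are assumed to yield None (NULL).

```python
from typing import List, Optional, Sequence


def _shift(values: Sequence[int], delta: int) -> List[Optional[int]]:
    """Return, for each row i, the value at row i + delta, or None if out of range."""
    n = len(values)
    return [values[i + delta] if 0 <= i + delta < n else None for i in range(n)]


def lead(values: Sequence[int], offset: int = 1) -> List[Optional[int]]:
    """LEAD: value `offset` rows after the current row."""
    return _shift(values, offset)


def lag(values: Sequence[int], offset: int = 1) -> List[Optional[int]]:
    """LAG: value `offset` rows before the current row."""
    return _shift(values, -offset)


def first_value(values: Sequence[int]) -> List[int]:
    """FIRST_VALUE: value of the first row, repeated for every row (whole-partition frame)."""
    return [values[0]] * len(values) if values else []


def last_value(values: Sequence[int]) -> List[int]:
    """LAST_VALUE: value of the last row, repeated for every row (whole-partition frame)."""
    return [values[-1]] * len(values) if values else []
```

For the ordered values 10, 20, 30, lead gives 20, 30, None and lag gives None, 10, 20.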

Support for Logical Database in V2 Engine #12591 #12695

  • V2 Engine now supports a "database" construct, enabling table namespace isolation within the same Pinot cluster.
  • Improves user experience when multiple users are using the same Pinot Cluster.
  • Access control policies can be set at the database level.
  • Database can be selected in a query using a SET statement, such as SET database=my_db;.

Improved Multi-Value (MV) and Array Function Support

  • Added array sum aggregation functions for point-wise array operations #13324.
  • Added support for valueIn MV transform function #13443.
  • Fixed bug in numeric casts for MV columns in filters #13425.
  • Fixed NPE in ArrayAgg when a column contains no data #13358.
  • Fixed array literal handling #13345.

Support for WITHIN GROUP Clause and ListAgg #13146

  • WITHIN GROUP Clause can be used to process rows in a given order within a group.
  • One of the most common use-cases for this is the ListAgg function, which when combined with WITHIN GROUP can be used to concatenate strings in a given order.
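
The effect of combining ListAgg with WITHIN GROUP can be sketched in plain Python as "group, sort inside each group, then concatenate". This is a conceptual model under stated assumptions, not the engine's implementation: the function name, the (group, value, order key) row layout and the default separator are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def listagg_within_group(
    rows: Iterable[Tuple[str, str, int]], separator: str = ","
) -> Dict[str, str]:
    """Concatenate values per group, ordered by the order key within each group.

    Each row is (group_key, value, order_key).
    """
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for group_key, value, order_key in rows:
        groups[group_key].append((order_key, value))

    result: Dict[str, str] = {}
    for group_key, items in groups.items():
        # Sort only on the order key; Python's sort is stable for ties.
        ordered = sorted(items, key=lambda pair: pair[0])
        result[group_key] = separator.join(value for _, value in ordered)
    return result
```

Without the ordering step the concatenation order would depend on the order in which rows happen to arrive, which is the problem WITHIN GROUP addresses.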

Scalar/Transform Function and Set Operation Improvements

  • Added Geospatial Scalar Function support for use in intermediate stage in the v2 query engine #13457.
  • Fix 'WEEK' transform function #13483.
  • Support EXTRACT as a scalar function #13463.
  • Added support for ALL modifier for INTERSECT and EXCEPT Set Operations #13151 #13166.
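
The ALL modifier switches INTERSECT and EXCEPT from set semantics to multiset (bag) semantics, so duplicates are counted rather than collapsed. A minimal Python sketch of the standard SQL behavior follows; the helper names are hypothetical, and results are sorted only to make the output deterministic.

```python
from collections import Counter
from typing import Iterable, List


def intersect_all(left: Iterable[int], right: Iterable[int]) -> List[int]:
    """INTERSECT ALL: each value appears min(count_left, count_right) times."""
    common = Counter(left) & Counter(right)
    return sorted(common.elements())


def except_all(left: Iterable[int], right: Iterable[int]) -> List[int]:
    """EXCEPT ALL: each value appears max(0, count_left - count_right) times."""
    remaining = Counter(left) - Counter(right)
    return sorted(remaining.elements())


def except_distinct(left: Iterable[int], right: Iterable[int]) -> List[int]:
    """Plain EXCEPT for comparison: duplicates are collapsed first."""
    return sorted(set(left) - set(right))
```

With left = 1, 1, 2, 3 and right = 1, 2, 2, EXCEPT ALL keeps one of the two 1s, while plain EXCEPT drops the value 1 entirely.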

Improved Literal Handling Support

  • Fixed bug in handling literal arguments in aggregation functions like Percentile #13282.
  • Allow INT and FLOAT literals #13078.
  • Fixed literal handling for all types #13344 #13345.
  • Fixed null literal handling for null intolerant functions #13255.

Metrics Improvements

  • Added new metrics for tracking queries executed globally and at the table level #12982.
  • New metrics to track join counts and window function counts #13032.
  • Multiple meters and timers to track Multistage Engine Internals #13035.

Notable Improvements and Bug Fixes

  • Improved Window operator resiliency, with new checks to make sure the window doesn't grow too large #13180 #13428 #13441.
  • Optimized Group Key generation #12394.
  • Fixed SortedMailboxReceiveOperator to honor convention of pulling at most 1 EOS block #12406.
  • Improvement in how execution stats are handled #12517 #12704 #13136.
  • Use Protobuf instead of Reflection for Plan Serialization #13221.

Upsert Compaction and Minion Improvements

Features and Improvements

Minion Resource Isolation #12459 #12786

  • Minions now support resource isolation based on an instance tag.
  • Instance tag is configured at table level, and can be set for each task on a table.
  • This enables you to implement arbitrary resource isolation strategies, e.g. you can use a set of Minion Nodes for running any set of tasks across any set of tables.

Greedy Upsert Compaction Scheduling #12461

  • Upsert compaction now schedules segments for compaction based on the number of invalid docs.
  • This helps the compaction task to handle arbitrary temporal distribution of invalid docs.
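
The greedy idea can be sketched as "rank segments by invalid doc count and take the worst ones first". The snippet below is a hypothetical illustration of that ranking step only: the function name, the per-run segment cap and the input shape are assumptions, and the actual task may apply additional thresholds and checks.

```python
from typing import Dict, List


def pick_segments_for_compaction(
    invalid_docs_by_segment: Dict[str, int], max_segments: int
) -> List[str]:
    """Pick up to `max_segments` segments with the most invalid docs.

    Segments with no invalid docs are skipped; ties are broken by segment name
    so the result is deterministic.
    """
    candidates = [
        (count, name)
        for name, count in invalid_docs_by_segment.items()
        if count > 0
    ]
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [name for _, name in candidates[:max_segments]]
```

Because the ranking ignores segment age, old and new segments compete on equal terms, which is why this approach copes with invalid docs that are spread unevenly over time.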

Notable Improvements

  • Minions can now download segments from servers when deepstore copy is missing. This feature is enabled via a cluster level config allowDownloadFromServer #12960 #13247.
  • Added support for TLS Port in Minions #12943.
  • New metrics added for Minions to track segment/record processing information #12710.

Bug Fixes

  • Minions can now handle invalid instance tags in Task Configs gracefully. Prior to this change, Minions would be stuck in IN_PROGRESS state until task timeout #13092.
  • Fix bug to return validDocIDsMetadata from all servers #12431.
  • Fixed a bug where upsert compaction did not retain maxLength information and trimmed string fields #13157.

Upsert Improvements

Features and Improvements

Consistent Table View for Upsert Tables #12976

  • Adds different modes of consistency guarantees for Upsert tables.
  • Adds a new UpsertConfig called consistencyMode which can be set to NONE, SYNC, SNAPSHOT.
  • SYNC is optimized for data freshness but can lead to elevated query latencies and is best for low-qps use-cases. In this mode, the ingestion threads take a write lock when updating validDocID bitmaps.
  • SNAPSHOT mode can handle high-qps/high-ingestion use-cases by getting the list of valid docs from a snapshot of validDocID. The snapshot can be refreshed every few seconds and the tolerance can be set via a query option upsertViewFreshnessMs.

Pluggable Partial Upsert Merger #11983

  • Partial Upsert merges the old record and the new incoming record to generate the final ingested record.
  • Pinot now allows users to customize how this merge of an old row and the new row is computed.
  • This allows a column value in the new row to be an arbitrary function of the old and the new row.
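
Conceptually, a custom merger is a function from (old row, new row) to the row that gets stored. The Python sketch below models that contract with plain dictionaries; it does not use Pinot's actual plugin interface, and the names apply_upsert, increment_clicks_merger and the clicks column are hypothetical.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Merger = Callable[[Row, Row], Row]


def increment_clicks_merger(old: Row, new: Row) -> Row:
    """Example merger: overwrite every column except `clicks`, which is summed."""
    merged = dict(old)
    merged.update(new)  # by default the incoming value wins
    merged["clicks"] = old.get("clicks", 0) + new.get("clicks", 0)
    return merged


def apply_upsert(table: Dict[str, Row], primary_key: str, new_row: Row, merger: Merger) -> None:
    """Store `new_row` as-is for a new key, or the merged row for an existing key."""
    old_row = table.get(primary_key)
    table[primary_key] = dict(new_row) if old_row is None else merger(old_row, new_row)
```

Any function with the same shape can be swapped in, which is the sense in which the final column value can be an arbitrary function of the old and the new row.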

Support for Uploading Externally Partitioned Segments for Upsert Backfill #13107

  • Segments uploaded for Upsert Backfill can now explicitly specify the Kafka partition they belong to.
  • This enables backfilling an Upsert table where the externally generated segments are partitioned using an arbitrary hash function on an arbitrary primary key.

Misc Improvements and Bug Fixes

  • Fixed a bug in handling equal comparison column values in Upsert, which could lead to data inconsistency #12395.
  • Upsert snapshot now covers only those segments that have updates #13285.

Notable Features

JSON Support Improvements

  • JSON Index can now be used for evaluating Regex and Range Predicates. #12568
  • jsonExtractIndex now supports contextual array filters. #12683 #12531.
  • JSON column type now supports filter predicates like =, !=, IN and NOT IN. This is convenient for scenarios where the JSON values are very small. #13283.
  • JSON_MATCH now supports exclusive predicates correctly. For instance, you can use predicates such as JSON_MATCH(person, '"$.addresses[*].country" != ''us''') to find all people who have at least one address that is not in the US. #13139.
  • jsonExtractIndex supports extracting Multi-Value JSON Fields, and also supports providing any default value when the key doesn't exist. #12748.
  • Added isJson UDF which increases your options to handle invalid JSONs. This can be used in queries and for filtering invalid json column values in ingestion. #12603.
  • Fix ArrayIndexOutOfBoundsException in jsonExtractIndex. #13479.

Lucene and Text Search Improvements

  • Improved Segment Build Time for Lucene Text Index by 40-60%. This improvement is realized when a consuming segment commits and changes to an ImmutableSegment. This significantly helps in lowering ingestion lag at commit time due to a large text index #12744 #13094 #13050.
  • Phrase Search can run 3x faster when the Lucene Index Config enablePrefixSuffixMatchingInPhraseQueries is set to true. This is achieved by rewriting phrase search query to a wildcard and prefix matching query #12680.
  • Fixed bug in TextMatchFilterOptimizer that was not applying precedence to the filter expressions properly, which could lead to incorrect results. #13009.
  • Fixed bug in handling NOT text_match which could have returned incorrect results. #12372.
  • Added SchemaConformingTransformerV2 to enhance text search abilities. #12788.
  • Added metrics to track Lucene NRT Refresh Delay #13307.
  • Switched to NRTCachingDirectory for Realtime segments and prevented duplicates in the Realtime Lucene Index to avoid IndexOutOfBounds query time exceptions. #13308.
  • Lucene Version is upgraded to 9.11.1. #13505.

New Funnel Functions #13176 #13231 #13228

  • Added funnelMaxStep function which can be used to calculate max funnel steps for a given sliding window.
  • Added funnelCompleteCount to calculate the number of completed funnels, and funnelMatchStep to get the funnel match array.

Support for Interning for OnHeapByteDictionary #12342

  • This can reduce the heap usage of a dictionary encoded byte column, for a certain distribution of duplicate values. See #12223 for details.

Column Major Builder On By Default for New Tables #12770

  • Prior to this feature, on a segment commit, Pinot would convert all the columnar data from the Mutable Segment to row-major, and then re-build column major Immutable Segments.
  • This feature skips the row-major conversion and is expected to be both space and time efficient.
  • It can help lower ingestion lag from segment commits, especially helpful when your segments are large.

Support for SQL Formatting in Query Editor #11725

  • You can now prettify SQL right in the Controller UI!

Hash Function for UUID Primary Keys #12538

  • Added a new lossless hash-function for Upsert Primary Keys optimized for UUIDs.
  • The hash function can reduce Old Gen heap usage by up to 30%.
  • It maps a UUID to a 16-byte array, whereas encoding it as a UTF-8 string would take 36 bytes.
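
The 36-versus-16 byte comparison, and the fact that the compact form is lossless, can be checked with Python's standard uuid module. This is only a size illustration, not the hash function Pinot ships; the helper name is hypothetical.

```python
import uuid
from typing import Tuple


def uuid_key_sizes(uuid_text: str) -> Tuple[int, int]:
    """Return (size as UTF-8 string, size as raw bytes) for a UUID primary key."""
    parsed = uuid.UUID(uuid_text)
    as_string = str(parsed).encode("utf-8")  # 32 hex digits plus 4 hyphens
    as_bytes = parsed.bytes                   # the underlying 128-bit value

    # Lossless: the original UUID can be rebuilt from the 16 raw bytes.
    if uuid.UUID(bytes=as_bytes) != parsed:
        raise ValueError("round trip through raw bytes changed the UUID")
    return len(as_string), len(as_bytes)
```

Calling uuid_key_sizes("123e4567-e89b-12d3-a456-426614174000") returns (36, 16).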

Column Level Index Skip Query Option #12414

  • Convenient for debugging impact of indexes on query performance or results.
  • You can add the skipIndexes option to your query to skip any number of indexes. e.g. SET skipIndexes=inverted,range;

New UDFs and Scalar Functions

  • New GeoHash functions: encodeGeoHash, decodeGeoHash, decodeGeoHashLatitude and decodeGeoHashLongitude.
  • dateBin can be used to align a timestamp to the nearest time bucket.
  • prefixes, suffixes and uniqueNgrams UDFs for generating all respective string subsequences from a string input. #12392.
  • Added isJson UDF which increases your options to handle invalid JSONs. This can be used in queries and for filtering invalid json column values in ingestion. #12603.
  • splitPart UDF has minor improvements. #12437.

CLP Compression Codec in Forward Indexes #12504

  • CLP (Compressed Log Processor) achieves a very high compression ratio for certain log types.
  • To enable this, you can set the compressionCodec in the fieldConfigList of the column you want to target.

Misc. Improvements

  • Enable segment preloading at partition level #12451.

  • Use Temurin instead of AdoptOpenJdk #12533

  • Adding record reader config/context param to record transformer #12520

  • Removing legacy commons-lang dependency #13480

  • 12508: Feature add segment rows flush config #12681

  • ADSS Race Condition and update to client error codes #13104

  • Add ExceptionMapper to convert Exception to Response Object for Broker REST API's #13292

  • Add FunnelMaxStepAggregationFunction and FunnelCompleteCountAggregationFunction #13231

  • Add GZIP Compression Codec (#11434) #12668

  • Add PodDisruptionBudgets to the Pinot Helm chart #13153

  • Add Postgres compliant name aliasing for String Functions. #12795

  • Add SchemaConformingTransformerV2 to enhance text search abilities #12788

  • Add a benchmark to measure multi-stage block serde cost #13336

  • Add a plan version field to QueryRequest Protobuf Message #13267

  • Add a post-validator visitor that verifies there are no cast to bytes #12475

  • Add a safe version of CLStaticHttpHandler that disallows path traversal. #13124

  • Add ability to track filtered messages offset #12602

  • Add back 'numRowsResultSet' to BrokerResponse, and retain it when result table is hidden #13198

  • Add back profile for shade #12979

  • Add back some exclude deps from hadoop-mapreduce-client-core #12638

  • Add backward compatibility regression test suite for multi-stage query engine #13193

  • Add base class for custom object accumulator #12685

  • Add clickstream example table for funnel analysis #13379

  • Add config option for timezone #12386

  • Add config to skip record ingestion on string column length exceeding configured max schema length #13103

  • Add controller API to get allLiveInstances #12498

  • Add isJson UDF #12603

  • Add list of collaborators to asf.yaml #13346

  • Add locking logic to get consistent table view for upsert tables #12976

  • Add metric to track number of segments missed in upsert-snapshot #12581

  • Add metrics for SEGMENTS_WITH_LESS_REPLICAS monitoring #12336

  • Add mode to allow adding dummy events for non-matching steps #13382

  • Add offset based lag metrics #13298

  • Add protobuf codegen decoder #12980

  • Add retry policy to wait for job id to persist during rebalancing #13372

  • Add round-robin logic during downloadSegmentFromPeer #12353

  • Add schema as input to the decoder. #12981

  • Add splitPartWithLimit and splitPartFromEnd UDFs #12437

  • Add support for creating raw derived columns during segment reload #13037

  • Add support for raw JSON filter predicates #13283

  • Add the possibility of configuring ForwardIndexes with compressionCodec #12218

  • Add upsert-snapshot timer metric #12383

  • Add validation check for forward index disabled if it's a REALTIME table #12838

  • Added PR compatibility test against release 1.1.0 #12921

  • Added kafka partition number to metadata. #13447

  • Added pinot-error-code header in query response #12338

  • Added tests for additional data types in SegmentPreProcessorTest.java #12755

  • Adding a cluster config to enable instance pool and replica group configuration in table config #13131

  • Adding batch api support for WindowFunction #12993

  • Adding bytes string data type integration tests #12387

  • Adding registerExtraComponents to allow registering additional components in various services #13465

  • Adding support of insecure TLS #12416

  • Adding support to insecure TLS when creating SSLFactory #12425

  • Adds AGGREGATE_CASE_TO_FILTER rule #12643

  • Adds per-column, query-time index skip option #12414

  • Allow Aggregations in Case Expressions #12613

  • Allow PinotHelixResourceManager subclasses to be used in the controller starter by providing an overridable PinotHelixResourceManager object creator function #13495

  • Allow RequestContext to consider http-headers case-insensitivity #13169

  • Allow Server throttling just before executing queries on server to allow max CPU and disk utilization #12930

  • Allow all raw index config in star-tree index #13225

  • Allow applying both environment variables and system properties to user and table configs; environment variables take precedence over system properties #13011

  • Allow configurable queryWorkerThreads in Pinot server side GrpcQueryServer #13404

  • Allow dynamically setting the log level even for loggers that aren't already explicitly configured #13156

  • Allow passing custom record reader to be inited/closed in SegmentProcessorFramework #12529

  • Allow passing database context through database http header #12417

  • Allow stop to interrupt the consumer thread and safely release the resource #13418

  • Allow user configurable regex library for queries #13005

  • Allow using 'serverReturnFinalResult' to optimize server partitioned table #13208

  • Assign default value to newly added derived column upon reload #12648

  • Avoid port conflict in integration tests #13390

  • Better handling of null tableNames #12654

  • CLP as a compressionCodec #12504

  • Change helm app version to 1.0.0 for Apache Pinot latest release version #12436

  • Clean Google Dependencies #13297

  • Clean up BrokerRequestHandler and BrokerResponse #13179

  • Clean up arbitrary sleep in GrpcBrokerClusterIntegrationTest #12379

  • Cleaning up vector index comments and exceptions #13150

  • Cleanup HTTP components dependencies and upgrade Thrift #12905

  • Cleanup Javax and Jakarta dependencies #12760

  • Cleanup deprecated query options #13040

  • Cleanup the consumer interfaces and legacy code #12697

  • Cleanup unnecessary dependencies under pinot-s3 #12904

  • Cleanup unused aggregate internal hint #13295

  • Consistency in API response for live broker #12201

  • Consolidate bouncycastle libraries #12706

  • Consolidate nimbus-jose-jwt version to 9.37.3 #12609

  • ControllerRequestClient accepts headers. Useful for authN tests #13481

  • Custom configuration property reader for segment metadata files #12440

  • Delete database API #12765

  • Deprecate PinotHelixResourceManager#getAllTables() in favour of getAllTables(String databaseName) #12782

  • Detect expired messages in Kafka. Log and set a gauge. #12608

  • Do not hard code resource class in BaseClusterIntegrationTest #13400

  • Do not pause ingestion when upsert snapshot flow errors out #13257

  • Don't drop original field during flatten #13490

  • Don't enforce -realTimeInstanceCount and -offlineInstanceCount options when creating broker tenants #13236

  • Egalpin/skip indexes minor changes #12514

  • Emit Metrics for Broker Adaptive Server Selector type #12482

  • Emit table size related metrics only in lead controller #12747

  • Enable complexType handling in SegmentProcessFramework #12942

  • Enable more integration tests to run on the v2 multi-stage query engine #13467

  • Enabling avroParquet to read Int96 as bytes #12484

  • Enhance Kinesis consumer #12806

  • Enhance Parquet Test #13082

  • Enhance ProtoSerializationUtils to handle class move #12946

  • Enhance Pulsar consumer #12812

  • Enhance PulsarConsumerTest #12948

  • Enhance commit threshold to accept size threshold without setting rows to 0 #12684

  • Enhance json index to support regexp and range predicate evaluation #12568

  • Enhancement: Sketch value aggregator performance #13020

  • Ensure FieldConfig.getEncodingType() is never null #12430

  • Ensure all the lists used in PinotQuery are ArrayList #13017

  • Ensure brokerId and requestId are always set in BrokerResponse #13200

  • Exclude dimensions from star-tree index stored type check #13355

  • Expose more helper API in TableDataManager #13147

  • Extend compatibility verifier operation timeout from 1m to 2m to reduce flakiness #13338

  • Extract json individual array elements from json index for the transform function jsonExtractIndex #12466

  • Fetch query quota capacity utilization rate metric in a callback function #12767

  • First with time #12235

  • GitHub Actions checkout v4 #12550

  • Gzip compression, ensure uncompressed size can be calculated from compressed buffer #12802

  • Handle errors gracefully during multi-stage stats collection in the broker #13496

  • Handle shaded classes in all methods of kafka factory #13087

  • Hash Function for UUID Primary Keys #12538

  • Ignore case when checking for Direct Memory OOM #12657

  • Improve Retention Manager Segment Lineage Clean Up #13232

  • Improve error message for max rows in join limit breach #13394

  • Improve exception logging when we fail to index / transform message #12594

  • Improve logging in range index handler for index updates #13381

  • Improve upsert compaction threshold validations #13424

  • Improve warn logs for requesting validDocID snapshots #13280

  • Improved metrics for server grpc query #13177

  • Improved null check for varargs #12673

  • Improved segment build time for Lucene text index realtime to offline conversion #12744

  • In ClusterTest, make start port higher to avoid potential conflict with Kafka #13402

  • Introduce PinotLogicalAggregate and remove internal hint #13291

  • Introduce retries while creating stream message decoder for more robustness #13036

  • Isolate bad server configs during broker startup phase #12931

  • Issue #12367 #12922

  • Json extract index filter support #12683

  • Json extract index mv #12532

  • Keep get tables API with and without database #12804

  • Lint failure #12294

  • Logging a warn message instead of throwing exception #12546

  • Made the error message around dimension table size clearer #13163

  • Make Helix state transition handling idempotent #12886

  • Make KafkaConsumerFactory method less restrictive to avoid incompatibility #12815

  • Make task manager APIs database aware #12766

  • Metric for count of tables configured with various tier backends #12940

  • Metric for upsert tables count #12505

  • Metrics for Realtime Rows Fetched and Stream Consumer Create Exceptions #12522

  • Minmaxrange null #12252

  • Modify consumingSegmentsInfo endpoint to indicate how many servers failed #12523

  • Move offset validation logic to consumer classes #13015

  • Move package org.apache.calcite to org.apache.pinot.calcite #12837

  • Move resolveComparisonTies from addOrReplaceSegment to base class #13396

  • Move some mispositioned tests under pinot-core #12884

  • Move wildfly-openssl dependency management to root pom #12597

  • Moving deleteSegment call from POST to DELETE call #12663

  • Optimize unnecessary extra array allocation and conversion for raw derived column during segment reload #13115

  • Pass explicit TypeRef when evaluating MV jsonPath #12524

  • Percentile operations supporting null #12271

  • Prepare for next development iteration #12530

  • Propagate Disable User Agent Config to Http Client #12479

  • Properly handle complex type transformer in segment processor framework #13258

  • Properly return response if SegmentCompletion is aborted #13206

  • Publish helm 0.2.8 #12465

  • Publish helm 0.2.9 #13230

  • Pull janino dependency to root pom #12724

  • Pull pulsar version definition into root POM #13002

  • Query response opt #13420

  • Re-enable the Spotless plugin for Java 21 #12992

  • Readme - How to setup Pinot UI for development #12408

  • Record enricher #12243

  • Refactor PinotTaskManager class #12964

  • Refactored CommonsConfigurationUtils for loading properties configuration. #13201

  • Refactored compatibility-verifier module #13359

  • Refactoring removeSegment flow in upsert #13449

  • Refine PeerServerSegmentFinder #12933

  • Refine SegmentFetcherFactory #12936

  • Replace custom fmpp plugin with fmpp-maven-plugin #12737

  • Reposition query submission spot for adaptive server selection #13327

  • Reset controller port when stopping the controller in ControllerTest #13399

  • Rest Endpoint to Create ZNode #12497

  • Return clear error message when no common broker found for multi-stage query with tables from different tenants #13235

  • Return names of tables failing authorization in the exception for Multi Stage Engine queries #13195

  • Revert " Adding record reader config/context param to record transformer (#12520)" #12526

  • Revert "Using local copy of segment instead of downloading from remote (#12863)" #13114

  • Short circuit SubPlanFragmenter because we don't support multiple sub-plans yet #13306

  • Simplify Google dependencies by importing BOM #12456

  • Specify version for commons-validator #12935

  • Support NOT in StarTree Index #12988

  • Support empty strings as json nodes #12555

  • Supporting human-readable format when configuring broker response size #12510

  • Use ArrayList instead of LinkedList in SortOperator #12783

  • Use a two server setup for multi-stage query engine backward compatibility regression test suite #13371

  • Use more efficient variants of URLEncoder::encode and URLDecoder::decode #13030

  • Use parameterized log messages instead of string concatenation #13145

  • Use separate action for /tasks/scheduler/jobDetails API #13054

  • Use try-with-resources to close file walk stream in LocalPinotFS #13029

  • Using local copy of segment instead of downloading from remote #12863

  • [Adaptive Server Selector] Add metrics for Stats Manager Queue Size #12340

  • [Cleanup] Move classes in pinot-common to the correct package #13478

  • [Feature] Add Support for SQL Formatting in Query Editor #11725

  • [HELM]: Added additional probes options and startup probe. #13165

  • [HELM]: Added checksum config annotation in stateful set for broker, controller and server #13059

  • [HELM]: Added namespace support in K8s deployment. #13380

  • [HELM]: zookeeper chart upgrade to version 13.2.0 #13083

  • [Minor] Add Nullable annotation to HttpHeaders in BrokerRequestHandler #12816

  • [Minor] Small refactor of raw index creator constructor to be more clear #13093

  • [Multi-stage] Clean up RelNode to Operator handling #13325

  • [null-aggr] Add null handling support in mode aggregation #12227

  • [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager #13256

  • [spark-connector] Add option to fail read when there are invalid segments #13080

  • add Netty arm64 dependencies #12493

  • add Netty unit test #12486

  • add SegmentContext to collect validDocIds bitmaps for many segments together #12694

  • add skipUnavailableServers query option #13387

  • add insecure mode when Pinot uses TLS connections #12525

  • add instrumentation to json index getMatchingFlattenedDocsMap() #13164

  • add jmx to prometheus metric exporting rule for realtimeRowsFiltered #12759

  • add metrics for IdealState update #13266

  • add some metrics for upsert table preloading #12722

  • add some tests on jsonPathString #12954

  • add test cases in RequestUtilsTest #12557

  • add unit test for JsonAsyncHttpPinotClientTransport #12633

  • add unit test for QueryServer #12599

  • add unit test for ServerChannels #12616

  • add unit test for StringFunctions encodeUrl #13391

  • add unit tests for pinot-jdbc-client #13137

  • add url assertion to SegmentCompletionProtocolTest #13373

  • adjust the llc partition consuming metric reporting logic #12627

  • allow passing null http headers object to translateTableName #12764

  • allow to set segment when use SegmentProcessorFramework #13341

  • auto renew jvm default sslcontext when it's loaded from files #12462

  • avoid useless intermediate byte array allocation for VarChunkV4Reader's getStringMV #12978

  • aws sdk 2.25.3 #12562

  • build-helper-maven-plugin 3.5.0 #12548

  • cache ssl contexts and reuse them #12404

  • clean up jetbrain nullable annotation #13427

  • cleanup: maven no transfer progress #12444

  • close JDBC connections #12494

  • do not fail on duplicate relaxed vars #13214

  • dropwizard metrics 4.2.25 #12600

  • dynamic chunk sizing for v4 raw forward index #12945

  • enable Netty leak detection #12483

  • enable parallel Maven in pinot linter script #12751

  • ensure inverse And/OrFilterOperator implementations match the query #13199

  • exclude .mvn directory from source assembly #12558

  • extend CompactedPinotSegmentRecordReader so that it can skip deleteRecord #13352

  • get startTime outside the executor task to avoid flaky time checks #13250

  • handle absent segments so that catchup checker doesn't get stuck on them #12883

  • handle overflow for MutableOffHeapByteArrayStore buffer starting size #13215

  • handle segments not tracked by partition mgr and add skipUpsertView query option #13415

  • handle table name translation on missed api resources #12792

  • hash4j version upgrade to 0.17.0 #12968

  • including the underlying exception in the logging output #13248

  • int96 parity with native parquet reader #12496

  • jsonExtractIndex support array of default values #12748

  • log the log rate limiter rate for dropped broker logs #13041

  • make http listener ssl config swappable #12455

  • make reflection calls compatible with 0.9.11 #12958

  • maven: no transfer progress #12528

  • missed to delete the temp dir #12637

  • move shouldReplaceOnComparisonTie to base class to be more reusable #13353

  • reduce Java enum .values() usage in TimerContext #12579

  • reduce logging for SpecialValueTransformer #12970

  • reduce regex pattern compilation in Pinot jdbc #13138

  • refactor TlsUtils class #12515

  • refine when to registerSegment while doing addSegment and replaceSegment for upsert tables for better data consistency #12709

  • reformat AdminConsoleIntegrationTest.java #12552

  • reformat ClusterTest.java #12531

  • release segment mgrs more reliably #13216

  • replaced getServer with getServers #12545

  • report rebalance job status for the early returns like noops #13281

  • require noDictionaryColumns with aggregationConfigs #12464

  • share the same table config object #12463

  • track segments for snapshotting even if they lost all comparisons #13388

  • untrack the segment out of TTL #12449

  • update ControllerJobType from enum to string #12518

  • update RewriterConstants so that expr min max would not collide with columns start with "parent" #13357

  • update access control check error handling to catch throwable and log errors #13209

Bug Fixes

  • Use gte(lte) to replace between() which has a bug #12595
  • Fix the ConcurrentModificationException for And/Or DocIdSet #12611
  • Upgrade RoaringBitmap to 1.0.5 to pick up the fix for RangeBitmap.between() #12604
  • bugfix: do not move src ByteBuffer position for LZ4 length prefixed decompress #12539
  • Bug Fix createDictionaryForColumn does not take into account inverted index #13048
  • fix Cluster Manager error #12632
  • fix for quick start Cluster Manager issue #12610
  • Adding config for having suffix for client ID for realtime consumer #13168
  • Addressed comments and fixed tests from pull request 12389. /uptime and /start-time endpoints working all components #12512
  • Bugfix. Added missing paramName #13060
  • Bug fix: Do not ignore scheme property #12332
  • Bug fix: Handle missing shade config overwrites for Kafka #13437
  • BugFix: Fix merge result from more than one server #12778
  • Bugfix. Allow tenant rebalance with downtime as true #13246
  • Bugfix. Avoid passing null table name input to translation util #12726
  • Bugfix. Correct wrong method call from scheduleTask() to scheduleTaskForDatabase() #12791
  • Bugfix. Maintain literal data type during function evaluation #12607
  • Cleanup: Fix grammar in error message, also improve readability. #13451
  • Fix Bug in Handling Equal Comparison Column Values in Upsert #12395
  • Fix ColumnMinMaxValueGenerator #12502
  • Fix JavaEE related dependencies #13058
  • Fix Logging Location for CPU-Based Query Killing #13318
  • Fix PulsarUtils to not share buffer #12671
  • Fix URI construction so that AddSchema command line tool works when override flag is set to true #13320
  • Fix [Type]ArrayList elements() method usage #13354
  • Fix a typo when calculating query freshness #12947
  • Fix an overflow in PinotDataBuffer.readFrom #13152
  • Fix bug in logging in UpsertCompaction task #12419
  • Fix bug to return validDocIDsMetadata from all servers #12431
  • Fix connection issues if using JDBC and Hikari (#12267) #12411
  • Fix controller host / port / protocol CLI option description for admin commands #13237
  • Fix environment variables not applied when creating table #12560
  • Fix error message for insufficient number of untagged brokers during tenant creation #13234
  • Fix few metric rules which were affected by the database prefix handling #13290
  • Fix file handle leaks in Pinot Driver (#12263) #12356
  • Fix flakiness of ControllerPeriodicTasksIntegrationTest #13337
  • Fix issue with startree index metadata loading for columns with '__' in name #12554
  • Fix metric rule pattern regex #12856
  • Fix pinot-parquet NoClassFound issue #12615
  • Fix segment size check in OfflineClusterIntegrationTest #13389
  • Fix some resource leak in tests #12794
  • Fix the NPE from IS update metrics #13313
  • Fix the NPE when metadataTTL is enabled without delete column #13262
  • Fix the ServletConfig loading issue with swagger. #13122
  • Fix the issue that map flatten shouldn't remove the map field from the record #13243
  • Fix the race condition for H3InclusionIndexFilterOperator #12487
  • Fix the time segment pruner on TIMESTAMP data type #12789
  • Fix time stats in SegmentIndexCreationDriverImpl #13429
  • Fixed infer logical type name from avro union schema #13224
  • Fixing instance type to resolve #12677 and #12678
  • Helm: bug fix for chart rendering issue. #13264
  • Try to amend kafka common package with pinot shaded package prefix #13056
  • Update getValidDocIdsMetadataFromServer to make call in batches to servers and other bug fixes #13314
  • Upgrade com.microsoft.azure:msal4j from 1.3.5 to 1.3.10 for CVE fixing #12580
  • [bugfix] Handling null value for kafka client id suffix #13279
  • bugfix: fixing jdbc client sql feature not supported exception #12480
  • bugfix: re-add support for not text_match #12372
  • bugfix: reduce enum array allocation in QueryLogger #12478
  • bugfix: use consumerDir during lucene realtime segment conversion #13094
  • cleanup: fix apache rat violation #12476
  • fix GuavaRateLimiter acquire method #12500
  • fix fieldsToRead class not in decoder #13186
  • fix flakey test, avoid early finalization #13095
  • fix merging null multi value in partial upsert #13031
  • fix race condition in ScalingThreadPoolExecutor #13360
  • fix shared buffer, tests #12587
  • fix(build): update node version to 16 #12924
  • fixing CVE critical issues by resolving kerby/jline and wildfly libraries #12566
  • fixing pinot-adls high severity CVEs #12571
  • fixing swagger setup using localhost as host name #13254
  • swagger-ui upgrade to 5.15.0 Fixes #12908
  • upgrade jettison version to fix CVE #12567