Druid v0.14.0 Release Notes

Release Date: 2019-04-09
  • Apache Druid (incubating) 0.14.0-incubating contains over 200 new features, performance/stability/documentation improvements, and bug fixes from 54 contributors. Major new features and improvements include:

    • New web console
    • Amazon Kinesis indexing service
    • Decommissioning mode for Historicals
    • Published segment cache in Broker
    • Bloom filter aggregator and expression
    • Updated Apache Parquet extension
    • Force push down option for nested GroupBy queries
    • Better segment handoff and drop rule handling
    • Automatically kill MapReduce jobs when Apache Hadoop ingestion tasks are killed
    • DogStatsD tag support for statsd emitter
    • New API for retrieving all lookup specs
    • New compaction options
    • More efficient cachingCost segment balancing strategy

    The full list of changes is here: https://github.com/apache/incubator-druid/pulls?q=is%3Apr+is%3Amerged+milestone%3A0.14.0

    Documentation for this release is at: http://druid.io/docs/0.14.0-incubating/

    Highlights

    New web console

    Druid has a new web console that provides functionality that was previously split between the coordinator and overlord consoles.

    The new console allows the user to manage datasources, segments, tasks, data processes (Historicals and MiddleManagers), and coordinator dynamic configuration. The user can also run SQL and native Druid queries within the console.

    For more details, please see http://druid.io/docs/0.14.0-incubating/operations/management-uis.html

    Added by @vogievetsky in #6923.

    Kinesis indexing service

    Druid now supports ingestion from Kinesis streams, provided by the new druid-kinesis-indexing-service core extension.

    Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/kinesis-ingestion.html for details.

    Added by @jsun98 in #6431.

    Decommissioning mode for Historicals

    Historical processes can now be put into a "decommissioning" mode, where the coordinator will no longer consider the Historical process as a target for segment replication. The coordinator will also move segments off the decommissioning Historical.

    This is controlled via Coordinator dynamic configuration. For more details, please see http://druid.io/docs/0.14.0-incubating/configuration/index.html#dynamic-configuration.
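
    As a rough illustration of what such a dynamic configuration change might look like, the sketch below adds hosts to a decommissioning list in a copy of the current configuration. The field name decommissioningNodes, the maxSegmentsToMove entry, and the host name are assumptions made for this example and should be checked against the linked documentation; submitting the result to the Coordinator is not shown.

```python
import copy


def add_decommissioning_nodes(current_config: dict, hosts: list) -> dict:
    """Return a copy of the dynamic config with `hosts` marked as decommissioning.

    Assumption: the list lives under the key "decommissioningNodes".
    The input dict is left unmodified.
    """
    updated = copy.deepcopy(current_config)
    existing = updated.get("decommissioningNodes", [])
    # Merge and de-duplicate so repeated calls do not add the same host twice.
    updated["decommissioningNodes"] = sorted(set(existing) | set(hosts))
    return updated


# Hypothetical current configuration and Historical host.
current = {"maxSegmentsToMove": 5, "decommissioningNodes": []}
new_config = add_decommissioning_nodes(current, ["historical-3.example.com:8083"])
```

    Starting from the full current configuration, rather than posting only the new field, avoids accidentally resetting other dynamic settings.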

    Added by @egor-ryashin in #6349.

    Published segment cache on Broker

    The Druid Broker now has the ability to maintain a cache of published segments via polling the Coordinator, which can significantly improve response time for metadata queries on the sys.segments system table.

    Please see http://druid.io/docs/0.14.0-incubating/querying/sql.html#retrieving-metadata for details.
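
    For context, a metadata query against sys.segments could be issued over the Broker's SQL HTTP endpoint roughly as sketched below. The Broker address, the /druid/v2/sql/ path, and the column names used in the query are assumptions for illustration; the request is only built here, not sent.

```python
import json
import urllib.request

BROKER_URL = "http://localhost:8082"  # assumption: local Broker on a default-style port


def build_sql_request(sql: str, broker_url: str = BROKER_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying a Druid SQL query."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{broker_url}/druid/v2/sql/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Count published segments per datasource (column names are assumptions).
SEGMENTS_PER_DATASOURCE = (
    "SELECT datasource, COUNT(*) AS num_segments "
    "FROM sys.segments WHERE is_published = 1 "
    "GROUP BY datasource"
)

request = build_sql_request(SEGMENTS_PER_DATASOURCE)
# To run against a live Broker: urllib.request.urlopen(request).read()
```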

    Added by @surekhasaharan in #6901.

    Bloom filter aggregator and expression

    A new aggregator for constructing Bloom filters at query time and support for performing Bloom filter checks within Druid expressions have been added to the druid-bloom-filter extension.

    Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/bloom-filter.html

    Added by @clintropolis in #6904 and #6397.

    Updated Parquet extension

    druid-parquet-extensions has been moved into the core extension set from the contrib extensions and now supports flattening and int96 values.

    Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/parquet.html for details.

    Added by @clintropolis in #6360.

    Force push down option for nested GroupBy queries

    Outer query execution for nested GroupBy queries can now be pushed down to Historical processes; previously, the outer queries would always be executed on the Broker.

    Please see #5471 for details.

    Added by @samarthjain in #5471.

    Better segment handoff and retention rule handling

    Segment handoff will now ignore segments that would be dropped by a datasource's retention rules, avoiding ingestion failures caused by issue #5868.

    Period load rules will now include the future by default.

    A new "Period Drop Before" rule has been added. Please see http://druid.io/docs/0.14.0-incubating/operations/rule-configuration.html#period-drop-before-rule for details.

    Added by @QiuMM in #6676, #6414, and #6415.
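
    To make the rule changes above more concrete, here is a minimal sketch of a rule chain and of how a "drop before" check could be evaluated. The rule type strings, the includeFuture field, and the tier name are assumptions to be verified against the rule configuration documentation, and the period is simplified to a timedelta rather than an ISO-8601 period.

```python
from datetime import datetime, timedelta

# Hypothetical rule chain: drop everything older than 30 days, load the rest.
example_rules = [
    {"type": "dropBeforeByPeriod", "period": "P30D"},
    {"type": "loadByPeriod", "period": "P30D", "includeFuture": True,
     "tieredReplicants": {"_default_tier": 2}},
]


def drop_before_applies(segment_end: datetime, now: datetime, period: timedelta) -> bool:
    """Return True when a segment ends at or before (now - period).

    This mirrors the idea of a "Period Drop Before" rule; it is a
    simplification for illustration, not Druid's own implementation.
    """
    return segment_end <= now - period


now = datetime(2019, 4, 9)
old_segment_dropped = drop_before_applies(datetime(2019, 1, 1), now, timedelta(days=30))
recent_segment_dropped = drop_before_applies(datetime(2019, 4, 1), now, timedelta(days=30))
```

    In this example the segment ending on 2019-01-01 matches the drop rule, while the one ending on 2019-04-01 falls through to the load rule.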

    Automatically kill MapReduce jobs when Hadoop ingestion tasks are killed

    Druid will now automatically terminate MapReduce jobs created by Hadoop batch ingestion tasks when the ingestion task is killed.

    Added by @ankit0811 in #6828.

    DogStatsD tag support for statsd-emitter

    The statsd-emitter extension now supports DogStatsD-style tags. Please see http://druid.io/docs/0.14.0-incubating/development/extensions-contrib/statsd.html

    Added by @deiwin in #6605, with support for constant tags added by @glasser in #6791.
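
    For readers unfamiliar with DogStatsD-style tags, the sketch below formats a metric in the DogStatsD datagram layout, where tags follow a "|#" marker as comma-separated key:value pairs. The metric name and tag values are hypothetical, and this shows only the wire format, not how the statsd-emitter extension builds its output.

```python
def format_dogstatsd(name: str, value, metric_type: str, tags: dict) -> str:
    """Format one metric as a DogStatsD datagram: name:value|type|#k1:v1,k2:v2."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        # Tags are appended after "|#", joined by commas, in insertion order.
        tag_part = ",".join(f"{key}:{val}" for key, val in tags.items())
        datagram += f"|#{tag_part}"
    return datagram


# Hypothetical metric name and tag values.
line = format_dogstatsd(
    "druid.query.time", 42, "ms",
    {"dataSource": "wikipedia", "type": "timeseries"},
)
```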

    New API for retrieving all lookup specs

    A new API for retrieving all lookup specs for all tiers has been added. Please see http://druid.io/docs/0.14.0-incubating/querying/lookups.html#get-all-lookups for details.

    Added by @jihoonson in #7025.

    New compaction options

    Auto-compaction now supports the maxRowsPerSegment option. Please see http://druid.io/docs/0.14.0-incubating/design/coordinator.html#compacting-segments for details.

    The compaction task now supports a new segmentGranularity option, deprecating the older keepSegmentGranularity option for controlling the segment granularity of compacted segments. Please see the segmentGranularity table in http://druid.io/docs/0.14.0-incubating/ingestion/compaction.html for more information on these properties.
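
    A compaction task that relies on segmentGranularity instead of keepSegmentGranularity might be assembled along these lines. The datasource, interval, and granularity values are hypothetical, and the overall task layout is an assumption that should be compared with the compaction documentation linked above.

```python
import json


def build_compaction_task(data_source: str, interval: str, segment_granularity: str) -> dict:
    """Build a compaction task spec that sets segmentGranularity explicitly.

    The deprecated keepSegmentGranularity option is deliberately omitted.
    """
    return {
        "type": "compact",
        "dataSource": data_source,
        "interval": interval,
        "segmentGranularity": segment_granularity,
    }


# Hypothetical datasource and interval.
task = build_compaction_task("wikipedia", "2019-01-01/2019-02-01", "DAY")
task_json = json.dumps(task, indent=2)
```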

    Added by @jihoonson in #6758 and #6780.

    More efficient cachingCost segment balancing strategy

    The cachingCost Coordinator segment balancing strategy will now only consider Historical processes for balancing decisions. Previously the strategy would unnecessarily consider active worker tasks as well, which are not targets for segment replication.

    Added by @QiuMM in #6879.

    New metrics:

    • New allocation rate metric jvm/heapAlloc/bytes, added by @egor-ryashin in #6710.
    • New query count metric query/count, added by @QiuMM in #6473.
    • SQL query metrics sqlQuery/bytes and sqlQuery/time, added by @gaodayue in #6302.
    • Apache Kafka ingestion lag metrics ingest/kafka/maxLag and ingest/kafka/avgLag, added by @QiuMM in #6587
    • Task count metrics task/success/count, task/failed/count, task/running/count, task/pending/count, task/waiting/count, added by @QiuMM in #6657

    New interfaces for extension developers

    RequestLogEvent

    It is now possible to control the fields in RequestLogEvent, emitted by EmittingRequestLogger. Please see #6477 for details. Added by @leventov.

    Custom TLS certificate checks

    An extension point for custom TLS certificate checks has been added. Please see http://druid.io/docs/0.14.0-incubating/operations/tls-support.html#custom-tls-certificate-checks for details. Added by @jon-wei in #6432.

    Kafka Indexing Service no longer experimental

    The Kafka Indexing Service extension has been moved out of experimental status.

    SQL Enhancements

    Enhancements to dsql

    The dsql command line client now supports CLI history, basic autocomplete, and specifying query timeouts in the query context.

    Added in #6929 by @gianm.

    Add SQL id, request logs, and metrics

    SQL queries now have an ID, and native queries executed as part of a SQL query will have the associated SQL query ID in the native query's request logs. SQL queries will now be logged in the request logs.

    Two new metrics, sqlQuery/time and sqlQuery/bytes, are now emitted for SQL queries.

    Please see http://druid.io/docs/0.14.0-incubating/configuration/index.html#request-logging and http://druid.io/docs/0.14.0-incubating/querying/sql.html#sql-metrics for details.

    Added by @gaodayue in #6302.

    More SQL aggregator support

    The following aggregators are now supported in SQL:

    • DataSketches HLL sketch
    • DataSketches Theta sketch
    • DataSketches quantiles sketch
    • Fixed bins histogram
    • Bloom filter aggregator

    Added by @jon-wei in #6951 and @clintropolis in #6502.

    Other SQL enhancements

    • SQL: Add support for queries with project-after-semijoin. #6756
    • SQL: Support for selecting multi-value dimensions. #6462
    • SQL: Support AVG on system tables. #601
    • SQL: Add "POSITION" function. #6596
    • SQL: Set INFORMATION_SCHEMA catalog name to "druid". #6595
    • SQL: Fix ordering of sort, sortProject in DruidSemiJoin. #6769

    Added by @gianm.

    Updating from 0.13.0-incubating and earlier

    Kafka ingestion downtime when upgrading

    Due to the issue described in #6958, existing Kafka indexing tasks can be terminated unnecessarily during a rolling upgrade of the Overlord. The terminated tasks will be restarted by the Overlord and will function correctly after the initial restart.

    Parquet extension changes

    The druid-parquet-extensions extension has been moved from contrib to core. When deploying 0.14.0-incubating, please ensure that your extensions-contrib directory does not have any older versions of the Parquet extension.

    Additionally, there are now two styles of Parquet parsers in the extension:

    • parquet-avro: Converts Parquet to Avro, and then parses the Avro representation. This was the existing parser prior to 0.14.0-incubating.
    • parquet: A new parser that parses the Parquet format directly. Only this new parser supports int96 values.

    Prior to 0.14.0-incubating, specifying a parquet type parser would have a task use the Avro-converting parser. In 0.14.0-incubating, to continue using the Avro-converting parser, you will need to update your ingestion specs to use parquet-avro instead.

    The inputFormat field in the inputSpec for tasks using Parquet input must also match the choice of parser:

    • parquet: org.apache.druid.data.input.parquet.DruidParquetInputFormat
    • parquet-avro: org.apache.druid.data.input.parquet.DruidParquetAvroInputFormat

    Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/parquet.html for details.
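
    The parser rename described above could be applied to an existing ingestion spec with a small helper such as the following sketch. The nesting of the spec (spec, dataSchema, parser) is an assumption about a Hadoop-style task layout, the sample spec is hypothetical, and the inputFormat field still has to be set to match the chosen parser as listed above.

```python
import copy


def keep_avro_parser(task_spec: dict) -> dict:
    """Return a copy of the spec that keeps the pre-0.14.0 Avro-converting parser.

    A parser type of "parquet" meant the Avro-converting parser before
    0.14.0-incubating, so it is renamed to "parquet-avro". Other parser
    types are left untouched.
    """
    migrated = copy.deepcopy(task_spec)
    parser = migrated["spec"]["dataSchema"]["parser"]
    if parser.get("type") == "parquet":
        parser["type"] = "parquet-avro"
    return migrated


# Hypothetical pre-0.14.0 spec fragment.
old_spec = {
    "type": "index_hadoop",
    "spec": {"dataSchema": {"dataSource": "events", "parser": {"type": "parquet"}}},
}
new_spec = keep_avro_parser(old_spec)
```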

    Running Druid with non-2.8.3 Hadoop

    If you plan to use Druid 0.14.0-incubating with Hadoop versions other than 2.8.3, you may need to do the following:

    • Set the Hadoop dependency coordinates to your target version as described in http://druid.io/docs/0.14.0-incubating/operations/other-hadoop.html under Tip #3: Use specific versions of Hadoop libraries.
    • Rebuild Druid with your target version of Hadoop by changing hadoop.compile.version in the main Druid pom.xml and then following the standard build instructions.

    Other Behavior changes

    Old task cleanup

    Old task entries in the metadata storage will now be cleaned up automatically together with their task logs. Please see http://druid.io/docs/0.14.0-incubating/configuration/index.html#task-logging and #6592 for details.

    Automatic processing buffer sizing

    The druid.processing.buffer.sizeBytes property has new default behavior if it is not set. Druid will now automatically choose a value for the processing buffer size using the following formula:

    processingBufferSize = totalDirectMemory / (numMergeBuffers + numProcessingThreads + 1)
    processingBufferSize = min(processingBufferSize, 1GB)
    

    Where:

    • totalDirectMemory: The direct memory limit for the JVM specified by -XX:MaxDirectMemorySize
    • numMergeBuffers: The value of druid.processing.numMergeBuffers.
    • numProcessingThreads: The value of druid.processing.numThreads.

    At most, Druid will use 1GB for the automatically chosen processing buffer size. The processing buffer size can still be specified manually.
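
    The formula above can be worked through with a short sketch. It assumes integer division and reads "1GB" as 2^30 bytes; both are assumptions made for this example rather than details confirmed by the release notes.

```python
ONE_GB = 1024 ** 3  # assumption: "1GB" interpreted as 2**30 bytes


def auto_processing_buffer_size(total_direct_memory: int,
                                num_merge_buffers: int,
                                num_processing_threads: int) -> int:
    """Apply the sizing formula: divide direct memory evenly, then cap at 1GB."""
    per_buffer = total_direct_memory // (num_merge_buffers + num_processing_threads + 1)
    return min(per_buffer, ONE_GB)


# 4 GiB of direct memory, 2 merge buffers, 7 processing threads:
# 4 GiB is split ten ways, which stays below the cap.
small = auto_processing_buffer_size(4 * ONE_GB, 2, 7)

# 64 GiB of direct memory with the same settings hits the 1GB cap.
capped = auto_processing_buffer_size(64 * ONE_GB, 2, 7)
```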

    Please see #6588 for details.

    Retention rules now include the future by default

    Please be aware that new retention rules will now include the future by default. Please see #6414 for details.

    Property changes

    Segment announcing

    The druid.announcer.type property used for choosing between Zookeeper or HTTP-based segment management/discovery has been moved to druid.serverview.type. If you were using http prior to 0.14.0-incubating, you will need to update your configs to use the new druid.serverview.type.

    Please see the following for details:

    Fix missing property in JsonTypeInfo of SegmentWriteOutMediumFactory

    The druid.peon.defaultSegmentWriteOutMediumFactory.@type property has been fixed. The property is now druid.peon.defaultSegmentWriteOutMediumFactory.type without the "@".

    Please see #6656 for details.
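
    The two property renames in this section could be applied to an existing properties file with a helper like the one sketched here. It handles only simple key=value lines and is based solely on the renames described above; the sample values are hypothetical.

```python
# Old property name -> new property name, as described in this section.
RENAMED_PROPERTIES = {
    "druid.announcer.type": "druid.serverview.type",
    "druid.peon.defaultSegmentWriteOutMediumFactory.@type":
        "druid.peon.defaultSegmentWriteOutMediumFactory.type",
}


def migrate_properties(text: str) -> str:
    """Rewrite renamed keys in properties-style text, leaving other lines as-is."""
    migrated = []
    for line in text.splitlines():
        stripped = line.strip()
        # Only touch simple key=value lines; skip blanks and comments.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key, value = stripped.split("=", 1)
            key = RENAMED_PROPERTIES.get(key.strip(), key.strip())
            line = f"{key}={value.strip()}"
        migrated.append(line)
    return "\n".join(migrated)


# Hypothetical pre-0.14.0 configuration.
old_text = "druid.announcer.type=http\n# comment\ndruid.port=8083"
new_text = migrate_properties(old_text)
```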

    Deprecations

    Approximate Histogram aggregator

    The [ApproximateHistogram](http://druid.io/docs/0.14.0-incubating/development/extensions-core/approximate-histograms.html) aggregator has been deprecated; it is a distribution-dependent algorithm without formal error bounds and has significant accuracy issues.

    The [DataSketches quantiles](http://druid.io/docs/0.14.0-incubating/development/extensions-core/datasketches-quantiles.html) aggregator should be used instead for quantile and histogram use cases.

    Please see Histogram and Quantiles Aggregators

    Cardinality/HyperUnique aggregator

    The Cardinality and HyperUnique aggregators have been deprecated in favor of the [DataSketches HLL](http://druid.io/docs/0.14.0-incubating/development/extensions-core/datasketches-hll.html) aggregator and Theta Sketch aggregator. These aggregators have better accuracy and performance characteristics.

    Please see Count Distinct Aggregators for details.

    Query Chunk Period

    The chunkPeriod query context configuration is now deprecated, along with the associated query/intervalChunk/time metric. Please see #6591 for details.

    keepSegmentGranularity for Compaction

    The keepSegmentGranularity option for compaction tasks has been deprecated. Please see #6758 and the segmentGranularity table in http://druid.io/docs/0.14.0-incubating/ingestion/compaction.html for more information on these properties.

    Interface changes for extension developers

    SegmentId class

    Druid now uses a SegmentId class instead of plain Strings to represent segment IDs. Please see #6370 for details.

    Added by @leventov.

    druid-api, druid-common, java-util moved to druid-core

    The druid-api, druid-common, and java-util modules have been moved into druid-core. Please update your dependencies accordingly if your project depended on these libraries.

    Please see #6443 for details.

    Credits

    Thanks to everyone who contributed to this release!

    @a2l007
    @AlexanderSaydakov
    @anantmf
    @ankit0811
    @asdf2014
    @awelsh93
    @benhopp
    @Caroline1000
    @clintropolis
    @dclim
    @deiwin
    @DiegoEliasCosta
    @drcrallen
    @dyf6372
    @Dylan1312
    @egor-ryashin
    @elloooooo
    @evans
    @FaxianZhao
    @gaodayue
    @gianm
    @glasser
    @Guadrado
    @hate13
    @hoesler
    @hpandeycodeit
    @janeklb
    @jihoonson
    @jon-wei
    @jorbay-au
    @jsun98
    @justinborromeo
    @kamaci
    @leventov
    @lxqfy
    @mirkojotic
    @navkumar
    @niketh
    @patelh
    @pzhdfy
    @QiuMM
    @rcgarcia74
    @richardstartin
    @robertervin
    @samarthjain
    @seoeun25
    @Shimi
    @surekhasaharan
    @taiii
    @thomask
    @VincentNewkirk
    @vogievetsky
    @yunwan
    @zhaojiandong