Changelog History
-
v0.22.0 Changes
January 23, 2020
Docker image/tag:
semitechnologies/weaviate:0.22.0
See also: example docker compose files in English and Dutch.
Contains Breaking Change!
Note: While this release contains no API-level breaking changes, the internals have changed so much that we recommend not simply replacing your existing Weaviate container with the new one. Instead, you should create a new cluster and reimport your things and actions. See the changelog below for more detailed reasons why.
Breaking Changes
Improve cross-reference storing strategy (#1069)
Prior to this release, Weaviate would build an automated cache of referenced objects. This led to very fast response times for nested queries, at the cost of large disk usage. We have since learned that disk usage can be so excessive in heavily connected graphs that the benefits don't outweigh the costs. In addition, configuring cache boundaries led to unnecessary complexity.
The major goal of 0.22.0 was to replace automated denormalization caching with a smarter strategy without losing the snappiness of cached results and the overall low latencies of queries our users have come to appreciate.
We believe we have found a good strategy with this release, by implementing smarter query strategies that keep inter-container traffic to a minimum and use our backing storage in a way in which it performs well.
This boils down to the following advantages that 0.22.0 provides over 0.21.x:
- Feature parity: No feature got lost through the rewrite. If it worked with 0.21.x, it works with 0.22.x. If you think otherwise, please open an issue.
- Much smaller disk footprint: Since we don't excessively denormalize references anymore, the disk footprint got much smaller. Essentially the size on disk is now (object size + vector size + index overheads) * desiredReplication. The number of cross-references no longer has a direct impact on disk space (other than storing the link itself, which is effectively the size of the bytes in a weaviate://... beacon).
- No depth limit on nested filters: Prior to this release a filter on a cross-ref prop, such as path: ["inCity", "City", "inCountry", "Country", "name"], had a limit. It would only work within a cache boundary. This limitation is now gone and you can filter as deep as you like. Please note that an excessively deep query will have a performance impact.
- Smaller CPU impact during imports: Prior to this release we'd spend a share of the available resources on building a denormalized cache asynchronously after importing a connected object. Without having to build such a cache, more performance on imports is available for storing, vectorizing and indexing objects.
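As a rough illustration of the disk footprint rule of thumb above, the following Python sketch applies the formula to total sizes in bytes. The function name, parameter names and the sample numbers are hypothetical and not part of Weaviate; only the formula itself comes from the release notes.

```python
def estimated_disk_bytes(object_bytes, vector_bytes, index_overhead_bytes,
                         desired_replication):
    """Apply (object size + vector size + index overheads) * desiredReplication.

    All sizes are totals in bytes. This is a back-of-the-envelope estimate,
    not a measurement of what Weaviate actually writes to disk.
    """
    if desired_replication < 1:
        raise ValueError("desired_replication must be at least 1")
    return (object_bytes + vector_bytes + index_overhead_bytes) * desired_replication


# Hypothetical totals for a whole data set, replicated three times.
total = estimated_disk_bytes(
    object_bytes=2_000_000_000,
    vector_bytes=1_200_000_000,
    index_overhead_bytes=800_000_000,
    desired_replication=3,
)
print(f"{total / 1e9:.1f} GB")
```

Note that the number of cross-references does not appear as an input, which mirrors the point made in the list above.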
Please note that caching was previously done at import time. We recommend not trying to upgrade a 0.21.x cluster, but instead creating a new cluster and reimporting. This is the only way to guarantee your cluster won't have cache leftovers which can impact performance.
New Features
none
Fixes
- #967 became obsolete through this change
-
v0.21.12 Changes
January 17, 2020
Docker image/tag:
semitechnologies/weaviate:0.21.12
See also: example docker compose files in English and Dutch.
Breaking Changes
none
New Features
none
Fixes
- Improved Contextionary Weighting Algorithm
This release updates the default contextionary version to ...v0.4.6, which includes an improved weighting algorithm. Prior to this release the occurrence-based weighting was done with a linear algorithm. This often led to unimportant words getting too much weight. The latest version uses a logarithmic approach. With this approach we were able to improve the accuracy of classifications done with Weaviate.
The example docker-compose files linked above have already been updated. If you're not using them, make sure to update the contextionary version accordingly in your setup.
This change is non-breaking. Keep in mind that object vectorization happens at import time. So if you want all your objects to benefit from the updated algorithm, you should reimport them.
If you aren't happy with the results and would like to use the classic linear approach, you can force the contextionary to do so by setting the environment variable OCCURRENCE_WEIGHT_STRATEGY=linear for the contextionary (!) service. It defaults to log.
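The release notes do not give the contextionary's actual formulas, so the following Python sketch only illustrates the general difference between a linear and a logarithmic occurrence-based weight. The min/max scaling, the function names and the sample counts are assumptions made for illustration; only the strategy names linear and log come from the text above.

```python
import math


def linear_weight(occurrence, min_occ, max_occ):
    """Weight falls linearly from 1 (rarest word) to 0 (most frequent word)."""
    return 1.0 - (occurrence - min_occ) / (max_occ - min_occ)


def log_weight(occurrence, min_occ, max_occ):
    """Same idea, but on a logarithmic scale of the occurrence counts."""
    span = math.log(max_occ) - math.log(min_occ)
    return 1.0 - (math.log(occurrence) - math.log(min_occ)) / span


# Hypothetical lookup keyed by the two strategy names from the changelog.
STRATEGIES = {"linear": linear_weight, "log": log_weight}


def word_weight(occurrence, min_occ, max_occ, strategy="log"):
    return STRATEGIES[strategy](occurrence, min_occ, max_occ)


# A fairly common word in a corpus whose counts range from 10 to 1,000,000.
for name in ("linear", "log"):
    print(name, round(word_weight(10_000, 10, 1_000_000, strategy=name), 3))
```

Because word counts are heavily skewed, the linear variant in this sketch leaves a fairly common word with a weight close to 1, while the logarithmic variant reduces it noticeably. That is the kind of effect the release notes describe.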
-
v0.21.11 Changes
January 16, 2020
Docker image/tag:
semitechnologies/weaviate:0.21.11
See also: example docker compose files in English and Dutch.
Breaking Changes
none
New Features
Entity Merging (#975)
Entity merging allows you to deduplicate results. If you have several objects which describe the same physical entity, e.g. "Google Inc." and "Google Incorporated" (they both describe the real-world company "Google"), you can hide duplicates or even let Weaviate merge duplicates into a single entity.
Usage
Usage is best described in the following three example screenshots.
No grouping/merging
First up is the behavior without any grouping or merging strategy. As you can see there are a lot of duplicates.
Grouping strategy closest
With strategy closest, Weaviate tries to build groups based on your results. For each group it will show the results closest to your search query. Note that there is also a force field. The higher the force, the more likely Weaviate is to group two objects together. A force: 1.0 would mean that every single item, no matter how different, should be grouped. A force: 0 means that only exactly identical items should be grouped. The example uses force: 0.1 as that yielded the best results. You can see that no more company names are duplicated.
Grouping strategy merge
The example above hides duplicates. This isn't an issue if every single field is identical. But what if you need to know the original values? Strategy merge will keep the contents of the original fields. String fields contain all original values, numerical fields display a mean, and reference fields contain all the references from all merged objects.
Best Practices
To get the best possible results, please keep the following things in mind:
- The grouping/merging is done internally based on vector distance. It is thus important that the items to be merged are as close to each other as possible. If your items use a lot of words which are not recognized by the contextionary, those words do not influence the vector position. In this case consider extending the contextionary using the REST API (/c11y/extensions), so that it understands more words from your object.
- You get the best possible results if noise is removed in vectorization. We thus strongly recommend setting vectorizeClassName: false and vectorizePropertyName: false for each property. Those settings were introduced in 0.21.10.
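To make the closest and merge strategies described above more concrete, here is a minimal Python sketch. It is not Weaviate's implementation: the greedy grouping, the rule "distance <= force" (with distances assumed to be normalized to the range 0 to 1), the data layout and all names are assumptions chosen so that force: 0 groups only identical items and force: 1.0 groups everything.

```python
from statistics import mean


def group_results(results, force, distance):
    """Greedy grouping; results must be ordered closest-to-query first."""
    groups = []
    for item in results:
        for group in groups:
            # Compare against the group's first (closest-to-query) member.
            if distance(group[0]["vector"], item["vector"]) <= force:
                group.append(item)
                break
        else:
            groups.append([item])
    return groups


def closest(group):
    """Strategy 'closest': keep only the member nearest to the query."""
    return group[0]["props"]


def merge(group):
    """Strategy 'merge': keep all strings, average numbers, collect references."""
    merged = {}
    for key in group[0]["props"]:
        values = [obj["props"][key] for obj in group]
        if isinstance(values[0], str):
            merged[key] = list(dict.fromkeys(values))  # distinct, original order
        elif isinstance(values[0], (int, float)):
            merged[key] = mean(values)
        else:  # assumed to be a list of reference beacons
            merged[key] = [ref for refs in values for ref in refs]
    return merged


# Hypothetical results with 1-D "vectors" so the distance is easy to follow.
results = [
    {"vector": 0.10, "props": {"name": "Google Inc.", "employees": 100,
                               "inCity": ["weaviate://localhost/things/a"]}},
    {"vector": 0.15, "props": {"name": "Google Incorporated", "employees": 200,
                               "inCity": ["weaviate://localhost/things/b"]}},
    {"vector": 0.90, "props": {"name": "Apple", "employees": 50, "inCity": []}},
]
groups = group_results(results, force=0.1, distance=lambda a, b: abs(a - b))
merged_first = merge(groups[0])
print(closest(groups[0])["name"], merged_first)
```

In this sketch the two Google entries end up in one group. The closest strategy would show only the first of them, while merge keeps both names, averages the employee counts and collects both references.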
Fixes
none
-
v0.21.10
January 15, 2020
-
v0.21.9 Changes
January 09, 2020
Docker image/tag:
semitechnologies/weaviate:0.21.9
See also: example docker compose files in English and Dutch.
Breaking Changes
New Features
Fixes
- Custom concept not visible on /c11y/concepts endpoint (#1061)
Fixed by upgrading to c11y version ...v0.4.4, which contains the fix semi-technologies/contextionary#20
-
v0.21.8 Changes
January 07, 2020
Docker image/tag:
semitechnologies/weaviate:0.21.8
See also: example docker compose files in English and Dutch.
Breaking Changes
none
New Features
none
Fixes
- Missing limit parameter in Aggregation (#1058)
It wasn't possible to limit (or increase) the number of groups when making a grouped aggregation. This release addresses that by adding a limit: <int> option to Aggregate { Things { Class() } }.
- Missing limit parameter in stringProp aggregation's field topOccurrences (#992)
It wasn't possible to limit (or increase) the number of string prop results using stringProp { topOccurrences { value occurs } }, which always defaulted to 5. This release introduces a limit: <int> field, so that the size of the result buckets can be freely set by the user, e.g. stringProp { topOccurrences(limit: 2000) { value occurs } }.
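The topOccurrences limit described above could be used from a client roughly as follows. This Python sketch only assembles the GraphQL query text; the helper name, the class name Company and the property name are hypothetical, and sending the query to a Weaviate instance is left out.

```python
def top_occurrences_query(class_name, prop_name, limit):
    """Build an Aggregate query that asks for `limit` topOccurrences buckets."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "{ Aggregate { Things { " + class_name + " { " + prop_name
        + " { topOccurrences(limit: " + str(limit) + ") { value occurs } } } } } }"
    )


# Hypothetical class "Company" with a string property "name".
query = top_occurrences_query("Company", "name", 2000)
print(query)
```

Leaving out the limit argument would fall back to the default of 5 buckets mentioned above.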
-
v0.21.7 Changes
December 19, 2019
-
v0.21.6 Changes
December 18, 2019
Docker image/tag:
semitechnologies/weaviate:0.21.6
See also: example docker compose files in English and Dutch.
Breaking Changes
none
New Features
- Better defaults for Replication/Sharding (#1014)
This affects the underlying Elasticsearch database ("esvector"). This release sets reasonable defaults, which can also be overwritten in the config using vectorIndex.numberOfShards: <int> as well as vectorIndex.autoExpandReplicas: <string>. The fields behave like their equivalents in Elasticsearch.
Fixes
- Code base is now compatible with Go 1.13 (#1056)
This mostly resolved go module incompatibilities.
- Supernodes are no longer cached (#1053)
This is a simple fix that will need a more elaborate solution later on. Right now, a supernode is simply not cached at all. When traversing the graph, the references are thus resolved in real time. This means a supernode currently cannot be used in a "search by reference" where filter, e.g. a filter on a class City with path: ["hasRestaurants", "Restaurant", "name"] will not find those Cities which are considered supernodes.
By default any class with at least 100 outgoing references is considered a supernode, but this setting can be overwritten by setting vectorIndex.supernodeThreshold: int.
- "No value given" error when using valueText filter (#1048)
This error only affected the GraphQL where filter. The issue was not present in the REST where filter (currently only available on the classifications API).
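The supernode behavior described in #1053 can be pictured with the short Python sketch below. It is an illustration under assumptions, not Weaviate's code: the object layout, the cache dictionary and the fetch callback are hypothetical, and only the "at least 100 outgoing references" default comes from the release notes.

```python
DEFAULT_SUPERNODE_THRESHOLD = 100  # default named in the changelog


def is_supernode(outgoing_reference_count, threshold=DEFAULT_SUPERNODE_THRESHOLD):
    """'At least' the threshold counts as a supernode, hence >=."""
    return outgoing_reference_count >= threshold


def resolve_references(obj, cache, fetch, threshold=DEFAULT_SUPERNODE_THRESHOLD):
    """Resolve references, skipping the cache entirely for supernodes."""
    beacons = obj["references"]
    if is_supernode(len(beacons), threshold):
        return [fetch(beacon) for beacon in beacons]  # resolved in real time
    if obj["id"] not in cache:
        cache[obj["id"]] = [fetch(beacon) for beacon in beacons]
    return cache[obj["id"]]


# Hypothetical objects: one with 2 references, one with 100.
cache = {}
fetch = lambda beacon: {"beacon": beacon}
small = {"id": "small", "references": ["weaviate://localhost/things/1",
                                       "weaviate://localhost/things/2"]}
big = {"id": "big", "references": [f"weaviate://localhost/things/{i}"
                                   for i in range(100)]}
resolve_references(small, cache, fetch)
resolve_references(big, cache, fetch)
print(sorted(cache))
```

Only the small object ends up in the cache, which is why a cache-backed "search by reference" filter would not see the supernode.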
-
v0.21.5 Changes
December 06, 2019
Docker image/tag:
semitechnologies/weaviate:0.21.5
See also: example docker compose files in English and Dutch.
Breaking Changes
none
New Features
Set where filters in all classification types (#985)
This feature allows narrowing down objects in a classification through setting where filters.
A total of three filters can be set (all three are optional).
- sourceWhere: This limits the to-be-classified (or "unclassified") items which will be processed during a classification run.
- trainingSetWhere: This limits the training set. This filter can only be used with classification types which rely on a training set, such as "type": "knn".
- targetWhere: This limits the potential targets (or "labels") of a classification. This filter can only be used on classification types which don't have a training set, but rather produce a direct relationship between source and target, such as "type": "contextual".
For more elaborate examples on when to use which filter, see this post.
The type/structure of the where filter object for all three options is identical to that of the existing where filters currently present in the GraphQL API.
Fixes
- Incorrect defaults in classification of type contextual (#1045)
Prior to this version the optional field k would always default to 3. This is indeed the desired behavior for a classification of type knn. However, it would also be set for contextual, where the field doesn't make sense. This fix makes sure that the defaults are only set where appropriate.
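The three where filters from #985 and the type-specific default for k from #1045 can be combined in a small Python sketch that builds a classification request body. The names sourceWhere, trainingSetWhere, targetWhere, k and the types knn and contextual come from the release notes; the helper itself, the other body fields (class, classifyProperties, basedOnProperties), the sample class and the exact shape of the filter object are assumptions for illustration.

```python
DEFAULT_K = 3  # default the changelog names for type "knn"


def build_classification(kind, class_name, classify_properties, based_on_properties,
                         k=None, source_where=None, training_set_where=None,
                         target_where=None):
    """Assemble a request body; field names other than the filters are assumed."""
    body = {
        "type": kind,
        "class": class_name,
        "classifyProperties": classify_properties,
        "basedOnProperties": based_on_properties,
    }
    if kind == "knn":
        body["k"] = DEFAULT_K if k is None else k  # default only where it applies
    elif k is not None:
        raise ValueError("k is only meaningful for type 'knn' in this sketch")
    if training_set_where is not None and kind != "knn":
        raise ValueError("trainingSetWhere requires a type with a training set")
    if target_where is not None and kind != "contextual":
        raise ValueError("targetWhere requires a type without a training set")
    filters = (("sourceWhere", source_where),
               ("trainingSetWhere", training_set_where),
               ("targetWhere", target_where))
    for key, value in filters:
        if value is not None:
            body[key] = value
    return body


# Hypothetical filter, shaped like a GraphQL where filter.
only_long = {"operator": "GreaterThan", "path": ["wordCount"], "valueInt": 100}
knn_body = build_classification("knn", "Article", ["ofCategory"], ["summary"],
                                source_where=only_long)
contextual_body = build_classification("contextual", "Article", ["ofCategory"],
                                       ["summary"], target_where=only_long)
print(knn_body["k"], "k" in contextual_body)
```

Rejecting a filter that does not fit the chosen type is a design choice of this sketch; the release notes only state which filter belongs to which kind of classification.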
-
v0.21.4
December 05, 2019