Changelog History
-
v0.22.10 Changes
June 19, 2020
Docker image/tag: semitechnologies/weaviate:0.22.10
See also: example docker compose files in English, German, Dutch, Italian and Czech.
Breaking Changes
none
New Features
none
Fixes
_interpretation showing null when batch importing is used (#1175)
This fixes an issue where the vectorization meta info for the _interpretation prop was not stored correctly if the batch importers were used.
-
v0.22.9 Changes
June 17, 2020
Docker image/tag: semitechnologies/weaviate:0.22.9
See also: example docker compose files in English, German, Dutch, Italian and Czech.
Breaking Changes
none
New Features
none
Fixes
- Accidental breaking change in 0.22.8: JSON fields uppercased (#1173)
There was an accidental breaking change that led to uppercased responses in some endpoints. Unfortunately, the bug was present consistently in both the server and the client used in tests, which is how it slipped past our extensive test suite.
-
v0.22.8 Changes
June 17, 2020
Docker image/tag: semitechnologies/weaviate:0.22.8
See also: example docker compose files in English, German, Dutch, Italian and Czech.
WARNING: ACCIDENTAL BREAKING CHANGE IN THIS RELEASE
This release accidentally introduced a regression that made it past the test suite. See #1173 for details. Do not use this release; use the next one (0.22.9), where the regression is fixed.
Deprecations
- ?meta=true/false in REST deprecated
Instead, use the new underscore props, e.g. ?include=_classification for classification information, ?include=_vector to display the vector, or ?include=_vector,_classification for both. Estimated removal in 0.23.0.
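As a sketch of the migration away from ?meta=true, this Go helper builds a REST URL for a single thing with the desired underscore props passed in ?include=. The host, the object id and the helper name thingURL are hypothetical placeholders, not part of any Weaviate client library.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// thingURL builds GET /v1/things/{id} and, if any underscore props are
// given, appends them as ?include=... (replacing the deprecated ?meta=true).
// Hypothetical helper for illustration only.
func thingURL(base, id string, underscoreProps ...string) string {
	u := base + "/v1/things/" + url.PathEscape(id)
	if len(underscoreProps) == 0 {
		return u
	}
	escaped := make([]string, len(underscoreProps))
	for i, p := range underscoreProps {
		escaped[i] = url.QueryEscape(p)
	}
	// Commas separate props, as in ?include=_vector,_classification.
	return u + "?include=" + strings.Join(escaped, ",")
}

func main() {
	// Placeholder host and id.
	fmt.Println(thingURL("http://localhost:8080", "some-uuid", "_vector", "_classification"))
	// prints: http://localhost:8080/v1/things/some-uuid?include=_vector,_classification
}
```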
Breaking Changes
none
New Features
Underscore-Prop _classification in REST and GraphQL (#1155)
Display meta information about a classification (if an object was subject to a classification). Previously, this information could only be shown in REST using the now-deprecated ?meta=true. Instead, you can now explicitly request it using ?include=_classification. Additionally, this information is now also available in GraphQL (previously not possible) using the _classification prop alongside the schema-defined props.
Underscore-Prop _interpretation in REST and GraphQL (#1156)
Display meta information about how an object was interpreted during vectorization, i.e. which words were usable, how they were weighted, and additional meta information about each concept, such as its occurrence frequency in the underlying contextionary.
Note: This feature requires a contextionary version of at least ...-v0.4.12, which is used in the linked docker-compose files above.
The feature is available as an optional ("underscore") prop in REST using ?include=_interpretation, as well as a GraphQL _interpretation prop alongside the schema-defined props.
Fixes
none
-
v0.22.7 Changes
April 29, 2020
Docker image/tag: semitechnologies/weaviate:0.22.7
See also: example docker compose files in English, German, Dutch, Italian and Czech.
Breaking Changes
New Features
Improved contextual classification algorithm (#1125)
Prior to this release, a contextual classification would often yield false positives for whichever label was closest to the "noise center". This means we would overweight filler and stop words and not pay enough attention to the most important words. Because a contextual classification compares a data object to its label, rather than data to other data as in a kNN-type classification, this issue was far more prevalent in contextual classifications than in kNN ones. In the latter, the noise is present in all data objects, so it is likely to cancel out. However, contextual classifications of data objects with (long) texts suffered.
This release introduces a complete rewrite of the classification algorithm. Instead of weighting each word purely on its occurrence in the contextionary, we now weight (and even remove) words based on two new metrics: Information Gain and tf-idf.
Information Gain is a custom measure that predicts how likely a given word is to pull the classification towards a specific target (label). For example, imagine the data object "I love my new computer" with the possible labels "Technology", "Food" and "Politics". When looking at each word in the source object, Weaviate would identify "computer" as the word with the highest information gain, as it clearly moves the vector towards one of the categories ("Technology"). The other words might point to any of the categories without a clear favorite, so their information gain should be lower. As a result, Weaviate will weight "computer" the highest in the data object.
Tf-idf, on the other hand, does not compare data objects directly to a target (label), but to other objects. If multiple objects exist, such as "My new computer is great!", "Who is the new president?" and "New dishes on the menu!", the word "new" is identified as occurring in every object, so it has an Inverse Document Frequency of 0. Depending on the user configuration, this word can be removed from vectorization entirely.
The new mechanisms are user-configurable. They come with reasonable defaults that will work for many datasets, but to get the most out of your classification, it may make sense to tweak them until you get the best possible results. For a detailed list and explanation of the newly introduced parameters, see this comment.
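To make the tf-idf example concrete, here is a minimal Go sketch of textbook inverse document frequency, idf = ln(N / df), applied to the three sample objects. The tokenizer (lowercase, split on non-letters/digits) and the unsmoothed formula are assumptions for illustration; Weaviate's actual tf-idf computation, and how tfidfCutoffPercentile uses it, may differ.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode"
)

// tokenize lowercases a text and splits it on anything that is not a
// letter or digit, so "New" and "new!" count as the same word.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// idf returns ln(N / df) per word, where N is the number of documents
// and df the number of documents containing the word at least once.
func idf(docs []string) map[string]float64 {
	df := map[string]int{}
	for _, d := range docs {
		seen := map[string]bool{}
		for _, w := range tokenize(d) {
			if !seen[w] {
				seen[w] = true
				df[w]++
			}
		}
	}
	n := float64(len(docs))
	out := make(map[string]float64, len(df))
	for w, c := range df {
		out[w] = math.Log(n / float64(c))
	}
	return out
}

func main() {
	docs := []string{
		"My new computer is great!",
		"Who is the new president?",
		"New dishes on the menu!",
	}
	scores := idf(docs)
	fmt.Println(scores["new"])      // 0: "new" occurs in every object
	fmt.Println(scores["computer"]) // ≈ 1.0986 (ln 3): occurs in one object
}
```

A word with an IDF of 0 carries no information for telling objects apart, which is why such words are candidates for removal from vectorization.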
Benchmark
In a benchmark based on the 20 Newsgroups data set, we have seen a substantial improvement in success rates.
Note that this benchmark was done using a contextual classification, i.e. without training data (labeled data). The success rates are therefore not comparable to other mechanisms which rely on training data. If you want to compare Weaviate's performance with other classification mechanisms that require labeled data, please run a kNN classification instead.
Main Category
The posts were to be categorized as one of 6 categories (expected success rate for random distribution ~16.7%)
Granular Category
The posts were to be categorized as one of 20 categories (expected success rate for random distribution ~5%)

| Goal | Previous (< 0.22.7) | Improved Algorithm (>= 0.22.7) |
|---|---|---|
| Main Category | 18% | 58% |
| Granular Category | 10% | 42% |

The following settings were used:
```yaml
# dataset
n: 563 # randomly picked with a roughly equal size per category

# configuration
type: contextual
informationGainCutoffPercentile: 10
informationGainMaximumBoost: 3
tfidfCutoffPercentile: 80
```
Fixes
- Fix unexpected behavior on geoCoordinates 0,0 (#825)
GeoCoordinates of 0,0, infamously known as Null Island, would make the geoCoordinates property disappear entirely, as 0 also happens to be the null/initial value of a float in Go. This release fixes that: a 0 coordinate is now explicitly displayed as such.
-
v0.22.6 Changes
April 06, 2020
Docker image/tag: semitechnologies/weaviate:0.22.6
See also: example docker compose files in English, German, Dutch, Italian and Czech.
Breaking Changes
none
New Features
Filter objects by count of references (#1101)
Weaviate has already offered substantial "filter by references" capabilities in the past, such as "Find all Cities located in a Country with a population size larger than x". However, prior to this release it was not possible to filter for cases such as "Show all Cities not associated with a Country" or "Find all authors who wrote at least 2 articles".
This release adds the ability to filter by reference count. To do so, simply provide one of the existing compare operators (Equal, LessThan, LessThanEqual, GreaterThan, GreaterThanEqual) and use it directly on the reference element. For example, the following GraphQL query finds all authors who wrote at least 2 articles:
```graphql
{
  Get {
    Things {
      Author(
        where: {
          valueInt: 2
          operator: GreaterThanEqual
          path: ["WroteArticles"]
        }
      ) {
        name
        WroteArticles {
          ... on Article {
            title
          }
        }
      }
    }
  }
}
```
Note: The example above uses the News Publication dataset.
Fixes
none
-
v0.22.5 Changes
April 01, 2020
Docker image/tag: semitechnologies/weaviate:0.22.5
See also: example docker compose files in English, German, Dutch, Italian and Czech.
Breaking Changes
none
New Features
Hypertext Links on API root (#1108, #1103)
Prior to this, accessing the path / would return 404 Not Found. This was changed as follows:
- / redirects (301 Moved Permanently) to /v1, which is the API base. If the client does not automatically follow redirects, a JSON body containing the link to /v1 is presented.
- /v1 shows a list of main APIs and links to documentation for each resource group. Note that this is not a complete list, as the intention is not to list every possible option (we have the swagger document for this). Instead, the links work like website links, where the root page presents a few main categories.
- If the origin option is configured in the Weaviate config, an absolute URI is used. This can be helpful when Weaviate is running behind a reverse proxy (which is most likely the case in a production setting), since Weaviate then has no way of knowing how the user accesses it unless this is explicitly configured. If the origin config is not set, links do not default to the listen/bind address as origin; relative links are presented instead.
Hypertext Links cross-references (#1106)
Similar to the API root links, all REST endpoints which can show cross-references now include a read-only field href alongside the existing beacon field. The field contains an HTTP hypertext reference to the respective resource. The same behavior regarding origin in the config and absolute vs. relative URIs as outlined above applies to these links as well.
Fixes
Memory leak fixed in contextionary (semi-technologies/contextionary#25)
We discovered a potential memory leak in a library used in the contextionary. In some cases, after long import sessions, the contextionary's memory usage would keep growing without limit. We have replaced the code from the external library with custom code in #26, thus fixing the issue.
The docker-compose files linked above already reference the new version. If you are running your own setup or the K8s setup via the official Helm chart, make sure you reference version <language>0.14.0-v0.4.8 or higher.
-
v0.22.4 Changes
March 05, 2020
Docker image/tag: semitechnologies/weaviate:0.22.4
See also: example docker compose files in English, German, Dutch, Italian and Czech.
Breaking Changes
none
New Features
- New contextionary languages added for contextionary version xx0.13.0-v0.4.7. See the links above for example docker-compose files for supported languages. You can use the linked contextionary images in other setups (Kubernetes, Helm) as well.
Fixes
none
-
v0.22.3 Changes
March 03, 2020
Docker image/tag: semitechnologies/weaviate:0.22.3
See also: example docker compose files in English and Dutch.
Breaking Changes
none
New Features
Return objects' vector position when meta=true (#1041)
As part of the classification feature, a meta option (passed as a query parameter) was added to the GET /v1/things and GET /v1/actions APIs. If the object was part of a classification, meta information about that classification is printed. Additionally, the meta object will now, regardless of classifications, also contain the object's vector position.
Keep in mind that the 600-dimensional vector takes up about 2.4KB in the underlying storage (600 x 4 bytes) and about twice that when encoded as float numbers in JSON. So you will add about 5KB of data per object when setting meta=true. While this is negligible on single objects, the additional data on long list queries can add up to a lot of extra traffic, so only set this option if really necessary.
Fixes
Bug: ?meta=true ignored on list queries (#1099)
Prior to this release, setting the meta=true query param worked on GET /v1/things/{id} (single object), but not on GET /v1/things (list of objects). This release fixes this and makes sure meta=true can now be set on both types of GET queries.
Bug: Numbers and other characters led to an error in the /c11y/concepts endpoint (#1078)
The requirements for class names and other schema fields have been loosened in the past: any UTF-8 letter or digit is now an acceptable character. However, the /c11y/concepts endpoint, which can be used to inspect word concepts in the contextionary space, still validated against a strict [A-Za-z]. This has been changed, and all UTF-8 letters and digits are now acceptable.
-
v0.22.2 Changes
February 28, 2020
Docker image/tag: semitechnologies/weaviate:0.22.2
See also: example docker compose files in English and Dutch.
Breaking Changes
none
New Features
Upgrade to Go 1.14 (#1090)
No user-facing changes. Even for contributors, it's very unlikely that this update introduced any changes, but we recommend updating your Go environment to the latest version if you plan on contributing to Weaviate. Thanks.
New Data Type: phoneNumber (#1088 and #1087)
A new data type with the name phoneNumber was added. This type is a primitive type like text, string, etc., as opposed to a reference type. Similar to the existing type geoCoordinates, the new type contains more than a single field.
The full type definition can be seen in the swagger.json definition.
Usage
There are two user-settable sub-fields (input and defaultCountry). input must always be set when using the type; defaultCountry only needs to be set in specific situations:
- When you enter an international number (e.g. +49 171 1234567), no defaultCountry needs to be entered, as the underlying parser will recognize that the above is a German number due to the +49 prefix.
- When you enter the same number in a national format (e.g. 0171 1234567), you need to specify the defaultCountry (in this case: "de"), so that the parser can correctly convert the number into all formats.
Inputs and Formats
phoneNumber.input is of type string. You can enter any phone number. Optional digits, such as an optional 0 (e.g. +49 (0) 171 ....), will be automatically recognized and normalized. Furthermore, all formatting helpers, such as dashes or spaces, are removed by the parser.
phoneNumber.defaultCountry is of type string. See "Usage" above for when this optional field is required. Content should be entered as an ISO 3166-1 alpha-2 country code.
Read-only fields after parsing
When reading back a field of type phone number, the following (read-only) fields appear:
- internationalFormatted (string): Phone number in international format, e.g. "+49 171 123456"
- national (unsigned integer): National part of the phone number, e.g. 171123456
- nationalFormatted (string): Phone number in national format, e.g. "0171 123456"
- countryCode (unsigned integer): Country-code digits, e.g. 49
- valid (boolean): Whether the parser recognized the phone number as valid
- input (string): The raw phone number as entered by the user (helpful for debugging purposes), see Usage above
- defaultCountry (string): The default country as entered by the user; only set if explicitly set by the user, see Usage above
Limitations
The following phone-related features are not yet part of this release:
- Search by phone numbers (#1089)
- Aggregate phone numbers
Fixes
none
-
v0.22.1 Changes
February 04, 2020
Docker image/tag: semitechnologies/weaviate:0.22.1
See also: example docker compose files in English and Dutch.
Breaking Changes
none
New Features
Override weights on vector creation (#1070 and #1074)
Prior to this release, the weight of each individual word when creating a vector from an object was out of the user's control. The contextionary uses an algorithm based on the general occurrence of the word in its training data to suggest how each word should be weighted. The underlying assumption is that rare words should take precedence over very common words, similar to tf-idf.
This works well in most cases, but in some use-case-specific domain languages, common words take on a new meaning, and their importance should change accordingly. Take the words "far" and "near". They are quite common in overall language, so, especially when mixed with rarer words, they wouldn't get a high weight. However, now assume you're in the domain of optometry or manufacturing glasses. In the terms "far-sighted" and "near-sighted", the words "far" and "near" make a very important distinction. Imagine you were trying to classify objects based on those terms. With the changes in 0.22.1, you can now influence, or even completely override, the weights of individual words when creating vectors.
To do so, the field vectorWeights was introduced to the Thing and Action objects. The field is a key-value map where both the keys and the values must be strings. The keys are the words you want to influence, and the values are mathematical expressions that set the new weight. You can use additions, subtractions, multiplications, divisions, or simply overwrite the weight with a fixed number. To reference the original weight, use the single-letter variable w. Some examples:
- "vectorWeights": {"far": "10 * w"}
  Give the word "far" 10 times its original weight.
- "vectorWeights": {"far": "w + 0.5", "near": "w - 0.5"}
  Give the word "far" an absolute boost of 0.5, while penalizing the word "near" by 0.5.
- "vectorWeights": {"sighted": "0.7", "glasses": "2 - 4 * w"}
  Let the word "sighted" have a fixed weight of 0.7, whereas the weight of "glasses" is calculated by subtracting 4 times the original weight from the number 2.
Some important things to note:
- For this feature to work, you need a contextionary version of at least ...v0.4.7. The example docker-compose files linked above have already been updated to the required version.
- Spaces in math expressions are ignored.
- A word that is not referenced in "vectorWeights" will simply use its original weight as returned by the contextionary.
- Custom vectorWeights only affect the object they are set on; there is no option to globally manipulate a specific word. If the same vectorWeights are required for multiple objects, simply attach them to all objects where needed.
- Whenever the mathematical expression is not a fixed number (such as "17"), an operator must be present. Implicit operators are not valid, such as "2w" to mean "two times the original weight". In this case, explicitly use the multiplication operator, e.g. "2 * w" or "w*2".
Full example
Here's a full example of importing a thing object:
POST /v1/things
```json
{
  "class": "Glasses",
  "schema": {
    "description": "These glasses are meant for far-sighted people"
  },
  "vectorWeights": {
    "far": "5 * w",
    "near": "5 * w"
  }
}
```
The above example will boost the words "far" or "near" by a factor of 5. Note that the object does not contain the word "near", so only the word "far" is boosted. The other unreferenced words maintain their original weights.
Fixes
none