Changelog History

  • v0.22.10 Changes

    June 19, 2020

    🐳 Docker image/tag: semitechnologies/weaviate:0.22.10
    👀 See also: example docker compose files in English, German, Dutch, Italian and Czech.

    💥 Breaking Changes

    none

    🆕 New Features

    none

    🛠 Fixes

    • _interpretation showing null when batch importing is used (#1175)
      🛠 This fixes an issue where the vectorization meta info for the _interpretation prop was not stored correctly if the batch importers were used.
  • v0.22.9 Changes

    June 17, 2020

    🐳 Docker image/tag: semitechnologies/weaviate:0.22.9
    👀 See also: example docker compose files in English, German, Dutch, Italian and Czech.

    💥 Breaking Changes

    none

    🆕 New Features

    none

    🛠 Fixes

    • Accidental breaking change in 0.22.8: JSON fields uppercased (#1173)
      ✅ There was an accidental breaking change that led to uppercased JSON fields in the responses of some endpoints. Unfortunately, it was broken consistently in both the server and the client used in our tests, which is how the breaking change slipped past our extensive test suite.
  • v0.22.8 Changes

    June 17, 2020

    🐳 Docker image/tag: semitechnologies/weaviate:0.22.8
    👀 See also: example docker compose files in English, German, Dutch, Italian and Czech.

    🚀 WARNING: ACCIDENTAL BREAKING CHANGE IN THIS RELEASE

    🚀 This release accidentally introduced a regression that somehow made it past the test suite. See #1173 for details. Do not use this release, but instead use the next one (0.22.9) where the regression is fixed!

    🗄 Deprecations

    • 🗄 ?meta=true/false in REST deprecated
      Instead, use the new underscore props, e.g. ?include=_classification for classification information, ?include=_vector to display the vector, or ?include=_vector,_classification for both. Estimated removal in 0.23.0.
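    A minimal sketch of building such a request URL. Only the /v1/things path and the ?include= values come from this changelog; the host and port (http://localhost:8080) and the helper name thing_url are assumptions for illustration.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Assumed location of the Weaviate REST API; adjust to your setup.
BASE_URL = "http://localhost:8080/v1"


def thing_url(thing_id: str, include: Optional[List[str]] = None) -> str:
    """Build a GET URL for a single thing, optionally requesting underscore props."""
    url = f"{BASE_URL}/things/{thing_id}"
    if include:
        # All underscore props go into one comma-separated ?include= value;
        # safe="," keeps the commas unescaped for readability.
        url += "?" + urlencode({"include": ",".join(include)}, safe=",")
    return url


# Replaces the deprecated ?meta=true
print(thing_url("some-uuid", ["_vector", "_classification"]))
```

    The comma-separated form mirrors the ?include=_vector,_classification example above; omitting include yields the plain object URL.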

    💥 Breaking Changes

    none

    🆕 New Features

    Underscore-Prop _classification in REST and GraphQL (#1155)
    Display meta information about a classification (if an object was subject to a classification). This information could previously be shown in REST only by using the now-deprecated ?meta=true. Instead, you can now explicitly request this information using ?include=_classification. Additionally, this information is now also available in GraphQL (previously not possible) using the _classification prop alongside the schema-defined props.

    Underscore-Prop _interpretation in REST and GraphQL (#1156)
    Display meta information about how an object was interpreted during vectorization, i.e. which words were usable, how they were weighed and additional meta information about each concept, such as the occurrence frequency in the underlying contextionary.

    🐳 Note: This feature requires a contextionary version of at least ...-v0.4.12 which is used in the linked docker-compose files above

    The feature is available as an optional ("underscore") prop in REST using ?include=_interpretation as well as a GraphQL _interpretation prop alongside the schema-defined props.

    🛠 Fixes

    none

  • v0.22.7 Changes

    April 29, 2020

    🐳 Docker image/tag: semitechnologies/weaviate:0.22.7
    👀 See also: example docker compose files in English, German, Dutch, Italian and Czech.

    💥 Breaking Changes

    🆕 New Features

    👌 Improved contextual classification algorithm (#1125)
    🚀 Prior to this release, a contextual classification would often yield false positives for whichever label was closest to the "noise center". This means we would overweigh filler words and stop words and not pay enough attention to the most important words.

    In a contextual classification we compare a data object to its candidate labels, rather than comparing data to other data as in a kNN-type classification. This is why the issue was far more prevalent in contextual classifications than in kNN classifications: in the latter, the noise is present in all data objects and is therefore likely to cancel out. Contextual classifications of data objects with (long) texts suffered the most.

    🚀 This release introduces a complete rewrite of the classification algorithm. Instead of weighing each word purely on its occurrence in the Contextionary, we now weigh (and even remove) words based on two new metrics: Information Gain and tf-idf.

    🚚 Information Gain is a custom measure that predicts how likely a given word is to influence the classification towards a specific target (label). For example, imagine the data object "I love my new computer" with the possible labels "Technology", "Food" and "Politics". When looking at each word in the source object, Weaviate would identify "computer" as the word with the highest information gain, as it clearly moves the vector towards one of the categories ("Technology"). The other words might point to any of the categories without a clear favorite, so their information gain should be lower. As a result, Weaviate weighs "computer" the highest in the data object.

    🔧 Tf-idf, on the other hand, does not compare data objects directly to a target (label), but rather to other objects. Given the objects "My new computer is great!", "Who is the new president?" and "New dishes on the menu!", the word "new" occurs in every object, so it has an Inverse Document Frequency of 0. Depending on the user configuration, such a word can be removed from vectorization entirely.
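    To make the "Inverse Document Frequency of 0" concrete, here is a small sketch computing idf(t) = log(N / df(t)) for the three example objects. The naive tokenizer and this particular idf variant are assumptions for illustration; Weaviate may tokenize and smooth differently.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Naive tokenizer (assumption): lowercase runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def idf(docs: List[str]) -> Dict[str, float]:
    """idf(t) = log(N / df(t)), where df(t) counts the documents containing t."""
    n = len(docs)
    # set() so a word is counted at most once per document
    df = Counter(word for doc in docs for word in set(tokenize(doc)))
    return {word: math.log(n / count) for word, count in df.items()}


docs = [
    "My new computer is great!",
    "Who is the new president?",
    "New dishes on the menu!",
]
scores = idf(docs)
print(scores["new"])       # 0.0: "new" occurs in every document
print(scores["computer"])  # log(3), roughly 1.0986: occurs in one document only
```

    Because tf-idf multiplies a word's term frequency by its idf, "new" scores 0 in every one of these objects, no matter how often it appears.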

    👀 The new mechanisms are user-configurable. They come with reasonable defaults that will work for many datasets, but to get the most out of your classification, it may make sense to tweak them until you get the best possible results. For a detailed list and explanation of the newly introduced parameters, see this comment.

    Benchmark

    👀 In a benchmark based on the 20 Newsgroups data set, we saw a substantial improvement in success rates:

    Note that this benchmark was done using a contextual classification, i.e. without training data (labeled data). The success rates are therefore not comparable to other mechanisms which rely on training data. If you want to compare Weaviate's performance with other classification mechanisms which require labeled data, please run a kNN classification instead.

    Main Category

    The posts were to be categorized as one of 6 categories (expected success rate for random distribution ~16.7%)

    Granular Category

    The posts were to be categorized as one of 20 categories (expected success rate for random distribution ~5%)

    Goal              | Previous (< 0.22.7) | Improved Algorithm (>= 0.22.7)
    Main Category     | 18%                 | 58%
    Granular Category | 10%                 | 42%
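    Since the expected random success rate depends on the number of categories, comparing each result to its random baseline (1 / number of categories) puts the two goals on a similar footing. A short sketch of that arithmetic, using only the numbers from the table above (the helper name is illustrative):

```python
def lift_over_random(success_rate: float, num_categories: int) -> float:
    """How many times better than uniform random guessing a success rate is."""
    random_baseline = 1 / num_categories
    return success_rate / random_baseline


# (goal, number of categories, previous success rate, improved success rate)
BENCHMARK = [
    ("Main Category", 6, 0.18, 0.58),
    ("Granular Category", 20, 0.10, 0.42),
]

for goal, categories, previous, improved in BENCHMARK:
    print(
        f"{goal}: random baseline {1 / categories:.1%}, "
        f"previous {lift_over_random(previous, categories):.1f}x, "
        f"improved {lift_over_random(improved, categories):.1f}x"
    )
```

    With these numbers, the improved algorithm reaches roughly 3.5 times (main) and 8.4 times (granular) the random baseline, compared with roughly 1.1 and 2.0 times before.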

    The following settings were used:

    # dataset
    n: 563 # randomly picked with a roughly equal size per category

    # configuration
    type: contextual
    informationGainCutoffPercentile: 10
    informationGainMaximumBoost: 3
    tfidfCutoffPercentile: 80

    🛠 Fixes

    • 🛠 Fix unexpected behavior on geoCoordinates 0,0 (#825)
      🚀 GeoCoordinates of 0,0 - infamously known as Null Island - would lead to the geoCoordinates property disappearing entirely as 0 also happens to be the null/initial value for a property of type float in Golang. This release fixes this and we explicitly display a 0-Coordinate as such now.
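    The underlying pitfall is not Go-specific: whenever "unset" is represented by a value that is also valid data, 0,0 becomes ambiguous. A Python analogue for illustration only (not Weaviate's Go code; GeoCoordinates, render_buggy and render_fixed are hypothetical names):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeoCoordinates:
    latitude: float
    longitude: float


def render_buggy(geo: GeoCoordinates) -> Optional[dict]:
    # Treats 0 as "unset", like relying on a zero value as the marker for
    # "no data": Null Island (0, 0) is indistinguishable from "no coordinates".
    if not geo.latitude and not geo.longitude:
        return None
    return {"latitude": geo.latitude, "longitude": geo.longitude}


def render_fixed(geo: Optional[GeoCoordinates]) -> Optional[dict]:
    # Absence is expressed explicitly (None), so 0 stays a legitimate value.
    if geo is None:
        return None
    return {"latitude": geo.latitude, "longitude": geo.longitude}


null_island = GeoCoordinates(0.0, 0.0)
print(render_buggy(null_island))  # None: the property "disappears"
print(render_fixed(null_island))  # {'latitude': 0.0, 'longitude': 0.0}
```

    In Go, the equivalent fix is typically a pointer or an explicit presence flag instead of relying on the zero value.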
  • v0.22.6 Changes

    April 06, 2020

    🐳 Docker image/tag: semitechnologies/weaviate:0.22.6
    👀 See also: example docker compose files in English, German, Dutch, Italian and Czech.

    💥 Breaking Changes

    none

    🆕 New Features

    Filter objects by count of references (#1101)
    🚀 Weaviate has already offered substantial "filter by references" capabilities in the past, such as "Find all Cities located in a Country with a population size larger than x". However, prior to this release it was not possible to filter for cases such as "Show all Cities not associated with a Country" or "Find all authors who wrote at least 2 articles".

    🚀 This release adds the ability to filter by reference count. To do so, simply provide one of the existing compare operators (Equal, LessThan, LessThanEqual, GreaterThan, GreaterThanEqual) and use it directly on the reference element. For example, the following GraphQL query finds all authors who wrote at least 2 articles:

    {
      Get {
        Things {
          Author(
            where: {
              valueInt: 2
              operator: GreaterThanEqual
              path: ["WroteArticles"]
            }
          ) {
            name
            WroteArticles {
              ... on Article {
                title
              }
            }
          }
        }
      }
    }

    📚 Note: The example above uses the News Publication dataset.
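    The filter semantics can be modeled in memory: count the references under the given path and compare that count using the chosen operator. This is an illustrative model, not how Weaviate evaluates filters; the sample authors are made up, and treating a missing reference property as zero references is an assumption of this sketch.

```python
import operator
from typing import Any, Callable, Dict, List

# The compare operators listed above, mapped to Python comparisons.
OPERATORS: Dict[str, Callable[[int, int], bool]] = {
    "Equal": operator.eq,
    "LessThan": operator.lt,
    "LessThanEqual": operator.le,
    "GreaterThan": operator.gt,
    "GreaterThanEqual": operator.ge,
}


def filter_by_reference_count(
    objects: List[Dict[str, Any]], path: str, op: str, value: int
) -> List[Dict[str, Any]]:
    """Keep objects whose number of references under `path` satisfies `op value`."""
    compare = OPERATORS[op]
    # Assumption: a missing reference property counts as zero references.
    return [obj for obj in objects if compare(len(obj.get(path, [])), value)]


# Made-up sample data in the shape of the News Publication example.
authors = [
    {"name": "Alice", "WroteArticles": ["article-1", "article-2", "article-3"]},
    {"name": "Bob", "WroteArticles": ["article-4"]},
    {"name": "Carol"},
]

# Equivalent of: where: { valueInt: 2, operator: GreaterThanEqual, path: ["WroteArticles"] }
prolific = filter_by_reference_count(authors, "WroteArticles", "GreaterThanEqual", 2)
print([a["name"] for a in prolific])  # ['Alice']

# "Not associated with anything": reference count equal to 0
print([a["name"] for a in filter_by_reference_count(authors, "WroteArticles", "Equal", 0)])  # ['Carol']
```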

    🛠 Fixes

    none

  • v0.22.5 Changes

    April 01, 2020

    🐳 Docker image/tag: semitechnologies/weaviate:0.22.5
    👀 See also: example docker compose files in English, German, Dutch, Italian and Czech.

    💥 Breaking Changes

    none

    🆕 New Features

    Hypertext Links on API root (#1108, #1103)
    Prior to this, accessing the path / would return 404 Not Found. This was changed as follows:

    • / redirects (301 Moved Permanently) to /v1, which is the API base. If the client does not automatically follow redirects, a JSON document containing the link to /v1 is returned.
    • /v1 shows a list of the main APIs and links to the documentation for each resource group. Note that this is not a complete list, as the intention is not to list every possible option (the Swagger document serves that purpose). Instead, the links work like website links, where the root page presents you with a few main categories.

    - If the origin option is set in the Weaviate config, absolute URIs are used. This can be helpful when Weaviate is running behind a reverse proxy (which is most likely the case in a production setting), since Weaviate has no way of knowing how users access it unless this is explicitly configured. If the origin config is not set, links do not fall back to the listen/bind address as origin; relative links are presented instead.

    Hypertext Links cross-references (#1106)
    Similar to the API root links, all REST endpoints which can show cross-references now include a read-only field href alongside the existing beacon field. The field contains an HTTP hypertext reference to the respective resource. The same behavior regarding origin in the config and absolute vs. relative URIs as outlined above applies to these links as well.
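    A minimal sketch of how an href could be derived from a beacon under the origin rules described above. It assumes beacons of the form weaviate://localhost/things/<id> and hrefs of the form /v1/things/<id>; both formats and the helper name href_from_beacon are assumptions, so check an actual API response before relying on them.

```python
from typing import Optional
from urllib.parse import urlparse


def href_from_beacon(beacon: str, origin: Optional[str] = None) -> str:
    """Turn a beacon such as weaviate://localhost/things/<id> into an HTTP href."""
    parsed = urlparse(beacon)
    if parsed.scheme != "weaviate":
        raise ValueError(f"not a weaviate beacon: {beacon!r}")
    path = "/v1" + parsed.path  # e.g. /v1/things/<id>
    if origin:
        # origin configured: absolute URI, e.g. the public URL of a reverse proxy
        return origin.rstrip("/") + path
    # No origin configured: relative link instead of the listen/bind address
    return path


beacon = "weaviate://localhost/things/3c9c7a2e-0000-0000-0000-000000000000"
print(href_from_beacon(beacon))
print(href_from_beacon(beacon, origin="https://weaviate.example.com"))
```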

    🛠 Fixes

    🛠 Memory leak fixed in contextionary (semi-technologies/contextionary#25)
    We discovered a potential memory leak in a library used in the contextionary. In some cases, after long import sessions, the contextionary memory usage would keep growing without a limit. We have replaced the code from the external library with custom code in #26, thus fixing the issue.

    🐳 The docker-compose files linked above already reference the new version. If you are running your own setup or the K8s setup via the official Helm chart, make sure you reference contextionary version <language>0.14.0-v0.4.8 or higher.

  • v0.22.4 Changes

    March 05, 2020

    🐳 Docker image/tag: semitechnologies/weaviate:0.22.4
    👀 See also: example docker compose files in English, German, Dutch, Italian and Czech.

    💥 Breaking Changes

    none

    🆕 New Features

    • 🆕 New contextionary languages added for contextionary version xx0.13.0-v0.4.7. See the links above for example docker-compose files for supported languages. You can use the linked contextionary images in other setups (Kubernetes, Helm) as well.

    🛠 Fixes

    none

  • v0.22.3 Changes

    March 03, 2020

    🐳 Docker image/tag: semitechnologies/weaviate:0.22.3
    👀 See also: example docker compose files in English and Dutch.

    💥 Breaking Changes

    none

    🆕 New Features

    Return objects' vector position when meta=true (#1041)
    🖨 As part of the classification feature, a meta option (passed as a query parameter) was added to the GET /v1/things and GET /v1/actions APIs. If the object was part of a classification, meta information about that classification is printed. Additionally, the meta object will now, regardless of classifications, also contain the object's vector position.

    Keep in mind that the 600-dimensional vector is about 2.4 KB in size in the underlying storage and about twice that size when encoded as floating-point numbers in JSON. So you will add about 5 KB of data per object when setting meta=true. While this is negligible for single objects, the additional data on long list queries can add up to a lot of additional traffic. So only set this option if really necessary.
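    The 2.4 KB figure corresponds to storing each of the 600 dimensions as a 32-bit float (4 bytes), which is the assumption used here. A quick check of that arithmetic, plus a rough traffic estimate for list queries based on the ~5 KB per-object figure above (the helper name is hypothetical):

```python
import struct

DIMENSIONS = 600
# Per-object JSON overhead with meta=true, taken from the ~5 KB estimate above.
APPROX_JSON_BYTES_PER_OBJECT = 5_000

# 600 little-endian 32-bit floats without padding
binary_vector_bytes = struct.calcsize(f"<{DIMENSIONS}f")
print(binary_vector_bytes)  # 2400, i.e. about 2.4 KB


def extra_list_traffic_bytes(num_objects: int) -> int:
    """Rough additional response size when meta=true is set on a list query."""
    return num_objects * APPROX_JSON_BYTES_PER_OBJECT


print(extra_list_traffic_bytes(100))  # 500000, i.e. about 0.5 MB
```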

    🛠 Fixes

    🐛 Bug: ?meta=true ignored on list queries (#1099)
    🚀 Prior to this release, setting the meta=true query param worked on GET /v1/things/{id} (single object), but not on GET /v1/things (list of objects). This release fixes this and makes sure meta=true can now be set on both types of GET queries.

    🐛 Bug: Numbers and other characters lead to an error in the /c11y/concepts endpoint (#1078)
    The requirements for class names and other schema fields were loosened in the past. Any UTF-8 letter or digit is now an acceptable character. However, the /c11y/concepts endpoint, which can be used to inspect word concepts in the contextionary space, still validated against a strict [A-Za-z]. This has been changed, and all UTF-8 letters and digits are now acceptable.
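    The difference between the old and the new validation can be approximated as follows. This is a Python sketch, not the actual server-side check; str.isalpha() and str.isdecimal() are used here as stand-ins for "UTF-8 letter" and "digit".

```python
import re


def valid_before_fix(word: str) -> bool:
    # Old behaviour of the /c11y/concepts check: ASCII letters only.
    return re.fullmatch(r"[A-Za-z]+", word) is not None


def valid_after_fix(word: str) -> bool:
    # New behaviour: any Unicode letter or decimal digit (approximated in Python).
    return bool(word) and all(ch.isalpha() or ch.isdecimal() for ch in word)


for word in ["car", "Größe", "route66", "far-sighted"]:
    print(word, valid_before_fix(word), valid_after_fix(word))
```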

  • v0.22.2 Changes

    February 28, 2020

    🐳 Docker image/tag: semitechnologies/weaviate:0.22.2
    👀 See also: example docker compose files in English and Dutch.

    💥 Breaking Changes

    none

    🆕 New Features

    ⬆️ Upgrade to Go 1.14 (#1090)
    ⚡️ No user-facing changes. Even for contributors it's very unlikely that this update introduced any changes. But we recommend updating your Go environment to the latest version if you plan on contributing to Weaviate. Thanks.

    🆕 New Data Type: phoneNumber (#1088 and #1087)
    A new data type with the name phoneNumber was added. This type is a primitive type like text, string, etc., as opposed to a reference type. Similar to the existing type geoCoordinates, the new type contains more than a single field.

    👀 The full type definition can be seen in the swagger.json definition

    Usage

    0️⃣ There are two user-settable sub-fields (input and defaultCountry). input must always be set when using the type; defaultCountry is only required in specific situations:

    • When you enter an international number (e.g. +49 171 1234567), no defaultCountry is needed, as the underlying parser will recognize that the above is a German number due to the +49 prefix.
    • When you enter the same number as above in a national format (e.g. 0171 1234567), you need to specify the defaultCountry (in this case: "de"), so that the parser can correctly convert the number into all formats.

    Inputs and Formats

    • phoneNumber.input is of type string. You can enter any phone number. Optional digits, such as an optional 0 (e.g. +49 (0) 171 ....), will be automatically recognized and normalized. Furthermore, all formatting helpers, such as dashes or spaces, are removed by the parser.
    • phoneNumber.defaultCountry is of type string. See "Usage" above on when this optional field is required. Content should be entered as ISO 3166-1 alpha-2 country codes.
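    A minimal client-side sketch of assembling the user-settable part of a phoneNumber value. The helper phone_number_property is hypothetical, and treating any input without a leading "+" as a national number is a simplification; the server-side parser remains the authority on validity (see the valid field below).

```python
from typing import Dict, Optional


def phone_number_property(raw: str, default_country: Optional[str] = None) -> Dict[str, str]:
    """Build the user-settable sub-fields of a phoneNumber value (hypothetical helper)."""
    value = {"input": raw}
    if default_country is not None:
        # ISO 3166-1 alpha-2 codes consist of exactly two letters, e.g. "de".
        if len(default_country) != 2 or not (default_country.isascii() and default_country.isalpha()):
            raise ValueError(f"not an ISO 3166-1 alpha-2 code: {default_country!r}")
        value["defaultCountry"] = default_country
    elif not raw.lstrip().startswith("+"):
        # Simplification: without a "+" prefix we treat the number as national,
        # which the parser can only resolve with a defaultCountry.
        raise ValueError("national number needs a defaultCountry")
    return value


print(phone_number_property("+49 171 1234567"))
print(phone_number_property("0171 1234567", "de"))
```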

    📜 Read-only fields after parsing

    When reading back a field of type phoneNumber, the following (read-only) fields appear:

    • internationalFormatted (string): Phone number in international format, e.g. "+49 171 123456"
    • national (unsigned integer): National part of the phone number, e.g. 171123456
    • nationalFormatted (string): Phone number in national format, e.g. "0171 123456"
    • countryCode (unsigned integer): Country-code digits, e.g. 49
    • valid (boolean): Whether the parser recognized the phone number as valid
    • input (string): The raw phone number as put in by the user (helpful for debugging purposes), see Usage above
    • defaultCountry (string): The default country as put in by the user, only set if explicitly set by the user, see Usage above

    Limitations

    The following phone-related features are not yet part of the above release:

    • Search by phone numbers (#1089)
    • Aggregate phone numbers

    🛠 Fixes

    none

  • v0.22.1 Changes

    February 04, 2020

    🐳 Docker image/tag: semitechnologies/weaviate:0.22.1
    👀 See also: example docker compose files in English and Dutch.

    💥 Breaking Changes

    none

    🆕 New Features

    Override weights on vector creation (#1070 and #1074)
    🚀 Prior to this release, the weight of each individual word when creating a vector from an object was out of the user's control. The contextionary uses an algorithm based on the general occurrence of the word in its training data to suggest how each word should be weighted. The underlying assumption is that a rare word should take precedence over a very common word, similar to tf-idf.

    This works well in most cases, but in some domain-specific languages, common words get a new meaning and their importance should therefore change. Imagine the words "far" and "near". They are quite common in general language, so, especially when mixed with rarer words, they wouldn't get a great weight. However, now assume you're in the domain of optometry or manufacturing glasses. In the terms "far-sighted" and "near-sighted", the words "far" and "near" make a very important distinction. Imagine you were trying to classify objects based on those terms. With the changes in 0.22.1, you can now influence, or even completely override, the weights of individual words when creating vectors.

    🛠 To do so, the field vectorWeights was introduced to the Thing and Action objects. The field is a key-value map where both the keys and the values must be strings. The keys are the words you want to influence and the value is a mathematical expression to set the new weight. You can use additions, subtractions, multiplications, divisions or simply overwrite the weight with a fixed number. To reference the original weights, use the single-letter variable w. Some examples:

    "vectorWeights": {"far": "10 * w"}
    Give the word "far" 10 times its original weight

    "vectorWeights": {"far": "w + 0.5", "near": "w - 0.5"}
    Give the word "far" an absolute boost of 0.5, while penalizing the word "near" by 0.5.

    "vectorWeights": {"sighted": "0.7", "glasses": "2 - 4 * w"}
    🛠 Let the word "sighted" have a fixed weight of 0.7 whereas the word "glasses" is calculated by subtracting 4 times the original weight from the number 2.

    Some important things to note:

    • For this feature to work you need a contextionary version of at least ...v0.4.7. The example docker-compose files linked above have already been updated to the required version.
    • Spaces in math expressions are ignored.
    • A word that is not referenced in "vectorWeights" will simply use its original weight as returned by the contextionary.
    • Custom vectorWeights only affect the object which they are set on; there is no option to globally manipulate a specific word. If the same vectorWeights are required for multiple objects, simply attach them to all objects where needed.
    • Whenever the mathematical expression is not a fixed number (such as "17") an operator must be present. It is not valid to use implicit operators, such as "2w" which would mean "two times the original weight". In this case explicitly use the multiplication operator, e.g. "2 * w" or "w*2".

    Full example

    Here's a full example of importing a thing object:

    POST /v1/things

    {
      "class": "Glasses",
      "schema": {
        "description": "These glasses are meant for far-sighted people"
      },
      "vectorWeights": {
        "far": "5 * w",
        "near": "5 * w"
      }
    }

    The above example will boost the words "far" or "near" by a factor of 5. Note that the object does not contain the word "near", so only the word "far" is boosted. The other unreferenced words maintain their original weights.
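    To illustrate the expression rules above (the four basic operators, the variable w, fixed numbers, spaces ignored, no implicit operators), here is a small Python evaluator built on the standard ast module. It is a sketch, not Weaviate's parser, and the original weights in the usage example are made-up numbers, since the real ones come from the contextionary.

```python
import ast
import operator
from typing import Dict

# Operators permitted in a weight expression.
_BINARY_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_weight(expression: str, w: float) -> float:
    """Evaluate an expression such as "2 - 4 * w", where w is the original weight."""
    try:
        # strip() so leading/trailing spaces are ignored as well
        tree = ast.parse(expression.strip(), mode="eval")
    except SyntaxError as exc:
        # Implicit operators such as "2w" are rejected by the parser.
        raise ValueError(f"invalid weight expression: {expression!r}") from exc

    def visit(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return float(node.value)
        if isinstance(node, ast.Name) and node.id == "w":
            return w
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPERATORS:
            return _BINARY_OPERATORS[type(node.op)](visit(node.left), visit(node.right))
        raise ValueError(f"unsupported element in weight expression: {expression!r}")

    return visit(tree)


def apply_vector_weights(original: Dict[str, float], vector_weights: Dict[str, str]) -> Dict[str, float]:
    """Override weights for listed words; every other word keeps its original weight."""
    return {
        word: eval_weight(vector_weights[word], weight) if word in vector_weights else weight
        for word, weight in original.items()
    }


# Made-up original weights for some words of "These glasses are meant for far-sighted people".
original = {"glasses": 0.8, "far": 0.25, "sighted": 0.6}
print(apply_vector_weights(original, {"far": "5 * w", "near": "5 * w"}))
# "near" is ignored because the object does not contain it
```

    Building on the standard ast module keeps the sketch safe: unlike eval(), only numbers, w and the four listed operators are accepted, and anything else raises a ValueError.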

    🛠 Fixes

    none