同步操作将从 Hutool/elasticsearch-definitive-guide-cn 强制同步,此操作会覆盖自 Fork 仓库以来所做的任何修改,且无法恢复!!!
确定后同步将在后台操作,完成时将刷新页面,请耐心等待。
Queries can only find terms that actually exist in the inverted index, so it is important to ensure that the same analysis process is applied both to the document at index time, and to the query string at search time so that the terms in the query match the terms in the inverted index.
While we say ``document'', really analyzers are determined per-field. Each field can have a different analyzer, either by configuring a specific analyzer for that field or by falling back on the type, index or node defaults. At index time, a field’s value is analyzed using the configured or default analyzer for that field.
For instance, let’s add a new field to my_index
:
PUT /my_index/_mapping/my_type
{
"my_type": {
"properties": {
"english_title": {
"type": "string",
"analyzer": "english"
}
}
}
}
Now we can compare how values in the english_title
field and the title
field are
analyzed at index time using the analyze
API to analyze the word "Foxes"
:
GET /my_index/_analyze?field=my_type.title (1)
Foxes
GET /my_index/_analyze?field=my_type.english_title (2)
Foxes
Field title
, which uses the default standard
analyzer will return the
term foxes
Field english_title
, which uses the english
analyzer will return the term
fox
.
This means that, were we to run a low-level term
query for the exact term
"fox"
, the english_title
field would match but the title
field would
not.
High-level queries like the match
query understand field mappings and can
apply the correct analyzer for each field being queried. We can see this
in action with the validate query
API:
GET /my_index/my_type/_validate/query?explain
{
"query": {
"bool": {
"should": [
{ "match": { "title": "Foxes"}},
{ "match": { "english_title": "Foxes"}}
]
}
}
}
which returns this explanation
:
(title:foxes english_title:fox)
The match
query uses the appropriate analyzer for each field to ensure
that it looks for each term in the correct format for that field.
While we can specify an analyzer at field level, how do we determine which analyzer is used for a field if none is specified at field level?
Analyzers can be specified at several levels. Elasticsearch works through each level until it finds an analyzer which it can use. At index time, the process is as follows:
the analyzer
defined in the field mapping, else
the analyzer defined in the _analyzer
field of the document, else
the default analyzer
for the type
, which defaults to
the analyzer named default
in the index settings, which defaults to
the analyzer named default
at node level, which defaults to
the standard
analyzer
At search time, the sequence is slightly different:
the analyzer
defined in the query itself, else
the analyzer
defined in the field mapping, else
the default analyzer
for the type
, which defaults to
the analyzer named default
in the index settings, which defaults to
the analyzer named default
at node level, which defaults to
the standard
analyzer
The two lines in bold above highlight differences in the index time sequence
and the search time sequence. The _analyzer
field allows you to specify a
default analyzer for each document (eg english
, french
, spanish
) while
the analyzer
parameter in the query specifies which analyzer to use on the
query string. However, this is not the best way to handle multiple languages
in a single index because of the pitfalls highlighted in [languages].
Occasionally, it makes sense to use a different analyzer at index and search
time. For instance, at index time we may want to index synonyms, eg for every
occurrence of quick
we also index fast
, rapid
and speedy
. But at
search time, we don’t need to search for all of these synonyms. Instead we
can just look up the single word that the user has entered, be it quick
,
fast
, rapid
or speedy
.
To enable this distinction, Elasticsearch also supports the index_analyzer
and search_analyzer
parameters, and analyzers named default_index
and
default_search
.
Taking these extra parameters into account, the full sequence at index time really looks like this:
the index_analyzer
defined in the field mapping, else
the analyzer
defined in the field mapping, else
the analyzer defined in the _analyzer
field of the document, else
the default index_analyzer
for the type
, which defaults to
the default analyzer
for the type
, which defaults to
the analyzer named default_index
in the index settings, which defaults to
the analyzer named default
in the index settings, which defaults to
the analyzer named default_index
at node level, which defaults to
the analyzer named default
at node level, which defaults to
the standard
analyzer
And at search time:
the analyzer
defined in the query itself, else
the search_analyzer
defined in the field mapping, else
the analyzer
defined in the field mapping, else
the default search_analyzer
for the type
, which defaults to
the default analyzer
for the type
, which defaults to
the analyzer named default_search
in the index settings, which defaults to
the analyzer named default
in the index settings, which defaults to
the analyzer named default_search
at node level, which defaults to
the analyzer named default
at node level, which defaults to
the standard
analyzer
The sheer number of places where you can specify an analyzer is quite overwhelming. In practice, though, it is pretty simple:
The first thing to remember is that, even though you may start out using Elasticsearch for a single purpose or a single application such as logging, chances are that you will find more use cases and end up running several distinct applications on the same cluster. Each index needs to be independent and independently configurable. You don’t want to set defaults for one use case, only to have to override them for another use case later on.
This rules out configuring analyzers at the node level. Additionally, configuring analyzers at node level requires changing the config file on every node and restarting every node which becomes a maintenance nightmare. It’s a much better idea to keep Elasticsearch running and to manage settings only via the API.
Most of the time, you will know what fields your documents will contain ahead of time. The simplest approach is to set the analyzer for each full-text field when you create your index or add type mappings. While this approach is slightly more verbose, it makes it easy to see which analyzer is being applied to each field.
Typically, most of your string fields will be exact-value not_analyzed
fields such as tags or enums, plus a handful of full-text fields which will
use some default analyzer like standard
or english
or some other language.
Then you may have one or two fields which need custom analysis: perhaps the
title
field needs to be indexed in a way that supports find-as-you-type.
You can set the default
analyzer in the index to the analyzer you want to
use for almost all full-text fields, and just configure the specialized
analyzer on the one or two fields that need it. If, in your model, you need
a different default analyzer per type, then use the type level analyzer
setting instead.
A common work-flow for time based data like logging is to create a new index per day on the fly by just indexing into it. While this work flow prevents you from creating your index up front, you can still use {ref}indices-templates.html[index templates] to specify the settings and mappings that a new index should have.
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。