200_Language_intro/00_Intro.asciidoc · 脏小强/elasticsearch-definitive-guide-cn

Getting started with languages

Elasticsearch ships with a collection of language analyzers which provide good, basic, out-of-the-box support for a number of the world’s most common languages:

Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Thai.

These analyzers typically perform four roles:

Tokenize text into individual words:

The quick brown foxes → [The, quick, brown, foxes]
Lowercase tokens:

The → the
Remove common stopwords:

[`The`, quick, brown, foxes] → [quick, brown, foxes]
Stem tokens to their root form:

foxes → fox

Each analyzer may also apply other transformations specific to its language in order to make words from that language more searchable:

the english analyzer removes the possessive 's:

John’s → john
the french analyzer removes elisions like l' and qu' and diactrics like ¨ or ^:

l’église → eglis
the german analyzer normalizes terms, replacing ä and ae with a, or ß with ss, among others:

äußerst → ausserst

脏小强/elasticsearch-definitive-guide-cn

Getting started with languages

简介

发行版

贡献者

近期动态

脏小强/elasticsearch-definitive-guide-cn .gitee-modal { width: 500px !important; }

Getting started with languages

简介

发行版

贡献者

近期动态

搜索帮助

脏小强/elasticsearch-definitive-guide-cn