The first challenge that had to be solved was how to make text searchable. Traditional databases store a single value per field, but this is insufficient for full-text search. Every word in a text field needs to be searchable, which means that the database needs to be able to index multiple values (words, in this case) in a single field.
The data structure that best supports the multiple-values-per-field requirement is the inverted index, which we introduced in [inverted-index]. The inverted index contains a sorted list of all of the unique values or terms that occur in any document and, for each term, a list of all the documents that contain it.
Term  | Doc 1 | Doc 2 | Doc 3 | ...
------------------------------------
brown |   X   |       |   X   | ...
fox   |   X   |   X   |   X   | ...
quick |   X   |   X   |       | ...
the   |   X   |       |   X   | ...
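To make the table concrete, here is a minimal in-memory sketch of building such an index. The three document texts are hypothetical, chosen only so that the output reproduces the table, and the "analysis" step is a naive lowercase-and-split; real Elasticsearch analyzers do considerably more. The function name `build_inverted_index` is likewise illustrative.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each unique term to the sorted list of doc IDs that contain it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis (an assumption): lowercase, then split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Keep the term dictionary sorted, as the inverted index described above is.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


# Hypothetical documents chosen to reproduce the table above.
docs = {
    1: "The quick brown fox",
    2: "Quick fox",
    3: "The brown fox",
}
index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(f"{term:<6} {doc_ids}")
# brown  [1, 3]
# fox    [1, 2, 3]
# quick  [1, 2]
# the    [1, 3]
```

Finding every document that contains `fox` is then a single dictionary lookup rather than a scan over every document's text.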
When discussing inverted indices, we talk about "indexing documents" because, historically, an inverted index was used to index whole unstructured text documents. A "document" in Elasticsearch is a structured JSON document with fields and values. In reality, every indexed field in a JSON document has its own inverted index.
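The per-field point can be sketched by keeping one term dictionary per field name. This is a toy model under stated assumptions: the documents, the `title` and `body` field names, and the helper `index_fields` are all hypothetical, and analysis is again a naive lowercase-and-split.

```python
from collections import defaultdict


def index_fields(docs: dict[int, dict[str, str]]) -> dict[str, dict[str, list[int]]]:
    """Build one inverted index per field: field -> term -> sorted doc IDs."""
    per_field: dict[str, dict[str, set[int]]] = defaultdict(lambda: defaultdict(set))
    for doc_id, source in docs.items():
        for field, value in source.items():
            # Each field's terms go into that field's own index only.
            for term in value.lower().split():
                per_field[field][term].add(doc_id)
    return {
        field: {term: sorted(ids) for term, ids in sorted(terms.items())}
        for field, terms in per_field.items()
    }


# Hypothetical JSON-like documents with two text fields each.
docs = {
    1: {"title": "Quick fox", "body": "The quick brown fox"},
    2: {"title": "Lazy dog", "body": "The lazy dog sleeps"},
}
indices = index_fields(docs)
print(indices["title"]["fox"])     # [1]
print(indices["body"]["the"])      # [1, 2]
print("brown" in indices["title"])  # False: "brown" only occurs in body
```

Because the indices are separate, a query for `brown` against `title` finds nothing even though document 1 contains that word in its `body`.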
The inverted index may actually hold a lot more information than just the list of documents that contain a particular term. It may store a count of how many documents contain each term, how many times a term appears in a particular document, the order of terms in each document, the length of each document, the average length of all documents, and so on. These statistics allow Elasticsearch to determine which terms, and which documents, are more important than others, as described in [relevance-intro].
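The statistics listed above can be gathered in a single pass over the collection. The sketch below is illustrative only: the dictionary layouts and the `collect_stats` name are assumptions, not Elasticsearch's or Lucene's storage format, and it does not attempt to compute a relevance score.

```python
from collections import Counter, defaultdict


def collect_stats(docs: dict[int, str]):
    """Compute some statistics an inverted index may keep next to its postings."""
    doc_freq: Counter = Counter()                 # term -> number of docs containing it
    term_freq: dict[tuple[str, int], int] = {}    # (term, doc_id) -> count in that doc
    positions: dict[tuple[str, int], list[int]] = defaultdict(list)  # term order per doc
    doc_length: dict[int, int] = {}               # doc_id -> number of terms
    for doc_id, text in docs.items():
        terms = text.lower().split()              # naive analysis (assumption)
        doc_length[doc_id] = len(terms)
        for pos, term in enumerate(terms):
            positions[(term, doc_id)].append(pos)
        for term, count in Counter(terms).items():
            doc_freq[term] += 1
            term_freq[(term, doc_id)] = count
    avg_length = sum(doc_length.values()) / len(doc_length)
    return doc_freq, term_freq, dict(positions), doc_length, avg_length


docs = {1: "The quick brown fox", 2: "The fox and the dog"}
doc_freq, term_freq, positions, doc_length, avg_length = collect_stats(docs)
print(doc_freq["the"], term_freq[("the", 2)], positions[("the", 2)], avg_length)
# 2 2 [0, 3] 4.5
```

Note that `doc_freq` and `avg_length` depend on every document in the collection, which is why the index needs a view of the whole collection to produce them.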
The important thing to realise is that the inverted index needs to know about all documents in the collection in order for it to function as intended.
In the early days of full text search, one big inverted index was built for the entire document collection and written to disk. As soon as the new index was ready, it replaced the old index and recent changes became searchable.
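The build-then-replace cycle can be sketched in a few lines. This is a toy, in-memory model: Python dictionaries stand in for an index written to disk, `types.MappingProxyType` provides a read-only view, and the names `build_index` and `Searcher` are hypothetical.

```python
from types import MappingProxyType


def build_index(docs: dict[int, str]) -> MappingProxyType:
    """Build a read-only index over the entire collection: term -> doc IDs."""
    postings: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings.setdefault(term, set()).add(doc_id)
    frozen = {term: tuple(sorted(ids)) for term, ids in sorted(postings.items())}
    # Read-only view over a dict nothing else references: effectively immutable.
    return MappingProxyType(frozen)


class Searcher:
    """Answers queries from one immutable index that is replaced wholesale."""

    def __init__(self, docs: dict[int, str]) -> None:
        self._index = build_index(docs)

    def search(self, term: str) -> tuple[int, ...]:
        # The index being read is never modified in place.
        return self._index.get(term.lower(), ())

    def rebuild(self, all_docs: dict[int, str]) -> None:
        # Build a new index from the *whole* collection, then swap the reference.
        self._index = build_index(all_docs)


docs = {1: "The quick brown fox"}
searcher = Searcher(docs)
docs[2] = "The lazy dog"        # a new document arrives...
print(searcher.search("dog"))   # (): not searchable until the index is replaced
searcher.rebuild(docs)          # ...full rebuild, then swap
print(searcher.search("dog"))   # (2,)
```

Each rebuild reprocesses every document, not just the new one; that cost is the trade-off this design accepts in exchange for immutability.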
The inverted index that is written to disk is immutable — it doesn’t change. Ever. This immutability has important benefits:
- There is no need for locking. If you never have to update the index, you never have to worry about multiple processes trying to make changes at the same time.
- Once the index has been read into the kernel's file-system cache, it stays there because it never changes. As long as there is enough space in the file-system cache, most reads will come from memory instead of having to hit disk. This provides a big performance boost.
- Any other caches (like the filter cache) remain valid for the life of the index. They don't need to be rebuilt every time the data changes, because the data doesn't change.
- Writing a single large inverted index allows the data to be compressed, reducing costly disk I/O and the amount of RAM needed to cache the index.
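As an illustration of why a sorted, write-once posting list compresses well, the sketch below uses one common technique: store the gaps between consecutive document IDs, then write each gap as a variable-length integer (7 payload bits per byte). This is a generic example, not Lucene's actual codec, which is more elaborate; `encode_postings` and `decode_postings` are hypothetical names.

```python
def encode_postings(doc_ids: list[int]) -> bytes:
    """Delta-encode a sorted posting list, then write each gap as a varint."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev            # assumes ascending IDs, so gaps are >= 0
        prev = doc_id
        while gap >= 0x80:             # high bit set means "more bytes follow"
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data: bytes) -> list[int]:
    """Reverse encode_postings: rebuild each gap, then add it to the running total."""
    doc_ids: list[int] = []
    current, shift, prev = 0, 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += current
            doc_ids.append(prev)
            current, shift = 0, 0
    return doc_ids


postings = [1000, 1003, 1010, 1150, 100000]
encoded = encode_postings(postings)
print(len(encoded), "bytes vs", 4 * len(postings), "as 32-bit ints")
# 9 bytes vs 20 as 32-bit ints
```

The scheme relies on the list being sorted and never edited in place: inserting a document ID in the middle would change the following gap and force the rest of the list to be re-encoded.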
Of course, an immutable index has its downsides too, primarily the fact that it is immutable! You can't change it. If you want to make new documents searchable, you have to rebuild the entire index.
This places a significant limit on either the amount of data that an index can contain or the frequency with which the index can be updated.