1 Star 0 Fork 0

木易/elasticsearch-definitive-guide

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
文件
克隆/下载
20_Tuning_best_field_queries.asciidoc 3.95 KB
一键复制 编辑 原始数据 按行查看 历史

Tuning Best Fields Queries

What would happen if the user had searched instead for `quick pets''? Both documents contain the word `quick, but only document 2 contains the word pets. Neither document contains both words in the same field.

A simple dis_max query like the following would choose the single best matching field, and ignore the other:

{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ]
        }
    }
}
{
  "hits": [
     {
        "_id": "1",
        "_score": 0.12713557, (1)
        "_source": {
           "title": "Quick brown rabbits",
           "body": "Brown rabbits are commonly seen."
        }
     },
     {
        "_id": "2",
        "_score": 0.12713557, (1)
        "_source": {
           "title": "Keeping pets healthy",
           "body": "My quick brown fox eats rabbits on a regular basis."
        }
     }
   ]
}
  1. Note that the scores are exactly the same.

We would probably expect documents that match on both the title field and the body field to rank higher than documents that match on just one field, but this isn’t the case. Remember: the dis_max query simply uses the score from the _single best-matching clause.

tie_breaker

It is possible, however, to also take the _score from the other matching clauses into account, by specifying the tie_breaker parameter:

{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ],
            "tie_breaker": 0.3
        }
    }
}

This gives us the following results:

{
  "hits": [
     {
        "_id": "2",
        "_score": 0.14757764, (1)
        "_source": {
           "title": "Keeping pets healthy",
           "body": "My quick brown fox eats rabbits on a regular basis."
        }
     },
     {
        "_id": "1",
        "_score": 0.124275915, (1)
        "_source": {
           "title": "Quick brown rabbits",
           "body": "Brown rabbits are commonly seen."
        }
     }
   ]
}
  1. Document 2 now has a small lead over document 1.

The tie_breaker parameter makes the dis_max query behave more like a halfway house between dis_max and bool. It changes the score calculation as follows:

  1. Take the _score of the best-matching clause.

  2. Multiply the score of each of the other matching clauses by the tie_breaker.

  3. Add them all together and normalize.

With the tie_breaker, all matching clauses count, but the best-matching clause counts most.

Note

The tie_breaker can be a floating-point value between 0 and 1, where 0 uses just the best-matching clause and 1 counts all matching clauses equally. The exact value can be tuned based on your data and queries, but a reasonable value should be close to zero, (for example, 0.1 - 0.4), in order not to overwhelm the best-matching nature of dis_max.

Loading...
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/yangguilong/elasticsearch-definitive-guide.git
git@gitee.com:yangguilong/elasticsearch-definitive-guide.git
yangguilong
elasticsearch-definitive-guide
elasticsearch-definitive-guide
master

搜索帮助