As time-based data ages, it becomes less relevant. It’s possible that we will want to see what happened last week, last month, or even last year, but for the most part, we’re interested in only the here and now.
The nice thing about an index per time frame is that it makes old data easy to delete. We just delete the indices that are no longer relevant:
DELETE /logs_2013*
Deleting a whole index is much more efficient than deleting individual documents: Elasticsearch just removes whole directories.
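To illustrate how the set of indices to delete might be chosen, here is a minimal Python sketch. It assumes the daily naming scheme `logs_YYYY-MM-DD` used in the examples of this section; the `keep_days` retention window and both function names are hypothetical. It only builds the request lines; sending them is left to whatever HTTP client you use.

```python
from datetime import date, datetime, timedelta

PREFIX = "logs_"  # naming scheme assumed from the examples in this section


def expired_indices(names, today, keep_days):
    """Return the daily log indices dated more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(PREFIX):
            continue  # not one of our daily log indices
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y-%m-%d").date()
        except ValueError:
            continue  # unexpected suffix: leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


def delete_requests(names):
    """Build one DELETE request line per index."""
    return ["DELETE /" + name for name in names]


if __name__ == "__main__":
    names = ["logs_2014-10-01", "logs_2014-09-30", "logs_2013-12-31"]
    # keep_days=30 is an arbitrary example value
    for line in delete_requests(expired_indices(names, date(2014, 10, 1), 30)):
        print(line)
```

Skipping names that do not parse is a deliberately cautious choice: an index that does not match the expected pattern is never selected for deletion.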
But deleting an index is very final. There are a number of things we can do to help data age gracefully, before we decide to delete it completely.
With logging data, there is likely to be one hot index—the index for today. All new documents will be added to that index, and almost all queries will target that index. It should use your best hardware.
How does Elasticsearch know which servers are your best servers? You tell it, by assigning arbitrary tags to each server. For instance, you could start a node as follows:
./bin/elasticsearch --node.box_type strong
The box_type parameter is completely arbitrary (you could have named it whatever you like), but you can use these arbitrary values to tell Elasticsearch where to allocate an index.
We can ensure that today’s index is on our strongest boxes by creating it with the following settings:
PUT /logs_2014-10-01
{
  "settings": {
    "index.routing.allocation.include.box_type" : "strong"
  }
}
Yesterday’s index no longer needs to be on our strongest boxes, so we can move it to the nodes tagged as medium by updating its index settings:
PUT /logs_2014-09-30/_settings
{
  "index.routing.allocation.include.box_type" : "medium"
}
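The daily rotation described above could be scripted along these lines. This is only a sketch: it assumes the `logs_YYYY-MM-DD` naming and the `strong`/`medium` tags from the examples, the function names are hypothetical, and it returns the requests as `(method, path, body)` tuples rather than sending them. It uses PUT for the settings update, which is the method the update-settings API documents.

```python
from datetime import date, timedelta

# Allocation setting used in the examples of this section
SETTING = "index.routing.allocation.include.box_type"


def index_name(day):
    """Daily index name, e.g. logs_2014-10-01 (naming assumed from the examples)."""
    return "logs_" + day.isoformat()


def daily_allocation_requests(today):
    """Create today's index on strong boxes and move yesterday's to medium boxes."""
    yesterday = today - timedelta(days=1)
    create = ("PUT", "/" + index_name(today), {"settings": {SETTING: "strong"}})
    move = ("PUT", "/" + index_name(yesterday) + "/_settings", {SETTING: "medium"})
    return [create, move]


if __name__ == "__main__":
    for method, path, body in daily_allocation_requests(date(2014, 10, 1)):
        print(method, path, body)
```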
Yesterday’s index is unlikely to change. Log events are static: what happened in the past stays in the past. If we merge each shard down to just a single segment, it’ll use fewer resources and will be quicker to query. We can do this with the optimize API.
It would be a bad idea to optimize the index while it was still allocated to the strong boxes, as the optimization process could swamp the I/O on those nodes and impact the indexing of today’s logs. But the medium boxes aren’t doing very much at all, so we are safe to optimize.
Yesterday’s index may have replica shards. If we issue an optimize request, it will optimize the primary shard and the replica shards, which is a waste. Instead, we can remove the replicas temporarily, optimize, and then restore the replicas:
PUT /logs_2014-09-30/_settings
{ "number_of_replicas": 0 }
POST /logs_2014-09-30/_optimize?max_num_segments=1
PUT /logs_2014-09-30/_settings
{ "number_of_replicas": 1 }
Of course, without replicas, we run the risk of losing data if a disk suffers catastrophic failure. You may want to back up the data first, with the {ref}/modules-snapshots.html[snapshot-restore API].
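Because the index runs without replicas while it is being optimized, a script that automates these three steps should restore the replicas even if the optimize request fails. The following sketch shows one way to do that with `try`/`finally`. The `send` callable is a hypothetical stand-in for your HTTP client, and the function name is an assumption, not part of any Elasticsearch client library.

```python
def optimize_without_replicas(index, send, replicas=1):
    """Drop replicas, merge to one segment, then restore replicas.

    send(method, path, body) is a hypothetical callable that performs
    the HTTP request; body is a dict or None.
    """
    settings_path = "/" + index + "/_settings"
    send("PUT", settings_path, {"number_of_replicas": 0})
    try:
        send("POST", "/" + index + "/_optimize?max_num_segments=1", None)
    finally:
        # Runs whether or not the optimize call raised an exception.
        send("PUT", settings_path, {"number_of_replicas": replicas})


if __name__ == "__main__":
    # Print the requests instead of sending them.
    optimize_without_replicas("logs_2014-09-30", lambda m, p, b: print(m, p, b))
```

Passing `send` in as a parameter keeps the sketch independent of any particular client and makes the request order easy to check in a dry run.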
As indices get even older, they reach a point where they are almost never accessed. We could delete them at this stage, but perhaps you want to keep them around just in case somebody asks for them in six months.
These indices can be closed. They will still exist in the cluster, but they won’t consume resources other than disk space. Reopening an index is much quicker than restoring it from backup.
Before closing, it is worth flushing the index to make sure that there are no transactions left in the transaction log. An empty transaction log will make index recovery faster when it is reopened:
POST /logs_2014-01-*/_flush (1)
POST /logs_2014-01-*/_close (2)
POST /logs_2014-01-*/_open (3)
(1) Flush all indices from January to empty the transaction logs.
(2) Close all indices from January.
(3) When you need access to them again, reopen them with the open API.
Finally, very old indices can be archived off to some long-term storage like a shared disk or Amazon’s S3 using the {ref}/modules-snapshots.html[snapshot-restore API], just in case you may need to access them in the future. Once a backup exists, the index can be deleted from the cluster.
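The stages described in this section can be summarized as a function of an index's age. The sketch below is illustrative only: the stage names and the day thresholds are hypothetical values chosen for the example, not recommendations, and should be tuned to your own retention needs.

```python
from datetime import date

# Hypothetical thresholds, in days of age
WARM_AFTER = 1       # yesterday's index leaves the strong boxes
CLOSE_AFTER = 30     # rarely queried: flush and close
ARCHIVE_AFTER = 180  # snapshot to long-term storage, then delete


def lifecycle_stage(index_day, today):
    """Map an index's age to the stage it should be in."""
    age = (today - index_day).days
    if age < WARM_AFTER:
        return "hot"       # strong boxes, receiving new documents
    if age < CLOSE_AFTER:
        return "warm"      # medium boxes, optimized to one segment
    if age < ARCHIVE_AFTER:
        return "closed"    # on disk only, can be reopened quickly
    return "archived"      # backed up, safe to delete from the cluster


if __name__ == "__main__":
    today = date(2014, 10, 1)
    for day in (date(2014, 10, 1), date(2014, 9, 30), date(2014, 8, 1), date(2014, 1, 15)):
        print(day.isoformat(), lifecycle_stage(day, today))
```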