A test setup must meet the following criteria:
The first two points are achieved through a small command-line interface
adapter script (a shim). The shim exposes the command-line interface of a Lua
interpreter, and internally it sets up environment variable to point Neovim at
a prepared blank directory structure. Neovim is then called with the -l
flag.
We do have to use some plugins though:
Both plugins are stored under the $XDG_DATA_HOME
directory, the former as a
symlink and the latter as a Git submodule.
As for process isolation, this is achieved inside the tests. We start a headless embedded Neovim instance which we control through MsgPack RPC from inside the test. We can control and probe this process only indirectly, which is awkward, but this is the best solution I could find.
We use busted for unit testing. A unit is a self-contained module which can
be used on its own independent of the editor. Execute make unit-test
to run
unit tests. The busted
binary must be available on the system $PATH
.
End-to-end tests run in a separate Neovim instance which we control via RPC.
These are tests which mutate the state of the editor, such as adding
highlighting on changes. Execute make e2e-test
to run all end to end tests.
To run tests the g:bustedprg
variable must be set to './test/busted'
, which
is the path to the shim script. If the exrc
option is set the variable will
be set automatically.
Strategies are given as a complex table, but a string identifier would have been much more pleasant on the eye. Which of these two is easier to read and write?
-- This?
settings = {
strategy = {
'global'
html = 'local'
}
}
-- Or this?
settings = {
strategy = {
require 'ts-rainbow.strategy.global'
html = require 'ts-rainbow.strategy.local'
}
}
Using strings might seem like the more elegant choice, but it it makes the code more complicated to maintain and less flexible for the user. With tables a user can create a new custom strategy and assign it directly without the need to "register" them first under some name.
More importantly though, we have unlimited freedom where that table is coming from. Suppose we wanted to add settings to a strategy. With string identifiers we now need much more machinery to connect a string identifier and its settings. On the other hand, we can just call a function with the settings are arguments which returns the strategy table.
settings = {
strategy = {
require 'ts-rainbow.strategy.global',
-- Function call evaluates to a strategy table
latext = my_custom_strategy {
option_1 = true,
option_2 = 'test'
}
}
}
Every query has to define a container
capture in addition to opening
and
closing
captures. As humans we understand the code at an abstract level, but
Tree-sitter works on a more concrete level. To a human the HTML tag <div>
is
one atomic object, but to Tree-sitter it is actually a container with further
elements.
Consider the following HTML snippet:
<div>
Hello
</div>
The tree looks like this (showing anonymous nodes):
element [0, 0] - [2, 6] start_tag [0, 0] - [0, 5] "<" [0, 0] - [0, 1] tag_name [0, 1] - [0, 4] ">" [0, 4] - [0, 5] text [1, 1] - [1, 6] end_tag [2, 0] - [2, 6] "</" [2, 0] - [2, 2] tag_name [2, 2] - [2, 5] ">" [2, 5] - [2, 6]
We want to highlight the lower-level nodes like tag_name
or start_tag
and
end_tag
, but we want to base our logic on the higher-level nodes like
element
. The @container
node will not be highlighted, we use it to
determine the nesting level or the relationship to other container nodes.
In order to correctly highlight containers we need to know the nesting level of
each container relative to the other containers in the document. We can use
the order in which matches are returned by the iter_matches
method of a
query. The iterator traverses the document tree in a depth-first manner
according to the visitor patter, but matches are created upon exiting a node.
Let us look at a practical example. Here is a hypothetical tree:
A ├─B │ └─C │ └─D └─E ├─F └─G
The nodes are returned in the following order:
We can only know how deeply nodes are nested relative to one another. We need to build the entire tree structure to know the absolute nesting levels. Here is an algorithm which can build up the tree, it uses the fact that the order of nodes never skips over an ancestor.
Start with an empty stack s = []
. For each match m
do the following:
s
up until we find a match m'
whose
@container
node is not a descendant of the container node of m
. Collect
the popped matches (excluding m'
) onto a new stack s_m
(order does not
matter)s_m
as the child match stack of m
m
to s
Eventually s
will only contain root-level matches, i.e. matches of nesting
level one. To apply the highlighting we can then traverse the match tree,
incrementing the highlighting level by one each time we descend a level.
The order of matches among siblings in the tree does not matter. The above
algorithm uses a stack when collecting children, but any unordered
one-dimensional sequence will do. The stack s
is important for determining
the relationship between nodes: since we know that no ancestors will be skipped
we can be certain that we can stop checking the stack for descendants of m
once we encounter the first non-descendant match. Otherwise we would have to
compare each match with each other match, which would tank the performance.
Consider the following bit of contrived HTML code:
<div id="Alpha">
<div id="Bravo">
<div id="Charlie">
</div>
</div>
<div id="Delta">
</div>
</div>
Supposed the cursor was inside the angle brackets of Bravo
, which tags
should we highlight? From eyeballing the obvious answer is Alpha
, Bravo
and Charlie
. Obviously Alpha
and Bravo
both contain the cursor within
the range, but how do we know that we need to highlight Charlie
? Charlie
is contained inside Bravo
, which contains the cursor, but on the other hand
Delta
is contained inside Alpha
, which also contains the cursor. We cannot
simply check whether the parent contains the cursor.
When working with the Tree-sitter API and iterating through matches and
captures we have no way of knowing that any of the captures within Charlie
are contained within Bravo
. However, due to the order of traversal we do
know that Bravo
is the lowest node to still contain the cursor.
Therefore we that the first match which contains the cursor is the lowest one. If a match does not contain the cursor we can check whether it is a descendant of the cursor container match.
The language tree of a buffer is a tree of parsers. Some languages like Markdown can contain other languages, which complicates things.
Extmarks move along with the text they belong to. This is generally a good thing, but it can become a problem if we move text from one language to another. Consider the following Markdown code:
Hello world
```lua
print {{{{}}}}
print {{{{}}}}
```
We can move the cursor to line 4 and move that line out of the Lua block by
executing :move 1
to move it to the second line. However, this will preserve
the extmarks and we will end up with Lua delimiter highlighting inside
Markdown.
My solution is on every change to delete all rainbow delimiter extmarks which do not belong to the current language.
Take the following Markdown code:
Hello world
```c
puts("This is an injected language")
{
{
{
{
{
return ((((((2)))))) + ((((3))))
}
}
}
}
}
```
If we put the cursor on the line with the puts
statement and move it up one
line (:move -2
) we get the following changes:
{ 2, 0, 3, 0 }
This means lines 3 and 4 of the Markdown tree have changed; we have changed the
contents of the fifth line and added one more line. This is all as expected.
However, let us now move the line back down by executing :move +1
. We get
the following changes:
{ 3, 0, 15, 0 }
{ 3, 0, 4, 0 }
The changes to the C tree are what we expect. However, the changes to the Markdown tree span the code block as well. This is a problem when we start deleting foreign extmarks (see above). If we work from the outside we wipe out all non-Markdown extmarks in the range, which includes the C extmarks. Then we apply the C extmarks inside the C block, but the C change does not span the entire C tree. Thus we will only apply highlighting to the changed C line, but not the remainder of the C block.
The solution at the moment is to overwrite the changes of nested languages. If the changes belong to a language tree with parent language we replace all the changes with a range that spans the entire tree for that language.
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。