lexy
lexy is a parser combinator library for C++17 and onwards.
It allows you to write a parser by specifying it in a convenient C++ DSL,
which gives you all the flexibility and control of a handwritten parser without all the manual work.
#include <cstdint>           // std::uint8_t, std::uint32_t
#include <lexy/callback.hpp> // lexy::callback
#include <lexy/dsl.hpp>      // lexy::dsl::*

namespace dsl = lexy::dsl;

// Parse an IPv4 address into a `std::uint32_t`.
struct ipv4_address
{
    // What is being matched.
    static constexpr auto rule = [] {
        // Match a sequence of (decimal) digits and convert it into a std::uint8_t.
        auto octet = dsl::integer<std::uint8_t>(dsl::digits<>);

        // Match four of them separated by periods.
        return dsl::times<4>(octet, dsl::sep(dsl::period));
    }();

    // How the matched output is being stored.
    static constexpr auto value
        = lexy::callback<std::uint32_t>([](lexy::times<4, std::uint8_t> octets) {
              std::uint32_t result = 0;
              for (auto o : octets)
              {
                  result <<= 8;
                  result |= o;
              }
              return result;
          });
};
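For context, a grammar like this is run by one of lexy's parse actions such as lexy::parse. The snippet below is a minimal usage sketch rather than the library's canonical example; the header paths and the lexy::zstring_input and lexy::noop helpers are assumptions based on lexy's documented API and may differ between versions.
#include <lexy/input/string_input.hpp> // assumed header for lexy::zstring_input
#include <lexy/parse.hpp>              // assumed header for lexy::parse

// Parse a null-terminated string with the ipv4_address production above.
std::uint32_t parse_address(const char* str)
{
    auto input  = lexy::zstring_input(str);
    // lexy::noop discards error information; a real program would pass an error callback.
    auto result = lexy::parse<ipv4_address>(input, lexy::noop);
    return result.has_value() ? result.value() : 0;
}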
Unlike parser generators that use some table-driven magic for parsing, lexy's grammar is just syntax sugar for a hand-written recursive descent parser. The parsing algorithm does exactly what you've instructed it to do. No more ambiguities or weird shift/reduce errors!
There is no need for an external grammar file; you embed the DSL directly in your C++ code using operator overloading and functions.
It will only backtrack when you say it should, and only look ahead when and as far as you want. Don't worry about rules that have side effects: they won't be executed unnecessarily, thanks to the user-specified lookahead conditions.
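As an illustration, a branch condition is attached with >> in the DSL: lexy speculatively tries only the condition, and once it matches, it commits to that alternative. The fragment below is a hedged sketch; the rule and its name are made up for illustration.
// Only the literal before >> is tried speculatively; once it matches,
// the branch is committed and the rest of it must succeed.
constexpr auto signed_digits
    = dsl::lit_c<'+'> >> dsl::digits<>   // taken only after '+' matched
    | dsl::lit_c<'-'> >> dsl::digits<>;  // taken only after '-' matched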
The input is parsed into the data structures you’ve provided. It will not do heap allocations to store output unless you’ve instructed it to do so. You can even evaluate the input on the fly, without storing anything.
On a parse error, it will invoke a user-defined callback with information about what went wrong and during which production.
Custom error messages can be injected using the special dsl::error, dsl::require, and dsl::prevent rules.
Write parse rules that detect common mistakes and issue appropriate diagnostics!
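For instance, an error tag is a type whose name becomes the reported message, and dsl::error<Tag> raises it; a common pattern is to use it as the last alternative of a choice. The fragment below is a hedged sketch; the production and the message are invented.
// Error tag: its name is the message handed to the error callback.
struct missing_semicolon
{
    static constexpr auto name = "expected ';' at end of statement";
};

struct statement_end
{
    // If no semicolon follows, fail with the custom error instead of a generic one.
    static constexpr auto rule = dsl::semicolon | dsl::error<missing_semicolon>;
};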
You can parse UTF-8, UTF-16, or UTF-32.
lexy takes care of code point encoding and decoding as necessary, as well as endianness and byte-order marks.
Want to match a string literal containing arbitrary Unicode code points or escape sequences like \u21D4, and store the result in a std::string? You can do so out of the box.
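A rough sketch of such a production, assuming lexy's dsl::quoted delimiter rule and the lexy::as_string callback (escape-sequence handling is omitted, and exact names may vary between versions):
#include <string>

struct quoted_string
{
    // Match zero or more Unicode code points surrounded by double quotes.
    static constexpr auto rule = dsl::quoted(dsl::code_point);

    // Store the matched code points, UTF-8 encoded, in a std::string.
    static constexpr auto value = lexy::as_string<std::string, lexy::utf8_encoding>;
};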
constexpr parsing: You want to parse a string literal at compile time? You can do so.
The core parsing library only depends on essential headers such as <type_traits> or <cstddef>; some input classes require <cstdio>. This is by necessity, not by choice: the library is constexpr, after all.
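The following is a hedged sketch of what compile-time use can look like; it assumes lexy::match (a parse action that checks whether the input matches a production and returns a bool) can be used in a constant expression, and exact entry points may differ between versions.
// Reusing the ipv4_address production from the example at the top.
static_assert(lexy::match<ipv4_address>(lexy::zstring_input("192.168.0.1")));
static_assert(!lexy::match<ipv4_address>(lexy::zstring_input("no address")));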
The following features are in various stages of development and will be added before the 1.0.0 release:
- Figure out why the grammar isn’t working the way you want it to.
- Parse operators with different precedences using Pratt parsing.
- Reserve a set of keywords that won’t be matched as regular identifiers.
- Log an error, recover, and continue parsing!
Why use lexy over XYZ?
lexy is closest to other PEG parsers. However, they usually do more implicit backtracking, which can hurt performance, and you need to be very careful with rules that have side effects.
This is not the case for lexy, where backtracking is controlled using branch conditions.
Compared to Boost.Spirit, the main difference is that lexy is not a Boost library. Otherwise, it is just a different implementation with a different flavor. Use lexy if you like lexy more.
PEGTL is very similar and was a big inspiration. The biggest difference is that lexy uses an operator-based DSL instead of inheriting from templated classes as PEGTL does; depending on your preference, this can be an advantage or a disadvantage.
Writing a handwritten parser is more manual work and error-prone. lexy automates that away without having to sacrifice control. You can use it to quickly prototype a parser, and then replace more and more of it with a handwritten parser over time.
Compile times are not as bad as you might expect (in debug mode, that is).
Compiling the example JSON parser with all of the lexy-specific parts removed, i.e. just the data structure built using std::variant and std::map, takes about one second on my machine.
The entire parser takes about two seconds if you disable force inline on the parse productions.
With force inline, it takes about five seconds.
Compile time benchmarks and optimizations are planned.
Keep in mind that you can fully isolate lexy in a single translation unit that only needs to be touched when you change the parser.
The compiler errors you get when misusing the DSL are certainly worse than the error messages lexy gives you. The big problem is that the first line gives you the error, followed by dozens of template instantiations that end at your lexy::parse call. Besides providing an external tool to filter those error messages, there is nothing I can do about that.
The library is currently not optimized and does not yet include benchmarks. However, as it just parses what you specify, performance should be comparable to the corresponding hand-written parser. In preliminary tests, it validates JSON at roughly 400 MB/s.
I previously had a tokenizer library called foonathan/lex. I tried adding a parser to it, but found that the line between pure tokenization and parsing became increasingly blurred. lexy is a reimagining of the parser I added to foonathan/lex, and I've simply kept a similar name.
The library uses CMake as its build system.
Simply put it somewhere and use add_subdirectory() to make the following targets available:
- foonathan::lexy::core: the required target. It is an INTERFACE target that sets the necessary include path and C++ standard flags.
- foonathan::lexy::file: link against this target if you want to use the (not header-only) lexy::read_file() functionality.
- foonathan::lexy: an umbrella target that links to all other targets.
Configuration is supported by providing a lexy_user_config.hpp somewhere in the include search path, or by setting the LEXY_USER_CONFIG_HEADER CMake option to a header path. This header can then override many of the detections in lexy/_detail/config.hpp. Refer to that header for details.
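As an example, a consuming project's CMakeLists.txt might contain something like the following sketch; the external/lexy path and the my_parser target are made up for illustration.
# Hypothetical consumer CMakeLists.txt fragment.
add_subdirectory(external/lexy)

add_executable(my_parser main.cpp)
# Link the umbrella target; foonathan::lexy::core alone suffices if
# lexy::read_file() is not needed.
target_link_libraries(my_parser PRIVATE foonathan::lexy)

# Optionally, point lexy at a user config header (set before add_subdirectory()):
# set(LEXY_USER_CONFIG_HEADER "${CMAKE_CURRENT_SOURCE_DIR}/lexy_user_config.hpp")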
The library is continuously tested on GCC 7 or higher, clang 6 or higher, as well as clang-cl. It requires C++17 support, but works best with C++20. Building the tests with MSVC fails, but simple examples might work. If you want to use it on Windows, it is recommended to use clang-cl instead.