| Age | Commit message | Author |
|
These patterns cause infinite loops, so warn about them and skip them.
|
|
Previously the "dirty" version of the pattern was used, which could
result in trying to match with multiple `^` and caused valid matches to fail.
|
|
|
|
|
|
* Allow empty groups as first match in tokenizer
* Avoid pushing tokens with empty strings
* Allow groups to be used in end delimiter in tokenizer
* Use the first entry of the type table for the middle part of a subsyntax
This applies to delimited matches with a table for `type` and without a
`syntax` field.
* Match only once if using `at_start` in tokenizer `find_text`
* Check if match is escaped in the "close" case too
Also allow continuing matching if the match was escaped.
|
|
|
|
`regex.match` now behaves like `string.match`.
This required changes in the `tokenizer` and in the `detectindent`
plugin.
|
|
|
|
|
|
* tokenizer: remove the limit of 3 subsyntaxes depth
Make the state a string of bytes instead of a 32-bit integer so that
subsyntaxes can nest more deeply. Fixes issues with syntax files such as
the one for PHP, which already exceeded a depth of 3 subsyntaxes.
* remove unnecessary call to set_subsyntax_pattern_idx
* fix wrong word in comments
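
The state change above can be illustrated with a small sketch. It is written in Python rather than the project's Lua, and the helper names (`push_level`, `pop_level`, `depth`) are hypothetical, not the tokenizer's actual functions. It assumes one byte per nesting level; the real encoding may differ.

```python
# Hypothetical sketch: tokenizer state kept as a byte string.
# Assumption: each byte holds the index (1-255) of the pattern that
# opened one nesting level, so depth is bounded only by string length.

def push_level(state: bytes, pattern_idx: int) -> bytes:
    """Enter a nested subsyntax by appending its pattern index."""
    if not 1 <= pattern_idx <= 255:
        raise ValueError("pattern index must fit in one non-zero byte")
    return state + bytes([pattern_idx])

def pop_level(state: bytes) -> bytes:
    """Leave the innermost subsyntax; the empty state is the top level."""
    return state[:-1]

def depth(state: bytes) -> int:
    """Number of subsyntaxes currently open."""
    return len(state)

# Nest five levels deep, which a fixed-width integer with a 3-level
# limit could not represent.
state = b""
for idx in (4, 7, 2, 9, 1):
    state = push_level(state, idx)
```

Because the state is an ordinary immutable string, it can still be compared for equality per line, which is what a highlighter needs to decide whether following lines must be re-tokenized.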
|
|
Add more tokenizer errors/warnings
|
|
Check if "open" pattern is escaped
|
|
|
|
|
|
If the token type was a simple string (and not a table), the size of the
string was used instead of `1`.
|
|
Previously this check was only done for "close" patterns.
|
|
|
|
The number of results from a pattern with groups must never be greater
than the number of token types for that pattern.
Also if a token type was undefined, it's now pushed as a `normal` one.
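
A minimal sketch of the rule described above, in Python rather than the project's Lua. The function name `assign_types` is hypothetical, and raising an error for surplus groups is an assumption made for illustration; the source only states that the number of results must not exceed the number of token types.

```python
# Hypothetical sketch: pair each captured group with a token type.

def assign_types(groups, token_type):
    # A plain string is a single type, so it counts as one entry
    # (not as many entries as the string has characters).
    types = [token_type] if isinstance(token_type, str) else list(token_type)
    if len(groups) > len(types):
        # Assumption: treat surplus groups as an error in this sketch.
        raise ValueError("more captured groups than token types")
    # An undefined (None) type falls back to "normal".
    return [(text, types[i] or "normal") for i, text in enumerate(groups)]

pairs = assign_types(["local", "x"], ["keyword", None])
```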
|
|
|
|
Before, this was only supported by Lua patterns.
This expects the regex to use the same syntax used for patterns. That
is, the token should be split by empty groups.
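
The convention of splitting a token at empty groups can be sketched as follows. This uses Python's `re` module for illustration, not the editor's own regex API, and the function name `split_by_empty_groups` is hypothetical. It assumes every group in the regex is an empty `()` marker.

```python
import re

def split_by_empty_groups(regex, text):
    """Split the matched text at the positions of the empty groups."""
    m = re.match(regex, text)
    if m is None:
        return []
    # Cut points: start of the match, each empty group's position,
    # and the end of the match.
    cuts = [m.start()]
    cuts += [m.start(i) for i in range(1, m.re.groups + 1)]
    cuts.append(m.end())
    # Drop zero-length pieces so no empty token is produced.
    return [text[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]

parts = split_by_empty_groups(r"function()\s+()\w+", "function foo(")
```

Each resulting piece would then receive the token type at the same position in the pattern's `type` table.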
|
|
|
|
* add utf8 support to tokenizer
* wrap utf8 functions in string table using a 'u' prefix
* document new utf8 functions
|
|
Before, syntax patterns/regexes that started with `^` didn't have the
desired effect of matching with the start of the line.
Now those patterns are used only when matching the whole line.
|
|
|
|
|
|
We must consume the whole UTF-8 character, not just a single byte.
|
|
|
|
|
|
When moving to the next character, we have to consider that the current
one might be multi-byte.
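
A minimal sketch of advancing by one whole UTF-8 character, in Python rather than the project's Lua. The helper names are hypothetical; the length rules come from the UTF-8 encoding itself.

```python
def utf8_char_len(lead_byte: int) -> int:
    """Length in bytes of the UTF-8 sequence introduced by lead_byte."""
    if lead_byte < 0x80:
        return 1          # ASCII
    if lead_byte >> 5 == 0b110:
        return 2
    if lead_byte >> 4 == 0b1110:
        return 3
    if lead_byte >> 3 == 0b11110:
        return 4
    return 1              # continuation/invalid byte: still make progress

def next_char_index(data: bytes, i: int) -> int:
    """Byte index of the character after the one starting at i."""
    return min(i + utf8_char_len(data[i]), len(data))

# Walk a string containing 1-, 2- and 3-byte characters.
data = "a\u00e9\u20ac".encode("utf-8")
starts = []
i = 0
while i < len(data):
    starts.append(i)
    i = next_char_index(data, i)
```

Stepping by a single byte instead would land in the middle of a multi-byte sequence and produce tokens that split a character.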
|
|
I have no idea why unpack() is still used and how it still worked.
|
|
Use regular expressions instead of Lua patterns for find and replace editor commands.
Syntax files can now use regexes, or Lua patterns as before, which keeps backward compatibility for plugins.
|
|
* Cleaned up tokenizer to make subsyntax operations more clear.
* Explanatory comments.
* Made it so push_subsyntax could be safely called elsewhere.
* Unified terminology.
* Minor bug fix.
* State is an incredibly vaguely named variable. Changed convention to represent what it actually is.
* Also changed function name.
* Fixed bug.
|
|
|
|
|
|
|
|
This, along with the earlier rencache changes, should resolve #64.
|
|
* Only one highlighter state is kept per-document as opposed
to one per-docview
* Fixes a bug with retaining older highlighter state, as a
DocView wasn't able to detect lines changing above its viewport
* Renames `highlighter` module to more descriptive `tokenizer`
|