-
Notifications
You must be signed in to change notification settings - Fork 48
detect invalid regex in lexer #15
Comments
Hello :-), This will change with this PR #6. |
And additionnaly, this can be done separately in this block. |
PR #6 still has this issue on another side, I like the idea of detecting wrong regexes during pp parsing, like you suggest it, but I think having to test each regex at this stage has an overhead, and |
I'm adding the required check and benchmarking it. Here is the script I use to bench: <?php
require_once __DIR__ . '/vendor/autoload.php';
passthru(PHP_BINARY . ' -n -v');
$contents = <<<PP
#rule:
choice()
choice:
concatenation() ( ::or:: concatenation() #choice )*
concatenation:
repetition() ( repetition() #concatenation )*
repetition:
simple() ( quantifier() #repetition )? <node>?
simple:
::capturing_:: choice() ::_capturing::
| ::skipped:: <token> ( ::unification_:: <unification> ::_unification:: )?
::skipped:: #skipped
| ::kept_:: <token> ( ::unification_:: <unification> ::_unification:: )?
::_kept:: #kept
| <token> ::named::
( ::unification_:: <unification> ::_unification:: )? #named
quantifier:
<zero_or_one>
| <one_or_more>
| <zero_or_more>
| <n_to_m>
| <n_or_more>
| <exactly_n>
PP;
$bench = new Hoa\Bench\Bench();
$bench->one_iteration->start();
try {
$parser = \Hoa\Compiler\Llk\Llk::load(new \Hoa\File\Read(__DIR__ . '/Llk/Llk.pp'));
$parser->parse($contents);
} catch (\exception $e) {}
$bench->one_iteration->stop();
$bench->hundred_iterations->start();
for ($i = 100; $i > 0; $i--) {
try {
$parser = \Hoa\Compiler\Llk\Llk::load(new \Hoa\File\Read(__DIR__ . '/Llk/Llk.pp'));
$parser->parse($contents);
} catch (\exception $e) {}
}
$bench->hundred_iterations->stop();
echo $bench; Executing it with No PCRE check
With PCRE check
This was implemented using The check adds up to 30% computation time: this is too costly. I think this should only be enabled when running in a debug mode (see #14) |
Maybe we can check tokens once when “instanciating” the lexer. Is it where your placed it? |
ping @CircleCode? |
The regexes come from the tokens Lines 192 to 201 in fd6f3f9
These are supplied in Line 111 in fd6f3f9
The interesting places where
The first code calls Anyway i both cases it seems fine to place the check in lexMe and throw an exception. As for how to check if the regex is valid i created this feature request which describes it in more detail https://bugs.php.net/bug.php?id=77521 |
<?php
// returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred
// The @ is used (unfortunately) to stop the warnings in your error_log
// The ! revedrses the condition, so the real one will be true, the bad one false
function isRegex($string)
{
return !(@preg_match($string, null) === false);
}
var_dump(isRegex('/\w+/'));
var_dump(isRegex('thisis|64*%notRe3!ex/')); |
in
Lexer.php
,preg_match
could returnfalse
in case of regex error.the
false
return should be tested and explained at https://github.com/hoaproject/Compiler/blob/master/Llk/Lexer.php#L270Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.
The text was updated successfully, but these errors were encountered: