mregexp is a small, UTF-8-compatible regex library written for C99/C++11. It consists of only two files.
MRegexp *re = mregexp_compile("[0-9]+");
If an error occurs, mregexp_compile returns NULL. To get the specific error code, use mregexp_error. Error values and their meanings can be found in mregexp.h.
// Let's find the first sequence of digits in a string
const char *s = "hello 12345 world";
MRegexp *re = mregexp_compile("\\d+");
MRegexpMatch m;
if (mregexp_match(re, s, &m)) {
    // match_begin is a size_t, so print it with %zu
    printf("Found digits at position %zu\n", m.match_begin);
} else {
    printf("Could not find any digits\n");
}
// Compiled regular expressions are stored on the heap
// and must be freed
mregexp_free(re);
The MRegexpMatch type looks somewhat like this:
typedef struct {
    size_t match_begin;
    size_t match_end;
} MRegexpMatch;
The match_begin field is the byte offset, within the searched string, of the first occurrence of the pattern, so s + m.match_begin points to the beginning of the match. match_end is the byte offset of the first byte after the match, so match_end - match_begin is the length of the match in bytes.
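The two offsets can be used to copy the matched text out of the searched string. The following is a minimal sketch, not part of mregexp: `MatchOffsets` is a hypothetical stand-in that mirrors the two fields described above (real code would use MRegexpMatch from mregexp.h), and `copy_match` is an illustrative helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the two fields described above.
 * Real code would use the MRegexpMatch type from mregexp.h. */
typedef struct {
    size_t match_begin;
    size_t match_end;
} MatchOffsets;

/* Copy the matched bytes of s into out, which has room for out_size
 * bytes including the terminating NUL. Returns the match length in
 * bytes, or (size_t)-1 if the buffer is too small. */
size_t copy_match(const char *s, const MatchOffsets *m,
                  char *out, size_t out_size)
{
    assert(m->match_begin <= m->match_end);

    size_t len = m->match_end - m->match_begin;
    if (len >= out_size)
        return (size_t)-1;

    memcpy(out, s + m->match_begin, len);
    out[len] = '\0';
    return len;
}
```

For the string "hello 12345 world" and a match covering the digits, the offsets would be 6 and 11, and the helper would copy "12345". Because the offsets count bytes, not characters, the same arithmetic also works for UTF-8 input.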
First of all, mregexp is still in a very early stage of development.
To use mregexp you will need two files: mregexp.c and mregexp.h. Include mregexp.h wherever you wish to use it. mregexp.c can be compiled independently into an object file and then linked with your project.
mregexp comes with a few tests to ensure that changes won't break anything. To run the tests you'll need libcheck. Then just run:
make test
Metacharacter | Description |
---|---|
c | Most characters (like c) match themselves literally |
\c | Some characters are used as metacharacters. To use them literally escape them |
\n \t \r | newline, tab, carriage return |
\d \s \w | digit, whitespace, alphanumeric character (a-z, A-Z, 0-9 and _) |
\D \S \W | Match any character that is not in the corresponding group above |
. | Matches any character (including newline) |
* | Matches the preceding token as often as possible (including zero times) |
+ | Matches the preceding token at least once and as often as possible |
{m,n} | Matches the preceding token at least m times and at most n times. m or n may be omitted to drop the minimum or maximum. |
(c) | Matches the expression inside the parentheses. |
[c] | Matches any one of the characters inside the brackets. Ranges like a-z may also be used |
[^c] | Matches any character that is not inside the brackets. |
\| | Matches either the expression before the \| or the expression after it |
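
When a pattern from the table is written as a C string literal, each backslash has to be escaped once more for the C compiler, as in the "\\d" pattern used earlier. A small sketch of this, using only the standard library; the variable and function names are illustrative and not part of mregexp:

```c
#include <stdio.h>
#include <string.h>

/* The C literal "\\d+" holds the three-character regex: \d+ */
const char *digits_pattern = "\\d+";

/* The C literal "\\." holds the two-character regex for a literal dot */
const char *literal_dot_pattern = "\\.";

/* Print a pattern the way the regex compiler receives it,
 * that is, after the C compiler has processed the escapes. */
void show_pattern(const char *pattern)
{
    printf("regex: %s (%zu bytes)\n", pattern, strlen(pattern));
}
```

Printing a pattern like this before passing it to mregexp_compile is a quick way to check that the escapes survived as intended.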