Only find one match per position #1

Open
kareman opened this issue May 21, 2020 · 3 comments

Comments

@kareman

kareman commented May 21, 2020

Hi, nice library, I especially like the syntax.

Is there any way of parsing only one match per position? Now when I run code like this:

let myany = LazilyRepeating(one() as Wildcard<String>)
let token = Token("." • myany • Literal(" ") • myany • " ")
let parser = myany • token
parser.forwardMatches(enteringFrom: Match<String>(over: #" .1  slkjdf.2 .3  "#)).forEach {
	print($0.captures(for: token))
}

it seems to find all possible matches for each position:

[".1  "]
[".1  slkjdf.2 "]
[".1  slkjdf.2 .3 "]
[".1  slkjdf.2 .3  "]
[".1  slkjdf.2 "]
[".1  slkjdf.2 .3 "]
[".1  slkjdf.2 .3  "]
[".1  slkjdf.2 .3 "]
[".1  slkjdf.2 .3  "]
[".1  slkjdf.2 .3  "]
[".2 .3 "]
[".2 .3  "]
[".2 .3  "]
[".3  "]

How do I tell LazilyRepeating to only find the shortest possible match?

@ctxppc
Owner

ctxppc commented Jun 3, 2020

Thanks for your kind words!

I haven’t had much time recently to continue implementing a few more features I had in mind, or, especially, to document a few things! (But I’m definitely planning to!)

Token is a pattern class that is supposed to be used at most once in a given pattern. When used in two or more places, it can capture multiple sequences per match, but the result is not well-defined. The multiple-captures feature is only well-defined when a single token object is used within some kind of repeating pattern (like LazilyRepeating): in that case it blindly follows the matching semantics of the repeating pattern.

I see multiple uses of the token within the pattern. Did you instead mean to use a back-reference, which matches a subsequence that a previous token matched? The read-me mentions Referencing but I haven’t gotten around to implementing it yet.

LazilyRepeating already tries to match its sub-pattern as few times as possible; in your example, it applies the wildcard zero times, once, twice, thrice, and so on. In your example output, these are the first, second, third, … elements. forwardMatches(…) returns a lazily evaluated array of every possible match, so you can just take the first match to get the one with the fewest repetitions.
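To illustrate the ordering described above, here is a self-contained toy model in plain Swift. It uses only the standard library and is not this library's API: `toyCandidates(in:from:)` is a hypothetical name, and the sketch assumes the pattern is a dot, any characters, a space, any characters, and a space, anchored at a given character offset. Candidates are produced in the order a lazy repetition would try them, so the first one has the fewest repetitions.

```swift
/// Toy stand-in (hypothetical, not the library's API) for the pattern
/// "." any " " any " " anchored at character offset `start`.
/// Candidates are produced lazily, fewest wildcard repetitions first.
func toyCandidates(in text: String, from start: Int) -> AnySequence<String> {
    let chars = Array(text)
    guard chars.indices.contains(start), chars[start] == "." else {
        return AnySequence([String]())
    }
    // Every space after the dot can end the first or the second wildcard.
    let spaces = chars.indices.filter { $0 > start && chars[$0] == " " }
    let candidates = spaces.lazy.flatMap { first in
        spaces.lazy
            .filter { $0 > first }
            .map { second in String(chars[start...second]) }
    }
    return AnySequence(candidates)
}

let text = " .1  slkjdf.2 .3  "
var iterator = toyCandidates(in: text, from: 1).makeIterator()
let shortest = iterator.next()                      // Optional(".1  ")
let all = Array(toyCandidates(in: text, from: 1))   // every candidate starting at ".1"
```

Because the sequence is lazy, asking for only the first candidate does not compute the others.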

However, forwardMatches(enteringFrom:) is a foundational method that returns partial matches (matches that might not cover the whole string yet). Larger patterns build on top of these partial matches. The matches(over:) method (in Pattern) does exclude all partial matches, and is the method for “client use”. :)
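The difference between partial and complete matches can be sketched with another toy model in plain Swift. All names here (`ToyMatch`, `toyForwardMatches(over:)`, `toyMatches(over:)`) are hypothetical and only mimic the idea; the sketch makes no claim about how the library implements either method.

```swift
/// Toy model: a match records how many characters of the input it has consumed.
struct ToyMatch: Equatable {
    let input: String
    let consumed: Int
    /// A match is complete only when it covers the whole input.
    var isComplete: Bool { consumed == input.count }
}

/// Hypothetical stand-in for a lazily repeated wildcard:
/// it consumes zero, one, two, ... characters, yielding partial matches.
func toyForwardMatches(over input: String) -> [ToyMatch] {
    (0...input.count).map { ToyMatch(input: input, consumed: $0) }
}

/// Hypothetical stand-in for a client-facing entry point:
/// keep only the matches that cover the whole input.
func toyMatches(over input: String) -> [ToyMatch] {
    toyForwardMatches(over: input).filter { $0.isComplete }
}

let partial = toyForwardMatches(over: "abc").map(\.consumed)   // [0, 1, 2, 3]
let complete = toyMatches(over: "abc").map(\.consumed)         // [3]
```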

@kareman
Author

kareman commented Jul 7, 2020

Hi, and sorry about the very late reply. I liked the syntax in this project so much that I implemented something similar in my PEG parser at https://github.com/kareman/Patterns. It made the API far easier to work with (as soon as you find out how to type • and ¿ 😄).

@ctxppc
Owner

ctxppc commented Jul 7, 2020

Looks nice, especially its strong typing and its VM approach! 💯


2 participants