How to use peg for streams of data - or how to implement streaming in peg #326
-
Parsing Expression Grammars aren't particularly compatible with streaming. Backtracking allows returning all the way to the beginning of the input (preventing discarding early input), while the parse function as a whole returns a single result (preventing returning an early result before reaching the end).
It looks like you are parsing a sequence of messages. What you can do is wrap the PEG in a loop that parses a single message at a time. If your messages have a delimiter you can split on with […]
Assuming the protocol marks the end of messages, you can detect an incomplete message when the error position returned is equal to the length of the input, because matches will fail at that position. Or you could even maybe use a custom […]
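A minimal sketch of the loop described above, using only the standard library. `parse_one` is a hypothetical stand-in for a generated rule that matches one message at the front of the buffer; its toy format (`<`, lowercase letters, `>`) and the names `Step` and `Framer` are assumptions made for illustration, not part of peg. With a real grammar, the `Err` offset would presumably come from the error location mentioned above. Dropping one byte on invalid input is just one possible policy for corrupt data.

```rust
/// Result of trying to parse one message from the front of the buffer.
#[derive(Debug, PartialEq)]
enum Step {
    /// A complete message (its payload) and the number of bytes it consumed.
    Message(Vec<u8>, usize),
    /// Parsing failed exactly at the end of the buffer: wait for more bytes.
    Incomplete,
    /// Parsing failed at this offset, before the end of the buffer.
    Invalid(usize),
}

/// Hypothetical stand-in for a generated parser rule. A message is `<`,
/// then lowercase ASCII letters, then `>`. `Err` carries the failure offset.
fn parse_one(input: &[u8]) -> Result<(Vec<u8>, usize), usize> {
    if input.first() != Some(&b'<') {
        return Err(0);
    }
    let mut pos = 1;
    while pos < input.len() {
        match input[pos] {
            b'>' => return Ok((input[1..pos].to_vec(), pos + 1)),
            b if b.is_ascii_lowercase() => pos += 1,
            _ => return Err(pos),
        }
    }
    Err(pos) // ran out of input, so pos == input.len()
}

/// Classifies a failure by comparing its offset with the buffer length.
fn step(buf: &[u8]) -> Step {
    match parse_one(buf) {
        Ok((payload, used)) => Step::Message(payload, used),
        Err(pos) if pos == buf.len() => Step::Incomplete,
        Err(pos) => Step::Invalid(pos),
    }
}

/// Accumulates incoming chunks and yields every complete message.
struct Framer {
    buf: Vec<u8>,
}

impl Framer {
    fn new() -> Self {
        Framer { buf: Vec::new() }
    }

    fn push(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        self.buf.extend_from_slice(chunk);
        let mut messages = Vec::new();
        loop {
            match step(&self.buf) {
                Step::Message(payload, used) => {
                    messages.push(payload);
                    self.buf.drain(..used); // discard the consumed bytes
                }
                Step::Incomplete => break, // keep the tail for the next chunk
                Step::Invalid(_) => {
                    self.buf.drain(..1); // assumed policy: skip one byte, retry
                }
            }
        }
        messages
    }
}
```

Each call to `push` returns the messages completed by that chunk, and an unfinished tail stays buffered until more bytes arrive. Because only one message is parsed per call to `parse_one`, backtracking never needs input older than the start of the current message.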
-
Same here, thanks for the library.
So, in my case, the particular thing I'm parsing doesn't really have any framing (though I did work around that in the end), and I was hoping I'd be able to cheat by checking if the location reported by […]
```rust
peg::parser! {
    grammar parser() for [u8] {
        #[no_eof]
        pub rule test1() = "a" "b" "c"

        #[no_eof]
        pub rule test2() = "a" "bc"
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Three one-byte literals: the failure is reported at the end of the input.
    #[test]
    fn test1() {
        assert!(matches!(
            parser::test1(b"ab"),
            Err(peg::error::ParseError { location: 2, .. })
        ))
    }

    // The two-byte literal "bc" fails at the position where it starts.
    #[test]
    fn test2() {
        assert!(matches!(
            parser::test2(b"ab"),
            Err(peg::error::ParseError { location: 1, .. }) // I hoped this would be 2.
        ))
    }
}
```
I guess the difference is to be expected given how the rules are declared, but does the parser really have no way to report that it could have partially matched the token/rule?
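To illustrate the distinction being asked for, here is a small hand-written sketch of a literal matcher that separates "the input ended while the literal still agreed" from "a byte actually differed". This is not peg's API or behaviour; `match_literal` and `LiteralError` are hypothetical names, and as the tests above show, a multi-byte literal in the grammar fails at its starting position instead.

```rust
/// Why a literal failed to match.
#[derive(Debug, PartialEq)]
enum LiteralError {
    /// The input ended while every byte seen so far agreed with the literal.
    NeedMore { matched: usize },
    /// The byte at this absolute offset differs from the literal.
    Mismatch { at: usize },
}

/// Matches `literal` at `pos`, returning the position just after it on success.
fn match_literal(input: &[u8], pos: usize, literal: &[u8]) -> Result<usize, LiteralError> {
    // Treat a start position past the end as an empty remainder.
    let rest = input.get(pos..).unwrap_or(&[]);
    for (i, expected) in literal.iter().enumerate() {
        match rest.get(i) {
            None => return Err(LiteralError::NeedMore { matched: i }),
            Some(actual) if actual != expected => {
                return Err(LiteralError::Mismatch { at: pos + i })
            }
            Some(_) => {}
        }
    }
    Ok(pos + literal.len())
}
```

For the input `b"ab"` and the literal `b"bc"` at position 1, this reports `NeedMore { matched: 1 }`, which is the partial-match information the second test was hoping for. Within the grammar itself, the only approach shown in this thread is the one from `test1`: spelling the literal byte by byte so that the reported location advances past each matched byte.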
-
Hi,
first of all, let me thank you for your wonderful library. It saved me days of implementation (and lots of pain) when writing a parser for a proprietary binary protocol.
I was wondering what it would take to use peg for streams of data. I am receiving a stream of potentially incomplete or corrupt data and I am interested in only the valid parts of that stream. Currently, I have implemented a very rudimentary (and quite error-prone) parser for my protocol to detect the beginning and end of a message.
My questions are: […] peg to filter out only the interesting parts of the stream? I can only speculate, but it would probably make sense to look at the location where errors occur to decide what to do (e.g. move on because the error is at the end, skip these bytes because the error is at the beginning, read more data because the error is in the middle).
I am very interested in your thoughts. Btw, the project I am working on is located here: https://github.com/torfmaster/hackdose-sml-parser.
Thank you!