Implement Iterator for the query interface #83

stevepryde · 2022-01-09T13:12:32Z

A lot of the querying and filtering related to ElementQuery could probably be better expressed through an iterator (technically an async stream, but it may not be possible yet).

This opens up new possibilities as well. Suppose you don't just want to grab only the elements that exist right now. Suppose instead that you want to iterate over all elements matching a query that show up over the next 10 seconds (but you don't want to wait 10 seconds before starting). With an iterator we can do this.

First we run the query and add ALL results to an internal Vec.
Then on next() we can re-run the query and add any new (not yet seen) elements to the end of the Vec
Then we yield the next item from the front (actually a VecDeque might make more sense but that can be solved later).
It also needs to remember elements it has already yielded. A HashSet should work for this part.

I think this kind of thing would be pretty powerful. The challenge (as always) is to provide the maximum utility and minimum complexity.

A variation on this idea is to move the polling to a separate async task, so that (at least for a multithreaded executor) the polling can happen completely in parallel with the iterator. That would be nice 😃 Maybe overkill, but very cool!

The text was updated successfully, but these errors were encountered:

audioXD · 2022-02-02T12:48:20Z

So... you want something that works like this?

In ElementQuery

pub async fn all_stream(
    self,
) -> WebDriverResult<
    impl futures::TryStream<Ok = WebElement<'handle>, Error = WebDriverError> + 'by,
>
where
    'handle: 'by,
{
    use futures::stream::TryStreamExt;

    let mut set = std::collections::HashSet::<ElementId>::new();
    let stream = futures::stream::try_unfold(self, move |s| async move {
        let stream = futures::stream::iter(
            s.run_poller(false).await?.into_iter().map(|e| WebDriverResult::<_>::Ok(e)),
        );
        WebDriverResult::<_>::Ok(Some((stream, s)))
    })
    .try_flatten()
    .try_filter(move |el: &'_ WebElement<'_>| {
        futures::future::ready(set.insert(el.element_id.clone()))
    });

    Ok(Box::pin(stream))
}

I'm just uncertain on how the errors should be handled and how it should time out...

NOTE: this is just a quick implementation, it compiles but i don't know how weell it works

Use case:

{
    let mut stream = driver.query(By::Css("p")).all_stream().await?;
    while let Some(el) = stream.try_next().await? {}
}

It probably also isn't Send + Sync so thats has to be fixed

stevepryde · 2022-02-03T10:07:22Z

Yeah something like that. I suppose it doesn't really make sense to return an error when no more elements are found. In that case it should just end the iterator cleanly.

My hypothetical use-case was that instead of doing this:

let elem = driver.query(By::Css("p")).with_attribute("name", "test).and_displayed().first().await?;

you could do this:

let elem = driver.query_iter(By::Css("p")).filter(element_has_attribute("name", "test")).filter(element_is_displayed()).next().await?;

However something like this probably needs async iterators, which probably won't be available for some time.
There's also the fact that the original query() interface is more concise.

I like your implementation so far. That's probably the right approach I think.

stevepryde · 2022-07-24T12:08:58Z

Just updating this to match the current codebase:

    pub async fn all_stream(
        self,
    ) -> WebDriverResult<impl futures::TryStream<Ok = WebElement, Error = WebDriverError>> {
        use futures::stream::TryStreamExt;

        let mut set = std::collections::HashSet::<ElementRef>::new();
        let stream = futures::stream::try_unfold(self, move |s| async move {
            let stream = futures::stream::iter(
                s.run_poller(false).await?.into_iter().map(|e| WebDriverResult::<_>::Ok(e)),
            );
            WebDriverResult::<_>::Ok(Some((stream, s)))
        })
        .try_flatten()
        .try_filter(move |el: &'_ WebElement| futures::future::ready(set.insert(el.element_id())));

        Ok(Box::pin(stream))
    }

Still TODO: documentation, tests, and the initial concerns resolved:

I'm just uncertain on how the errors should be handled and how it should time out...

I think this should wait until at least one element is found (using the current poller semantics), and then continue until a find request is made that returns no new (not-yet-seen) elements.

It probably also isn't Send + Sync so thats has to be fixed

Something can be added to the unit test at the bottom to verify this

stevepryde · 2022-10-16T09:44:00Z

Just a small update to what I think the semantics should be.

I think if there is a use-case for this it would be for streaming items that may show up over time as they are loaded in. Otherwise it would make more sense to just do an all() query and iterate over the vec.

Thus, what this method should do (if we decide we want it) is keep polling over the full duration given by the poller, streaming elements as they are found, and ending the stream once the timeout is reached. It may make sense to use a custom poller here, and we may even want the ability to specify how long to wait between elements.

I think this probably needs a more concrete use-case before we proceed though.

stevepryde added the enhancement New feature or request label Jul 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement Iterator for the query interface #83

Implement Iterator for the query interface #83

stevepryde commented Jan 9, 2022

audioXD commented Feb 2, 2022 •

edited

stevepryde commented Feb 3, 2022

stevepryde commented Jul 24, 2022

stevepryde commented Oct 16, 2022

Implement Iterator for the query interface #83

Implement Iterator for the query interface #83

Comments

stevepryde commented Jan 9, 2022

audioXD commented Feb 2, 2022 • edited

stevepryde commented Feb 3, 2022

stevepryde commented Jul 24, 2022

stevepryde commented Oct 16, 2022

audioXD commented Feb 2, 2022 •

edited