Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Iterator for the query interface #83

Open
stevepryde opened this issue Jan 9, 2022 · 4 comments
Open

Implement Iterator for the query interface #83

stevepryde opened this issue Jan 9, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@stevepryde
Copy link
Owner

A lot of the querying and filtering related to ElementQuery could probably be better expressed through an iterator (technically an async stream, but it may not be possible yet).

This opens up new possibilities as well. Suppose you don't just want to grab only the elements that exist right now. Suppose instead that you want to iterate over all elements matching a query that show up over the next 10 seconds (but you don't want to wait 10 seconds before starting). With an iterator we can do this.

  • First we run the query and add ALL results to an internal Vec.
  • Then on next() we can re-run the query and add any new (not yet seen) elements to the end of the Vec
  • Then we yield the next item from the front (actually a VecDeque might make more sense but that can be solved later).
  • It also needs to remember elements it has already yielded. A HashSet should work for this part.

I think this kind of thing would be pretty powerful. The challenge (as always) is to provide the maximum utility and minimum complexity.

A variation on this idea is to move the polling to a separate async task, so that (at least for a multithreaded executor) the polling can happen completely in parallel with the iterator. That would be nice 😃 Maybe overkill, but very cool!

@audioXD
Copy link
Contributor

audioXD commented Feb 2, 2022

So... you want something that works like this?

In ElementQuery

pub async fn all_stream(
    self,
) -> WebDriverResult<
    impl futures::TryStream<Ok = WebElement<'handle>, Error = WebDriverError> + 'by,
>
where
    'handle: 'by,
{
    use futures::stream::TryStreamExt;

    let mut set = std::collections::HashSet::<ElementId>::new();
    let stream = futures::stream::try_unfold(self, move |s| async move {
        let stream = futures::stream::iter(
            s.run_poller(false).await?.into_iter().map(|e| WebDriverResult::<_>::Ok(e)),
        );
        WebDriverResult::<_>::Ok(Some((stream, s)))
    })
    .try_flatten()
    .try_filter(move |el: &'_ WebElement<'_>| {
        futures::future::ready(set.insert(el.element_id.clone()))
    });

    Ok(Box::pin(stream))
}

I'm just uncertain on how the errors should be handled and how it should time out...

NOTE: this is just a quick implementation, it compiles but i don't know how weell it works

Use case:

{
    let mut stream = driver.query(By::Css("p")).all_stream().await?;
    while let Some(el) = stream.try_next().await? {}
}

It probably also isn't Send + Sync so thats has to be fixed

@stevepryde
Copy link
Owner Author

Yeah something like that. I suppose it doesn't really make sense to return an error when no more elements are found. In that case it should just end the iterator cleanly.

My hypothetical use-case was that instead of doing this:

let elem = driver.query(By::Css("p")).with_attribute("name", "test).and_displayed().first().await?;

you could do this:

let elem = driver.query_iter(By::Css("p")).filter(element_has_attribute("name", "test")).filter(element_is_displayed()).next().await?;

However something like this probably needs async iterators, which probably won't be available for some time.
There's also the fact that the original query() interface is more concise.

I like your implementation so far. That's probably the right approach I think.

@stevepryde
Copy link
Owner Author

Just updating this to match the current codebase:

    pub async fn all_stream(
        self,
    ) -> WebDriverResult<impl futures::TryStream<Ok = WebElement, Error = WebDriverError>> {
        use futures::stream::TryStreamExt;

        let mut set = std::collections::HashSet::<ElementRef>::new();
        let stream = futures::stream::try_unfold(self, move |s| async move {
            let stream = futures::stream::iter(
                s.run_poller(false).await?.into_iter().map(|e| WebDriverResult::<_>::Ok(e)),
            );
            WebDriverResult::<_>::Ok(Some((stream, s)))
        })
        .try_flatten()
        .try_filter(move |el: &'_ WebElement| futures::future::ready(set.insert(el.element_id())));

        Ok(Box::pin(stream))
    }

Still TODO: documentation, tests, and the initial concerns resolved:

I'm just uncertain on how the errors should be handled and how it should time out...

I think this should wait until at least one element is found (using the current poller semantics), and then continue until a find request is made that returns no new (not-yet-seen) elements.

It probably also isn't Send + Sync so thats has to be fixed

Something can be added to the unit test at the bottom to verify this

@stevepryde stevepryde added the enhancement New feature or request label Jul 31, 2022
@stevepryde
Copy link
Owner Author

Just a small update to what I think the semantics should be.

I think if there is a use-case for this it would be for streaming items that may show up over time as they are loaded in. Otherwise it would make more sense to just do an all() query and iterate over the vec.

Thus, what this method should do (if we decide we want it) is keep polling over the full duration given by the poller, streaming elements as they are found, and ending the stream once the timeout is reached. It may make sense to use a custom poller here, and we may even want the ability to specify how long to wait between elements.

I think this probably needs a more concrete use-case before we proceed though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants