Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewriter reuse example? #82

Open
jbr opened this issue Mar 16, 2021 · 7 comments
Open

Rewriter reuse example? #82

jbr opened this issue Mar 16, 2021 · 7 comments

Comments

@jbr
Copy link

jbr commented Mar 16, 2021

I'm trying to understand how and if users are intended to express a set of element handlers to be reused on multiple inputs, across multiple threads. In particular, I'm interested in applying a given transformation to a bunch of html byte streams, but the lifetimes and borrows in the public interface make it difficult to see how to achieve that. Is there example code out there, or a part of the documentation that's relevant to this usage pattern?

Thanks!

@jyn514
Copy link
Contributor

jyn514 commented Dec 17, 2021

Hmm, this seems to work fine as-is - I ran examples/mixed_content_rewriter on this webpage twice with the same rewriter and it produced two copies of the output. Is there something in particular giving you trouble?

@jbr
Copy link
Author

jbr commented Dec 18, 2021

The issue is when a user might want to define the rewriting rules on one thread and use them on another (or several others) — because even the rule definitions are !Send, it's unclear how to use this crate in a multithreaded context, or even a 'static context in which the rules need to be owned somewhere other than where the end user defines them.

An example. The workaround I ended up settling on was to have the user provide a Fn() -> Settings<'static, 'static> which is run on each thread. It's an awkward interface, but I imagine the end result of this issue is that lol-html is not the appropriate library to use for an async multithreaded html rewriter

It's also worth noting that at the time of filing this, #69 had not yet been released

@jyn514
Copy link
Contributor

jyn514 commented Dec 21, 2021

It's true that HtmlRewriter is !Send and !Sync, yes. I'm looking into whether it's possible to remove the Rc<RefCell> around the output sink.

@jyn514
Copy link
Contributor

jyn514 commented Dec 21, 2021

Oh hmm, I missed that Settings is also not Send or Sync, that seems especially unfortunate. I think it would be a breaking change to fix that unfortunately ... the issue is that ElementContentHandlers takes a user-provided function and we don't require that to be Send+Sync.

@jyn514
Copy link
Contributor

jyn514 commented Dec 21, 2021

@jbr hmm, I'm confused though - how would it be useful to share the rewriter between threads? How would you synchronize writing the input so you know the output makes sense?

@jbr
Copy link
Author

jbr commented Dec 22, 2021

There are a few distinct concerns here, and I'm not entirely sure that any of them are specific suggestions or feature requests

for any multithreaded application

A user defines 'static Settings on one thread and hands it to a library, which then is able to send the Settings or clones of the Settings to other threads or an Arc of the settings to other threads, which can then use those same settings in their own rewriters. The simplest version of this would create a new rewriter for each http request and drop it at the end of rewriting. Depending on how cheap creating rewriters is, it might make sense to use some sort of rewriter pool, either thread-locally or shared. This seems relevant to sync rust as well as async.

async use specifically

Ideally the rewriters would also be Send but I imagine that introduces unacceptable performance costs to the other users of this crate. The motivation behind the Send bound is that use with most of the async ecosystem requires being able to move a Future that contains the rewriter between work-sharing executor threads. Most async runtimes (tokio, smol, async-std) also offer local (non-work-sharing) executors, which is how lol-async works at all, but the Send bound is prevalent in async contexts, and spawn_local is unstable on async-std.

@jyn514
Copy link
Contributor

jyn514 commented Dec 23, 2021

Ah, so for rewriters you actually only need them to be Send, not Sync?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants