WIP: Draft iterable hooks implementation #98

Open: wants to merge 4 commits into main

goodboy
Contributor

@goodboy goodboy commented Nov 12, 2017

Example/untested draft implementation for discussion in #50.
Ping @fschulze too.

pluggy/__init__.py (outdated)
res = hook_impl.function(*args)
if res is not None:
    results.append(res)
    yield res
Contributor Author

This is literally the only thing added.

Member

Now that bit feels really fishy, because it is a massive mix-up of concepts (sometimes minimal changes are the absolute antithesis of responsible software development).

Member

Upon closer inspection, this mixes iteration with returning the result as an object; instinctively, this can't possibly be conceptually sound.

Contributor Author

@goodboy goodboy Nov 12, 2017

Agreed, I'm just pushing out a quick version.
Obviously the final cut should get rid of anything from the original function that isn't necessary in the generator.

If you're pointing out that the return on the last line does nothing, you'd be right.
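To see why that trailing return does nothing for callers that iterate: inside a generator function, a returned value only travels as StopIteration.value, and for loops or list() silently discard it. The sketch below uses a hypothetical, heavily simplified draft_itercall() in place of the draft's real code, with a plain list standing in for the outcome object.

```python
def draft_itercall(hook_impls, *args):
    """Hypothetical, simplified stand-in for the draft's generator call loop."""
    results = []
    for impl in reversed(hook_impls):  # later registrations are called first
        res = impl(*args)
        if res is not None:
            results.append(res)
            yield res
    return results  # never seen by `for` loops or list(); only via StopIteration


impls = [lambda x: x + 1, lambda x: None, lambda x: x * 10]

# Iterating sees only the yielded values; the return value is silently dropped.
seen = list(draft_itercall(impls, 2))

# The returned value can only be recovered by catching StopIteration by hand.
gen = draft_itercall(impls, 2)
while True:
    try:
        next(gen)
    except StopIteration as stop:
        returned = stop.value
        break

print(seen)      # [20, 3]
print(returned)  # [20, 3]
```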

Member

@nicoddemus nicoddemus Nov 12, 2017

Upon closer inspection, this mixes iteration with returning the result as an object; instinctively, this can't possibly be conceptually sound.

I'm not sure what you mean; the implementation is very similar to the existing one, except for the yield and the removal of the return at the end, as @goodboy commented.

except StopIteration:
    pass

return outcome.get_result()
Contributor Author

This line currently does nothing and should be removed.

@goodboy goodboy force-pushed the iter_results branch 2 times, most recently from a7c2b02 to e6d7bd5 on November 12, 2017 23:19
@goodboy
Contributor Author

goodboy commented Nov 12, 2017

Just added a more correct/complete integration with the manager and a test to verify.

Still rough, but a starting point nonetheless.
I don't think I'm a big fan of having two sets of hooks, pm.hook and pm.ihook. I'd rather just have an explicit mark which indicates whether the hook is iterable or not, but maybe it's good to have both?

Also, what about supporting hooks as generators? Bad idea? We'd probably only support it for py3 anyway, since there we can use yield from.

@nicoddemus
Member

@tgoodlet nice, thanks! This is a good starting point.

I don't think I'm a big fan of having two sets of hooks, pm.hook and pm.ihook. I'd rather just have an explicit mark which indicates whether the hook is iterable or not, but maybe it's good to have both?

Currently hook calls either return a list (normal hooks) or a single value (hooks marked as firstresult), correct? I don't see the value of having both hook and ihook; I believe we should change the semantics of the first case so that normal hooks always return an iterator. It is easy enough to call list(it) if one needs a list.
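A minimal model of those semantics, assuming a hypothetical iter_hook_call() generator in place of pluggy's real call loop: implementations run lazily, only as the caller advances the iterator, and list(it) recovers today's list result.

```python
calls = []


def make_impl(name, value):
    # hypothetical hookimpl that records the moment it is invoked
    def impl(arg):
        calls.append(name)
        return value
    return impl


def iter_hook_call(hook_impls, arg):
    """Hypothetical iterator-returning hook call (LIFO order, None results skipped)."""
    for impl in reversed(hook_impls):
        res = impl(arg)
        if res is not None:
            yield res


impls = [make_impl("first", 1), make_impl("second", None), make_impl("third", 3)]

it = iter_hook_call(impls, arg=None)
before = list(calls)      # nothing has been called yet
head = next(it)           # runs only the most recently registered impl
after_head = list(calls)
rest = list(it)           # list(it) drains the remaining results
print(before, head, after_head, rest, calls)
# [] 3 ['third'] [1] ['third', 'second', 'first']
```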

@goodboy
Contributor Author

goodboy commented Nov 12, 2017

@nicoddemus so technically we could rework this such that pluggy.callers._multicall() is always a generator, and if a user wants a non-iterable call we just do:
return next(_multicall())
where internally there's just an if iterate: check which skips the yield line.
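A sketch of that single-definition idea, using hypothetical names (multicall, call_hook, iter_hook). One detail the comment leaves open is how next() obtains the complete result when the per-result yield is skipped; here it is assumed that, in that mode, the generator yields the full list once at the end.

```python
def multicall(hook_impls, args, iterate):
    """Hypothetical unified call loop that is a generator in both modes.

    iterate=True:  yield each non-None result as it is produced.
    iterate=False: skip the per-result yield and (by assumption) yield the
                   full list once at the end, so next() can fetch it.
    """
    results = []
    for impl in reversed(hook_impls):
        res = impl(*args)
        if res is not None:
            results.append(res)
            if iterate:
                yield res
    if not iterate:
        yield results


def call_hook(hook_impls, *args):
    # non-iterable call: advance the generator exactly once
    return next(multicall(hook_impls, args, iterate=False))


def iter_hook(hook_impls, *args):
    # iterable call: hand the generator itself to the caller
    return multicall(hook_impls, args, iterate=True)


impls = [lambda x: x, lambda x: x * 2]
print(call_hook(impls, 5))        # [10, 5]
print(list(iter_hook(impls, 5)))  # [10, 5]
```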

But then regular calls will be slower. However, if all of this is cythonized it may not matter.

I believe we should change the semantics of the first case, normal hooks will always return an iterator. It is easy enough to call list(it) if one needs a list.

Yeah, the only problem is that it's a lot slower.

@nicoddemus
Member

if a user wants a non-iterable call we just do

Hmm, sorry, what do you mean? Could you clarify?

But then regular calls will be slower

Oh, is this the reason why we have had both hook and ihook since the beginning?

@goodboy
Contributor Author

goodboy commented Nov 12, 2017

if a user wants a non-iterable call we just do
Hmm, sorry, what do you mean? Could you clarify?

We could theoretically make _multicall and _itercall the same function (which contains a yield, making it a generator function) and simply skip that yield expression if the iterate flag is False. If you want to call a generator function and have it behave the same as a regular function, you'd need to do next(gen_func(*args)), but this is about 4-5 times slower than a regular function call (at least according to my micro benches in py3.6.3).
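A micro-benchmark of that kind can be sketched with the standard timeit module. plain() and gen_func() are hypothetical toy functions, and the ratio depends on the interpreter version and machine, so any single number should be treated with caution.

```python
import timeit


def plain(x):
    return x


def gen_func(x):
    yield x


# Both forms deliver the same value...
assert plain(1) == next(gen_func(1))

# ...but next(gen_func()) also pays for creating and resuming a generator object.
n = 200_000
t_plain = timeit.timeit("plain(1)", globals=globals(), number=n)
t_gen = timeit.timeit("next(gen_func(1))", globals=globals(), number=n)
print(f"plain call:       {t_plain:.4f}s")
print(f"next(gen_func()): {t_gen:.4f}s")
print(f"ratio:            {t_gen / t_plain:.1f}x")
```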

Oh, is this the reason why we have had both hook and ihook since the beginning?

Precisely.
But what I'm thinking now is that maybe we can just cythonize the function and have it remain a single definition, although I haven't ever tried using yield in a cython func.

That being said, regarding cython, I'd always thought a cythonized call loop would be an optional dependency.

@nicoddemus
Member

Precisely.

OK thanks for confirming! 👍

but this is about 4-5 times slower than a regular function call (at least according to my micro benches in py3.6.3).

This is really strange. I remember your benchmark and it seemed correct.

I did a quick benchmark unrelated to pluggy, just to show the speed difference between constructing a list vs using a generator:

def f():
    r = []
    for i in range(100):
        r.append(i)
    return r

def g():
    for i in range(100):
        yield i
X:\>python --version
Python 3.6.1 :: Continuum Analytics, Inc.

X:\>python -m timeit -s "from perf import f, g" "f()"
100000 loops, best of 3: 6.79 usec per loop

X:\>python -m timeit -s "from perf import f, g" "for i in g(): pass"
100000 loops, best of 3: 4.59 usec per loop

And these results match my expectations. Perhaps there is something else in pluggy's code that might be causing this difference?

@nicoddemus
Member

What if we time the two approaches? We can use the code from your test as a basis:

from pluggy import PluginManager, HookspecMarker, HookimplMarker

hookspec = HookspecMarker("example")
hookimpl = HookimplMarker("example")

pm = PluginManager("example")

class Hooks(object):
    @hookspec
    def he_method1(self, arg):
        pass

pm.add_hookspecs(Hooks)


class Plugin1(object):
    @hookimpl
    def he_method1(self, arg):
        pass

class Plugin2(object):
    @hookimpl
    def he_method1(self, arg):
        pass

class Plugin3(object):
    @hookimpl
    def he_method1(self, arg):
        pass

class Plugin4(object):
    @hookimpl
    def he_method1(self, arg):
        pass

class PluginWrapper(object):
    @hookimpl(hookwrapper=True)
    def he_method1(self, arg):
        yield

pm.register(Plugin1())
pm.register(Plugin2())
pm.register(Plugin3())
pm.register(Plugin4())
pm.register(PluginWrapper())
X:\>python -m timeit -s "from perf2 import pm" "for x in pm.hook.he_method1(arg=None): pass"
100000 loops, best of 3: 8.19 usec per loop

@nicoddemus
Member

nicoddemus commented Nov 13, 2017

Duh, why don't I do it myself? 😁

I forked your branch and got nearly the same timings:

{env35} X:\>python -m timeit -s "from perf2 import pm" "for x in pm.hook.he_method1(arg=None): pass"
100000 loops, best of 3: 8.15 usec per loop

{env35} X:\>python -m timeit -s "from perf2 import pm" "for x in pm.ihook.he_method1(arg=None): pass"
100000 loops, best of 3: 8.28 usec per loop

Unless I'm missing something, their performance is nearly identical, so IMHO we should just change hook to always return an iterable.

@goodboy
Contributor Author

goodboy commented Nov 13, 2017

@nicoddemus ahh yeah, I was testing it the opposite way...
I was benching a generator call which skips yielding, as opposed to just embracing it and capturing the output in a list. Your way is better.

The only thing I wonder is how it performs under different numbers of hooks/wrappers, as in the benchmark test suite. I think we should try that out first while there are separate implementations.
You can make the changes in testing/benchmark.py if you're so bold ;)
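Before touching testing/benchmark.py, a standalone sweep like the following could give a first impression of how the two styles scale with the number of implementations. list_call() and iter_call() are hypothetical, simplified models without hookwrappers or any pluggy machinery, and both results are consumed with a plain for loop, mirroring the timeit commands above.

```python
import timeit


def list_call(hook_impls, arg):
    # model of the existing style: collect every non-None result into a list
    results = []
    for impl in reversed(hook_impls):
        res = impl(arg)
        if res is not None:
            results.append(res)
    return results


def iter_call(hook_impls, arg):
    # model of the iterable style: yield each result as soon as it is produced
    for impl in reversed(hook_impls):
        res = impl(arg)
        if res is not None:
            yield res


def consume(iterable):
    # equivalent of `for x in pm.hook.he_method1(arg=None): pass`
    for _ in iterable:
        pass


def echo_impl(arg):
    return arg


timings = {}
for n_impls in (1, 5, 10, 50):
    impls = [echo_impl] * n_impls
    t_list = timeit.timeit(lambda: consume(list_call(impls, 1)), number=20_000)
    t_iter = timeit.timeit(lambda: consume(iter_call(impls, 1)), number=20_000)
    timings[n_impls] = (t_list, t_iter)
    print(f"{n_impls:>3} impls   list: {t_list:.4f}s   iter: {t_iter:.4f}s")
```

Adding wrappers to such a sweep would need a model of the wrapper protocol as well, which is where the real benchmark suite is the better place.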

@RonnyPfannschmidt
Member

I would like to draw attention to the fact that hookwrappers cannot operate in a semantically correct way if we stream partial results out without giving them a chance to operate on or alter the stream.

@nicoddemus
Member

I would like to draw attention to the fact that hookwrappers cannot operate in a semantically correct way if we stream partial results out without giving them a chance to operate on or alter the stream.

Hmm, indeed that is a problem, because the post-hookwrappers can't alter the outcome: it has already been processed by the caller.

Not sure what we should do here; it seems to be a big blocker.
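The ordering problem can be reproduced with a toy model of the current hookwrapper protocol: pre-call code, a bare yield that receives the results, then post-call code that may replace them. All names here are hypothetical and pluggy's outcome object is replaced by a plain list; what matters is that in the streaming version the caller has already consumed every value by the time the wrapper's post-call code runs.

```python
def old_style_wrapper():
    # pre-call part would run here
    results = yield              # receives the results once the impls have run
    results[:] = ["overridden"]  # post-call part: replace what the caller gets


def list_call(hook_impls, wrapper):
    """Model of today's behaviour: the wrapper finishes before the caller sees anything."""
    gen = wrapper()
    next(gen)                    # run the pre-call part
    results = [impl() for impl in reversed(hook_impls)]
    try:
        gen.send(results)        # run the post-call part, which may replace results
    except StopIteration:
        pass
    return results


def streaming_call(hook_impls, wrapper):
    """Model of a naive streaming call: values escape before the post-call part runs."""
    gen = wrapper()
    next(gen)
    results = []
    for impl in reversed(hook_impls):
        res = impl()
        results.append(res)
        yield res                # the caller consumes this immediately
    try:
        gen.send(results)        # too late: the caller already has every value
    except StopIteration:
        pass


impls = [lambda: "a", lambda: "b"]
batched = list_call(impls, old_style_wrapper)
streamed = list(streaming_call(impls, old_style_wrapper))
print(batched)   # ['overridden']
print(streamed)  # ['b', 'a']
```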

@RonnyPfannschmidt
Member

I believe for this feature we have a choice between a massive cost in conceptual complexity for the implementation or glaring conceptual holes in the API.

I'm leaning towards avoiding it due to the inconsistency.

@goodboy
Contributor Author

goodboy commented Nov 13, 2017

Hmm, indeed that is a problem, because the post-hookwrappers can't alter the outcome: it has already been processed by the caller.

@RonnyPfannschmidt @nicoddemus yes, agreed. This is why I was implementing it separately from the main hook call set. I think @RonnyPfannschmidt is right that you can't combine the two ideas, which makes me wonder if the only way we could support it is by having separately marked iterable hooks which are also called from a different _HookRelay. We could support iterable hooks, but wrappers just wouldn't be a thing in that case?

@fschulze
Contributor

For the use cases I have in mind, making the hook explicitly iterable would be fine, if not even preferred. If such hooks then don't support hookwrappers, that would be fine by me.

@nicoddemus
Member

nicoddemus commented Nov 13, 2017

From a user standpoint, we would mark it as:

    @hookspec(iterable=True)
    def myproject_get_password(self, arg1, arg2):
        """
        """

?

@goodboy
Contributor Author

goodboy commented Nov 14, 2017

@nicoddemus yeah, I think that could work.
The question is: do we just call such hooks the same way as the others, pm.hook.myproject_get_password()? I guess that's fine, yeah?

@goodboy
Contributor Author

goodboy commented Nov 14, 2017

@nicoddemus, @RonnyPfannschmidt, @fschulze: I just had a convo with @vodik about this, and I think we could make an alternative version of what a wrapper is for iterable hooks that's even more powerful.

Instead of running setup and teardown before and after all hookimpl calls complete, define a wrapper as more of an intercepting generator.

That is, a hookwrapper is a generator which looks as follows:

@hookimpl(wrapper=True)
def myproject_get_password(arg1, arg2):
    iter_results = yield
    for result in iter_results:
        # intercept and do something with each result
        yield result  # delivers the hookimpl call result to the hook caller (or next hookwrapper in line)
        # do something after the result has been delivered but before the next
        # hook call has been invoked inside _itercall

Here iter_results is the generator returned from _itercall().
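A runnable sketch of how a caller might drive such an intercepting wrapper. Everything here is hypothetical: itercall() stands in for _itercall(), drive_wrapper() for whatever pluggy would do internally, and masking_wrapper() follows the proposed wrapper shape with the decorator left off. The log shows each result being intercepted and delivered before the next hookimpl is even called.

```python
log = []


def make_impl(name, value):
    # hypothetical hookimpl that records when it is called
    def impl(arg1, arg2):
        log.append("call " + name)
        return value
    return impl


def itercall(hook_impls, *args):
    # stand-in for _itercall(): LIFO order, None results skipped
    for impl in reversed(hook_impls):
        res = impl(*args)
        if res is not None:
            yield res


def masking_wrapper(arg1, arg2):
    iter_results = yield                   # receives the itercall() generator
    for result in iter_results:
        log.append("intercept " + result)
        yield result.upper()               # what the caller actually receives
        log.append("delivered " + result)  # runs before the next hookimpl call


def drive_wrapper(wrapper_gen, inner):
    next(wrapper_gen)                      # run up to the bare `yield`
    try:
        first = wrapper_gen.send(inner)    # hand over the results generator
    except StopIteration:
        return                             # the wrapper produced nothing
    yield first
    yield from wrapper_gen                 # relay the remaining values


impls = [make_impl("p1", "secret1"), make_impl("p2", None), make_impl("p3", "secret3")]
wrapper = masking_wrapper("user", "host")
for value in drive_wrapper(wrapper, itercall(impls, "user", "host")):
    log.append("caller got " + value)
print(log)
```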

We could also do some even more interesting stuff in py3.3+. We can actually support the original wrapper semantics in a cleaner way with yield from:

@hookimpl(wrapper=True)
def myproject_get_password(arg1, arg2):
    iter_results = yield
    # do the normal pre-call stuff
    try:
        # yields every hookcall result upwards to the caller; evaluates to the
        # full list of results, provided _itercall() ends with `return results`
        results = yield from iter_results
    except BaseException:
        if raise_on_err:  # hypothetical flag, not defined in this snippet
            raise
    else:
        # do the normal post-call stuff
        results.clear()
        results.append('my_override_result')
        return results  # this could still be used for the normal (non-iterable) hook calls

This has the extra nicety that we really wouldn't need the pluggy.callers._Result type any more.
We of course would offer these types of iterable hooks (and their corresponding wrappers) alongside the existing API and could encourage migration if we like the new version better?
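How results = yield from iter_results behaves can be checked with plain generators on Python 3.3+. In this sketch itercall(), override_wrapper() and drain() are hypothetical names, and the key assumption is that _itercall() ends with return results, because yield from evaluates to the inner generator's return value. The caller still sees every streamed result, while the wrapper's own return value is what a non-iterable call would pick up.

```python
def itercall(values):
    # hypothetical stand-in for _itercall(): yields each non-None result and
    # also returns the full list, which is what `yield from` evaluates to
    results = []
    for res in values:
        if res is not None:
            results.append(res)
            yield res
    return results


def override_wrapper(iter_results):
    # simplified: the inner generator is passed as an argument rather than
    # received through a bare `yield`
    results = yield from iter_results  # every result is still streamed upwards
    results.clear()
    results.append("my_override_result")
    return results  # what a non-iterable call could use as its result


def drain(gen):
    """Return (values yielded by gen, value returned by gen)."""
    yielded = []
    while True:
        try:
            yielded.append(next(gen))
        except StopIteration as stop:
            return yielded, stop.value


streamed, final = drain(override_wrapper(itercall(["a", None, "b"])))
print(streamed)  # ['a', 'b']
print(final)     # ['my_override_result']
```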

Let me know what you guys think!

@RonnyPfannschmidt
Member

I have a simple question: what does this do to Python 2.7?

@nicoddemus
Member

@tgoodlet about your first example: when you say yield result, what does the caller (which is calling the wrapper) do with that result? Replace the original result?

We of course would offer these types of iterable hooks (and their corresponding wrappers) alongside the existing API and could encourage migration if we like the new version better?

At first it does seem the new version is more flexible, but I think it is a little more complex to write/understand, so I'm not sure it is better. It might just be a matter of getting used to it, though. I don't know; I have to think about it a bit more.

I have a simple question: what does this do to Python 2.7?

IIRC it should work the same, except yield from would have to be replaced by an explicit loop. Instead of:

results = yield from iter_results 

In py27 one has to write:

results = []
for result in iter_results:
    yield result
    results.append(result)
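For plain iteration, the explicit loop and yield from produce the same values, as this sketch with hypothetical helper names shows. Two differences remain, though: yield from also forwards send() and throw() to the inner generator and evaluates to its return value, whereas the loop rebuilds the list itself; and on Python 2.7 a generator cannot return a value at all (return with an argument inside a generator is a SyntaxError there), so a final return results would need another mechanism on py27. The code below is therefore Python 3 only.

```python
def itercall(values):
    # hypothetical stand-in for _itercall()
    results = []
    for res in values:
        if res is not None:
            results.append(res)
            yield res
    return results


def wrapper_yield_from(iter_results):
    results = yield from iter_results  # list comes from itercall's return value
    return results


def wrapper_loop(iter_results):
    results = []                       # py27-style: rebuild the list by hand
    for result in iter_results:
        yield result
        results.append(result)
    return results                     # py27 would reject `return <value>` here


def drain(gen):
    """Return (values yielded by gen, value returned by gen)."""
    yielded = []
    while True:
        try:
            yielded.append(next(gen))
        except StopIteration as stop:
            return yielded, stop.value


impl_results = ["a", None, "b"]
via_yield_from = drain(wrapper_yield_from(itercall(impl_results)))
via_loop = drain(wrapper_loop(itercall(impl_results)))
print(via_yield_from == via_loop)  # True
```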

@tgoodlet please correct me if I'm wrong.

@goodboy
Contributor Author

goodboy commented Nov 14, 2017

your first example: when you say yield result, what does the caller (which is calling the wrapper) do with that result? Replace the original result?

@nicoddemus yes, exactly. A wrapper is able to intercept each value before it is delivered to the caller. This provides a way to execute code before/after each iteration, as well as to modify the value or not deliver it at all.

For example, the wrapper could iterate over all hook calls but not deliver any results up to the caller until it has preprocessed them:

@hookimpl(wrapper=True)
def myproject_get_password(arg1, arg2):
    iter_results = yield
    # run every hook call before delivering anything
    results = list(iter_results)
    # do some checks over all results
    if 'myresult' not in results:
        results.append('myresult')
    # preprocess the results (preprocess is a hypothetical helper, used here as a filter predicate)
    for result in filter(preprocess, results):
        yield result  # finally delivered to caller

At first it does seem the new version is more flexible, but I think it is a little more complex to write/understand, so I'm not sure it is better.

Hookimpls are still iterated the same way. The only difference is that wrappers must now be implemented as generator functions.

IIRC it should work the same, except yield from would have to be replaced by an explicit loop.

Yes, of course. I was just pointing out how nice it would look. Also, there are some optimizations Python does with yield from which are pretty slick. But yes, of course you can do the loop in py2.7.

@RonnyPfannschmidt
Member

@tgoodlet as far as I can tell, that's a gleefully fractured control flow using the same structural mechanism for absolutely distinct meanings; I'm reasonably sure this will break people's necks through partial or complete misuse.

I'm pretty opposed to it as of now. If you can't demonstrate a reasonable API for the hookwrappers, just disallow them for now.

From my POV this can't possibly have a reasonable API without async/await and async for.

@nicoddemus
Member

I agree with @RonnyPfannschmidt, my gut feeling is that this is creating a complex control flow which might be problematic in the future (it is a fun thought exercise though).

As @RonnyPfannschmidt said, I think we should go ahead with the original idea of iterable hook calls but without hookwrapper support; if we come up with a nice solution in the future, it is just a matter of allowing hookwrappers from that point on.
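The direction settled on here (an explicit iterable flag on the spec, called through the same pm.hook.name() syntax as any other hook, with hookwrappers refused for such hooks) can be sketched with a tiny hypothetical registry. MiniHook and its iterable/hookwrapper parameters are illustrative names only, not pluggy's API.

```python
class MiniHook:
    """Hypothetical hook caller; not pluggy's real _HookCaller."""

    def __init__(self, name, iterable=False):
        self.name = name
        self.iterable = iterable  # would come from something like @hookspec(iterable=True)
        self._impls = []

    def add_impl(self, func, hookwrapper=False):
        if hookwrapper and self.iterable:
            # no sound wrapper semantics for streamed results yet, so refuse them
            raise ValueError(f"hookwrappers are not supported for iterable hook {self.name!r}")
        self._impls.append(func)

    def _iter_results(self, kwargs):
        for func in reversed(self._impls):  # LIFO, like pluggy
            res = func(**kwargs)
            if res is not None:
                yield res

    def __call__(self, **kwargs):
        # same call syntax either way; only the return type differs
        it = self._iter_results(kwargs)
        return it if self.iterable else list(it)


get_password = MiniHook("myproject_get_password", iterable=True)
get_password.add_impl(lambda arg1, arg2: "from-keyring")
get_password.add_impl(lambda arg1, arg2: "from-env")
password_list = list(get_password(arg1="user", arg2="host"))
print(password_list)  # ['from-env', 'from-keyring']

get_user = MiniHook("myproject_get_user")  # a regular hook still returns a list
get_user.add_impl(lambda: "alice")
print(get_user())  # ['alice']

refused = None
try:
    get_password.add_impl(lambda arg1, arg2: None, hookwrapper=True)
except ValueError as exc:
    refused = str(exc)
print(refused)
```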

@goodboy
Contributor Author

goodboy commented Nov 14, 2017

@nicoddemus @RonnyPfannschmidt cool, I'll see if I can finalize it this week.

@goodboy
Contributor Author

goodboy commented Nov 24, 2017

Hey guys, I'll crank this out over the weekend, but I need #101 to land first to avoid conflicts.

4 participants