
Pain-points of the new Operation hooks #482

Closed · 6 of 8 tasks
JonnyBGod opened this issue Mar 4, 2015 · 33 comments

@JonnyBGod

I would like to point out that the new operation hooks implementation is completely useless for several reasons.

  1. The "after delete" hook is useless without passing the deleted document. I would like to know of a single use case for this hook without the deleted model. All "after" hooks need the manipulated document!
  2. The "before" hooks again need some way to ALWAYS get the instance. What is the point of having the "where" clause if it is not consistent? Should we have to account for all the infinite possibilities?

Really... Please study long-existing implementations of this kind of behavior (try mongoose) and just copy them.

DO NOT DEPRECATE old code before the new one is COMPLETELY tested! And I mean MONTHS of production testing...

One last piece of advice that has come to my mind on many occasions with StrongLoop: do proper real-life use case testing on all your products. If you keep using simplistic use cases you will continue making these kinds of mistakes over and over again.


We are gathering a list of use cases to get a better understanding of what LoopBack users need from hooks; please add your use case to the list if it's not there yet:

https://github.com/strongloop/loopback-datasource-juggler/wiki/Use-cases-for-intent-based-CRUD-hooks


List of specific known issues:

@fabien (Contributor) commented Mar 4, 2015

Regarding 2, I disagree - this is not always possible. However, whenever we are operating on an instance, it should definitely be available. This PR fixes the most glaring issues:

#483

@JonnyBGod (Author)

Nice, now we are going in the right direction. Could you go into a bit more detail on the cases where it is not possible to have the instance?

@fabien (Contributor) commented Mar 4, 2015

Any operation that operates on multiple records at the same time would not have the instance(s) available. While it would be possible to fetch from the db to provide the instance in all cases, it would be inefficient and very wasteful.

See the last comment here: #481

This is what I propose to handle such scenarios when needed.

@Jeff-Lewis

IMHO, SL has huge potential and I like where it's going, although I get a sense that in some of the work, especially the design/discussion of events on model changes, we are reinventing the wheel. ORMs have been around for decades now, and there are many very popular ORMs on different platforms that should have great influence over LB-juggler's design.

Taking a couple of other ORMs as examples, see their implementations of model events.

Doctrine Lifecycle Events

/** @PostRemove */
public function postRemoveHandler(User $user, LifecycleEventArgs $event) { /* ... */ }

In both cases you get passed a LifecycleEventArgs instance which has access to the entity and the entity manager.

Doctrine-mongodb Events

public function postRemove(\Doctrine\ODM\MongoDB\Event\LifecycleEventArgs $eventArgs)
{
}

Laravel Events

  • Model Event: User::deleted(function($user) { /* ... */ });
  • Observers: public function deleted($model)

Waterline Callbacks

afterDestroy: function(deleted_record, next) {
  Cache.sync(next);
}

Bookshelf.js Events

"destroyed" (model, resp, options) — after a model is destroyed.

sequelizejs Events

User.beforeCreate(function(user) {
  if (user.accessLevel > 10 && user.username !== "Boss") {
    throw new Error("You can't grant this user an access level above 10!")
  }
})

Bulk Events for @fabien:

Model.beforeBulkCreate(function(records, fields, fn) {
  // records = the first argument sent to .bulkCreate
  // fields = the second argument sent to .bulkCreate
})

Model.beforeBulkDestroy(function(whereClause, fn) {
  // whereClause = first argument sent to Model.destroy
})

NHibernate Interceptor Interface

public void OnDelete(object entity,
                     object id,
                     object[] state,
                     string[] propertyNames,
                     IType[] types)

LLBLGen EntityValidation

/// <summary>
    /// Method to validate the containing entity right before the save sequence for the entity will start. LLBLGen Pro will call this method right after the
    /// containing entity is selected from the save queue to be saved.
    /// </summary>
    /// <param name="involvedEntity">The involved entity.</param>
    public override void ValidateEntityBeforeSave( IEntityCore involvedEntity )
    {
        CustomerEntity toValidate = (CustomerEntity)involvedEntity;
        if( toValidate.BillingAddress == null  )
        {
            throw new ORMEntityValidationException( "BillingAddress can't be null", toValidate );
        }
        if( toValidate.VisitingAddress == null )
        {
            throw new ORMEntityValidationException( "VisitingAddress can't be null", toValidate );
        }

        base.ValidateEntityBeforeSave( involvedEntity );
    }

SqlAlchemy Events and here

from sqlalchemy import event

# standard decorator style
@event.listens_for(SomeClass, 'after_delete')
def receive_after_delete(mapper, connection, target):
    "listen for the 'after_delete' event"

    # ... (event handling logic) ...

Standing on the shoulders of giants

For much of the 90's I saw project after project deal with the object/relational mapping problem by writing their own framework - it was always much tougher than people imagined. Usually you'd get enough early success to commit deeply to the framework and only after a while did you realize you were in a quagmire - this is where I sympathize greatly with Ted Neward's famous quote that object-relational mapping is the Vietnam of Computer Science[1]. - Martin Fowler

Last thought

The ORM becomes the foundation for the entire application. The design of its interfaces and features, even at this early stage, should not be taken lightly. I apologize for ending with a negative, fear-inducing thought; I feel the power of the SL Process Manager won't matter much if developers are walking away from the ORM.

@fabien (Contributor) commented Mar 4, 2015

Thanks for gathering all this info. However, to me it only illustrates that every ORM is doing it differently, and rightly so - each has its own way of interacting with the underlying datasource and such (not to mention the different platforms and languages you've gathered). Quite some time ago, when we first discussed this, we did look around for existing solutions - but those were Node/JS-based event/publish/subscribe-like solutions.

I think SL is on the right track with this; it just needs to be fleshed out some more.

@Jeff-Lewis

The similarity in these examples is that the Model is passed into the event handler on the post-delete event, which adds some color to @JonnyBGod's point. The one exception is sequelize's beforeBulkDestroy, which passes in the where clause - something LB appears to do as well.

It's been common for other ORMs to separate operations that do not work on models in memory as 'batch' or 'bulk' operations, with their own event handlers, to give the developer more clarity about the purpose of the API. It appears the first cut of the new event hooks pushes both in-memory model and database record/doc change events into the same hooks. Is that correct?

Btw, I just re-read the docs on Operation Hooks, and I think it looks great for single model instance events and observers.

@JonnyBGod (Author)

I am not an expert on the matter of implementing an ORM, but from a user perspective I have to press the point that the returned data must be consistent.

The problem with the "where" clause approach is that it is not consistent, i.e. if you update by id you get an id, but if you update by "path" you get a path. To account for this inconsistency we would have to write very rigid logic.

A possible solution I thought of would be to return an array of "ids" for all manipulated documents.
This way you would have a consistent way of knowing which documents were affected and, if needed, fetching them.

Of course, for "after delete" events the full documents must always be passed.

@fabien (Contributor) commented Mar 4, 2015

@JonnyBGod the where approach is the most efficient - we don't have all the affected ids available for all the operations. I think you're missing the point in regard to all the different kinds of operations available, notably those affecting more than one object. You definitely don't want to load all of those (or fetch their ids) for each hook.

Also, I think having a single handler function adapt to single as well as batch operations makes the API itself - the declaration of the hooks - very consistent. The availability of context.instance is a flag in itself, denoting whether it's a single or a batch operation.
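
For illustration, a single "before save" observer following that pattern might look roughly like this (a sketch against the documented hook context; the model and the timestamp field are made up for the example):

var DataSource = require('loopback-datasource-juggler').DataSource;
var ds = new DataSource('memory');
var Product = ds.define('Product', { name: String, updatedAt: Date });

Product.observe('before save', function(ctx, next) {
  if (ctx.instance) {
    // Single-instance operation (e.g. create, prototype.save):
    // the full model instance is available.
    ctx.instance.updatedAt = new Date();
  } else {
    // Partial/batch operation (e.g. updateAll): only ctx.where
    // and the partial ctx.data are available.
    ctx.data.updatedAt = new Date();
  }
  next();
});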

@Jeff-Lewis

@fabien If having only a single handler for both single and bulk operations is the way it's going to be, then can we please update the docs to specifically state that certain bulk operations will not pass in an instance? It probably would have saved @JonnyBGod, and many other developers coming from other ORMs, some frustration.

Something along the lines of ActiveRecord's docs section #5 on "Skipping Callbacks"

@JonnyBGod (Author)

@fabien OK, I realise the problems you mention and believe we can cope with that. I will wait for #483 to check all our use cases.

@Jeff-Lewis Yes, better, more explicit documentation would help in times like this.

@Jeff-Lewis

The most common use cases I've seen for model events are validation, caching and auditing. If an app needed cache consistency or confidence in the auditing of all changes and deletes to all docs, what model operations would the app need to avoid or be limited to using?

For reference and to get a sense of demand on the topic, here are some MongoDB/Mongoose plugins that were built to audit and track changes to docs, although I'm not familiar with all of their features.
Document versioning with MongoDB
mongoose-history
mongoose-version
mongoose-trackable

Also, this topic and the developer attitudes towards it are not unique. Please read this and this.

limbo was written specifically to get around the Mongoose issue stated above.

There's a lot of material out there regarding this topic and plenty of examples of its implementation. SL-LB has an opportunity to learn from them and be as good or better, given that it's a green-field app.

@fabien (Contributor) commented Mar 5, 2015

@Jeff-Lewis http://docs.strongloop.com/display/public/LB/Operation+hooks already states exactly what the context will be, for each of the two scenarios (single/multiple). Unlike the Rails docs you mention (perhaps I missed something obvious there).

Finally, I really don't see where all the complaints are coming from - we've already come a long way from having limited, simplified hooks that you had to implement, override and/or monkey-patch into a chain yourself.

Also, did you see #483? Based on roughly 25 real-world demands (in plugins ~ mixins) I developed internally over the past half-year, I've set out to resolve any remaining issues. This includes versioning, auditing, i18n/translations, cascading deletes, file uploads ... I was able to pull these off with the old, very bare-bones hooks, and with the new operation hooks they will be much more reliable, because they can also act upon batch operations. Before, we had to mix events like deleted with hooks, and that only worked on single instances.

I'm in the middle of updating these mixins (see strongloop/loopback-boot#79 for a related discussion - way overdue, so hopefully we'll merge this soon), and I'll aim to improve the operation hooks along the way.

To be clear, I don't think it makes sense to have actual instances or the affected ids available to any of the hooks in batch mode. You'll have to implement this yourself if you need it. Check this proposal for an example of how this would work:

#481 (comment)

(the after delete hook is somewhat special, because you cannot use the where clause to get back instances to work with anymore)

So yes, I agree with you and @JonnyBGod - the first implementation of operation hooks does not fully adhere to what's needed in real-world scenarios, but the basis of the implementation (observe and notifyObservers - which you can also use for custom hooks) is highly flexible, while keeping with Node's async nature.
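
As a rough sketch of the custom-hook idea (the Order model and the "approve" operation are made up for illustration; the underlying calls are observe and notifyObserversOf from the juggler's observer mixin):

var DataSource = require('loopback-datasource-juggler').DataSource;
var ds = new DataSource('memory');
var Order = ds.define('Order', { total: Number, approved: Boolean });

// A custom observer, registered just like the built-in operation hooks.
Order.observe('before approve', function(ctx, next) {
  if (!ctx.instance.total) return next(new Error('Nothing to approve'));
  next();
});

// A custom operation that notifies its observers before doing the work.
Order.approve = function(order, cb) {
  Order.notifyObserversOf('before approve', { instance: order }, function(err) {
    if (err) return cb(err);
    order.updateAttributes({ approved: true }, cb);
  });
};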

@bajtos (Member) commented Mar 5, 2015

The current design is the result of several iterations on how to keep the hooks simple while supporting all DAO methods provided by LoopBack, see #403 for more details.

The operation hooks are not a panacea; there are situations where a different tool makes more sense - see #397.

The "after delete" hook is useless without passing the deleted document.

In short, because the database drivers don't return the list of deleted ids, it's not possible to provide this information in a generic way without a significant performance overhead. Please join the discussion in #474.
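
For apps that do need the deleted data (auditing being the typical case), the workaround discussed in these threads is to look the records up yourself in a "before delete" hook and accept the extra query. A rough sketch; MyModel is a placeholder, AuditLog is a hypothetical model used only for illustration, and ctx.Model/ctx.where are assumed to be available as in the documented hook context:

MyModel.observe('before delete', function(ctx, next) {
  // Fetch the instances that are about to be deleted, at the app's own cost.
  ctx.Model.find({ where: ctx.where }, function(err, instances) {
    if (err) return next(err);
    // Record them before the delete proceeds.
    AuditLog.create(instances.map(function(inst) {
      return { model: ctx.Model.modelName, deleted: inst.toObject() };
    }), next);
  });
});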

The "before" hooks need again some way to ALWAYS get the instance. What is the point on having the "where" clause if this is not consistent? Should we have to account to all infinite possibilities?

Consider the following call: MyModel.updateAll({ color: 'red' }, { color: 'green' }, cb). There is no "instance" to be passed to the hook. We could convert the query { color: 'red' } into a list of instances (ids), but that would incur a high performance cost that all LoopBack users would have to pay, even if they don't need the list of instances.

There is one case where we have the instance but are passing the pair where+data to the hook instead: updateAttributes of a single model instance. Please read and join the discussion in #481.

Please study long existing implementations for this kind of behavior (try mongoose) and just copy them.

If you can make your suggestion specific and actionable then I am happy to consider it. A pull request would be even better.

DO NOT DEPRECATE old code before the new one is COMPLETELY tested! And I mean MONTHS of production test...

This is a chicken-and-egg problem. If we don't deprecate the old code, then people don't have any reason to migrate to the new code and thus nobody is going to test the new code.

Deprecation is not removal, just a notice. You can continue using the deprecated hooks; they are not going away at least until 3.x is released. You can suppress the warnings by setting NO_DEPRECATION=loopback-datasource-juggler in your environment - see depd's README.

@bajtos (Member) commented Mar 5, 2015

I wrote my comment above based on the issue description; I did not notice that the discussion had already started while I was thinking and writing my response.

Thank you all for taking the time to write the comments, your feedback is important to us. Also thank you @fabien for clarifying the common misunderstandings.

We all want to make the new operation hooks work well for as many common scenarios as possible.

I am super happy that you, @fabien, managed to convert almost all of your mixins to use these new hooks and that it allowed you to simplify your implementation (if I read your comments correctly). For me, it confirms that the new hook design is going in the right direction.

Now let's focus on fixing any specific use cases which are not supported by the new hooks. We already have #481 for updateAttributes and #474 for passing the list of deleted ids/instances to the "after delete" hook.

Is there anything else? Please open a new GitHub issue for each use case that's not supported well by the current implementation and leave a comment here linking to the issue created.

@fabien (Contributor) commented Mar 5, 2015

@bajtos to clarify, I am in the middle of converting a massive list of mixins. Fixing the first ones led me to provide the PR, however incomplete it now seems. I am determined to fix this ASAP. Because I need this done .... today!

And yes, I really like the new implementation - thanks for providing this!

@bajtos (Member) commented Mar 5, 2015

Another enhancement that was not yet captured by a standalone issue: #484 Shared state for hooks invoked for a single operation

@bajtos (Member) commented Mar 5, 2015

@fabien I am determined to fix this ASAP. Because I need this done .... today!

I'll do my best to review your changes to the point where we both agree the patch is ready to be merged. However, considering how opinionated this issue and the discussion are, I really need to hear from @raymondfeng and @ritch before we can land the changes.

@bajtos (Member) commented Mar 5, 2015

Added: strongloop/loopback#838 Add ctx.isNewInstance to "before save" and "after save" hooks

bajtos changed the title from "New observer hooks are completely useless!" to "Pain-points of the new Operation hooks" on Mar 5, 2015
@bajtos (Member) commented Mar 5, 2015

Added #480 Embedded relations don't invoke operation hooks

@fabien (Contributor) commented Mar 5, 2015

@JonnyBGod check #488

@bajtos (Member) commented Mar 6, 2015

Added #487 updateOrCreate/upsert does not set ctx.instance

@raymondfeng (Contributor)

@bajtos (Member) commented Mar 9, 2015

I have a plea to anybody following this issue: please check the wiki page linked by Raymond and do add your use case if it's not described yet. We would like to make hooks easy to use for as many users as possible, which we can't do without hearing from you, the users of this feature.

Please be as specific as possible and include a code snippet of your current implementation where possible.

@bajtos (Member) commented Mar 9, 2015

@raymondfeng Hash the password for the user record.

IMO there should be a different way to implement this feature that doesn't involve hooks - see #441 (comment)

@Jeff-Lewis

Guys, thanks for opening it up like this. Hopefully we'll get some useful feedback. I'll try to add some cases. Although I have a small app already in production using SL, the very first iteration does not use hooks. I have other apps that use other ORMs' hooks which I would like to port to SL, which is part of the reason why I've been following these changes.

As mentioned earlier, the most common use cases for which I've used and relied on ORM hooks over the last 15+ years fall into 3 buckets:

  1. Auditing
  2. Caching and Data Sync
  3. Validation

Auditing is one that I feel is often underappreciated by developers of B2C apps. For B2B and internally built enterprise applications, well-instrumented auditing is usually the gem that keeps an application living in the enterprise past the typical 5-year life span of enterprise apps. It provides clarity and transparency to the data, and data is always king.

One simple use case is a CRM: many contacts are shared by many users. Any good CRM is going to track and report on all changes to contact data - when, by whom, etc.

Without going into specifics, I feel some of the pain in determining the optimal design, and with the PR @fabien is working on, comes from trying to push too much functionality and too many different behaviors into an over-simplified ORM API. Creating a simple API is a noble exercise - as we all know, KISS is best - but in this case I feel simplicity of function purpose and intent is more important and would lead to easier adoption and usage of the LB ORM API.

For example, many other ORMs have in-memory model instance operations and bulk operations (ones that operate directly on the database). Many developers are used to working with instance models/entities through a single, simple interface, regardless of whether the operation is taking place on one or many model instances. Yes, CRUD with thousands of in-memory instance models in one operation is going to perform like a dog, but that's what experienced ORM developers are used to. In those cases, they usually switch to the bulk operations, knowing full well the difference in behavior and how the bulk operations need to be handled differently in their application. In some ways, it's analogous to 'basic' and 'advanced' versions of the ORM API.

Separating in this fashion allows developers to adapt and use the in-memory model instances for 80%, maybe 90%, of their app. For the remaining 10-20%, the developer can use bulk (DB-direct) operations to optimize performance when needed.

Thus I feel separating the CRUD and operation hooks into (in-memory) instance vs. (DB-direct) bulk operations makes the most sense.

@bajtos (Member) commented Mar 11, 2015

The following issues have been fixed, though not all of the changes have been released yet:

@doublemarked

I appreciate the progress being made with the hooks. I want to throw something small into this discussion regarding after delete which may or may not have been brought up in one of these issue threads.

The database doesn't give us much information after a delete, but it typically at least gives us the count of records removed. It seems natural for this information to make its way to the after delete hook, if not also to the model callback. Currently the connector is passing it back, but it's dropped somewhere in dao.js.

Beyond that, I'd like to give a hearty +1 to what @Jeff-Lewis wrote. It feels like too much is being crammed into one place, and perhaps in a bit of a rush. I also concur that, for a large majority of the code being written, performance is less important than data integrity and avoiding unlikely but easy-to-overlook race conditions.

A separate, lower-level interface for bulk or special operations would be a reasonable solution, accompanied by clear documentation regarding the limits of the general-purpose interface.

@bajtos (Member) commented Mar 18, 2015

A few thoughts to shed more light on the current design and the issues in the juggler implementation.

1) SQL transactions versus NoSQL

Most ORMs depend on SQL transactions to provide atomicity of operations. For example, an update operation may be implemented like this:

BEGIN TRANSACTION
  SELECT ... FROM ... // (1) load the model instance
  // (2) user code modifies the instance
  // and calls save()
    ORM HOOK "before update" // (3)
    UPDATE ... // (4)
    ORM HOOK "after update" // (5)
COMMIT TRANSACTION

Because all steps 1-5 are running in the same transaction, the database guarantees that no other client will modify the row while our operation is in progress.

The situation is very different in the NoSQL world. There are no transactions, and each database provides a different set of tools to deal with that; for example, MongoDB has atomic field operations like $inc.
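
For example, an atomic counter update with the MongoDB Node driver (a sketch against the 2.x-era driver API; the connection string, collection and field names are made up) has no load-modify-save cycle that hooks could wrap:

var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost/demo', function(err, db) {
  if (err) throw err;
  // $inc is applied atomically on the server; the document is never
  // loaded into application memory, so there is no instance to hook.
  db.collection('products').updateOne(
    { sku: 'ABC-123' },
    { $inc: { viewCount: 1 } },
    function(err) {
      db.close();
    }
  );
});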

LoopBack and loopback-datasource-juggler provide a single ORM-like API for both SQL and NoSQL databases, thus many compromises must be made to cater to the needs of these different worlds.

2) Richness of the data-manipulation API provided by default

By default, each LoopBack model provides (and exposes via REST) a rich API mixing pure-ORM-like methods like "create", NoSQL-like partial updates like "prototype.updateAttributes", complex atomic operations like "updateOrCreate", and batch operations like "updateAll".

The old hooks like beforeCreate and beforeUpdate were called only by the pure-ORM-like methods, to the great confusion of LoopBack users, who expected "updateOrCreate" or "updateAttributes" to call these hooks too.

And I agree with that: a generic hook like "beforeCreate" or "before save" should be called by all public methods that create new models and that are exposed by default.

That's why I came up with the design of more generic intention/operation hooks that are guaranteed to be called whenever the intention/operation is executed. The complexity of these hooks mirrors the complexity of the data-access API we provide out of the box.
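
Concretely, one registered observer is notified by every DAO entry point that expresses the same intent. A minimal sketch using the in-memory connector (model and data are made up for the example):

var DataSource = require('loopback-datasource-juggler').DataSource;
var ds = new DataSource('memory');
var Color = ds.define('Color', { color: String });

Color.observe('before save', function(ctx, next) {
  // The same observer fires for both code paths below.
  console.log('before save:', ctx.instance ? 'full instance' : 'partial/batch');
  next();
});

Color.create({ color: 'red' }, function(err) {
  // logged "before save: full instance"
  Color.updateAll({ color: 'red' }, { color: 'green' }, function(err) {
    // logged "before save: partial/batch"
  });
});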

3) The ORM-like subset of the API is incomplete

Ideally, if LoopBack provided a full ORM API, applications that don't care about partial/batch updates could write hooks in the following way, effectively disabling the non-ORM methods:

MyModel.observe('before save', function(ctx, next) {
  if (ctx.instance) { /* run the hook */ return next(); }
  return next(new Error('Partial/batch mode not supported.'));
});

Unfortunately, we don't have a method that would update a whole model instance, so this approach is not feasible at the moment. Both prototype.updateAttributes and updateOrCreate perform partial database updates only; therefore we cannot provide a model instance to the hooks, only the partial data supplied by the caller.

If there were methods like prototype.replace and replaceOrCreate, then these methods could pass ctx.instance to the "before save" hook and the approach outlined in the code snippet above would be feasible.

@bajtos (Member) commented Mar 18, 2015

The database doesn't give us much information after a delete, but it typically at least gives us the count of records removed. It seems natural for this information to make its way to the after delete hook, if not also to the model callback. Currently the connector is passing it back, but it's dropped somewhere in dao.js.

See #511 and #321.

@vkarpov15

@Jeff-Lewis I haven't been following this debate closely, but just FYI re: the mongoose issues you referenced, mongoose will NOT pass the affected document to pre('update') or post('update') hooks. These middlewares run with the Query object as the context, and post('update') gets the update response the db sent back, which doesn't include the doc. Mongoose uses the doc for middleware associated with instance functions, like save(), but for update() the doc(s) being updated may not even be in the server's memory, and middleware shouldn't be doing an extra query.
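
A rough sketch of the behaviour described above (assuming mongoose 4.x query middleware; the schema is made up for illustration):

var mongoose = require('mongoose');
var userSchema = new mongoose.Schema({ name: String, updatedAt: Date });

userSchema.pre('update', function(next) {
  // `this` is the Query, not a document - no loaded instance here.
  this.update({}, { $set: { updatedAt: new Date() } });
  next();
});

userSchema.post('update', function(result) {
  // `result` is the raw response from the database, not the document(s).
  console.log('raw db response:', result);
});

var User = mongoose.model('User', userSchema);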

@bajtos (Member) commented Apr 8, 2015

Added a few more items to the list:

Note that ctx.isNewInstance has been rolled out in loopback-datasource-juggler; now we need to implement support in all connectors. Are there any volunteers willing to help us?

@bajtos (Member) commented Sep 16, 2015

FYI: #441, the "loaded" a.k.a. "after access" hook, and #559, the "persist" a.k.a. "afterValidate" hook, were implemented quite some time ago.

@ritch (Contributor) commented Nov 29, 2016

Closing this as we've created other issues to track the work.

ritch closed this as completed on Nov 29, 2016