Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleanup changes table #5

Open
nevf opened this issue Jan 9, 2017 · 9 comments
Open

cleanup changes table #5

nevf opened this issue Jan 9, 2017 · 9 comments

Comments

@nevf
Copy link

nevf commented Jan 9, 2017

The read.me says:

cleanup changes table -> Can only do that after Dexie.Syncable supports the clear flag

Can you provide any more info on this. Is there a Dexie issue on the clear flag etc.

The ability to cleanup the changes table along with any other tables will be important in production with larger databases.

@nponiros
Copy link
Owner

First off: no there is no Dexie.Syncable github issue regarding this flag (at least I didn't see any). The flag is the 4th parameter to the applyRemoteChanges function and it is currently not implemented in Dexie.Syncable. The only "issue" is a TODO in the Dexie.Syncable code and I think a mention in the Dexie wiki.

The main question is: what do we do if a client's revision is too old and the changes table was cleaned? More detailed questions/challenges:

  • How do we detect "revision too old"?
  • Should we recreate the client DB by sending CREATE changes for everything?
  • How do we know which changes the client already has and which not? (And do we even care if we reacreate everything?)
  • How do we know that some changes are not available anymore? (might help answer the 1st question)

@nevf any ideas are appreciated. I would love to have the clear/resync flag implemented but didn't have a good idea on how to do this yet. Maybe we should open an issue on the Dexie tracker. Perhaps @dfahlander already has some ideas regarding this.

@dfahlander
Copy link

dfahlander commented Jan 10, 2017

Just to clarify, changes table is automatically cleared from old changes once they have been synced and accepted by the remote server. It is not there to just clear the _changes table, but to clear the entire database prior to adding server changes.

The 'clear' option was intended to use when the client has been offline for a long time so that the server has discarded the revision that the client refers to (servers may discard old revisions to save space). In that case, the server must send the entire database to the client in "CREATE" changes.

In current status, as the clear flag is ignored, the result will be that the server changes will still overwrite the local data, but if client had objects that isn't part of the server database, those object will still persist in the client, so this could lead to an inconsistent state.

One workaround would be for the syncProtocol to call the following code before calling applyRemoteChanges():

const observableTables = db.tables.filter(table = table.schema.observable);
db.transaction('rw', observableTables.concat(db._changes), ()=> {
    return Promise.all(observableTables.map(table => table.clear())
        .then(()=> {
            // Get local revision after clearing the tables
            return db._changes.orderBy('rev').last();
        }).then(currentRevision => {
            // Reset remoteBaseRevisions to current revision so that
            // client wont re-send the delete changes:
            return db._syncNodes
                .where({url: connectedUrl})
                .modify({myRevision: currentRevision});
        });
}).then(() => {
    // Now call applyRemoteChanges to apply the CREATEs
    return applyRemoteChanges(...);
}).catch(err => {
    console.error (err.stack || err);
});

This transaction will generetate lots of "DELETE" changes in _changes table followed by lots of CREATE changes from server though, but it would be temporary as the _changes table would automatically get cleaned up as no node would refer to the old revision anymore.

@nponiros
Copy link
Owner

@dfahlander thanks for the input. Just to be clear my questions in the previous post are for the changes table in the server. I guess if the client sends a baseRevision greater than null or 0 and the server does not find changes with that revision and the latest db revision on the server is greater than the baseRevision from the client, the server knows that it needs to send CREATE changes for all tables/records in the db. If this is correct the server must trigger a changes event for the socket client to receive with the create changes. The poll pattern case is simple I would say since we can just answer with the create changes.

@nevf
Copy link
Author

nevf commented Jan 11, 2017

@nponiros @dfahlander I don't profess to understand how the ISyncProtocol implementations work in detail. That said I do have a suggestion re. clearing out the changes table on the server. First it should be up to the developer to decide when a client has been offline for too long a period that we won't bother to keep changes any longer.

However once the changes table on the server has been cleared I'd like to see something like "send all server doc's that have been added or updated in all tables since that client was last online" and "get the client to delete doc's that no longer exist on the server".

This will present a much better experience to that user than deleting their entire database and then recreating it from scratch. For example the user has a Tablet that hasn't been used for 4 months, they travel overseas, get online with our app and simply need say 100 new/updated documents vs. deleting their 500MB database and waiting for it to download and recreate on some slow cellular phone data connection.

@nponiros
Copy link
Owner

@nevf well in some sence this is what the changes table does. If a delete change is no longer in the table then how do I inform the client about the deletion? Similar with update. Either we keep some meta data on the server or the client must send its db so we can compare it to the server. Or we send creates and the client db gets deleted as @dfahlander said. But maybe I didn't understand exactly what you mean.

An alternative might be to compact the changes table. So basically not delete changes but merge them (what client/server already do before sending changes)

@nevf
Copy link
Author

nevf commented Jan 11, 2017

@nponiros If all documents include a last update timestamp, then a client can request the server to send all doc's updated since a certain time, ie. since the last connected 4 months ago. This is only needed if the changes table has been deleted.

As for deleted documents the server could keep these forever as they only need to contain a document id, not the entire document, so the space needed to store them would be minimal.

@nponiros
Copy link
Owner

@nevf this is what I did before both on the server and the client (older versions of the sync server and sync client). So if I understand correctly you want to combine revisions with a timestamp. So if the server says revision too old the client sends the timestamp.

Saving the timestamp on the client should not be an issue. We can use the syncnode for that or even do it without syncable in the protocol implementation. But there are a few issues.

  • Even if we have a timestamp we cannot send an update change since we don't know what changed so we would send a create change.
  • We need to save extra data (the timestamp).
  • We need a place to save deleted IDs (I assume that an ID will never be reused).
  • What do we do with any old client changes that were never synced? It might be complex to merge those and resolve conflicts the same way changes in the changes table are merged.

Compacting the changes table would solve the "saving delete ids issue". When we compact we would also merge all updates of a document with the create change. To avoid data duplication we might be able to reference the id of the real doc in the changes table instead of duplicating it. Unless I am missing something compacting the changes would mean that we don't need the clear flag since we always keep all the changes. It is not ideal since we might send too much data if we compact too many revisions but probably still better than recreating the whole client db. One advantage of this approach is that it is server side only. Whoever implements the database connector can decide how to save the changes table. I choose the current approach because it was the easiest.

@nevf
Copy link
Author

nevf commented Jan 12, 2017

@nponiros In my use case document ID's will never be reused as they are UUID's.

From a UX perspective it would be nice if the server could notify (email) users who had been offline for some period and let them know they need to get the specific device/browser back online to ensure their db is efficiently kept up to date. That is outside the scope of ISyncProtocol, but something I may well implement in my app.

If all documents include a last update timestamp, then a client can request the server to send all doc's updated since a certain time, ie. since the last connected 4 months ago.

In this situation the client would either replace their document if it existed, otherwise create a new document.

What do we do with any old client changes that were never synced? It might be complex to merge those and resolve conflicts the same way changes in the changes table are merged.

You could create client doc's that don't exist on the server. If they do exist you could replace the server doc with the client doc if the client doc is newer than the server doc. I haven't yet looked at how Dexie Syncable handles conflict resolution.

HTH a little.

PS. It would be great to get @dfahlander input on this.

@nponiros
Copy link
Owner

If the server already has an object and we don't have information about updates (effectively we have a create change) which will be the case if the changes table is compacted or deleted, the server change wins. Also a client delete will be ignored. So basically with the current conflict resolution any old client changes will be ignored. resolve_conflicts.js implements the algorithm. Syncable doesn't resolve conflicts. Only the server does.

What I want to know is: do we really need the clear flag or would compacting the changes table + efficiently storing the changes of the changes table be good enough?

@nevf Don't forget the database is implementation specific so you can use a more efficient way to store your data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants