Add superuser API endpoints to export and import org data #1394

Merged: tw4l merged 66 commits into main from issue-890-org-export-import on Jul 2, 2024

Contributor

@tw4l tw4l commented Nov 20, 2023

Fixes #890

This PR introduces new org import and export API endpoints, along with new Administrator deployment documentation on how to manage exporting and importing orgs, including copying files between S3 buckets or from an S3 bucket to a local directory as necessary.

The import endpoint supports query parameters for ignoring mismatches between the database version in the export JSON and the new cluster's database, as well as for updating org and file storage refs to point to a new storage name.
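
As a rough illustration of how a client might address such an import endpoint, the sketch below only builds the request URL. The path `/orgs/import/json` and the parameter names `ignoreVersion` and `storageName` are assumptions made for this example and are not stated in this description; the deployment docs added by the PR are the place to confirm the real names.

```python
from typing import Optional
from urllib.parse import urlencode


def build_import_url(
    api_base: str,
    ignore_version: bool = False,
    storage_name: Optional[str] = None,
) -> str:
    """Build an org-import URL with optional query parameters.

    The path and parameter names are placeholders (assumptions),
    not taken from the PR description.
    """
    params = {}
    if ignore_version:
        # Skip the check between the export's database version and the cluster's.
        params["ignoreVersion"] = "true"
    if storage_name:
        # Ask the server to repoint org and file storage refs at this storage.
        params["storageName"] = storage_name
    url = f"{api_base.rstrip('/')}/orgs/import/json"
    return f"{url}?{urlencode(params)}" if params else url
```

Calling `build_import_url("https://example.com/api", True, "new-storage")` would yield a URL carrying both options; the export JSON itself would then be sent as the request body by whatever HTTP client is in use.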

A sample JSON export is attached. An accompanying zipped directory of S3 objects that match the export is available on request (it's too large to attach here).

1798648a-d717-45e3-a717-23132ed4030b-export.json

I am leaving testing instructions intentionally bare to see if the new docs can stand on their own.

Eventually we will likely want to move export and import into async processes kicked off by the API endpoint, rather than handling them within the request/response cycle. I haven't gone down that road yet because I wanted to see how the current implementation fares against existing larger organizations before committing to the additional development.

@tw4l tw4l marked this pull request as draft November 20, 2023 19:30
@tw4l tw4l changed the title Add superuser org API endpoints to export and import org data Add superuser API endpoints to export and import org data Nov 20, 2023
@tw4l tw4l requested a review from ikreymer November 21, 2023 23:15
@tw4l tw4l marked this pull request as ready for review November 21, 2023 23:16
@tw4l tw4l assigned tw4l and Shrinks99 and unassigned tw4l Nov 22, 2023
@tw4l
Contributor Author

tw4l commented Nov 22, 2023

Assigning @Shrinks99 for docs copy check :)

Member

@Shrinks99 Shrinks99 left a comment

Good stuff! Bunch of small changes, one discussion about documentation I have been putting off having! Overall nice guide!

I've committed the navigation to the branch but I think we might need to make some changes after my suggestions are applied / discussed? Ideally this would fit under the Administration section as an article titled "Org Export & Import" or similar, and not have all the administration guides in one document? This would require renaming some files, so I will leave that to you. Should probably move the current admin.md into an "administration" folder and re-title the document accordingly? Hopefully that all makes sense? 🙃

docs/deploy/admin.md: nine review threads, all outdated and resolved
@tw4l
Contributor Author

tw4l commented Nov 27, 2023

Docs updates complete:

  • Org import/export moved into an admin directory
  • Language updated to match rest of documentation

@tw4l tw4l force-pushed the issue-890-org-export-import branch from f0d6fbf to 59665a5 Compare November 27, 2023 16:26
Member

@Shrinks99 Shrinks99 left a comment

Nice! Docs look good! :D

@tw4l tw4l force-pushed the issue-890-org-export-import branch 3 times, most recently from 14c4561 to be416e0 Compare November 29, 2023 15:41
ikreymer pushed a commit that referenced this pull request Jan 15, 2024
Closes #1434 

### Changes
#### Developer
- Adds the K3S playbook guide to the navigation
- Adds note about restarting MKDocs when adding new icons
- Adds note about concise language to the styleguide ([see previous
discussion](#1394 (comment)))
- Adds a note about noun usage to the styleguide
#### User guide
- Adds tables for archived item and workflow statuses
- Adds custom styles for displaying statuses with their icons like we do
in the app
- Fixes capitalization issues
---------

Co-authored-by: Tessa Walsh <[email protected]>
Co-authored-by: sua yoo <[email protected]>
@tw4l tw4l force-pushed the issue-890-org-export-import branch from 05d3c4e to 5a412c0 Compare May 6, 2024 18:56
@tw4l
Contributor Author

tw4l commented May 6, 2024

Rebased on latest main

@tw4l
Contributor Author

tw4l commented May 8, 2024

@ikreymer This will need to be merged prior to org deletion, as the latter is dependent on it. Has been rebased against latest main.

@ikreymer
Member

Nice, great work! With the latest changes, we also need to add pages to the export.
One concern is the size of the export file, since it all needs to be assembled in memory (especially once the page list is included). I wonder if JSON is the right format for this. Perhaps we version this with /export/json, in case we want to add other options down the line?

@ikreymer
Member

> Perhaps we version this with /export/json, in case we want to add other options down the line?

One option we could consider is CBOR via https://cbor2.readthedocs.io/en/latest/usage.html, which seems to support streaming output (though not necessarily in an asyncio-friendly way), but maybe that doesn't matter if we're writing one object at a time?
This needs a bit more thought on how we'd structure the data to be fully friendly to both streaming download and streaming upload...

@tw4l tw4l force-pushed the issue-890-org-export-import branch from a523d02 to 7ce0658 Compare June 12, 2024 16:05
@ikreymer
Member

Or, perhaps we just keep this as is, but consider switching to https://pypi.org/project/json-stream/ without changing the format. But perhaps versioning under /json still makes sense?
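
To make the memory concern concrete, here is a minimal sketch of keeping the JSON format while emitting it one object at a time. It uses only the standard library rather than json-stream, and the top-level layout (an `org` key followed by named arrays) is a hypothetical structure, not the export format used in this PR.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def stream_export(
    org: Dict[str, Any], collections: Dict[str, Iterable[Dict[str, Any]]]
) -> Iterator[str]:
    """Yield a JSON document piece by piece.

    Only one item is serialized at a time, so the full export never has
    to be held in memory. The top-level layout here is hypothetical.
    """
    yield '{"org": ' + json.dumps(org)
    for name, items in collections.items():
        yield ", " + json.dumps(name) + ": ["
        for index, item in enumerate(items):
            # Separate array elements with commas, one object per chunk.
            yield ("," if index else "") + json.dumps(item)
        yield "]"
    yield "}"
```

In a FastAPI backend, a generator like this could be handed to `StreamingResponse`, with each collection backed by a database cursor instead of an in-memory list.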

@tw4l tw4l force-pushed the issue-890-org-export-import branch from d6ab28a to 618b9c0 Compare July 1, 2024 21:46
@tw4l tw4l marked this pull request as ready for review July 1, 2024 22:54
@tw4l
Contributor Author

tw4l commented Jul 1, 2024

@ikreymer This is rebased on main and tests added, with streaming support for import and export! Ready for review.

backend/btrixcloud/models.py: one review thread, outdated and resolved
@ikreymer
Member

ikreymer commented Jul 2, 2024

Nice work!! Tested with a very large export from dev, imported locally. It worked, though I ran into some minor issues:

  • max scale on import can be lower than on export, so we just need to clamp it to MAX_CRAWL_SCALE
  • some crawls were missing crawlerChannel, which caused a validation error; we should set it to default on import
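
The two fixes above could be sketched roughly as follows. This is an illustration only: the `scale` field name and the value chosen for `MAX_CRAWL_SCALE` are assumptions, and the actual fix in the PR may be structured differently.

```python
from typing import Any, Dict

MAX_CRAWL_SCALE = 3  # assumed value; the real limit comes from deployment config


def normalize_imported_crawl(crawl: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an imported crawl record adjusted for this cluster."""
    fixed = dict(crawl)
    # Clamp scale: the exporting cluster may allow a higher maximum.
    if "scale" in fixed:
        fixed["scale"] = max(1, min(int(fixed["scale"]), MAX_CRAWL_SCALE))
    # Some crawls lack crawlerChannel (or have it unset); fall back to "default".
    if not fixed.get("crawlerChannel"):
        fixed["crawlerChannel"] = "default"
    return fixed
```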

@ikreymer
Member

ikreymer commented Jul 2, 2024

I also had the local nginx time out on import, but that can be fixed by adding:

      proxy_http_version 1.1;
      proxy_read_timeout 600;
      proxy_request_buffering off;

We should probably do the same for the ingress (again, this matters for larger imports).
Another option is gzip content-encoding for the export, though that can be added later.
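
For the gzip idea, a minimal sketch of compressing a streamed export on the fly with the standard library is shown below. It assumes the export is already produced as an iterable of byte chunks; how that would hook into the actual endpoint is not covered here.

```python
import zlib
from typing import Iterable, Iterator


def gzip_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Compress an iterable of byte chunks into a gzip stream, lazily."""
    # wbits = MAX_WBITS | 16 tells zlib to emit a gzip header and trailer.
    compressor = zlib.compressobj(wbits=zlib.MAX_WBITS | 16)
    for chunk in chunks:
        data = compressor.compress(chunk)
        if data:  # zlib may buffer small inputs and return nothing yet
            yield data
    # Emit whatever is still buffered, plus the gzip trailer.
    yield compressor.flush()
```

A response built from this generator would also need a `Content-Encoding: gzip` header so that clients decompress it transparently.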

@tw4l
Contributor Author

tw4l commented Jul 2, 2024

> Nice work!! Tested with a very large export from dev, imported locally. It worked, though I ran into some minor issues:
>
> * max scale on import can be lower than on export, so we just need to clamp it to MAX_CRAWL_SCALE
>
> * some crawls were missing `crawlerChannel`, which caused a validation error; we should set it to `default` on import

Should be good now as of latest commit!

Member

@ikreymer ikreymer left a comment

Really nice work, I think we can finally merge this in!

@tw4l tw4l merged commit f076e7d into main Jul 2, 2024
4 checks passed
@tw4l tw4l deleted the issue-890-org-export-import branch July 2, 2024 21:14
Development

Successfully merging this pull request may close these issues.

Add support for exporting/importing an org and all its data from db
3 participants