
add markdown/notion importer #258

Merged
merged 15 commits into from
Nov 7, 2024

@cloverich (Owner) commented Oct 1, 2024

Adds an experimental Notion importer in support of #132 / #134. Once a Notion export is unzipped, use the new button in the preferences screen to import the directory and its referenced files:

[Screenshot: the new import button in the preferences screen]

The routine imports all markdown files, renames and restructures them into Chronicles format (one-level-deep sub-folders for all notes, UUID filenames), updates note links, and imports any referenced files and updates their references.


It's listed as experimental because I have only tested it against my own export (~400 notes) and haven't worked through all edge cases. It uses staging tables (imports, import_notes, import_files) in a two-pass approach. The database entries can be used to understand what was imported (they track the source path and the destination Chronicles journal / id), but this is not exposed in the app; it's a SQLite database and easy to inspect (see the settings file for its location). Once the import completes, the sync routine is called to synchronize the database with the new files.
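A rough sketch of why two passes matter for note links, assuming notes link to each other with URL-encoded relative markdown paths (as in my export). In-memory `Map`s stand in for the staging tables, and `stageNotes`, `rewriteNoteLinks`, and the `../<journal>/<id>.md` link format are hypothetical illustrations, not the importer's actual code:

```typescript
import { randomUUID } from "node:crypto";
import * as path from "node:path";

// A note found while walking the export (hypothetical shape).
interface SourceNote {
  sourcePath: string; // relative to the export root, "/"-separated
  journal: string; // destination journal, already chosen
  body: string;
}

// Stand-in for one staging row: where the note came from and where it goes.
interface StagedNote extends SourceNote {
  id: string; // new filename, without ".md"
}

// Pass 1: assign every note its destination before rewriting anything, so links
// to notes that appear later in the walk can still be resolved in pass 2.
function stageNotes(notes: SourceNote[]): Map<string, StagedNote> {
  const staged = new Map<string, StagedNote>();
  for (const note of notes) {
    staged.set(note.sourcePath, { ...note, id: randomUUID() });
  }
  return staged;
}

// Pass 2: rewrite relative markdown links that resolve to a staged note.
// Links that do not resolve are left unchanged.
function rewriteNoteLinks(note: StagedNote, staged: Map<string, StagedNote>): string {
  const dir = path.posix.dirname(note.sourcePath);
  return note.body.replace(
    /\[([^\]]*)\]\(([^)\s]+\.md)\)/g,
    (match: string, text: string, href: string) => {
      let decoded: string;
      try {
        decoded = decodeURIComponent(href); // e.g. "Sub%20Page.md" -> "Sub Page.md"
      } catch {
        return match; // malformed escape sequence: leave the link as-is
      }
      // join() also normalizes "../" segments relative to the note's folder
      const target = staged.get(path.posix.join(dir, decoded));
      return target ? `[${text}](../${target.journal}/${target.id}.md)` : match;
    },
  );
}
```

Doing the rewrite in a single pass would miss any link whose target had not been visited yet, which is the main reason to stage everything first.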

Changes:

  • add importer for my Notion export
  • add staging tables (imports, import_items, import_files)
  • (refactor) rename prior importer client to sync client
  • add diff dependency for a temporary, hacky testing helper
  • allow creating documents without indexing, to support the import-then-sync approach
  • update types to ES2020 for a misc lib function I needed
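Roughly what "create without indexing" means for the import-then-sync flow. `DocumentStore`, `createDocument`, the `index` option and `sync` are hypothetical stand-ins (a `Map` for files on disk, a `Set` for the search index), not the real client API:

```typescript
// Hypothetical shapes; the real Chronicles client API may differ.
interface NewDocument {
  journal: string;
  id: string;
  content: string;
}

interface CreateOptions {
  index?: boolean; // defaults to true
}

class DocumentStore {
  readonly files = new Map<string, string>(); // stands in for markdown files on disk
  readonly indexed = new Set<string>(); // stands in for the SQLite index

  // Always writes the file; updates the index only when asked to.
  createDocument(doc: NewDocument, opts: CreateOptions = {}): string {
    const key = `${doc.journal}/${doc.id}.md`;
    this.files.set(key, doc.content);
    if (opts.index ?? true) this.indexed.add(key);
    return key;
  }

  // Indexes every file not yet in the index; returns how many were added.
  sync(): number {
    let added = 0;
    for (const key of this.files.keys()) {
      if (!this.indexed.has(key)) {
        this.indexed.add(key);
        added++;
      }
    }
    return added;
  }
}
```

The importer only writes files (`index: false`), and a single `sync()` afterwards brings the database in line with what is on disk.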

Todo:

  • Add Notion import in preferences
  • Rework to an import-then-sync approach
  • Rename journals and notes (remove Notion id)
  • Parse front matter if present
  • Preserve created / updated dates from front matter, falling back to file ctime / mtime
  • Import tags (if in front matter as Tags)
  • Import basic note links
  • Import images / files and update references
  • Track or log failed import items / references
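A minimal sketch of the front matter, tags and dates items, assuming flat `key: value` front matter and comma-separated `Tags`. The `Created` and `Updated` keys and both helper names are assumptions; the file times would come from something like `fs.stat`:

```typescript
// Result of splitting a note into front matter attributes and body.
interface FrontMatter {
  attributes: Record<string, string>;
  body: string;
}

// Reads flat "key: value" lines between leading "---" fences. Nested YAML is
// not supported; notes without front matter are returned unchanged.
function parseFrontMatter(markdown: string): FrontMatter {
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(markdown);
  if (!match) return { attributes: {}, body: markdown };
  const attributes: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep > 0) attributes[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return { attributes, body: markdown.slice(match[0].length) };
}

// File timestamps, e.g. the ctime / mtime fields of fs.Stats.
interface FileTimes {
  ctime: Date;
  mtime: Date;
}

interface NoteMeta {
  createdAt: Date;
  updatedAt: Date;
  tags: string[];
}

// Prefer dates from front matter; fall back to file times when missing or unparseable.
function noteMeta(attrs: Record<string, string>, times: FileTimes): NoteMeta {
  const toDate = (value: string | undefined, fallback: Date): Date => {
    if (!value) return fallback;
    const parsed = new Date(value);
    return Number.isNaN(parsed.getTime()) ? fallback : parsed;
  };
  const tags = (attrs["Tags"] ?? "")
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
  return {
    createdAt: toDate(attrs["Created"], times.ctime),
    updatedAt: toDate(attrs["Updated"], times.mtime),
    tags,
  };
}
```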

Closes #134

Out of scope:

- add table to track import items and import links to support importing in steps
- resolve Notion note links while importing; if valid, update them to Chronicles format so they work as note links

very hacky / messy
- re-work tests to separate known issues with my current Notion export, vs hypothetical ones and non-Notion import issues
- handle remaining cases
- clean up parsing-related logging
- improve name generation by using just the folder name and allowing the root folder to be a journal name; slice names to handle length. Tons of edge cases, but pretty decent overall
- now referenced files are imported to the _attachments directory
- import_item status is now updated to help with debugging
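A sketch of the name-generation rules above, assuming Notion's usual space plus 32-hex-character id suffix on exported file and folder names. `journalName`, `stripNotionId`, the `imported` fallback and the 25-character limit are hypothetical:

```typescript
// Assumed Notion export suffix, e.g. "Meeting Notes 0123456789abcdef0123456789abcdef.md".
const NOTION_ID_SUFFIX = /\s+[0-9a-f]{32}$/i;

// Hypothetical maximum journal name length.
const MAX_JOURNAL_NAME_LENGTH = 25;

// Remove a trailing ".md" and the Notion id suffix, if present.
function stripNotionId(name: string): string {
  return name.replace(/\.md$/i, "").replace(NOTION_ID_SUFFIX, "").trim();
}

// Use only the note's immediate parent folder as the journal name; top-level
// notes use the export's root folder name instead. Long names are sliced.
function journalName(sourcePath: string, rootFolder: string): string {
  const parts = sourcePath.split("/");
  const folder = parts.length > 1 ? parts[parts.length - 2] : rootFolder;
  const name = stripNotionId(folder) || "imported";
  return name.slice(0, MAX_JOURNAL_NAME_LENGTH).trim();
}
```

Using only the immediate folder keeps names short, at the cost of two same-named folders in different branches of the export landing in the same journal.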

next steps are to debug some lingering failed imports (~5/200 notes), then track all links to confirm everything is imported (or not) as expected, then move on to clean-up
- fix some missing awaits that resulted in race conditions; all notes now import with note and file references
- add basic status tracking on import items; very messy

needs refactoring and cleanup
- if a file link points to a URL, don't try to import it as a file; ignore it instead
- if file link has query params (e.g. ?size=800), strip them
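The two file-link rules above could look roughly like this. `localFileTarget` is a hypothetical helper, and treating any `scheme:` prefix as a remote URL is an assumption (it would also skip Windows drive paths, which relative Notion links shouldn't contain):

```typescript
// Returns the local path a file link should be imported from, or null to skip it.
function localFileTarget(href: string): string | null {
  // Links with a URL scheme (https:, http:, mailto:, ...) are remote; ignore them.
  if (/^[a-z][a-z0-9+.-]*:/i.test(href)) return null;
  // Strip any query string or fragment, e.g. "image.png?size=800" -> "image.png".
  const cleaned = href.replace(/[?#].*$/, "");
  if (cleaned.length === 0) return null;
  try {
    return decodeURIComponent(cleaned);
  } catch {
    return cleaned; // not valid percent-encoding; use the path as written
  }
}
```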

Fixes about 25 file import errors (all valid files now import)
- move faux tests into own routine, add button for it
- move front-matter code into isolated module
- light clean-up re-naming part 2 / ?
- use a staging table for file references when importing; it's slower but easier to debug and validate
- remove import links table / tracking, unused at this point
- some misc. clean-up (still v. messy, but improving)
- move all db code to knex
- simplify note links updating
- organize note links / file links logic so it's grouped in the importer
- clean-up many comments
- drop error tracking on import items except for final step
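A sketch of the file-reference staging idea, with a `Map` standing in for `import_files` (row fields and helper names are hypothetical). Each referenced file is staged once under `_attachments` with a UUID name, then copied in a final step that records a status per row instead of aborting:

```typescript
import { randomUUID } from "node:crypto";
import * as path from "node:path";

type FileStatus = "pending" | "complete" | "error";

// In-memory stand-in for one row of a file-reference staging table.
interface StagedFile {
  sourcePath: string; // resolved path inside the export
  destPath: string; // new location under _attachments
  status: FileStatus;
  error?: string;
}

// Hypothetical async copy, e.g. a wrapper around fs.promises.copyFile.
type CopyFn = (src: string, dest: string) => Promise<void>;

// Stage each distinct referenced file once, keeping its extension but giving
// it a unique name so same-named files from different folders cannot collide.
function stageFile(staging: Map<string, StagedFile>, sourcePath: string): StagedFile {
  const existing = staging.get(sourcePath);
  if (existing) return existing;
  const row: StagedFile = {
    sourcePath,
    destPath: `_attachments/${randomUUID()}${path.posix.extname(sourcePath)}`,
    status: "pending",
  };
  staging.set(sourcePath, row);
  return row;
}

// Final step: copy every staged file, awaiting all copies, and record the
// outcome on each row rather than stopping at the first failure.
async function copyStagedFiles(staging: Map<string, StagedFile>, copy: CopyFn): Promise<void> {
  await Promise.all(
    Array.from(staging.values(), async (row) => {
      try {
        await copy(row.sourcePath, row.destPath);
        row.status = "complete";
      } catch (err) {
        row.status = "error";
        row.error = String(err);
      }
    }),
  );
}
```

Awaiting the `Promise.all` is what keeps the sync step from starting while copies are still in flight.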
@cloverich merged commit 4fc3ae7 into master on Nov 7, 2024
2 checks passed
@cloverich deleted the import_markdown branch on November 7, 2024 14:48
@cloverich mentioned this pull request on Nov 24, 2024
Linked issue that merging this pull request may close: Experimental Notion import