Incremental re-indexing #7

Open
samheutmaker opened this issue Mar 25, 2023 · 5 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@samheutmaker
Contributor

Autodoc should support indexing only the files and folders that have changed since the last index. At a high level, I think it looks something like this:

  1. Track the git SHA at the time of indexing.
  2. When indexing, compare the files at the last SHA to the current repository state.
  3. Calculate which branches have changes.
  4. Re-index the changed branches.
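
A minimal sketch of steps 2 and 3 in Python, not Autodoc's actual implementation. It assumes "branches" means folders in the file tree and that the current state is the `HEAD` commit; the function names `changed_files` and `folders_to_reindex` are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(last_sha: str, repo_dir: str = ".") -> list[str]:
    """Paths that differ between last_sha and the current HEAD commit."""
    result = subprocess.run(
        ["git", "diff", "--name-only", last_sha, "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def folders_to_reindex(paths: list[str]) -> set[str]:
    """Every ancestor folder of a changed file, with "." as the repo root."""
    folders: set[str] = set()
    for path in paths:
        # PurePosixPath matches the forward-slash paths that git prints.
        for parent in PurePosixPath(path).parents:
            folders.add(str(parent))
    return folders
```

Calling `folders_to_reindex(changed_files(last_sha))` would then give the set of folders whose summaries are out of date; everything outside that set could keep its previous index.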

If you're interested in working on this, please reach out.

@slavakurilyak

Great progress!

@andrewhong5297
Contributor

this should be close now @samheutmaker

@diegofornalha

I was reading the README and asked GPT-4 to suggest improvements. It returned the following:

Optimize change detection: In addition to using the "git sha," you can explore other ways to track changes in files and folders to make the change detection process more efficient.

Improve the granularity of reindexing: Instead of reindexing all branches with changes, you can identify and reindex only the specific files that have been altered.

Cache storage and reuse of indexing information: To reduce the time and resources required for reindexing, you can cache previous indexing information and reuse it when appropriate.
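
One way to picture the caching idea is a digest cache keyed by file path. This is a sketch under assumptions, not something the project is known to do: the cache is a JSON file, SHA-256 is an arbitrary choice of hash, and `file_digest` and `stale_files` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_files(files: list[Path], cache_path: Path) -> list[Path]:
    """Return files whose digest differs from the cached one, then update the cache."""
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    stale = []
    for f in files:
        digest = file_digest(f)
        if cache.get(str(f)) != digest:
            stale.append(f)
            cache[str(f)] = digest
    # Assumption: the caller re-indexes every returned file successfully.
    # A more careful version would write the cache only after indexing succeeds.
    cache_path.write_text(json.dumps(cache, indent=2))
    return stale
```

Unlike the SHA comparison, this would also work outside a git checkout, at the cost of reading every file on each run.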

Integrate with CI/CD systems: Selective indexing can be integrated into CI/CD pipelines so that reindexing occurs automatically whenever there is a change in the source code.

I plan to study this a bit more so I can contribute more effectively.

@dahifi

dahifi commented Apr 8, 2023

Regarding CI/CD, I've been using a gpt-cli tool to pipe git diff output and get summaries, and I'm hoping to build it into a pre-commit hook or CI/CD job as part of the PR process. Using the SHA is a good way to detect changes. We might be able to save tokens by diffing against the last known commit hash, and fall back to a full reindex if the diff is too much context.
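
The diff-or-full-reindex decision could be sketched roughly like this. The four-characters-per-token estimate is a crude assumption rather than a real tokenizer, and `estimate_tokens` and `choose_strategy` are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return (len(text) + 3) // 4


def choose_strategy(diff_text: str, max_context_tokens: int) -> str:
    """Pick "skip", "diff", or "full" based on the size of the diff."""
    if not diff_text.strip():
        return "skip"  # nothing changed since the last indexed commit
    if estimate_tokens(diff_text) <= max_context_tokens:
        return "diff"  # the diff fits in context, so summarize only the changes
    return "full"      # the diff is too large, so fall back to a full reindex
```

Here `diff_text` would be the output of `git diff` against the last known commit hash, and `max_context_tokens` the budget of whichever model is in use.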

@andrewhong5297
Contributor

(Just noting this has already been implemented and the issue should be closed)
