Add help file to crawl github repos #51

Open
zackees opened this issue Nov 22, 2023 · 6 comments

@zackees

zackees commented Nov 22, 2023

I would love to create a GPT out of a GitHub repo. Can you please add this?

K thx bai

@haydenwhayne

haydenwhayne commented Nov 22, 2023

@zackees I found it very difficult to get this to work with GitHub repos, so I created my own repo based on this project, focused on crawling GitHub repos using GitHub's API, if you want to check it out: https://github.com/haydenwhayne/gpt-github-crawler
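
For anyone wondering what an API-based crawl can look like, here is a minimal sketch. It is not taken from the linked repo; it assumes GitHub's REST "git trees" endpoint with `recursive=1`, and the helper names (`tree_url`, `blob_paths`, `list_repo_files`) are made up for illustration. Unauthenticated requests are rate limited, and very large trees come back with `truncated` set to true, which this sketch does not handle.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def tree_url(owner, repo, ref):
    """Build the URL that lists the whole tree of `ref` in a single request."""
    quoted_ref = urllib.parse.quote(ref, safe="")
    return f"{API_ROOT}/repos/{owner}/{repo}/git/trees/{quoted_ref}?recursive=1"


def blob_paths(tree_response):
    """Keep only file entries ("blob"); directories are reported as "tree"."""
    return [
        entry["path"]
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob"
    ]


def list_repo_files(owner, repo, ref="main"):
    """Fetch the file listing over the network (hypothetical helper)."""
    request = urllib.request.Request(
        tree_url(owner, repo, ref),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return blob_paths(json.load(response))
```

Each returned path can then be fetched individually, so no page rendering or selector guessing is involved.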

@nicholascross

I was after the same thing and had similar difficulty. I think my problem was that it wouldn't automatically traverse the subfolders, perhaps because the content is not loaded until it is interacted with 🤔🤷‍♂️

I experimented with these selectors: #repo-content-turbo-frame, #read-only-cursor-text-area, #repos-file-tree.

Thinking on it a bit more: since the repository is fully retrievable, perhaps this kind of thing could be done effectively by cloning the repo and then traversing the file system. Perhaps a web crawler is not really required for this.
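
A rough sketch of the clone-then-traverse idea, assuming the repo has already been cloned (for example with `git clone`) to a local directory. The function name `crawl_checkout`, the skipped directory names, and the `{"path", "content"}` record shape are all assumptions for illustration, not something this project defines.

```python
import os
from pathlib import Path

# Directories that rarely contain useful source text (assumed list).
SKIP_DIRS = {".git", "node_modules"}


def crawl_checkout(root):
    """Walk a local checkout and return one record per readable text file."""
    root = Path(root)
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file, skip it
            records.append(
                {"path": path.relative_to(root).as_posix(), "content": text}
            )
    return records
```

The resulting list can be dumped with `json.dump` to get a single knowledge file, which sidesteps the lazy-loading problem of the web UI entirely.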

@haydenwhayne

@nicholascross Yep, I agree. The GitHub repo I linked above allows for crawling both remote and local repos, so you can clone the repository and run it in local mode to traverse the file system. You can still add match patterns to specify which files and file types you want.
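
To illustrate what match patterns could look like in local mode, here is a small sketch using glob-style patterns from Python's standard `fnmatch` module. The `include`/`exclude` parameter names and the `select_files` helper are hypothetical and do not reflect the linked repo's actual options.

```python
from fnmatch import fnmatchcase


def matches(path, include, exclude=()):
    """Return True if `path` matches an include pattern and no exclude pattern."""
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


def select_files(paths, include, exclude=()):
    """Filter repo-relative POSIX paths, preserving their original order."""
    return [p for p in paths if matches(p, include, exclude)]


if __name__ == "__main__":
    candidates = ["README.md", "src/app.py", "tests/test_app.py", "logo.png"]
    print(select_files(candidates, ["*.py", "*.md"], exclude=["tests/*"]))
```

Note that in `fnmatch` the `*` wildcard also matches `/`, so `*.py` selects Python files at any depth rather than only at the top level.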

@nicholascross

I found that local filesystem crawling has been requested in #92, so maybe go upvote it if you want it.

Unlikely to be useful for anyone unless they are a Swift dev experimenting in this space, but I ended up going down the local checkout path myself:

https://github.com/nicholascross/SourceCrawler

I found it interesting that once I had the first version of this, which used heuristic regexes for type extraction, I was able to use the crawling output with a GPT agent to add AST-based type extraction using a "third party" library I had no experience with. 🤯
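
For readers unfamiliar with the regex approach, a heuristic extractor for Swift type declarations might look roughly like the sketch below. It is written in Python for brevity and is not SourceCrawler's code; the pattern and the `extract_types` name are assumptions.

```python
import re

# Optional modifiers, then a declaration keyword, then the type name.
TYPE_DECL = re.compile(
    r"^[ \t]*(?:(?:public|private|fileprivate|internal|open|final)\s+)*"
    r"(class|struct|enum|protocol|actor)\s+([A-Za-z_]\w*)",
    re.MULTILINE,
)


def extract_types(source):
    """Return (kind, name) pairs for lines that look like type declarations."""
    return [(m.group(1), m.group(2)) for m in TYPE_DECL.finditer(source)]
```

Being line-based, it ignores declarations that do not start a line and it misreads `class func make()` as a type named `func`, which is the kind of gap that AST-based extraction closes.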

@granmoe

granmoe commented Dec 6, 2023

I ended up creating a repo crawler as well. Mine supports either crawling a public repo based on its URL, or crawling the locally checked out repo:

https://github.com/granmoe/github-repo-gpt-scraper

@LanDeQuHuXi

+1
