-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving on Filesystem #77
Comments
I'm not too sure I see how that example you gave would work. My guess is that checking out a full git branch is probably more easy/effective than pulling out a list of files from a specific commit. Overall, I don't see what we'd gain. If you can expose your idea a little more, maybe I'll understand what you're trying to achieve with it. |
Assuming that
Analyzing those in one go, that's a lot of disk writes that are wasting time and I/O. For example here: https://github.com/tomzx/gitter/blob/php-semver-checker/lib/Gitter/Repository.php#L628-L639 Keeping the assumptions in mind, I'm sure running Adding You could even analyze repositories without ever checking anything out, keeping a bare repository which is quite handy. Added bonus: Users can analyze without modifying their working directory at all making it friendlier. Anyway, I'm more interested in allowing to do that by using a tiny Filesystem interface/abstraction so that I'm not forced to work with actual files. |
First, the reason I ask for the list of modified files is not to save time on checking out (which is generally very fast with git). The goal is to avoid READING/SCANNING all the files in the repository, which would take a lot of time and is essentially why this was put in place. On project like Symfony it would save a considerable amount of time from not having to parse ~5k files per compare operation. Your idea sounds interesting in theory, but I'm skeptical we'll see any performance improvement in practice. It also seems to put a lot of burden on us to have to code a system which will do all these git operations while we can "already" just checkout the desired commits. I'm however open to the idea, but I'd have to see it implemented and benchmarked. Although, reading your last sentence, it sounds like you want to replace the |
Well, it doesn't need to be a whole abstraction layer like Flysystem. I'll think about it some more and mess around with the -git version to see what might make sense. |
But by checking them out, you are reading all the files from the repository AND writing them to disk. On a related note: IMO both |
I should probably make it more clear that when I said read/scan, I meant the process of actually opening the file for reading and using I do think you are right that there could probably be some improvement done in order not to scan all the files in the included directory, but only those that are modified. We currently rely on the |
Currently the Filesystem class is only used by JsonReporter.
However I'd like to suggest extracting an interface from it and letting any scanner related classes use that.
I think it will definitely come in handy with the git version. For example you could avoid checking out entire working directories before scanning them. Instead just implement a
GitFilesystem
that has access to a repository and implements at leastread
andfileExists
.That way, scanning huge histories should become quite easy. Use
getModifiedFiles
ofGitter\Repository
to find changed files only. Then useGitFilesystem
to retrieve only modified files.Should there be files that were not changed but still need analyzing (like super classes),
GitFilesystem
can easily fetch them when this is implement: #73I'm not too set on "not checking the repository" out, so an alternative could be letting
GitFilesystem
handle the checkout or clone to a temp directory.What do you think?
The text was updated successfully, but these errors were encountered: