New storage backend model #1214
I think the HTTP and Redis backends should also be kept as-is (since they don't require external libraries), but the HTTPS and Rediss (TLS) backends would have a different way of loading so that they can depend on OpenSSL and similar libraries... There were some thoughts about a CMake implementation in #894 (comment).
Could you expand a bit on why you think that would be a good idea? I'm thinking that it would be better to focus on a unified http+https implementation and a unified redis+rediss implementation. From my point of view, my proposal would solve all issues I'm aware of with the current framework (and I wish that I had thought of that approach in #414). But it's of course so far only an untested idea, so it will need some testing to see if it flies.
Since dynamically loading code won't solve the problem of keeping sessions alive, I don't think that there is a need for a
I would like to see them "included" by default, otherwise I think they will just be unconfigured and uninstalled... But I suppose that is already happening*, so it wouldn't change much from the current situation either way?

* i.e. when using

It seems unlikely that anything will replace NFS, at least for the enterprise.
It seemed like a simpler solution, even if it only solved half the problem (making life easier when not using it). My thinking was that there was room for both options: loading some plugins and setting up a storage backend proxy... The current workaround was defining different backends in different binaries.
Yes. As long as HTTP and Redis backends are kept in the ccache source tree, http and redis support would be just as enabled or disabled as they are with the current backend model.
Why do you believe that? And do you mean that this has any implications on how non-file ccache backends should work?
OK. I think that sounds more complex than my proposal, not simpler.
Right. Just to be clear: what I'm trying to describe in this issue is a design that I feel would be a "real" solution, not a workaround.
This connects our CMake builds to a [ccache](https://ccache.dev/) hosted in a GCS bucket. `ccache` newly (ish) supports using remote storage for the cache! Currently it only supports Redis, FTP, and HTTP. HTTPS is *not* supported right now, but there are plans to add an HTTPS backend, as well as potentially a direct GCS backend (see ccache/ccache#1214).

I think this adds a little bit of overhead for the network requests, potentially increasing the time for building with a completely cold cache. An example `build_all` job with a completely cold cache took 13.2 minutes for the entire job, 10 minutes for just the build step, of which 6.1 minutes was spent in the actual `cmake --build` command (not including builds of the `install` or `iree-test-deps` targets, which don't involve building C++): https://github.com/iree-org/iree/actions/runs/3562697821/jobs/5984663663

Going through that commit's ancestors on the main branch, this looks like it's adding about 30±30 seconds to the build, using the statistical technique of "eyeballing". We get wins on the flip side though, where with a fully cached build, the times are 6.3m, 3.8m, 1.6m. The impact is even bigger with asan, where we see the same ~50% improvement on the already-slower build. Unfortunately, since ccache is a language-specific cache, we can't do the same trick with all the test artifacts.

The lack of HTTPS support does present somewhat of a problem because GCP doesn't allow using unsecured HTTP for many API access scopes. I ran into trouble with this when trying to get things to work locally because the local gcloud credentials for a user account usually have very broad scope (see discussion in ccache/ccache#1001). But it *does* work fine on our GCP VMs since those service accounts have much more limited permissions. Luckily, we don't actually want users writing to the cache, so this mostly just impacted me setting it up.
I also tried [sccache](https://github.com/mozilla/sccache), which has a GCS backend, but configuring the backend locally was pretty janky (see mozilla/sccache#144 (comment)). I ultimately went with ccache since it's the much more established project and it seems like there's quite a bit of design work going into making it work well. ccache also supports two caching layers (indeed this is the standard setup), so devs could make use of the remote cache by setting a single config/env variable to point at it and continue using their local ccache as well. This will of course only work as long as their local machine is sufficiently similar to the docker containers or they choose to build within docker containers. Co-authored-by: Scott Todd <[email protected]>
My suggestion was to use msgpack, which was an alternative to jsonrpc or protobuf.
It didn't require any special library, just a header-only implementation (see languages). By defining custom serialization for the ccache classes*, it was quite efficient to use.
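To make the serialization idea concrete, here is a toy sketch of a few msgpack encodings, following the public msgpack format specification. It only illustrates why the format is compact and simple to implement header-only; a real backend would use a full msgpack library plus custom serializers for ccache's own types (this function and its name are my own, not from the comment above).

```python
import struct

def msgpack_encode(value):
    """Encode a tiny subset of msgpack: positive fixint, fixstr, bin 8.

    Illustrative only; covers just enough of the format to show how
    small the wire representation is for typical key/value payloads.
    """
    if isinstance(value, int) and 0 <= value <= 0x7F:
        # positive fixint: the value itself is the single encoded byte
        return struct.pack("B", value)
    if isinstance(value, str) and len(value.encode()) <= 31:
        data = value.encode()
        # fixstr: 0xa0 | length, followed by the UTF-8 bytes
        return struct.pack("B", 0xA0 | len(data)) + data
    if isinstance(value, (bytes, bytearray)) and len(value) <= 0xFF:
        # bin 8: 0xc4, one length byte, then the raw payload
        return b"\xc4" + struct.pack("B", len(value)) + bytes(value)
    raise ValueError("type/size not covered by this sketch")
```

A small integer or short string costs one byte of overhead at most, which is why msgpack compares favorably with JSON-RPC for a chatty local protocol.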
Looking forward to seeing the new implementation; here was the old PoC one that I did:
Hi! I came upon this issue when thinking about something similar. I recently added support for our internal CDN/cache to ccache internally, which works just fine - but it would never be anything we could or would upstream. There is also a problem in that our init cost is pretty high. So I was thinking that something in line with the suggestion above would solve both my problems:
For Windows - it seems like AF_UNIX is supported in recent versions of Windows 10 and in Windows 11. That seems like it should be usable, but I have never used it myself. Named pipes are the other option. I can vouch for rpclib though; it was initially developed by a former co-worker of mine and I know it's pretty solid. The upside with this approach is that the socket abstraction is in rpclib instead of having to implement all that, which can be pretty messy. Let me know if there is anything I can do to help push this initiative forward.
A small update from our side here: I wrote a small webserver using httplib that integrates with our internal CDN. With a few simple methods I now have basically what we outlined above, but over TCP/HTTP instead of a Unix socket. This works fine; performance seems to be decent because the big latency in my case will be pulling from the CDN in any case. Just an option to consider, since it was really easy to do and a "skeleton" using httplib could easily be developed for this purpose.
Background
Ccache currently has file, HTTP and Redis remote storage backends. The file and HTTP backends do not depend on any external libraries. The Redis backend depends on the small and ubiquitous Hiredis library. The remote storage backends are part of the ccache source tree and are compiled and linked statically with the ccache executable.
It would be very nice to support more protocols, like HTTPS (#890, #894), Redis over TLS (#902), Azure Blob Storage (#1152), AWS S3 (#1201), Google Cloud Storage (in case the HTTP/HTTPS backend does not suffice) and other cloud services and custom backends.
My approach has been to start out with only bundling backends that have no external dependencies, or only external dependencies that are ubiquitous and small enough. One reason for this is that I want to keep the startup of the ccache executable fast. For instance, linking with libcurl makes the startup a factor of 4 slower on my system.

Another aspect is that I would prefer not to have to maintain code that I can't easily test myself, such as backends for various cloud services. It would be much better if the people who are interested in a backend are the ones who maintain the code. (It's currently only me who is maintaining ccache, and my spare time is not exactly abundant.)

And a third aspect is one of distribution: I would like to be able to distribute a ccache package (for instance as part of a Linux distribution) that does not depend on libraries that are not needed for the basic use case (i.e., not using remote storage) and then have support for different remote storage backends in optional add-on packages. This is partly why I have been reluctant to add optional (at compile time) HTTPS support: I want a solution that does not depend on compile-time choices.
Another problem with the current backend framework is that ccache can't keep connections alive and thus cannot reuse sessions that are costly to set up.
Proposal
As mentioned in #894 (reply in thread), I propose that we make ccache automatically start a long-lived protocol-specific helper process (if not already started) and communicate with it over a Unix socket.
Here is a rough design sketch of how it could work, taking HTTPS as an example protocol:
1. The user configures remote storage as usual, e.g. `remote_storage = https://user:[email protected]/path|param=value`.
2. ccache derives a socket path `${CACHE_TEMPDIR}/backend-<name>.sock`, where `<name>` is a unique hash of the URL and applicable parameters.
3. If the socket does not exist, ccache looks for an executable called `ccache-backend-https` in some (configurable) libexec location. Maybe also check in `$PATH`?
4. ccache starts `ccache-backend-https` as a background (daemon) process and passes the socket path, the URL and other configuration as environment variables.

Advantages:

* Since ccache automatically starts the `ccache-backend-<protocol>` executable, there is no need to install, configure, start and monitor a separate daemon process. Things will Just Work with the same `remote_storage` configuration as before.

If this is implemented, the existing HTTP and Redis backends would be converted to the new mechanism. The file backend would still be kept as is.
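The socket-naming step of the proposal can be sketched as follows. The proposal only says the name is "a unique hash of the URL and applicable parameters"; the specific hash function (truncated SHA-256) and the function name here are illustrative assumptions, not what ccache will necessarily implement:

```python
import hashlib
import os

def backend_socket_path(remote_storage_url, cache_tempdir):
    """Derive the helper daemon's socket path from the backend URL.

    Hashing the full configuration string means identical configurations
    map to the same socket, so concurrent ccache invocations share one
    long-lived helper; SHA-256 truncated to 16 hex chars is an
    illustrative choice only.
    """
    name = hashlib.sha256(remote_storage_url.encode()).hexdigest()[:16]
    return os.path.join(cache_tempdir, f"backend-{name}.sock")
```

Because the path is deterministic, step 3 of the sketch reduces to a single existence check on the socket before deciding whether to spawn the helper.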