
Use cache before running "go list" #1511

Closed
linzhp opened this issue Dec 24, 2019 · 19 comments

@linzhp
Contributor

linzhp commented Dec 24, 2019

Is your feature request related to a problem? Please describe.
When Athens handles "list" or "latest" requests, it always runs "go list -m -versions" to get the list of versions (https://github.com/gomods/athens/blob/master/pkg/module/go_vcs_lister.go#L44). The "go list" command in turn reaches out to the VCS to get that list. This is not optimal because:

  1. It provides no protection in case the VCS is down.
  2. It introduces unnecessary traffic to the VCS. Versions are released infrequently, on the order of weeks for most packages, so most calls to list versions return the same result as the previous one.
  3. It slows down Athens: git ls-remote operations are often slow.

Describe the solution you'd like
We can have Athens cache the list of versions for each module. Upon receiving a list request for a specific module, Athens would always look up the version cache first, and return the list from the cache if the versions for that module were updated recently enough. Athens would only run go list -m -versions if there is no version cache for that module, or if the cache is older than a configurable age. Refreshing the list of versions could be configured as an async operation, i.e., it happens after returning the old list from the cache. When a VCS server is down, Athens would still serve a reasonably up-to-date list of versions to the client, while logging the unavailability of the VCS.

Describe alternatives you've considered
Alternatively, we can populate the module cache with all semantic versions of a module that "go list -m -versions" returns, and record the timestamp of the most recent update for the module. Then storage.List() would serve as the version cache. The downside is that the module cache may store some versions that users never requested.

Additional context
Privately hosted VCS servers may not be able to handle as much traffic as public hosting sites like GitHub. In addition, we need protection from any VCS issues.

We may be able to help implement this feature.

@marwan-at-work
Contributor

@linzhp that sounds good to me.

A couple of implementations I have in mind:

  1. The cache can live in the storage itself
  2. The cache can just be in memory using something like https://github.com/dgraph-io/ristretto (since we can memory-bind it)
  3. The cache can be a new cache-oriented storage type such as Redis or groupcache/memcache

Option 1 is the most straightforward but requires updating our storage interface and all of its implementations.

Option 2 sounds the easiest and also logical since list-data is discardable. However, memory management becomes an issue.

Option 3 is probably my least favorite although I'd be curious to know if anyone thinks that it has some benefits that are not obvious.

@xytan0056
Contributor

> The cache can just be in memory using something like https://github.com/dgraph-io/ristretto (since we can memory-bind it)

One downside of this is that it won't work in a distributed environment. Because an in-memory cache doesn't consult its peers, there's a chance that users see different results for the same request.

@linzhp
Contributor Author

linzhp commented Jan 14, 2020

That's a good point. Also, Option 2 means each node needs to request its own copy of the list from the VCS, which involves more VCS traffic than Option 1. However, updating the storage interface and all its implementations sounds scary...

@marwan-at-work
Contributor

One thing to realize is the following:

The "go list" command only runs in two endpoints: <import-path>/@v/list and <import-path>/@latest.

In other words, if you have a resolved go.mod file and you run go build/get/etc., go list should never be called.

Those two endpoints are only ever called when you are onboarding a new module to your system.

In a CI/CD environment, go list might only be called if you're installing a Go binary on the fly, something like go get github.com/static/analysis/vet -- but IMHO, even in those cases you should probably pin a specific version, such as go get github.com/static/analysis/[email protected].

Therefore, the remaining use case, as far as I can see, is when you're adding a new module during development, and in that case you might want the most up-to-date list, not an outdated-but-cached one.

This has mainly been the reason go list is not cached yet. At least personally, I rarely need go list to be cached, and by the time I do need go list, I certainly hope that it's not cached.

Furthermore, the TTL on go list has to be considerably short (no more than 5 minutes), because it can be a really annoying experience to introduce a new version of your module and not have it show up for more than 5 minutes.

I do recognize that it's a slow operation and caching it can lead to a nicer experience. But I'd love to hear a compelling reason for why speeding up this endpoint is a significant win.

@linzhp
Contributor Author

linzhp commented Jan 15, 2020

A bit of context:

We generate Go code on the fly at build time. When we generate Go packages from the proto files of Apache Mesos, the import path becomes github.com/apache/mesos/include/mesos. With Go 1.12, go mod tidy will download the latest version of github.com/apache/mesos without adding it to the go.mod file, because it doesn't contain any Go files. The next time we run go mod tidy, it still looks for the latest version. As a result, Athens keeps calling go list again and again. Because we run go mod tidy on CI jobs to check the sanity of our go.mod file, this results in heavy traffic to the VCS.

With Go 1.13, go mod tidy returns an error for github.com/apache/mesos/include/mesos, so we have to figure out a solution for handling these generated packages. Once we do, go mod tidy will no longer request the latest version of github.com/apache/mesos. We can wait until we upgrade to Go 1.13 and deploy our solution, then monitor the VCS traffic.

> Furthermore, the TTL on go list has to be considerably short (no more than 5 minutes), because it can be a really annoying experience to introduce a new version of your module and not have it show up for more than 5 minutes.

A new commit can take quite a while, maybe an hour, to become available from proxy.golang.org though. We can also make this cache feature configurable, so users can turn it off in their Athens instances.

@marwan-at-work
Contributor

@linzhp thanks for the context. The go mod tidy situation you described is very interesting. I don't have a full grasp of what's going on there, but I'd love to dig in a bit more, probably outside the context of this issue.

But to stay on topic, yes I imagine the cache will be configurable and most likely opt-in. What we can do is introduce it as an optional interface, similar to how http.ResponseWriter might also be an http.Flusher.

This way, we can check if a storage does implement a ListCacher and if so, we use it. If not, and the user has opted-in, we can fail or warn.
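The optional-interface check could look roughly like this (a sketch; the simplified Storage interface and the ListCacher name are assumptions for illustration, not Athens' actual types):

```go
package main

import "fmt"

// Storage is a simplified stand-in for Athens' storage backend interface.
type Storage interface {
	List(module string) ([]string, error)
}

// ListCacher is the hypothetical optional interface discussed above,
// analogous to how an http.ResponseWriter may also be an http.Flusher.
type ListCacher interface {
	CachedList(module string) ([]string, bool)
}

// memStorage is a toy backend that happens to implement both interfaces.
type memStorage struct{ versions map[string][]string }

func (m memStorage) List(mod string) ([]string, error) { return m.versions[mod], nil }
func (m memStorage) CachedList(mod string) ([]string, bool) {
	v, ok := m.versions[mod]
	return v, ok
}

// listVersions prefers the cache when the backend supports it,
// and falls back to the plain List call otherwise.
func listVersions(s Storage, mod string) ([]string, error) {
	if lc, ok := s.(ListCacher); ok {
		if v, hit := lc.CachedList(mod); hit {
			return v, nil
		}
	}
	return s.List(mod)
}

func main() {
	s := memStorage{versions: map[string][]string{"example.com/m": {"v1.0.0", "v1.1.0"}}}
	v, _ := listVersions(s, "example.com/m")
	fmt.Println(v) // prints "[v1.0.0 v1.1.0]"
}
```

A backend that never implements ListCacher still satisfies Storage unchanged, which is what makes this pattern attractive for an opt-in feature.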

@avivdolev

Another consideration: completely offline environments.
We described this in #1506.

> This way, we can check if a storage does implement a ListCacher and if so, we use it. If not, and the user has opted-in, we can fail or warn.

@marwan-at-work I think that even without implementing the ListCacher interface-to-be, these endpoints should return the local catalog of versions instead of an error when the VCS is unavailable, or at least have an optional configuration to do so.
That would enable pre-loading the storage manually.
Could this be implemented by calling the storage.Cataloger interface (if it exists) and manipulating its output?

@marwan-at-work
Contributor

marwan-at-work commented Feb 7, 2020

Sorry everyone, work has been busy. I'll get back to reading and answering as soon as I can.

But I just thought of an idea that would also unblock anyone here who needs "offline mode" or "partial responses":

You can create a side-car GOPROXY that you set "GoBinaryEnvVars" to point to.

The GOPROXY would do one of two things:

  1. For offline mode, never run "go list"; just always return an empty response. For "go mod download", return an error, because if something is not in storage and you're offline, there's no way to fetch it upstream.

  2. For partial answers, always run "go list", but return an empty array if "go list" fails.

Now the question becomes: is that solution too much to implement for users?

@linzhp this should definitely unblock you without needing an Athens fork, but would you find it too much to maintain and prefer a config option instead?

The implementation for the side-car GOPROXY should be fairly trivial as far as I can tell.

@linzhp
Contributor Author

linzhp commented Feb 7, 2020

After upgrading to Go 1.13 and properly handling packages like github.com/apache/mesos, we no longer have this performance issue, so we removed our custom logic related to the list API. Closing this one.

@linzhp linzhp closed this as completed Feb 7, 2020
@arschles
Member

@linzhp are you able now to get onto the mainline Athens branch?

@linzhp
Contributor Author

linzhp commented Feb 11, 2020

Not yet. We still have to fork in order to:

@arschles
Member

Got it. For #1450, I'll have another look and see if we can fix it faster. For metrics, if we can help, let me know. For your internal storage API, in the past we've tried to implement a generic storage driver: one attempt was based on gRPC and the other was an HTTP-based implementation. Would either of those help?

@linzhp
Contributor Author

linzhp commented Feb 11, 2020

Our internal storage API comes with its own Go client, so we implemented a storage.Backend that calls that client and added a case here. I am not sure if there is a way to plug in a new storage.Backend without changing Athens' codebase.

I will work with @xytan0056 to identify the gaps in metrics.

@xytan0056
Contributor

We currently use tally to emit metrics data directly to an M3DB server. However, the latter doesn't seem to have a compatible exporter. We can probably do some tweaks around our infra, though; we need some time to research.

@arschles
Member

arschles commented Feb 15, 2020

@linzhp we have had some folks try to build new & more generic backends so they could also run Athens with their specific backend without recompiling Athens. Tons of reading on this if you're interested: #1110, #1131, #1459, #1130, #1353. Somewhere in one of those PRs, we discussed a gRPC based API that Athens can use to performantly talk to a storage server.

If that would work for you, we can try to make something like that happen. I know that one of those PRs would also welcome it.

@linzhp
Contributor Author

linzhp commented Feb 17, 2020

I think it's a good idea to have a plugin architecture to support custom storage backends without forking Athens. Keep me posted on the progress.

@arschles
Member

@linzhp will do. We would like some details on what specific functionality you would need to be able to use it at your company. For example, if we did a straightforward HTTP API that matches the Go proxy download API, would that be enough for you?

@xytan0056
Contributor

xytan0056 commented Feb 20, 2020

@arschles #1131 is actually what we considered previously. However, when calling our internal storage over raw HTTP, we must set some specific parameters in headers or query params, which involves modifying Go code with pieces specific to our company. It's possible to make the headers Athens sends configurable, but I'm not sure how to do that right (at the least, make them customizable in the TOML config).
Another approach would be to use the S3 client directly, but then we'd need to hardcode the credentials in config.toml, which raises security concerns.

@arschles
Member

@xytan0056 what if you used environment variables to configure the S3 credentials?
