Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more VCS (gitlab, gitea, bitbucket) support to Zenith and Daggerverse #7218

Open
grouville opened this issue Apr 29, 2024 · 5 comments
Open
Assignees

Comments

@grouville
Copy link
Member

grouville commented Apr 29, 2024

Problem

Dagger modules and Daggerverse do not work with other SCM than GitHub.

For the past few months, several discussions have occurred around the best way to solve it. This is now a top-priority on the backlog. As no perfect solution exists, your opinion is extremely valuable to assess what behavior you would expect as a user

Context

The current way of calling a module ref is tightly tide to Github. The v1 followed the Go convention, but for speed and ease of implementation, was just implemented with GitHub in mind.

The main reason for that is because, with GitHub, it is easy to differentiate the root of the repo and the following subdir

Since then, a proposal was set for ref v2. This proposal has a wider scope than just the SCM distinction. It:

  • allows other SCM than GitHub

But, also tries to unify/englobe other related open questions:

To my current understanding, handling shorthand refs is the true unlock of the current ref v2 proposal. However, this introduces a new DX for the end user.

Solution(s)

There are currently two potential implementations solving the SCM and the unified DX for passing sub-directories of a git repo:

  1. the ref v2 (and any variation)

For which the full spec is defined here: https://github.com/vito/dagger/blob/refs-v2/rfcs/001-refs-v2.md

  • The main advantage is that it is a proposal with the potential to solve all above issues at once.

  • The main drawback is the introduction of the new UX for ref. We currently have received no complaints / requests for the long ref format, maybe keeping it as is for now is ok.

Some unknowns remain, especially finding an agreement on the previous blocking points:

  • the DX of the separator between the source and the path. In the proposal, it is : //
  • the scope of the implementation: do we want to follow the full spec ? If not, which parts first ?
  1. Piggybacking even more on the Go spec

Another alternative, if the shorthand ref is out of scope here, is to piggyback even more on Go: rely on the regex that Go does + the dynamic discovery (also used for the vanity URLs). It is well explained in https://github.com/vito/dagger/blob/refs-v2/rfcs/001-refs-v2.md#why-not-just-copy-go.

  • The main advantage of this alternative is to extend to new SCM without changing the current DX for refs.

Also, relying on the metadata used for the Go packages seems a sustainable bet. It is important enough for Go and so embedded in the language's logic that it is unlikely to change often.

Once the implementation to know where the split between the source and the subpath is found out, it could also be applicable for the git directory issue above, as proposed by Justin

  • The main drawback is that we don't know (yet) how complicated / feasible it is, and how much maintenance this would add us ? Also, this does not solve the shorthand implementation, and would tackle it in a subsequent iteration.

Related issues / PRs

Open Questions

  • What is your feeling around the current DX to reference a module with a given tag ?
  • Would you be comfortable changing for the proposed DX of the new ref:

We might not need all these, just throwing ideas around:

  • no scheme
  • A ref's scheme may be omitted, in which case it will be inferred based on
    whether the ref's source appears to refer to a remote authority.
    * For a source like github.com/... the remote default scheme will be
    used, e.g. git: or dv:.
  • For a source like foo the local default scheme will be used, e.g.
    file:, though it would also make sense to default this to something
    like oci: for image refs.
  • ., .., ./foo, ../foo, /foo, and foo are all interpreted as
    local refs.
  • file://foo/bar/baz
  • refers to a path on the local filesystem
  • git://github.com/foo/bar//baz
    * refers to the /baz subdirectory of the https://github.com/foo/bar
    repository
  • gh:vito/daggerverse/apko
  • shorthand for git://github.com/vito/daggerverse//apko
  • /apko is inferred as subpath as all GitHub repos are owner/repo
  • gh://git.acme.com/vito/daggerverse/apko
  • shorthand for git://git.acme.com/vito/daggerverse//apko
  • //git.acme.com indicates alternative GitHub host
  • /apko is inferred as subpath as all GitHub repos are owner/repo
  • dv:vito//apko
  • shorthand for git://github.com/vito/daggerverse//apko
  • dv:aweris/gale//gale
  • shorthand for git://github.com/aweris/gale//daggerverse/gale
  • dv://git.acme.com/vito//apko
  • shorthand for git://git.acme.com/vito/daggerverse//apko
  • dv://gitlab.com/group-1/group-2/group-3//apko
  • shorthand for git://gitlab.com/group-1/group-2/group-3/daggerverse//apko

Current status (04/29/2024)

Given the potential of the second solution to be a nice stopgap as well as keeping the DX pretty similar, I'm currently exploring this option the next two days. Basically a POC to see if this option is viable and not a rabbithole

@grouville grouville self-assigned this Apr 29, 2024
@sagikazarmark
Copy link
Contributor

As far as I can tell, the two solutions can co-exist (ie. if there is no scheme provided, use the Go-style automagic).

I like the simplicity of Go, but I wouldn't close the door on the other addressing scheme.

If the short-term goal is to quickly add support for other VCSs, going with the simple solution probably makes more sense, especially if it can be kept backwards compatible with the other one.

One thing that is important to me: whichever you go with, it should be backwards compatible.

BTW I really like the ref v2 proposal, it's well thought out, but it may not be necessary just yet.

@aweris
Copy link
Contributor

aweris commented Apr 30, 2024

Both solutions are valid and also think they can coexist together.

I would also choose solution 2 as low-hanging fruit to support other VCS and put ref v2 to a more long/medium term goal.

@jedevc
Copy link
Member

jedevc commented Apr 30, 2024

Personally, I'm very much onboard with trying option 2 first (with go-spec piggybacking) - and then potentially opening the gates to mod refs with option 1 later. Agreed with @sagikazarmark, I can see these co-existing, it also makes refs v2 way easier, since we wouldn't need some of the syntax to distinguish between projects/paths.

Implementing "something" that does the go-style "splitting" should be pretty easy, we'd just have some function that splits a url into project addr + path parts (and we might just be able to borrow the go impl directly). The problem to me is where such a function needs to live - if we just do this in the CLI/engine naively, then we get some non-optimal caching behavior (where we might need to make multiple HTTP requests to work out where the URL lives).

Ideally we'd have a way of getting this into our custom implementation of a GitSource:

type gitSource struct {
src source.Source
cache cache.Accessor
locker *locker.Locker
dns *oci.DNSConfig
}

That way, we should be able to get some better caching + scheduling behavior - but that could definitely be split out to be a separate piece of work, and doesn't feel necessary to a first pass.

@nipuna-perera
Copy link

I am curious,

Is this #6113 higher priority than this? I ask because 2 reasons

a. Most enterprise dagger use depends on it
b. A lot of other VCS provider code is typically not as public as GH. Enterprises heavily utilize BB, Gitlab etc and they are all behind auth.

I would go as far as to say #6113 is a prerequisite for this work to be effective.

As for the solution, I would like to see an implementation of solution 2 first and then 1.

@grouville
Copy link
Member Author

I would go as far as to say #6113 is a prerequisite for this work to be effective.

Ongoing work is happening in parallel too for that issue: #7067 👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants