Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unix domain socket support to HTTP transport #1681

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lcfyi
Copy link

@lcfyi lcfyi commented Feb 21, 2024

Changes since v1:

  • Updated test to use Perl instead of nc to proxy between UDS and TCP socket; I chose not to split this out into a library since its use is hyper-specific and has a dependency on lib-httpd.sh

cc: Eric Wong [email protected]
cc: Leslie Cheng [email protected]

Copy link

Welcome to GitGitGadget

Hi @lcfyi, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that either:

  • Your Pull Request has a good description, if it consists of multiple commits, as it will be used as cover letter.
  • Your Pull Request description is empty, if it consists of a single commit, as the commit message should be descriptive enough by itself.

You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <[email protected]>, Ill Takalook <[email protected]>

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

Copy link

There are issues in commit 40c6598:
Add CURLOPT_UNIX_SOCKET_PATH support to http.c
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 44fc8b9:
Add tests for unixsocket.
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 3c8aa87:
Update documentation.
Commit checks stopped - the message is too short
Commit not signed off

@lcfyi lcfyi force-pushed the lcfyi/add-unix-socket-support branch 2 times, most recently from ff2e060 to 3e53163 Compare February 21, 2024 08:20
@dscho
Copy link
Member

dscho commented Feb 21, 2024

/allow

Copy link

User lcfyi is now allowed to use GitGitGadget.

@lcfyi
Copy link
Author

lcfyi commented Feb 21, 2024

/preview

Copy link

Preview email sent as [email protected]

@lcfyi
Copy link
Author

lcfyi commented Feb 21, 2024

/submit

Copy link

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1681/lcfyi/lcfyi/add-unix-socket-support-v1

To fetch this version to local tag pr-git-1681/lcfyi/lcfyi/add-unix-socket-support-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1681/lcfyi/lcfyi/add-unix-socket-support-v1

Copy link

On the Git mailing list, Eric Wong wrote (reply to this):

Leslie Cheng via GitGitGadget <[email protected]> wrote:

> Subject: Re: [PATCH] Add unix domain socket support to HTTP transport.

No need for trailing `.' in commit message titles

<snip>

> @@ -455,6 +458,20 @@ static int http_options(const char *var, const char *value,
>  		return 0;
>  	}
>  
> +	if (!strcmp("http.unixsocket", var)) {
> +#ifdef GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH
> +#ifndef NO_UNIX_SOCKETS
> +		return git_config_string(&curl_unix_socket_path, var, value);
> +#else
> +		warning(_("Unix socket support unavailable in this build of Git"));
> +		return 0;
> +#endif
> +#else
> +		warning(_("Unix socket support is not supported with cURL < 7.40.0"));
> +		return 0;
> +#endif
> +	}

Personally, I'd hoist the #ifdef part into a standalone function
since I find mixing CPP and C conditionals confusing.

disclaimer: I'm an easily confused person and don't usually
program in C, though.

<snip>

> --- /dev/null
> +++ b/t/t5565-http-unix-domain-socket.sh

<snip>

> +    nc -klU "$socket_path" <tcp_to_uds >uds_to_tcp &
> +    UDS_PID=$!
> +
> +    nc "$host" "$port" >tcp_to_uds <uds_to_tcp &

`nc' isn't widely installed, its supported flags vary between
implementations, and our test suite doesn't currently use it.
I suggest either using a small bit of Perl or writing a t/helper
program to do its job.

Finally, hard tabs should be used for indentation throughout.

I'll wait on others to comment since I haven't looked at git
hacking in a while.

Anyways, I think this feature could be useful for me, too :>
Thanks.

Copy link

User Eric Wong <[email protected]> has been added to the cc: list.

@lcfyi lcfyi force-pushed the lcfyi/add-unix-socket-support branch from 3e53163 to e4e0418 Compare February 22, 2024 02:43
@lcfyi lcfyi changed the title Add unix domain socket support to HTTP transport. Add unix domain socket support to HTTP transport Feb 22, 2024
@lcfyi lcfyi force-pushed the lcfyi/add-unix-socket-support branch from e4e0418 to 8548efc Compare February 22, 2024 03:03
Copy link

On the Git mailing list, Leslie Cheng wrote (reply to this):

> No need for trailing `.' in commit message titles

Will fix in the next patch, sorry!


> Personally, I'd hoist the #ifdef part into a standalone function
> since I find mixing CPP and C conditionals confusing.
> > disclaimer: I'm an easily confused person and don't usually
> program in C, though.

I considered extracting it out, but the other conditionals in this function follow a similar pattern so I didn't want to change it. However, my use here is also the first time there's both an #ifdef and nested #ifndef, which I agree makes it a bit confusing to grok.

I'm open to changing it, but I'll let it sit and marinate for a bit.


> `nc' isn't widely installed, its supported flags vary between
> implementations, and our test suite doesn't currently use it.
> I suggest either using a small bit of Perl or writing a t/helper
> program to do its job.
> > Finally, hard tabs should be used for indentation throughout.
> > I'll wait on others to comment since I haven't looked at git
> hacking in a while.
> > Anyways, I think this feature could be useful for me, too :>
> Thanks.

Good catch, I'll fix in the next patch. I've subbed `nc` out for a simple Perl script to pipe back and forth, just making sure CI is happy about this before submitting.

Copy link

User Leslie Cheng <[email protected]> has been added to the cc: list.

This changeset introduces an `http.unixSocket` option so that users can
proxy their git over HTTP remotes to a unix domain socket. In terms of
why, since UDS are local and git already has a local protocol: some
corporate environments use a UDS to proxy requests to internal resources
(ie. source control), so this change would support those use-cases. This
proxy can occasionally be necessary to attach MFA tokens or client
certificates for CLI tools.

The implementation leverages `--unix-socket` option [0] via the
`CURLOPT_UNIX_SOCKET_PATH` flag available with libcurl [1].

`GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH` and `NO_UNIX_SOCKETS` were kept
separate so that we can spit out better error messages for users if git
was compiled with `NO_UNIX_SOCKETS`.

[0] https://curl.se/docs/manpage.html#--unix-socket
[1] https://curl.se/libcurl/c/CURLOPT_UNIX_SOCKET_PATH.html

Signed-off-by: Leslie Cheng <[email protected]>
@lcfyi lcfyi force-pushed the lcfyi/add-unix-socket-support branch from 8548efc to 2af5cc8 Compare February 23, 2024 01:22
@lcfyi
Copy link
Author

lcfyi commented Feb 23, 2024

/submit

Copy link

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1681/lcfyi/lcfyi/add-unix-socket-support-v2

To fetch this version to local tag pr-git-1681/lcfyi/lcfyi/add-unix-socket-support-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1681/lcfyi/lcfyi/add-unix-socket-support-v2

Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Leslie Cheng via GitGitGadget" <[email protected]> writes:

> Subject: Re: [PATCH v2] Add unix domain socket support to HTTP transport

Perhaps

	Subject: [PATCH] http: enable proxying via unix-domain socket

to follow the usual "<area>: <description>" format?

> From: Leslie Cheng <[email protected]>
>
> This changeset introduces an `http.unixSocket` option so that users can

"This changeset introduces" -> "Introduce".  There may be other
gotchas that might use help from Documentation/SubmittingPatches,
but I didn't read too carefully.

Besides, it is a single patch, not a set of changes ;-).

`http.unixSocket` is a configuration variable.  It may be confusing
to use the word "option".  Speaking of options, shouldn't there be a
command line option that overrides the configured value?

We should honor the usual http.<url>.VARIABLE convention where
http.<url>.VARIABLE that is destination-specific overrides a more
generic http.VARIABLE configuration variable.

> proxy their git over HTTP remotes to a unix domain socket. In terms of
> why, since UDS are local and git already has a local protocol: some
> corporate environments use a UDS to proxy requests to internal resources
> (ie. source control), so this change would support those use-cases. This

"ie." -> "i.e.,"?

> proxy can occasionally be necessary to attach MFA tokens or client
> certificates for CLI tools.
>
> The implementation leverages `--unix-socket` option [0] via the
> `CURLOPT_UNIX_SOCKET_PATH` flag available with libcurl [1].

There is a feature in libcURL library, that is enabled by setting
the CURLOPT_UNIX_SOCKET_PATH option via the curl_easy_setopt() call,
and their command line utility.  You do the same to implement this
feature.  But when you are not adding "--unix-socket" option to any
of our commands, mention of that option name makes it more confusing
than necessary.

The usual way to compose a log message of this project is to

 - Give an observation on how the current system work in the present
   tense (so no need to say "Currently X is Y", just "X is Y"), and
   discuss what you perceive as a problem in it.

 - Propose a solution (optional---often, problem description
   trivially leads to an obvious solution in reader's minds).

 - Give commands to the codebase to "become like so".

in this order.

How about following that convention, perhaps like:

    In some corporate environments, the proxy server listens to a
    local unix domain socket for requests, instead of listening to a
    network port.  Even though we have http.proxy (and more
    destination specific http.<url>.proxy) configuration variables
    to specify the network address/port of a proxy, that would not
    help if your proxy does not listen to the network.

    Introduce an `http.unixSocket` (and `http.<url>.unixSocket`)
    configuration variables that specify the path to a unix domain
    socket for such a proxy.  Recent versions of libcURL library
    added CURLOPT_UNIX_SOCKET_PATH to support "curl --unix-socket
    <path>"---use the same mechanism to implement it.

> `GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH` and `NO_UNIX_SOCKETS` were kept
> separate so that we can spit out better error messages for users if git
> was compiled with `NO_UNIX_SOCKETS`.

Unlike NO_UNIX_SOCKETS, GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH is
entirely internal to your implementation and not surfaced to neither
the end-users or the binary packagers.  Because of that, I suspect
that any description that has to use that name probably falls on the
other side of "too much implementation details" to be useful to help
future developers..

Besides, I suspect that GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH might
not be the optimum approach.  See below.

> diff --git a/Documentation/config/http.txt b/Documentation/config/http.txt
> index 2d4e0c9b869..bf48cbd599a 100644
> --- a/Documentation/config/http.txt
> +++ b/Documentation/config/http.txt
> @@ -277,6 +277,11 @@ http.followRedirects::
>  	the base for the follow-up requests, this is generally
>  	sufficient. The default is `initial`.
>  
> +http.unixSocket::
> +	Connect through this Unix domain socket via HTTP, instead of using the
> +	network. If set, this config takes precendence over `http.proxy` and
> +	is incompatible with the proxy options (see `curl(1)`).

Talking about precedence between this and http.proxy is good thing,
but one very important piece of information is missing.  What value
does it take?

	The absolute path of a unix-domain socket to pass the HTTP
	traffic over, instead of using the network.

or something, perhaps?

> diff --git a/git-curl-compat.h b/git-curl-compat.h
> index fd96b3cdffd..f0f3bec0e17 100644
> --- a/git-curl-compat.h
> +++ b/git-curl-compat.h
> @@ -74,6 +74,13 @@
>  #define GIT_CURL_HAVE_CURLE_SSL_PINNEDPUBKEYNOTMATCH 1
>  #endif
>  
> +/**
> + * CURLOPT_UNIX_SOCKET_PATH was added in 7.40.0, released in January 2015.
> + */
> +#if LIBCURL_VERSION_NUM >= 0x074000
> +#define GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH 1
> +#endif

The "HAVE" part in GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH is a
statement of a fact.  If the version of cURL library we have is
certain value, we have it.  OK.

> diff --git a/http.c b/http.c
> index e73b136e589..8cfdcaeac82 100644
> --- a/http.c
> +++ b/http.c
> @@ -79,6 +79,9 @@ static const char *http_proxy_ssl_ca_info;
>  static struct credential proxy_cert_auth = CREDENTIAL_INIT;
>  static int proxy_ssl_cert_password_required;

It might make the code easier to follow if you did:

	#if !defined(NO_CURLOPT_UNIX_SOCKET_PATH) && !defined(NO_UNIX_SOCKETS)
	#if defined(GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH)
        #define USE_CURLOPT_UNIX_SOCKET_PATH
	#endif
	#endif
        
The points are

 (1) the users can decline to use CURLOPT_UNIX_SOCKET_PATH while
     still using unix domain socket for other purposes, and

 (2) you do not have to care if you HAVE it or not most of time;
     what matters more often is if the user told you to USE it.

Hmm?

> +#if defined(GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH) && !defined(NO_UNIX_SOCKETS)
> +static const char *curl_unix_socket_path;
> +#endif

The guard here would become "#ifdef USE_CURLOPT_UNIX_SOCKET_PATH" if
we wanted this to be conditional, but I think it is easier to make
the variable unconditionally available; see below.

> @@ -455,6 +458,20 @@ static int http_options(const char *var, const char *value,
>  		return 0;
>  	}
>  
> +	if (!strcmp("http.unixsocket", var)) {
> +#ifdef GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH
> +#ifndef NO_UNIX_SOCKETS
> +		return git_config_string(&curl_unix_socket_path, var, value);
> +#else
> +		warning(_("Unix socket support unavailable in this build of Git"));
> +		return 0;
> +#endif
> +#else
> +		warning(_("Unix socket support is not supported with cURL < 7.40.0"));
> +		return 0;
> +#endif
> +	}

In general, it is inadvisable to issue a warning in the codepath
that parses configuration variables, as the values we read may not
be necessarily used.  We could instead accept the given path into a
variable unconditionally, and complain only before it gets used,
near the call to curl_easy_setopt().

>  	if (!strcmp("http.cookiefile", var))
>  		return git_config_pathname(&curl_cookie_file, var, value);
>  	if (!strcmp("http.savecookies", var)) {
> @@ -1203,6 +1220,12 @@ static CURL *get_curl_handle(void)
>  	}
>  	init_curl_proxy_auth(result);
>  
> +#if defined(GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH) && !defined(NO_UNIX_SOCKETS)
> +	if (curl_unix_socket_path) {
> +		curl_easy_setopt(result, CURLOPT_UNIX_SOCKET_PATH, curl_unix_socket_path);
> +	}
> +#endif

Here, the guard may become more like

		if (curl_unix_socket_path) {
	#ifdef USE_CURLOPT_UNIX_SOCKET_PATH
			curl_easy_setopt(...);
	#elif defined(NO_CURLOPT_UNIX_SOCKET_PATH) || defined(NO_UNIX_SOCKETS)
			warning(_("this build disables the unix-domain-socket feature"));
	#elif
			warning(_("your cURL library is too old"));
	#endif
		}

Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <[email protected]> writes:

> "Leslie Cheng via GitGitGadget" <[email protected]> writes:
>
>> Subject: Re: [PATCH v2] Add unix domain socket support to HTTP transport
>
> Perhaps
>
> 	Subject: [PATCH] http: enable proxying via unix-domain socket
>
> to follow the usual "<area>: <description>" format?
>
>> From: Leslie Cheng <[email protected]>
>>
>> This changeset introduces an `http.unixSocket` option so that users can
>
> "This changeset introduces" -> "Introduce".  There may be other
> gotchas that might use help from Documentation/SubmittingPatches,
> but I didn't read too carefully.
>
> Besides, it is a single patch, not a set of changes ;-).
>
> `http.unixSocket` is a configuration variable.  It may be confusing
> to use the word "option".  Speaking of options, shouldn't there be a
> command line option that overrides the configured value?
>
> We should honor the usual http.<url>.VARIABLE convention where
> http.<url>.VARIABLE that is destination-specific overrides a more
> generic http.VARIABLE configuration variable.

Clarification.  I know the above is automatically achieved, given
the way we have laid urlmatch foundation to allow easy parsing for
configuration variables structured this way.  I did not mean that
you'd need to do anything special; rather, I meant that we should
advertise that we do in the commit log message.

Copy link

On the Git mailing list, Leslie Cheng wrote (reply to this):

On 2024-02-23 12:37 a.m., Junio C Hamano wrote:
> How about following that convention, perhaps like:
>
>     In some corporate environments, the proxy server listens to a
>     local unix domain socket for requests, instead of listening to a
>     network port.  Even though we have http.proxy (and more
>     destination specific http.<url>.proxy) configuration variables
>     to specify the network address/port of a proxy, that would not
>     help if your proxy does not listen to the network.
>
>     Introduce an `http.unixSocket` (and `http.<url>.unixSocket`)
>     configuration variables that specify the path to a unix domain
>     socket for such a proxy.  Recent versions of libcURL library
>     added CURLOPT_UNIX_SOCKET_PATH to support "curl --unix-socket
>     <path>"---use the same mechanism to implement it.

This is excellent, thanks for the guidance (and all the other
suggestions prior)! I'll update in the next patch.


> Unlike NO_UNIX_SOCKETS, GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH is
> entirely internal to your implementation and not surfaced to neither
> the end-users or the binary packagers.  Because of that, I suspect
> that any description that has to use that name probably falls on the
> other side of "too much implementation details" to be useful to help
> future developers..

That's reasonable, I figured it would fit as a cover letter detail but I
agree it's not relevant as a commit message that lives in the history of
the project. I'll also update this in the next patch.


> Talking about precedence between this and http.proxy is good thing,
> but one very important piece of information is missing.  What value
> does it take?
>
> 	The absolute path of a unix-domain socket to pass the HTTP
> 	traffic over, instead of using the network.
>
> or something, perhaps?

I like that wording, I'll update in the next patch.


> It might make the code easier to follow if you did:
>
> 	#if !defined(NO_CURLOPT_UNIX_SOCKET_PATH) && !defined(NO_UNIX_SOCKETS)
> 	#if defined(GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH)
>         #define USE_CURLOPT_UNIX_SOCKET_PATH
> 	#endif
> 	#endif
>
> The points are
>
>  (1) the users can decline to use CURLOPT_UNIX_SOCKET_PATH while
>      still using unix domain socket for other purposes, and
>
>  (2) you do not have to care if you HAVE it or not most of time;
>      what matters more often is if the user told you to USE it.
>
> Hmm?

Do you think this functionality is worth adding another macro to
conditionally include it in the build? It felt out-of-the-way enough
that we could just use the same `NO_UNIX_SOCKETS` macro to control for
environments that don't support unix domain sockets.


>> +#if defined(GIT_CURL_HAVE_CURLOPT_UNIX_SOCKET_PATH) &&
!defined(NO_UNIX_SOCKETS)
>> +static const char *curl_unix_socket_path;
>> +#endif
>
> The guard here would become "#ifdef USE_CURLOPT_UNIX_SOCKET_PATH" if
> we wanted this to be conditional, but I think it is easier to make
> the variable unconditionally available; see below.

Agreed in general, I was looking to other patterns for conditional
variables in file, e.g.
https://github.com/git/git/blob/3c2a3fdc388747b9eaf4a4a4f2035c1c9ddb26d0/http.c#L66-L68

But, revisiting, this looks like an exception rather than the norm.


> In general, it is inadvisable to issue a warning in the codepath
> that parses configuration variables, as the values we read may not
> be necessarily used.  We could instead accept the given path into a
> variable unconditionally, and complain only before it gets used,
> near the call to curl_easy_setopt().

Similar to above, I followed what was already done for certain
configuration variables (e.g.
https://github.com/git/git/blob/3c2a3fdc388747b9eaf4a4a4f2035c1c9ddb26d0/http.c#L485-L501),
but I agree with your feedback since this would result in constant warnings.


To summarize, I'll do the following in the next patch:
 - change the wording of the commit message
 - move the conditional variables and parsing to a check at
`curl_easy_setopt()` time

I'm still undecided on whether I should introduce another macro
specifically for this functionality, and I'd like to hear your thoughts
on why it might make sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants