Components with control characters don't appear in --json output, and non-urlencoded --get fails #262

Open
emanuele6 opened this issue Dec 1, 2023 · 3 comments


@emanuele6
Collaborator

$ ./trurl 'http://example.org/%18' --json | jq -c .
[{"url":"http://example.org/%18","parts":{"scheme":"http","host":"example.org"}}]
$ ./trurl 'http://example.org/%18' --urlencode --json | jq -c .
[{"url":"http://example.org/%18","parts":{"scheme":"http","host":"example.org","path":"/%18"}}]
$ ./trurl 'http://example.org/%18' -g {path}
trurl note: URL decode error, most likely because of rubbish in the input (path)


$ ./trurl 'http://example.org/%18' -g {:path}
/%18
@jacobmealey
Contributor

Something interesting I noticed is that it works for queries. I wonder if we're missing a memdupdec somewhere?

I'd bet I broke this in PR #214, but maybe it's been broken the whole time.

@jacobmealey
Contributor

jacobmealey commented Jan 22, 2024

This looks like behavior that comes from libcurl. I was able to get the same result with the following code. Should we open a ticket over there, or are we just overlooking something simple?

#include <curl/curl.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    CURL *curl;
    CURLU *url;
    CURLUcode uc;
    // libcurl allocates the result on success; release it with curl_free()
    char *path = NULL;
    const char *url_string = "http://example.org/%18";
    curl = curl_easy_init();
    url = curl_url();
    uc = curl_url_set(url, CURLUPART_URL, url_string, 0);
    if(uc) {
        printf("%s\n", curl_url_strerror(uc));
    }

    // do it how trurl does it: ask for the URL-decoded path
    uc = curl_url_get(url, CURLUPART_PATH, &path, CURLU_URLDECODE);
    if(uc) {
        printf("%s\n", curl_url_strerror(uc));
    } else {
        printf("%s\n", path);
    }
    // try with curl easy unescape
    int decode_len = 0;
    char *decoded = curl_easy_unescape(curl, url_string, (int)strlen(url_string), &decode_len);
    if(decoded) {
        printf("%s\n", decoded);
    }
    printf("length: %zu, amount decoded: %d\n", strlen(url_string), decode_len);
    // strings returned by libcurl must be freed with curl_free(), not free()
    curl_free(decoded);
    curl_free(path);
    curl_url_cleanup(url);
    curl_easy_cleanup(curl);
    return 0;
}

@jacobmealey
Contributor

jacobmealey commented Jan 22, 2024

Ahh, it could also be that %18 maps to the ASCII control character CAN (cancel). I'd bet curl doesn't play nice with decoding most control characters in the path. If you do it with %21 (either with trurl or with the example above), you get the following:

$ trurl http://example.org/%21 --get "{path}"    
/! 

After some more testing, I think you are just supposed to pass --urlencode for this scenario. Could we do something to hint at this to the user? Something like:

$ trurl http://example.org/%18 --get "{path}"     
trurl note: URL decode error, most likely because of rubbish in the input (path)
                  try again with --urlencode 
