URL without filename fails #4

Closed
ryandesign opened this issue Jul 4, 2024 · 5 comments

Comments

@ryandesign
Contributor

With wcurl 2024-07-02:

% wcurl https://github.com/Debian/     
curl: Remote file name has no length
curl: (23) Failed writing received data to disk/application

However with wget 1.24.5:

% wget https://github.com/Debian/
--2024-07-04 09:53:47--  https://github.com/Debian/
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

index.html          [  <=>          ] 245.03K  1.17MB/s    in 0.2s

2024-07-04 09:53:48 (1.17 MB/s) - ‘index.html’ saved [250913]
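
To make the difference concrete, here is a minimal sketch of how an output name could be derived from a URL, with a fallback when the URL ends in `/` or has no path at all. The function name `url_to_filename` is hypothetical and not part of wcurl, and the `index.html` fallback only mirrors what wget does in the transcript above; it is not necessarily what curl or wcurl will choose.

```shell
#!/bin/sh
# Hypothetical helper, not part of wcurl: pick an output name for a URL.
url_to_filename() {
    url=$1
    # Drop the fragment and the query string, if present.
    url=${url%%"#"*}
    url=${url%%\?*}
    # Drop the scheme, if any.
    case $url in
        *://*) rest=${url#*://} ;;
        *) rest=$url ;;
    esac
    # Separate the host from the path; a bare host has no path at all.
    case $rest in
        */*) path=${rest#*/} ;;
        *) path= ;;
    esac
    # Last path segment; empty when the path ends in "/".
    name=${path##*/}
    # Assumption: fall back to the name wget uses when nothing can be derived.
    printf '%s\n' "${name:-index.html}"
}

url_to_filename "https://github.com/Debian/"
url_to_filename "https://curl.se"
url_to_filename "https://example.com/dir/file.tar.gz"
```

The first two calls have no final path segment and so hit the fallback; the third keeps the name found in the URL.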
@bagder
Member

bagder commented Jul 4, 2024

There's a plan to fix this in curl, although saving it with a different filename than what wget picks: curl/curl#13988

@ryandesign
Contributor Author

ryandesign commented Jul 4, 2024

This works for me but please test:

https://salsa.debian.org/debian/wcurl/-/merge_requests/4

Note there is a new dependency on trurl.

@samueloph
Member

I'll try to keep the discussion about this issue on salsa, but if anyone would like to reply and doesn't have an account, feel free to do it here.

@BrianInglis

BrianInglis commented Sep 15, 2024

Works for me, but it saves the file as Debian, not Debian.html.
With just a host name, e.g. curl.se, it saves curl_response, not even curl-response.html. Better would be curl[-.]se.html or curl[-.]se[-.]index.html, either of which would beat the anonymous index.html that wget/wget2 pick! Added a similar comment to curl/curl#13988.
I just packaged wcurl as part of the standard Cygwin main package curl 8.10, so I'm trying to get ahead of users trying it out!
I describe wcurl and mention your home page in the announcement, so they could come here ;^>
No other Cygwin packagers commented on whether I should include it in curl, make it a subpackage of the curl source package, or package the wcurl source and "binary" separately, so I thought I would help out most users by giving out a free wcurl script and docs with every curl command line package. ;^>

@BrianInglis

wcurl could translate back from the media type (MIME type) in the response's content-type: header, for example:

$ curl -I curl.se
...
HTTP/2 200
server: nginx/1.21.1
content-type: text/html
...

to a file name suffix (extension) using the shared-mime-info data in /usr/share/mime/packages/freedesktop.org.xml, which lists glob patterns for each MIME type, for example:

$ awk '/<mime-type\stype="text\/html">/,/<\/mime-type>/' /usr/share/mime/packages/freedesktop.org.xml
  <mime-type type="text/html">
    <comment>HTML document</comment>
    <comment xml:lang="zh_TW">HTML 文件</comment>
    <comment xml:lang="zh_CN">HTML 文档</comment>
...
    <comment xml:lang="en_GB">HTML document</comment>
...
    <acronym>HTML</acronym>
    <expanded-acronym>HyperText Markup Language</expanded-acronym>
    <sub-class-of type="text/plain"/>
    <magic>
      <match type="string" value="&lt;!DOCTYPE HTML" offset="0:256"/>
...
    </magic>
    <magic priority="40">
      <match type="string" value="&lt;!--" offset="0"/>
      <match type="string" value="&lt;TITLE" offset="0:256"/>
      <match type="string" value="&lt;title" offset="0:256"/>
    </magic>
    <glob pattern="*.html" weight="80"/>
    <glob pattern="*.htm" weight="80"/>
  </mime-type>

The code could be something equivalent to this awk command:

$ awk '/<mime-type\s+type="[^"]+"[^>]*>/,/<\/mime-type>/ {
  # index() compares literally, so "+" or "." in the MIME type is not read as regex
  if (!found) found = index( $0, "<mime-type type=\"" mime_type "\"");
  if (found && /<glob\s+pattern="/) {
    sub( /^\s*<glob\s+pattern="\*/, "");
    sub( /".*$/, "");
    print;
    exit; # exit on first match
  }
  if (found && /<\/mime-type>/) exit; # requested type has no glob: stop here
}' mime_type="text/html" /usr/share/mime/packages/freedesktop.org.xml
.html
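
As a rough sketch of how the two steps above might fit together in a shell script, the helpers below first reduce a content-type value to its bare media type and then look up a suffix. The names `media_type` and `mime_to_ext` are hypothetical and not part of wcurl. As an assumption, the lookup reads the generated `globs2` file from shared-mime-info (lines of the form `weight:type:glob`) rather than the XML source, and the default path may differ or be absent on a given system.

```shell
#!/bin/sh
# Hypothetical helpers, not part of wcurl.

# Reduce a Content-Type value such as "text/html; charset=utf-8"
# to the bare, lower-case media type "text/html".
media_type() {
    printf '%s\n' "${1%%;*}" | tr -d ' \t\r' | tr 'A-Z' 'a-z'
}

# Print the suffix of the first "*.ext" glob listed for a media type in a
# shared-mime-info globs2 file (lines look like "80:text/html:*.html").
# Assumption: the default path below exists on the local system.
mime_to_ext() {
    awk -F: -v type="$1" '
        /^#/ { next }                                   # skip header comments
        $2 == type && $3 ~ /^\*\./ { print substr($3, 2); exit }
    ' "${2:-/usr/share/mime/globs2}"
}

# Demonstration with an inline sample instead of the system database.
sample=$(mktemp)
printf '%s\n' '# sample data' '80:text/html:*.html' '80:text/html:*.htm' > "$sample"
mime_to_ext "$(media_type 'Text/HTML; charset=utf-8')" "$sample"
rm -f "$sample"
```

One way to obtain the header value without parsing `curl -I` output is curl's `%{content_type}` write-out variable, e.g. `curl -sI -o /dev/null -w '%{content_type}\n' URL`. If the media type is unknown, `mime_to_ext` prints nothing, so a caller would still need its own fallback name.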


4 participants