urlparse is a URL parser compatible to url_parser_parse_url() from nodejs/http-parser which has been archived since 2022. urlparse only implements the strict mode.
There is a slight difference in a return code when they fail.
url_parser_parse_url() returns nonzero if it fails.
urlparse_parse_url() returns the negative error code
URLPARSE_ERR_PARSE
if it fails.
A caller needs to call http_parser_url_init() before
http_parser_parse_url(). urlparse does not need a similar function
because urlparse_parser_url() initializes urlparse_url
before
its use.
url_parser_parse_url() historically does not follow any standards like RFC 3986. Here is the allowed characters in each URL component:
- scheme:
A-Za-z
- userinfo:
A-Za-z!$%&'()*+,-.:;=_~
- host:
a-zA-Z0-9-.
- IPv6 host:
A-Fa-f0-9.:
- optionally followed by zone info which starts
%
and can contain:A-Za-z0-9%-._~
- and IPv6 host must be enclosed by
[
and]
- optionally followed by zone info which starts
- port:
0-9
- path:
A-Za-z0-9!"$%&'()*+,-./:;<=>@[\]^_`{|}~
- query:
A-Za-z0-9!"$%&'()*+,-./:;<=>?@[\]^_`{|}~
- fragment:
A-Za-z0-9!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
- all consecutive
#
characters that precede a fragment are treated as a single#
. A fragment cannot start with#
.
- all consecutive