Changelog
All notable changes to this project will be documented in this file. Dates are displayed in UTC.
Demo video: https://www.youtube.com/watch?v=qEiSTpb66nA
v1.0.3
- cache/storage: better race-condition handling when several coroutines try to create the same folder at the same time and mkdir reported 'File exists'
be543dc
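The mkdir race above is a common pattern with concurrent writers. Below is a minimal Python sketch of the general technique, not the crawler's actual PHP code: treat 'File exists' as success when the path is already a directory. The helper names `ensure_dir` and `write_cache_file` are hypothetical, and threads stand in for the crawler's coroutines.

```python
import errno
import os
import tempfile
import threading


def ensure_dir(path: str) -> None:
    """Create `path` (and parents) without failing when a concurrent
    writer happens to create the same directory first."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Another writer won the race: EEXIST is fine if the path really is
        # a directory now; anything else (e.g. a regular file) is a real error.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def write_cache_file(directory: str, name: str, data: bytes) -> str:
    """Hypothetical cache writer: ensure the folder exists, then write the file."""
    ensure_dir(directory)
    target = os.path.join(directory, name)
    with open(target, "wb") as fh:
        fh.write(data)
    return target


if __name__ == "__main__":
    shared = os.path.join(tempfile.mkdtemp(), "my.domain.tld-443", "b9")
    errors = []

    def worker(i: int) -> None:
        try:
            write_cache_file(shared, f"{i}.cache", b"x")
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(16)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(errors), len(os.listdir(shared)))  # 0 16
```

In Python, `os.makedirs(path, exist_ok=True)` applies the same rule; the explicit `errno.EEXIST` check is shown here because the rule itself, rather than the library shortcut, is what carries over to other languages.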
v1.0.2
10 November 2023
- version: 1.0.2.20231110 + changelog
230b947
- html report: added aria labels to active/important elements
a329b9d
- version: 1.0.1.20231109 - changelog
50dc69c
v1.0.1
9 November 2023
- version: 1.0.1.20231109
e213cb3
- offline exporter: fixed the case when an https:// website contains a link to the same path but with the http:// protocol (it overwrote the proper *.html file with just a meta redirect .. real case from nextjs.org)
4a1be0b
- html processor: force removal of all anchor listeners when NextJS is detected (it is very hard to achieve a working NextJS site over the offline file:// protocol)
2b1d935
- file exporters: by default, the crawler now generates HTML/JSON/TXT reports to 'tmp/[report|output].%domain%.%datetime%.[html|json|txt]' .. I assume that most people will want to save/see them
7831c6b
- security analysis: removed multi-line console output for recommendations .. it was ugly
310af30
- json output: added the JSON_UNESCAPED_UNICODE flag so Unicode characters are not escaped (e.g. Czech characters stay readable)
cf1de9f
- mailer: do not send e-mails when the crawler is interrupted with Ctrl+C
19c94aa
- refactoring: manager stats logic extracted into ManagerStats and also used in the content processors manager + stats added to the 'Crawler stats' tab in the HTML report
3754200
- refactoring: content-related logic extracted to content processors based on the ContentProcessor interface with methods findUrls():?FoundUrls, applyContentChangesForOfflineVersion():void and isContentTypeRelevant():bool + better separation of web-framework-related logic (NextJS, Astro, Svelte, ...) + better URL handling and maximized usage of ParsedUrl
6d9f25c
- phpstan: ignore BASE_DIR warning
6e0370a
- offline website exporter: improved export of NextJS-based websites, but it's not perfect, because the latest NextJS versions do not have some JS/CSS paths in the code; they are generated dynamically from arrays/objects
c4993ef
- seo analyzer: fixed trim() warning when no <h1> found
f0c526f
- offline export: a lot of improvements when generating the offline version of a NextJS website - chunk detection from the manifest, replacing paths, etc.
98c2e15
- seo and og: fixed division by zero when no og/twitter tags found
19e4259
- console output: lots of improvements for nice, consistent and minimal word-wrap output
596a5dc
- basic file/dir structure: created ./crawler (for Linux/macOS) and ./crawler.bat for Windows, init script moved to ./src, small related changes to file/dir path building
5ce41ee
- header status: ignore the overly dynamic Content-Disposition header
4e0c6fd
- offline website exporter: added the .html extension to files with typical dynamic-language extensions, because without it the browser would show them as source code
7130b9e
- html report: show tables with details even if they contain no data (it is good to know that the checks were carried out, but nothing was found)
da019e4
- tests: repaired tests after the latest changes to file/URL building for the offline website .. merlot is great!
7c77c41
- utils: be more precise and do not replace attributes in SVG .. creative designers will not love you when looking at the broken SVG in the HTML report
3fc81bb
- utils: be more precise in parsing phone numbers, otherwise people will 'love' you because of false positives .. wine is still great
51fd574
- html parser: better support for formatted html with tags/attributes on multiple lines
89a36d2
- utils: don't be hungry in stripJavaScript() because you ate half of my html :) wine is already in my head...
0e00957
- file result storage: changed cache directory structure for consistency with http client's cache, so it looks like my.domain.tld-443/04/046ec07c.cache
26bf428
- http client cache: for better consistency with result storage cache, directory structure now contains also port, so it looks like my.domain.tld-443/b9/b989bdcf2b9389cf0c8e5edb435adc05.cache
a0b2e09
- http client cache: improved directory structure for large-scale use and easier partial cache deletion .. current structure in tmp dir: my.domain.tld/b9/b989bdcf2b9389cf0c8e5edb435adc05.cache
10e02c1
- offline website exporter: better srcset handling - urls can be defined with or without sizes
473c1ad
- html report: blue color for search term, looks better
cb47df9
- offline website exporter: handled the situation where both the folder /foo/next.js/ and the file /foo/next.js with the same name exist on the website (real case from vercel.com)
7c27d2c
- exporters: added exec times to summary messages
41c8873
- crawler: use the port from the URL if defined, otherwise derive it from the scheme .. the previous solution didn't work properly for localhost:port and for parsed URLs pointing to external websites
324ba04
- heading analysis: changed sorting to DESC by errors, renamed Headings structure -> Heading structure
dbc1a38
- security analysis: detection and ignoring of URLs that point to a non-existent static file but return 404 HTML, better description
193fb7d
- super table: added escapeOutputHtml property to columns for better control over escaping + updated related supertables
bfb901c
- headings analysis: replaced usage of DOMNode->textContent, because when headings contain other tags, including <script>, textContent also includes the JS code (just without the <script> tag)
5c426c2
- best practices: better missing quotes detection and minimizing false positives in special cases (HTML/JS in attributes, etc.)
b03a534
- best practices: better SVG detection and minimizing false positives (e.g. code snippets with SVG), improved look in HTML report and better descriptions
c35f7e2
- headers analysis: added [ignored generic values] or [see values below] for specific headers
a7b444d
- core options: changed --hide-scheme-and-host to --show-scheme-and-host (hiding scheme+host by default is better)
3c202e9
- truncating: replaced '...' with '…'
870cf8c
- accessibility analyzer: better descriptions
514b471
- crawler & http client: if the response is loaded from the cache, we do not wait due to rate limiting - very useful for repeated executions
61fbfab
- header stats: added missing strval in values preview
9e11030
- content type analyzer: increased column width for MIME type from 20 to 26 (enough for application/octet-stream)
c806674
- SSL/TLS analyzer: fixed issues on Windows with Cygwin where nslookup does not work reliably
714b9e1
- text output: removed redundant whitespace from the banner after .YYYYMMDD was added to the version number
8b76205
- readme: added link to #ready-to-use-releases to summary
574b39e
- readme: added section Ready-to-use releases
44d686b
- changelog: added changelog generated by https://github.com/cookpete/auto-changelog/tree/master + added 'composer changelog'
d11af7e
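The cache-layout entries (host and port, then a two-character prefix, then the hash file, e.g. my.domain.tld-443/b9/b989bdcf2b9389cf0c8e5edb435adc05.cache) and the port fix ("use the port from the URL if defined, otherwise by scheme") fit together. Here is a minimal Python sketch of that layout. It is not the crawler's implementation: hashing the full URL with MD5 is an assumption (the 32-hex-character name in the changelog is consistent with MD5, but the changelog does not say what is hashed), and `resolve_port` and `cache_file_path` are hypothetical names.

```python
import hashlib
import os
from urllib.parse import urlsplit

# Assumption: only http/https URLs are cached; these are their standard default ports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def resolve_port(url: str) -> int:
    """Use the explicit port from the URL if present, otherwise the scheme default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme.lower()]


def cache_file_path(cache_root: str, url: str) -> str:
    """Build <root>/<host>-<port>/<first 2 hex chars>/<md5>.cache.

    Hypothetical: MD5 of the URL is only an illustrative cache key."""
    parts = urlsplit(url)
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    host_dir = f"{parts.hostname}-{resolve_port(url)}"
    return os.path.join(cache_root, host_dir, digest[:2], f"{digest}.cache")


if __name__ == "__main__":
    print(resolve_port("http://localhost:8080/"))      # 8080
    print(resolve_port("https://my.domain.tld/page"))  # 443
    print(cache_file_path("tmp", "https://my.domain.tld/page"))
```

The per-host-and-port top directory lets one site's cache be deleted on its own. The two-character prefix spreads files over at most 256 subdirectories per host, which keeps individual directories small on large crawls.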