v1.0.3

@janreges released this 10 Nov 01:03
· 151 commits to main since this release

Changelog

All notable changes to this project will be documented in this file. Dates are displayed in UTC.

Demo video: https://www.youtube.com/watch?v=qEiSTpb66nA

v1.0.3

  • cache/storage: better race-condition handling when several coroutines try to create the same folder at the same time, which made mkdir report 'File exists' be543dc
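
The race described in this entry can be reproduced and tolerated in a few lines. The following is a minimal sketch in Python, not the project's own code: the helper names `ensure_dir` and `demo` are hypothetical, and threads stand in for coroutines. The idea is to attempt the creation and treat "already exists" as success, provided a directory really is there afterwards.

```python
import os
import tempfile
import threading


def ensure_dir(path: str) -> None:
    """Create `path` if needed, tolerating a concurrent creator (hypothetical helper)."""
    try:
        os.makedirs(path)
    except FileExistsError:
        # Another worker won the race. That is fine as long as a
        # directory (not a regular file) now exists at `path`.
        if not os.path.isdir(path):
            raise


def demo(workers: int = 8) -> bool:
    """Let several threads create the same folder at once and report the outcome."""
    errors = []

    def worker(target: str) -> None:
        try:
            ensure_dir(target)
        except OSError as exc:  # collect instead of losing it inside the thread
            errors.append(exc)

    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "my.domain.tld-443", "b9")
        threads = [threading.Thread(target=worker, args=(target,)) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return os.path.isdir(target) and not errors
```

Checking for existence first and creating afterwards would not help here, because another worker can create the folder between the check and the call; catching the error after the attempt closes that window.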

v1.0.2

10 November 2023

  • version: 1.0.2.20231110 + changelog 230b947
  • html report: added aria labels to active/important elements a329b9d
  • version: 1.0.1.20231109 - changelog 50dc69c

v1.0.1

9 November 2023

  • version: 1.0.1.20231109 e213cb3
  • offline exporter: fixed the case where an https:// website links to the same path with the http:// protocol (the proper *.html file was overwritten with just a meta redirect .. real case from nextjs.org) 4a1be0b
  • html processor: force removal of all anchor listeners when NextJS is detected (it is very hard to achieve a working NextJS site over the offline file:// protocol) 2b1d935
  • file exporters: the crawler now generates html/json/txt reports by default to 'tmp/[report|output].%domain%.%datetime%.[html|json|txt]' .. I assume that most people will want to save/see them 7831c6b
  • security analysis: removed multi-line console output for recommendations .. it was ugly 310af30
  • json output: added JSON_UNESCAPED_UNICODE for unescaped unicode chars (e.g. czech chars will be readable) cf1de9f
  • mailer: do not send e-mails in case of interruption of the crawler using ctrl+c 19c94aa
  • refactoring: manager stats logic extracted into ManagerStats and also used in the manager of content processors + stats added to the 'Crawler stats' tab in the HTML report 3754200
  • refactoring: content related logic extracted to content processors based on ContentProcessor interface with methods findUrls():?FoundUrls, applyContentChangesForOfflineVersion():void and isContentTypeRelevant():bool + better division of web framework related logic (NextJS, Astro, Svelte, ...) + better URL handling and maximized usage of ParsedUrl 6d9f25c
  • phpstan: ignore BASE_DIR warning 6e0370a
  • offline website exporter: improved export of websites based on NextJS, but it's not perfect, because the latest NextJS versions do not have some JS/CSS paths in the code; they are generated dynamically from arrays/objects c4993ef
  • seo analyzer: fixed trim() warning when no <h1> found f0c526f
  • offline export: a lot of improvements when generating the offline version of the website on NextJS - chunk detection from the manifest, replacing paths, etc. 98c2e15
  • seo and og: fixed division by zero when no og/twitter tags found 19e4259
  • console output: lots of improvements for nice, consistent and minimal word-wrap output 596a5dc
  • basic file/dir structure: created ./crawler (for Linux/macOS) and ./crawler.bat for Windows, init script moved to ./src, small related changes about file/dir path building 5ce41ee
  • header status: ignore too dynamic Content-Disposition header 4e0c6fd
  • offline website exporter: added .html extensions to typical dynamic language extensions, because without it the browser will show them as source code 7130b9e
  • html report: show tables with details, even if they are without data (it is good to know that the checks were carried out, but nothing was found) da019e4
  • tests: repaired tests after last changes of file/url building for offline website .. merlot is great! 7c77c41
  • utils: be more precise and do not replace attributes in SVG .. creative designers will not love you when looking at the broken SVG in HTML report 3fc81bb
  • utils: be more precise in parsing phone numbers, otherwise people will 'love' you because of false positives .. wine is still great 51fd574
  • html parser: better support for formatted html with tags/attributes on multiple lines 89a36d2
  • utils: don't be hungry in stripJavaScript() because you ate half of my html :) wine is already in my head... 0e00957
  • file result storage: changed cache directory structure for consistency with http client's cache, so it looks like my.domain.tld-443/04/046ec07c.cache 26bf428
  • http client cache: for better consistency with result storage cache, directory structure now contains also port, so it looks like my.domain.tld-443/b9/b989bdcf2b9389cf0c8e5edb435adc05.cache a0b2e09
  • http client cache: improved directory structure for large scale and better orientation for partial cache deleting.. current structure in tmp dir: my.domain.tld/b9/b989bdcf2b9389cf0c8e5edb435adc05.cache 10e02c1
  • offline website exporter: better srcset handling - urls can be defined with or without sizes 473c1ad
  • html report: blue color for search term, looks better cb47df9
  • offline website exporter: handled situation of the same-name folder/file when both the folder /foo/next.js/ and the file /foo/next.js existed on the website (real case from vercel.com) 7c27d2c
  • exporters: added exec times to summary messages 41c8873
  • crawler: use the port from the URL if defined, otherwise derive it from the scheme .. the previous solution didn't work properly for localhost:port and for parsed URLs to external websites 324ba04
  • heading analysis: changed sorting to DESC by errors, renamed Headings structure -> Heading structure dbc1a38
  • security analysis: detection and ignoring of URLs that point to a non-existent static file but return 404 HTML, better description 193fb7d
  • super table: added escapeOutputHtml property to column for better escape managing + updated related supertables bfb901c
  • headings analysis: replace usage of DOMNode->textContent because when the headings contain other tags, including <script>, textContent also contains JS code, but without the <script> tag 5c426c2
  • best practices: better missing quotes detection and minimizing false positives in special cases (HTML/JS in attributes, etc.) b03a534
  • best practices: better SVG detection and minimizing false positives (e.g. code snippets with SVG), improved look in HTML report and better descriptions c35f7e2
  • headers analysis: added [ignored generic values] or [see values below] for specific headers a7b444d
  • core options: changed --hide-scheme-and-host to --show-scheme-and-host (hiding scheme+host by default is better) 3c202e9
  • truncating: replaced '...' with '…' 870cf8c
  • accessibility analyzer: better descriptions 514b471
  • crawler & http client: if the response is loaded from the cache, we do not wait due to rate limiting - very useful for repeated executions 61fbfab
  • header stats: added missing strval in values preview 9e11030
  • content type analyzer: increased column width for MIME type from 20 to 26 (enough for application/octet-stream) c806674
  • SSL/TLS analyzer: fixed issues on Windows with Cygwin where nslookup does not work reliably 714b9e1
  • text output: removed redundant whitespace from the banner after .YYYYMMDD was added to the version number 8b76205
  • readme: added link to #ready-to-use-releases to summary 574b39e
  • readme: added section Ready-to-use releases 44d686b
  • changelog: added changelog by https://github.com/cookpete/auto-changelog/tree/master + added 'composer changelog' d11af7e
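
The cache-layout entries above (host-port/two-character prefix/hash.cache) and the crawler entry about taking the port from the URL or the scheme can be illustrated together. This is a hedged sketch in Python, not the project's code: the function names `effective_port` and `cache_path` are hypothetical, and the choice of MD5 over the full URL as the cache key is an assumption made only because the sample file name has 32 hex characters; the changelog does not say what is hashed or how.

```python
import hashlib
from urllib.parse import urlsplit

# Default ports used when the URL does not carry an explicit one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port from the URL, else the default for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"no default port known for scheme {parts.scheme!r}")
    return DEFAULT_PORTS[parts.scheme]


def cache_path(url: str) -> str:
    """Build a relative path shaped like my.domain.tld-443/b9/b989...05.cache."""
    parts = urlsplit(url)
    # ASSUMPTION: the key is the MD5 hex digest of the full URL.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    # The first two hex characters become a subfolder, which keeps any single
    # directory small and makes partial cache deletion easier.
    return f"{parts.hostname}-{effective_port(url)}/{digest[:2]}/{digest}.cache"
```

With this shape, `effective_port("http://localhost:8080/x")` yields the explicit port 8080, while `effective_port("https://my.domain.tld/")` falls back to 443, so both URLs end up in distinct, predictable top-level cache folders.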