feat: add support for json output #618

Open: wants to merge 1 commit into master
2 changes: 1 addition & 1 deletion Makefile
@@ -139,7 +139,7 @@ doccheck:
py-check-docstrings --force $(MYPY_FILES_DIRS)

filescheck: localbuild
-	for out in text html gml sql csv xml gxml dot sitemap; do \
+	for out in text html json gml sql csv xml gxml dot sitemap; do \
./$(LAPPNAME) -o$$out -F$$out --complete -r1 -C $(FILESCHECK_URL) || exit 1; \
done

1 change: 1 addition & 0 deletions linkcheck/command/arg_parser.py
@@ -154,6 +154,7 @@
html Log URLs in keyword: argument fashion, formatted as HTML.
Additionally has links to the referenced pages. Invalid URLs have
HTML and CSS syntax check links appended.
json Log check results in JSON format.
csv Log check result in CSV format with one URL per line.
gml Log parent-child relations between linked URLs as a GML sitemap
graph.
4 changes: 4 additions & 0 deletions linkcheck/data/linkcheckerrc
@@ -103,6 +103,10 @@
#colorok=#3ba557
#parts=all

# JSON logger
[json]
#indent=4

# failures logger
[failures]
#filename=~/.linkchecker/failures
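The `[json]` section above hands `indent` to the logger as text: configparser values are always `str`, and `json.dumps` uses a string `indent` literally as the per-level indentation, so `indent=4` would indent with the character `4` unless it is converted. An empty string (the current `LoggerArgs` default) only inserts newlines. A minimal sketch, assuming the option reaches the logger as the raw string configparser returns, with a hypothetical `parse_indent()` helper that is not part of LinkChecker:

```python
import configparser
import json


def parse_indent(value, default=4):
    """Hypothetical helper: map a config string to a json.dumps indent.

    Empty or missing -> default; "none" -> None (compact, single line);
    anything else must be an integer number of spaces.
    """
    if value is None or not value.strip():
        return default
    if value.strip().lower() == "none":
        return None
    return int(value)


config = configparser.ConfigParser()
config.read_string("[json]\nindent=4\n")
raw = config.get("json", "indent")  # configparser always returns a str

record = {"url": "http://example.com/", "valid": True}
as_text = json.dumps(record, indent=raw)                  # each level indented with the text "4"
as_spaces = json.dumps(record, indent=parse_indent(raw))  # each level indented with 4 spaces
print(as_text)
print(as_spaces)
```

Falling back to 4 on an empty value matches the commented `#indent=4` and the review remark below; falling back to compact output would be an equally valid choice.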
109 changes: 109 additions & 0 deletions linkcheck/logger/json.py
@@ -0,0 +1,109 @@
# Copyright (C) 2022 Mark Ferrell
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
"""
JSON logger.
"""

import json
from . import _Logger


class JSONLogger(_Logger):
    """JSON logger; easy to parse with jq. """

    LoggerName = 'json'
    LoggerArgs = {
        "filename": "linkchecker-out.json",
        "indent": "",
Review comment (Contributor): Looks like the intent is for 4 to be the default, as in linkcheckerrc.

    }

    def __init__(self, **kwargs):
        """Initialize error counter and optional file output."""
        args = self.get_args(kwargs)
        super().__init__(**args)
        self.init_fileoutput(args)
        self.indent = args.get('indent', 'default')
        self.number = 0

    def comment(self, s, **args):
        """JSON does not support comments"""
        print(args)
Review comment (Contributor): Let's drop print() because it can't be controlled by the user.

        pass

    def start_output(self):
        """Nothing to do"""
        self.write("[")

    def log_url(self, url_data):
        """Write url checking info."""

        json_dict = {}

        self.number += 1

        if self.number > 1:
            self.write(",")

        if self.has_part('url'):
            json_dict.update({ "url": url_data.base_url })

        if url_data.name and self.has_part('name'):
            json_dict.update({ "name": url_data.name })

        if url_data.parent_url and self.has_part('parenturl'):
            parent_url = { "url": url_data.parent_url }
            if url_data.line is not None:
                parent_url.update({ "line": url_data.line })
            if url_data.column is not None:
                parent_url.update({ "col": url_data.column })
            if url_data.page > 0:
                parent_url.update({ "page": url_data.page })
            json_dict.update({ "parenturl": parent_url })

        if url_data.base_ref and self.has_part('base'):
            json_dict.update({ "base": url_data.base_ref })

        if url_data.url and self.has_part('realurl'):
            json_dict.update({ "realurl": url_data.url })

        if url_data.checktime and self.has_part('checktime'):
            json_dict.update({ "checktime": url_data.checktime })

        if url_data.dltime >= 0 and self.has_part('dltime'):
            json_dict.update({ "dltime": url_data.dltime })

        if url_data.size >= 0 and self.has_part('dlsize'):
            json_dict.update({ "dlsize": url_data.size })

        if url_data.info and self.has_part('info'):
            json_dict.update({ "info": url_data.info })

        if url_data.modified and self.has_part('modified'):
            json_dict.update({ "modified": url_data.modified })

        if url_data.warnings and self.has_part('warning'):
            json_dict.update({ "warnings": ["[%s] %s" % x for x in url_data.warnings] })

        if self.has_part('result'):
            json_dict.update({ "result": url_data.result })
            json_dict.update({ "valid": { True:"true", False:"false" } [url_data.valid] })
            json_dict.update({ "error": { True:"true", False:"false" } [not url_data.valid] })
Review comment (Contributor): Both valid and error get written based on url_data.valid? Why have both?
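As submitted, `error` is always the negation of `valid`, and both are written as the JSON *strings* `"true"`/`"false"` rather than JSON booleans. That matters for jq, where only `false` and `null` are falsy: `.[] | select(.valid)` would match every record, including those whose value is the string `"false"`. `json.dumps` already serializes Python `True`/`False` as `true`/`false`. A minimal sketch of one possible answer to the review question, assuming a single native boolean is enough and using a hypothetical `SimpleNamespace` stand-in for `url_data`:

```python
import json
from types import SimpleNamespace

# Hypothetical stand-in for LinkChecker's url_data object (only two fields).
url_data = SimpleNamespace(result="200 OK", valid=True)

# As submitted: the lookup table produces the *strings* "true"/"false",
# and "error" is always the negation of "valid".
submitted = {
    "result": url_data.result,
    "valid": {True: "true", False: "false"}[url_data.valid],
    "error": {True: "true", False: "false"}[not url_data.valid],
}

# Alternative: keep one native bool; json.dumps emits JSON true/false.
proposed = {"result": url_data.result, "valid": url_data.valid}

print(json.dumps(submitted))  # {"result": "200 OK", "valid": "true", "error": "false"}
print(json.dumps(proposed))   # {"result": "200 OK", "valid": true}
```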


        self.write(json.dumps(json_dict, indent=self.indent))

Review comment (Contributor): Other loggers call self.flush() at the end of log_url().
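Flushing matters because Python file objects buffer writes: a record written in `log_url()` may sit in memory and be invisible to another process reading the output file (for example `tail -f` during a long check) until the buffer is flushed. A small demonstration of that buffering with a temporary file; it uses only the standard library and does not touch LinkChecker's own classes:

```python
import os
import tempfile

# Demonstrates Python's buffered text-file output, not LinkChecker's API.
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)

with open(path, "w", encoding="utf-8") as out:
    out.write('[{"url": "http://example.com/"}')
    with open(path, encoding="utf-8") as reader:
        before_flush = reader.read()  # small write is still in the buffer
    out.flush()
    with open(path, encoding="utf-8") as reader:
        after_flush = reader.read()   # now visible to other readers

os.remove(path)
print(repr(before_flush))
print(repr(after_flush))
```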

    def end_output(self, **kwargs):
        """Nothing to do"""
        self.write("]")
Review comment (Contributor): I guess self.close_fileoutput() should be called, since self.init_fileoutput() was used.
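Taken together, the submitted `start_output()`, `log_url()` and `end_output()` frame the records as one JSON array: `[`, then a `,` before every record after the first, then `]`. A minimal sketch of that framing outside LinkChecker, assuming a hypothetical `StreamingJSONArray` class that writes to any text stream, flushes after each record, and closes the stream at the end, standing in for the `self.flush()` and `self.close_fileoutput()` calls the review comments ask for:

```python
import io
import json


class StreamingJSONArray:
    """Hypothetical sketch of the logger's '[', ',', ']' framing."""

    def __init__(self, stream, indent=None):
        self.stream = stream
        self.indent = indent
        self.number = 0

    def start_output(self):
        self.stream.write("[")

    def log_record(self, record):
        self.number += 1
        if self.number > 1:
            self.stream.write(",")  # separator before every record but the first
        self.stream.write(json.dumps(record, indent=self.indent))
        self.stream.flush()         # stands in for self.flush() in log_url()

    def end_output(self):
        self.stream.write("]")
        self.stream.flush()


def run(records):
    buf = io.StringIO()
    logger = StreamingJSONArray(buf, indent=4)
    logger.start_output()
    for rec in records:
        logger.log_record(rec)
    logger.end_output()
    text = buf.getvalue()  # read before closing; StringIO discards data on close
    buf.close()            # stands in for self.close_fileoutput()
    return text


print(json.loads(run([])))
print(json.loads(run([{"url": "a", "valid": True}, {"url": "b", "valid": False}])))
```

Because the separator is written before a record rather than after it, the output stays valid JSON even when no URLs are logged (`[]`), with no trailing comma to strip at the end.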