Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add new file categorization plugin #2945 #2958

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

johnmhoran
Copy link
Member

@johnmhoran johnmhoran commented May 13, 2022

This PR add a new file categorization plugin.

@johnmhoran
Copy link
Member Author

This PR is related to issue #2945.

@pombredanne pombredanne changed the title 2945 file cat Add new file categorization plugin #2945 Aug 18, 2022
@pombredanne pombredanne changed the title Add new file categorization plugin #2945 [WIP] Add new file categorization plugin #2945 Aug 18, 2022
@pombredanne pombredanne added this to the v32.1 milestone Jan 6, 2023
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! See my nits inline.

Also the test file jarDependImg.png likely comes from http://svn.apache.org/repos/asf/xmlgraphics/batik/branches/webapi/documentation-sources/resources/images/jarDependImg.png under an Apache license that we did not document and is also too big for a test file. Can you instead create an extra small PNG that is not from a third-party?

from commoncode.datautils import String
from plugincode.post_scan import PostScanPlugin
from plugincode.post_scan import post_scan_impl
from commoncode.cliutils import PluggableCommandLineOption
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort your imports

from commoncode.cliutils import POST_SCAN_GROUP

"""
Categorize files.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you expand the documentation to explain the workings of this module at a high level?


options = [
PluggableCommandLineOption(
("--file-cat",),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a more explicit, non-abbreviated command line option, like --categorize-file or --file-category to stay consistent with other options.

@classmethod
def categorize(cls, resource):
"""
Return True if this Categorizer applies to this resource or False or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Return True if this Categorizer applies to this resource or False or None
Return True if this Categorizer category applies to this ``resource``.

But what is resource ? a path string? or a location? some object? we need to be precise there.

resource.save(codebase)


class Categorizer:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some docstring to document what is this class used for and how to extend it

Suggested change
class Categorizer:
class Categorizer:
"""
A categorizer can categorize a file Resource based on its path, name,
extension and other attributes such as file or mime type.
Subclasses must override the ``categorize`` class method.
"""


class Categorizer:
order = 0
analysis_priority = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be named importance or prominence?
analysis_priority seems to imply this is only use to drive some analysis?

analysis_priority = "1"
file_category = "archive"
file_subcategory = "debian"
category_notes = "special type of archive"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just call this notes?

order = 0
analysis_priority = None
file_category = None
file_subcategory = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always enforce these to be lowercase: we have a mix or capital case or lowercase subcategories at the moment.

@classmethod
def categorize(cls, resource):
return extension_in(resource.extension, [".bzl"]) or (
name_in(resource.name, ["build.bazel"]) and resource.type == "file"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is resource_type == file needed here? Are these not all about files?


@classmethod
def categorize(cls, resource):
return extension_uppercase_in(resource.extension, [".S"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this upper case?

    * Add in postscan plugin entry points

Signed-off-by: John M. Horan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants