Obtaining aggregate information of hashed directory #7

Open
ruffsl opened this issue Apr 11, 2021 · 0 comments

ruffsl commented Apr 11, 2021

It would be handy to obtain an aggregate of the information that dirhash uses to compute the final hash for the root directory, for example as an ordered dictionary that could be pretty-printed to a YAML or JSON file. Such files would be easy to diff, enabling use cases such as logging, or highlighting file tree or content changes to end users.
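
To illustrate the diffing use case, here is a minimal sketch using only the Python standard library. The two snapshots, their field names (`name`, `hash`, `entries`) and the hash strings are all invented for illustration; none of this is actual dirhash output or API.

```python
import difflib
import json

# Hypothetical snapshots: field names and hash strings are made up
# for illustration and are not produced by dirhash.
before = {
    "name": "dir",
    "hash": "aaa111",
    "entries": [
        {"name": "file1.txt", "hash": "bbb222"},
        {"name": "file2.txt", "hash": "ccc333"},
    ],
}
after = {
    "name": "dir",
    "hash": "ddd444",
    "entries": [
        {"name": "file1.txt", "hash": "bbb222"},
        {"name": "file2.txt", "hash": "eee555"},
    ],
}


def to_lines(snapshot):
    """Serialize deterministically so that equal trees give identical text."""
    return json.dumps(snapshot, indent=2, sort_keys=True).splitlines()


# A unified diff of the two serialized snapshots.
diff = list(difflib.unified_diff(
    to_lines(before), to_lines(after),
    fromfile="before.json", tofile="after.json", lineterm="",
))
print("\n".join(diff))
```

Only the changed file and the root hash show up as `-`/`+` lines; the unchanged `file1.txt` entry appears at most as context.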

From the scantree examples, I see that the .apply() method can be used for such recursive transforms:

hello_count_tree = tree.apply(
    file_apply=lambda path: {
        'name': path.name,
        'count': sum([
            w.lower() == 'hello'
            for w in path.as_pathlib().read_text().split()
        ])
    },
    dir_apply=lambda dir_: {
        'name': dir_.path.name,
        'count': sum(e['count'] for e in dir_.entries),
        'sub_counts': [e for e in dir_.entries]
    },
)

from pprint import pprint
pprint(hello_count_tree)
# Output:
{'count': 3,
 'name': 'dir',
 'sub_counts': [{'count': 2, 'name': 'file1.txt'},
                {'count': 1,
                 'name': 'd1',
                 'sub_counts': [{'count': 1, 'name': 'file2.txt'},
                                {'count': 0,
                                 'name': 'd2',
                                 'sub_counts': [{'count': 0,
                                                 'name': 'file3.txt'}]}]}]}

However, the root_node computed internally for this is not easily accessed without reimplementing much of the library's internals.

root_node = scantree(

_, dirhash_ = root_node.apply(file_apply=file_apply, dir_apply=dir_apply)
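
For illustration, the idea of returning the aggregate together with the hash from one bottom-up pass could look roughly like the sketch below. It uses only the standard library, and the `summarize` helper and its combination rule are hypothetical: this is not dirhash's hashing protocol and does not reproduce its hash values. It also does not handle symlinks or filtering.

```python
import hashlib
import os
import tempfile


def summarize(path):
    """Return (aggregate, digest) for a file or directory, bottom-up."""
    name = os.path.basename(path)
    if os.path.isdir(path):
        # Sort children by name so the result does not depend on listing order.
        children = [
            summarize(os.path.join(path, child))
            for child in sorted(os.listdir(path))
        ]
        hasher = hashlib.sha256()
        for child_info, child_digest in children:
            # Toy combination rule (an assumption): mix name and digest.
            hasher.update(child_info["name"].encode("utf-8"))
            hasher.update(child_digest.encode("ascii"))
        digest = hasher.hexdigest()
        info = {
            "name": name,
            "hash": digest,
            "entries": [child_info for child_info, _ in children],
        }
    else:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        info = {"name": name, "hash": digest}
    return info, digest


# Usage on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "d1"))
    with open(os.path.join(root, "file1.txt"), "w") as f:
        f.write("hello")
    with open(os.path.join(root, "d1", "file2.txt"), "w") as f:
        f.write("world")
    aggregate, root_hash = summarize(root)
```

Returning a pair from each node keeps the per-entry information available instead of discarding it once the root hash is known.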

I'm also not yet sure how to leverage scantree with the RecursionPath class to render/print this aggregate data structure.

https://github.com/andhus/scantree/blob/25b51ca9be973389d671565de20cbb021871521d/src/scantree/_path.py#L16
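
As a rough idea of the kind of rendering I have in mind, here is a sketch that prints a nested aggregate as an indented tree. It is independent of scantree and RecursionPath; the `render` helper, the dictionary layout and the hash strings are hypothetical placeholders.

```python
def render(node, indent=0):
    """Yield one indented text line per node of a nested aggregate dict."""
    # A node with an "entries" key is treated as a directory.
    suffix = "/" if "entries" in node else ""
    yield "{}{}{}  {}".format("  " * indent, node["name"], suffix, node["hash"])
    for child in node.get("entries", []):
        yield from render(child, indent + 1)


# Hypothetical aggregate; the hash strings are placeholders.
aggregate = {
    "name": "dir",
    "hash": "ddd444",
    "entries": [
        {"name": "d1", "hash": "fff666", "entries": [
            {"name": "file2.txt", "hash": "ccc333"},
        ]},
        {"name": "file1.txt", "hash": "bbb222"},
    ],
}
lines = list(render(aggregate))
print("\n".join(lines))
```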

Related: colcon/colcon-package-selection#44
