Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pseudo-Keys Part II: history metadata #738

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

bpoldrack
Copy link
Collaborator

@bpoldrack bpoldrack commented Dec 6, 2024

Also closes #739

This patch implements retrieval of structured information from `git log`
and `git log --follow <path>`, including parsing of onyo's operations
record.
This is done in preparation of implementing pseudo-keys for
history-related metadata.
Currently operates on `Path` objects. In context of fully implementing
the pseudo-key and namespace concept, `Inventory.get_history()` will
need to change in order to take `Item` instances instead.
Copy link

codecov bot commented Dec 6, 2024

Codecov Report

Attention: Patch coverage is 95.92391% with 15 lines in your changes missing coverage. Please review.

Project coverage is 94.63%. Comparing base (cf88479) to head (15ebf32).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
onyo/lib/git.py 81.63% 8 Missing and 1 partial ⚠️
onyo/lib/parser.py 85.00% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #738      +/-   ##
==========================================
+ Coverage   94.57%   94.63%   +0.05%     
==========================================
  Files          73       74       +1     
  Lines        6400     6744     +344     
  Branches      544      641      +97     
==========================================
+ Hits         6053     6382     +329     
- Misses        254      265      +11     
- Partials       93       97       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@aqw aqw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another cool PR. :-D

I left some thoughts. Probably half of it is suggesting alternate approaches to using regex. IMO, those would help consolidate a decent amount of code.

onyo/lib/git.py Outdated Show resolved Hide resolved
onyo/lib/git.py Outdated Show resolved Hide resolved
"""
import datetime
import re
regex_author = re.compile(r"Author:\s+(?P<name>\b.*\b)\s+<(?P<email>[^\s]+)>$")
Copy link
Collaborator

@aqw aqw Dec 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, we're not going to get much of a win doing the compile here, since we're only using it once per function invocation. Thus, it's effectively the same as using match, which does a compile per run.

Moving the compilation out of the function would probably lead to a notable performance win.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you understand correctly.
Not sure it's worth it moving out, since this means it's always going to be compiled even if never used in an onyo call. But wanted to decouple from usage so it can easily be moved and the used expressions are in one place.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it's worth it moving out, since this means it's always going to be compiled even if never used in an onyo call

I think that depends where you put it. If it's something that gets imported (similar to consts) then it wouldn't be.

Just something to keep in mind. We'll see when we get there. And we have benchmarking now. :-D

onyo/lib/git.py Outdated Show resolved Hide resolved
@@ -401,3 +404,85 @@ def check_ignore(self, ignore: Path, paths: list[Path]) -> list[Path]:

# TODO: git check-ignore --no-index --stdin (or via command call) -> lazy, check GitRepo.files once. (Same invalidation)
# -> OnyoRepo would use it to maintain a ignore list from a (top-level .onyoignore)? .onyo/ignore ? Both?

def _parse_log_output(self, lines: list[str]) -> dict:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal: this function can be simplified if all lines are put into one large string.

That string is then split into two pieces, splitting on the first entirely empty line.

The first half is the header. The second half is the message.

The regex already checks for "Author", "Commit", etc. Have them all run in one large try statement on the header text, and all get caught by the same exception handling.

Then the message is dealt with separately, and simply assigns the second half.

@@ -725,3 +725,66 @@ def raise_empty_keys(self, asset: dict | UserDict) -> None:
# Note, that DotNotationWrapper.keys() delivers strings (and has to).
# Hence, `None` as a key would show up here as 'None'.
raise ValueError("Keys are not allowed to be empty or None-values.")

def _parse_operations_record(self, record: list[str]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if much of this function should be an inverse of recorders.py, and stored in something named like decoder.py (I don't love the name, but I think you get my point).

Then just import what you need here and use.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was wondering how exactly to have both directions share some code/definitions.
So, not sure ow yet, but something like this, yes.

onyo/lib/git.py Outdated Show resolved Hide resolved
return parsed_record

def get_history(self, path: Path | None = None, n: int | None = None):
# TODO: This isn't quite right yet. operations records are defined in Inventory.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a nice paragraph written up about this, but then I realized that I'm not sure if I have a clear understanding of the distinction between OnyoRepo and Inventory.

I see your desire to separate the layers, and not hop directly from Inventory to GitRepo. But what I don't have an answer for yet (and I hope you can help explain), is what is an Onyo Repo other than an Inventory?

The question that makes me ask this is: should OnyoRepo or Inventory be the one to parse the commit log dict, to extract and add info about the Inventory Operations?

Copy link
Collaborator Author

@bpoldrack bpoldrack Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea would be, that from Inventory's POV, OnyoRepo is the storage backend. If we were to add a DB-version of this, Inventory would refer to a backend rather than a repo and nothing else changes in Inventory or any of the command implementations. So, OnyoRepo implements what's specific about an onyo repository, utilizing the generic GitRepo. But it's still aware of filesystem and uses paths in the repo. Whereas Inventory is supposed to be more abstracted. It knows assets and inventory dirs (which happen to have a path-like label ATM, but soon will be Items). It doesn't care about how this is stored.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the comment suggests that:

  • there may be an Inventory.get_history(item: Item) that refers to an OnyoRepo.get_history(path: Path).
  • make Inventory.commit() not create the commit message yet, but pass the operations record separately to OnyoRepo and have only this layer decide, that this is going to be part of the commit message, so that the definition of how it's stored and decoded is at the same layer.

However, it's not super urgent/important. This entire distinction between Inventory and OnyoRepo helps to have a clear conceptualization of what premises a given piece of code operates on, but it probably won't be perfectly separated w/o an actual second backend anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. :-)

there may be an Inventory.get_history(item: Item) that refers to an OnyoRepo.get_history(path: Path).

This makes sense to me.

make Inventory.commit() not create the commit message yet, but pass the operations record separately to OnyoRepo and have only this layer decide, that this is going to be part of the commit message, so that the definition of how it's stored and decoded is at the same layer.

This... is less obvious to me. IMO, the messages should be stable, regardless of the backend. Thus it should be in Inventory. The GitRepo/OnyoRepo layer would be responsible for norming the message (for example, remove the leading 4 spaces added by git).

However, it's not super urgent/important.

+1

# or have sort of a proxy in OnyoRepo.
# -> May be: get_history(Item) in Inventory and get_history(path) in OnyoRepo.

for commit in self.repo.git.follow(path, n):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I'm on a bit of a regex trip, but wouldn't this be easier (and faster) to extract as the group after --- Inventory Operations ---\n?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was one big string, then yes. Not convinced of that yet, though.
However, code may still be simpler by looking for that line and get the rest via list slicing.

asset = Item(
some_key="some_value",
type="TYPE",
make="MAKER",
make="MAKE",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. These kill me every time. 😆

@bpoldrack bpoldrack force-pushed the pseudokeys branch 2 times, most recently from 9c60e9a to 8a58406 Compare December 16, 2024 12:06
When reading an asset's content from the temporary file for editing
(``onyo new --edit``/``onyo edit``), do not remove the `Item`'s
pseudo-keys. This happened, because we overwrite the ``.data`` attribute
of the ``Item`` directly. That's necessary for ruamel's comment
roundtrip, but we need to put the pseudo-key definitions back in
afterwards.
Failing to do so, resulted in unset pseudo-keys, which in case of
``onyo.path.relative`` then generated commit messages with ``None`` as
the asset path.

Closes psyinfra#739
Copy link
Collaborator

@aqw aqw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the most recent two commits. Really looking good. :-D

# rather than splitting lines and iterating over them when assembling output that belongs to a single
# commit, parsing the git data and then the operations record (also removes leading spaces for the commit
# message):
# --pretty='format:commit: %H%nAuthor: %an (%ae)%nCommitter: %cn (%ce)%nCommitDate: %cI%nMessage:%n%B'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. +1

# def fill_modified(self, what: str | None):
# pass#raise NotImplementedError
def fill_created(self, what: str | None):
"""Initializer for the 'onyo.was.modified' pseudo-keys.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'onyo.was.modified' -> 'onyo.was.created'

#
# def fill_modified(self, what: str | None):
# pass#raise NotImplementedError
def fill_created(self, what: str | None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose what -> key.

I havn't read the rest of the code, so I'm not sure how this function is being used yet, but I feel like this might not be the right way to do it.

I imagine a getter for the key, and then if the key is unset, running fill_created() (without an argument) which populates the fields, and then the getter looks up/returns the (now populated) key.

I'm not sold on the what (aka key) argument.


Fills in the entire sub-dict and returns the value specified by ``what``.

Note/TODO:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is not so much a TODO.

It's a note that says: official onyo commands do this right. It is possible for others to generate commits that are simply too insane for Onyo to anticipate. They need to take care. You have been warned.

@@ -125,7 +165,14 @@ def get_path_parent(self):

def get_path_file(self):
"""Initializer for the 'onyo.path.file' pseudo-key."""
return self._path.relative_to(self.repo.git.root) if self.repo and self['onyo.is.asset'] else None
if self.repo and self['onyo.path.absolute']:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why, but this function evokes very positive emotions.

This was definitely the right design decision. :-D

@@ -56,6 +62,48 @@ def delegate(self: Item, attribute: str, *args, **kwargs):
'onyo.is.empty': PseudoKey(description="Is the directory empty. <unset> if the item is not a directory.",
implementation=partial(delegate, attribute='is_empty')
),
'onyo.was.modified.hexsha': PseudoKey(description="SHA of the last commit that modified the item.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "last" is just vague enough to be problematic. In this case, you mean "last" as in "most recent".

description="Email of the committer of the last commit that modified the item.",
implementation=partial(delegate, attribute='fill_modified', what='committer.email')
),
'onyo.was.created.hexsha': PseudoKey(description="SHA of the last commit that created the item.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps drop "last" here? I see your programmatic point. Onyo is searching for the most recent commit... but if I would read this, then I would want to read the 1-2 sentences that explain the corner cases this is avoiding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: onyo new generates commit message new [1]: None (sometimes)
2 participants