-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pseudo-Keys Part II: history metadata #738
base: main
Are you sure you want to change the base?
Conversation
This patch implements retrieval of structured information from `git log` and `git log --follow <path>`, including parsing of onyo's operations record. This is done in preparation of implementing pseudo-keys for history-related metadata. Currently operates on `Path` objects. In context of fully implementing the pseudo-key and namespace concept, `Inventory.get_history()` will need to change in order to take `Item` instances instead.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #738 +/- ##
==========================================
+ Coverage 94.57% 94.63% +0.05%
==========================================
Files 73 74 +1
Lines 6400 6744 +344
Branches 544 641 +97
==========================================
+ Hits 6053 6382 +329
- Misses 254 265 +11
- Partials 93 97 +4 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another cool PR. :-D
I left some thoughts. Probably half of it is suggesting alternate approaches to using regex. IMO, those would help consolidate a decent amount of code.
""" | ||
import datetime | ||
import re | ||
regex_author = re.compile(r"Author:\s+(?P<name>\b.*\b)\s+<(?P<email>[^\s]+)>$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this correctly, we're not going to get much of a win doing the compile here, since we're only using it once per function invocation. Thus, it's effectively the same as using match
, which does a compile per run.
Moving the compilation out of the function would probably lead to a notable performance win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you understand correctly.
Not sure it's worth it moving out, since this means it's always going to be compiled even if never used in an onyo call. But wanted to decouple from usage so it can easily be moved and the used expressions are in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it's worth it moving out, since this means it's always going to be compiled even if never used in an onyo call
I think that depends where you put it. If it's something that gets imported (similar to consts) then it wouldn't be.
Just something to keep in mind. We'll see when we get there. And we have benchmarking now. :-D
@@ -401,3 +404,85 @@ def check_ignore(self, ignore: Path, paths: list[Path]) -> list[Path]: | |||
|
|||
# TODO: git check-ignore --no-index --stdin (or via command call) -> lazy, check GitRepo.files once. (Same invalidation) | |||
# -> OnyoRepo would use it to maintain a ignore list from a (top-level .onyoignore)? .onyo/ignore ? Both? | |||
|
|||
def _parse_log_output(self, lines: list[str]) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposal: this function can be simplified if all lines are put into one large string.
That string is then split into two pieces, splitting on the first entirely empty line.
The first half is the header. The second half is the message.
The regex already checks for "Author", "Commit", etc. Have them all run in one large try statement on the header text, and all get caught by the same exception handling.
Then the message is dealt with separately, and simply assigns the second half.
onyo/lib/inventory.py
Outdated
@@ -725,3 +725,66 @@ def raise_empty_keys(self, asset: dict | UserDict) -> None: | |||
# Note, that DotNotationWrapper.keys() delivers strings (and has to). | |||
# Hence, `None` as a key would show up here as 'None'. | |||
raise ValueError("Keys are not allowed to be empty or None-values.") | |||
|
|||
def _parse_operations_record(self, record: list[str]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if much of this function should be an inverse of recorders.py
, and stored in something named like decoder.py
(I don't love the name, but I think you get my point).
Then just import what you need here and use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was wondering how exactly to have both directions share some code/definitions.
So, not sure ow yet, but something like this, yes.
onyo/lib/inventory.py
Outdated
return parsed_record | ||
|
||
def get_history(self, path: Path | None = None, n: int | None = None): | ||
# TODO: This isn't quite right yet. operations records are defined in Inventory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a nice paragraph written up about this, but then I realized that I'm not sure if I have a clear understanding of the distinction between OnyoRepo and Inventory.
I see your desire to separate the layers, and not hop directly from Inventory to GitRepo. But what I don't have an answer for yet (and I hope you can help explain), is what is an Onyo Repo other than an Inventory?
The question that makes me ask this is: should OnyoRepo or Inventory be the one to parse the commit log dict, to extract and add info about the Inventory Operations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea would be, that from Inventory
's POV, OnyoRepo
is the storage backend. If we were to add a DB-version of this, Inventory
would refer to a backend
rather than a repo
and nothing else changes in Inventory
or any of the command implementations. So, OnyoRepo
implements what's specific about an onyo repository, utilizing the generic GitRepo
. But it's still aware of filesystem and uses paths in the repo. Whereas Inventory
is supposed to be more abstracted. It knows assets and inventory dirs (which happen to have a path-like label ATM, but soon will be Item
s). It doesn't care about how this is stored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the comment suggests that:
- there may be an
Inventory.get_history(item: Item)
that refers to anOnyoRepo.get_history(path: Path)
. - make
Inventory.commit()
not create the commit message yet, but pass the operations record separately toOnyoRepo
and have only this layer decide, that this is going to be part of the commit message, so that the definition of how it's stored and decoded is at the same layer.
However, it's not super urgent/important. This entire distinction between Inventory
and OnyoRepo
helps to have a clear conceptualization of what premises a given piece of code operates on, but it probably won't be perfectly separated w/o an actual second backend anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. :-)
there may be an Inventory.get_history(item: Item) that refers to an OnyoRepo.get_history(path: Path).
This makes sense to me.
make Inventory.commit() not create the commit message yet, but pass the operations record separately to OnyoRepo and have only this layer decide, that this is going to be part of the commit message, so that the definition of how it's stored and decoded is at the same layer.
This... is less obvious to me. IMO, the messages should be stable, regardless of the backend. Thus it should be in Inventory
. The GitRepo/OnyoRepo layer would be responsible for norming the message (for example, remove the leading 4 spaces added by git).
However, it's not super urgent/important.
+1
onyo/lib/inventory.py
Outdated
# or have sort of a proxy in OnyoRepo. | ||
# -> May be: get_history(Item) in Inventory and get_history(path) in OnyoRepo. | ||
|
||
for commit in self.repo.git.follow(path, n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I'm on a bit of a regex trip, but wouldn't this be easier (and faster) to extract as the group after --- Inventory Operations ---\n
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was one big string, then yes. Not convinced of that yet, though.
However, code may still be simpler by looking for that line and get the rest via list slicing.
asset = Item( | ||
some_key="some_value", | ||
type="TYPE", | ||
make="MAKER", | ||
make="MAKE", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. These kill me every time. 😆
9c60e9a
to
8a58406
Compare
When reading an asset's content from the temporary file for editing (``onyo new --edit``/``onyo edit``), do not remove the `Item`'s pseudo-keys. This happened, because we overwrite the ``.data`` attribute of the ``Item`` directly. That's necessary for ruamel's comment roundtrip, but we need to put the pseudo-key definitions back in afterwards. Failing to do so, resulted in unset pseudo-keys, which in case of ``onyo.path.relative`` then generated commit messages with ``None`` as the asset path. Closes psyinfra#739
8a58406
to
15ebf32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the most recent two commits. Really looking good. :-D
# rather than splitting lines and iterating over them when assembling output that belongs to a single | ||
# commit, parsing the git data and then the operations record (also removes leading spaces for the commit | ||
# message): | ||
# --pretty='format:commit: %H%nAuthor: %an (%ae)%nCommitter: %cn (%ce)%nCommitDate: %cI%nMessage:%n%B' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. +1
# def fill_modified(self, what: str | None): | ||
# pass#raise NotImplementedError | ||
def fill_created(self, what: str | None): | ||
"""Initializer for the 'onyo.was.modified' pseudo-keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'onyo.was.modified'
-> 'onyo.was.created'
# | ||
# def fill_modified(self, what: str | None): | ||
# pass#raise NotImplementedError | ||
def fill_created(self, what: str | None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose what
-> key
.
I havn't read the rest of the code, so I'm not sure how this function is being used yet, but I feel like this might not be the right way to do it.
I imagine a getter for the key, and then if the key is unset, running fill_created()
(without an argument) which populates the fields, and then the getter looks up/returns the (now populated) key.
I'm not sold on the what
(aka key
) argument.
|
||
Fills in the entire sub-dict and returns the value specified by ``what``. | ||
|
||
Note/TODO: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is not so much a TODO.
It's a note that says: official onyo commands do this right. It is possible for others to generate commits that are simply too insane for Onyo to anticipate. They need to take care. You have been warned.
@@ -125,7 +165,14 @@ def get_path_parent(self): | |||
|
|||
def get_path_file(self): | |||
"""Initializer for the 'onyo.path.file' pseudo-key.""" | |||
return self._path.relative_to(self.repo.git.root) if self.repo and self['onyo.is.asset'] else None | |||
if self.repo and self['onyo.path.absolute']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why, but this function evokes very positive emotions.
This was definitely the right design decision. :-D
@@ -56,6 +62,48 @@ def delegate(self: Item, attribute: str, *args, **kwargs): | |||
'onyo.is.empty': PseudoKey(description="Is the directory empty. <unset> if the item is not a directory.", | |||
implementation=partial(delegate, attribute='is_empty') | |||
), | |||
'onyo.was.modified.hexsha': PseudoKey(description="SHA of the last commit that modified the item.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "last" is just vague enough to be problematic. In this case, you mean "last" as in "most recent".
description="Email of the committer of the last commit that modified the item.", | ||
implementation=partial(delegate, attribute='fill_modified', what='committer.email') | ||
), | ||
'onyo.was.created.hexsha': PseudoKey(description="SHA of the last commit that created the item.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps drop "last" here? I see your programmatic point. Onyo is searching for the most recent commit... but if I would read this, then I would want to read the 1-2 sentences that explain the corner cases this is avoiding.
Also closes #739