Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add log-formatting to the otel-LogHandler #3673

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

a-recknagel
Copy link

@a-recknagel a-recknagel commented Feb 5, 2024

Description

Added support for log message formatters to opentelemetry.sdk._logs._internal.LogHandler.

Fixes #3664

Type of change

  • added a self.format call on the original LogRecord to the code in the LogHandler's emit method, the result is added as the body field of the otel-LogRecord.

  • removed code that filtered out all standard-LogRecord fields from getting added to the otel-LogRecord. The only values that still get filtered are those that are considered "private" by the LogRecord API (see a note "You shouldn’t need to format this yourself" in https://docs.python.org/3/library/logging.html#logrecord-attributes) or only get used to cache results.

  • Bug fix (non-breaking change which fixes an issue)

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

  • This change requires a documentation update

The documentation should include an example which shows how exactly adding a formatter changes the result of what gets emitted to the receiver. I think there is some element of surprise here, because the standard procedure is that, if a formatter was set, the handler emits a string whose content is wholly dictated by said formatter.

The breaking change is due to me removing a check where a non-string msg attribute in the original LogRecord will be attached to the otel-LogRecord as-is. Now, the format call will always return a string, even if no formatter was set and args is empty. I believe this is the right behavior, makes the codebase easier to maintain. Code that looked like this

log.info({"message": "foo", "context": "bar"})

could before this change be interpreted as a complex data structure by some log-renderer. I'd argue that the "right way of logging" (even without the changes in this PR) would be the following:

log.info("foo", extra={"context": "bar"})

How Has This Been Tested?

I wrote three new unit tests, and updated a few other ones that were broken after the change.

Does This PR Require a Contrib Repo Change?

  • No

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

Open Discussion Points

  • change type of opentelemetry.sdk._logs._internal.LogRecord.body to Optional[str] now? I believe not as it would violate the LogRecordAPI, but would like to make sure
  • is the breaking change worth it, and the migration path reasonable enough?
  • what exactly should be documented?

@a-recknagel a-recknagel requested a review from a team as a code owner February 5, 2024 12:15
Copy link

linux-foundation-easycla bot commented Feb 5, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

…before emitting it as the body, and retain all meaningful attributes of the original logrecord.
@bogdandrutu
Copy link
Member

bogdandrutu commented Feb 5, 2024

I think this is against the semantic conventions. To understand what I mean, see #3675 in otel we have standard semantic conventions for things like lineno and we should follow them and not add everything as python defined them.

@a-recknagel
Copy link
Author

@bogdandrutu Yeah it is. Scanning over SpanAttributes, there probably are a bunch more like thread.id. Is it ok if I just create an issue for it, something like "convert logrecord attribute into their correct semantic convention names"?

@bogdandrutu
Copy link
Member

@bogdandrutu Yeah it is. Scanning over SpanAttributes, there probably are a bunch more like thread.id. Is it ok if I just create an issue for it, something like "convert logrecord attribute into their correct semantic convention names"?

It is ok to create an issue, but would be worried merging this PR since then it becomes a breaking change.

@a-recknagel
Copy link
Author

In that case I'll try to find all semantic conventions that exist for the standard logrecord attributes and add them with the right names at least. It might actually be just the code. and thread. ones.

@a-recknagel
Copy link
Author

Ok, I've updated my fork with your changes and suggestions.

@lzchen lzchen added this to In progress in Logging RC via automation Mar 7, 2024
@a-recknagel
Copy link
Author

@bogdandrutu @mariojonke if it's not a bother, I'd like to ask if the PR in its current state could be approved, or if there is something left that I should improve on. Thank you for your time.

@xrmx xrmx mentioned this pull request Mar 14, 2024
11 tasks
Copy link
Member

@srikanthccv srikanthccv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, I have concerns about the extra attributes being added that don't have sem conv

CHANGELOG.md Outdated Show resolved Hide resolved
Comment on lines +59 to +64
"args",
"msg",
"message",
"stack_info",
"exc_info",
"exc_text",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should still discard the attributes like taskName, asctime etc.. that don't have the mapping in sematic conventions. Not that they are not useful but we only emit the attributes that sem convs defined in SDK

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an overly strong opinion on this, but sway towards emit all, or at the very least some of them.

I can't find the source right now, but I think I've read that OTEL implementations should prioritize being written idiomatic within the language the implementation is written in over sticking to details of the general OTEL guidelines. I'd consider log-record attributes to fall into this category. As a python developer, I expect attributes like lineno, module, and funcName to exist on a logrecord.

Also, custom attributes already do get emitted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think language idiomatic implementation suggestions should interpreted to associate with the attributes. This is an independent area. We are emitting some of them already by mapping to semconv attributes. And will emit the rest too but that will be after there is some spec guidelines about such cases.

Copy link
Member

@srikanthccv srikanthccv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, question about the excluded attributes

@amurauyou
Copy link

Thank you all for contributing to the fix. Having formatting work properly is really important.

Could this change be reviewed and prioritized again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Logging RC
In progress
Development

Successfully merging this pull request may close these issues.

opentelemetry.sdk._logs.LoggingHandler doesn't format
5 participants