Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove panic during NVML shutdown #70

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alexbagirov
Copy link
Contributor

We're using go-nvml in production and have concerns about receiving a panic() during NVML shutdown - this can kill other services depending on the library.

This PR removes this panic(), although I'm not sure which approach is better. My proposal is to return both Return and error where Return corresponds to nvml.Init() and nvml.Shutdown() calls, error corresponds to dlOpen() and dlClose().

@alexbagirov
Copy link
Contributor Author

@klueska Could you please make a review for this PR?

Signed-off-by: Alexandr Bagirov <[email protected]>
@elezar
Copy link
Member

elezar commented Aug 25, 2023

@alexbagirov would implementing SafeInit(), SafeInitWithFlags() and SafeShutdown() functions be feasible? Something along the lines of:

func SafeInit() (ret Return) {
	defer func() {
		if err := recover(); err != nil {
			ret = ERROR_UNKNOWN
		}
	}()
	return Init()
}

We could even make these the default implementations -- although the downside is that we lose the reason for the panic in this case. Adding some global error that can be queried by a client (similar to C.dlerror()) may be useful here.

Note that this kind of wrapper can be implemented at client side without modifying the bindings.

My major concern with modifying the function signatures is that this goes against the intent of having this package be bindings of the NVML API.

@klueska
Copy link
Contributor

klueska commented Aug 28, 2023

I am not in favor of having Init() and Shutdown() return an additional error paramater (as @elezar suggests it goes against the nature of these being 1:1 bindings of the C-based library).

That said, I would be fine doing away with the panic in favor of returning NVML_ERROR_UNKNOWN. Some information is obviously lost there, but I think it will be rare that this ever returns an error anyway.

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this change was originally proposed there have been a number of changes. We have reworked how the library is represented internally.

I think what I would like to see is extending the Library interface to include a GetErrors() map[string]error or GetLastError() error function that can be called instead of the panic. We would then set the error accordingly instead of panicing. We could add a top-level Option to allow users to revert to the existing behaviour (WithPanicOnShutdownErrors(bool)).

I could make these changes at some point in the future, but my capacity is currently limited. Would you be able to make these changes?

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @alexbagirov

Since the basic ask is covered by #107, would you be willing to update this PR to implement something as described in #70 (review)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants