Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fiber sigsev #686

Open
badsgahhl opened this issue Dec 22, 2024 · 19 comments
Open

fiber sigsev #686

badsgahhl opened this issue Dec 22, 2024 · 19 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@badsgahhl
Copy link

2024-12-22T16:04:52.251 app[0801021a02e638] fra [info] panic: runtime error: invalid memory address or nil pointer dereference

2024-12-22T16:04:52.252 app[0801021a02e638] fra [info] [signal SIGSEGV: segmentation violation code=0x1 addr=0x200 pc=0x86058e]

2024-12-22T16:04:52.252 app[0801021a02e638] fra [info] goroutine 44 [running]:

2024-12-22T16:04:52.252 app[0801021a02e638] fra [info] github.com/valyala/fasthttp.(*RequestCtx).Done(...)

2024-12-22T16:04:52.252 app[0801021a02e638] fra [info] /go/pkg/mod/github.com/valyala/[email protected]/server.go:2747

2024-12-22T16:04:52.252 app[0801021a02e638] fra [info] github.com/valyala/fasthttp.(*RequestCtx).Err(0x0?)

2024-12-22T16:04:52.253 app[0801021a02e638] fra [info] /go/pkg/mod/github.com/valyala/[email protected]/server.go:2761 +0xe

2024-12-22T16:04:52.253 app[0801021a02e638] fra [info] github.com/danielgtaylor/huma/v2/adapters/humafiber.(*contextAdapter).Err(0xc0008bef48?)

2024-12-22T16:04:52.253 app[0801021a02e638] fra [info] /go/pkg/mod/github.com/danielgtaylor/huma/[email protected]/adapters/humafiber/humafiber.go:36 +0x1d

2024-12-22T16:04:52.254 app[0801021a02e638] fra [info] database/sql.(*Rows).awaitDone(0xc0000b54a0, {0x2ddd678, 0xc0008b16e0}, {0x0, 0x0}, {0x2ddce10, 0xc0005ed9f0})

2024-12-22T16:04:52.254 app[0801021a02e638] fra [info] /usr/local/go/src/database/sql/sql.go:3012 +0x1e8

2024-12-22T16:04:52.254 app[0801021a02e638] fra [info] created by database/sql.(*Rows).initContextClose in goroutine 35

2024-12-22T16:04:52.254 app[0801021a02e638] fra [info] /usr/local/go/src/database/sql/sql.go:2988 +0x150

We updated one of our apps to 2.27 from 2.26 and now get a SIGSEGV. We cant really reproduce this behavior locally but we use prefork true in the staging env this app is currently running.

@badsgahhl
Copy link
Author

So locally with fiber I get this after a while and instantly with prefork true.

@danielgtaylor
Copy link
Owner

@badsgahhl thanks for reporting. My guess is that it must be related to the changes in #659. I don't personally use fiber so any help would be appreciated. @excavador any ideas on this one?

@danielgtaylor danielgtaylor added bug Something isn't working help wanted Extra attention is needed labels Dec 22, 2024
@excavador
Copy link
Contributor

@badsgahhl thanks for reporting. My guess is that it must be related to the changes in #659. I don't personally use fiber so any help would be appreciated. @excavador any ideas on this one?

I will take a look closer tomorrow and fix it

@excavador
Copy link
Contributor

@badsgahhl thanks for reporting. My guess is that it must be related to the changes in #659. I don't personally use fiber so any help would be appreciated. @excavador any ideas on this one?

I will take a look closer tomorrow and fix it

@badsgahhl @danielgtaylor I am sorry for delay. I am looking now.

@excavador
Copy link
Contributor

@badsgahhl

I understand the issue, but I need to rethink how to fix it.

fasthttp has a precise way of handling fasthttp.RequestCtx and I make implementation incompatible with it.

It sounds like you are using Context.Err() after exit from the HTTP handler, and that is the reason why you catch this issue.
fasthttp prohibits user RequestCtx after returning from fasthttp Handler, but you are NOT dealing with fasthttp.Request, you are dealing with normal context.Context and huma - and huma and context.Context does not have limitations like that.

I will rethink how to adjust my version to avoid a crash like this on one hands, and race conditions with fasthttp on other hands

CC @danielgtaylor

@excavador
Copy link
Contributor

excavador commented Dec 30, 2024

@danielgtaylor BTW, refer to *fiber.Ctx is prohibited by fiber (because of fasthttp.RequestCtx limitations)
https://docs.gofiber.io/#zero-allocation
image

So, we have issue already, even before my fix.

type fiberCtx struct {
	op     *huma.Operation
	orig   *fiber.Ctx
	status int
}

I will rework adapter there to avoid this bug and any similar issues.

Just keep you informed about my discoveries

excavador added a commit to excavador/huma that referenced this issue Dec 30, 2024
@excavador
Copy link
Contributor

@badsgahhl could you please try to use code from my Pull Request and confirm that issue is fixes?

@badsgahhl
Copy link
Author

badsgahhl commented Dec 30, 2024

Tried it and got the panic

panic: handler was done already
goroutine 140 [running]:
github.com/danielgtaylor/huma/v2/adapters/humafiber.(*fiberCtx).orig(...)
	C:/Users/Pascal/go/pkg/mod/github.com/danielgtaylor/huma/[email protected]/adapters/humafiber/humafiber.go:51
github.com/danielgtaylor/huma/v2/adapters/humafiber.(*fiberCtx).Value(0xc0000ae070?, {0x1b40ca0, 0x3d02ed0})
	C:/Users/Pascal/go/pkg/mod/github.com/danielgtaylor/huma/[email protected]/adapters/humafiber/humafiber.go:70 +0xa8
context.Cause({0x2d924d8, 0xc00024bcb0})
	C:/Users/Pascal/go/pkg/mod/golang.org/[email protected]/src/context/context.go:284 +0x45
context.(*cancelCtx).propagateCancel.func2()
	C:/Users/Pascal/go/pkg/mod/golang.org/[email protected]/src/context/context.go:512 +0xc9
created by context.(*cancelCtx).propagateCancel in goroutine 107
	C:/Users/Pascal/go/pkg/mod/golang.org/[email protected]/src/context/context.go:509 +0x3e5
2024/12/30 17:43:36.379278 prefork.go:81: [Error] prefork: failed to kill child: invalid argument
panic: exit status 2

goroutine 1 [running]:
main.main()

excavador added a commit to excavador/huma that referenced this issue Dec 30, 2024
@excavador
Copy link
Contributor

excavador commented Dec 30, 2024

panic: handler was done already

@badsgahhl perfect!

What does it mean: you try to use "huma.Context" or "Context()" from huma.Context OUTSIDE handler.

According to documentation of Fiber and FastHttp server it is prohibited

Take a look to this link https://docs.gofiber.io/#zero-allocation
image

Other words: this is EXPECTED behavior

@excavador
Copy link
Contributor

@badsgahhl if you show me your code, I could explain where the problem with it and how to change it

@badsgahhl
Copy link
Author

badsgahhl commented Dec 30, 2024

We actually don't use anything beyond the handler. And also I still only get the error reproductible while using Fiber Preforking.
Without preforking I don't have these issues directly now.

@excavador
Copy link
Contributor

We actually don't use anything beyond the handler.

Unless I missunderstand something important, your stacktrace is evidence that you are actually try to use it outside handler!

Please, take a look to your stack trace.

goroutine 140 [running]:
github.com/danielgtaylor/huma/v2/adapters/humafiber.(*fiberCtx).orig(...)
	C:/Users/Pascal/go/pkg/mod/github.com/danielgtaylor/huma/[email protected]/adapters/humafiber/humafiber.go:51
github.com/danielgtaylor/huma/v2/adapters/humafiber.(*fiberCtx).Value(0xc0000ae070?, {0x1b40ca0, 0x3d02ed0})
	C:/Users/Pascal/go/pkg/mod/github.com/danielgtaylor/huma/[email protected]/adapters/humafiber/humafiber.go:70 +0xa8
context.Cause({0x2d924d8, 0xc00024bcb0})
	C:/Users/Pascal/go/pkg/mod/golang.org/[email protected]/src/context/context.go:284 +0x45
context.(*cancelCtx).propagateCancel.func2()
	C:/Users/Pascal/go/pkg/mod/golang.org/[email protected]/src/context/context.go:512 +0xc9
created by context.(*cancelCtx).propagateCancel in goroutine 107
	C:/Users/Pascal/go/pkg/mod/golang.org/[email protected]/src/context/context.go:509 +0x3e5
2024/12/30 17:43:36.379278 prefork.go:81: [Error] prefork: failed to kill child: invalid argument
panic: exit status 2

Inside stack trace I could see "humafiber.go:51" and "humafiber.go:70", but I can not see something like "humafiber.go:207"

What it is important?
Take a look to code from my PR - link

func (a *fiberAdapter) Handle(op *huma.Operation, handler func(huma.Context)) {
	// Convert {param} to :param
	path := op.Path
	path = strings.ReplaceAll(path, "{", ":")
	path = strings.ReplaceAll(path, "}", "")
	a.router.Add(op.Method, path, func(c *fiber.Ctx) error {
		var ctx, cancel = context.WithCancel(context.Background())
		var fc = &fiberCtx{
			op:              op,
			unsafeFiberCtx:  c,
			unsafeGolangCtx: ctx,
		}
		defer func() {
			cancel()
			fc.unsafeFiberCtx = nil
		}()
		handler(fc)
		return nil
	})
}

Our point to interest the ling number 203-208

		defer func() {
			cancel()
			fc.unsafeFiberCtx = nil
		}()
		handler(fc)
		return nil

We do not have any of it inside stacktrace!

What does it mean?

  • you cache huma.Context or huma.Context => Context() somewhere outside of the handler, and try to call something AFTER you exit your handler.

@excavador
Copy link
Contributor

CC @badsgahhl ^

@badsgahhl
Copy link
Author

badsgahhl commented Dec 30, 2024

Just from looking at the handlers I don't see anything that would use the context more than it should be used. All huma handlers pass down the context to the database etc. just as normal. The only thing that really always causes crashes now is if huma requests and fiber request are sent in parallel while preforking is active.

So to explain: The application currently transitions to huma handlers but uses huma handlers and legacy fiber handlers in parallel. If a page calls both of them in parallel (e.g. react-query tab switch) and preforking is enabled then the application crashes.

Otherwise I really don't see any issues there. Otherwise we should have noticed some weird behavior some versions before in my opinion.

@excavador
Copy link
Contributor

excavador commented Dec 30, 2024

@badsgahhl I took a closer look to implementation of golang Context and this is mind-blowing - lines 509-512
https://cs.opensource.google/go/go/+/refs/tags/go1.23.4:src/context/context.go;l=509

	go func() {
		select {
		case <-parent.Done():
			child.cancel(false, parent.Err(), Cause(parent))
		case <-child.Done():
		}
	}()

Sounds like cancelation of the parent context try to use children context, and context.Cause call context.Value and we have this issue.

I am rethinking your stack-trace and try to wrap my head about this really weird consequences to figure out proper fix.

I will let you know when I figure it out

@excavador
Copy link
Contributor

@badsgahhl could you please try updated version of my PR?

@badsgahhl
Copy link
Author

Works 👍🏼 with the small test case I have. Will try it out some time.

Just some sidenote:

// Cause returns a non-nil error explaining why c was canceled.
// The first cancellation of c or one of its parents sets the cause.
// If that cancellation happened via a call to CancelCauseFunc(err),
// then [Cause] returns err.
// Otherwise Cause(c) returns the same value as c.Err().
// Cause returns nil if c has not been canceled yet.
func Cause(c Context) error {
	if cc, ok := c.Value(&cancelCtxKey).(*cancelCtx); ok {
		cc.mu.Lock()
		defer cc.mu.Unlock()
		return cc.cause
	}
	// There is no cancelCtxKey value, so we know that c is
	// not a descendant of some Context created by WithCancelCause.
	// Therefore, there is no specific cause to return.
	// If this is not one of the standard Context types,
	// it might still have an error even though it won't have a cause.
	return c.Err()
}

As you mentioned this causes the error: child.cancel(false, parent.Err(), Cause(parent))

Is it really correct to return nil here for the &cancelCtxKey. Just saw that line during debugging. No clue if this could break something up tbh.

@excavador
Copy link
Contributor

@badsgahhl thank you for letting me know!

In general, this is the big surprise to me, how context.Context is working actually!

In our application - https://docs.truvity.com/ - we are heavily use context.Value, and there is the reason of the original patch from my side.

The performance-related behavior of fiber and fasthttp + separation to "Context" and "UserContext" inside fasthttp make the proper implementation of "Context" inside adapter extremely complicated!

Please let me know, if you will find any other issues, I will be glad to fix my code

@danielgtaylor I think the PR is good to go

@excavador
Copy link
Contributor

@badsgahhl if it was the question

Is it really correct to return nil here for the &cancelCtxKey

then correct, because "parent" context where is my context from adapter, and it propagate cancel to children context. In my case it will be the usual normal "Cancel" without any "Cause", and because of that returing nil is okay - it will pick up "Cause" from chilren context, not from the my context from adapter

Unless I missunderstood something important :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants