Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difficult to use one session per request (e.g. gateway request) #567

Open
Jorropo opened this issue Jan 16, 2024 · 7 comments
Open

Difficult to use one session per request (e.g. gateway request) #567

Jorropo opened this issue Jan 16, 2024 · 7 comments
Labels
dif/hard Having worked on the specific codebase is important effort/days Estimated to take multiple days, but less than a week P1 High: Likely tackled by core team if no one steps up

Comments

@Jorropo
Copy link
Contributor

Jorropo commented Jan 16, 2024

Background

The point of data transfer sessions is to handle the logic around discovering peers that might have the data you need and which peers to ask for which subset of the data. It does this by making assumptions/guesses around the association of various requests (i.e. these 10 block requests are all associated with the same HTTP gateway request). Historically getting the plumbing right here to actually have one session in code per logical session has been a pain resulting in multiple sessions being created within single requests (e.g. gateway requests).

On top of this long standing issue we're also seeing some issues with blockservice wrappers not working nicely with sessions ipfs-shipyard/nopfs#34 due to how the abstraction layers are entangled.

Options

There have been a few recent and older proposals, most of them involve embedding session information in the context (e.g. ipfs/kubo#7198, #549 although it's incomplete on it's own).

Some of the concrete approaches some of them (along with a brief description) are:

  1. blockservice: move session handling as part of the interface #563 -> Put the concept of sessions in the blockservice interface
  2. blockservice: move to single unique struct and add WithContentBlocker option #561 -> Remove the abstraction of a blockservice interface and whenever someone needs more functionality it means more options passed into the boxo.BlockService struct's constructor
  3. Context-Scoped Sessions kubo#7198 -> Make the concept of sessions global, but basically just a uint64, individual layers of the stack (e.g. bitswap or something higher level if that emerges) can then choose to locally associate state with
  4. Do nothing (pre blockservice: add NewSessionContext and EmbedSessionInContext #549) -> Make the user create a shallow clone of the stack of components for every session

Analysis

The main difference between #561 and ipfs/kubo#7198 is a tradeoff between cleanup complexity and adding more propagation of sessions everywhere.

  • blockservice: move session handling as part of the interface #563
    • When the context gets cleaned up by Go's GC the session object will get cleaned up as well
    • Sessions need to be handled by: Session consumers (e.g. bitswap), Session creators (e.g. gateway), everything in between (e.g. boxo.blockservice, nopfs, dagservice, fetcher, etc.)
  • Context-Scoped Sessions kubo#7198
    • Needs to manually manage and synchronise state (uint64*session) in all session consumers
    • A goroutine would need to get spun up to handle cleanup on context cancellation.
    • Sessions need to be handled by: Session consumers (e.g. bitswap), Session creators (e.g. gateway)

Let's decide which change to make going forward.

@Jorropo Jorropo added need/triage Needs initial labeling and prioritization and removed need/triage Needs initial labeling and prioritization labels Jan 16, 2024
@Jorropo Jorropo self-assigned this Jan 16, 2024
@Jorropo Jorropo moved this to 🏃‍♀️ In Progress in IPFS Shipyard Team Jan 16, 2024
@aschmahmann aschmahmann changed the title gateway recreates many sessions for the same requests Difficult to use one session per request (e.g. gateway request) Jan 18, 2024
@aschmahmann
Copy link
Contributor

To me ipfs/kubo#7198 seems more reasonable than #563 given:

  • It requires minimal (no?) breaking changes
  • It's less work on people implementing wrapping layers that don't care about sessions
  • We can start deprecating (and removing) the sessions logic in places like the dagservice, fetcher, etc. and preserving context creation for root-level requests
  • It opens the door for us to use either fuller sessions or sessions with explicit single-goroutine cleanup via a Close() function if needed

I see the downsides as:

  • By choosing not to really shove the concept of sessions in people's faces they might not realize they're supposed to use one which would hurt their performance and they wouldn't realize
    • Even now people don't always call NewSession on the blockservice, dagservice, etc. so if we wanted to have a "with guardrails" API it would probably need to be a separate one anyhow
  • The cost of an async goroutine being spun up and down for implementers who won't be using a long lived goroutine anyway
    • Right now it doesn't matter as bitswap is the only consumer and it uses a long lived goroutine per session)

@Jorropo
Copy link
Contributor Author

Jorropo commented Jan 18, 2024

It requires minimal (no?) breaking changes

We would need to remove NewSession and explicit session handling in various pieces of code:

(e.g. boxo.blockservice, nopfs, dagservice, fetcher, etc.)

We could not remove it, and either:


everything in between (e.g. boxo.blockservice, nopfs, dagservice, fetcher, etc.)

I don't exactly agree with this,
Right now this is the case of how things are because the API didn't allowed to do any other way in the past.

#563 let you do both or either one
either explicitly passing around sessions.
and shoving the session in the context,
it even let you do both at once (calling NewSession with a session already in the context does not create a new session).

The only real piece of work that we force to be duplicated is for wrappers on the blockservice like nopfs.
However it also becomes easy to do, no complex state management in global maps.
In the very particular nopfs case it could hoist the GetBlock and GetBlocks implementation to a:

type nopedBlockGetter struct{
 g blockservice.BlockGetter
 blocker *nopfs.Blocker
}

Then nopfs's blockservice implementation would embed this struct.
While the session would literally just be this struct.

Even if it would split librairies which use implicit vs the ones that use explicit sessions.
This would not be an issue since you can always use implicit passing.

@aschmahmann
Copy link
Contributor

We would need to remove NewSession and explicit session handling in various pieces of code:

IIUC we wouldn't need to. You could mostly leave functions like blockservice.NewSession and merkledag.NewSession alone. For example:

func NewSession(ctx context.Context, bs BlockService) *Session {

Could even be left as is with Session's sesctx field being used to grab the exchange.Fetcher.

(e.g. boxo.blockservice, nopfs, dagservice, fetcher, etc.)

We could not remove it, and either:

Same here. IIUC we should be able to mostly leave things as they are (#549 mostly demonstrated that already in that despite being able to rely on contexts the dagservice's NewSession should still work, right?).

The only real piece of work that we force to be duplicated is for wrappers on the blockservice like nopfs.

nopfs is a blockservice wrapper that is itself a blockservice, but this applies to things like the dagservice too. We currently have:

func (n *dagService) Session(ctx context.Context) format.NodeGetter {
session := bserv.NewSession(ctx, n.Blocks)

Which allows consumers that need a dagservice to call dsrv.Session(ctx) or merkledag.NewSession(ctx, dsrv). But what do they do if they want to embed the session in the context? Looks like either:

  1. They have to have access to both the blockservice and dagservice and call blockservice.ContextWithSession before taking that context and giving it to the dagservice
  2. There needs to be something like dagservice.ContextWithSession to be used with the dagservice

Neither of which feel great.

@Jorropo
Copy link
Contributor Author

Jorropo commented Jan 18, 2024

Could even be left as is with Session's sesctx field being used to grab the exchange.Fetcher.

But why bother with the uint64 then ?
Anyone wraping the blockservice will still to know about .(BoundedBlockService) and future ones like #536 right now.

They have to have access to both the blockservice and dagservice and call blockservice.ContextWithSession before taking that context and giving it to the dagservice

SGTM

@aschmahmann
Copy link
Contributor

But why bother with the uint64 then ?

blockservice.NewSession can be deprecated and refer to a wrapper *Session object while anything else can use the (wrapped) uint64 identifier. Given the actual session information (today) lives in Bitswap having extra Session objects in merkledag and blockservice, and fetcher (as we do today) seems extraneous and mostly a pain.

Anyone wraping the blockservice will still to know about .(BoundedBlockService) and future ones like #536 right now.

For both of those examples that doesn't seem correct. For both the allowlist and provider "addons" something like the dagservice doesn't care. It just takes the blockservice as an argument and consumers of the dagservice don't need to know anything about either of those options.

SGTM

Good to know that's the option you'd prefer if we went with #563. To me this seems to force leaky abstractions though. If the dagservice/fetcher/etc. is meant to abstract over blocks, but you still need the blockservice available to meaningfully use the abstraction your abstraction isn't really working.

@aschmahmann
Copy link
Contributor

Had a sync with @Jorropo earlier and it seems like there's an in between that gets most of the benefits of both approaches.

Putting a map as the value in the context rather than a uint64 allows the consumers of sessions to let Go GC remove the session once the context is GC'd (i.e. without requiring something like context.AfterFunc and state management), while also allowing the use of a higher level package to handle the sessions that won't require propagating WithContext() methods through all the middleware interfaces between the session producer (e.g. a gateway request) and the sessions consumer (e.g. bitswap).

Going to try moving forward with a new PR here and in kubo to see what it looks like.

@Jorropo
Copy link
Contributor Author

Jorropo commented Jan 22, 2024

Latest pr at #570

@hacdias hacdias added P1 High: Likely tackled by core team if no one steps up effort/days Estimated to take multiple days, but less than a week dif/hard Having worked on the specific codebase is important labels Jan 23, 2024
@Jorropo Jorropo removed their assignment Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dif/hard Having worked on the specific codebase is important effort/days Estimated to take multiple days, but less than a week P1 High: Likely tackled by core team if no one steps up
Projects
No open projects
Status: 🏃‍♀️ In Progress
Development

No branches or pull requests

3 participants