Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bitswap: long-running session leaks #752

Open
Wondertan opened this issue Dec 12, 2024 · 7 comments
Open

bitswap: long-running session leaks #752

Wondertan opened this issue Dec 12, 2024 · 7 comments
Labels
need/analysis Needs further analysis before proceeding need/maintainers-input Needs input from the current maintainer(s)

Comments

@Wondertan
Copy link
Member

We've been running long-running Bitswap sessions and encountered a series of memory leaks that we've successfully identified and fixed in our fork. I plan to start upstreaming them eventually, but I am currently botted with millions of things. Anyway, I want to make you aware of their existence already in case you have the capacity to look into those.

The full branch is located here. Some of them were merged, some of them aren't. The most notable are:

All of those fixes have been tested on hundreds of nodes in production, and things have been running smoothly and steadily for 3 weeks. I am happy to answer if you start working on this sooner than when I get to opening PRs myself.

cc @hsanjuan, @gammazero, @lidel

@Wondertan Wondertan added the need/triage Needs initial labeling and prioritization label Dec 12, 2024
@gammazero
Copy link
Contributor

I will create separate PR for each of these and review and merge these into our code base, unless you would rather do this from your fork. We are also resolving memory consumption issues with connections, so it would be great to get all the known memory issues handled in at the same time. So let me know soon.

Thank you.

@gammazero
Copy link
Contributor

gammazero commented Dec 16, 2024

The first two fixes appear to be incorrect according to the comment in SessionInterestManager:

		// Note that once the block is received the session no longer wants
		// the block, but still wants to receive messages from peers who have
		// the block as they may have other blocks the session is interested in.

@Wondertan Please let me know if this needs further investigation.

@lidel lidel added need/author-input Needs input from the original author and removed need/triage Needs initial labeling and prioritization labels Dec 17, 2024
Copy link

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

@lidel lidel added need/triage Needs initial labeling and prioritization and removed need/author-input Needs input from the original author kind/stale labels Dec 25, 2024
@Wondertan
Copy link
Member Author

Hey @gammazero, thank you for submitting the fixes.

The first two fixes appear to be incorrect according to the comment in SessionInterestManager:

Re above. The first commit cleans up the SessionInterestManager, otherwise its only cleaned when the session is closed. I don't think that this behavior is correct, even though it might be expected, as per comment you referenced. However, I also don't see my fix contradict to the comment. It says "but still wants to receive messages from peers who have the block as they may have other blocks the session is interested in". Why would cleaning up SessionInterestManager prevent receiving new messages from the peers? It also states other blocks, while we clean particular blocks.

@Wondertan
Copy link
Member Author

6cd1ed9 goes further and cleans up BlockPresenceManager, but specifically for CIDs that are not in interest for any other session. There is no reason to keep CID presence information from peers when the CID has already been processed and no other session needs it.

@Wondertan
Copy link
Member Author

Wondertan commented Jan 6, 2025

And 70a6503 is pretty severe bug independently of long-running sessions. Check the comment there for description. Besides, one could blow up a peer by sending presences of completely arbitrary CIDs, because Bitswap never checks if those CIDs presences are even relevant(in SIM) and directly stores them. This commit fixes that

@gammazero gammazero added need/analysis Needs further analysis before proceeding need/maintainers-input Needs input from the current maintainer(s) and removed need/triage Needs initial labeling and prioritization labels Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/analysis Needs further analysis before proceeding need/maintainers-input Needs input from the current maintainer(s)
Projects
None yet
Development

No branches or pull requests

3 participants