
IPFS as storage? #454

Closed

matthiasbeyer opened this issue Dec 6, 2017 · 11 comments

@matthiasbeyer

Would it be appropriate to ask for this here? Not sure. If not, please close and point me to the right repo.


So, there was already a comment about IPFS in another issue. From my point of view, it would be a real win to port the data storage to IPFS (and possibly orbitdb as an abstraction layer over it).

Not only because it would deduplicate work, but also because projects like IPLD could be adopted without much hassle, yielding an even bigger advantage: connectability to all the things.

I've created a GitHub organization for discussing these things myself, and I'm increasingly getting the impression that SSB already does a lot of this, so it might be worth contributing to the discussion here instead of heading in almost the same direction and duplicating effort.

@staltz
Member

staltz commented Dec 6, 2017

Hi @matthiasbeyer, because of scuttlebot's plugin system, it's pretty viable to build an ssb-ipfs plugin, and then use it in sbot. We don't need "official" integration if that's the case. :)
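A hypothetical workflow, assuming someone publishes such a plugin to npm (ssb-ipfs is a placeholder name here, not an existing package), could use scuttlebot's plugin commands:

# install and enable a hypothetical ssb-ipfs plugin from npm,
# then restart the sbot server so the plugin gets loaded
sbot plugins.install ssb-ipfs
sbot plugins.enable ssb-ipfs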

@matthiasbeyer
Author

Do I understand correctly that one is able to develop a plugin which automatically puts content and metadata into IPFS? So the plugin has full access to the data and metadata inside Scuttlebutt?

@staltz
Member

staltz commented Dec 6, 2017

@matthiasbeyer Yes :)

@matthiasbeyer
Author

Hmh. I'm not quite sure whether we misunderstand each other.

What I understand is that there is a possibility to develop a plugin which takes the content the node receives and puts it into IPFS.

My question, though, was about moving the entire network to use IPFS as storage, so that all the social data, metadata, and everything else is stored in IPFS rather than in scuttlebutt's own format/storage.

My reasoning is that IPFS already does all of this very well, and reusing existing technology is better than reinventing everything.

@staltz
Member

staltz commented Dec 6, 2017

scuttlebutt's own storage is always local-first, so one option is to mirror the SSB database in IPFS. But despite IPFS being great for some things, it's not great for everything, and in SSB it's important to have a database that's good for append-only logs. SSB uses the very fast leveldb (written in C++), and more recently flumedb, which is built for append-only logs.
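For context, every message a peer writes is an append to that log; with the stock sbot CLI, for example:

# append a new post message to your own feed's append-only log
sbot publish --type post --text "hello world"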

I think a good use for IPFS is to store blob content (images, videos, etc), although one can also store textual content in IPFS. So, in a nutshell, I think it makes some sense to store data in IPFS but not metadata.
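As a rough sketch of that split, one could pipe blobs across (this assumes sbot's CLI streams the blob bytes to stdout and that ipfs add reads from stdin; the blob id is a placeholder):

# read a blob out of scuttlebot's blob store and hand it to ipfs;
# "&exampleblobhash=.sha256" stands in for a real SSB blob id
sbot blobs.get "&exampleblobhash=.sha256" | ipfs add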

Also, it's worth asking why start from the assumption that you will use IPFS no matter what.

@dominictarr
Contributor

@matthiasbeyer ssb is optimized for what would be a pathological case in ipfs (long append-only chains), and in general it follows completely different design principles. For example, ssb avoids global singletons (such as DHTs), both because they have inherent problems (such as vulnerability to sybil attacks) and to prove that you can build a viable system without them.

Also, ipfs isn't as general-purpose as you think ;) For example, ipfs needs to announce each block you have to the DHT every 24 hours. That means a pointer from the hash back to your IP... and public key (maybe?). Anyway, just the hashes for 100k messages would come to ~10 megabytes (100,000 provider records at roughly 100 bytes each), which must be uploaded every 24 hours per peer, even if they aren't doing anything!

Also, each of those blocks you retrieve needs a DHT lookup, which is O(log(number of peers in the DHT))... that can be pretty slow, especially if you suffer from slow internet... (I live on a sailboat and I care very much about people with slow internet.)

On the other hand, ssb has much lower overhead: replication with a peer starts after one roundtrip, and recent improvements to https://github.com/ssbc/ssb-ebt mean most of the requests can actually be dropped, so the replication overhead is proportional to the number of feeds that have changed, which is normally much lower than the number of feeds you follow ;)

So - p2p data replication is not a solved problem. We are not duplicating effort with ipfs, because we are trying to do something different, with a different design philosophy: ipfs is trying to be a file system... we are trying to be a social graph database. Also check out http://github.com/datproject/ - really, there should be more p2p data replication projects, not fewer.

@matthiasbeyer
Author

First: Thanks for your detailed answer. I see your point. Still, I want to correct some things:

ipfs needs to announce each block you have to the DHT every 24 hours.

This is not true. First, IPFS does not announce any blocks by itself; that's IPNS. Secondly, there is absolutely no need to re-announce blocks after 24 hours! The cache timeout is just set to 24 hours for IPNS hashes, which is something completely different.

Still, you're right that there must be re-announcements at some point.

just the hashes for 100k messages would come to ~10 megabytes, which must be uploaded every 24 hours per peer, even if they aren't doing anything!

This is not true, because you don't need to re-announce all hashes. You only re-announce one hash: the IPNS hash (or rather one IPNS hash per "public name", thus per private key). That IPNS hash is a pointer to an IPFS hash, which may be the root of a whole tree of data. The IPNS name is mutable, so you can re-publish it pointing at another IPFS hash. That's how updates work.
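With the stock ipfs CLI, that update cycle looks roughly like this (./my-data and $PEER_ID are placeholders):

# add the updated tree; with -r -q the last hash printed is the new root
root=$(ipfs add -r -q ./my-data | tail -n 1)
# re-point this node's mutable IPNS name at the new root
ipfs name publish "/ipfs/$root"
# peers resolve the stable IPNS name (the peer id) to the latest root
ipfs name resolve $PEER_ID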


That said, I totally see your points! I really hope I can get Patchwork packaged for my distro to try it out! Also multi-client support... for my journey around North America next year, ssb sounds awesome for "staying connected" 👍

@dominictarr
Contributor

@matthiasbeyer well, I came to this understanding by talking to ipfs developers. That was a few months ago, so if it has changed, that's good (can you link me to current docs about this?), but this was how it worked when I asked about it.

@RangerMauve

Hi, I understand why you'd want to avoid storing all of the chains on IPFS, but the IPFS ecosystem is actually a bit larger than file sharing.

Libp2p is my favorite part of the stack in that it abstracts away interaction between peers through different protocols. I think it would be useful to at least use libp2p for discovering SSB peers.

libp2p facilitates connecting to peers directly via their advertised transports, and it will soon allow browser-based peers to connect to non-browser peers using the new circuit-relay functionality.
This means that you no longer need pubs to sync people up, and you no longer need to worry about bridging peers that use different transports.

libp2p allows one to define services that a peer can provide which work regardless of which transport is used. This will allow browser peers to act as full nodes that can have incoming connections to them, and open the doors for more transports without having to worry too much about implementation differences between TCP/Websockets/HTTP/whatever.

One could foresee an extension to SSB which uses the libp2p DHT to advertise which logs you're replicating; peers searching for a log could then query the DHT to find peers who advertise having more recent copies of it.
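That advertise/lookup pattern is visible in the go-ipfs CLI today, which exposes the same DHT operations (how an SSB log id would map onto a DHT key is the hypothetical part; <key> is a placeholder):

# advertise that this node provides some key, then ask the DHT who else does
ipfs dht provide <key>
ipfs dht findprovs <key>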

AFAIK SSB doesn't really work in the browser without extensions, because the browser cannot accept incoming connections and needs centralized signaling for WebRTC. If you integrate libp2p, you get browser support for free, thus supporting more people with less technical know-how, as well as web apps that want to make use of SSB in a distributed system.

Using libp2p, you can improve the existing SSB network without having to buy into all of IPFS and the assumptions IPFS makes about actual data storage. You can use it to spread SSB to more places and decentralize it further by removing the need for pubs.

@stale

stale bot commented Nov 1, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 1, 2018
@stale stale bot closed this as completed Nov 9, 2018
@zicmama

zicmama commented Jan 16, 2020

A node could simply copy .ssb into IPFS and publish it at its own IPNS address. Then all nodes in the IPFS swarm could replicate it and merge it back into .ssb:

# for every peer currently in our swarm, take its peer id (field 7 of
# the multiaddr) and fetch whatever it published under its IPNS name
for id in $(ipfs swarm peers | awk -F '/' '{print $7}'); do
    ipfs get --output=./.ssb/ "/ipns/$id"
done

I wonder what that action will do to ssb nodes?
