
IPFS as storage? #454

Closed

matthiasbeyer opened this issue Dec 6, 2017 · 11 comments

@matthiasbeyer

Would it be appropriate to ask for this here? Not sure. If not, please close and point me to the right repo.


So, there was already a comment about IPFS in another issue. From my point of view, it would be a real win to port the data storage to IPFS (and possibly orbitdb as an abstraction layer over it).

Not only because it would deduplicate work, but also because projects like IPLD could be adopted without much hassle, yielding an even bigger advantage: connectability to all the things.

I've created a GitHub organization for discussing these things myself, and I'm increasingly getting the impression that SSB already does a lot of this, so it might be worth contributing to the discussion here instead of heading in almost the same direction and duplicating effort.

@staltz
Member

staltz commented Dec 6, 2017

Hi @matthiasbeyer, because of scuttlebot's plugin system, it's pretty viable to build an ssb-ipfs plugin, and then use it in sbot. We don't need "official" integration if that's the case. :)
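A hypothetical workflow, assuming someone publishes such a plugin to npm (ssb-ipfs is a placeholder name here, not an existing package), could use scuttlebot's plugin commands:

# install and enable a hypothetical ssb-ipfs plugin from npm,
# then restart the sbot server so the plugin gets loaded
sbot plugins.install ssb-ipfs
sbot plugins.enable ssb-ipfs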

@matthiasbeyer
Author

Do I understand correctly that one is able to develop a plugin which automatically puts content and metadata into IPFS? So the plugin has full access to the data and metadata inside Scuttlebutt?

@staltz
Member

staltz commented Dec 6, 2017

@matthiasbeyer Yes :)

@matthiasbeyer
Author

Hmh. I'm not quite sure whether we misunderstand each other.

What I understand is that there is a possibility to develop a plugin which takes the content the node receives and puts it into IPFS.

My question, though, was about moving the entire network to use IPFS as storage, so that all the social data, metadata, and everything else is stored in IPFS rather than in scuttlebutt's own format/storage.

My reasoning is that IPFS already does all of this very well, and reusing existing technology is better than reinventing everything.

@staltz
Member

staltz commented Dec 6, 2017

scuttlebutt's own storage is always local-first, so one option is to mirror the SSB database in IPFS. But despite IPFS being great for some things, it's not great for everything, and in SSB it's important to have a database that's good for append-only logs. SSB uses the very fast leveldb (written in C++), and more recently flumedb, which is built for append-only logs.
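For context, every message a peer writes is an append to that log; with the stock sbot CLI, for example:

# append a new post message to your own feed's append-only log
sbot publish --type post --text "hello world"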

I think a good use for IPFS is to store blob content (images, videos, etc), although one can also store textual content in IPFS. So, in a nutshell, I think it makes some sense to store data in IPFS but not metadata.
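As a rough sketch of that split, one could pipe blobs across (this assumes sbot's CLI streams the blob bytes to stdout and that ipfs add reads from stdin; the blob id is a placeholder):

# read a blob out of scuttlebot's blob store and hand it to ipfs;
# "&exampleblobhash=.sha256" stands in for a real SSB blob id
sbot blobs.get "&exampleblobhash=.sha256" | ipfs add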

Also, it's worth asking why start from the assumption that you will use IPFS no matter what.

@dominictarr
Contributor

@matthiasbeyer ssb is optimized for what would be a pathological case in ipfs (long append-only chains), and in general it follows completely different design principles. For example, ssb avoids global singletons (such as DHTs), both because they have inherent problems (such as vulnerability to sybil attacks) and to prove that you can build a viable system without them.

Also, ipfs isn't as general-purpose as you think ;) For example, ipfs needs to announce each block you have to the DHT every 24 hours. That means a pointer from the hash back to your IP... and public key (maybe?). Anyway, just the hashes for 100k messages would come to ~10 megabytes (100,000 provider records at roughly 100 bytes each), which must be uploaded every 24 hours per peer, even if they aren't doing anything!

Also, each of those blocks you retrieve needs a DHT lookup, which is O(log(number of peers in the DHT))... that can be pretty slow, especially if you suffer from slow internet... (I live on a sailboat and I care very much about people with slow internet.)

On the other hand, ssb has much lower overhead: replication with a peer starts after one roundtrip, and recent improvements to https://github.com/ssbc/ssb-ebt mean most of the requests can actually be dropped, so the replication overhead is proportional to the number of feeds that have changed, which is normally much lower than the number of feeds you follow ;)

So - p2p data replication is not a solved problem. We are not duplicating effort with ipfs, because we are trying to do something different, with a different design philosophy: ipfs is trying to be a file system... we are trying to be a social graph database. Also check out http://github.com/datproject/ - really, there should be more p2p data replication projects, not fewer.

@matthiasbeyer
Author

First: Thanks for your detailed answer. I see your point. Still, I want to correct some things:

ipfs needs to announce each block you have to the DHT every 24 hours.

This is not true. First, IPFS does not announce any blocks by itself; that's IPNS. Secondly, there is absolutely no need to re-announce blocks after 24 hours! The cache timeout is just set to 24 hours for IPNS hashes, which is something completely different.

Still, you're right that there must be re-announcements at some point.

just the hashes for 100k messages would come to ~10 megabytes, which must be uploaded every 24 hours per peer, even if they aren't doing anything!

This is not true, because you don't need to re-announce all hashes. You only re-announce one hash: the IPNS hash (or rather one IPNS hash per "public name", thus per private key). That IPNS hash is a pointer to an IPFS hash, which may be the root of a whole tree of data. The IPNS name is mutable, so you can re-publish it pointing at another IPFS hash. That's how updates work.
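With the stock ipfs CLI, that update cycle looks roughly like this (./my-data and $PEER_ID are placeholders):

# add the updated tree; with -r -q the last hash printed is the new root
root=$(ipfs add -r -q ./my-data | tail -n 1)
# re-point this node's mutable IPNS name at the new root
ipfs name publish "/ipfs/$root"
# peers resolve the stable IPNS name (the peer id) to the latest root
ipfs name resolve $PEER_ID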


That said, I totally see your points! I really hope I can get Patchwork packaged for my distro to try it out! Also multi-client support... for my journey around North America next year, ssb sounds awesome for "staying connected" 👍

@dominictarr
Contributor

@matthiasbeyer well, I came to this understanding by talking to ipfs developers. That was a few months ago, so if it has changed, that's good (can you link me to current docs about this?), but this was how it worked when I asked about it.

@RangerMauve

Hi, I understand why you'd want to avoid storing all of the chains on IPFS, but the IPFS ecosystem is actually a bit larger than file sharing.

Libp2p is my favorite part of the stack in that it abstracts away interaction between peers through different protocols. I think it would be useful to at least use libp2p for discovering SSB peers.

libp2p facilitates connecting to peers directly via their advertised transports, and it will soon allow browser-based peers to connect to non-browser peers using the new circuit-relay functionality.
This means that you no longer need pubs to sync people up, and you no longer need to worry about bridging peers that use different transports.

libp2p allows one to define services that a peer can provide which work regardless of which transport is used. This will allow browser peers to act as full nodes that can have incoming connections to them, and open the doors for more transports without having to worry too much about implementation differences between TCP/Websockets/HTTP/whatever.

One could foresee an extension to SSB which uses the libp2p DHT to advertise which logs you're replicating; peers searching for a log could then query the DHT to find peers who advertise having more recent copies of it.
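That advertise/lookup pattern is visible in the go-ipfs CLI today, which exposes the same DHT operations (how an SSB log id would map onto a DHT key is the hypothetical part; <key> is a placeholder):

# advertise that this node provides some key, then ask the DHT who else does
ipfs dht provide <key>
ipfs dht findprovs <key>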

AFAIK SSB doesn't really work in the browser without extensions, because the browser cannot accept incoming connections and needs centralized signaling for WebRTC. If you integrate libp2p, you get browser support for free, thus supporting more people with less technical know-how, as well as web apps that want to make use of SSB in a distributed system.

Using libp2p, you can improve the existing SSB network without having to buy into all of IPFS and the assumptions IPFS makes about actual data storage. You can use it to spread SSB to more places and decentralize it further by removing the need for pubs.

@stale

stale bot commented Nov 1, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 1, 2018
@stale stale bot closed this as completed Nov 9, 2018
@zicmama

zicmama commented Jan 16, 2020

A node could simply copy .ssb into IPFS and publish it at its own IPNS address. Then all nodes in the IPFS swarm could replicate it and merge it back into .ssb:

# for every peer currently in our swarm, take its peer id (field 7 of
# the multiaddr) and fetch whatever it published under its IPNS name
for id in $(ipfs swarm peers | awk -F '/' '{print $7}'); do
    ipfs get --output=./.ssb/ "/ipns/$id"
done

I wonder what that action will do to ssb nodes?
