Add independent txindex #6

Open
dcousens opened this issue Mar 16, 2017 · 15 comments

@dcousens
Contributor

dcousens commented Mar 16, 2017

This would mean we could run indexd on a pruning node, provided it is fully synchronized to start.

@dcousens
Contributor Author

dcousens commented Apr 19, 2017

The three options as I see them:

  • Ask bitcoind for the block@height (getblock), and our index maintains the transaction offset into that block
    • Easily cacheable, but probably high latency due to multiple applications doing IO
  • Store all of the transaction data in our own database. You wouldn't bother with -txindex, and you'd definitely want to -prune the node. It's not as simple to "reset" your index, though: you'd have to reset everything.
  • Use -txindex, and forgo the problem
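
As a rough illustration of the first option, the sketch below keeps only a txid -> (height, offset, length) record and slices the transaction out of a raw block fetched on demand. It is a minimal Python sketch, not indexd's implementation: `TxLocation`, `index_block`, `lookup_tx` and the `get_raw_block` callable (standing in for a getblockhash + getblock round trip to bitcoind) are hypothetical names, and the txid calculation assumes transactions without witness data.

```python
import hashlib
from typing import Callable, Dict, List, NamedTuple

HEADER_SIZE = 80  # bytes in a serialized block header


class TxLocation(NamedTuple):
    height: int  # height of the block containing the transaction
    offset: int  # byte offset of the transaction inside the raw block
    length: int  # serialized size of the transaction in bytes


def encode_varint(n: int) -> bytes:
    """Encode an integer using Bitcoin's CompactSize format."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def txid_of(raw_tx: bytes) -> str:
    """Double SHA-256, byte-reversed hex (assumes no witness data)."""
    digest = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return digest[::-1].hex()


def index_block(index: Dict[str, TxLocation], height: int,
                raw_txs: List[bytes]) -> None:
    """Record where each transaction sits inside its serialized block."""
    # Transactions start after the header and the transaction-count varint.
    offset = HEADER_SIZE + len(encode_varint(len(raw_txs)))
    for raw_tx in raw_txs:
        index[txid_of(raw_tx)] = TxLocation(height, offset, len(raw_tx))
        offset += len(raw_tx)


def lookup_tx(index: Dict[str, TxLocation], txid: str,
              get_raw_block: Callable[[int], bytes]) -> bytes:
    """Fetch the block at the indexed height and slice out the transaction."""
    loc = index[txid]
    block = get_raw_block(loc.height)
    return block[loc.offset:loc.offset + loc.length]


# Demo with opaque placeholder bytes instead of real transactions.
txs = [b"\x01" * 60, b"\x02" * 100]
raw_block = bytes(HEADER_SIZE) + encode_varint(len(txs)) + b"".join(txs)

index: Dict[str, TxLocation] = {}
index_block(index, height=123, raw_txs=txs)
assert lookup_tx(index, txid_of(txs[1]), lambda height: raw_block) == txs[1]
```

The index entry stays small regardless of transaction size, but every lookup costs a block fetch from bitcoind, which is the latency concern noted in the first option.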

@dcousens
Contributor Author

If we maintain our own txindex, not only can we bundle it with #9, but we can also preserve the entire transaction for other analysis.

@Runn1ng the concern here is, this could severely blow out a disk in terms of space required...
Maybe optional?

@karelbilek

What would be the motivation of having at the same time pruning node and txindex? When a user enables pruning, they want to save disk space, which is then negated by storing a txindex :)

Also, what is the reasoning for having a separate index here instead of relying on bitcoind?

@karelbilek

Btw, getting back to the bitcore fork (I am going back to it because it does what I need :), though it's a bit painful to maintain because of the rather large patchset):

If you have addressindex, spentindex, timestampindex and txindex enabled, disk usage grows significantly, I think to about 2 or 3 times that of the blockchain without the indexes.

@unsystemizer

unsystemizer commented May 21, 2017

addrindex and txindex don't add nearly as much. If the exact figures are important I can look them up.
A motivation of having indexes with a pruned blockchain would be you could fetch tx details later (not necessarily from the same bitcoind) no?

@dcousens
Contributor Author

@unsystemizer exactly

@instagibbs

instagibbs commented May 30, 2017

@Runn1ng I don't have any special info but txindex may someday get retired from bitcoind, especially if external indexes like this are successful. Core contributors in general are quite down on additional indexes due to complexity and interactions with consensus code.

@unsystemizer

unsystemizer commented Oct 28, 2017

Just did a repeat of the same experiment somebody did with addrindex before:
a) Create txindex on testnet (mainnet is larger, so ...)
b) Use the max dbcache value (16GB) on bitcoind

root@indexd:~/.bitcoin/testnet3/blocks/index# du -sh
949M .
...
root@indexd:~/.bitcoin# tail -f /root/.bitcoin/testnet3/debug.log
...
2017-10-28 13:41:29 Cache configuration:
2017-10-28 13:41:29 * Using 1024.0MiB for block index database
2017-10-28 13:41:29 * Using 8.0MiB for chain state database
2017-10-28 13:41:29 * Using 15352.0MiB for in-memory UTXO set

Conclusions:

  • on mainnet, txindex likely cannot fit in dbcache (I haven't tried, but...)
  • on testnet, txindex can't fit in dbcache unless one wastefully allocates 15 GiB of RAM to in-mem UTXOs

It'd be valuable to be able to either load txindex in RAM (mainnet) or avoid wasting many GBs of RAM on caching in-memory UTXOs (testnet) in order to be able to fully cache txindex (edit: in indexd, of course)

@KanoczTomas

KanoczTomas commented Oct 29, 2017

@unsystemizer I have a feeling you are mixing up txindex and chainstate. The UTXO set in bitcoind is in chainstate dir, while the txindex is in blocks/index. The chainstate is currently 2.8G for mainnet, a full index is 14G. edit: of course I could be wrong ... that is my understanding from chat with the devs.

The dbcache switch is only used for the chainstate db in bitcoind. Just wanted to make sure you have the correct assumptions, not sure what you were trying to calculate. 4G dbcache is effectively infinite space while syncing, as the UTXO set never reaches it (core devs use it as the max in benchmarks).

@unsystemizer

> What would be the motivation of having at the same time pruning node and txindex?

@Runn1ng exactly that. It allows me to keep my indexes on a fast node (while running bitcoind on a slow node) and at the same time doesn't require the bitcoind admin to worry about index maintenance. There are several other reasons, some of which I mentioned 2 comments above.
Regarding your point on txindex size from the other comment, I checked on both testnet and mainnet, and currently txindex occupies (approximately) 10% of block capacity (on a non-pruned node). If both addrindex and txindex are enabled (using Bitcoin Core with addrindex patch), they take (approximately) 25% of block data capacity (addrindex 15% and txindex 10%, roughly speaking).

@dcousens
Contributor Author

dcousens commented Nov 2, 2017

@unsystemizer the issue with a local transaction index is that we can't index into the .dat files themselves, as they may be in an indeterminate state (as bitcoind updates them).

We would have to maintain nearly the entire blockchain in our database, as the block headers only account for 80 bytes...

Hence, why I suggest that sane users will think they should prune... which could make any "catch up"/"resync" phase difficult if the data is missing.

I agree that you could use an external node for initial resync, then the local pruning node after that.

@dcousens
Contributor Author

dcousens commented Nov 2, 2017

Another alternative is if we could ask network peers for the blocks on the initial sync... then continue as normal with our pruned node.
We don't want to ask random peers directly, as we don't want indexd to have to verify consensus rules.

If the block was verified by bitcoind on the way... that'd be near perfect.

Maybe a new RPC call for pruned nodes?
fetchblock, with a condition the block has to be on the best chain.
For non-pruned nodes, it is an alias to getblock.
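
To make the proposed semantics concrete, here is a small in-memory model of how such a call might behave. Everything in it is hypothetical: `fetchblock` does not exist as written, `FakeNode` is a toy stand-in rather than a bitcoind client, and the `peers` dictionary merely simulates blocks the node could re-download and verify.

```python
from typing import Dict


class FakeNode:
    """Toy model of a (possibly pruned) node; not a real bitcoind client."""

    def __init__(self, best_chain: Dict[int, str], stored: Dict[str, bytes],
                 peers: Dict[str, bytes]) -> None:
        self.best_chain = best_chain  # height -> block hash on the best chain
        self.stored = stored          # block hash -> raw block still on disk
        self.peers = peers            # block hash -> raw block peers can serve

    def getblock(self, block_hash: str) -> bytes:
        """Serve a block only if it is still on disk."""
        if block_hash not in self.stored:
            raise LookupError("block data is not available locally")
        return self.stored[block_hash]

    def fetchblock(self, block_hash: str) -> bytes:
        """Hypothetical call: like getblock, but re-fetches pruned blocks."""
        # Condition from the proposal: the block must be on the best chain.
        if block_hash not in self.best_chain.values():
            raise ValueError("block is not on the best chain")
        # On a non-pruned node this is simply an alias for getblock.
        if block_hash in self.stored:
            return self.getblock(block_hash)
        # A real node would verify the downloaded block against the header
        # it already trusts before returning it; here it is just a lookup.
        return self.peers[block_hash]


node = FakeNode(
    best_chain={0: "aa", 1: "bb"},
    stored={"bb": b"block-bb"},                        # "aa" has been pruned
    peers={"aa": b"block-aa", "cc": b"block-cc"},      # "cc" is off-chain
)
assert node.fetchblock("bb") == b"block-bb"  # still on disk
assert node.fetchblock("aa") == b"block-aa"  # pruned, fetched via peers
```

The point of the model is that indexd never talks to peers itself, so it never has to verify consensus rules.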

@theuni thoughts, could this be possible?

This would allow us to resync, using a pruned node, and therefore drop our dependency on -txindex by maintaining our own.

The option to fast synchronize via something like fast-dat-parser could still be done, as that is an offline-step to initialize the local database, and is more of an overall deployment consideration.

@dcousens
Contributor Author

dcousens commented Nov 2, 2017

In the meantime, indexd could use the pruneblockchain RPC command to signal where it is up to. With -prune=1 (aka manual RPC pruning only), we could let indexd signal when it is safe to prune.

This wouldn't help if the database is lost, but, it would stop there being too much data duplication.
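
A minimal sketch of that signalling, assuming indexd tracks the height it has fully indexed: it builds the JSON-RPC body for a pruneblockchain call but leaves sending it to whatever RPC client is in use. The helper names are hypothetical, and the 288-block margin is an assumed safety buffer rather than something specified in this thread.

```python
import json
from typing import Optional

SAFETY_MARGIN = 288  # assumption: keep roughly two days of recent blocks


def safe_prune_height(indexed_height: int, margin: int = SAFETY_MARGIN) -> int:
    """Highest block height indexd no longer needs bitcoind to keep."""
    return max(0, indexed_height - margin)


def pruneblockchain_request(indexed_height: int,
                            request_id: str = "indexd") -> Optional[str]:
    """Build a JSON-RPC body for pruneblockchain, or None if nothing to prune."""
    height = safe_prune_height(indexed_height)
    if height == 0:
        return None  # too early in the chain to prune anything
    return json.dumps({
        "jsonrpc": "1.0",
        "id": request_id,
        "method": "pruneblockchain",
        "params": [height],
    })


body = pruneblockchain_request(500000)
assert body is not None
assert json.loads(body)["params"] == [499712]
```

Calling this after each indexed block (or batch of blocks) keeps bitcoind from discarding data that indexd has not processed yet.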

@theuni

theuni commented Nov 2, 2017

@dcousens If I'm understanding your question, I think bitcoin/bitcoin#10794 would do what you want?

@dcousens
Contributor Author

dcousens commented Nov 3, 2017

@theuni yes it would. Thanks for pointing that issue out.
