This repository has been archived by the owner on Jul 12, 2021. It is now read-only.

server stops suddenly / std::bad_alloc memory exhaustion #126

Open
gits7r opened this issue Sep 26, 2015 · 21 comments

Comments

@gits7r
Contributor

gits7r commented Sep 26, 2015

Hi,

The server was running fine. I upgraded to the latest commits a few days ago. Now when I run electrum-server start, it appears to start (it takes some time), but when I run electrum-server getinfo (after ~1 minute) it says the server is not running. There is nothing interesting in the log files, except "starting TCP server on ..." and "starting SSL server on ...".

bitcoind is working fine; I didn't touch it. I tried restarting bitcoind as well, and after that the electrum server started and ran for a few hours, but it died again with nothing in the log file. How can I debug this?

@abitfan

abitfan commented Sep 26, 2015

You can try to run run_electrum_server directly and see if it spits out more info.

@gits7r
Contributor Author

gits7r commented Sep 26, 2015

INFO:electrum:Starting Electrum server on 127.0.0.1
ERROR:electrum:db init
Traceback (most recent call last):
File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 180, in __init__
self.db_utxo = DB(self.dbpath, 'utxo', config.getint('leveldb', 'utxo_cache'))
File "/usr/lib/python2.7/ConfigParser.py", line 359, in getint
return self._get(section, int, option)
File "/usr/lib/python2.7/ConfigParser.py", line 356, in _get
return conv(self.get(section, option))
File "/usr/lib/python2.7/ConfigParser.py", line 618, in get
raise NoOptionError(option, section)
NoOptionError: No option 'utxo_cache' in section: 'leveldb'
INFO:electrum:Stopping Stratum
INFO:electrum:Initializing database
Traceback (most recent call last):
File "/usr/local/bin/run_electrum_server", line 4, in <module>
__import__('pkg_resources').run_script('electrum-server==1.0', 'run_electrum_server')
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 534, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1445, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/electrum_server-1.0-py2.7.egg/EGG-INFO/scripts/run_electrum_server", line 256, in <module>

File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 57, in __init__
File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 195, in __init__
File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 324, in put_node
AttributeError: 'Storage' object has no attribute 'db_utxo'

@ecdsa
Member

ecdsa commented Sep 26, 2015

you have to run run_electrum_server.py, not run_electrum_server

@gits7r
Contributor Author

gits7r commented Sep 26, 2015

Maybe the database was corrupt. I deleted electrum's database and downloaded it again from foundry. It started fine and has been working for the last few hours under normal parameters, slowly catching up.

Could my database just get corrupted on the fly, without anyone doing anything wrong? I know how to start and stop the server, and I never kill -9 electrum.

@gits7r
Contributor Author

gits7r commented Sep 27, 2015

@ecdsa I downloaded a fresh leveldb dump from foundry and started again; unfortunately it died again. There is a bug here. I ran run_electrum_server.py in the console and here is what I get:

INFO:electrum:Starting Electrum server on 127.0.0.1
INFO:electrum:Database version 3.
INFO:electrum:Pruning limit for spent outputs is 10000.
INFO:electrum:Blockchain height 375506
INFO:electrum:UTXO tree root hash: c5e8dca8fefc2e5f8ab198aac02824d9b0b3e08c414cd249fa62bb0d0408221a
INFO:electrum:Coins in database: 1463486254755852
INFO:electrum:catching up missing headers: 375492 375506
INFO:electrum:TCP server started on 127.0.0.1:50001
INFO:electrum:SSL server started on 127.0.0.1:50002
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
File "/usr/lib/python2.7/threading.py", line 763, in run
File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 83, in do_catch_up
File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 657, in catch_up
File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 413, in import_block
File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 625, in import_transaction
File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 585, in set_spent
File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 144, in get
File "_plyvel.pyx", line 299, in plyvel._plyvel.DB.get (plyvel/_plyvel.cpp:4025)
File "_plyvel.pyx", line 103, in plyvel._plyvel.db_get (plyvel/_plyvel.cpp:1891)
File "_plyvel.pyx", line 80, in plyvel._plyvel.raise_for_status (plyvel/_plyvel.cpp:1698)
IOError: IO error: /home/bitnode/electrum-server/electrum-leveldb-utxo-10000/hist/30610918.ldb: Too many open files

What could be the issue? I have the correct limits set up in /etc/security/limits.conf for the user running electrum.

@abitfan

abitfan commented Sep 27, 2015

If this is an Ubuntu install, you also need to edit /etc/pam.d/common-session and add
session required pam_limits.so
To test that your changes are OK, log in as the user running electrum and run:
ulimit -n

@gits7r
Contributor Author

gits7r commented Sep 27, 2015

@abitfan I am on Debian Jessie.
Unfortunately, here is what ulimit -n reports when run as the user running electrum:
sudo -u bitnode -i ulimit -n
1024

I have in /etc/security/limits.conf the following appended:
bitnode hard nofile 65536
bitnode soft nofile 65536
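
As a cross-check on what a process started by that user actually sees, the limit can also be read from inside Python. This is a minimal sketch, not part of electrum-server; the function names are illustrative, and it uses only the standard `resource` module (Unix only).

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # The soft limit is the one that triggers "Too many open files".
    print("soft nofile:", describe(soft))
    print("hard nofile:", describe(hard))
```

Run it the same way the server is launched (same user, same login path), since limits from limits.conf are only applied to sessions that go through pam_limits.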

@abitfan

abitfan commented Sep 27, 2015

Actually, the common-session change is required on Debian as well.

@gits7r
Contributor Author

gits7r commented Sep 27, 2015

@abitfan Can you tell me step by step what I need to do in order to enable it? Thanks.

@abitfan

abitfan commented Sep 27, 2015

as root:
echo "session required pam_limits.so" >> /etc/pam.d/common-session

@gits7r
Contributor Author

gits7r commented Sep 27, 2015

I have done that. Now the limit is 65536 for 'bitnode', which is the user I run electrum-server as.
It still did not fix it. The server starts and dies with nothing relevant in electrum.log. Running from the console I get the following:

sudo -u bitnode -i run_electrum_server.py
INFO:electrum:Starting Electrum server on 127.0.0.1
ERROR:electrum:db init
Traceback (most recent call last):
File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 180, in __init__
self.db_utxo = DB(self.dbpath, 'utxo', config.getint('leveldb', 'utxo_cache'))
File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 129, in __init__
self.db = plyvel.DB(os.path.join(path, name), create_if_missing=True, compression=None, lru_cache_size=cache_size)
File "_plyvel.pyx", line 236, in plyvel._plyvel.DB.__init__ (plyvel/_plyvel.cpp:3129)
File "_plyvel.pyx", line 80, in plyvel._plyvel.raise_for_status (plyvel/_plyvel.cpp:1698)
IOError: IO error: lock /home/bitnode/electrum-server/electrum-leveldb-utxo-10000/utxo/LOCK: Resource temporarily unavailable
INFO:electrum:Stopping Stratum
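
A "Resource temporarily unavailable" error on a LevelDB LOCK file generally means another process already holds the lock on that database directory, for example a server instance started earlier that is still running. The sketch below is one way to probe for that from a separate process. It assumes LevelDB takes a POSIX `fcntl` lock on the LOCK file (as its POSIX environment does); `lock_is_held` is an illustrative name, not an electrum-server or plyvel function.

```python
import errno
import fcntl
import os


def lock_is_held(lock_path):
    """Return True if another process holds an fcntl lock on lock_path.

    Only meaningful from a separate process: POSIX record locks never
    conflict with locks held by the calling process itself.
    """
    fd = os.open(lock_path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt; fails at once if the lock is taken.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EAGAIN, errno.EACCES):
                return True
            raise
        # We got the lock, so nobody else had it; release it again.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

Note that the probe briefly takes the lock itself when it is free, so it should not be run at the same moment the server is starting.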

@abitfan

abitfan commented Sep 28, 2015

Can you try this with a fresh DB?

@gits7r
Contributor Author

gits7r commented Sep 30, 2015

OK. I have tried with a fresh DB 10 times. The DBs are correct; I checked the hash and everything.
I have set the limits as you said, so the user running electrum now has soft 65536 and hard 65536. It always dies like this after a few seconds:

INFO:electrum:Starting Electrum server on 127.0.0.1
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

I am on the latest commit. What could be wrong?

@shsmith
Contributor

shsmith commented Oct 1, 2015

bad_alloc sounds like memory exhaustion. Try allocating more swap space.
You could also reduce the cache sizes via your electrum.conf hist_cache, utxo_cache and addr_cache settings.
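
To see how much memory the configured caches add up to, the settings can be read and summed. This is a rough sketch under stated assumptions: the options live in the `[leveldb]` section (as the traceback's `config.getint('leveldb', 'utxo_cache')` call suggests), the values are byte counts passed straight through as `lru_cache_size`, and the sample numbers are made up for illustration, not recommended values. It uses Python 3's `configparser`; the server itself ran on Python 2.7.

```python
import configparser

# Option names taken from the comment above; all are assumed to be in bytes.
CACHE_OPTIONS = ("hist_cache", "utxo_cache", "addr_cache")


def cache_budget(conf_text, section="leveldb"):
    """Return {option: bytes} for the cache options present in conf_text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return {
        opt: parser.getint(section, opt)
        for opt in CACHE_OPTIONS
        if parser.has_option(section, opt)
    }


# Illustrative values only (64 MiB, 128 MiB, 16 MiB).
SAMPLE = """
[leveldb]
hist_cache = 67108864
utxo_cache = 134217728
addr_cache = 16777216
"""

if __name__ == "__main__":
    budget = cache_budget(SAMPLE)
    for name, size in budget.items():
        print(f"{name}: {size / 2**20:.0f} MiB")
    print(f"total: {sum(budget.values()) / 2**20:.0f} MiB")
```

The LRU caches are only one part of the process footprint, so the total here is a lower bound on memory use, not an estimate of it.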

@gits7r
Contributor Author

gits7r commented Oct 1, 2015

My allocated swap space looks unused. This machine used to run electrum very well. Can it suddenly require more swap space?

@gits7r
Contributor Author

gits7r commented Oct 2, 2015

@shsmith @ecdsa I have increased the RAM allocated to this virtual machine from 8GB to 16GB and the swap space from 5GB to 8GB, and this seems to have fixed it. Electrum is now catching up with the bitcoind height and updating leveldb.

Do we now require more resources to run an electrum server?

@gits7r
Contributor Author

gits7r commented Oct 18, 2015

I tried 100 more times with different changes, and it still won't work.
I think this is not related to electrum-server. It may be caused by insufficient disk I/O, since the server doesn't have an SSD (it has ordinary SATA drives, no RAID). This is a virtual machine hosted on shared hardware. On the same hardware I have another electrum server plus many other things, so I guess the disk just can't keep up, and the hypervisor doesn't allocate more disk I/O to this virtual machine in order to protect the others.

We already know that leveldb is very disk-intensive and needs an SSD. So I will close this, since I don't see a bug in electrum-server. The last log message is:
[18/10/2015-05:06:38] block 379312 (410 401.10s) 457255bb18a4ba9e792ab8f3e2b4d5fd34f3dccf7b008ed5d278622c31f3e280 (4.49tx/s, 255.59s/block) (eta 11.3 hours, 112 blocks)

You can see it takes a long time to process each block. RAM, CPU and swap are plentiful, but disk I/O is not.

@gits7r gits7r closed this as completed Oct 18, 2015
@EagleTM EagleTM reopened this Nov 9, 2015
@EagleTM

EagleTM commented Nov 9, 2015

I'm seeing the same issue here:
A server with 4 GB RAM and 4 GB swap dies after around a week of running, with caches at half the size of the new lower default (so they are not the issue):
Crash message "terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc"

It's definitely running out of memory / swap.
I've noticed the current git head pulls in about 90% of memory (with swap that would be 8 GB on my system) on startup to catch up on blocks.

This might be related to recent changes like "writing once per block" or the "ordering of tx" stuff. Server versions from June 2015 don't have this issue.

Unless the memory footprint can be reduced, we should recommend at least 8 GB of RAM (better 16 GB) for running electrum server.
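
For anyone wanting to compare numbers, resident and swapped memory of a running server can be sampled from `/proc` on Linux. A minimal sketch, assuming a kernel that exposes `VmRSS` and `VmSwap` in `/proc/<pid>/status`; the function names are illustrative and nothing here is part of electrum-server.

```python
import re


def parse_vm_fields(status_text, fields=("VmRSS", "VmSwap")):
    """Extract the given fields (values in kB) from /proc/<pid>/status text."""
    found = {}
    for line in status_text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)\s+kB", line)
        if match and match.group(1) in fields:
            found[match.group(1)] = int(match.group(2))
    return found


def read_vm_fields(pid):
    """Read the memory fields of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())
```

Calling `read_vm_fields(pid)` periodically with the server's PID and logging the result would show whether memory grows steadily over the week or jumps during block catch-up.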

@EagleTM EagleTM changed the title from "server stops suddenly" to "server stops suddenly / std::bad_alloc memory exhaustion" Nov 9, 2015
@lvets
Contributor

lvets commented Dec 6, 2015

Any update on this? I'm still seeing electrum-server taking 16GB of RAM + 4 GB swap on a server when processing blocks...

@EagleTM

EagleTM commented Dec 12, 2015

We're investigating the issue. Thomas' server is using less than 2 GB of RES RAM, while I'm at 11 GB. It might be the plyvel version. I recently updated to 0.9 (from 0.2), still running leveldb 1.9.x (2013) with it. Thomas is using leveldb 1.9.x and plyvel 0.8. Which plyvel versions are you using?

For now I get a stable running server with 16 GB RAM + 16 GB swap. Around 6 GB of swap gets used, so I can recommend 24 GB of RAM + swap.
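
One way to report the installed plyvel version is to query the package metadata. This is a sketch for Python 3.8+ using the standard `importlib.metadata` module; `installed_version` is an illustrative helper. It reports the version of the Python binding only, not the LevelDB library it was built against, and it has to be run with the same interpreter the server uses.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("plyvel:", installed_version("plyvel"))
```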

@EagleTM

EagleTM commented Feb 21, 2016

Sorry, no progress here currently. The RAM recommendations still stand. I've put them into the HOWTO for now
