
Can I reduce the memory consumed by cosmic-ray? #486

Open
MartinThoma opened this issue Oct 18, 2019 · 12 comments

@MartinThoma

When I run cosmic-ray, I cannot use my computer. It consumes 15 GB of memory ... when it started filling the swap, I killed it.

Can I reduce the memory consumption of it?

@tomato42
Contributor

This usually happens when a mutation changes the parameters to range() on Python 2 or causes the creation of some other large list.

A workaround would be to execute it either under a user with limited access to memory or on a system without swap.
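To make the failure mode concrete, here is a small illustration (not cosmic-ray's own code): on Python 2, range() eagerly builds a list, so inflating its argument allocates every element up front. Python 3's range object is lazy, but materializing it with list() has the same cost profile.

```python
import sys

# A lazy range object has constant size, regardless of its argument;
# a materialized list allocates one reference per element, so a mutated
# argument directly scales the allocation.
lazy = range(10**6)         # constant-size object, no element storage
eager = list(range(10**3))  # one reference per element

print(sys.getsizeof(lazy))   # small, independent of the argument
print(sys.getsizeof(eager))  # grows linearly with the argument
```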

@MartinThoma
Author

I execute cosmic-ray 5.6.1 with Python 3.6.8.

Interestingly, cosmic-ray also seems not to install the required dependencies. I tried the following cr-config.toml for my module mpu:

[cosmic-ray]
module-path = "mpu"
python-version = ""
timeout = 10
exclude-modules = []
test-command = "python3 -m pytest"
execution-engine.name = "local"

[cosmic-ray.cloning]
method = 'copy'
commands = ["pip3 install -e .[all]"]

I executed:

$ time cosmic-ray init cr-config.toml cr_session.sqlite
real	46,95s
user	9,81s
sys	1,95s

# I 'DELETE FROM work_items LIMIT 2170' to keep the execution time low
# This leaves 10 tests
$ time cosmic-ray exec cr_session.sqlite
2019-10-19 18:58:06,839 cosmic_ray.cloning ERROR Error running command in virtual environment
command: pip3 install -e .[all]
error: b"Obtaining file:///tmp/tmp21ptzj3k/repo\nCollecting pandas (from mpu==0.21.0)\n  Using cached https://files.pythonhosted.org/packages/86/12/08b092f6fc9e4c2552e37add0861d0e0e0d743f78f1318973caad970b3fc/pandas-0.25.2-cp36-cp36m-manylinux1_x86_64.whl\nCollecting python-magic (from mpu==0.21.0)\n  ERROR: Could not find a version that satisfies the requirement python-magic (from mpu==0.21.0) (from versions: none)\nERROR: No matching distribution found for python-magic (from mpu==0.21.0)\nWARNING: You are using pip version 19.2.2, however version 19.3.1 is available.\nYou should consider upgrading via the 'pip install --upgrade pip' command.\n"

real	12,08s
user	49,01s
sys	5,47s

Due to those issues, I moved to mutmut.

@abingham
Contributor

Sorry I didn't get to this sooner...buried under work right now.

this usually happens when a mutation changes parameters to range() on python2 or causes creation of some other large list

Right, this is a known issue (though there's no open issue on it), and I'm not sure the best way to deal with it beyond having the user skip certain mutations. We could do things like look for mutations inside range calls or list constructors, but that wouldn't address transitive mutations that make their way into those calls.
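The "very simple cases" could look something like this AST scan. This is purely a sketch (risky_lines is a hypothetical helper, not part of CR), and it also shows why transitive mutations slip through: only syntactically visible range() calls and list displays are caught.

```python
import ast

# Hypothetical sketch: record the lines containing range() calls and
# list displays/comprehensions, so mutations landing on those lines
# could be skipped or flagged. A mutated value that merely flows into
# such a call from elsewhere is invisible to this check.
def risky_lines(source):
    risky = set()
    for node in ast.walk(ast.parse(source)):
        is_range_call = (isinstance(node, ast.Call)
                         and isinstance(node.func, ast.Name)
                         and node.func.id == "range")
        if is_range_call or isinstance(node, (ast.List, ast.ListComp)):
            risky.add(node.lineno)
    return risky

print(risky_lines("for i in range(10):\n    y = [0] * 5\n"))  # {1, 2}
```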

cosmic-ray also seems not to install the required dependencies

CR is certainly trying to install the dependencies you asked for. Any idea why it's seeing this:

ERROR: Could not find a version that satisfies the requirement python-magic (from mpu==0.21.0)

Due to those issues, I moved to mutmut.

Fair enough. Can we close this then?

@tomato42
Contributor

Can we close this then?

I'd say that it's still a possible issue, so having some mechanism of handling it in cosmic-ray is probably a good idea.

I'm not aware of anything portable, but there are solutions for Unix systems in general (that use ulimit) and Linux-specific ones (that use cgroups).
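For the ulimit route, the same cap can be applied from Python via the standard-library resource module when spawning the test process. A minimal sketch (Unix-only; the 1 GiB limit is an assumption to tune per test suite):

```python
import resource
import subprocess
import sys

# Cap a child process's address space so a runaway mutant is killed by
# the kernel instead of filling RAM and swap. The limit value here is
# an illustrative assumption, not a recommendation.
LIMIT_BYTES = 1 * 1024**3  # 1 GiB

def _limit_memory():
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))

proc = subprocess.run(
    [sys.executable, "-c", "print('ok')"],
    preexec_fn=_limit_memory,  # runs in the child before exec
    capture_output=True,
    text=True,
)
print(proc.stdout.strip())  # the child ran normally under the limit
```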

@abingham
Contributor

I'd say that it's still a possible issue

It's certainly still an issue, and I'm happy to keep it open. Things like ulimit and cgroups seem like extrinsic solutions to me (though I'm no expert), and maybe something that can already be used today without any change to CR. If so, maybe what's really needed is a discussion of these things in the documentation (e.g. a section on "strategies for avoiding resource overuse").

Intrinsic solutions seem much harder. I don't have any real insight into how we'd detect mutations that might cause memory explosions except in very simple cases. Any ideas?

@tomato42
Contributor

Intrinsic solutions seem much harder. I don't have any real insight into how we'd detect mutations that might cause memory explosions except in very simple cases

I wouldn't say that mutations like that should be avoided, but interpreting their results is much more complex. In some cases they may indicate use of the wrong algorithm (like use of range() in py2 instead of xrange(), or a list comprehension instead of a generator), and sometimes they can be false positives (like causing Python to allocate a 4 GiB byte string when the value processed will never be that large). So they are more like "things you may want to take a closer look at", rather than something we can feed into a formula for a mutation score.

@abingham
Contributor

This points to an interesting idea. I wonder if we could create post-processors that look for common pathologies in test output, e.g. looking for massive memory uses and suggesting that the user look at it.
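Such a post-processor might look like the following sketch. The output format, helper name, and threshold are all invented for illustration; CR has no such API today.

```python
import re

# Hypothetical post-processor: scan captured test output for a
# peak-memory report and collect lines exceeding a threshold, so the
# corresponding work items can be surfaced for manual review.
THRESHOLD_MB = 2048  # illustrative assumption

def flag_memory_hogs(test_output, threshold_mb=THRESHOLD_MB):
    """Return output lines whose reported peak memory exceeds the threshold."""
    flagged = []
    for line in test_output.splitlines():
        m = re.search(r"peak memory:\s*(\d+)\s*MB", line)
        if m and int(m.group(1)) > threshold_mb:
            flagged.append(line.strip())
    return flagged

sample = "test_a passed\npeak memory: 4096 MB\ntest_b passed\npeak memory: 128 MB"
print(flag_memory_hogs(sample))  # only the 4096 MB line is flagged
```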

@boxed

boxed commented Oct 20, 2019

I wonder if I can get a concrete example of a mutation that can cause this? Maybe I'm missing some important mutant in mutmut and that's the only reason mutmut survives! That would be bad.

@abingham
Contributor

I don't have an example from "the real world", but imagine something like mutating this:

x = [0] * 50

to

x = [0] * 50000000000

This would use a billion times the memory. I don't know if CR has this specific behavior; I'm pretty sure we select number mutations that are generally close to the original value. But nothing in CR would stop someone from creating an operator that does exactly that.
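Back-of-the-envelope arithmetic for that mutated literal, assuming a 64-bit CPython where each list slot holds an 8-byte pointer:

```python
# Even before counting the element objects themselves, the slot array
# for the mutated list literal needs hundreds of gigabytes.
slots = 50_000_000_000
bytes_per_slot = 8  # pointer size on a 64-bit build
gib = slots * bytes_per_slot / 1024**3
print(round(gib))  # roughly 373 GiB for the slot array alone
```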

@abingham
Contributor

abingham commented Oct 20, 2019

Something that could happen today is that a mutation somehow prevents a loop from terminating, and this then results in unbounded memory consumption (because of the specifics of what's going on in that loop).

@tomato42
Contributor

Yeah, something that turns

for _ in range(50 * 80):
    x += b'some string'

into

for _ in range(50 ** 80):
    x += b'some string'

will exhaust the memory of any system.
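For scale, just comparing the two loop bounds above:

```python
# The original loop runs 4,000 times; the mutated bound is a
# 136-digit number, so the loop can never finish on real hardware,
# and x grows without bound in the meantime.
original = 50 * 80
mutated = 50 ** 80
print(original)           # 4000
print(len(str(mutated)))  # 136
```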

@boxed

boxed commented Oct 20, 2019

Ouch. Yeah, that's a great example. I would like to say I escape this by being smart, but I believe I escape it because we've removed incorrect mutations of * to ** and removed too much!
