-
-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to speed up qvm-ls #165
base: master
Are you sure you want to change the base?
Conversation
e328e57
to
b81c949
Compare
Codecov Report
@@ Coverage Diff @@
## master #165 +/- ##
=========================================
- Coverage 53.93% 53.9% -0.04%
=========================================
Files 55 55
Lines 8452 8608 +156
=========================================
+ Hits 4559 4640 +81
- Misses 3893 3968 +75
Continue to review full report at Codecov.
|
1422b80
to
3605c92
Compare
Causes an unnecessary exception
3605c92
to
31d5624
Compare
strs = [] | ||
for vm in sorted(domains): | ||
self._vm_line_strs(strs, vm) | ||
return ''.join(strs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is str.append
really faster than str.format
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to avoid doing any kind of string processing until the final join call, which should be the fastest since it should just allocate a single string in join and copy them all in, avoiding any intermediate allocations, partial concatenations, format string parsing, etc.
It's slightly uglier, but not that much, it looks like writing to an output stream (which would be even faster, but not really worth the effort).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of those commits are good as-is, some requires changes. We'll think about property.GetAll
, so if you think other changes are important improvements anyway, splitting it into separate PR may be a good idea.
|
||
property_def = dest.property_get_def(self.arg) | ||
# pylint: disable=no-self-use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@staticmethod
qubes/app.py
Outdated
self._conn.registerCloseCallback(self._close_callback, None) | ||
break | ||
except libvirt.libvirtError: | ||
subprocess.call(["systemctl", "start", "libvirtd"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bad idea. It makes it impossible to stop libvirtd service - which admin should be allowed to do. I guess it will also lead to some corner cases during restart and system shutdown.
libvirtd service have configured automatic restart on crash, so this shouldn't be an issue in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it, although not sure which way is the best. The problem of not doing this is that qubesd will just freeze forever while trying to reconnect if libvirtd is stopped, which also doesn't seem great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add some timeout (5s?) and throw exception?
qubes/__init__.py
Outdated
@@ -519,21 +517,50 @@ def __init__(self, xml, **kwargs): | |||
name, self.__class__.__name__)) | |||
|
|||
@classmethod | |||
def property_dict(cls, load_stage=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about @functools.lru_cache()
instead?
self._stubdom_xid = int(stubdom_xid_str) | ||
|
||
if self.is_halted() and not was_halted: | ||
self.fire_event('domain-shutdown') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #159 for related changes. Especially see the description of race condition fixed there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prevents the "double shutdown event" as well. It think just taking the startup lock while firing the event could fix the synchronization issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will not prevent handling domain-shutdown after VM was started again, namely this sequence:
- VM shutdown
- vm.start()
- vm.fire_event('domain-shutdown')
You don't have any guarantee about order of input processing (request on qubesd.sock, event on libvirt socket). And additionally, you can't take lock directly in libvirt event handler, because it isn't coroutine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think this needs to be thought about and done more carefully: probably the simplest way is having non-coroutines schedule coroutines, and then pretty much serializing all methods that change the VM in any way, possibly including libvirt state updates, by taking the startup_lock (which should just be renamed to "lock") and updating libvirt state under lock if necessary.
Could also consider deleting the libvirt domain for VMs that are shutdown to prevent external tools starting it while we finalize storage and other such robustness measures against raw use of the libvirt/xen APIs.
I think neither the current code nor this patch does this correctly, and some more work is needed.
raise | ||
|
||
assert False | ||
state = self._libvirt_state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes here are quite intrusive - especially more relies on libvirt events being really delivered (which is fragile...). What about caching libvirt state just in this function, instead of calling state()
in each if
statement? That should be big improvement already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to never call libvirtd for any read-only admin API call except for CPU/memory stats (and internal inspection).
It should theoretically not be fragile because it updates the state on every connection to libvirtd, on every lifecyle event received, and if the libvirt connection dies the close callback should be called, which triggers a reconnect, which triggers a state update.
Of course it's possible that the implementation is buggy
Without this doing qvm-ls requires at least one call to libvirtd per VM, which is probably going to take hundreds of milliseconds, and also any random "is_running" and similar call within qubes is going to cause I/O, which can also cause pervasive slowdown.
Plus we kind of want to know what the VMs are doing anyway to trigger events and perform appropriate actions, and if that can be done accurately, then the state can be accurately stored as well.
qubes/vm/qubesvm.py
Outdated
elif not self.is_running(): | ||
if not autostart: | ||
raise qubes.exc.QubesVMNotRunningError(self) | ||
yield from self.start(start_guid=gui) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will deadlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed, this commit is quite unrelated to the rest anyway.
qubes/api/admin.py
Outdated
@@ -145,6 +145,33 @@ def vm_list(self): | |||
self._vm_line_strs(strs, vm) | |||
return ''.join(strs) | |||
|
|||
@qubes.api.method('admin.vm.GetAllData', no_payload=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this is too much. Even if we agree for property.GetAll
, getting all info of all the VMs in one call is beyond what we want in Admin API.
qubes/app.py
Outdated
except: # pylint: disable=bare-except | ||
pass | ||
return cls._name | ||
|
||
def _default_pool(app): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@functools.lru_cache
instead?
1c01393
to
1db1dd1
Compare
it's unused and has a netid property with name different than key that would cause issues in the next commit
currently it takes 100ms+ to determine the default pool every time, including subprocess spawning (!), which is unacceptable since finding out the default value of properties should be instantaneous instead of checking every thin pool to see if the root device is in it, find and cache the name of the root device thin pool and see if there is a configured pool with that name
Otherwise qubesd will die if you try to break in the debugger
1db1dd1
to
27742bc
Compare
- connect in a loop, starting libvirtd - reconnect on demand in a loop - register events on reconnection
info already contains domain state, so just use it, making a single libvirtd call
This should prevent getting "qrexec not connected" due to trying to start a service in the middle of VM startup
This allows MUCH faster qvm-ls and qvm-prefs The code is optimized to be as fast as possible: - optimized property access - gather strings in a list and join them only at the end
Even faster qvm-ls!
27742bc
to
0175ad6
Compare
The property.GetAll idea has been implemented in #298. Few individual commits may be worth including too, but the IMO most important part already is. |
This is the server part of a bunch of changes that get qvm-ls to around 200ms run time.
Warning: I haven't tested them much, and they change fundamental code, so they probably need fixes
The first major change is the introduction of APIs that allow to get all properties of a VM or of Qubes and all properties of all VMs in a single call.
The second major change is storing the libvirt state in qubesd (and changing it in response to events), so that libvirtd calls are not required to get the system state.
Finally, there's some optimization and fixes.