-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve NumPy compatibility hell #3231
Conversation
Wouldn't that preclude us from using any new API functions from numpy? (new = introduced since If so, that doesn't sound like a desirable solution. For example, this PR seems to introduce a regression: #2864 and #3220 fixed |
gensim/models/word2vec.py
Outdated
@@ -1956,8 +1956,7 @@ def _load_specials(self, *args, **kwargs): | |||
if not hasattr(self.wv, 'vectors_lockf') and hasattr(self.wv, 'vectors'): | |||
self.wv.vectors_lockf = np.ones(1, dtype=REAL) | |||
if not hasattr(self, 'random'): | |||
# use new instance of numpy's recommended generator/algorithm | |||
self.random = np.random.default_rng(seed=self.seed) | |||
self.random = np.random.RandomState(self.seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regresses #2782.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed ab53e62
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that preclude us from using any new API functions from numpy? (new = introduced since oldest-supported-numpy)
It doesn't have to. Where we can easily replicate the new API functions, we should do so (as @menshikh-iv has done with gensim.utils.default_rng function). Where we can't (so far, there are no such cases, but there may be in the future), we could do something like:
def np_cool_new_function(*args, **kwargs):
try:
fn = np.cool_new_function
except AttributeError:
raise RuntimeError("your numpy version is too old, please upgrade")
else:
return fn(*args, **kwargs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps. Although this smells of re-implementing dependency resolution internally in Gensim, which would be extra hell.
But I guess as long as setup.py
requires an actual working version of numpy (not just oldest-supported-numpy
), it should be OK.
But requiring some dependency version in setup.py
, and then raising RuntimeErrors dynamically suggesting that version is too old, would not be OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I guess as long as setup.py requires an actual working version of numpy (not just oldest-supported-numpy), it should be OK.
The problem with going that way is that it prevents us from building against oldest-supported-numpy. We then have to reinvent that particular wheel in our build system infrastructure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see if I got this right:
numpy API (setup.py): oldest-support-numpy | numpy API (setup.py): 1.17.0 | |
---|---|---|
numpy ABI (wheels): oldest-support-numpy | this #3231 | current develop |
numpy ABI (wheels): 1.17.0 | ? | desired, #3226 (comment) |
Building against the oldest numpy we actually support makes more sense to me than building against oldest-supported-numpy and then throwing RuntimeError because it's too old, or maintaining separate code paths for deprecated functions we don't really need/want (RandomState
).
@mpenkov what exactly was the problem with the "desired" quadrant? Too much code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not exactly code, it's build system configuration.
We're essentially replicating this with slightly different business logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Reading the comment in your .yml
link, that sounds exactly right, exactly like what we want = the "desired" state. Why does that not work?
@gojomo can you check too? I may have gotten lost in the PRs and comments.
@mpenkov not to get too philosophical – let's release bugfix 4.1.1 ASAP. We can discuss a "proper" solution once the fire is put out.
If we'll have tests for oldest & newest numpy in CI, these will be detected automatically |
Adding some comments to the code
Closing this, because we've moved on. I also recall a problem with oldest-supported-numpy, but can't remember what it is at this stage. @menshikh-iv Thank you for your effort and for guiding us through these numpy issues. |
🙏 |
Idea:
oldest-supported-numpy
everywhere (on every python and platform)Goal: Make gensim works with numpy correctly (avoid issues like #3226).
Fixes #3226.