gtools version of merge #76
Comments
@NilsJPWerner In theory I'd like to implement this, but in practice I've looked into it a bit and it's very complicated and not at all clear that I'd get a very large speed improvement. I'd like to look into this again in the future but it won't be any time soon. Sorry!
Unrelated to merge but also a suggestion: it would be great if you could provide a gtools enhancement for carryforward. This is an essential (to me) but often overlooked command, and currently extremely slow. Thanks!
@fpet19 I am curious, what is a specific scenario/example where carryforward is very slow? I have not used it but isn't it a wrapper for It's surprising this is especially slow. Or is the issue that if you call it with
Yes, that seems to be the case; I always call it with by. I use gegen to create a group variable for a subset of the group, and then I populate it for the whole group using carryforward. The second command is over 5 times slower than the first. I assume that whatever magic gtools does for gegen, which does not require sorting and then re-sorting, should be useful here. In very long datasets, just avoiding having to xtset after gegen-related commands is worth it.
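To make the idea above concrete, here is a minimal Python sketch of a by-group carry-forward that makes a single pass over the rows and never sorts them. It is only an illustration of the general hash-based approach, not how carryforward or gtools is actually implemented; the function name `carryforward_by` and the use of `None` for missing values are assumptions made for this example.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def carryforward_by(
    groups: Sequence[Hashable],
    values: Sequence[Optional[float]],
) -> List[Optional[float]]:
    """Fill missing values (None) with the last non-missing value
    seen earlier in the same group, keeping the rows in their
    current order (hypothetical helper, for illustration only)."""
    last_seen: Dict[Hashable, Optional[float]] = {}
    filled: List[Optional[float]] = []
    for key, value in zip(groups, values):
        if value is None:
            # Look up the group's last non-missing value, if any.
            value = last_seen.get(key)
        else:
            # Remember this value for later rows of the same group.
            last_seen[key] = value
        filled.append(value)
    return filled


groups = ["a", "b", "a", "b", "a"]
values = [1, None, None, 7, None]
print(carryforward_by(groups, values))  # [1, None, 1, 7, 1]
```

Because the lookup is keyed by group, the data never has to be sorted by group and then restored to its original order, which is the cost being complained about here.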
@fpet19 For a while now I've basically had carryforward implemented without realizing it. (I was doing this in some data cleaning and somehow remembered your comment from years ago.) I haven't exactly optimized it, but it's a byproduct of this
This gets the last non-missing value from var;
which looks at the observations with values <= index instead. Now,
What would you like gtools to add or change (and why)?
It would be fantastic if gtools had a gmerge command. ftools seems to have join/fmerge, which is a 2x speedup over merge, but since it is implemented in Mata it can't support mixed types.
Please include a specific suggestion
Add gmerge command that implements the standard merge functionality.
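As a rough illustration of what is being asked for, the following Python sketch performs an m:1 merge by hashing the key columns instead of sorting. It is not a proposal for how gmerge should be written; the function name `merge_m1` and the row-as-dict layout are hypothetical. The `_merge` codes follow Stata's convention (1 = master only, 2 = using only, 3 = matched), and master values win on conflicts, as in a plain merge without update. The key here mixes a number and a string, since a hashed tuple key has no trouble with mixed types.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def merge_m1(master: List[Row], using: List[Row], key: Sequence[str]) -> List[Row]:
    """Hash-based m:1 merge of two lists of rows (illustrative sketch)."""
    # Build a lookup from key tuple to the using row; keys must be unique.
    lookup: Dict[Tuple[Any, ...], Row] = {}
    for row in using:
        k = tuple(row[c] for c in key)
        if k in lookup:
            raise ValueError(f"key {k} is not unique in the using data")
        lookup[k] = row

    merged: List[Row] = []
    matched = set()
    for row in master:
        k = tuple(row[c] for c in key)
        match = lookup.get(k)
        if match is None:
            merged.append({**row, "_merge": 1})
        else:
            matched.add(k)
            # Master values take precedence over using values.
            merged.append({**match, **row, "_merge": 3})

    # Append using rows that never matched a master row.
    for k, row in lookup.items():
        if k not in matched:
            merged.append({**row, "_merge": 2})
    return merged


master = [{"id": 1, "code": "x", "y": 10}, {"id": 2, "code": "y", "y": 20}]
using = [{"id": 1, "code": "x", "z": "A"}, {"id": 3, "code": "z", "z": "C"}]
for row in merge_m1(master, using, key=["id", "code"]):
    print(row)
```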