Improvements and additions to gtools #30
Comments
Here are some ideas I have for an API. All the functions (except gisid, which executes steps 1-4 but differs from there onward) share a common sequence of steps:
Steps 2, 3, 6, and 7 require that a copy of the data be available in memory for C. Saving the results of steps 3, 4, 5, or 6-8 would require creating variables in Stata in addition to allocating memory in C. Interacting with Stata is inefficient throughout because doubles have to be cast to and from 64-bit integers. To call from C directly, there would have to be a generic way to load the data into memory. Some things I could write:
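On the point about casting doubles to and from 64-bit integers, here is a minimal sketch of what the round trip can and cannot preserve. It is not gtools code: the names `GT_MAX_EXACT_INT`, `gt_roundtrips_exactly`, and `gt_self_check` are made up for illustration, and the only fact it relies on is that an IEEE 754 double has a 53-bit significand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: a double has a 53-bit significand, so every
 * integer with magnitude up to 2^53 is exactly representable. */
#define GT_MAX_EXACT_INT (INT64_C(1) << 53)

/* Hypothetical helper: returns 1 if v survives the
 * int64 -> double -> int64 round trip unchanged, 0 otherwise. */
static int gt_roundtrips_exactly(int64_t v)
{
    double d = (double) v;

    /* (double) INT64_MAX rounds up to 2^63, which no longer fits in an
     * int64_t; converting it back would be undefined behavior. */
    if (d >= 9223372036854775808.0) {
        return 0;
    }
    return (int64_t) d == v;
}

/* A few cases that follow from the 53-bit significand. */
static void gt_self_check(void)
{
    assert(gt_roundtrips_exactly(GT_MAX_EXACT_INT));      /* 2^53 is exact      */
    assert(!gt_roundtrips_exactly(GT_MAX_EXACT_INT + 1)); /* 2^53 + 1 is not    */
    assert(gt_roundtrips_exactly(GT_MAX_EXACT_INT + 2));  /* even neighbor is   */
    assert(gt_roundtrips_exactly(INT64_MIN));             /* -2^63 is a power of 2 */
}
```

Calling `gt_self_check()` from a test harness exercises the boundary cases. The practical upshot is that integer values stored as doubles are safe to cast as long as their magnitude stays at or below 2^53; larger values such as wide hashes would need a range check like the one above before being passed through a double.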
Have you checked out the ReadStat library? It is the underlying C library used for the haven package in R that reads/writes R, SPSS, SAS, and Stata datasets. Perhaps that would be a way to load data into memory? I’m not sure how garbage collection works with the C API, but if the objects can persist beyond a single call it might make it possible to load multiple datasets simultaneously. I’m not familiar with C at all or I would offer to try helping when I can.
I have this on my list of things to check out. Not sure if it will drastically improve gcollapse or greshape (the main issue there is the inability to create/drop observations and variables in memory). However, I am planning to implement gmerge at some point, and I think the way to go is to try to read the using data via ReadStat, if I can manage. EDIT: Actually, it should improve it a lot, now that I think about it. If I can save the characteristics of the dataset in memory, save the results from gcollapse/greshape to disk, then do
If you were using Java I might be able to help a bit more since that is what I’m more familiar with, but I’ve also been experimenting with trying to do some of this directly in Mata.
@wbuchanan Do you know if it is possible to read data directly from disk when using Java? |
@mcaceresb |
A lengthy discussion on improvements and additions to gtools started in issue #28, but it is more appropriate to have a separate thread for it. The main idea currently being discussed is a gtools API, which would consist of various wrappers around the core functionality of gtools.
I am not sure the Stata portion of the API will be as useful as the ftools analogue due to the way in which the Stata Plugin Interface works (which is what I have to use to interact with Stata via C). However, it might be useful in ways I have not considered, hence this thread. I am also thinking of creating a C library based on this plugin, which would be useful for people who aim to write C plugins in the future.
Feel free to make any suggestions or comments on what you would like to see in a gtools API here, as well as any other features and suggestions that you don't think merit their own thread. This issue will remain open past version 1.0, since an API won't make it to that release.