Coercion methods #49

Open
wbuchanan opened this issue Jan 23, 2016 · 2 comments

Comments

@wbuchanan

More of a question/potential enhancement request than anything. I was wondering what it would take to create methods that coerce existing objects into a DataFrame object. I would imagine 2d arrays would be fairly easy to handle (although I could be completely wrong). My hope is that, as I wrap up some other work on readers/parsers for Stata-formatted files (and others in the future), it would be possible to build the classes/methods around the idea of coercing the data into a DataFrame; that would bring the advantage of joins/unions of files from different statistical software platforms. Also, I haven't looked much into the documentation yet, but a way to retain metadata with the file would be helpful as well, e.g., variable labels (distinct from column names) and value labels (analogous to descriptions in a lookup table in a SQL database).
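To make the 2d array case concrete, here is a minimal sketch of the reshaping step such a coercion method would probably need: turning a row-major `Object[][]` into a list of columns. The class name `ArrayCoercion` and the method `toColumns` are hypothetical and not part of Joinery; handing the resulting columns to a DataFrame is deliberately left out, since the exact constructor or append call to use is not confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Joinery: reshapes row-major data into columns.
public class ArrayCoercion {

    /** Transposes a row-major 2d array into a list of columns, rejecting ragged input. */
    public static List<List<Object>> toColumns(Object[][] rows) {
        int width = rows.length == 0 ? 0 : rows[0].length;

        // One empty list per column.
        List<List<Object>> columns = new ArrayList<>();
        for (int c = 0; c < width; c++) {
            columns.add(new ArrayList<>());
        }

        // Append each cell to the column it belongs to.
        for (Object[] row : rows) {
            if (row.length != width) {
                throw new IllegalArgumentException(
                    "ragged row: expected " + width + " values, got " + row.length);
            }
            for (int c = 0; c < width; c++) {
                columns.get(c).add(row[c]);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        Object[][] rows = {
            {"a", 1, 2.5},
            {"b", 2, 3.5}
        };
        // prints [[a, b], [1, 2], [2.5, 3.5]]
        System.out.println(toColumns(rows));
    }
}
```

Rejecting ragged rows up front is one possible design choice; padding short rows with nulls would be another, depending on how the reader/parser reports missing values.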

@cardillo cardillo added this to the 1.8 milestone Jan 23, 2016
@cardillo
Owner

There are currently methods to read and write CSV and Excel files; generally these provide the interoperability I need. That said, I realize they are rather low fidelity (i.e., they preserve column names but not much else). There are also methods to convert to 2d arrays, but not from them. I think this would be a useful addition. Reading and writing other formats would be useful as well. I can take a look at adding these features or will gladly merge a pull request.

Variable labels might be a little more difficult, since Joinery doesn't currently store any additional information about the individual data points. While this certainly could be added, it isn't as high a priority for me personally. But again, pull requests are welcome.
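Until something like that exists in the library, one option might be to keep the labels in a separate object that travels alongside the data. The sketch below is only an illustration of that idea: `ColumnMetadata` and all of its methods are hypothetical names, nothing here is Joinery API, and the column name and labels in `main` are made-up sample values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical side-car for labels; it is keyed by column name and does not touch the data.
public class ColumnMetadata {
    // Column name -> descriptive variable label.
    private final Map<String, String> variableLabels = new LinkedHashMap<>();
    // Column name -> (stored value -> value label), like a lookup table.
    private final Map<String, Map<Object, String>> valueLabels = new LinkedHashMap<>();

    public ColumnMetadata labelVariable(String column, String label) {
        variableLabels.put(column, label);
        return this;
    }

    public ColumnMetadata labelValue(String column, Object value, String label) {
        valueLabels.computeIfAbsent(column, k -> new LinkedHashMap<>()).put(value, label);
        return this;
    }

    /** Returns the variable label, falling back to the column name itself. */
    public String variableLabel(String column) {
        return variableLabels.getOrDefault(column, column);
    }

    /** Returns the value label, falling back to the value's string form. */
    public String valueLabel(String column, Object value) {
        Map<Object, String> labels = valueLabels.get(column);
        String label = labels == null ? null : labels.get(value);
        return label != null ? label : String.valueOf(value);
    }

    public static void main(String[] args) {
        ColumnMetadata meta = new ColumnMetadata()
            .labelVariable("sex", "Respondent sex")
            .labelValue("sex", 1, "Male")
            .labelValue("sex", 2, "Female");

        System.out.println(meta.variableLabel("sex"));  // prints Respondent sex
        System.out.println(meta.valueLabel("sex", 2));  // prints Female
        System.out.println(meta.valueLabel("sex", 9));  // prints 9 (no label defined)
    }
}
```

Keeping the labels outside the DataFrame means joins and unions would not carry them along automatically, so this is a stopgap rather than a substitute for metadata support in the library.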

@wbuchanan
Author

The only working example I have at the moment is some work I did on serializing in-memory data to a JSON object using Stata's Java API: https://github.com/wbuchanan/StataJSON. I've broken some of the work there into more generic classes here, and I'm also trying to test coercing some of the data to a DataFrame. There is a C library that could be helpful for parsing files from statistical packages (https://github.com/WizardMac/ReadStat), but I'm not terribly familiar with JNI or how the C library works. I think once I can figure out how to get the data into a DataFrame object I could probably figure out how to get it into an object suitable for Stata.

@cardillo cardillo removed this from the 1.8 milestone Feb 17, 2016