Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue skipped_columns not working #15947 #16071

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

Devanshusisodiya
Copy link

@Devanshusisodiya Devanshusisodiya commented Feb 13, 2024

Closes #15947

The changes done in frame.py have been validated with the required tests of H2OFrame being tested in /h2o-py/tests/testdir_apis/Data_Manipulation/pyunit_h2oH2OFrame.py

image

skipped_columns parameter is now functional and all tests are passing.

@Devanshusisodiya Devanshusisodiya changed the title Fix for issue #15947 Fix for issue skipped_columns not working #15947 Feb 13, 2024
@Devanshusisodiya
Copy link
Author

@wendycwong @tomasfryda please review this pr.

Copy link
Contributor

@tomasfryda tomasfryda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Devanshusisodiya for spending time on this issue. Unfortunately, your solution doesn't seem to me to be valid. Please don't be discouraged, H2O-3 doesn't have code base that's easy to contribute to.

@@ -146,6 +146,12 @@ def _upload_python_object(self, python_obj, destination_frame=None, header=0, se
if not column_names:
column_names = col_header

if skipped_columns:
column_names = [column_name for column_name in column_names[:len(column_names)-len(skipped_columns)]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ignores the skipped columns so in some cases the skipped column is present in the resulting frame.

Let's have a frame:

| a | b | c |
+---+---+---+
| 1 | 2 | 3 |
| 4 | 5 | 6 |

and if we have skipped_columns=[0,1], then we would get just:

| a |
+---+
| 1 | 
| 4 |

but I would expect to get:

| c |
+---+
| 3 |
| 6 |

Also note that it's likely that skipped_columns should support inputs that are column indices (List[int]) or column names (List[str]).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomasfryda I understand your point, nevertheless i understand the problem and will try to solve it.

Are there any other cases I should be aware of?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also make sure you modify the csv writing accordingly, so that only those columns that are not skipped are written. And please write a test for those cases when there's skipped_columns are at the beginning, the end, and in the middle (and possibly more cases depending on your actual implementation).

And while you're at it, you can also improve the code so that it makes sure that the tmp_path is removed, e.g.:

tmp_handle, tmp_path = tempfile.mkstemp(suffix=".csv")
try:
    .....
finally:
    os.remove(tmp_path)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomasfryda I made more changes based on the requirements and the original issue description. Also I think there is no need to make any in change csv writing because it takes the updated column_names as argument.

Please correct me if I'm wrong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python H2OFrame constructor 'skipped_columns' doesn't work
2 participants