Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated readme and added notebook #845

Open
wants to merge 3 commits into
base: dev
Choose a base branch
from

Conversation

yash-kalathiya
Copy link

Why are these changes needed?

To add Google colab compatible notebook and enhance readme file.

Related issue number (if any).

844

@shahrokhDaijavad
Copy link
Member

A few comments to @yash-kalathiya: The notebook and README are generally fine, but my comments are about consistency with the other transform README and notebooks:
As an example, please refer to this transform: https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/doc_quality

  1. We put the main README in the Python folder of the transform and use a minimum README in the main folder (see example above), so I think your new README should be in the Python folder, but then your README should add some sections of the current README in the Python folder (like running the samples, etc).
  2. Please add Contributors and Description sections to the top of README.
  3. Your notebook should be in the main folder and not in a separate notebooks subfolder
  4. In the notebook, please add comment sections at the top (just like the doc_quality notebook) and about the transform parameters you are using

@yash-kalathiya
Copy link
Author

As per you said, I have changed readme file and also changed notebook directory.

@shahrokhDaijavad
Copy link
Member

@yash-kalathiya Thanks for making all the directory changes, but now I get errors running the notebook, as it doesn't find the transform. I have Python 3.10 on my laptop (I have 3.12 too, but not 3.11).
First, when I do make venv, I get the error:

Requirement already satisfied: six in ./venv/lib/python3.10/site-packages (from extractcode>=31.0.0->extractcode[full]>=31.0.0->scancode-toolkit==32.1.0->dpk_header_cleanser_transform_ray==0.2.3.dev0) (1.17.0)
INFO: pip is looking at multiple versions of extractcode[full] to determine which version is compatible with other requirements. This could take a while.
WARNING: Package 'extractcode' has an invalid Requires-Python: Invalid specifier: '>=3.6.*'
ERROR: Could not find a version that satisfies the requirement extractcode-7z>=16.5.210525; extra == "full" (from extractcode[full]) (from versions: none)
ERROR: No matching distribution found for extractcode-7z>=16.5.210525; extra == "full"
make[2]: *** [.defaults.install-local-requirements-venv] Error 1
make[1]: *** [.recurse] Error 2
make: *** [venv] Error 2

Then, running the notebook, I get:
image

@yash-kalathiya
Copy link
Author

I have try to run for both python version 3.10 and 3.12. It is running fine on my environment and I have shared screenshot of this. Can you once again check your side?

Screenshot from 2024-12-08 14-05-49
Screenshot from 2024-12-08 14-10-43
Screenshot from 2024-12-08 14-12-36
Screenshot from 2024-12-08 14-13-44

@shahrokhDaijavad
Copy link
Member

@yash-kalathiya I think the problem for my environment is still the extractcode package that gives the error below:

Using cached dockerfile_parse-2.0.1-py2.py3-none-any.whl.metadata (3.3 kB)
WARNING: Package 'extractcode' has an invalid Requires-Python: Invalid specifier: '>=3.6.'
Requirement already satisfied: six in ./venv/lib/python3.10/site-packages (from extractcode>=31.0.0->extractcode[full]>=31.0.0->scancode-toolkit==32.1.0->dpk_header_cleanser_transform_ray==0.2.3.dev0) (1.17.0)
INFO: pip is looking at multiple versions of extractcode[full] to determine which version is compatible with other requirements. This could take a while.
WARNING: Package 'extractcode' has an invalid Requires-Python: Invalid specifier: '>=3.6.
'
ERROR: Could not find a version that satisfies the requirement extractcode-7z>=16.5.210525; extra == "full" (from extractcode[full]) (from versions: none)
ERROR: No matching distribution found for extractcode-7z>=16.5.210525; extra == "full"
make[2]: *** [.defaults.install-local-requirements-venv] Error 1
make[1]: *** [.recurse] Error 2
make: *** [venv] Error 2

Can you please try: pip install scancode-toolkit==32.1.0 ; platform_system != Darwin in your venv and see if you get the error above or not? If you don't get the error above, that explains the problem I have and we can find a solution.
`

Signed-off-by: SHAHROKH DAIJAVAD <[email protected]>
@shahrokhDaijavad
Copy link
Member

@yash-kalathiya I pushed a "fixed" version of the code to the branch that successfully ran on my local environment. The 3 changes that I made are:

  1. I am pip installing the latest version of scancode-toolkit in the venv that we create locally. I could not get it to work without this pip install, though ideally, the transform toolkit should have taken care of this, but I think there is a problem in the version of scancode-toolkit that the transform toolkit uses.
  2. I ran the jupyterlab in the venv environment for consistency. You can see this in the comment section of the Notebook (the first cell).
  3. The change I made for the local path to input and output folders is just so that I could run it to the end successfully, and of course, the users will change that.

Copy link
Member

@shahrokhDaijavad shahrokhDaijavad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve the functionality, after the latest changes I made.

@yash-kalathiya
Copy link
Author

Thank you, @shahrokhDaijavad , for resolving the version issue. I sincerely apologize for not being able to assist earlier as I was occupied with other work. I truly appreciate your efforts and understanding.

@shahrokhDaijavad
Copy link
Member

Thank you, @shahrokhDaijavad , for resolving the version issue. I sincerely apologize for not being able to assist earlier as I was occupied with other work. I truly appreciate your efforts and understanding.
No problem and you are welcome, @yash-kalathiya.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants