Add support for additional packages in purl2url #143

Open
johnmhoran opened this issue Jan 23, 2024 · 10 comments

@johnmhoran
Member

This is related to the PURL CLI tool/library described in aboutcode-org/purldb#247.

@johnmhoran
Member Author

johnmhoran commented Mar 11, 2024

@TG1999 @keshav-space Yesterday I installed requests in my local repo fork of packageurl-python so I could explore getting download_url data from the pypi API (and I am able to do that now). If I run pip list in the command line for that local repo, I get:

$ pip list
Package            Version
------------------ --------
certifi            2024.2.2
charset-normalizer 3.3.2
idna               3.6
pip                24.0
requests           2.31.0
setuptools         69.1.0
urllib3            2.2.1
wheel              0.42.0

(venv) Mon Mar 11, 2024 10:30 AM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$

However, when I run bin/py.test tests/contrib/test_purl2url.py -vvs I get the error ERROR tests/contrib/test_purl2url.py - ModuleNotFoundError: No module named 'requests'.

I am exploring the purl2url work from my local "sandbox" -- simply another repo from inside of which I've run

pip install -e /home/jmh/dev/nexb/packageurl-python 

so I can access my changes in purl2url.py from that sandbox. However, inside my forked packageurl-python repo, there is no requirements.txt, and its setup.cfg contains

[options]
python_requires = >=3.7
packages = find:
package_dir = =src
include_package_data = true
zip_safe = false
install_requires =

but nothing listed under install_requires.

I think I somehow need to rerun make dev in this local fork, perhaps after adding the requests library to setup.cfg or creating a requirements.txt containing requests, but I'm a bit reluctant to do so without confirming with you, since I'm concerned that I might mess up my local packageurl-python fork. Do you have any suggestions?

@johnmhoran
Member Author

johnmhoran commented Mar 11, 2024

In the packageurl-python fork setup.cfg I added:

install_requires =
    requests == 2.31.0

and in /home/jmh/dev/nexb/packageurl-python I ran pip install -e ., but when I reran bin/py.test tests/contrib/test_purl2url.py -vvs I again got ERROR tests/contrib/test_purl2url.py - ModuleNotFoundError: No module named 'requests'.

@johnmhoran
Member Author

This suggests to me that requests has been installed (and BTW so does my testing yesterday from my sandbox repo of this same packageurl-python repo/code):

(venv) Mon Mar 11, 2024 12:15 PM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$ python
Python 3.8.10 (default, Nov 22 2023, 10:22:35)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> requests.get("https://pypi.org/pypi/fetchcode")
<Response [200]>
>>> requests.get("https://pypi.org/pypi/fetchcode/json")
<Response [200]>
>>> exit()

(venv) Mon Mar 11, 2024 12:15 PM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$
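For reference, the second call above hits the PyPI JSON API endpoint. A minimal sketch of building that endpoint (optionally pinned to a version) and decoding the response follows; the helper names are hypothetical and not part of packageurl-python, and it uses the standard-library urllib.request instead of requests only so that the sketch has no third-party dependency.

```python
import json
import urllib.request
from typing import Optional

PYPI_JSON_API = "https://pypi.org/pypi"


def pypi_json_api_url(name: str, version: Optional[str] = None) -> str:
    """Build the PyPI JSON API URL for a project, optionally pinned to a version."""
    if version:
        return f"{PYPI_JSON_API}/{name}/{version}/json"
    return f"{PYPI_JSON_API}/{name}/json"


def fetch_pypi_json(name: str, version: Optional[str] = None, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON metadata (this performs a network request)."""
    url = pypi_json_api_url(name, version)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```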

@johnmhoran
Member Author

Running pip install -e . after installing requests and updating setup.cfg did not fix the pytest no-module-found errors for requests in my forked packageurl-python repo -- but make clean followed by make dev did. Now there are a few failing tests, but that's OK. I do wonder why pip install -e . was not sufficient to fix the pytest no-module-found error....

For the record, this was the full error from pytest:

(venv) Mon Mar 11, 2024 01:19 PM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$ bin/py.test tests/contrib/test_purl2url.py -vvs
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.8.10, pytest-7.4.4, pluggy-1.4.0 -- /home/jmh/dev/nexb/packageurl-python/bin/python
cachedir: .pytest_cache
rootdir: /home/jmh/dev/nexb/packageurl-python
configfile: setup.cfg
collected 0 items / 2 errors

==================================================================================================== ERRORS =====================================================================================================
________________________________________________________________________________ ERROR collecting tests/contrib/test_purl2url.py ________________________________________________________________________________
tests/contrib/test_purl2url.py:29: in <module>
    from packageurl.contrib import purl2url
lib/python3.8/site-packages/_pytest/assertion/rewrite.py:186: in exec_module
    exec(co, module.__dict__)
src/packageurl/contrib/purl2url.py:27: in <module>
    import requests
E   ModuleNotFoundError: No module named 'requests'
________________________________________________________________________________ ERROR collecting tests/contrib/test_purl2url.py ________________________________________________________________________________
ImportError while importing test module '/home/jmh/dev/nexb/packageurl-python/tests/contrib/test_purl2url.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
lib/python3.8/site-packages/_pytest/python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
lib/python3.8/site-packages/_pytest/pathlib.py:567: in import_path
    importlib.import_module(module_name)
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1014: in _gcd_import
    ???
<frozen importlib._bootstrap>:991: in _find_and_load
    ???
<frozen importlib._bootstrap>:975: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:671: in _load_unlocked
    ???
lib/python3.8/site-packages/_pytest/assertion/rewrite.py:186: in exec_module
    exec(co, module.__dict__)
tests/contrib/test_purl2url.py:29: in <module>
    from packageurl.contrib import purl2url
lib/python3.8/site-packages/_pytest/assertion/rewrite.py:186: in exec_module
    exec(co, module.__dict__)
src/packageurl/contrib/purl2url.py:27: in <module>
    import requests
E   ModuleNotFoundError: No module named 'requests'
============================================================================================ short test summary info ============================================================================================
ERROR tests/contrib/test_purl2url.py - ModuleNotFoundError: No module named 'requests'
ERROR tests/contrib/test_purl2url.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================================================================== 2 errors in 0.22s ===============================================================================================

(venv) Mon Mar 11, 2024 01:19 PM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$

@JonoYang
Collaborator

@johnmhoran Looking at your terminal prompt, this is what I think may be happening:

(venv) Mon Mar 11, 2024 01:19 PM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$ bin/py.test tests/contrib/test_purl2url.py -vvs

I think you have installed packageurl-python and requests to the venv virtual environment, but you are running bin/py.test from the packageurl-python directory, where bin/py.test is handled by a different virtual environment than the one you're in. Try running py.test tests/contrib/test_purl2url.py -vvs to use the py.test handled by venv.
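The pytest header in the earlier error output reports /home/jmh/dev/nexb/packageurl-python/bin/python as the interpreter, which appears consistent with this explanation: bin/py.test seems to run in the environment that make dev sets up in the repo root, not in the activated venv. A small, hypothetical check script like the one below, run once with bin/python and once with the venv's python, would show which interpreter is active and where (if anywhere) requests is importable from.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "requests") -> dict:
    """Report which interpreter is running and whether a module is importable from it."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module": module_name,
        # None means the module cannot be found by this interpreter.
        "module_origin": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the script were saved as check_env.py (a hypothetical name), comparing the output of bin/python check_env.py with python check_env.py should make any mismatch between the two environments visible.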

@johnmhoran
Member Author

@JonoYang Running py.test tests/contrib/test_purl2url.py -vvs threw an error

ImportError: No module named packageurl.contrib

But I think my running make clean then make dev, as I noted above, was the fix to the requests-related ModuleNotFoundError. Running pip install -e . was not enough to get requests loaded, evidently.

The 2 failing tests I now get are OK -- that's because I added the ability to actually get the pypi download_url for tar.gz downloads. However I have several questions about what our goals are in these tests. One failing test looks to get a pypi .whl as a download -- it seems that's just to test in case pypi is not yet supported for downloads (as has been the case until now).

Do you have time to discuss?

@johnmhoran
Member Author

@TG1999 @keshav-space @tdruez I can now get a download_url for pypi PURLs (though the code is not quite ready for prime time). Looking at the pypi JSON structure/content I get from requests.get() and at our current tests, if the few JSON examples I've seen are representative, we can retrieve either a .whl (using `"packagetype": "bdist_wheel"`) or a .tar.gz (using `"packagetype": "sdist"`). I have drafted the pypi download_url function for now to:

  • retrieve the basic pypi.org url if no version is included with the PURL (e.g., https://pypi.org/project/aboutcode-toolkit/)

  • retrieve a .tar.gz if a version is supplied with the PURL (e.g., https://files.pythonhosted.org/packages/6a/16/9191e46344d6a5e98afa74730340bc5f82f2c9ac7922ac4a16e58885a652/aboutcode-toolkit-3.4.0rc1.tar.gz) (appears in download_url and in the list for inferred_urls)

  • retrieve both a .tar.gz (appears in download_url and in the list for inferred_urls) and a .whl (appears in repo_download_url) if the PURL includes a ?download_url= qualifier seeking a .whl

  • retrieve just a .tar.gz if the PURL includes a ?download_url= qualifier seeking a .tar.gz (appears in download_url, in the list for inferred_urls and in repo_download_url)
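A minimal sketch of the selection logic in the first two bullets, assuming release_json is the decoded PyPI JSON API response for a specific version (each entry in its "urls" list carries a "packagetype" such as "sdist" or "bdist_wheel" and a "url"). The function names are hypothetical and this is not the actual purl2url.py code; falling back to the version's project page when no sdist is listed is also an assumption.

```python
from typing import Optional


def select_pypi_file_url(release_json: dict, packagetype: str = "sdist") -> Optional[str]:
    """Return the URL of the first release file whose packagetype matches, else None."""
    for file_info in release_json.get("urls", []):
        if file_info.get("packagetype") == packagetype:
            return file_info.get("url")
    return None


def pypi_download_url(name: str, version: Optional[str], release_json: Optional[dict]) -> str:
    """Hypothetical policy: project page without a version, sdist URL with one."""
    if not version:
        # No version in the PURL: point at the basic pypi.org project page.
        return f"https://pypi.org/project/{name}/"
    if release_json:
        # Versioned PURL: prefer the source archive (.tar.gz).
        sdist_url = select_pypi_file_url(release_json, "sdist")
        if sdist_url:
            return sdist_url
    # Assumed fallback when no sdist is listed or no metadata was fetched.
    return f"https://pypi.org/project/{name}/{version}/"
```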

I see a variety of test PURL inputs and expected outputs in our tests but our actual goals for the purl2url.py output are not 100% clear. Is the approach I described above what we want? If not, please let me know what changes you want me to make in the data we retrieve. (At the risk of creating clutter, I'll paste sample output in the next comment below so you have the actual output data to examine.)

@johnmhoran
Member Author

Rather than post the verbose output here, I've pasted it into a .txt file, which I'll upload:

packageurl-python-purl2url-pypi-sample-output-2024-03-11.txt

@johnmhoran
Member Author

@TG1999 Further to your (and other) comments in the recently-closed prior PR 151, I've removed most of my prior code, and this issue -- and the new PR I'll open shortly -- now focus on adding repo URL support and testing for cocoapods (pypi support is already there and fine) and additional pypi testing.

I'll turn next to fetchcode/package.py to add download URL (and other) support for cocoapods and pypi.

@johnmhoran
Member Author

@TG1999 Actually, I'd forgotten that fetchcode/package.py already handles pypi, including providing a single download URL entry (just one, as is the case for the other supported types as well, although there are often additional download files available).

I have a few questions for you and @pombredanne about the details (e.g., do we want to add the ability for additional download files as a list or otherwise) and will ask them in the related fetchcode issue I opened recently.

Re that question about multiple download files, I also raised it earlier in this issue (see this comment) -- this question is still a live question for you and @pombredanne -- I understand that I cannot simply modify the current inferred URLs function because people rely on its current form -- do we want to add this capability and, if so, how? We might want the download URL value to be a list rather than a single URL, and we might want the inferred URLs list to include more than the current repo and download URL values, but all of that would most naturally involve modifying the existing functions, which we don't want to do.
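One possible additive approach, sketched under the same assumption about the PyPI JSON structure (a "urls" list whose entries carry "packagetype" and "url"): a new, hypothetical helper that returns every file URL for a release as a list, leaving the existing single-URL functions untouched for the people who rely on them. The sdist-first ordering is an arbitrary choice for illustration.

```python
from typing import Dict, List

# Lower rank sorts first; unknown package types go last.
PACKAGETYPE_RANK: Dict[str, int] = {"sdist": 0, "bdist_wheel": 1}


def pypi_all_download_urls(release_json: dict) -> List[str]:
    """Return every file URL of a release: sdists first, then wheels, then anything else."""
    files = release_json.get("urls", [])
    # sorted() is stable, so files of the same type keep their original order.
    ranked = sorted(files, key=lambda f: PACKAGETYPE_RANK.get(f.get("packagetype"), 2))
    return [f["url"] for f in ranked if f.get("url")]
```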

Please let me know what you think.

johnmhoran added a commit to johnmhoran/packageurl-python that referenced this issue Apr 8, 2024
johnmhoran added a commit to johnmhoran/packageurl-python that referenced this issue May 30, 2024
johnmhoran added a commit to johnmhoran/packageurl-python that referenced this issue Jun 5, 2024
johnmhoran added a commit to johnmhoran/packageurl-python that referenced this issue Jun 7, 2024
johnmhoran added a commit to johnmhoran/packageurl-python that referenced this issue Jun 8, 2024