Access to .npy datasets #1

preritt · 2022-05-14T17:24:48Z

Hi,
Thank you for releasing the package!
I wanted to check the procedure to access the offline datasets. It seems these are not part of the repo. I am not sure if I am missing something.

For example, I get the following error when using:
task = design_bench.make('ChEMBL-ResNet-v0')
FileNotFoundError: [Errno 2] No such file or directory:
/chembl-GI50-CHEMBL1964047/chembl-y-2.npy'

Thank you!

brandontrabucco · 2022-05-16T00:38:52Z

Hello preritt,

Thanks for your interest in the benchmark. If you would like to download the entire benchmark at once to access the raw .npy files, they are available in the GCP bucket defined here:

https://github.com/rail-berkeley/design-bench/blob/new-api/design_bench/disk_resource.py#L7

This post may be of interest if you are not familiar with gsutil:

https://stackoverflow.com/questions/58581873/how-to-download-an-entire-bucket-in-gcp

Generally speaking, the dataset files are downloaded as needed from GCP when design_bench.make is called. Could you share the full script that produces the error, along with the full stack trace?

Warm regards,
Brandon
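
For anyone else asked to share a full stack trace, one way to collect it together with basic environment details is sketched below. It uses only the standard library. The helper name `capture_report` is hypothetical and not part of design_bench.

```python
import platform
import sys
import traceback


def capture_report(fn, *args, **kwargs):
    """Call fn and return a text report; include the full traceback if it raises."""
    lines = [
        "python: " + sys.version.split()[0],
        "platform: " + platform.platform(),
    ]
    try:
        fn(*args, **kwargs)
        lines.append("result: call succeeded")
    except Exception:
        # format_exc() returns the same text Python would print for the
        # exception currently being handled, including every stack frame.
        lines.append(traceback.format_exc())
    return "\n".join(lines)
```

Assuming design_bench is installed, `print(capture_report(design_bench.make, 'ChEMBL-ResNet-v0'))` should produce text that can be pasted directly into an issue.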

preritt · 2022-05-21T04:44:35Z

Hi Brandon,

Sorry for the delayed response.
Thanks for the information!
Here is the code I used:

import design_bench

# task = design_bench.make('TGFP-Transformer-v0')
# task = design_bench.make('TFBind8-Exact-v0')
task = design_bench.make('ChEMBL-ResNet-v0')

This is the error:

`Traceback (most recent call last):

  File "/BerkleyDesignBenchVer01/testBerkleyV1.py", line 12, in <module>
    task = design_bench.make('ChEMBL-ResNet-v0')

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 328, in make
    oracle_kwargs=oracle_kwargs, **kwargs)

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 157, in make
    oracle_kwargs=oracle_kwargs, **kwargs)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 111, in make
    oracle_kwargs=oracle_kwargs_final, **kwargs)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/task.py", line 245, in __init__
    dataset = import_name(dataset)(**kwargs)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/discrete/chembl_dataset.py", line 310, in __init__
    soft_interpolation=soft_interpolation, **kwargs)

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/discrete_dataset.py", line 279, in __init__
    super(DiscreteDataset, self).__init__(*args, **kwargs)

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 470, in __init__
    for i, y in enumerate(self.iterate_samples(return_x=False)):

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 865, in iterate_samples
    return_x=return_x, return_y=return_y):

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 762, in iterate_batches
    y_shard_data = self.get_shard_y(shard_id)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 566, in get_shard_y
    return np.load(self.y_shards[shard_id].disk_target)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/numpy/lib/npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))

FileNotFoundError: [Errno 2] No such file or directory: 'BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench_data/chembl-GI50-CHEMBL1964047/chembl-y-2.npy'`

I'll try the GCP method and get back in case of error.

Thank you so much for your response!

brandontrabucco · 2022-05-21T17:43:40Z

Could you try calling design_bench.make on a ChEMBL task with the following format:

https://github.com/rail-berkeley/design-bench/blob/new-api/design_bench/__init__.py#L809

For example, design_bench.make("ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0")

preritt · 2022-05-22T02:16:28Z

I tried the following:
task = design_bench.make("ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0")
However, I now get the following error:

Traceback (most recent call last):

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 201, in spec
    return self.task_specs[task_name]

KeyError: 'ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0'


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "perspectaTestsVer2/perspectaV1/myCodesV9Della/BerkleyDesignBenchVer01/testBerkleyV1.py", line 13, in <module>
    task =design_bench.make("ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0")

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 328, in make
    oracle_kwargs=oracle_kwargs, **kwargs)

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 155, in make
    return self.spec(task_name).make(

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 232, in spec
    UNKNOWN_MESSAGE.format(task_name))

ValueError: No registered task with name: ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0

brandontrabucco · 2022-05-22T02:25:32Z

Could you check which version number of the benchmark you have installed?

preritt · 2022-05-22T02:27:57Z

It is 2.0.12

design-bench 2.0.12 pypi_0 pypi

brandontrabucco · 2022-05-22T02:29:27Z

The latest version of the benchmark is 2.0.20. Could you try that version?
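
A minimal sketch for checking the installed version from Python, as an alternative to reading the package list. The distribution name `design-bench` is taken from the package listing in this thread. The helper names are hypothetical, and the simple dotted-integer comparison is an assumption that would not handle pre-release suffixes.

```python
def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.7 (needs setuptools)
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_tuple(version_string):
    """Turn '2.0.12' into (2, 0, 12) so versions compare numerically."""
    return tuple(int(part) for part in version_string.split("."))


def is_older_than(dist_name, minimum):
    """True if the distribution is missing or older than the given version."""
    found = installed_version(dist_name)
    return found is None or version_tuple(found) < version_tuple(minimum)
```

For example, `is_older_than("design-bench", "2.0.20")` would be expected to return True in an environment that still has 2.0.12 installed.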

preritt · 2022-05-22T02:36:51Z

I have the correct version now:

design-bench 2.0.20 pypi_0 pypi

Not sure why, but now I get an import error when using:
import design_bench

runcell(0, '/BerkleyDesignBenchVer01/testBerkleyV1.py')
Traceback (most recent call last):

  File "/BerkleyDesignBenchVer01/testBerkleyV1.py", line 8, in <module>
    import design_bench

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/__init__.py", line 766, in <module>
    feature_extractor=MorganFingerprintFeatures(dtype=np.int32),

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/oracles/feature_extractors/morgan_fingerprint_features.py", line 74, in __init__
    os.path.join(DATA_DIR, 'smiles_vocab.txt'))

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/deepchem/feat/smiles_tokenizer.py", line 89, in __init__
    self.max_len_single_sentence = self.max_len - 2

AttributeError: 'SmilesTokenizer' object has no attribute 'max_len'

brandontrabucco · 2022-05-22T02:39:52Z

Ah, this can happen if an incompatible version of deepchem is installed. Can you try installing the version of deepchem listed here: https://github.com/brandontrabucco/design-baselines/blob/master/requirements.txt#L29

I'm not sure if that's the only package that may need an update, so perhaps check the whole requirements file.
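
One way to check a whole requirements file against an environment is sketched below. It assumes the file uses exact `name==version` pins and ignores every other line. The function names and the `examplepkg` entries are hypothetical and illustrative only; they are not taken from the design-baselines requirements file.

```python
def parse_pins(requirements_text):
    """Collect exact 'name==version' pins from requirements-style text."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers before parsing.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, _, wanted = line.partition("==")
        pins[name.strip().lower()] = wanted.strip()
    return pins


def find_mismatches(pins, lookup):
    """Return {name: (wanted, found)} for every pin the environment does not match.

    lookup is any callable mapping a package name to its installed version
    string, or None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Illustrative data only: a made-up pin and a made-up environment.
example_pins = parse_pins("examplepkg==1.2.3  # hypothetical pin\n")
example_env = {"examplepkg": "1.0.0"}
print(find_mismatches(example_pins, example_env.get))
```

In a real environment, `lookup` could be a small wrapper around `importlib.metadata.version` (Python 3.8+) that returns None when the package is missing.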

preritt · 2022-05-22T03:24:44Z

Thanks a lot! I did a pip install on the requirements and it resolved the issue.

harish2sista mentioned this issue Jan 12, 2024

error while importing design-bench brandontrabucco/design-bench#11

Open
