Failing to recognize the tokenizer #2

harish2sista · 2024-01-06T00:00:22Z

Hi,

I am trying to learn how to use the design-bench APIs. I tried to run the steps under "Reproducing Baseline Performance," and I keep getting the following error:

2024-01-05 18:53:35.795906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1
Traceback (most recent call last):
  File "/home/user_name/miniconda3/envs/design-baselines/bin/design-baselines", line 33, in <module>
    sys.exit(load_entry_point('design-baselines', 'console_scripts', 'design-baselines')())
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/user_name/Desktop/Projects/Superconductor-OKD/design-baselines/design_baselines/cli.py", line 804, in make_table
    from design_bench.datasets.discrete.tf_bind_8_dataset import TFBind8Dataset
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/__init__.py", line 766, in <module>
    feature_extractor=MorganFingerprintFeatures(dtype=np.int32),
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/oracles/feature_extractors/morgan_fingerprint_features.py", line 74, in __init__
    os.path.join(DATA_DIR, 'smiles_vocab.txt'))
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/deepchem/feat/smiles_tokenizer.py", line 87, in __init__
    super().__init__(vocab_file, **kwargs)
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/transformers/tokenization_bert.py", line 196, in __init__
    "model use tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)".format(vocab_file)
ValueError: Can't find a vocabulary file at path '/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench_data/smiles_vocab.txt'. To load the vocabulary from a Google pretrained model use tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)

Can someone help me with this?


harish2sista · 2024-01-08T14:18:27Z

Hi,

I have checked the source of this error. In the installed design-bench package, the "design_bench_data" directory is empty; there are no vocabulary files in it. Could you please tell me how to install them, in case I have missed a step?
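
For anyone who wants to confirm the same diagnosis, here is a minimal sketch of the check. It assumes the data directory sits at `site-packages/design_bench_data`, as the path in the traceback suggests; the helper names `vocab_path` and `report` are hypothetical and not part of design-bench.

```python
import os
import sysconfig


def vocab_path(site_packages=None, filename="smiles_vocab.txt"):
    """Build the path where the traceback says the vocabulary file is expected.

    Assumption: the data directory is <site-packages>/design_bench_data,
    taken from the ValueError message, not from the design-bench docs.
    """
    if site_packages is None:
        # site-packages of the currently active interpreter / conda env
        site_packages = sysconfig.get_paths()["purelib"]
    return os.path.join(site_packages, "design_bench_data", filename)


def report(path):
    """Describe whether the directory and the file exist."""
    directory = os.path.dirname(path)
    if not os.path.isdir(directory):
        return "missing directory: " + directory
    if not os.path.isfile(path):
        return "directory exists but file is missing: " + path
    return "found: " + path


if __name__ == "__main__":
    print(report(vocab_path()))
```

Run it with the Python interpreter of the `design-baselines` environment so that the right site-packages directory is inspected.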

brandontrabucco · 2024-01-29T21:31:14Z

Hello harish2sista,

Thanks for your interest in our benchmark!

I'm maintaining the benchmark on my personal github account at this location:
https://github.com/brandontrabucco/design-bench

The GCP bucket that contained the benchmark data was recently lost. I'm currently working on a solution and migrating the benchmark to use Google Drive instead. The smiles_vocab.txt file is available in this folder:

https://drive.google.com/drive/folders/1FDoM9wWBm7ziWOSyY5V7eE1bx0mXQYwp

-Brandon
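
As a possible workaround until the migration is finished, the file can be downloaded manually from the folder above and copied to the location named in the error message. The sketch below assumes that location is `site-packages/design_bench_data/smiles_vocab.txt`, as shown in the traceback. The function name `install_vocab` is hypothetical, and other data files from the same folder may also be needed for other tasks.

```python
import os
import shutil
import sysconfig


def install_vocab(downloaded_file, site_packages=None):
    """Copy a manually downloaded vocabulary file to the expected location.

    Assumption: design-bench looks for the file in
    <site-packages>/design_bench_data/smiles_vocab.txt (from the traceback).
    """
    if site_packages is None:
        # site-packages of the currently active interpreter / conda env
        site_packages = sysconfig.get_paths()["purelib"]
    data_dir = os.path.join(site_packages, "design_bench_data")
    os.makedirs(data_dir, exist_ok=True)  # create the directory if absent
    target = os.path.join(data_dir, "smiles_vocab.txt")
    shutil.copyfile(downloaded_file, target)
    return target
```

For example, calling `install_vocab("/path/to/downloaded/smiles_vocab.txt")` from the `design-baselines` environment returns the path the file was copied to, which should match the path in the `ValueError`.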
