Failing to recognize the tokenizer #2

Open
harish2sista opened this issue Jan 6, 2024 · 2 comments

Comments

harish2sista commented Jan 6, 2024

Hi,

I am trying to learn how to use the design-bench APIs. I tried to run the "Reproducing Baseline Performance" instructions, and I keep getting the following error:

2024-01-05 18:53:35.795906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1
Traceback (most recent call last):
  File "/home/user_name/miniconda3/envs/design-baselines/bin/design-baselines", line 33, in <module>
    sys.exit(load_entry_point('design-baselines', 'console_scripts', 'design-baselines')())
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/user_name/Desktop/Projects/Superconductor-OKD/design-baselines/design_baselines/cli.py", line 804, in make_table
    from design_bench.datasets.discrete.tf_bind_8_dataset import TFBind8Dataset
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/__init__.py", line 766, in <module>
    feature_extractor=MorganFingerprintFeatures(dtype=np.int32),
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/oracles/feature_extractors/morgan_fingerprint_features.py", line 74, in __init__
    os.path.join(DATA_DIR, 'smiles_vocab.txt'))
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/deepchem/feat/smiles_tokenizer.py", line 87, in __init__
    super().__init__(vocab_file, **kwargs)
  File "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/transformers/tokenization_bert.py", line 196, in __init__
    "model use tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)".format(vocab_file)
ValueError: Can't find a vocabulary file at path '/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench_data/smiles_vocab.txt'. To load the vocabulary from a Google pretrained model use tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)

Can someone help me with this?

harish2sista (Author) commented

Hi,

I have traced the source of this error. In the installed design-bench package, the "design_bench_data" directory is empty; it contains no vocabulary files. Could you please tell me how to install them, in case I have missed a step?
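
For anyone wanting to confirm the same diagnosis, here is a minimal sketch of the check. It assumes the data directory is the `design_bench_data` folder inside the active environment's site-packages, as the path in the traceback suggests; the helper name `find_missing_files` is hypothetical and is not part of design-bench.

```python
import os
import sysconfig


def find_missing_files(data_dir, required=("smiles_vocab.txt",)):
    """Return the names in `required` that are not regular files in `data_dir`.

    Hypothetical helper, not part of design-bench. A missing or empty
    directory simply reports every required file as missing.
    """
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


if __name__ == "__main__":
    # Assumption: the data lives in <site-packages>/design_bench_data,
    # matching the path shown in the ValueError above.
    site_packages = sysconfig.get_paths()["purelib"]
    data_dir = os.path.join(site_packages, "design_bench_data")
    missing = find_missing_files(data_dir)
    if missing:
        print("Missing from", data_dir, ":", ", ".join(missing))
    else:
        print("All required files found in", data_dir)
```

Only `smiles_vocab.txt` is named in the error, so that is the only file checked by default; other data files may be needed too.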

brandontrabucco (Member) commented Jan 29, 2024

Hello harish2sista,

Thanks for your interest in our benchmark!

I'm maintaining the benchmark on my personal GitHub account at this location:
https://github.com/brandontrabucco/design-bench

The GCP bucket that contained the benchmark data was recently lost, and I'm currently working on a solution for this and migrating the benchmark to use Google Drive instead. The smiles_vocab.txt file is available in this folder:

https://drive.google.com/drive/folders/1FDoM9wWBm7ziWOSyY5V7eE1bx0mXQYwp

-Brandon
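
As a possible stopgap until the migration is finished, here is a minimal sketch of putting a manually downloaded `smiles_vocab.txt` where the traceback says it is expected. It assumes the file was downloaded by hand from the Drive folder above and that the target is the `design_bench_data` directory in site-packages; the function name `install_vocab` and the example paths are hypothetical.

```python
import os
import shutil


def install_vocab(downloaded_path, data_dir, filename="smiles_vocab.txt"):
    """Copy a manually downloaded vocabulary file into `data_dir`.

    Hypothetical helper, not part of design-bench. Creates `data_dir`
    if it does not exist and returns the path of the copied file.
    """
    if not os.path.isfile(downloaded_path):
        raise FileNotFoundError(downloaded_path)
    os.makedirs(data_dir, exist_ok=True)
    target = os.path.join(data_dir, filename)
    shutil.copyfile(downloaded_path, target)
    return target


# Example call (paths are assumptions; adjust to your own environment):
# install_vocab(
#     os.path.expanduser("~/Downloads/smiles_vocab.txt"),
#     "/home/user_name/miniconda3/envs/design-baselines/lib/python3.7/"
#     "site-packages/design_bench_data",
# )
```

This only addresses the vocabulary file named in the error. If the rest of the benchmark data is also missing, other datasets will likely fail in a similar way until they are downloaded as well.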
