Using NLU for BioBERT embeddings -- takes a really long time on a list of 10,000 words, and on 1 word #103
Comments
Hi @krico1, large embeddings like BioBERT can be quite slow because of the large deep learning models behind them. All you need to do is set the appropriate parameter; see this notebook as a reference. Also note: if you have a large dataset at hand, it will be faster to feed NLU all the data at once instead of one string at a time.
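The "all at once instead of one by one" advice can be sketched as follows. Note this is a stand-alone illustration of the batching pattern, not NLU itself: `fake_predict` is a stand-in for a pipeline's `predict()` so the snippet runs without Spark NLP installed.

```python
# Sketch: call predict() once on the whole list instead of once per word.
# `fake_predict` is a stand-in for an NLU pipeline's predict().

def fake_predict(data):
    """Pretend pipeline: returns one fake 'embedding' per input string."""
    if isinstance(data, str):
        data = [data]
    return [[float(len(s)), 0.0] for s in data]

words = ["protein", "gene", "cell"]

# Slow pattern: one call per word, so per-call overhead is paid N times.
looped = [fake_predict(w)[0] for w in words]

# Fast pattern: one call for the entire list.
batched = fake_predict(words)

print(looped == batched)  # True -- same results, far fewer calls
```

The results are identical either way; only the number of calls (and hence the fixed per-call overhead of the deep learning model) differs.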
@C-K-Loan Hi! Unfortunately, I am not able to obtain the embeddings (even when adding `get_embeddings=True`). I tried multiple models and included other parameters, but with no success. Thank you!
@C-K-Loan I have the same problem as @MargheCap. I assume it has something to do with how we install the library. With my installation (below), I get this rather slow calculation. I checked the GPU visibility to TensorFlow with:

```python
import tensorflow as tf
tf.config.list_physical_devices('GPU')
```

For installation, I used:

```python
!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash
import nlu
pipe = nlu.load('en.embed_sentence.biobert.pmc_base_cased', gpu=True)
```

I used this installation because it was proposed in this Colab sheet: https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/sentence_embeddings/NLU_BERT_sentence_embeddings_and_t-SNE_visualization_Example.ipynb#scrollTo=rBXrqlGEYA8G
Hi, so we are working on generating BioBERT embeddings for our project. When we run it on a single word, it takes about a second. When we run it on a list of 10,000 words, it either times out or takes upwards of hours to run. Is this normal? Below is how we are using it:
```python
def load_biobert(self):
    # Load BioBERT model (for sentence-type embeddings)
    self.logger.info("Loading BioBERT model...")
    start = time.time()
    biobert = nlu.load('en.embed_sentence.biobert.pmc_base_cased')
    end = time.time()
    self.logger.info('done (BioBERT loading time: %.2fs)', end - start)
    return biobert

def get_biobert_embeddings(self, strings):
    embedding_list = []
    for string in strings:
        self.logger.debug("...Generating embedding for: %s", string)
        embedding_list.append(self.get_biobert_embedding(string))
    return embedding_list

def get_biobert_embedding(self, string):
    embedding = self.biobert.predict(string, output_level='sentence', get_embeddings=True)
    return embedding.sentence_embedding_biobert.values[0]
```
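A batched variant of the helpers above could look like the sketch below, assuming (per the maintainer's note) that NLU's `predict()` accepts a whole list of strings and returns one row per sentence. `StubPipe` and `EmbeddingService` are hypothetical names that mimic the pipeline's DataFrame-like result so the snippet is self-contained; in the real code, `self.biobert` would be the loaded NLU pipeline.

```python
# Sketch (assumption): replace the per-word loop with a single predict()
# call over the whole list. StubPipe stands in for the loaded NLU pipeline.
from types import SimpleNamespace

class EmbeddingService:
    def __init__(self, biobert):
        self.biobert = biobert  # loaded pipeline (stubbed here)

    def get_biobert_embeddings(self, strings):
        # One predict() call for the full list instead of len(strings) calls.
        result = self.biobert.predict(strings, output_level='sentence',
                                      get_embeddings=True)
        return list(result.sentence_embedding_biobert.values)

class StubPipe:
    """Stand-in returning one fake embedding row per input string."""
    def predict(self, strings, output_level='sentence', get_embeddings=True):
        vals = [[float(len(s))] for s in strings]
        return SimpleNamespace(
            sentence_embedding_biobert=SimpleNamespace(values=vals))

svc = EmbeddingService(StubPipe())
print(svc.get_biobert_embeddings(["gene", "protein"]))  # -> [[4.0], [7.0]]
```

The design point is that the fixed cost of invoking the model (and the underlying Spark job) is paid once per call, so one call over 10,000 words is far cheaper than 10,000 calls over one word each.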