quantization and gpu acceleration of the quantized model. #88

BBC-Esq · 2023-09-29T23:08:56Z

Wasn't sure if this was technically a new issue but just in case I'm reposting here:

I figured out how to dynamically quantize the instructor-xl model, but at the point that it's supposed to create the embeddings, i want it to use gpu acceleration (cuda) just like it does when I use the float32 version of the model. Is that possible? If I understand the comments above, it's not? What about quantizing the model beforehand NOT using the "dynamic" method? I've been struggling with this for months so any help would be much appreciated. The link above is to a discussion back in 2021 and "seek other solutions" doesn't point me in the right direction so...I'm looking at bitsandbytes but couldn't find a solution either... Here is the portion of the script I'm trying to use:

    if "instructor" in EMBEDDING_MODEL_NAME:
        # Create the instructor embeddings object
        embeddings = HuggingFaceInstructEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": COMPUTE_DEVICE},
            query_instruction="Represent the document for retrieval."
        )
        
        # Quantize the instructor model on the CPU
        embeddings.client = quantize_dynamic(embeddings.client, dtype=torch.qint8)
        
        # Move the quantized model to the GPU
        embeddings.client = embeddings.client.to('cuda')
    elif "bge" in EMBEDDING_MODEL_NAME and "large-en-v1.5" not in EMBEDDING_MODEL_NAME:
        embeddings = HuggingFaceBgeEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": COMPUTE_DEVICE},
            encode_kwargs={'normalize_embeddings': True}
        )
    else:
        embeddings = HuggingFaceEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": COMPUTE_DEVICE},
        )

Also, when I try to use instructor-xl with the huggingfaceembeddings class specifically (i.e. not using embeddings = qmodel.encode([[instruction,sentence]]) it won't work either...

The text was updated successfully, but these errors were encountered:

hongjin-su · 2023-12-19T09:38:02Z

Hi, Thanks a lot for your interest in the INSTRUCTOR model!

From these blogs (1, 2), it seems that dynamic quantization is not supported for GPUs. I am not sure whether there are more recent updates about this.

BBC-Esq closed this as completed Aug 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quantization and gpu acceleration of the quantized model. #88

quantization and gpu acceleration of the quantized model. #88

BBC-Esq commented Sep 29, 2023 •

edited

Loading

hongjin-su commented Dec 19, 2023

quantization and gpu acceleration of the quantized model. #88

quantization and gpu acceleration of the quantized model. #88

Comments

BBC-Esq commented Sep 29, 2023 • edited Loading

hongjin-su commented Dec 19, 2023

BBC-Esq commented Sep 29, 2023 •

edited

Loading