Textattack for cuml models not utilising much GPU resources #790

Open
farwashah6 opened this issue May 13, 2024 · 4 comments
Labels: question (Further information is requested)

Comments

@farwashah6

Hi. I am new to using GPUs. I have used the TextAttack library before for one of my projects with scikit-learn and Keras models. For that I created custom model wrappers for my models and they worked fine. Now that my data is different and much larger, I want to run the same (sklearn) models on the GPU.

My understanding is that scikit-learn models do not run on the GPU and that I have to use cuML instead. But when I use cuML and pass the cuML model to the CustomModelWrapper I created earlier, it gives me the following error
len() of unsized object
and then stops execution.

Additional info: for vectorisation of my data I am using cuML's CountVectorizer, which is the cause of this error. When I use sklearn's CountVectorizer instead, the attack runs but doesn't use much GPU (of course). Please help me with this.

I am attaching my model wrapper here:

import pandas as pd
import textattack as ta


class CustomModelWrapper(ta.models.wrappers.ModelWrapper):

    def __init__(self, model, vectorizer):
        super().__init__()
        self.model = model
        self.vectorizer = vectorizer

    def __call__(self, text_input_list, batch=None):
        # Vectorize the incoming texts and return class probabilities.
        x_transform = self.vectorizer.transform(pd.Series(text_input_list)).astype(float)
        prediction = self.model.predict_proba(x_transform)
        return prediction
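
For reference, here is a minimal sketch of how this wrapper might be adapted for cuML. The cudf/cupy handling and the CumlModelWrapper name are my assumptions, not code from the original post: cuML's CountVectorizer generally expects a cudf.Series rather than a pandas.Series, and the predictions may come back as GPU objects, so converting the output to a plain 2-D numpy array before returning it is the part most likely to matter, since TextAttack calls len() on and indexes into the returned scores.

import cudf
import cupy as cp
import textattack as ta


class CumlModelWrapper(ta.models.wrappers.ModelWrapper):
    """Sketch of a wrapper for a cuML estimator plus cuML's CountVectorizer."""

    def __init__(self, model, vectorizer):
        super().__init__()
        self.model = model
        self.vectorizer = vectorizer

    def __call__(self, text_input_list):
        # cuML's CountVectorizer expects a cudf.Series, not a list or pandas.Series.
        texts = cudf.Series(text_input_list)
        features = self.vectorizer.transform(texts).astype("float32")
        probs = self.model.predict_proba(features)
        # TextAttack expects a CPU array-like it can call len() on and index,
        # so convert any cudf / cupy output back to a 2-D numpy array.
        if hasattr(probs, "to_numpy"):      # cudf.DataFrame or cudf.Series
            return probs.to_numpy()
        return cp.asnumpy(probs)            # cupy.ndarray (or numpy passthrough)

Whether predict_proba returns a cupy array or a cudf object depends on the estimator and cuML version, so treat this only as a starting point under those assumptions.
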
farwashah6 changed the title from "Textattack fro cuml models not utilising much GPU resources" to "Textattack for cuml models not utilising much GPU resources" on May 13, 2024
jxmorris12 added the "question (Further information is requested)" label on Jul 25, 2024
@RealPolitiX

@farwashah6, I'm curious how big your model is and how much GPU resource it should occupy. I've encountered what is perhaps a similar issue with HuggingFaceModelWrapper, but found a temporary workaround by commenting out a few lines of the TextAttack code. See #798

@farwashah6
Author


Thanks for the suggestion. The models I am trying to use are SVM, Random Forest, and Linear Regression, so not big at all. But the problem is that they are scikit-learn models, which do not run on the GPU, and when I use the cuML library I encounter this problem.

@beckernick

I work on RAPIDS at NVIDIA and came across this issue due to the cuML reference.

I've seen this error come up in scenarios where a downstream function expects a data structure that supports the len operator but instead gets a scalar or its ndarray equivalent (i.e., a 0-dimensional np.ndarray or cupy.ndarray).
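
As a quick illustration (my own example, not a traceback from this issue), a 0-dimensional numpy array produces exactly this message:

import numpy as np

len(np.array([0.1, 0.9]))  # 2: a 1-D array supports len()
len(np.array(0.9))         # TypeError: len() of unsized object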

If you're running into scenarios in which a cuML model's output or behavior isn't consistent with scikit-learn (particularly if you're passing in CPU-based inputs), could you file an issue on cuML?

If you can provide a minimal, reproducible example of your error we may be able to help triage it.

@farwashah6
Author

Thank you for the suggestion. Yes, I have also posted the issue on cuML and attached sample code.

