
Got 'Blas xGEMMBatched launch failed' using BERT + BiLSTM #497

Open
yanwii opened this issue Jan 10, 2023 · 0 comments
Labels
question Further information is requested

Comments


yanwii commented Jan 10, 2023

You must follow the issue template and provide as much information as possible; otherwise, this issue will be closed.

Check List

Thanks for taking the time to open an issue. Before you submit it, please confirm these boxes are checked.

You can post pictures, but if specific text or code is required to reproduce the issue, please provide the text in a plain text format for easy copy/paste.

Environment

  • Debian 11
  • Python 3.6.8
  • requirements.txt:
cudatoolkit               10.0.130                      0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cudnn                     7.6.5                cuda10.0_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
kashgari                  1.1.5                    pypi_0    pypi
keras                     2.3.1                    pypi_0    pypi
keras-applications        1.0.8                      py_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
keras-bert                0.89.0                   pypi_0    pypi
keras-embed-sim           0.10.0                   pypi_0    pypi
keras-gpt-2               0.17.0                   pypi_0    pypi
keras-layer-normalization 0.16.0                   pypi_0    pypi
keras-multi-head          0.29.0                   pypi_0    pypi
keras-pos-embd            0.13.0                   pypi_0    pypi
keras-position-wise-feed-forward 0.8.0                    pypi_0    pypi
keras-preprocessing       1.1.2              pyhd3eb1b0_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
keras-self-attention      0.51.0                   pypi_0    pypi
keras-transformer         0.40.0                   pypi_0    pypi
numpy                     1.16.4                   pypi_0    pypi
numpy-base                1.19.2           py36hfa32c7d_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorboard               1.14.0           py36hf484d3e_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorflow                1.14.0          gpu_py36h57aa796_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorflow-addons         0.9.1                    pypi_0    pypi
tensorflow-estimator      1.14.0                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorflow-gpu            1.14.0                   pypi_0    pypi

And the output of nvidia-smi:

Tue Jan 10 11:08:55 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   39C    P8    19W / 220W |    568MiB /  7979MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1288      G   /usr/lib/xorg/Xorg                223MiB |
|    0   N/A  N/A      1402      G   /usr/bin/gnome-shell               71MiB |
|    0   N/A  N/A      1698      G   ...b/firefox-esr/firefox-esr      171MiB |
|    0   N/A  N/A      1969      G   ...b/firefox-esr/firefox-esr        3MiB |
|    0   N/A  N/A      2217      G   ...RendererForSitePerProcess       90MiB |
|    0   N/A  N/A      9821      G   ...b/firefox-esr/firefox-esr        3MiB |
+-----------------------------------------------------------------------------+

My model:

import pandas as pd
import kashgari
from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.classification import BiLSTM_Model
import numpy
import os

BERT_PATH = r'/chinese_L-12_H-768_A-12'


# Initialize the embedding
embed = BERTEmbedding(BERT_PATH,
                      task=kashgari.CLASSIFICATION,
                      sequence_length=64, layer_nums=4)

tokenizer = embed.tokenizer

df = pd.read_excel('data.xlsx')
# Tokenize the reviews
df['cutted'] = df['review'].apply(lambda x: tokenizer.tokenize(x))
df["label"] = df['label'].astype("str")

# Prepare the train / validation / test splits
train_x = list(df['cutted'][:int(len(df)*0.7)])
train_y = list(df['label'][:int(len(df)*0.7)])

valid_x = list(df['cutted'][int(len(df)*0.7):int(len(df)*0.85)])
valid_y = list(df['label'][int(len(df)*0.7):int(len(df)*0.85)])

test_x = list(df['cutted'][int(len(df)*0.85):])
test_y = list(df['label'][int(len(df)*0.85):])


# Initialize the model with the embedding
model = BiLSTM_Model(embed)

# Train for just one epoch first
model.fit(train_x, train_y, valid_x, valid_y, batch_size=12, epochs=1)

model.evaluate(test_x, test_y, batch_size=12)

Question

After training the model, I got the following errors:

Traceback (most recent call last):
  File "train_model.py", line 41, in <module>
    model.fit(train_x, train_y, valid_x, valid_y, batch_size=12, epochs=1)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/kashgari/tasks/base_model.py", line 321, in fit
    **fit_kwargs)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
    steps_name='steps_per_epoch')
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 264, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1175, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas xGEMMBatched launch failed : a.shape=[144,64,64], b.shape=[144,64,64], m=64, n=64, k=64, batch_size=144
         [[{{node Encoder-1-MultiHeadSelfAttention/Encoder-1-MultiHeadSelfAttention-Attention/MatMul}}]]
         [[metrics/acc/Identity/_1711]]
  (1) Internal: Blas xGEMMBatched launch failed : a.shape=[144,64,64], b.shape=[144,64,64], m=64, n=64, k=64, batch_size=144
         [[{{node Encoder-1-MultiHeadSelfAttention/Encoder-1-MultiHeadSelfAttention-Attention/MatMul}}]]
0 successful operations.
0 derived errors ignored.
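Not part of the original report, but for context: in TF 1.x this error is frequently a cuBLAS initialization failure caused by TensorFlow pre-allocating nearly all GPU memory (the nvidia-smi output above shows the desktop session already holding ~568 MiB). A minimal sketch of the usual mitigation, assuming TF 1.14 with standalone Keras 2.3.1 as in the environment list; `make_growth_session` is a hypothetical helper name, not from the report:

```python
import os

# Ask TensorFlow's allocator to grow GPU memory on demand instead of
# grabbing it all up front; set this before TensorFlow is imported.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def make_growth_session():
    """Hypothetical helper: build a TF 1.x session with allow_growth enabled
    and install it as the Keras backend session (call before building the model)."""
    import tensorflow as tf
    from keras import backend as K  # standalone Keras 2.3.1 with the TF backend

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # allocate GPU memory incrementally
    sess = tf.Session(config=config)
    K.set_session(sess)
    return sess
```

If the failure persists with memory growth enabled, the environment list above shows both a conda `tensorflow` (gpu build) and a pypi `tensorflow-gpu` at 1.14.0; that duplicated install would also be worth ruling out.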
@yanwii yanwii added the question Further information is requested label Jan 10, 2023