
Got 'Blas xGEMMBatched launch failed' using BERT + BiLSTM #497

Open
yanwii opened this issue Jan 10, 2023 · 0 comments
Labels
question Further information is requested

Comments


yanwii commented Jan 10, 2023

You must follow the issue template and provide as much information as possible; otherwise, this issue will be closed.

Check List

Thanks for taking the time to open an issue. Before you submit it, please confirm these boxes are checked.

You can post pictures, but if specific text or code is required to reproduce the issue, please provide the text in a plain text format for easy copy/paste.

Environment

  • Debian 11
  • Python 3.6.8
  • requirements.txt:
cudatoolkit               10.0.130                      0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cudnn                     7.6.5                cuda10.0_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
kashgari                  1.1.5                    pypi_0    pypi
keras                     2.3.1                    pypi_0    pypi
keras-applications        1.0.8                      py_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
keras-bert                0.89.0                   pypi_0    pypi
keras-embed-sim           0.10.0                   pypi_0    pypi
keras-gpt-2               0.17.0                   pypi_0    pypi
keras-layer-normalization 0.16.0                   pypi_0    pypi
keras-multi-head          0.29.0                   pypi_0    pypi
keras-pos-embd            0.13.0                   pypi_0    pypi
keras-position-wise-feed-forward 0.8.0                    pypi_0    pypi
keras-preprocessing       1.1.2              pyhd3eb1b0_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
keras-self-attention      0.51.0                   pypi_0    pypi
keras-transformer         0.40.0                   pypi_0    pypi
numpy                     1.16.4                   pypi_0    pypi
numpy-base                1.19.2           py36hfa32c7d_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorboard               1.14.0           py36hf484d3e_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorflow                1.14.0          gpu_py36h57aa796_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorflow-addons         0.9.1                    pypi_0    pypi
tensorflow-estimator      1.14.0                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorflow-gpu            1.14.0                   pypi_0    pypi

And the output of nvidia-smi:

Tue Jan 10 11:08:55 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   39C    P8    19W / 220W |    568MiB /  7979MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1288      G   /usr/lib/xorg/Xorg                223MiB |
|    0   N/A  N/A      1402      G   /usr/bin/gnome-shell               71MiB |
|    0   N/A  N/A      1698      G   ...b/firefox-esr/firefox-esr      171MiB |
|    0   N/A  N/A      1969      G   ...b/firefox-esr/firefox-esr        3MiB |
|    0   N/A  N/A      2217      G   ...RendererForSitePerProcess       90MiB |
|    0   N/A  N/A      9821      G   ...b/firefox-esr/firefox-esr        3MiB |
+-----------------------------------------------------------------------------+

My model:

import pandas as pd
import kashgari
from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.classification import BiLSTM_Model
import numpy
import os

BERT_PATH = r'/chinese_L-12_H-768_A-12'


# Initialize the embedding
embed = BERTEmbedding(BERT_PATH,
                      task=kashgari.CLASSIFICATION,
                      sequence_length=64, layer_nums=4)

tokenizer = embed.tokenizer

df = pd.read_excel('data.xlsx')
# Tokenize the reviews
df['cutted'] = df['review'].apply(lambda x: tokenizer.tokenize(x))
df["label"] = df['label'].astype("str")

# Prepare the train / validation / test splits
train_x = list(df['cutted'][:int(len(df)*0.7)])
train_y = list(df['label'][:int(len(df)*0.7)])

valid_x = list(df['cutted'][int(len(df)*0.7):int(len(df)*0.85)])
valid_y = list(df['label'][int(len(df)*0.7):int(len(df)*0.85)])

test_x = list(df['cutted'][int(len(df)*0.85):])
test_y = list(df['label'][int(len(df)*0.85):])


# Initialize the model with the embedding
model = BiLSTM_Model(embed)

# Train for just one epoch first
model.fit(train_x, train_y, valid_x, valid_y, batch_size=12, epochs=1)

model.evaluate(test_x, test_y, batch_size=12)

Question

After training the model, I got the following errors:

Traceback (most recent call last):
  File "train_model.py", line 41, in <module>
    model.fit(train_x, train_y, valid_x, valid_y, batch_size=12, epochs=1)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/kashgari/tasks/base_model.py", line 321, in fit
    **fit_kwargs)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
    steps_name='steps_per_epoch')
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 264, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1175, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/interstellar/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas xGEMMBatched launch failed : a.shape=[144,64,64], b.shape=[144,64,64], m=64, n=64, k=64, batch_size=144
         [[{{node Encoder-1-MultiHeadSelfAttention/Encoder-1-MultiHeadSelfAttention-Attention/MatMul}}]]
         [[metrics/acc/Identity/_1711]]
  (1) Internal: Blas xGEMMBatched launch failed : a.shape=[144,64,64], b.shape=[144,64,64], m=64, n=64, k=64, batch_size=144
         [[{{node Encoder-1-MultiHeadSelfAttention/Encoder-1-MultiHeadSelfAttention-Attention/MatMul}}]]
0 successful operations.
0 derived errors ignored.
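Not part of the original report, but for context: in TF 1.x this error is frequently a cuBLAS initialization failure caused by TensorFlow pre-allocating nearly all GPU memory (the nvidia-smi output above shows the desktop session already holding ~568 MiB). A minimal sketch of the usual mitigation, assuming TF 1.14 with standalone Keras 2.3.1 as in the environment list; `make_growth_session` is a hypothetical helper name, not from the report:

```python
import os

# Ask TensorFlow's allocator to grow GPU memory on demand instead of
# grabbing it all up front; set this before TensorFlow is imported.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def make_growth_session():
    """Hypothetical helper: build a TF 1.x session with allow_growth enabled
    and install it as the Keras backend session (call before building the model)."""
    import tensorflow as tf
    from keras import backend as K  # standalone Keras 2.3.1 with the TF backend

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # allocate GPU memory incrementally
    sess = tf.Session(config=config)
    K.set_session(sess)
    return sess
```

If the failure persists with memory growth enabled, the environment list above shows both a conda `tensorflow` (gpu build) and a pypi `tensorflow-gpu` at 1.14.0; that duplicated install would also be worth ruling out.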
@yanwii yanwii added the question Further information is requested label Jan 10, 2023