[Bug]: The results are different when using CPU and GPU in inference with OpenVINO #1859

Open
1 task done
nakayamarusu opened this issue Mar 15, 2024 · 3 comments

Comments

@nakayamarusu

nakayamarusu commented Mar 15, 2024

Describe the bug

Problem

I trained PaDiM on the MVTec bottle images, exported the model to a format that OpenVINO can load, and ran inference with OpenVINO. Inference on the CPU produces the correct heatmap, but inference on the GPU does not. The only change between the two runs is the device, from CPU to GPU. The GPU is an Intel Iris Xe, and we have confirmed that OpenVINO supports it.

Dataset

MVTec

Model

PADiM

Steps to reproduce the behavior

Train

Training was run as follows.
anomalib fit -c configs/model/padim.yaml --data configs/data/folder_bottle.yaml

▼ padim.yaml

model:
  class_path: anomalib.models.Padim
  init_args:
    layers:
      - layer1
      - layer2
      - layer3
    #input_size: [256, 256]
    backbone: resnet18
    pre_trained: true
    n_features: null

metrics:
  pixel: AUROC

▼ folder_bottle.yaml

class_path: anomalib.data.Folder
init_args:
  name: bottle
  root: "dataset/bottle"
  normal_dir: "train/good"
  abnormal_dir: "test/broken_large"
  normal_test_dir: "test/good"
  mask_dir: "ground_truth/broken_large"
  normal_split_ratio: 0
  extensions: [".png"]
  image_size: [256, 256]
  #center_crop: null
  #normalization: imagenet
  train_batch_size: 32
  eval_batch_size: 32
  num_workers: 8
  task: CLASSIFICATION
  #transform_config_train: null
  #transform_config_eval: null
  test_split_mode: NONE
  test_split_ratio: 0.2
  val_split_mode: same_as_test
  val_split_ratio: 0.5
  seed: null

Export

The export was performed as follows.
anomalib export --model Padim --export_type OPENVINO --ckpt_path results/Padim/bottle/latest/weights/lightning/model.ckpt

Inference

Inference was run with the following Python script.

from anomalib.deploy.inferencers import OpenVINOInferencer
import cv2
import numpy as np
import os

inferencer = OpenVINOInferencer(
    path=r"C:\anomalib_v1\results\weights\openvino\model.onnx",
    metadata=r"C:\anomalib_v1\results\weights\openvino\metadata.json",
    device="GPU",
)

img_path = r"C:\anomalib_v1\dataset\bottle\test\broken_small"

for i, file in enumerate(os.listdir(img_path)):
    # Read the image as BGR and convert it to RGB.
    image = cv2.imread(os.path.join(img_path, file))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Scale pixel values to [0, 1] before inference.
    input_img = image.astype(np.float32) / 255.0
    result = inferencer.predict(image=input_img)
    print(i, result.pred_score)
    # heat_map is RGB; convert it back to BGR for display with OpenCV.
    cv2.imshow("heatmap", cv2.cvtColor(result.heat_map, cv2.COLOR_RGB2BGR))
    cv2.waitKey()
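
One thing that may be worth checking, offered as an unconfirmed assumption rather than a diagnosis: the OpenVINO GPU plugin may run the model in reduced precision (FP16) by default, which could affect a model such as PaDiM. The sketch below builds a compile config that asks for FP32 on GPU devices. The helper names `gpu_fp32_config` and `build_inferencer` are hypothetical, and it is assumed that `OpenVINOInferencer` accepts a `config` dictionary that it forwards to the OpenVINO runtime.

```python
def gpu_fp32_config(device: str) -> dict:
    """Return an OpenVINO compile config that requests FP32 on GPU devices."""
    # Assumption: "INFERENCE_PRECISION_HINT" is the runtime property name and
    # "f32" is an accepted value. Other devices keep the runtime defaults.
    if device.upper().startswith("GPU"):
        return {"INFERENCE_PRECISION_HINT": "f32"}
    return {}


def build_inferencer(model_path: str, metadata_path: str, device: str):
    """Hypothetical helper: create an OpenVINOInferencer with the config above."""
    # Imported lazily so gpu_fp32_config can be used without anomalib installed.
    from anomalib.deploy.inferencers import OpenVINOInferencer

    return OpenVINOInferencer(
        path=model_path,
        metadata=metadata_path,
        device=device,
        # Assumption: `config` is passed through to the device compilation step.
        config=gpu_fp32_config(device),
    )
```

If the GPU scores stop collapsing to 0.0 with this config, precision is a likely factor; if nothing changes, the cause lies elsewhere.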

OS information

  • OS: Windows 11 Pro
  • Python version: 3.10.13
  • Anomalib version: 1.0.0
  • PyTorch-lightning version: 2.2.1
  • torch version : 2.1.2
  • GPU models and configuration : Intel(R) Iris(R) Xe Graphics

Expected behavior

Inference result

Heatmaps when inference runs on the CPU, configured as shown below:

inferencer = OpenVINOInferencer(
    path=r"C:\anomalib_v1\results\weights\openvino\model.onnx",
    metadata=r"C:\anomalib_v1\results\weights\openvino\metadata.json",
    device="CPU",
)

heatmap1
▲ pred_score : 0.4836

heatmap2
▲ pred_score : 0.5605

Heatmaps when inference runs on the GPU, configured as shown below:

inferencer = OpenVINOInferencer(
    path=r"C:\anomalib_v1\results\weights\openvino\model.onnx",
    metadata=r"C:\anomalib_v1\results\weights\openvino\metadata.json",
    device="GPU",
)

heatmap3
▲ pred_score : 0.0

heatmap4
▲ pred_score : 0.0
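
The scores reported above can be compared programmatically. The following is a minimal sketch in plain Python; the function name `compare_scores` and the tolerance are illustrative choices, not part of anomalib. It reports the largest per-image difference and flags the case where every GPU score is identical, as happens here.

```python
def compare_scores(cpu_scores, gpu_scores, tol=1e-2):
    """Compare per-image prediction scores from two devices."""
    if len(cpu_scores) != len(gpu_scores):
        raise ValueError("score lists must have the same length")
    # Largest absolute difference between matching images.
    max_diff = max(
        (abs(c - g) for c, g in zip(cpu_scores, gpu_scores)), default=0.0
    )
    # Identical scores for every image suggest a collapsed output.
    gpu_constant = len(gpu_scores) > 1 and len(set(gpu_scores)) == 1
    return {
        "max_abs_diff": max_diff,
        "within_tol": max_diff <= tol,
        "gpu_constant": gpu_constant,
    }


# Scores taken from the two CPU and two GPU heatmaps in this report.
report = compare_scores([0.4836, 0.5605], [0.0, 0.0])
print(report)
```

For the four scores in this report, the check marks the devices as disagreeing and the GPU output as constant.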

Both CPU and GPU inference use the same model.
The only difference between the two runs is device = "CPU" versus device = "GPU".
I also changed the OpenVINO version from 2024.0.0 to 2023.2.0 and retried, but GPU inference still does not work.

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

class_path: anomalib.data.Folder
init_args:
  name: bottle
  root: "dataset/bottle"
  normal_dir: "train/good"
  abnormal_dir: "test/broken_large"
  normal_test_dir: "test/good"
  mask_dir: "ground_truth/broken_large"
  normal_split_ratio: 0
  extensions: [".png"]
  image_size: [256, 256]
  #center_crop: null
  #normalization: imagenet
  train_batch_size: 32
  eval_batch_size: 32
  num_workers: 8
  task: CLASSIFICATION
  #transform_config_train: null
  #transform_config_eval: null
  test_split_mode: NONE
  test_split_ratio: 0.2
  val_split_mode: same_as_test
  val_split_ratio: 0.5
  seed: null




model:
  class_path: anomalib.models.Padim
  init_args:
    layers:
      - layer1
      - layer2
      - layer3
    #input_size: [256, 256]
    backbone: resnet18
    pre_trained: true
    n_features: null

metrics:
  pixel: AUROC

Logs

(anomalib_latest_env) C:\anomalib_v1>anomalib fit -c configs/model/padim.yaml --data configs/data/folder_bottle.yaml
2024-03-15 10:37:21,750 - anomalib.utils.config - WARNING - Anomalib currently does not support multi-gpu training. Setting devices to 1.
[03/15/24 10:37:21] WARNING  Anomalib currently does not support multi-gpu training. Setting devices to 1.     config.py:126
C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torchmetrics\utilities\prints.py:36: UserWarning: Metric `PrecisionRecallCurve` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
2024-03-15 10:37:21,781 - anomalib.models.components.base.anomaly_module - INFO - Initializing Padim model.
                    INFO     Initializing Padim model.                                                  anomaly_module.py:39
2024-03-15 10:37:21,954 - timm.models.helpers - INFO - Loading pretrained weights from url (https://download.pytorch.org/models/resnet18-5c106cde.pth)
                    INFO     Loading pretrained weights from url                                              helpers.py:247
                             (https://download.pytorch.org/models/resnet18-5c106cde.pth)
2024-03-15 10:37:22,064 - anomalib.callbacks - INFO - Loading the callbacks
[03/15/24 10:37:22] INFO     Loading the callbacks                                                            __init__.py:43
2024-03-15 10:37:22,064 - anomalib.engine.engine - INFO - Overriding max_epochs from None with 1 for Padim
                    INFO     Overriding max_epochs from None with 1 for Padim                                   engine.py:84
2024-03-15 10:37:22,082 - anomalib.engine.engine - INFO - Overriding val_check_interval from None with 1.0 for Padim
                    INFO     Overriding val_check_interval from None with 1.0 for Padim                         engine.py:84
2024-03-15 10:37:22,084 - anomalib.engine.engine - INFO - Overriding num_sanity_val_steps from None with 0 for Padim
                    INFO     Overriding num_sanity_val_steps from None with 0 for Padim                         engine.py:84
2024-03-15 10:37:22,175 - lightning.pytorch.utilities.rank_zero - INFO - GPU available: False, used: False
[03/15/24 10:37:22] INFO     GPU available: False, used: False                                               rank_zero.py:64
2024-03-15 10:37:22,187 - lightning.pytorch.utilities.rank_zero - INFO - TPU available: False, using: 0 TPU cores
                    INFO     TPU available: False, using: 0 TPU cores                                        rank_zero.py:64
2024-03-15 10:37:22,191 - lightning.pytorch.utilities.rank_zero - INFO - IPU available: False, using: 0 IPUs
                    INFO     IPU available: False, using: 0 IPUs                                             rank_zero.py:64
2024-03-15 10:37:22,191 - lightning.pytorch.utilities.rank_zero - INFO - HPU available: False, using: 0 HPUs
                    INFO     HPU available: False, using: 0 HPUs                                             rank_zero.py:64
2024-03-15 10:37:22,191 - lightning.pytorch.utilities.rank_zero - INFO - `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
                    INFO     `Trainer(val_check_interval=1.0)` was configured so validation will run at the  rank_zero.py:64
                             end of the training epoch..
C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torchmetrics\utilities\prints.py:36: UserWarning: Metric `ROC` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\lightning\pytorch\core\optimizer.py:180: `LightningModule.configure_optimizers` returned `None`, this fit will run with no optimizer
2024-03-15 10:37:22,375 - lightning.pytorch.callbacks.model_summary - INFO -
  | Name                  | Type                     | Params
-------------------------------------------------------------------
0 | model                 | PadimModel               | 2.8 M
1 | _transform            | Compose                  | 0
2 | normalization_metrics | MinMax                   | 0
3 | image_threshold       | F1AdaptiveThreshold      | 0
4 | pixel_threshold       | F1AdaptiveThreshold      | 0
5 | image_metrics         | AnomalibMetricCollection | 0
6 | pixel_metrics         | AnomalibMetricCollection | 0
-------------------------------------------------------------------
2.8 M     Trainable params
0         Non-trainable params
2.8 M     Total params
11.131    Total estimated model params size (MB)
                    INFO                                                                                 model_summary.py:90
                               | Name                  | Type                     | Params
                             -------------------------------------------------------------------
                             0 | model                 | PadimModel               | 2.8 M
                             1 | _transform            | Compose                  | 0
                             2 | normalization_metrics | MinMax                   | 0
                             3 | image_threshold       | F1AdaptiveThreshold      | 0
                             4 | pixel_threshold       | F1AdaptiveThreshold      | 0
                             5 | image_metrics         | AnomalibMetricCollection | 0
                             6 | pixel_metrics         | AnomalibMetricCollection | 0
                             -------------------------------------------------------------------
                             2.8 M     Trainable params
                             0         Non-trainable params
                             2.8 M     Total params
                             11.131    Total estimated model params size (MB)
C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:436: Consider setting `persistent_workers=True` in 'train_dataloader' to speed up the dataloader worker initialization.
C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:436: Consider setting `persistent_workers=True` in 'val_dataloader' to speed up the dataloader worker initialization.
Epoch 0:   0%|                                                                                       | 0/7 [00:00<?, ?it/s]C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\lightning\pytorch\loops\optimization\automatic.py:129: `training_step` returned `None`. If this was on purpose, ignore this warning...
Epoch 0: 100%|███████████████████████████████████████████████████████████████████████████████| 7/7 [00:17<00:00,  0.39it/s]2024-03-15 10:37:42,083 - anomalib.models.image.padim.lightning_model - INFO - Aggregating the embedding extracted from the training set.
[03/15/24 10:37:42] INFO     Aggregating the embedding extracted from the training set.                lightning_model.py:87
2024-03-15 10:37:42,109 - anomalib.models.image.padim.lightning_model - INFO - Fitting a Gaussian to the embedding collected from the training set.
                    INFO     Fitting a Gaussian to the embedding collected from the training set.      lightning_model.py:90
                                                                                                                           C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torchmetrics\utilities\prints.py:36: UserWarning: No negative samples in targets, false positive value should be meaningless. Returning zero tensor in false positive score
  warnings.warn(*args, **kwargs)
Epoch 0: 100%|████████████████████████████████████████████████████████████| 7/7 [00:37<00:00,  0.19it/s, pixel_AUROC=0.976]2024-03-15 10:38:00,833 - lightning.pytorch.utilities.rank_zero - INFO - `Trainer.fit` stopped: `max_epochs=1` reached.
[03/15/24 10:38:00] INFO     `Trainer.fit` stopped: `max_epochs=1` reached.                                  rank_zero.py:64
Epoch 0: 100%|████████████████████████████████████████████████████████████| 7/7 [00:37<00:00,  0.19it/s, pixel_AUROC=0.976]
2024-03-15 10:38:03,269 - anomalib.callbacks.timer - INFO - Training took 40.91 seconds
[03/15/24 10:38:03] INFO     Training took 40.91 seconds                                                         timer.py:59

(anomalib_latest_env) C:\anomalib_v1>anomalib export --model Padim --export_type OPENVINO --ckpt_path results/Padim/bottle/latest/weights/lightning/model.ckpt
C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torchmetrics\utilities\prints.py:36: UserWarning: Metric `PrecisionRecallCurve` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
2024-03-15 10:40:58,023 - anomalib.models.components.base.anomaly_module - INFO - Initializing Padim model.
[03/15/24 10:40:58] INFO     Initializing Padim model.                                                  anomaly_module.py:39
2024-03-15 10:40:58,198 - timm.models.helpers - INFO - Loading pretrained weights from url (https://download.pytorch.org/models/resnet18-5c106cde.pth)
                    INFO     Loading pretrained weights from url                                              helpers.py:247
                             (https://download.pytorch.org/models/resnet18-5c106cde.pth)
2024-03-15 10:40:58,291 - anomalib.callbacks - INFO - Loading the callbacks
                    INFO     Loading the callbacks                                                            __init__.py:43
2024-03-15 10:40:58,291 - anomalib.engine.engine - INFO - Overriding max_epochs from None with 1 for Padim
                    INFO     Overriding max_epochs from None with 1 for Padim                                   engine.py:84
2024-03-15 10:40:58,291 - anomalib.engine.engine - INFO - Overriding val_check_interval from None with 1.0 for Padim
                    INFO     Overriding val_check_interval from None with 1.0 for Padim                         engine.py:84
2024-03-15 10:40:58,291 - anomalib.engine.engine - INFO - Overriding num_sanity_val_steps from None with 0 for Padim
                    INFO     Overriding num_sanity_val_steps from None with 0 for Padim                         engine.py:84
2024-03-15 10:40:58,400 - lightning.pytorch.utilities.rank_zero - INFO - GPU available: False, used: False
[03/15/24 10:40:58] INFO     GPU available: False, used: False                                               rank_zero.py:64
2024-03-15 10:40:58,400 - lightning.pytorch.utilities.rank_zero - INFO - TPU available: False, using: 0 TPU cores
                    INFO     TPU available: False, using: 0 TPU cores                                        rank_zero.py:64
2024-03-15 10:40:58,400 - lightning.pytorch.utilities.rank_zero - INFO - IPU available: False, using: 0 IPUs
                    INFO     IPU available: False, using: 0 IPUs                                             rank_zero.py:64
2024-03-15 10:40:58,400 - lightning.pytorch.utilities.rank_zero - INFO - HPU available: False, using: 0 HPUs
                    INFO     HPU available: False, using: 0 HPUs                                             rank_zero.py:64
2024-03-15 10:40:58,400 - lightning.pytorch.utilities.rank_zero - INFO - `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
                    INFO     `Trainer(val_check_interval=1.0)` was configured so validation will run at the  rank_zero.py:64
                             end of the training epoch..
2024-03-15 10:40:58,870 - anomalib.models.components.base.anomaly_module - INFO - Initializing Padim model.
                    INFO     Initializing Padim model.                                                  anomaly_module.py:39
2024-03-15 10:40:59,028 - timm.models.helpers - INFO - Loading pretrained weights from url (https://download.pytorch.org/models/resnet18-5c106cde.pth)
[03/15/24 10:40:59] INFO     Loading pretrained weights from url                                              helpers.py:247
                             (https://download.pytorch.org/models/resnet18-5c106cde.pth)
C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\onnx\_internal\jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\onnx\utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_graph_shape_type_inference(
C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\onnx\utils.py:1209: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_graph_shape_type_inference(
2024-03-15 10:41:03,818 - root - INFO - Exported model to results\weights\openvino\model.xml
[03/15/24 10:41:03] INFO     Exported model to results\weights\openvino\model.xml                              engine.py:918
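
The export log above reports `model.xml`, while the inference script in this report loads `model.onnx` from the same folder. To rule out a stale or mismatched file, a small helper can pick the model file explicitly. This is a sketch only; `pick_openvino_model` is a hypothetical name, and preferring the `.xml` file over the `.onnx` file is an assumption about which one the export step treats as its final output.

```python
from pathlib import Path


def pick_openvino_model(export_dir):
    """Return the model file in export_dir, preferring model.xml over model.onnx."""
    export_dir = Path(export_dir)
    # Assumption: model.xml is the final export and model.onnx an intermediate.
    for name in ("model.xml", "model.onnx"):
        candidate = export_dir / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no model.xml or model.onnx in {export_dir}")
```

The returned path can then be passed as the `path` argument when constructing the inferencer.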

Code of Conduct

  • I agree to follow this project's Code of Conduct
@samet-akcay
Contributor

samet-akcay commented Mar 21, 2024

@nakayamarusu, do you observe the same behaviour with TorchInferencer or the Engine.predict method?

@nakayamarusu
Author

@samet-akcay Can I use Intel's integrated GPU with TorchInferencer? I only have an Intel Iris Xe.
Just to be sure, I ran inference with TorchInferencer on the CPU, and the results were fine.

Use CPU

from anomalib.deploy.inferencers import TorchInferencer
from anomalib.data.utils import read_image
import torch
import numpy as np
import cv2

inferencer = TorchInferencer(path=r"C:\anomalib_v1\results\weights\torch\model.pt", device="cpu")

image = read_image(r"C:\anomalib_v1\dataset\bottle\test\broken_large\000.png")
input_img = image.astype(np.float32) / 1.
image_transposed = np.transpose(input_img, (2, 0, 1))
print(image_transposed.shape)
torch_image = torch.from_numpy(image_transposed)
result = inferencer.predict(torch_image)
cv2.imshow("result", cv2.cvtColor(result.heat_map, cv2.COLOR_RGB2BGR))
cv2.waitKey()

result

Use GPU

from anomalib.deploy.inferencers import TorchInferencer
from anomalib.data.utils import read_image
import torch
import numpy as np
import cv2

inferencer = TorchInferencer(path=r"C:\anomalib_v1\results\weights\torch\model.pt", device="gpu")

image = read_image(r"C:\anomalib_v1\dataset\bottle\test\broken_large\000.png")
input_img = image.astype(np.float32) / 1.
image_transposed = np.transpose(input_img, (2, 0, 1))
print(image_transposed.shape)
torch_image = torch.from_numpy(image_transposed)
result = inferencer.predict(torch_image)
cv2.imshow("result", cv2.cvtColor(result.heat_map, cv2.COLOR_RGB2BGR))
cv2.waitKey()

(anomalib_latest_env) C:\anomalib_v1>python C:\anomalib_v1\original_code\torch_infer.py
Traceback (most recent call last):
  File "C:\anomalib_v1\original_code\torch_infer.py", line 13, in <module>
    inferencer = TorchInferencer(path=r"C:\anomalib_v1\results\weights\torch\model.pt", device="gpu")
  File "C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\anomalib\deploy\inferencers\torch_inferencer.py", line 69, in __init__
    self.checkpoint = self._load_checkpoint(path)
  File "C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\anomalib\deploy\inferencers\torch_inferencer.py", line 109, in _load_checkpoint
    return torch.load(path, map_location=self.device)
  File "C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\serialization.py", line 1014, in load
    return _load(opened_zipfile,
  File "C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\serialization.py", line 1422, in _load
    result = unpickler.load()
  File "C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\serialization.py", line 1392, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\serialization.py", line 1366, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\serialization.py", line 1299, in restore_location
    return default_restore_location(storage, str(map_location))
  File "C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\serialization.py", line 381, in default_restore_location
    result = fn(storage, location)
  File "C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\serialization.py", line 274, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "C:\Users\n-nakayama\AppData\Local\anaconda3\envs\anomalib_latest_env\lib\site-packages\torch\serialization.py", line 258, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
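
The traceback shows that `device="gpu"` makes `torch.load` target CUDA, which fails because `torch.cuda.is_available()` is False on this machine. A minimal sketch of guarding against that is shown below. The helper names `resolve_torch_device` and `cuda_is_available` are hypothetical and not part of anomalib; falling back to "cpu" is simply one reasonable choice.

```python
def resolve_torch_device(requested: str, cuda_available: bool) -> str:
    """Fall back to "cpu" when a CUDA device is requested but unavailable."""
    requested = requested.lower()
    if requested in ("gpu", "cuda") and not cuda_available:
        return "cpu"
    return requested


def cuda_is_available() -> bool:
    """Report CUDA availability, returning False if torch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    # On a machine without CUDA, this prints "cpu".
    print(resolve_torch_device("gpu", cuda_is_available()))
```

The resolved string can be passed as the `device` argument of `TorchInferencer` in place of the hard-coded "gpu".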

@samet-akcay
Contributor

Yeah, to use the GPU on an XPU device, we need to enable the XPU training support. I think this might be supported in v1.2.0.
