GPUv2 numerical inaccuracy in simple Add + Mul #66740

gustavla · 2024-04-30T22:23:44Z

System information

Google Pixel 7 / Android 14 / Google Tensor G2
TensorFlow 2.16.1

Running a simple model with an Add and a Mul op on both GPU and CPU gives vastly different results (the on-device CPU output was separately confirmed to be correct):

Arrays are not almost equal to 7 decimals

Mismatched elements: 392911 / 393216 (99.9%)
Max absolute difference: 2.719718
Max relative difference: 0.99997413
 x: array([[[[0.408207 , 0.2999736, 0.1134637, ..., 0.1838855, 0.1379557,
          0.2322327],
         [1.3259705, 0.9743981, 0.3685618, ..., 0.5973114, 0.4481186,...
 y: array([[[[0.4082031, 0.       , 0.       , ..., 0.       , 0.       ,
          1.2128906],
         [1.3251953, 0.       , 0.       , ..., 0.       , 0.       ,...
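
A side observation on the values printed above: the nonzero GPU entries (0.4082031, 1.2128906, 1.3251953) look like numbers that are exactly representable in float16, while the CPU entries do not. That is only a hint that the delegate may be computing in reduced precision, and it would not explain the zeros. Below is a minimal sketch of that check, assuming the printed GPU values are the 7-decimal renderings of 0.408203125, 1.212890625, and 1.3251953125; the helper name is hypothetical.

```python
import numpy as np

def is_float16_representable(values):
    """Return a mask that is True where a float32 value survives a float16 round trip."""
    values = np.asarray(values, dtype=np.float32)
    return values.astype(np.float16).astype(np.float32) == values

# Assumed exact values behind the printed GPU entries (assumption, see lead-in).
gpu_values = [0.408203125, 1.212890625, 1.3251953125]
# CPU entries as printed in the assertion message.
cpu_values = [0.408207, 0.2999736, 1.3259705]

gpu_mask = is_float16_representable(gpu_values)
cpu_mask = is_float16_representable(cpu_values)
print("GPU values float16-representable:", gpu_mask)
print("CPU values float16-representable:", cpu_mask)
```

Running the same check on the full downloaded GPU output would show whether this holds for every nonzero element or only for the few that happen to be printed.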

Standalone code to reproduce the issue

Model asset: tflite_66740_add_mul_gpu_numerically_incorrect.tflite
Generate some data and run the model on device on the GPU (and optionally on the CPU for a baseline).
This results in the severe numerical discrepancy shown above.

Inference repro
I reproduced this using https://aihub.qualcomm.com/ for convenience (I'm sure it will also reproduce through other means), so I'm attaching the script here.

import numpy as np
import qai_hub as hub

# Upload the attached model asset.
model = hub.upload_model("tflite_66740_add_mul_gpu_numerically_incorrect.tflite")

# Deterministic random input with the shape the model expects.
rng = np.random.RandomState(1234)
x = rng.uniform(size=(1, 64, 64, 1)).astype(np.float32)

# The dictionary key is the name of the model's input tensor.
inputs = hub.upload_dataset({"model_5/0/body/0/0/norm/LayerNormalization/batchnorm/Rsqrt": [x]})

# Run the same model and inputs on the same device, once per compute unit.
device = hub.Device("Google Pixel 7")
cpu_job = hub.submit_inference_job(
    model,
    device=device,
    inputs=inputs,
    options="--compute_unit cpu",
)

gpu_job = hub.submit_inference_job(
    model,
    device=device,
    inputs=inputs,
    options="--compute_unit gpu",
)

cpu_output = cpu_job.download_output_data()
gpu_output = gpu_job.download_output_data()

# Fails with the mismatch report shown above.
np.testing.assert_almost_equal(cpu_output["output_0"][0], gpu_output["output_0"][0])
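
To characterize the discrepancy beyond the assertion message, it may help to count how many GPU elements are exactly zero where the CPU baseline is not, since the printed output suggests most of them are. The sketch below is one way to do that. The function name and the synthetic arrays are hypothetical stand-ins, not real device output, and the mismatch threshold follows my reading of the NumPy documentation for the array almost-equal check (`abs(desired - actual) < 1.5 * 10**(-decimal)`).

```python
import numpy as np

def summarize_mismatch(reference, candidate, decimal=7):
    """Summarize how `candidate` deviates from `reference`, element by element."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    abs_diff = np.abs(reference - candidate)
    # Element counts as mismatched when it fails the documented almost-equal criterion.
    mismatched = abs_diff >= 1.5 * 10.0 ** (-decimal)
    # Elements that are exactly zero in the candidate but nonzero in the reference.
    spurious_zero = (candidate == 0) & (reference != 0)
    return {
        "mismatched": int(mismatched.sum()),
        "total": int(reference.size),
        "max_abs_diff": float(abs_diff.max()),
        "spurious_zeros": int(spurious_zero.sum()),
    }

# Synthetic stand-in data (not the real CPU/GPU outputs).
ref = np.array([0.5, 0.25, 0.125, 1.0])
cand = np.array([0.5, 0.0, 0.0, 1.5])
summary = summarize_mismatch(ref, cand)
print(summary)
```

With the real job outputs, the call would be `summarize_mismatch(cpu_output["output_0"][0], gpu_output["output_0"][0])`. A `spurious_zeros` count close to `mismatched` would point at elements not being written at all rather than at accumulated rounding error.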

Model visualization

Any other info / logs

Logs from the GPU inference job (via https://aihub.qualcomm.com/):

[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.board.platform = gs201
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.boot.hardware = panther
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.boot.hardware.platform = gs201
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.system.build.id = UP1A.231005.007.A1
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.system.build.version.release = 14
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.hardware = panther
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.hardware.chipname = 
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.product.board = panther
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.product.brand = google
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.product.device = panther
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.product.build.fingerprint = google/panther/panther:14/UP1A.231005.007.A1/10762838:user/release-keys
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.product.manufacturer = Google
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.product.model = Pixel 7
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.product.name = panther
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.soc.manufacturer = Google
[01/May/2024:08:11:15 +10:00: profiler/info] Android system property: ro.soc.model = GS201
[01/May/2024:08:11:15 +10:00: profiler/info] [Manager] DeviceManager::DeviceManager
[01/May/2024:08:11:15 +10:00: profiler/info] [Manager] findAvailableDevices
[01/May/2024:08:11:15 +10:00: profiler/info] [Manager] Found interface google-edgetpu (version = 2.0)
[01/May/2024:08:11:15 +10:00: profiler/info] [Manager] Found interface google-armnn (version = ArmNN)
[01/May/2024:08:11:15 +10:00: profiler/info] NNAPI devices: google-edgetpu,google-armnn,nnapi-reference
[01/May/2024:08:11:15 +10:00: profiler/info] GPU device: ARM Mali-G710
[01/May/2024:08:11:15 +10:00: profiler/info] OpenGL Version: OpenGL ES 3.2 v1.r43p0-01eac0.ff5c643eda65d3f5ba9886fdffb12673
[01/May/2024:08:11:15 +10:00: profiler/info] OpenCL Version: OpenCL C 1.2 v1.r43p0-01eac0.ff5c643eda65d3f5ba9886fdffb12673
[01/May/2024:08:11:15 +10:00: profiler/info] -=- Tungsten Running Task: Loading -=-
[01/May/2024:08:11:15 +10:00: profiler/info] Detected chipset 3101, made by 3000.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Loading tflite model Models/model.tflite
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Malloc VM size before: 28600.0 kB, allocated: 18032.4 kB, slack: 10567.6 kB.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Current memory baseline range: 61792.4-72360.0 kB.
[01/May/2024:08:11:15 +10:00: profiler/debug] [job_id: jlpej1x85] [model.tflite] Runtime metadata not found in Models/model.tflite/trt_metadata.json or Models/model.tflite/trt_metadata.pb
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] TF Lite version 2.16.1. Loading model from Models/model.tflite.
[01/May/2024:08:11:15 +10:00: profiler/debug] [job_id: jlpej1x85] [model.tflite] Mapping resource file in Models/model.tflite
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Loaded model. Minimum TF Lite version = 1.5.0.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] No delegates specified; using compute unit=cpu_and_gpu.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] [tflite] Initialized TensorFlow Lite runtime.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] GPUV2 delegate requested. OpenCL detected.
[01/May/2024:08:11:15 +10:00: profiler/debug] [job_id: jlpej1x85] [model.tflite] Enabling delegate cache in dir=/data/user/0/ai.tetra.tungsten/cache/1714515075350/ai.tetra.runtime/0.6.0/model.tflite_8945450969824422876_1714515066045/gpuv2.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] [tflite] Created TensorFlow Lite delegate for GPU.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] [tflite] Replacing 2 out of 2 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions for the whole graph.
[01/May/2024:08:11:15 +10:00: profiler/warning] [job_id: jlpej1x85] [model.tflite] [tflite] File /data/user/0/ai.tetra.tungsten/cache/1714515075350/ai.tetra.runtime/0.6.0/model.tflite_8945450969824422876_1714515066045/gpuv2/gpuv2_5609552586171313032.bin couldn't be opened for reading: No such file or directory
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] [tflite] Initialized OpenCL-based API.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] [tflite] Created 1 GPU delegate kernels.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Applied 1 delegates: GPUV2/OpenCL. Model is fully delegated=true.
[01/May/2024:08:11:15 +10:00: profiler/debug] [job_id: jlpej1x85] [model.tflite] Saving delegate selection for subsequent steps.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Malloc VM size after: 36240.0 kB, allocated: 25209.7 kB, slack: 11030.3 kB.
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Status Successfully Loaded Cold with t = 132027 us and usage: before = 72360.0 kB; peakBefore = 72360.0 kB; mallocUnusedBefore = 10567.6 kB; after = 81780.0 kB; peakAfter = 81744.0 kB; mallocUnusedAfter = 11030.3 kB; increase = 0.0-8957.3 kB; peak = 9384.0-19951.6 kB
[01/May/2024:08:11:15 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Saving results to /storage/emulated/0/Android/data/ai.tetra.tungsten/files/Results/job_jlpej1x85/job_jlpej1x85_results.bin
[01/May/2024:08:11:16 +10:00: profiler/info] -=- Tungsten Running Task: Loading -=-
[01/May/2024:08:11:16 +10:00: profiler/info] Detected chipset 3101, made by 3000.
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Loading previously saved results in /storage/emulated/0/Android/data/ai.tetra.tungsten/files/Results/job_jlpej1x85/job_jlpej1x85_results.bin
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Loading tflite model Models/model.tflite
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Malloc VM size before: 30640.0 kB, allocated: 17338.9 kB, slack: 13301.1 kB.
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Current memory baseline range: 54914.9-68216.0 kB.
[01/May/2024:08:11:16 +10:00: profiler/debug] [job_id: jlpej1x85] [model.tflite] Runtime metadata not found in Models/model.tflite/trt_metadata.json or Models/model.tflite/trt_metadata.pb
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] TF Lite version 2.16.1. Loading model from Models/model.tflite.
[01/May/2024:08:11:16 +10:00: profiler/debug] [job_id: jlpej1x85] [model.tflite] Mapping resource file in Models/model.tflite
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Loaded model. Minimum TF Lite version = 1.5.0.
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] GPUV2 delegate requested. OpenCL detected.
[01/May/2024:08:11:16 +10:00: profiler/debug] [job_id: jlpej1x85] [model.tflite] Enabling delegate cache in dir=/data/user/0/ai.tetra.tungsten/cache/1714515075350/ai.tetra.runtime/0.6.0/model.tflite_8945450969824422876_1714515066045/gpuv2.
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] [tflite] Replacing 2 out of 2 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions for the whole graph.
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] [tflite] Found serialized data for model gpuv2 (790680 B) at /data/user/0/ai.tetra.tungsten/cache/1714515075350/ai.tetra.runtime/0.6.0/model.tflite_8945450969824422876_1714515066045/gpuv2/gpuv2_5609552586171313032.bin
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] [tflite] Initialized OpenCL-based API from serialized data.
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] [tflite] Created 1 GPU delegate kernels.
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Applied 1 delegates: GPUV2/OpenCL. Model is fully delegated=true.
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Malloc VM size after: 36496.0 kB, allocated: 25417.5 kB, slack: 11078.5 kB.
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Status Successfully Loaded Warm with t = 94180 us and usage: before = 68216.0 kB; peakBefore = 68216.0 kB; mallocUnusedBefore = 13301.1 kB; after = 78828.0 kB; peakAfter = 78076.0 kB; mallocUnusedAfter = 11078.5 kB; increase = 0.0-12834.6 kB; peak = 9860.0-23161.1 kB
[01/May/2024:08:11:16 +10:00: profiler/info] [job_id: jlpej1x85] [model.tflite] Saving results to /storage/emulated/0/Android/data/ai.tetra.tungsten/files/Results/job_jlpej1x85/job_jlpej1x85_results.bin

sawantkumar · 2024-05-09T08:16:55Z

Hi @gustavla,

I am in the process of replicating your issue. I will get back to you as soon as possible.

gustavla added the comp:lite TF Lite related issues label Apr 30, 2024

google-ml-butler bot assigned SuryanarayanaY Apr 30, 2024

SuryanarayanaY assigned sawantkumar and unassigned SuryanarayanaY May 2, 2024

sawantkumar added the Android label May 9, 2024
