Potential using mis-ordered KITTI prompt #21

Open
haooooooqi opened this issue Nov 10, 2022 · 3 comments

Comments

haooooooqi commented Nov 10, 2022

Hi,
Thanks so much for providing such impactful work (LAION, Open CLIP, Open CLIP benchmark) to the community!
I noticed that the KITTI prompts from https://github.com/openai/CLIP/blob/main/data/prompts.md#kitti may be used in the wrong order.

The prompts shared by OpenAI are:

classes = [
    'a photo i took of a car on my left or right side.',
    'a photo i took with a car nearby.',
    'a photo i took with a car in the distance.',
    'a photo i took with no car.',
]

Here is the default order of the KITTI (distance) dataset:

above_20  below_20  below_8  no_vehicle

If we pair them:

above_20:  'a photo i took of a car on my left or right side.'
below_20: 'a photo i took with a car nearby.'
below_8: 'a photo i took with a car in the distance.'
no_vehicle: 'a photo i took with no car.'

It seems the right pairing should be:

above_20:  'a photo i took with a car in the distance.'
below_20: 'a photo i took with a car nearby.'
below_8: 'a photo i took of a car on my left or right side.'
no_vehicle: 'a photo i took with no car.'
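
The two pairings above can be compared with a short sketch. It assumes that labels follow the class-name order listed in this issue and that prompts are matched to classes purely by position; the variable names are illustrative and not taken from the benchmark code.

```python
# Class names in the order listed in the issue (assumed here to be the label order).
class_names = ["above_20", "below_20", "below_8", "no_vehicle"]

# Prompts in the order published in OpenAI's prompts.md.
prompts = [
    "a photo i took of a car on my left or right side.",
    "a photo i took with a car nearby.",
    "a photo i took with a car in the distance.",
    "a photo i took with no car.",
]

# Positional pairing: the i-th prompt is matched to the i-th class name.
positional = dict(zip(class_names, prompts))

# Pairing proposed in the issue.
proposed = {
    "above_20": "a photo i took with a car in the distance.",
    "below_20": "a photo i took with a car nearby.",
    "below_8": "a photo i took of a car on my left or right side.",
    "no_vehicle": "a photo i took with no car.",
}

# Classes whose prompt differs between the two pairings.
mismatched = [name for name in class_names if positional[name] != proposed[name]]
print(mismatched)  # ['above_20', 'below_8']
```

Only the nearest and farthest classes differ, so the proposed change amounts to swapping the first and third prompts.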

With the order change, I get 29.3 with the ViT-L/14 laion400m_e32 model.

Hope that could be helpful :)

Haoqi

Collaborator

mehdidc commented Nov 10, 2022

Thanks a lot @haooooooqi for catching and reporting this! I will fix that in the code now and recalculate the benchmark for KITTI.

Collaborator

mehdidc commented Nov 10, 2022

@haooooooqi Quick question: could you explain where you found the default order of the classes? I was trying to read VTAB's original code (https://github.com/LAION-AI/CLIP_benchmark/blob/main/clip_benchmark/datasets/kitti.py#L101), and I have trouble understanding what they were trying to do. If we follow the code, and if dist is a positive number, then the order would be: class 0: <= 8, class 1: > 8 and <= 20, class 2: > 20, class 3: no vehicle. Or am I missing something?

Collaborator

mehdidc commented Nov 13, 2022

Investigating further, I displayed the actual distances (dist) and the corresponding labels (label) (from https://github.com/LAION-AI/CLIP_benchmark/blob/main/clip_benchmark/datasets/kitti.py#L101) for the first 32 examples, one "dist,label" pair per line:

26.96,2
7.06,0
15.06,1
73.79,2
64.83,2
10.37,1
12.54,1
12.35,1
1000.00,3
28.99,2
74.99,2
66.73,2
7.22,0
21.21,2
5.85,0
47.72,2
10.28,1
62.85,2
12.07,1
1000.00,3
1000.00,3
36.25,2
12.39,1
6.50,0
1000.00,3
15.51,1
10.28,1
11.50,1
10.88,1
54.08,2
8.93,1
5.06,0

So class 0 does indeed correspond to <= 8, class 1 to between 8 and 20, class 2 to > 20, and class 3 to no vehicle (value of 1000).
The current order of the classes still looks correct to me:

        "a photo i took of a car on my left or right side.",
        "a photo i took with a car nearby.",
        "a photo i took with a car in the distance.",
        "a photo i took with no car.",
    ],
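
The label assignment seen in the printout can be sketched as a simple thresholding function. This is only an illustration and not the VTAB implementation: the name distance_to_label and the treatment of 1000 as a "no vehicle" sentinel are assumptions based on the values shown above.

```python
NO_VEHICLE_DIST = 1000.0  # assumed sentinel for "no vehicle", as seen in the printout


def distance_to_label(dist: float) -> int:
    """Map the closest-vehicle distance to a class index (hypothetical helper)."""
    if dist >= NO_VEHICLE_DIST:
        return 3  # no vehicle
    if dist <= 8.0:
        return 0  # <= 8
    if dist <= 20.0:
        return 1  # > 8 and <= 20
    return 2  # > 20


# Prompts in the current order, indexed by label.
prompts = [
    "a photo i took of a car on my left or right side.",
    "a photo i took with a car nearby.",
    "a photo i took with a car in the distance.",
    "a photo i took with no car.",
]

# A few (dist, label) pairs copied from the printout above.
observed = [(26.96, 2), (7.06, 0), (15.06, 1), (1000.00, 3), (8.93, 1), (21.21, 2), (5.06, 0)]

for dist, label in observed:
    # The sketch should reproduce every observed label.
    assert distance_to_label(dist) == label
    print(f"{dist:8.2f} -> class {label}: {prompts[label]}")
```

Under these assumptions, small distances map to the "left or right side" prompt and large distances to the "in the distance" prompt.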

It is thus intriguing that you got better results with the following order (if I understand correctly):

[
  "a photo i took with a car in the distance.",
  "a photo i took with a car nearby.",
  "a photo i took of a car on my left or right side.",
  "a photo i took with no car.",
]
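
To make the difference between the two orderings explicit, the proposed order can be written as a permutation of the current one. This is a minimal sketch; the variable names are illustrative, and the prompt strings are normalized to end with a period.

```python
current_order = [
    "a photo i took of a car on my left or right side.",
    "a photo i took with a car nearby.",
    "a photo i took with a car in the distance.",
    "a photo i took with no car.",
]

proposed_order = [
    "a photo i took with a car in the distance.",
    "a photo i took with a car nearby.",
    "a photo i took of a car on my left or right side.",
    "a photo i took with no car.",
]

# For each position in the proposed order, the index of that prompt in the current order.
permutation = [current_order.index(prompt) for prompt in proposed_order]
print(permutation)  # [2, 1, 0, 3]

# Label indices whose prompt changes between the two orderings.
moved = [i for i, j in enumerate(permutation) if i != j]
print(moved)  # [0, 2]
```

Only labels 0 and 2 change prompts, so under the proposed order images labeled 0 (closest vehicle <= 8) would be scored against the "in the distance" prompt.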
