
lvu_durations.csv #5

Open
nbgundavarapu opened this issue Oct 19, 2022 · 8 comments

Comments

@nbgundavarapu

nbgundavarapu commented Oct 19, 2022

Hi authors,

How are the durations in lvu_durations.csv computed? The last 20s of most videos show previews for other videos. Does lvu_durations.csv give the number of seconds in the video excluding the preview duration?

Thanks

@nbgundavarapu
Author

nbgundavarapu commented Oct 26, 2022

These lines of code

for i in range(int(duration)):
    idx = int(video.shape[0] / duration * i)
    x = torch.unsqueeze(video[idx], 0).to(device)
    x = model.forward_features(x)

suggest that these previews are used in training and evaluation. Could you confirm? Thanks!
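For concreteness, here is a toy sketch (my own numbers, not from the repo) of how this sampling pulls indices from the outro region whenever the downloaded mp4 is longer than the lvu_durations.csv duration:

```python
# Toy sketch: a downloaded video that is 184 frames long (assuming 1 fps),
# sampled with the outro-free duration of 154 s from lvu_durations.csv.
num_frames = 184   # frames in the full mp4, including the outro
duration = 154     # outro-free duration listed in lvu_durations.csv

# Same indexing as the loop above.
sampled = [int(num_frames / duration * i) for i in range(int(duration))]

# At 1 fps, every index >= duration points at an outro frame.
outro_hits = [idx for idx in sampled if idx >= duration]
print(len(outro_hits), max(sampled))
```

With these toy numbers, a sizable tail of the sampled indices lands inside the outro region.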

@md-mohaiminul
Owner

Hi,
Thanks for reaching out. We used the durations from the Condensed Movies dataset. They removed the outro/preview from each video, as described in Section 3.1 of their paper. Therefore, lvu_durations.csv does not include the outro/preview of each video.

@nbgundavarapu
Author

nbgundavarapu commented Nov 1, 2022

Thanks for your reply! Do the downloaded mp4 videos have outro/preview removed?

If not, then in the following code the outro/preview seems to be included, and those frames are later used in training/evaluation.

video = get_video(video_fp)
video = torch.from_numpy(video.transpose([0, 3, 1, 2])).float()
duration = duration_data.loc[video_id]['duration']
print(cnt, video_id, video.shape, duration)
features = np.zeros((duration+1, 197, 1024))
for i in range(int(duration)):
    idx = int(video.shape[0] / duration * i)
    x = torch.unsqueeze(video[idx], 0).to(device)
    x = model.forward_features(x)

E.g., consider the video 9NG5mJgw6Yg in the writer set, with duration = 154s in lvu_durations.csv and an actual video length of 184s. The above code will include frames after 154s, which contain the outro/preview.

@nbgundavarapu
Author

nbgundavarapu commented Nov 14, 2022

In the above example, could you walk through the above code from your codebase at i=153?
idx = int(184/154*153) = 182
Hence, features[153] = model_fwd(video[182])

In effect, features[153] contains outro frame 182. So, during LVU evals, frame 182 will be used for this video, which is not what you intended. This looks like a bug. The same is true for many videos and frames.
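A quick self-contained check of that walkthrough (1 fps assumed, as in the example above):

```python
duration = 154    # outro-free duration from lvu_durations.csv
num_frames = 184  # full video length in frames (1 fps assumed)

i = int(duration) - 1                 # last loop iteration, i = 153
idx = int(num_frames / duration * i)  # same indexing as the repo's loop
print(idx, idx >= duration)           # the index lies past the 154 s cutoff
```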

@md-mohaiminul
Owner

Hi,
I think you are right. You need to remove the outro first, which is what we did. You can use the durations from 'lvu_durations.csv' to do this.
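A minimal sketch of that fix (hypothetical helper; the fps value and toy video are assumptions, not the repo's exact code): truncate the frame array to the lvu_durations.csv duration before the sampling loop runs.

```python
import numpy as np

def trim_outro(video, duration, fps=1.0):
    """Drop all frames past the outro-free duration (seconds) at the given fps."""
    return video[:int(duration * fps)]

# Toy 184-frame video at 1 fps with a listed outro-free duration of 154 s.
video = np.zeros((184, 3, 8, 8), dtype=np.float32)
trimmed = trim_outro(video, duration=154, fps=1.0)

# The original sampling loop now stays inside the outro-free region.
duration = 154
max_idx = max(int(trimmed.shape[0] / duration * i) for i in range(int(duration)))
print(trimmed.shape[0], max_idx)  # no sampled index reaches the outro
```

The same idea works with any decoded frame rate, as long as `fps` matches how the mp4 was decoded.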

@nbgundavarapu
Author

nbgundavarapu commented Nov 15, 2022

Thanks! In light of the above bug, could you please check and confirm whether the reported results in the paper were computed with the outro included?
The current state of the codebase definitely uses the outro.

Context:
I'm struggling to reproduce the results from the paper. There is a 1% difference in performance depending on whether I include or exclude the outro, and including the outro brings the results closer to those reported in the paper.

@md-mohaiminul
Owner

Which task did you try, and what performance are you getting? Also, how did you solve the NaN issue? Can you please reply on the other issue so that others can benefit from it?

@nbgundavarapu
Author

I've not been able to solve the NaN issue. I'm working on a reimplementation in JAX, building upon annotated-s4.

I've tried all the classification tasks. There is a ~1% gap on relationship, director, writer, and speaking depending on whether the outro is included or excluded.
