Question regarding evaluation #96

Open
xinzuan opened this issue Sep 5, 2023 · 2 comments

xinzuan commented Sep 5, 2023

Hi, I ran basic-pitch/basic_pitch/experiments/run_evaluation.py from the wip-training branch with the MAESTRO dataset and the model checkpoint from basic-pitch/saved_models/icassp_2022.

I expected the results to be similar to those reported in the paper. However, I got the following result:
{"Precision": 0.0, "Recall": 0.0, "F-measure": 0.0, "Average_Overlap_Ratio": 0.0, "Precision_no_offset": 0.04398411727609082, "Recall_no_offset": 0.029748905165349712, "F-measure_no_offset": 0.03468172982454684, "Average_Overlap_Ratio_no_offset": 0.5793096961557063, "Onset_Precision": 0.631602431674569, "Onset_Recall": 0.4181107759888922, "Onset_F-measure": 0.4925505866527016, "Offset_Precision": 0.7521021756258168, "Offset_Recall": 0.5273589516900296, "Offset_F-measure": 0.6072445448462509}.

Based on my understanding of the mir_eval definitions, the metric corresponding to F in the paper should be F-measure, and Fno should be F-measure_no_offset (I cannot find a mir_eval equivalent for Acc). However, as you can see from the output above, the results are really far from what is reported in the paper.

Could anyone please tell me which mir_eval metric corresponds to each metric in the paper?
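For context, here is a minimal sketch of how these metric names come out of mir_eval (the interval and pitch arrays below are made-up placeholders; mir_eval expects intervals in seconds and pitches in Hz):

import numpy as np
import mir_eval

# placeholder reference and estimate: (n, 2) onset/offset times in seconds, pitches in Hz
ref_intervals = np.array([[0.98, 1.09], [0.99, 1.25]])
ref_pitches = np.array([440.0, 523.25])
est_intervals = np.array([[0.97, 1.10], [1.00, 1.24]])
est_pitches = np.array([440.0, 523.25])

scores = mir_eval.transcription.evaluate(ref_intervals, ref_pitches, est_intervals, est_pitches)
# scores is an ordered dict with the keys shown above, e.g. "F-measure"
# (onset + offset + pitch) and "F-measure_no_offset" (onset + pitch only)
print(scores["F-measure"], scores["F-measure_no_offset"])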

drubinstein assigned drubinstein and zwycl and unassigned drubinstein Sep 8, 2023
xinzuan (Author) commented Sep 18, 2023

When I check the values of ref_intervals and est_intervals, they give really different values:

ref_interval: [[  0.98046875   1.08723958]
 [  0.99739583   1.25260417]
 [  1.09375      1.16536458]
 ...
 [384.79557292 388.55338542]
 [384.79817708 388.61067708]
 [384.80989583 388.52864583]] 
est_interval: [[387.07030113 387.36055057]
 [387.07030113 387.5475941 ]
 [387.07030113 387.5475941 ]
 ...
 [146.40265896 146.64646848]
 [362.68555193 362.89453152]
 [307.87372971 308.01433333]]

which I think is one of the reasons why the previous result is so far from what is reported in the paper. After modifying the functions in basic-pitch/basic_pitch/experiments/run_evaluation.py:

  1. I changed the minimum note length from 58.0 to 127.70 ms, following issue Inconsistent minimum note length #93 (a worked conversion from milliseconds to frames is sketched after this list).
  2. I modified the model_inference function as follows:
def model_inference(audio_path, model, save_path, minimum_note_length=127.70):
    output = run_inference(audio_path, model)

    frames = output["note"]
    onsets = output["onset"]
    # frames (13678, 88), onsets (13678, 88)

    # add min_note_len since output_to_notes_polyphonic requires it
    min_note_len = int(np.round(minimum_note_length / 1000 * (AUDIO_SAMPLE_RATE / FFT_HOP)))

    estimated_notes = note_creation.output_to_notes_polyphonic(
        frames,
        onsets,
        onset_thresh=0.5,
        frame_thresh=0.3,
        infer_onsets=True,
        min_note_len=min_note_len,  # required; the function throws an error if not provided
        max_freq=None,              # required; the function throws an error if not provided
        min_freq=None,              # required; the function throws an error if not provided
    )
    # estimated_notes: [(start_frame, end_frame, pitch_midi, amplitude)]

    pitch = np.array([n[2] for n in estimated_notes])
    pitch_hz = librosa.midi_to_hz(pitch)

    estimated_notes_with_pitch_bend = note_creation.get_pitch_bends(output["contour"], estimated_notes)
    times_s = note_creation.model_frames_to_time(output["contour"].shape[0])

    estimated_notes_time_seconds = [
        (times_s[note[0]], times_s[note[1]], note[2], note[3], note[4])
        for note in estimated_notes_with_pitch_bend
    ]

    midi = note_creation.note_events_to_midi(estimated_notes_time_seconds, save_path)

    intervals = np.array([[times_s[note[0]], times_s[note[1]]] for note in estimated_notes_with_pitch_bend])

    return intervals, pitch_hz, midi  # also return midi so it can be used in the evaluation

  3. In the function main, instead of using the intervals and pitch_hz returned from model_inference, I used:
_, _, midi = model_inference(audio_path, model, save_path)

est_notes = io.load_notes_from_midi(midi=midi)
if est_notes is None:
    est_intervals = []
    est_pitches = []
else:
    est_intervals, est_pitches, _ = est_notes.to_mir_eval()
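For reference, the millisecond-to-frame conversion used in step 2 works out like this (assuming the defaults AUDIO_SAMPLE_RATE = 22050 and FFT_HOP = 256 from basic_pitch/constants.py; check those values in your checkout):

import numpy as np

AUDIO_SAMPLE_RATE = 22050  # assumed default from basic_pitch/constants.py
FFT_HOP = 256              # assumed default from basic_pitch/constants.py

frames_per_second = AUDIO_SAMPLE_RATE / FFT_HOP  # ~86.1 frames per second
for minimum_note_length in (58.0, 127.70):  # milliseconds
    min_note_len = int(np.round(minimum_note_length / 1000 * frames_per_second))
    print(minimum_note_length, "ms ->", min_note_len, "frames")  # 58.0 -> 5, 127.70 -> 11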

With these changes, I finally got results that are close to those reported in the paper:
{'Precision': 0.11997030494604051, 'Recall': 0.11606390831628464, 'F-measure': 0.11663329326696836, 'Average_Overlap_Ratio': 0.8401297548289717, 'Precision_no_offset': 0.7436669014704781, 'Recall_no_offset': 0.6548245337432261, 'F-measure_no_offset': 0.6874150165838026, 'Average_Overlap_Ratio_no_offset': 0.4262920646319229, 'Onset_Precision': 0.8259000078273144, 'Onset_Recall': 0.721544837754125, 'Onset_F-measure': 0.7601824436965499, 'Offset_Precision': 0.5818535280932536, 'Offset_Recall': 0.504137416529927, 'Offset_F-measure': 0.5329684074137423}

drubinstein assigned rabitt and unassigned zwycl Jan 12, 2024
drubinstein (Contributor) commented

Hi @xinzuan. The training branch is still a work in progress, so don't rely on it too heavily. Regarding your issue, it's possible that there was a difference in units between the estimated and reference timestamps and frequency values, and that your solution took care of the difference.
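If it helps, a rough sanity check along those lines before calling mir_eval could look like the sketch below (the thresholds are only heuristics, not part of basic-pitch):

import numpy as np
import librosa

def check_units(ref_intervals, est_intervals, est_pitches):
    # Both interval arrays should hold onset/offset times in seconds; if one
    # maximum is tens of times larger than the other, one side is probably
    # still in frames or samples.
    print("ref max offset:", ref_intervals.max(), "est max offset:", est_intervals.max())

    # mir_eval expects pitches in Hz; integer values below 128 usually mean
    # MIDI note numbers were passed instead, so convert them.
    if est_pitches.max() < 128 and np.allclose(est_pitches, np.round(est_pitches)):
        est_pitches = librosa.midi_to_hz(est_pitches)
    return est_pitches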
