Questions about inconsistencies between the paper and the released data #3
To follow up on this, maybe we can restrict the scope of the question to consistency with the datasets. After downloading the data I found: MLOS: 167 - 543. It would be helpful if the authors clarified the length discrepancies for the mRNA stability, Fungal Expression, and MLOS datasets. Thanks a lot and congrats on the publication!
Thank you for integrating and open-sourcing the benchmark dataset.
I noticed that there are some inconsistencies between the statistics in the paper and the released data in benchmarks/CodonBERT/data. Here are the confusing parts:
Could you kindly clarify them?
BTW, I noticed that some of the datasets are very small. With a single 0.7/0.15/0.15 split on such a small dataset, the test set contains only a handful of samples, so metrics like correlation computed on it are not reliable. It would be better to use k-fold cross-validation.
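To make the suggestion concrete, here is a minimal sketch of k-fold cross-validation that reports a correlation per fold together with its mean and spread. It is only an illustration under stated assumptions: the data are synthetic, `fit_line` is a hypothetical stand-in for whatever model is actually being benchmarked, and Pearson correlation is used as an example metric. None of this is taken from the repository's code.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Hypothetical stand-in model: a one-feature least-squares line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def cross_validated_correlation(xs, ys, k=5, seed=0):
    """Return per-fold test correlations plus their mean and standard deviation."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k, seed):
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [model(xs[i]) for i in test_idx]
        scores.append(pearson(preds, [ys[i] for i in test_idx]))
    return scores, statistics.fmean(scores), statistics.stdev(scores)


# Synthetic toy data (assumption: not drawn from the benchmark datasets).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0, 3) for x in xs]

scores, mean_r, std_r = cross_validated_correlation(xs, ys, k=5)
print("per-fold r:", [round(s, 3) for s in scores])
print(f"mean r = {mean_r:.3f} +/- {std_r:.3f}")
```

Reporting the mean and standard deviation across folds shows how much the metric moves with the choice of test samples, which a single small test split cannot show.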