Artefact Evaluation badges & criteria #3

Open · 1 task
rolyp opened this issue Apr 3, 2024 · 15 comments

@rolyp (Collaborator) commented Apr 3, 2024:

Summary Sentence

Badges that may be awarded to authors who participate in the Artefact Evaluation process.

Subtasks:

  • Review POPL 2022 AE reviewer guidelines for ideas about how to structure reviews

See also:

rolyp mentioned this issue Apr 8, 2024
rolyp changed the title from “Artefact Evaluation badges” to “Artefact Evaluation badges & criteria” Apr 3, 2024
@rolyp (Collaborator, Author) commented Apr 3, 2024:

@dorchard @MarionBWeinzierl @acocac @cassgvp I’ve created a separate issue for this, since I imagine these criteria will be refined quite a bit as we move forward. I’ve added a link to the current ACM process.

I feel we should probably stick to something broadly equivalent to the first 3 ACM criteria, i.e. Available, Functional, and Reusable, where “Functional” means (broadly) “Reproducible” (the same outputs can be obtained independently using the author’s artifacts). (There may be a case for preferring the term “Reproducible” to “Functional” to make this explicit.)

The ACM has 2 additional criteria/badges, “Results Reproduced” and “Results Replicated”. I think we can ignore both of these: the latter is clearly out of scope and the former is, for our purposes at least, mostly subsumed by Functional.

I think the key point for us is that “Functional” should mean functional with respect to the (computational) results presented in the paper, and “Reusable” should imply Functional.

@rolyp (Collaborator, Author) commented Apr 3, 2024:

Further question:

  • Does Available say anything above and beyond author-declared Open Data and Open Materials? If not, will Available be useful/meaningful to have as a separate badge awarded in the addendum?

@MarionBWeinzierl (Collaborator) commented:

> Further question:
>
>   • Does Available say anything above and beyond author-declared Open Data and Open Materials? If not, will Available be useful/meaningful to have as a separate badge awarded in the addendum?

I think Available is equivalent to the Open Data badge that CUP awards, except that it also extends to the software. Or is software already included in that badge?

@MarionBWeinzierl (Collaborator) commented:

I started the checklist by copying over the description from the ACM page. We should think about slight rewording or examples, where necessary (e.g., I added a reference to FAIR and FAIR4RS).

@cassgvp (Collaborator) commented Apr 5, 2024:

Just adding a link to the hackmd for some context on the discussions: https://hackmd.io/@turing-es/By7jk3eIp

rolyp transferred this issue from alan-turing-institute/climate-informatics-2024 Apr 8, 2024
@dorchard (Collaborator) commented:

I think this looks good to me. Should 'Available' also include mention of data, e.g., that relevant data sets are available where possible?

@dorchard (Collaborator) commented:

I added a part to the 'Available' badge about tagging the version / doing a release.

@MarionBWeinzierl (Collaborator) commented Apr 10, 2024:

> I think this looks good to me. Should 'Available' also include mention of data, e.g., that relevant data sets are available where possible?

You are right, we should probably add data explicitly. Although I'd think it's all covered under the first bullet point, so maybe that's enough?

@dorchard (Collaborator) commented:

I updated the Reusable points, which previously repeated points from Functional (Documented, Consistent, Complete, Exercisable). I removed the latter three and added a point about being packaged to enable reuse.

@dorchard (Collaborator) commented:

Moved the text in this issue to a file: https://github.com/alan-turing-institute/climate-informatics-2024-ae/blob/main/badges.md

@dorchard (Collaborator) commented:

Are we happy with these now?

@MarionBWeinzierl (Collaborator) commented:

Just one question about the "exercisable" point: When we talk about obtaining results, do we want to explicitly talk about generating figures, too, or are we happy with a dump of numbers?

@rolyp (Collaborator, Author) commented Apr 15, 2024:

@MarionBWeinzierl I think it’s reasonable to expect figures to be reproducible (via some kind of script or manual process), with some room for reviewer discretion.
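
For example, a figure-reproduction script shipped with an artefact could be as simple as the following minimal sketch (file names, column names, and the figure number are hypothetical; we wouldn’t mandate any particular tooling):

```python
# Hypothetical sketch: regenerate a paper figure from results shipped
# with (or produced by) the artefact. File names, columns, and the
# figure number are illustrative, not prescriptive.
import pandas as pd
import matplotlib.pyplot as plt

# Load the results produced by the artefact's pipeline.
results = pd.read_csv("results/experiment1.csv")

# Regenerate (hypothetical) Figure 2 from the paper.
fig, ax = plt.subplots()
ax.plot(results["epoch"], results["rmse"], label="model")
ax.set_xlabel("Epoch")
ax.set_ylabel("RMSE")
ax.legend()
fig.savefig("figures/figure2.png", dpi=300)
```

A reviewer could then compare the regenerated figure against the one in the paper, exercising the discretion mentioned above.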

@MarionBWeinzierl (Collaborator) commented:

OK, I added that under "exercisable".

@rolyp (Collaborator, Author) commented May 10, 2024:

Added a TODO to look at the POPL 2022 AE reviewer guidelines, as it might be useful to add a bit more structure to the review format.

For example, they suggest organising reviews around specific content in the paper:

> Q1: What is the central contribution of the paper?
> Q2: What claims do the authors make of the artifact, and how does it connect to Q1 above?
> Q3: Can you list the specific, significant experimental claims made in the paper (such as figures, tables, etc.)?
> Q4: What do you expect as a reasonable range of deviations for the experimental results?
>
> […]
>
> Q9: Does the artifact provide evidence for all the claims you noted in Q3? This corresponds to the completeness criterion of your evaluation.
> Q10: Do the results of running / examining the artifact meet your expectations after having read the paper? This corresponds to the criterion of consistency between the paper and the artifact.
> Q11: Is the artifact well-documented, to the extent that answering questions Q5–Q10 is straightforward?

I think the idea of focusing reviews on specific claims made in the paper (in the form of particular figures or tables) is a good one: it might help reviewers make their reviews more evidence-based, and encourage authors to think of their artefacts in terms of how they support specific claims.
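
To make that concrete, here’s a minimal sketch (all claim names, values, and the tolerance are hypothetical) of how the Q3/Q4 comparison between paper-reported and reproduced values might be recorded:

```python
# Hypothetical sketch of a Q3/Q4 check: compare values reproduced by
# running the artefact against the values claimed in the paper, within
# an agreed tolerance. Numbers and tolerance are illustrative only.
import math

claims = {
    "Table 1, RMSE":     {"paper": 0.42, "reproduced": 0.43},
    "Table 1, accuracy": {"paper": 0.91, "reproduced": 0.90},
}

REL_TOL = 0.05  # Q4: "reasonable range of deviations", here 5% relative

for name, v in claims.items():
    ok = math.isclose(v["paper"], v["reproduced"], rel_tol=REL_TOL)
    status = "OK" if ok else "OUTSIDE TOLERANCE"
    print(f"{name}: paper={v['paper']}, reproduced={v['reproduced']} -> {status}")
```

Something this lightweight would keep each review anchored to the specific figures and tables it is meant to verify.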
