Artefact Evaluation badges & criteria #3

Open · 1 task
rolyp opened this issue Apr 3, 2024 · 15 comments

@rolyp (Collaborator) commented Apr 3, 2024:

Summary Sentence

Badges that may be awarded to authors who participate in the Artefact Evaluation process.

Subtasks:

  • Review POPL 2022 AE reviewer guidelines for ideas about how to structure reviews

See also:

rolyp mentioned this issue Apr 8, 2024
rolyp changed the title from “Artefact Evaluation badges” to “Artefact Evaluation badges & criteria” Apr 3, 2024
@rolyp (Collaborator, Author) commented Apr 3, 2024:

@dorchard @MarionBWeinzierl @acocac @cassgvp I’ve created a separate issue for this, since I imagine these criteria will be refined quite a bit as we move forward. I’ve added a link to the current ACM process.

I feel we should probably stick to something broadly equivalent to the first 3 ACM criteria, i.e. Available, Functional, and Reusable, where “Functional” means (broadly) “Reproducible” (the same outputs can be obtained independently using the author’s artifacts). (There may be a case for preferring the term “Reproducible” to “Functional” to make this explicit.)

The ACM has 2 additional criteria/badges, “Results Reproduced” and “Results Replicated”. I think we can ignore both of these: the latter is clearly out of scope and the former is, for our purposes at least, mostly subsumed by Functional.

I think the key point for us is that “Functional” should mean functional with respect to the (computational) results presented in the paper, and “Reusable” should imply Functional.

@rolyp (Collaborator, Author) commented Apr 3, 2024:

Further question:

  • Does Available say anything above and beyond author-declared Open Data and Open Materials? If not, will Available be useful/meaningful to have as a separate badge awarded in the addendum?

@MarionBWeinzierl (Collaborator) commented:

> Further question:
>
>   • Does Available say anything above and beyond author-declared Open Data and Open Materials? If not, will Available be useful/meaningful to have as a separate badge awarded in the addendum?

I think Available is equivalent to the Open Data badge that CUP awards, except that it also extends to the software. Or is software already included in that badge?

@MarionBWeinzierl (Collaborator) commented:

I started the checklist by copying over the description from the ACM page. We should think about slight rewording or examples, where necessary (e.g., I added a reference to FAIR and FAIR4RS).

@cassgvp (Collaborator) commented Apr 5, 2024:

Just adding a link to the hackmd for some context on the discussions: https://hackmd.io/@turing-es/By7jk3eIp

rolyp transferred this issue from alan-turing-institute/climate-informatics-2024 Apr 8, 2024
@dorchard (Collaborator) commented:

I think this looks good to me. Should 'Available' also include mention of data, e.g., that relevant data sets are available where possible?

@dorchard (Collaborator) commented:

I added a part to the 'Available' badge about tagging the version / doing a release.

@MarionBWeinzierl (Collaborator) commented Apr 10, 2024:

> I think this looks good to me. Should 'Available' also include mention of data, e.g., that relevant data sets are available where possible?

You are right, we should probably add data explicitly. Although I'd think it's all covered under the first bullet point, so maybe that's enough?

@dorchard (Collaborator) commented:

I updated the Reusable points, which previously repeated points from Functional (Documented, Consistent, Complete, Exercisable). I removed the latter three and added a point about being packaged to enable reuse.

@dorchard (Collaborator) commented:

Moved the text in this issue to a file: https://github.com/alan-turing-institute/climate-informatics-2024-ae/blob/main/badges.md

@dorchard (Collaborator) commented:

Are we happy with these now?

@MarionBWeinzierl (Collaborator) commented:

Just one question about the "exercisable" point: When we talk about obtaining results, do we want to explicitly talk about generating figures, too, or are we happy with a dump of numbers?

@rolyp (Collaborator, Author) commented Apr 15, 2024:

@MarionBWeinzierl I think it’s reasonable to expect figures to be reproducible (via some kind of script or manual process), with some room for reviewer discretion.
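
For example, a figure-reproduction script shipped with an artefact could be as simple as the following minimal sketch (file names, column names, and the figure number are hypothetical; we wouldn’t mandate any particular tooling):

```python
# Hypothetical sketch: regenerate a paper figure from results shipped
# with (or produced by) the artefact. File names, columns, and the
# figure number are illustrative, not prescriptive.
import pandas as pd
import matplotlib.pyplot as plt

# Load the results produced by the artefact's pipeline.
results = pd.read_csv("results/experiment1.csv")

# Regenerate (hypothetical) Figure 2 from the paper.
fig, ax = plt.subplots()
ax.plot(results["epoch"], results["rmse"], label="model")
ax.set_xlabel("Epoch")
ax.set_ylabel("RMSE")
ax.legend()
fig.savefig("figures/figure2.png", dpi=300)
```

A reviewer could then compare the regenerated figure against the one in the paper, exercising the discretion mentioned above.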

@MarionBWeinzierl (Collaborator) commented:

OK, I added that under "exercisable".

@rolyp (Collaborator, Author) commented May 10, 2024:

Added a TODO to look at the POPL 2022 AE reviewer guidelines, as it might be useful to add a bit more structure to the review format.

For example, they suggest organising reviews around specific content in the paper:

> Q1: What is the central contribution of the paper?
> Q2: What claims do the authors make of the artifact, and how does it connect to Q1 above?
> Q3: Can you list the specific, significant experimental claims made in the paper (such as figures, tables, etc.)?
> Q4: What do you expect as a reasonable range of deviations for the experimental results?
>
> […]
>
> Q9: Does the artifact provide evidence for all the claims you noted in Q3? This corresponds to the completeness criterion of your evaluation.
> Q10: Do the results of running / examining the artifact meet your expectations after having read the paper? This corresponds to the criterion of consistency between the paper and the artifact.
> Q11: Is the artifact well-documented, to the extent that answering questions Q5–Q10 is straightforward?

I think the idea of focusing reviews on specific claims made in the paper (in the form of particular figures or tables) is a good one: it might help reviewers make their reviews more evidence-based, and encourage authors to think of their artefacts in terms of how they support specific claims.
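
To make that concrete, here’s a minimal sketch (all claim names, values, and the tolerance are hypothetical) of how the Q3/Q4 comparison between paper-reported and reproduced values might be recorded:

```python
# Hypothetical sketch of a Q3/Q4 check: compare values reproduced by
# running the artefact against the values claimed in the paper, within
# an agreed tolerance. Numbers and tolerance are illustrative only.
import math

claims = {
    "Table 1, RMSE":     {"paper": 0.42, "reproduced": 0.43},
    "Table 1, accuracy": {"paper": 0.91, "reproduced": 0.90},
}

REL_TOL = 0.05  # Q4: "reasonable range of deviations", here 5% relative

for name, v in claims.items():
    ok = math.isclose(v["paper"], v["reproduced"], rel_tol=REL_TOL)
    status = "OK" if ok else "OUTSIDE TOLERANCE"
    print(f"{name}: paper={v['paper']}, reproduced={v['reproduced']} -> {status}")
```

Something this lightweight would keep each review anchored to the specific figures and tables it is meant to verify.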
