A detailed guide to what AI benchmark metrics mean and how to use them to find the best foundation model for a given use case.
| Metric/challenge name | TLDR explanation | Link to detailed explanation | Link to paper |
|---|---|---|---|
| GSM8K | Solve 'grade school math word problems' | Detailed explanation | arXiv |
- GSM8K (Detailed explanation, arXiv)
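
To make the GSM8K entry more concrete, here is a minimal sketch of how a score on this kind of benchmark is commonly computed: exact-match accuracy on the final numeric answer. It relies on the fact that GSM8K reference solutions end with a line of the form `#### <answer>`. Everything else is an assumption made for illustration: the helper names (`to_number`, `reference_answer`, `accuracy`) are hypothetical, the sample problems are made up rather than taken from the dataset, and treating the last number in the model output as its answer is only one of several possible extraction rules. Published evaluation harnesses may prompt and parse differently, so scores are only comparable when the same procedure is used.

```python
import re

# Matches integers and decimals, with an optional sign and thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def to_number(text):
    """Return the last number found in `text` as a float, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def reference_answer(solution):
    """Parse the final answer from a GSM8K-style reference solution ('... #### 7')."""
    return to_number(solution.split("####")[-1])


def accuracy(model_outputs, reference_solutions):
    """Fraction of problems where the model's last number equals the reference answer."""
    correct = 0
    for output, solution in zip(model_outputs, reference_solutions):
        predicted = to_number(output)
        if predicted is not None and predicted == reference_answer(solution):
            correct += 1
    return correct / len(reference_solutions)


# Made-up examples in the GSM8K answer format (not real dataset rows).
references = [
    "3 + 4 = 7\n#### 7",
    "2 * 5 = 10\n#### 10",
]
outputs = [
    "She has 3 apples and buys 4 more, so she has 7 apples.",
    "The answer is 12.",
]
print(accuracy(outputs, references))
```

With this rule, a model is credited only when its final number matches the reference, so a correct chain of reasoning followed by a wrong final figure still counts as a miss.
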
TBD
The information above may be wrong.