evals orb


This repository contains the code for the CircleCI Evals orb.

The Evals orb simplifies the definition and execution of evaluation jobs using popular third-party tools, and generates reports of evaluation results.

Because evaluation results are inherently volatile, evaluations orchestrated through this orb do not halt the pipeline when an evaluation fails; this keeps their flakiness from disrupting the development cycle. Instead, a summary of the evaluation results is generated and presented:

  • As an artifact within the CircleCI user interface
  • As a comment on the corresponding GitHub pull request (only available for GitHub projects integrated through OAuth)

Usage

Getting Started

Enter your OpenAI, LangSmith, and/or Braintrust credentials into CircleCI

Navigate to Project Settings > LLMOps and fill out the form by clicking Set up Integration.

Create Context

This will create a context with environment variables for the credentials you've set up above.

⚠️ Please take note of the generated context name (e.g. ai-llm-eval-examples). This will be used as the context value in your CircleCI configuration file.


💡 You can also optionally store a GITHUB_TOKEN as an environment variable on this context, if you'd like your pipelines to post summarized eval job results as comments on GitHub pull requests.

Set up the orb to post eval job summaries as comments on GitHub pull requests

Warning

Currently, this feature is available only to GitHub projects integrated through OAuth. To find out which GitHub account type you have, refer to the GitHub OAuth integration page of our Docs.

To post comments on GitHub pull requests, you will need to create an environment variable named GITHUB_TOKEN containing a GitHub personal access token with repo scope.

Once created, add GITHUB_TOKEN as an environment variable on the same context you created during the LLMOps integration, via Project Settings > LLMOps.

You can also access this context via Organization Settings > Contexts.

You will then need to add the context key to any job that requires access to it, as follows:

# WORKFLOWS
workflows:
  braintrust-evals:
    when: << pipeline.parameters.run-braintrust-evals >>
    jobs:
      - run-braintrust-evals:
          context:
            - ai-llm-eval-examples # Replace this with your context name
  langsmith-evals:
    when: << pipeline.parameters.run-langsmith-evals >>
    jobs:
      - run-langsmith-evals:
          context:
            - ai-llm-eval-examples # Replace this with your context name
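The when clauses above reference pipeline parameters that gate each workflow. A minimal sketch of the matching declarations, assuming boolean parameters (the defaults are up to you):

# PIPELINE PARAMETERS (names must match the `when` clauses above)
parameters:
  run-braintrust-evals:
    type: boolean
    default: false
  run-langsmith-evals:
    type: boolean
    default: false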

Orb Parameters

The evals orb accepts the following parameters; some are optional depending on the evaluation platform in use.

Common parameters

  • circle_pipeline_id: CircleCI Pipeline ID

  • cmd: Command to run the evaluation

  • eval_platform: Evaluation platform (e.g. braintrust or langsmith; default: braintrust)

  • evals_result_location: Location to save evaluation results (default: ./results)

Braintrust-specific parameters

  • braintrust_experiment_name (optional): Braintrust experiment name
    • If no value is provided, an experiment name will be auto-generated based on an MD5 hash of <CIRCLE_PIPELINE_ID>_<CIRCLE_WORKFLOW_ID>.

LangSmith-specific parameters

  • langsmith_endpoint (optional): LangSmith API endpoint (default: https://api.smith.langchain.com)

  • langsmith_experiment_name (optional): LangSmith experiment name

    • If no value is provided, an experiment name will be auto-generated based on an MD5 hash of <CIRCLE_PIPELINE_ID>_<CIRCLE_WORKFLOW_ID>.
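
To illustrate how these parameters fit together, here is a hedged sketch of a job that invokes the orb's evaluation command. The command name evals/eval, the orb alias, and the Docker image are assumptions for illustration only; check the orb registry for the actual job and command names.

# A hypothetical job wiring up the parameters above.
# `evals/eval` is an assumed command name, shown for illustration only.
jobs:
  run-braintrust-evals:
    docker:
      - image: cimg/python:3.12  # any image with your eval tooling installed
    steps:
      - checkout
      - evals/eval:
          circle_pipeline_id: << pipeline.id >>
          cmd: braintrust eval eval_script.py  # your evaluation command
          eval_platform: braintrust
          evals_result_location: ./results
          braintrust_experiment_name: my-experiment  # optional; auto-generated if omitted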

Use in Config

For full config usage guidelines, see the evals orb documentation.
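
As a starting point, importing the orb at the top of .circleci/config.yml might look like the sketch below; the orb reference circleci/evals@x.y is an assumption here, so pin to the name and version published in the orb registry.

version: 2.1

orbs:
  # Assumed orb reference; replace x.y with the version listed in the registry.
  evals: circleci/evals@x.y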

Usage Examples

For evals orb usage examples, check out the llm-eval-examples repo.

FAQ

View the FAQ in the wiki

Contributing

We welcome issues and pull requests against this repository!

For further questions/comments about this or other orbs, visit the CircleCI Orbs discussion forum.