Ferry v2 - UNDER DEVELOPMENT

Installation

Ferry uses VSCode's Dev Container for the dev environment. All environment setup is handled through the standard Dev Container pipeline. For a quick start, use the Dev Containers: Reopen in Container command in VSCode.

MacOS: If you want to develop outside of the Dev Container, please install pygraphviz first before running pip install -e .

pip install -U --no-cache-dir  \
            --config-settings="--global-option=build_ext" \
            --config-settings="--global-option=-I$(brew --prefix graphviz)/include/" \
            --config-settings="--global-option=-L$(brew --prefix graphviz)/lib/" \
            pygraphviz

Installing submodules

Ferry uses a submodule for the data repository. To clone the submodule, run:

git submodule update --init --depth 1

Any time in the future, you can update the submodule with:

cd data
git fetch --depth 1 origin master
git reset --hard origin/master

Then, you can commit the new submodule link to the main repo. It's not necessary to sync the latest link all the time, but feel free to whenever you are committing other changes.

Running Ferry

Make sure you have all dependencies installed or are using the Dev Container.

python main.py -f config/dev_fetch.yml
python main.py -f config/dev_sync_db.yml

Options

Options can be passed either through the command line, through a YAML config file, or through environment variables, in decreasing order of precedence.

Note: because we use argparse, you can provide just the prefix of each argument, such as --config instead of --config-file.

CLI flag	Config option	Env key	Default	Description
`--cas-cookie`	`cas_cookie`	`CAS_COOKIE`	`None`; prompt if `crawl_evals`	Only used for fetching evals; see below
`-f`, `--config-file`	N/A	N/A	`None`	Path to YAML config file, relative to PWD; if unspecified, all options are read from command
`--data-dir`	`data_dir`	N/A	`data`	Directory to load/store parsed data. This is usually where the `ferry-data` is cloned.
`--database-connect-string`	`database_connect_string`	`POSTGRES_URI`	`None`; prompt if `sync_db`	Postgres connection string; for dev, see `dev_sync_db.yml`
`-d`, `--debug`	`debug`	N/A	`False`	Enable debug logging
`-r`, `--release`	`release`	N/A	`False`	Run in release mode; see below
`-s`, `--seasons`	`seasons`	N/A	`None`	A list of seasons to fetch; see below
`--sentry-url`	`sentry_url`	`SENTRY_URL`	`None`; prompt if `release`	Sentry URL for error reporting; required in release mode, ignored in dev mode
`--use-cache`	`use_cache`	N/A	`False`; always `False` if `release`	Use cached data instead of fetching fresh data. Even if not using cache, cache will still be updated.

Stage switches

You can selectively run certain stages of Ferry. Options below are in the order of the stages. All stages are False by default, in which case Ferry basically does nothing.

CLI flag	Config option	Description
`--save-config`	`save_config`	Save the parsed config options to `config_file`; does nothing if `config_file` is unspecified.
`--crawl-seasons`	`crawl_seasons`	Run the seasons crawler. If `False` and no cached data is available, then `--seasons` must be provided and all be valid seasons.
`--crawl-classes`	`crawl_classes`	Run the class crawler.
`--crawl-evals`	`crawl_evals`	Run the eval crawler.
`--transform`	`transform`	Run the transformer. Always `True` if `sync_db` or `snapshot_tables`.
`--snapshot-tables`	`snapshot_tables`	Generate CSV files capturing data that would be written to DB.
`--sync-db`	`sync_db`	Sync the transformed data to the database.
`--rewrite`	`rewrite`	Use old sync db function to write to database instead of incremental write. Use this during the first run to set up the database. Only has an effect if `sync_db`.
`--generate-diagram`	`generate_diagram`	Generate a DB visualization diagram to `docs/db_diagram.pdf`

For example, to test the transformer in isolation, you may find this handy:

python main.py --transform --snapshot-tables

Release mode

In release mode:

use_cache is always False; we always fetch fresh data.
sentry_url is required; we will send error reports to Sentry.

In non-release mode:

We never initialize Sentry and sentry_url is ignored.

CAS cookie

To get a valid CAS cookie to connect to OCE, first log into https://oce.app.yale.edu/ocedashboard/studentViewer. Refresh with the network inspector tab open, and find the first HTTP request. Copy the Cookie header from the request and paste into the prompt or the cas_cookie option. It should look like:

JSESSIONID=...; ...

Specifying seasons

If the seasons option is unspecified, we default to fetching ALL viable seasons.
If the seasons option contains a single value called LATEST_n, we fetch the latest n seasons.
Otherwise, the seasons option should be a list of seasons to fetch, using the standard season code format: e.g. --seasons 202301 202303.

In almost all cases, it is sufficient to only fetch the last 3 seasons in dev. In fact, ferry will not work when fetching and syncing to DB all seasons in dev due to professor_id mapping requiring legacy seasons. Please clone ferry-data if working with all seasons is necessary.

Linting & formatting

black .
pyright

TODO

Ensure that parallel evals is accurate
Rearchitect GitHub action to create Postgres dump, upload with VC (enable lifecycle management) to Azure Blob, and restart prod postgres container to trigger new pull.
Modernize dependencies
SIGINT handler
Lint and format CI

Name		Name	Last commit message	Last commit date
Latest commit History 1,058 Commits
.devcontainer		.devcontainer
.github		.github
.vscode		.vscode
config		config
data @ 660d96d		data @ 660d96d
docs		docs
ferry		ferry
lambda		lambda
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
launch_db.sh		launch_db.sh
main.py		main.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ferry v2 - UNDER DEVELOPMENT

Installation

Installing submodules

Running Ferry

Options

Stage switches

Release mode

CAS cookie

Specifying seasons

Linting & formatting

TODO

About

Releases

Packages

Contributors 21

Languages

License

coursetable/ferry

Folders and files

Latest commit

History

Repository files navigation

Ferry v2 - UNDER DEVELOPMENT

Installation

Installing submodules

Running Ferry

Options

Stage switches

Release mode

CAS cookie

Specifying seasons

Linting & formatting

TODO

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 21

Languages

Packages