Turn fil-deal-ingestor into automated service #26

Merged: 7 commits, Jan 8, 2025
3 changes: 3 additions & 0 deletions .dockerignore
@@ -0,0 +1,3 @@
node_modules
generated
target
3 changes: 3 additions & 0 deletions .gitignore
@@ -139,3 +139,6 @@ generated/update-spark-db.sql
# Added by cargo

/target

# Added manually
generated
33 changes: 33 additions & 0 deletions Dockerfile
@@ -0,0 +1,33 @@
# build rust binary from src
FROM rust:1.79-slim AS builder

WORKDIR /usr/src/app

COPY ./src ./src
COPY ./Cargo.* .

RUN cargo build --release

# use node image to run the binary
FROM node:22-slim AS runtime

WORKDIR /usr/src/app

# install psql and curl
RUN apt-get update && apt-get install -y postgresql-client curl

# copy built binary from builder
COPY --from=builder /usr/src/app/target/release/fil-deal-ingester .

# copy package.json and package-lock.json
COPY ./package.json .
COPY ./package-lock.json .

# copy scripts
COPY ./scripts ./scripts
COPY ./run.sh .

# install node modules
RUN npm install

CMD ["./run.sh"]
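The image can be smoke-tested locally before it is deployed. The following is a minimal sketch and not part of this PR: the `IMAGE_TAG` variable, the `run_ingester_locally` function name and the `./generated` bind mount are assumptions chosen for illustration, while the `ENVIRONMENT` and `JSON_CONVERTER_BIN` values mirror the ones the README passes to `fly machine run`.

```sh
# Hypothetical local smoke test; run from the repository root.
# IMAGE_TAG is an illustrative name, not something the PR defines.
IMAGE_TAG="${IMAGE_TAG:-fil-deal-ingester:local}"

run_ingester_locally() {
  # Build the multi-stage image, then run a single ingestion.
  docker build -t "$IMAGE_TAG" . &&
    docker run --rm \
      -e DATABASE_URL="$DATABASE_URL" \
      -e ENVIRONMENT=docker \
      -e JSON_CONVERTER_BIN=/usr/src/app/fil-deal-ingester \
      -v "$PWD/generated:/usr/src/app/generated" \
      "$IMAGE_TAG"
}
```

Call `run_ingester_locally` with `DATABASE_URL` exported. A database listening on the host's `localhost` is not reachable from inside the container by default, so the connection string needs a host name the container can resolve.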
45 changes: 45 additions & 0 deletions README.md
@@ -32,3 +32,48 @@ DATABASE_URL=postgres://user:password@localhost:5454/spark ./run.sh

> `generated/StateMarketDeals.ndjson` can weigh in at 40GB or more.
> To free up disk space, feel free to delete all files in the `generated` folder after the script has finished running.


## Deployment to fly.io

**NOTE: Make sure you have the fly.io CLI installed and are logged in.**

### One-time setup

_A scheduled machine has to be created by hand because a schedule cannot be set up inside the `fly.toml` file._

Set up an app, volume and secrets:

```sh
fly apps create --name=fil-deal-ingester --org=<org-name>
fly volumes create fil_deal_ingester_data --size=80 --app=fil-deal-ingester --region=<region> --snapshot-retention=1
fly secrets set DATABASE_URL=<postgres-connection-string> --app=fil-deal-ingester
fly secrets set SLACK_WEBHOOK_URL=<slack-webhook-url> --app=fil-deal-ingester
```

Finally, create the machine with the following command:

```sh
fly machine run . \
  --app=fil-deal-ingester \
  --schedule=daily \
  --region=<region> \
  --volume fil_deal_ingester_data:/usr/src/app/generated \
  --env JSON_CONVERTER_BIN=/usr/src/app/fil-deal-ingester \
  --env ENVIRONMENT=docker \
  --vm-size=shared-cpu-1x
```

### Updating existing machine

To update the existing machine with a new build, first get the machine ID by running:

```sh
fly machine ls --app fil-deal-ingester
```

And then update the machine with the following command:

```sh
fly machine update <machine-id> --dockerfile Dockerfile
```
23 changes: 19 additions & 4 deletions run.sh
@@ -3,17 +3,23 @@
set -e

DATABASE_URL="${DATABASE_URL?Missing required env var: DATABASE_URL}"
# defaults to ./target/release/fil-deal-ingester to support local ingestion and backwards compatibility
JSON_CONVERTER_BIN="${JSON_CONVERTER_BIN:-./target/release/fil-deal-ingester}"

mkdir -p generated

echo "** Building the JSON->NDJSON converter **"
cargo build --release
if [ "${ENVIRONMENT:-}" = "docker" ]; then
  echo "Skipping JSON->NDJSON converter build in docker environment"
else
  echo "** Building the JSON->NDJSON converter **"
  cargo build --release
fi
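The environment handling in this script depends on how the shell expands variables that may be unset. The sketch below illustrates two of those behaviours in isolation; the helper names `is_docker_env` and `converter_bin` are hypothetical and exist only for this example.

```sh
# Hypothetical helpers illustrating how unset variables are handled.

is_docker_env() {
  # "${ENVIRONMENT:-}" expands to an empty string when ENVIRONMENT is unset,
  # so the test always receives two operands and never fails to parse.
  [ "${ENVIRONMENT:-}" = "docker" ]
}

converter_bin() {
  # ${VAR:-default} falls back to the default when VAR is unset or empty.
  printf '%s\n' "${JSON_CONVERTER_BIN:-./target/release/fil-deal-ingester}"
}
```

With `ENVIRONMENT` unset, `is_docker_env` simply returns a non-zero status, and `converter_bin` prints the locally built binary path unless `JSON_CONVERTER_BIN` overrides it.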

echo "** Downloading the latest market deals state **"
curl --fail -o ./generated/StateMarketDeals.json.zst https://marketdeals.s3.amazonaws.com/StateMarketDeals.json.zst

echo "** Converting from .json.zst to .ndjson **"
./target/release/fil-deal-ingester ./generated/StateMarketDeals.json.zst > generated/StateMarketDeals.ndjson
"$JSON_CONVERTER_BIN" ./generated/StateMarketDeals.json.zst > generated/StateMarketDeals.ndjson

echo "** Parsing retrievable deals **"
node scripts/parse-retrievable-deals.js
@@ -27,7 +33,16 @@ psql "$DATABASE_URL" -f generated/update-spark-db.sql | tee generated/dbupdate.l
echo "** Updating client-allocator mappings **"
node scripts/update-allocator-clients.js | tee generated/allocator-update.log

echo "** FINISHED INGESTION OF f05 DEALS**"
MESSAGE=$(
  echo "**FINISHED INGESTION OF f05 DEALS**"
  grep "^DELETE" < generated/dbupdate.log | awk '{s+=$2} END {print "Deleted: " s}'
  grep "^INSERT" < generated/dbupdate.log | awk '{s+=$3} END {print "Added: " s}'
  tail -1 generated/allocator-update.log
)
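psql prints one command tag per statement, `DELETE <count>` and `INSERT <oid> <count>`, which is why the summary sums column 2 for deletions and column 3 for insertions. The same aggregation can be sketched as a single awk pass. The `summarize_db_log` helper is hypothetical, and the `+ 0` forces a numeric `0` when a tag never appears in the log.

```sh
# Hypothetical helper: summarize psql command tags from a log file.
summarize_db_log() {
  awk '
    /^DELETE/ { deleted += $2 }   # psql prints "DELETE <count>"
    /^INSERT/ { added += $3 }     # psql prints "INSERT <oid> <count>"
    END {
      print "Deleted: " (deleted + 0)
      print "Added: " (added + 0)
    }
  ' "$1"
}
```

For example, `summarize_db_log generated/dbupdate.log` prints the two totals on separate lines.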

echo "$MESSAGE"

if [ -n "$SLACK_WEBHOOK_URL" ]; then
  echo "** Sending message to slack **"
  curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$MESSAGE\"}" "$SLACK_WEBHOOK_URL"
fi
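Because `MESSAGE` spans several lines and its last line comes from a log file, interpolating it directly into a JSON literal can produce an invalid payload: raw newlines and unescaped double quotes are not allowed inside a JSON string. A minimal sketch of escaping the text first is shown below. The helpers `json_escape` and `slack_payload` are hypothetical and not part of this PR, and they handle only backslashes, double quotes and newlines; other control characters such as tabs are left untouched.

```sh
# Hypothetical helpers: build a JSON body for a Slack incoming webhook.

json_escape() {
  # Escape backslashes first, then double quotes, then join the lines
  # with a literal backslash-n sequence.
  printf '%s' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

slack_payload() {
  # Wrap the escaped text in the {"text": "..."} object used by the script.
  printf '{"text":"%s"}' "$(json_escape "$1")"
}
```

The result could then be passed to curl as `--data "$(slack_payload "$MESSAGE")"`.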