Turn fil-deal-ingestor into automated service (#26)
* Turn fil-deal-ingestor into automated service

- Add dockerfile and fly config
- Refactor run script to support running from docker container
- Update .gitignore and add .dockerignore files

* Add deployment docs; Remove fly cfg

* Format readme and add comments to run.sh

* Fix typo in run.sh

Co-authored-by: Miroslav Bajtoš <[email protected]>

* Add steps for updating machine to README

* Apply suggestions from code review

Co-authored-by: Julian Gruber <[email protected]>

* Update README.md

---------

Co-authored-by: Miroslav Bajtoš <[email protected]>
Co-authored-by: Julian Gruber <[email protected]>
3 people authored Jan 8, 2025
1 parent 25f8e97 commit 2147b22
Showing 5 changed files with 103 additions and 4 deletions.
3 changes: 3 additions & 0 deletions .dockerignore
@@ -0,0 +1,3 @@
node_modules
generated
target
3 changes: 3 additions & 0 deletions .gitignore
@@ -139,3 +139,6 @@ generated/update-spark-db.sql
# Added by cargo

/target

# Added manually
generated
33 changes: 33 additions & 0 deletions Dockerfile
@@ -0,0 +1,33 @@
# build rust binary from src
FROM rust:1.79-slim AS builder

WORKDIR /usr/src/app

COPY ./src ./src
COPY ./Cargo.* .

RUN cargo build --release

# use node image to run the binary
FROM node:22-slim AS runtime

WORKDIR /usr/src/app

# install psql (postgresql-client) and curl, both used by run.sh
RUN apt-get update && apt-get install -y postgresql-client curl

# copy built binary from builder
COPY --from=builder /usr/src/app/target/release/fil-deal-ingester .

# copy package.json and package-lock.json
COPY ./package.json .
COPY ./package-lock.json .

# copy scripts
COPY ./scripts ./scripts
COPY ./run.sh .

# install node modules
RUN npm install

CMD ["./run.sh"]
45 changes: 45 additions & 0 deletions README.md
@@ -32,3 +32,48 @@ DATABASE_URL=postgres://user:password@localhost:5454/spark ./run.sh

> `generated/StateMarketDeals.ndjson` can weigh in at 40GB or more.
> To free up disk space, feel free to delete all files in the `generated` folder after the script has finished running.

## Deployment to fly.io

**NOTE: Make sure you have the fly.io CLI installed and are logged in.**

### One-time setup

_The scheduled machine has to be created by hand because a schedule cannot be set up inside the `fly.toml` file._

Set up an app, volume and secrets:

```sh
fly apps create --name=fil-deal-ingester --org=<org-name>
fly volumes create fil_deal_ingester_data --size=80 --app=fil-deal-ingester --region=<region> --snapshot-retention=1
fly secrets set DATABASE_URL=<postgres-connection-string> --app=fil-deal-ingester
fly secrets set SLACK_WEBHOOK_URL=<slack-webhook-url> --app=fil-deal-ingester
```

Finally, create the machine with the following command:

```sh
fly machine run . \
--app=fil-deal-ingester \
--schedule=daily \
--region=<region> \
--volume fil_deal_ingester_data:/usr/src/app/generated \
--env JSON_CONVERTER_BIN=/usr/src/app/fil-deal-ingester \
--env ENVIRONMENT=docker \
--vm-size=shared-cpu-1x
```
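
The two `--env` values passed above are read by `run.sh`. As a rough illustration of that mechanism (a sketch only, not the script itself; the function names `converter_bin` and `needs_build` are hypothetical), the fallback and the Docker check could look like this:

```sh
# Sketch of how a script can interpret JSON_CONVERTER_BIN and ENVIRONMENT.
# The function names are hypothetical; only the variable names come from the docs.

# Print the converter path, falling back to the local cargo build output.
converter_bin() {
  echo "${JSON_CONVERTER_BIN:-./target/release/fil-deal-ingester}"
}

# Succeed (exit 0) when the binary still has to be built, i.e. outside Docker.
# "${ENVIRONMENT:-}" keeps the test valid even when the variable is unset.
needs_build() {
  [ "${ENVIRONMENT:-}" != "docker" ]
}

if needs_build; then
  echo "would build, then run $(converter_bin)"
else
  echo "would skip the build and run $(converter_bin)"
fi
```

With neither variable set, the sketch falls back to the local build path, which matches running `./run.sh` on a developer machine.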

### Updating the existing machine

To update the existing machine with a new build, first get the machine ID by running:

```sh
fly machine ls --app fil-deal-ingester
```

Then update the machine with the following command:

```sh
fly machine update <machine-id> --app=fil-deal-ingester --dockerfile Dockerfile
```
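
If there is only one machine in the app, the two steps can be combined. The helper below is a hypothetical sketch, not part of this repository. It assumes that `fly machine ls --quiet` prints one machine ID per line and that the first ID is the machine to update, so check both assumptions against your flyctl version before relying on it.

```sh
# Hypothetical helper: rebuild the first machine of an app from the local Dockerfile.
# Assumption: `fly machine ls --quiet` prints only machine IDs, one per line.
update_first_machine() {
  app="$1"
  machine_id=$(fly machine ls --app "$app" --quiet | head -n 1)
  if [ -z "$machine_id" ]; then
    echo "No machine found for app $app" >&2
    return 1
  fi
  fly machine update "$machine_id" --app "$app" --dockerfile Dockerfile
}

# Usage (run from the repository root):
#   update_first_machine fil-deal-ingester
```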
23 changes: 19 additions & 4 deletions run.sh
@@ -3,17 +3,23 @@
set -e

DATABASE_URL="${DATABASE_URL?Missing required env var: DATABASE_URL}"
# defaults to ./target/release/fil-deal-ingester to support local ingestion and backwards compatibility
JSON_CONVERTER_BIN="${JSON_CONVERTER_BIN:-./target/release/fil-deal-ingester}"

mkdir -p generated

echo "** Building the JSON->NDJSON converter **"
cargo build --release
if [ "${ENVIRONMENT:-}" = "docker" ]; then
  # the Docker image already contains the prebuilt binary (see JSON_CONVERTER_BIN)
  echo "Skipping JSON->NDJSON converter build in docker environment"
else
  echo "** Building the JSON->NDJSON converter **"
  cargo build --release
fi

echo "** Downloading the latest market deals state **"
curl --fail -o ./generated/StateMarketDeals.json.zst https://marketdeals.s3.amazonaws.com/StateMarketDeals.json.zst

echo "** Converting from .json.zst to .ndjson **"
./target/release/fil-deal-ingester ./generated/StateMarketDeals.json.zst > generated/StateMarketDeals.ndjson
"$JSON_CONVERTER_BIN" ./generated/StateMarketDeals.json.zst > generated/StateMarketDeals.ndjson

echo "** Parsing retrievable deals **"
node scripts/parse-retrievable-deals.js
@@ -27,7 +33,16 @@ psql "$DATABASE_URL" -f generated/update-spark-db.sql | tee generated/dbupdate.l
echo "** Updating client-allocator mappings **"
node scripts/update-allocator-clients.js | tee generated/allocator-update.log

echo "** FINISHED INGESTION OF f05 DEALS**"
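MESSAGE=$(
  echo "**FINISHED INGESTION OF f05 DEALS**"
  grep "^DELETE" < generated/dbupdate.log | awk '{s+=$2} END {print "Deleted: " s}'
  grep "^INSERT" < generated/dbupdate.log | awk '{s+=$3} END {print "Added: " s}'
  tail -1 generated/allocator-update.log
)

# quote the variable so the line breaks in the summary are preserved
echo "$MESSAGE"

if [ -n "$SLACK_WEBHOOK_URL" ]; then
  echo "** Sending message to slack **"
  # build the payload with node so that newlines and quotes in the message are JSON-escaped
  PAYLOAD=$(node -e 'console.log(JSON.stringify({ text: process.argv[1] }))' "$MESSAGE")
  curl -X POST -H 'Content-type: application/json' --data "$PAYLOAD" "$SLACK_WEBHOOK_URL"
fi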
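
The summary above relies on the command tags that `psql` prints: `DELETE <rows>` and `INSERT <oid> <rows>`, which is why the row count is read from the second field for deletes and from the third for inserts. A minimal sketch of that aggregation on a made-up log follows. The function name `summarize_db_log` and the sample numbers are illustrative assumptions, and the `(s+0)` guard that prints `0` when nothing matched is an addition that is not in `run.sh`.

```sh
# Sum the row counts reported by psql command tags in a log file.
# Hypothetical helper for illustration; not part of run.sh.
summarize_db_log() {
  log_file="$1"
  # DELETE tags look like "DELETE <rows>": the count is field 2.
  grep "^DELETE" "$log_file" | awk '{s+=$2} END {print "Deleted: " (s+0)}'
  # INSERT tags look like "INSERT <oid> <rows>": the count is field 3.
  grep "^INSERT" "$log_file" | awk '{s+=$3} END {print "Added: " (s+0)}'
}

# Made-up sample log with two statements of each kind.
sample_log=$(mktemp)
printf 'DELETE 3\nINSERT 0 10\nDELETE 2\nINSERT 0 5\n' > "$sample_log"

summarize_db_log "$sample_log"
# Deleted: 5
# Added: 15

rm -f "$sample_log"
```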
