We are using PostgreSQL as the store for the raw scraped data from the various data sources.
The schemas closely mirror the scraped data structures.
This database is the more sophisticated one and is running in production. The `internal_picture_url` column points to the downloaded picture on S3.
This database is not yet in production and at the moment only dumps the tweaked scraped data.
The Debezium connector generates a change stream from all change events in Postgres (read, create, update, delete) and writes them into a Kafka topic `postgres.public.<table_name>`.
To read from this stream you can:
- get kafkacat
- inspect the topic list in Kafka:
  $ kafkacat -L -b my-kafka | grep 'topic "postgres'
- consume a topic with:
  $ kafkacat -b my-kafka -t <topic_name>
The messages are quite verbose, since they include their own schema description. The most interesting part is the `value.payload`:
$ kafkacat -b my-kafka -t postgres.public.users | jq '.value | fromjson | .payload'
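The same jq pipeline can be narrowed further, e.g. to keep only the new row state of each change. A minimal sketch, assuming the JSON envelope shape used above (an `echo` with a hypothetical one-column row stands in for the live kafkacat consumer):

```shell
# Extract only the new row state ("after") from a change event.
# The sample message mimics the envelope: the "value" field holds the
# Debezium JSON as a string, hence the fromjson step.
echo '{"value": "{\"payload\": {\"op\": \"c\", \"after\": {\"id\": 42}}}"}' \
  | jq '.value | fromjson | .payload.after'
```

Against a real broker you would replace the `echo` with the `kafkacat -b my-kafka -t postgres.public.users` command from above.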