Cleans/extracts elements from content bodies. For use with content imports.
Note: This repository requires Docker Engine 18.06.0 or greater because it uses Compose file format 3.7. All commands are assumed to be run from the project root.
- Clone the repository
- Install the project dependencies: scripts/yarn.sh
- Start the service: docker-compose up app
- The service will now be running on http://0.0.0.0:4986
To run a cleanup/extraction on an HTML string, submit a POST request to the desired endpoint (see below). All requests MUST contain the Content-Type: text/html header and provide a raw HTML body (not a JSON body or a JSON-encoded HTML body). This can be done via cURL, fetch, or any other tool (e.g. Insomnia). The server will return a JSON response with the extracted values and the HTML (the exact format varies depending on the rule).
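As an illustration of the request shape described above, here is a minimal Python sketch using only the standard library. The helper names (build_request, clean_html) are hypothetical and not part of this repository, and it assumes the service is reachable at localhost on port 4986. Because the response format varies by rule, the sketch returns the parsed JSON without assuming any particular keys.

```python
import json
import urllib.request
from typing import Any

# Assumption: the service started via docker-compose is reachable on localhost:4986.
BASE_URL = "http://localhost:4986"


def build_request(html: str, endpoint: str = "/pennwell/default") -> urllib.request.Request:
    """Build a POST request whose body is the raw HTML (hypothetical helper)."""
    return urllib.request.Request(
        url=BASE_URL + endpoint,
        data=html.encode("utf-8"),  # raw UTF-8 bytes, not a JSON-encoded string
        headers={"Content-Type": "text/html"},  # required by the service
        method="POST",
    )


def clean_html(html: str, endpoint: str = "/pennwell/default") -> Any:
    """Send the HTML to the service and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(html, endpoint)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, calling clean_html("<p>Hello</p>") would send the paragraph to the /pennwell/default endpoint and return whatever JSON structure that rule produces.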
Note: Posted HTML is assumed to be encoded in UTF-8, and the service responds in kind. Complete any character encoding conversions before using this service.
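If the source HTML is stored in a legacy encoding, it can be converted before posting. The following is a minimal sketch, assuming the source encoding is already known. The to_utf8 helper is hypothetical, and Windows-1252 is used purely as an example.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding into UTF-8 (hypothetical helper)."""
    # Decoding raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    return raw.decode(source_encoding).encode("utf-8")


# Example: "café" stored as Windows-1252 bytes, converted before submission.
legacy_bytes = b"<p>caf\xe9</p>"
utf8_bytes = to_utf8(legacy_bytes, "cp1252")
```

The resulting bytes can be sent as the request body unchanged. Detecting an unknown encoding is a separate problem that this sketch does not attempt.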
/pennwell/default
See the rule's documentation for more information.
Because this repository uses Docker, you should not execute Yarn directly. Instead, execute Yarn commands using the provided script. For example, to add a dependency, run scripts/yarn.sh add [package-name] from the project root. This works for all Yarn commands, e.g., scripts/yarn.sh [command] [args]
You can open an interactive terminal (inside the Docker container) via scripts/terminal.sh. You can also lint the entire project using scripts/lint.sh