-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
18e965f
commit f2571d0
Showing
5 changed files
with
167 additions
and
5 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,106 @@ | ||
# Horseman | ||
# 👋 | ||
|
||
This repository is for tracking issues and feature requests for the Horseman crawler. | ||
This repository is for detailed update notes, and for tracking issues and feature requests for the [Horseman](https://gethorseman.app) crawler. | ||
|
||
https://github.com/workeffortwaste/horseman/issues | ||
|
||
👀 | ||
# Horseman v0.3 | ||
|
||
## Current Version | ||
**tl;dr** 🕷️ *Crawl the web with GPT3.5 and use page content with prompts. 🤖Don't know JS? Create snippets with an AI helper instead. 🌌 Deeper exploration with new the Insights feature. 🤯 Huge number of new snippets and much more.* | ||
|
||
The latest version is v0.2.4 for Windows, MacOS (M1 and Intel), and Linux. | ||
## Patch Notes | ||
|
||
Let's get the obligatory patch notes out the way. A lot of bugs have been squashed, elements polished, and the edges removed from many cases. | ||
|
||
## New Features | ||
|
||
With that out the way let’s talk about the big new features in Horseman v0.3. | ||
|
||
### OpenAI GPT 3.5 Integration | ||
|
||
GPT3.5 has been integrated directly in Horseman in a couple of ways. 👇 | ||
|
||
### Prompts In Snippets | ||
|
||
****************Requires an OpenAI API key (add yours under the main Settings menu).**************** | ||
|
||
Query GPT with any prompt by returning the new `prompt` property from your snippet. Combine any piece of page data, or send the entire page to GPT for analysis. | ||
|
||
Example snippets using page content have been added to the *New* button in the Editor. You’ll also find newly created built-in snippets in the library to rewrite meta descriptions, write missing meta descriptions, and summarize page content. | ||
|
||
```jsx | ||
/* horseman-config enable-openai */ | ||
|
||
/* Generate a beautiful poem from the meta description using GPT */ | ||
|
||
/* Fetch the meta description */ | ||
const meta = document.querySelector('meta[name="description"]')?.getAttribute('content') | ||
|
||
/* Skip the snippet if the page doesn't have a meta description */ | ||
if (!meta) return | ||
|
||
/* Ask gpt-3.5-turbo to generate a poem */ | ||
return { | ||
prompt: `Rewrite the following as a short 160 character poem: ${meta}` | ||
} | ||
``` | ||
Further basic examples can be found in the `examples` folder in this repository. | ||
### No-Code AI Helper | ||
****************************Does not require an OpenAI API key.**************************** | ||
Don’t know JavaScript? That’s no longer an issue. You’ll find a new button in the Editor, *No-Code AI*, which will use the power of GPT to automatically create simple Horseman snippets from a description. | ||
Entering a brief description such as `the social meta image` will instantly create a snippet fetching and returning the image used for social shares in the open graph data. | ||
## Insights | ||
A fully explorable report showing statistics about the number of failing or passing pages per snippet. *Think of it as your own customisable audit.* | ||
All built in snippets have been updated to support this feature (where it makes sense) and your own snippets can use this feature with ease. Any snippet returning a `pass` / `fail` string (or the new State property) can enable Insights with the new configuration helper or the `enable-insights` directive. | ||
## States | ||
You are now no longer limited to defining a snippet as passing by only returning a `pass` / `fail` string, with the new state property of the return object you can show the state alongside the data in the results table. | ||
```jsx | ||
/* horseman-config enable-state */ | ||
|
||
/* Using the new state property to send a fail state if num is more than 1 */ | ||
const num = 2 | ||
|
||
return { | ||
cell: num | ||
state: num > 1 ? 'fail' : 'pass' | ||
} | ||
``` | ||
## Deeper Explorations | ||
Explore additional data more easily by viewing it per snippet instead of just by page. Hover over the column header for a snippet with explore enabled to find the new option. | ||
What’s more, when in the additional table, hovering over the column headers you’ll then be able to filter the results by unique values. | ||
## Configuration Helper | ||
As the features of Horseman have grown trying to remember all the snippet directives can be a little tricky. A new configuration helper will toggle your `horseman-config` directives with no fuss. | ||
## Updated Chrome | ||
The version of Chromium used for crawling has been updated to v111 for a faster and more robust crawl. *My favourite new addition?* You can now use the `:has` selector in your snippets for more advanced element selection! | ||
## More Data Types | ||
Automatically split up CSV strings into tags with the new `data-type-tag`. | ||
Visualise colours directly in your crawl with `data-type-color` return any valid CSS colour value and render it as a coloured tag in the results table. | ||
## New Snippets | ||
*Over 40 new snippets have been added to the library!* Including ecommerce snippets which extract product information straight from the DataLayer, AI enhanced snippets to rewrite and analyse content, cookie consent platform discovery, and much more. | ||
## Updated Website | ||
I've given the website a small refresh as the tool get much closer to v1.0, check it out and download the latest version now. https://gethorseman.app/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
/* horseman-config enable-openai, enable-external */ | ||
|
||
/** | ||
* This is an example of intelligently extracting the text content from a page | ||
* (with readability.js) and using it with a prompt. | ||
*/ | ||
|
||
/* Import the readability content extraction package from skypsack */ | ||
const pkg = (await import('https://cdn.skypack.dev/@mozilla/readability')).default | ||
|
||
/* Create a clone of the document for readability to modify */ | ||
const documentClone = document.cloneNode(true); | ||
|
||
/* Remove any elements from the clone that we definitely don't want to extract content from */ | ||
['header', 'footer', 'nav'].forEach(selector => { | ||
documentClone.querySelectorAll(selector).forEach(element => { element.remove() }) | ||
}) | ||
|
||
/* Extract the page content from the clone using readability */ | ||
const content = (new pkg.Readability(documentClone).parse()).textContent | ||
|
||
/* Ask gpt-3.5-turbo to summarize the content */ | ||
return { | ||
prompt: \`Summarize the following content: \${ content }\` | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
/* horseman-config enable-openai */ | ||
|
||
/** | ||
* This is a simple example of using the raw HTML content with a prompt. | ||
*/ | ||
|
||
/* Fetch the raw HTML */ | ||
const html = document.documentElement.outerHTML | ||
|
||
/* Ask gpt-3.5-turbo to generate a new description */ | ||
return { | ||
prompt: \`Do something with the following HTML: \${ html } \` | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
/* horseman-config enable-openai */ | ||
|
||
/** | ||
* This is a simple example of using the raw text content with a prompt. | ||
*/ | ||
|
||
/* Fetch the raw text */ | ||
const html = document.body.innerText | ||
|
||
/* Ask gpt-3.5-turbo to generate a new description */ | ||
return { | ||
prompt: \`Do something with the following text: \${ html } \` | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
/* horseman-config enable-openai */ | ||
|
||
/** | ||
* This is a simple example of using the content from a single HTML element with a prompt. | ||
*/ | ||
|
||
/* Fetch the meta description */ | ||
const meta = document.querySelector('meta[name="description"]')?.getAttribute('content') | ||
|
||
/* Skip the snippet if the page doesn't have a meta description */ | ||
if (!meta) return | ||
|
||
/* Ask gpt-3.5-turbo to generate a new description */ | ||
return { | ||
prompt: \`Rewrite the following as a short 160 character poem: \${ meta } \` | ||
} |