
Generating a filtered full text RSS feed

Irfan Charania edited this page Oct 7, 2015 · 11 revisions

Generating a filtered full-text RSS feed from an existing RSS feed

Problem: Your favourite website has a feed, but you only want to read certain types of posts, and the feed provides only summarized text.

Solution: Use Huginn to create an RSS feed that has filtered full-text content. The workflow to filter posts and fetch full text is as follows:

  1. RssAgent - to fetch and parse the existing RSS feed
  2. TriggerAgent - to filter the feed items
  3. WebsiteAgent - to fetch the full text for each feed item
  4. DataOutputAgent - to output the new RSS feed

The examples are based on the Adventures of Business Cat feed.

1. RssAgent

Name: Example RSS In

{
  "expected_update_period_in_days": "14",
  "clean": "false",
  "url": "http://www.businesscat.happyjar.com/feed/"
}
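
To make the first step concrete, here is a minimal Python sketch of what "fetch and parse" produces: one event per feed item. It is not Huginn code. The sample feed is made up, and the field names (`title`, `url`, `description`) are assumptions chosen to mirror the `{{title}}` and `{{url}}` placeholders used later on this page; the real agent's event layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Sample comic</title>
      <link>http://example.com/comic/sample/</link>
      <description>Short summary only.</description>
    </item>
    <item>
      <title>Sample news</title>
      <link>http://example.com/news/sample/</link>
      <description>Another summary.</description>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Turn each <item> of an RSS 2.0 document into a flat dict ("event")."""
    root = ET.fromstring(xml_text)
    events = []
    for item in root.iter("item"):
        events.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return events


events = parse_items(SAMPLE_FEED)
for event in events:
    print(event["title"], "->", event["url"])
```

Each dict stands in for one event handed to the next agent in the chain.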

2. TriggerAgent

Name: Example filter
Event sources: Example RSS In
Propagate immediately: Yes

{
  "expected_receive_period_in_days": "14",
  "keep_event": "true",
  "rules": [
    {
      "type": "regex",
      "value": ".*\\/comic\\/.*",
      "path": "url"
    }
  ]
}

Note: "keep_event": "true" makes the TriggerAgent re-emit the matching event's original payload, so the parsed feed item fields (such as title and url) reach the next agent.
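
To see which items the rule above lets through, the following Python sketch applies the same pattern to two made-up events. It only approximates the agent: Huginn evaluates the regex in Ruby and resolves `path` with its own path syntax, whereas this sketch uses Python's `re` module and a plain dictionary lookup. The event contents and the `matches` helper are hypothetical.

```python
import json
import re

# The rule exactly as written in the agent's JSON options.
# JSON decoding turns each "\\/" into "\/", an escaped slash in the regex.
rule = json.loads(r'{"type": "regex", "value": ".*\\/comic\\/.*", "path": "url"}')


def matches(event, rule):
    """Return True if the field named by rule["path"] matches the regex."""
    value = str(event.get(rule["path"], ""))
    return re.search(rule["value"], value) is not None


# Hypothetical events, for illustration only.
events = [
    {"title": "A comic", "url": "http://www.businesscat.happyjar.com/comic/example/"},
    {"title": "A news post", "url": "http://www.businesscat.happyjar.com/news/example/"},
]

# With keep_event enabled, a matching event is passed on unchanged.
kept = [event for event in events if matches(event, rule)]
for event in kept:
    print("kept:", event["title"])
```

Only events whose `url` contains `/comic/` survive the filter; everything else is dropped before the page fetch.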

3. WebsiteAgent

Name: Example page fetch
Event sources: Example filter
Propagate immediately: Yes

{
  "expected_update_period_in_days": "14",
  "url": "{{url}}",
  "type": "html",
  "mode": "merge",
  "extract": {
    "imgurl": {
      "css": "#comic img",
      "value": "@src"
    }
  }
}

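Note: "mode": "merge" merges the extracted values (here, imgurl) into the incoming event's payload, so the original feed item fields are passed on to the next agent along with them.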
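
The effect of merge mode can be pictured as a dictionary merge. The Python sketch below is an illustration, not Huginn's implementation: the `merge_mode` helper and the sample values are hypothetical, and it assumes that an extracted value takes precedence if a key exists on both sides.

```python
def merge_mode(incoming, extracted):
    """Combine the incoming event payload with the newly extracted values.

    Assumption: on a key collision the extracted value wins.
    """
    return {**incoming, **extracted}


# Hypothetical payload arriving from the filter step.
incoming = {
    "title": "A comic",
    "url": "http://www.businesscat.happyjar.com/comic/example/",
}

# Hypothetical result of the "#comic img" / "@src" extraction.
extracted = {"imgurl": "http://www.businesscat.happyjar.com/example.png"}

merged = merge_mode(incoming, extracted)
print(sorted(merged))
```

Without the merge, the next agent would receive only `imgurl` and would have nothing to fill `{{title}}` or `{{url}}` with.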

4. DataOutputAgent

Name: Example Rss out
Event sources: Example page fetch
Propagate immediately: Yes

{
  "secrets": [
    "examplerss"
  ],
  "expected_receive_period_in_days": "14",
  "template": {
    "title": "Business Cat full comic feed",
    "description": "This is a feed of recent Business Cat comics generated by Huginn",
    "item": {
      "title": "{{title}}",
      "description": "<img src=\"{{imgurl}}\" />",
      "link": "{{url}}",
      "pubDate": "{{pubDate}}"
    }
  }
}
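
As a rough picture of how the item template is filled in, the Python sketch below substitutes `{{name}}` placeholders from a merged event. Huginn uses the Liquid templating engine for this; the `render` helper here is a hypothetical stand-in that handles only simple placeholders, and the payload values are made up. Like Liquid, it renders a missing key as an empty string.

```python
import re

# The "item" section of the template above.
item_template = {
    "title": "{{title}}",
    "description": "<img src=\"{{imgurl}}\" />",
    "link": "{{url}}",
    "pubDate": "{{pubDate}}",
}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, payload):
    """Replace each {{name}} with payload[name], or "" if the key is absent."""
    def fill(text):
        return PLACEHOLDER.sub(lambda m: str(payload.get(m.group(1), "")), text)
    return {key: fill(value) for key, value in template.items()}


# Hypothetical merged event, for illustration only.
payload = {
    "title": "A comic",
    "url": "http://www.businesscat.happyjar.com/comic/example/",
    "imgurl": "http://www.businesscat.happyjar.com/example.png",
}

item = render(item_template, payload)
print(item["description"])
```

If a field comes out empty in the generated feed, a useful first check is whether the incoming events actually carry a key with that exact name.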