
User-configurable deletions for content normalization #21

Open
ryanfb opened this issue Jan 19, 2017 · 3 comments

Comments

@ryanfb
Contributor

ryanfb commented Jan 19, 2017

It might be nice for users to be able to put an array of strings or regexes in config.yaml that can be used to normalize content before diffing.

For example, I could add 'Scroll down for video' as a deletion for dailymail_diff, or, with regexes, globemail_diff might be able to remove stock price changes.
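A minimal sketch of what such a normalization step could look like, assuming the deletion list has already been loaded from config.yaml. The `DELETIONS` name, the convention that plain strings are removed literally while `{"regex": ...}` entries are treated as regular expressions, and the `normalize` helper are all hypothetical; nothing here is diffengine's actual config format or API.

```python
import re

# Hypothetical deletion list, as it might be parsed from config.yaml.
# Plain strings are deleted literally; dicts with a "regex" key are
# treated as regular expressions. This layout is an assumption.
DELETIONS = [
    "Scroll down for video",
    {"regex": r"[+-]?\d+(\.\d+)?%"},  # e.g. percentage stock price changes
]


def normalize(text, deletions):
    """Remove every configured string or regex match before diffing."""
    for entry in deletions:
        if isinstance(entry, dict):
            text = re.sub(entry["regex"], "", text)
        else:
            text = text.replace(entry, "")
    # Collapse whitespace runs left behind by the deletions so they
    # don't show up as spurious changes in the diff.
    return re.sub(r"\s+", " ", text).strip()


print(normalize("Breaking news. Scroll down for video Shares rose 2.5% today.", DELETIONS))
```

Both the old and new versions of an article would go through `normalize` before comparison, so boilerplate that appears or disappears between fetches no longer registers as a change.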

Related to #10, there might be a tradeoff in where to put such an array in the YAML hierarchy. Putting it under a top-level key would mean less repetition for people using one config per news source, while putting it under each feed would allow people using one config for multiple news sources to set different deletions for each.
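One way to get both options is to allow the key at both levels and combine them per feed. The sketch below assumes a parsed config with a hypothetical top-level `deletions` list and an optional per-feed `deletions` list; the key names and the `deletions_for_feed` helper are illustrative only.

```python
# Hypothetical parsed config.yaml: shared top-level deletions plus
# optional feed-specific ones. Key names are assumptions.
CONFIG = {
    "deletions": ["Scroll down for video"],
    "feeds": [
        {"name": "Daily Mail"},
        {
            "name": "Globe and Mail",
            "deletions": [{"regex": r"[+-]?\d+(\.\d+)?%"}],
        },
    ],
}


def deletions_for_feed(config, feed):
    """Shared top-level deletions first, then any feed-specific ones."""
    shared = config.get("deletions", [])
    specific = feed.get("deletions", [])
    return list(shared) + list(specific)


for feed in CONFIG["feeds"]:
    print(feed["name"], deletions_for_feed(CONFIG, feed))
```

Concatenating the lists means a one-config-per-source user only ever writes the top-level key, while a multi-source user can add per-feed entries without repeating the shared ones. Letting a per-feed list replace the top-level one instead would be an equally plausible design.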

See also: #14

@ruebot
Member

ruebot commented Jan 19, 2017

Happens a lot for hockey scores on CBC and La Presse 😄

Might be kinda related to #7 too? Or that might just be TorStar's wretched "digital platform".

@edsu
Member

edsu commented Apr 4, 2017

I had to disable breitbart_diff because diffengine went crazy tweeting when they removed their email subscription link from the body of the story. So this is kind of an important feature to add.

@ruebot
Member

ruebot commented Apr 4, 2017

Yeah, canadaland_diff did something similar to that recently, and I have a lot of false positives.
