Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add date of last successfull rss parsing #582

Open
tdziurko opened this issue Feb 24, 2020 · 9 comments · May be fixed by #766
Open

Add date of last successfull rss parsing #582

tdziurko opened this issue Feb 24, 2020 · 9 comments · May be fixed by #766

Comments

@tdziurko
Copy link
Collaborator

Some RSS feeds are incorrect (not a valid RSS), some simply stops working over time (domain expiry, etc.) so we should be able to find such dead/invalid entries and investigate them to notify the author or disable them from the system.

So it would be nice to have a field with information when it was the last time we were able to parse given RSS feed successfully.

@tdziurko
Copy link
Collaborator Author

@mirogordon is on it :)

@malochak
Copy link

Is it still open?

@tdziurko
Copy link
Collaborator Author

Yes, you can work on it :)

@malochak
Copy link

@tdziurko I went through contributing, code-style importing, and initialisation related stuff.

I've started looking at the code, I think I found a place where the update should be done. However, I'm not sure if the solution is correct and some questions also appeared in my head :)

  1. I assume that BlogPostsFetcher#refreshPosts is the entry point. It might be called manually from the admin panel and it is also triggered by the scheduler in given time intervals.
  2. SyndFeedProducer#createFor is the place where the RSS feed is tried to be fetched by all available fetchers (HTTP, HTTPS, WGET...), then the XML is parsed. I guess that more than one fetcher can finish job successfully.
  3. So, in SyndFeedProducer#createFor I could do something like this (forgive me for any mistakes - it's my first contact with vavr):
public void checkForNewEntries(Blog person) {
  log.info("Checking for new entries at {} by {}.", person.getRss(), person.getAuthor());

  syndFeedFactory.createFor(person.getRss())
      .toJavaList()
      .stream()
      .peek(feed -> {
        log.info("For author {} found {} entries in rss feed", person.getAuthor(),
            feed.getEntries().size());
        feed.getEntries().forEach(post -> {
          blogPostService.addOrUpdate(new RssEntryWithAuthor(person, post));
        });
      }).findAny().ifPresent(feed -> {
        log.info("For author {} updated date last fetched to {}", person.getAuthor(),
            LocalDateTime.now());
        blogRepository.updateDateLastFetched(LocalDateTime.now(), person.getId());
      });
}
  1. We also need a new field that will keep information about last successful fetch
  2. Because of 4th point we need also DB migration - not sure how to prepare such, is src/main/resources/db/changelog the place that I should look at?
  3. unit tests obviously :)

@tdziurko
Copy link
Collaborator Author

Hello @malochak

  1. Assumption correct :)
  2. I don't think it could be executed more than once. Parsers are executed in order and next one is called if the previous ones haven't succeeded. If I am clearly mistaken, please let me know :)
  3. More or less something like this, but I think if your point 2 is incorrect, it will look a little bit different.
    4 and 5: Yes, new field on Blog entity I guess and changelog dir is the please you are looking for.
  4. Please, do unit test :) Can be Spock, can be new JUnit, up to you.

@malochak
Copy link

@tdziurko You're right about the parsers part. There're tests that confirms that. So it means that I don't need to modify so much SingleRssChecker, just need to put the update call at the end of processing.

@malochak
Copy link

malochak commented Apr 1, 2023

@tdziurko I cannot push my branch with fix

ERROR: Permission to jvm-bloggers/jvm-bloggers.git denied to malochak.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

@airborn
Copy link
Collaborator

airborn commented Apr 1, 2023

Please fork the repo, push to your own fork and create a pull request from there

@malochak
Copy link

malochak commented Apr 1, 2023

Please fork the repo, push to your own fork and create a pull request from there

Thank's, you're right. I've just cloned the repo. Never contributed to open source project, thought I'm just allowed to push a branch and then create PR 🤦‍♂️

malochak added a commit to malochak/jvm-bloggers that referenced this issue Apr 1, 2023
malochak added a commit to malochak/jvm-bloggers that referenced this issue Apr 1, 2023
@malochak malochak linked a pull request Apr 1, 2023 that will close this issue
malochak added a commit to malochak/jvm-bloggers that referenced this issue Jan 2, 2024
Bring back order of imports in Blog.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants