Crawlers appear to be failing: items in /search/ are out of date #58

Open
ghost opened this issue Sep 3, 2014 · 10 comments

@ghost

ghost commented Sep 3, 2014

We've now received three separate reports of this: @dirkbaechle has noted that his SCons bugs are not yet imported; I've noticed that the OpenHatch move from Roundup to GitHub Issues isn't reflected in /search/; and we got an email report from Michael Crusoe at Khmer that their bugs haven't been imported yet despite setting things up in /customs/.

Given the dates of the various reports and the bugs viewable in /search/, I think our bug crawler has been failing for at least a month now. There is a request in #1010 to make the last crawl status per project viewable on /customs/, but really, we need to get things back up to date and figure out what's wrong before we start adding new features.

I've scheduled some time for @paulproteus and me to take a look at this over the weekend. Hopefully we can get this fixed soon.

@ehashman

ehashman commented Sep 6, 2014

http://inside.openhatch.org/crawl-logs/scrapy.2014-09-06.bkCH.log

This terrible bug-crash-death thing may have something to do with it.

@paulproteus

I'm fairly sure the problem was that in our crawl script, curl would wait one minute for openhatch.org to generate a bugimporters configuration file, and then time out.

I'm not sure whether CloudFlare was making this more interesting.

But anyway, it works now.

But it'll likely fail tomorrow in the same way, because the caches will be cold. Generating this bugimporters configuration file is extremely slow, and unless we make changes to oh-mainline it's going to remain really slow, so it's going to time out periodically.

We have some retry logic in oh-mainline:run_bugimporters.sh (iirc) that retries if curl fails due to a network error. We could use similar retry logic later in the process, and that would probably save us.
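
For illustration only, here's a minimal sketch of what retry-with-a-longer-timeout logic around the configuration fetch could look like. The URL, attempt count, and timeout values are all placeholders, not what run_bugimporters.sh actually does:

```python
import time
import urllib.request

CONFIG_URL = "https://openhatch.org/path/to/bugimporters-config.yaml"  # placeholder URL
MAX_ATTEMPTS = 3          # assumed retry budget
TIMEOUT_SECONDS = 300     # give the server well over the one minute curl allowed

def fetch_config():
    """Fetch the bugimporters configuration, retrying on network errors and timeouts."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            with urllib.request.urlopen(CONFIG_URL, timeout=TIMEOUT_SECONDS) as resp:
                return resp.read()
        except OSError as exc:  # URLError and socket timeouts both subclass OSError
            print("attempt %d failed: %s" % (attempt, exc))
            if attempt < MAX_ATTEMPTS:
                time.sleep(30)  # back off before retrying
    raise RuntimeError("could not fetch the bugimporters configuration")
```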

@ehashman

ehashman commented Sep 6, 2014

Sometimes we get a CloudFlare error page instead of a YAML response; we could make sure we never feed that to the parser by checking that the first line of the file isn't `<!DOCTYPE html>` or similar.
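
Something along these lines, say (the file name and function name are just illustrative):

```python
def looks_like_html(path):
    """Return True if the downloaded file starts like an HTML page rather than YAML."""
    with open(path, "rb") as f:
        first_line = f.readline().lstrip().lower()
    return first_line.startswith(b"<!doctype") or first_line.startswith(b"<html")

if looks_like_html("config.yaml"):  # "config.yaml" is a stand-in for the downloaded file
    raise SystemExit("got an HTML error page instead of YAML; skipping this crawl")
```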

@dirkbaechle

Just an idea: maybe wrapping the run_importer.sh script and calling it explicitly for each tracker id would help? I'm not sure I've really got the big picture of your current bug-importing toolchain, but to me it looks as if the underlying database on openhatch.org is the bottleneck when it tries to compile the info for all known bug trackers at once...
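
Roughly something like this, assuming there is some way to enumerate tracker ids (the script name, ids, and arguments here are guesses, not how oh-mainline actually works):

```python
import subprocess

tracker_ids = ["101", "102", "103"]  # hypothetical tracker ids

for tracker_id in tracker_ids:
    # Run the import for one tracker at a time so a single slow or failing
    # tracker doesn't take down the whole crawl.
    result = subprocess.run(["./run_importer.sh", tracker_id])
    if result.returncode != 0:
        print("tracker %s failed with exit code %d; continuing" % (tracker_id, result.returncode))
```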

@paulproteus

@dirkbaechle, that is at least a semi-reasonable idea.

The downside is that we'd then have to make N requests to the API, which is kind of tragic. But it's a pretty reasonable idea.

We could also use a Python profiler and figure out exactly what makes the call to this API endpoint slow.
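
For example, wrapping the slow code path in cProfile would show where the time goes; `generate_config()` below is just a stand-in for whatever view or helper actually builds the bugimporters YAML:

```python
import cProfile
import pstats

def generate_config():
    ...  # stand-in for the real, slow code path on openhatch.org

profiler = cProfile.Profile()
profiler.enable()
generate_config()
profiler.disable()

# Print the 20 functions with the largest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```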

@paulproteus

Also, let me say that it is great to see you here, @dirkbaechle !

@dirkbaechle

Sure, I just saw Elana's and your comments and am very interested in this issue... still trying to get things going for the SCons bug tracker. ;)
But I can't really help here, can I?

@ehashman

ehashman commented Sep 6, 2014

I think this is fixed now. @paulproteus manually started a scrape (or three) today, and in the future, things should run happily on cron. @dirkbaechle, let's check things tonight or tomorrow and ensure everything is working!

ehashman closed this as completed Sep 6, 2014
@dirkbaechle

Okay, and thanks for the info.

@ehashman

ehashman commented Sep 9, 2014

I'm going to reopen this, as crawlers continue to fail.

@paulproteus, can you take a look at this log (warning: 2.5 MB; a crawl I ran manually with run_importers.sh, mostly successful but with some errors), as well as this one and this one (automated crawls that failed)?
