I suppose this is the end of packtpub-crawler? #70

Open
lucymhdavies opened this issue May 26, 2017 · 27 comments

Comments

@lucymhdavies

lucymhdavies commented May 26, 2017

https://www.packtpub.com/packt/offers/free-learning

[Screenshot: 2017-05-26 at 10:21:35]

@juzim
Contributor

juzim commented May 26, 2017

They have done this before as part of some A/B tests; hopefully they will revert it once the stats drop (I don't think people manually check the site every day).

Maybe we can contact them, since this script turns a daily chore into a pleasant experience and all their free books are already downloadable from other sources anyway.

But otherwise, we can't do much about it.

@deliussed

Oh no! I just started integrating this script with the packtpub Alexa skill yesterday! How frustrating!

@juzim
Contributor

juzim commented Jun 1, 2017

I have added the book title and the claim URL to the error messages; this way we can at least check whether the book is interesting enough to claim manually. #71

Still, this is a really stupid move. I immediately lost all interest in visiting packtpub :/

@lucymhdavies
Author

That's a useful feature at least. Shame we can't automatically claim them anymore :(

@lucymhdavies
Author

Going to close this, as #71 has now been merged.

@niqdev
Owner

niqdev commented Jun 2, 2017

I have created a new branch with a proposal, but I don't know if it is worth spending time on it.

I have fixed the claim. Looking at the docs, the recaptcha-token field should always be available in the page, but it needs to be validated by the client and can be used only once. If you solve the captcha manually and plug the token in here, you are able to download the book.
If you run the script with an invalid captcha, it will download the most recently claimed book under the wrong title.
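
One way to guard against that last failure mode is to compare the title parsed from the offer page with the title of the book that was actually fetched, and to skip saving on a mismatch. The sketch below is only an illustration in Python 3: `normalize_title` and `is_expected_book` are hypothetical helpers, not functions from packtpub-crawler.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Fold case and collapse whitespace so cosmetic differences are ignored."""
    folded = unicodedata.normalize("NFKC", title).casefold()
    return " ".join(folded.split())


def is_expected_book(expected_title: str, downloaded_title: str) -> bool:
    """Return True only when both titles refer to the same book.

    expected_title: title parsed from today's offer page (assumption).
    downloaded_title: title of the newest book in the account (assumption).
    """
    return normalize_title(expected_title) == normalize_title(downloaded_title)
```

A caller would run this check before writing the file to disk, so that a rejected captcha does not silently save yesterday's book under today's name.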

It would be interesting, just for fun, to try to decouple the claim from the rest, solving only the captcha via email 😊

By the way, this document (although I think it is already obsolete) is an alternative, but I don't think it should be the way to go 😞

@niqdev
Owner

niqdev commented Jun 20, 2017

Since we have duplicate issues (#75, #76) related to this one, I will re-open it.

The problem is related to the captcha, and the error looks like this:

[-] <type 'exceptions.IndexError'> list index out of range | spider.py@97
Traceback (most recent call last):
  File "script/spider.py", line 97, in main
    packtpub.runDaily()
  File "/home/ubuntu/Projects/github/packtpub-crawler/script/packtpub.py", line 161, in runDaily
    self.__parseDailyBookInfo(soup)
  File "/home/ubuntu/Projects/github/packtpub-crawler/script/packtpub.py", line 93, in __parseDailyBookInfo
    self.info['url_claim'] = self.__url_base + div_target.select('a.twelve-days-claim')[0]['href']
IndexError: list index out of range

There is a feature branch with a proposal, but it could be a black hole!
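
The IndexError in the traceback comes from indexing `[0]` into an empty result when the page no longer contains an `a.twelve-days-claim` link. Below is a minimal sketch of a lookup that returns `None` instead of raising. It is written in Python 3 with only the standard library, not with the BeautifulSoup `select` call the script uses; `ClaimLinkFinder`, `find_claim_url`, and the sample href in the usage note are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class ClaimLinkFinder(HTMLParser):
    """Remembers the href of the first <a> whose class list contains css_class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching anchor is of interest.
        if tag != "a" or self.href is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes:
            self.href = attr_map.get("href")


def find_claim_url(html: str, url_base: str,
                   css_class: str = "twelve-days-claim") -> Optional[str]:
    """Return the absolute claim URL, or None when the link is missing."""
    finder = ClaimLinkFinder(css_class)
    finder.feed(html)
    if finder.href is None:
        return None  # e.g. the link was replaced by a captcha-protected form
    return url_base + finder.href
```

For example, `find_claim_url('<a class="twelve-days-claim" href="/claim/1">x</a>', 'https://www.packtpub.com')` yields the joined URL, while a page without the link yields `None`, which the caller can turn into a readable "claim link not found" message.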

@develsites

develsites commented Jun 20, 2017

@niqdev the captcha really is the problem, and it still doesn't work. As one more option, maybe implement it as a two-step process that opens the page?

@niqdev
Owner

niqdev commented Jun 22, 2017

@develsites yep, that was the idea/proposal in the feature branch: a 2-step process, solving the captcha manually via email for example. Unfortunately, yes, at the moment the script is broken and we can't do much.
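
The first half of such a two-step process could look roughly like this: when the claim cannot be completed automatically, build a notification email carrying the book title and claim URL so the captcha can be solved by hand. This is a Python 3 sketch using the standard `email.message.EmailMessage`; the function name and its parameters are placeholders, and actually sending the message (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_claim_email(title: str, claim_url: str,
                      sender: str, recipient: str) -> EmailMessage:
    """Build (but do not send) a reminder to claim today's book manually."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Packt free book needs a manual claim: " + title
    msg.set_content(
        "The crawler could not claim today's book automatically.\n"
        "Title: " + title + "\n"
        "Claim URL: " + claim_url + "\n"
        "The captcha has to be solved in a browser.\n"
    )
    return msg
```

The second step, picking up the solved token and finishing the download, is the part the thread describes as unsolved, so it is not sketched here.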

@lucymhdavies
Author

Honestly, if you have to solve the captcha manually anyway, then you may as well just go to https://www.packtpub.com/packt/offers/free-learning and claim it manually.

Packtpub-crawler is still useful for notifying what the latest book is though :)

@Nightreaver

I had no captcha today... is it an error or did they remove it? Claiming still worked.

@niqdev
Owner

niqdev commented Jul 17, 2017

Hmm, something changed for sure: the reCAPTCHA moved to the bottom-right of the page.
Were you able to download the book with the script?

@tpoindessous

tpoindessous commented Jul 17, 2017 via email

@brechtm

brechtm commented Jul 18, 2017

The CAPTCHA has not yet returned, but the script fails to claim the book with IndexError('list index out of range',).

@Nightreaver

Nightreaver commented Jul 24, 2017

Yeah, you don't have to "do" anything for the captcha to work... maybe it detects the browser or something?
For me, using Chrome, it just works: no box, nothing. But blocking Google prints the error "no captcha" or whatever.

Is it a new kind of captcha from Google?

"Invisible reCAPTCHA" - https://developers.google.com/recaptcha/docs/invisible

@juzim
Contributor

juzim commented Jul 25, 2017 via email

@luk6xff

luk6xff commented Aug 22, 2017

Hello, we have managed to solve the captcha and get my grabber script working. You can use the same solution or check mine at: https://github.com/igbt6/Packt-Publishing-Free-Learning
Regards!

@niqdev
Owner

niqdev commented Aug 23, 2017

@igbt6 That's awesome, thanks a lot for sharing with us!

@katka-n

katka-n commented Oct 6, 2017

@niqdev I managed to get my Packt grabber working by using Selenium in headless mode AND setting useragent to Chrome (default for headless Chrome is, if I recall correctly, WebdriverChrome).
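
A sketch of the browser settings described above, in Python 3: it only builds the Chrome command-line switches (`--headless` and `--user-agent=...`) so it runs without Selenium installed. As far as I know, headless Chrome's default user agent contains the string "HeadlessChrome", which is what the override hides. `DEFAULT_UA`, `chrome_arguments`, and `looks_headless` are hypothetical names, and the user agent string is just an example of a desktop Chrome value.

```python
from typing import List

# Example desktop Chrome user agent (assumption; substitute a current one).
DEFAULT_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
)


def chrome_arguments(user_agent: str = DEFAULT_UA, headless: bool = True) -> List[str]:
    """Return the Chrome switches for a headless run with a custom user agent."""
    args = []
    if headless:
        args.append("--headless")
    args.append("--user-agent=" + user_agent)
    return args


def looks_headless(user_agent: str) -> bool:
    """True when the user agent still advertises headless Chrome."""
    return "HeadlessChrome" in user_agent
```

With Selenium, each returned string would be passed to `ChromeOptions.add_argument` before the driver is created.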

@niqdev
Owner

niqdev commented Oct 6, 2017

@katka-n great! is it easy to integrate with the current project?

@niqdev
Owner

niqdev commented Oct 6, 2017

@Hacktoberfest Anyone interested in integrating Anti Captcha or other solutions? Thanks

@katka-n

katka-n commented Oct 6, 2017

@niqdev I am not that experienced, but I will try to do so; if I succeed, I will create a pull request ;)

Update: I got the basic downloading to the user's account working, but the script stops at downloading a file to the drive.

@tjnel

tjnel commented Oct 26, 2017

Here is a Python solution for the reCAPTCHA: https://github.com/ecthros/uncaptcha

@niqdev
Owner

niqdev commented Oct 27, 2017

Thanks @tjadanel, any interest in integrating it?

@justingiffard

I see that they have removed the reCAPTCHA badge from the site? Could this mean that reCAPTCHA has been removed?
I tried running the script and got "list index out of range", which means either that reCAPTCHA is still in place or that the structure of the site has changed. I will investigate, though. If you don't hear from me, either I haven't gotten anywhere or reCAPTCHA is still in place.

@luk6xff

luk6xff commented Jan 21, 2018

@justingiffard Packt still uses reCAPTCHA; they just switched to the so-called invisible reCAPTCHA. Use my script instead: https://github.com/igbt6/Packt-Publishing-Free-Learning which will do the work for you ; )

@justingiffard

@igbt6 thanks but you make use of a service which is not free (albeit cheap)
