HTML is downloaded instead of PDF #1

Open
luckylittle opened this issue Feb 25, 2019 · 2 comments

@luckylittle
Owner

Some download links, seemingly at random, redirect to an HTML page instead of the PDF. It looks like protection against bots. The page contains:

Almost There!

Your Download of {{ refcardz.title }} is Pending
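A quick way to tell the two cases apart after a download could look like the sketch below. It assumes a valid PDF starts with the `%PDF-` signature and that the HTML page always contains the "Almost There!" heading quoted above; the function names are hypothetical and not part of this repository.

```go
package main

import "bytes"

// pdfMagic is the signature a PDF file is expected to start with.
var pdfMagic = []byte("%PDF-")

// looksLikePDF reports whether body starts with the PDF signature.
func looksLikePDF(body []byte) bool {
	return bytes.HasPrefix(body, pdfMagic)
}

// looksLikeInterstitial reports whether body contains the heading text
// quoted in this issue. Assumption: the heading is present on every
// such page; the markup around it is not known.
func looksLikeInterstitial(body []byte) bool {
	return bytes.Contains(body, []byte("Almost There!"))
}
```

Calling `looksLikePDF` on the first bytes of each saved file would be enough to flag downloads that need a retry.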

The fix is probably to extract the real Location of each PDF, for example:
https://dzone.com/storage/assets/2805-rc001-gwt_style_online.pdf

instead of downloading from the original URL:
https://dzone.com/asset/download/6
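A minimal sketch of that idea with plain `net/http`, assuming the download URL answers with an HTTP redirect whose `Location` header carries the storage URL (this issue does not confirm that). `resolveLocation` and `demoResolve` are hypothetical names, and `demoResolve` talks to a local stand-in server rather than dzone.com.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// resolveLocation requests rawURL without following redirects and returns
// the absolute URL taken from the Location header of the first response.
func resolveLocation(rawURL string) (string, error) {
	client := &http.Client{
		// Stop at the first response instead of following the redirect.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(rawURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode < 300 || resp.StatusCode > 399 {
		return "", fmt.Errorf("expected a redirect, got status %d", resp.StatusCode)
	}
	// Location resolves relative header values against the request URL.
	loc, err := resp.Location()
	if err != nil {
		return "", err
	}
	return loc.String(), nil
}

// demoResolve runs resolveLocation against a local stand-in server that
// mimics the assumed redirect, so the sketch runs without network access.
func demoResolve() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "https://dzone.com/storage/assets/2805-rc001-gwt_style_online.pdf", http.StatusFound)
	}))
	defer srv.Close()
	return resolveLocation(srv.URL + "/asset/download/6")
}
```

If the resolved URL ends in `.pdf` it can be fetched directly; anything else would still need the interstitial handling.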

luckylittle added the "invalid" label on Feb 25, 2019
luckylittle self-assigned this on Feb 25, 2019
luckylittle added the "bug" label and removed the "invalid" label on Feb 26, 2019
@luckylittle
Owner Author

An example of the interstitial page randomly encountered when trying downloads:
https://dzone.com/interstitial?asset=2686617&item=333545

Clicking "Skip to Download" sends a request to:
https://dzone.com/services/internal/action/campaigns-trackClick

The page sends a POST payload like this to continue:
{"item":333545,"type":"rejected","referral":"Web"}

There is also the https://dzone.com/services/internal/action/analytics-sendEvent URL, but I don't believe it is required. I will test that soon.
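For reference, the request above could be reproduced roughly as follows. This is only a sketch: the `item` value is assumed to be the `item` query parameter of the interstitial URL (they match in the example above), the `application/json` content type is a guess, and it is not established that this POST alone unlocks the download or which cookies or headers the endpoint expects. All names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// trackClick mirrors the JSON payload observed in the browser.
type trackClick struct {
	Item     int    `json:"item"`
	Type     string `json:"type"`
	Referral string `json:"referral"`
}

// itemFromInterstitial extracts the "item" query parameter from an
// interstitial URL such as https://dzone.com/interstitial?asset=...&item=...
func itemFromInterstitial(rawURL string) (int, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(u.Query().Get("item"))
}

// skipPayload builds the body sent when the download is "skipped to".
func skipPayload(item int) ([]byte, error) {
	return json.Marshal(trackClick{Item: item, Type: "rejected", Referral: "Web"})
}

// postSkip sends the payload to endpoint and treats any 2xx as success.
// Assumption: the endpoint accepts a bare JSON POST.
func postSkip(client *http.Client, endpoint string, item int) error {
	body, err := skipPayload(item)
	if err != nil {
		return err
	}
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("trackClick returned status %d", resp.StatusCode)
	}
	return nil
}
```

`postSkip(http.DefaultClient, "https://dzone.com/services/internal/action/campaigns-trackClick", item)` would be the call to try, followed by a retry of the original download URL.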

@luckylittle
Owner Author

After some further research, it's clear this will not be that easy. The Skip to Download button on the interstitial page actually executes JavaScript (e.g. https://dz2cdn2.dzone.com/storage/pub/11369473-combined.js):

<a href="#" ng-click="download(campaign.itemId, 'rejected')" class="ng-binding">Skip to Download</a>

The click function is:

c.on("click",function(a,c){b.$apply(function(){k(b,{$event:c||a})})});

Go Colly does not support JavaScript execution.

One option would be to use chromedp, whose examples show how to evaluate JavaScript and retrieve the result.

Investigating options.

luckylittle added a commit that referenced this issue Mar 12, 2019
luckylittle added a commit that referenced this issue Mar 12, 2019