Playwright - How to combine the adblocker with other, simpler blocking rules? #2333
Replies: 1 comment · 4 replies
-
Hi @paul-norman, In general it is a bit tricky to combine the […]

    const { chromium } = require('playwright-chromium');
    const { PlaywrightBlocker } = require('@cliqz/adblocker-playwright');
    const fetch = require('cross-fetch');

    const browser = await chromium.launch({ headless: true });
    const context = await browser.newContext();
    const page = await context.newPage();

    // Comment this block and images can be blocked...
    PlaywrightBlocker.fromPrebuiltAdsAndTracking(fetch).then((blocker) => {
      // The calls are chained, so there is no semicolon between them.
      blocker
        .blockFonts()
        .blockImages()
        .enableBlockingInPage(page);
    });

    await page.goto('https://duckduckgo.com/?q=butterflies', { waitUntil: 'load' });

The above should work (I did not test it). There are a few more methods like […]
-
Thanks for your reply, Rémi! Very useful! I have tried what you suggested, and it works for some parts but not for others. I ran it against my test case to see what was going on, and logged the completed requests for three cases (no request interception, adblocker enabled, naive resource type blocking):

    const { chromium } = require('playwright-chromium');
    const { PlaywrightBlocker } = require('@cliqz/adblocker-playwright');
    const fetch = require('cross-fetch');

    const browser = await chromium.launch({ headless: true });
    const context = await browser.newContext({
      geolocation: {
        latitude: 51.507602,
        longitude: -0.127816,
        accuracy: 10,
      },
      locale: 'en-GB',
      timezoneId: 'Europe/London',
      viewport: { width: 1344, height: 768 },
      ignoreHTTPSErrors: true,
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36',
    });
    const page = await context.newPage();

    let total_bytes = 0;
    let total_requests = 0;

    // Toggle these two flags to switch between the three test cases.
    const intercept_requests = true;
    const enable_adblocker = true;
    const resource_exclusions = ['image', 'font', 'stylesheet', 'media', 'texttrack', 'manifest', 'other'];

    if (intercept_requests) {
      if (enable_adblocker) {
        // Case 2: let the adblocker block the excluded resource types.
        PlaywrightBlocker.fromPrebuiltAdsAndTracking(fetch).then((blocker) => {
          if (resource_exclusions.includes('image')) {
            blocker.blockImages();
          }
          if (resource_exclusions.includes('font')) {
            blocker.blockFonts();
          }
          if (resource_exclusions.includes('stylesheet')) {
            blocker.blockStyles();
          }
          if (resource_exclusions.includes('script')) {
            blocker.blockScripts();
          }
          if (resource_exclusions.includes('media')) {
            blocker.blockMedias();
          }
          blocker.enableBlockingInPage(page);
        });
      } else {
        // Case 3: naive blocking by resource type only.
        page.route('**/*', route => {
          if (resource_exclusions.includes(route.request().resourceType())) {
            console.log('--route blocked: `' + route.request().url() + '` (' + route.request().resourceType() + ')');
            return route.abort();
          }
          return route.continue();
        });
      }
    }

    page.on('pageerror', (error) => {
      console.log('CHROME CONSOLE ERROR - ' + error.message);
    });

    // Log every completed request and keep running totals.
    page.on('requestfinished', async request => {
      const response = await request.response();
      const status = response.status();
      const sizes = await request.sizes();
      const bytes = (sizes.requestBodySize + sizes.requestHeadersSize + sizes.responseBodySize + sizes.responseHeadersSize);
      total_bytes += bytes;
      ++total_requests;
      console.log('--request finished: `' + request.url() + '` (' + status + ': ' + request.resourceType() + ' - ' + format_bytes(bytes) + ')');
    });

    await page.goto('https://duckduckgo.com/?q=butterflies', { waitUntil: 'load' });
    await page.waitForTimeout(3000);
    console.log('--COMPLETE: ' + total_requests + ' requests finished (' + format_bytes(total_bytes) + ')');

    // Format a byte count as a human-readable string, e.g. 1536 -> "1.5 KB".
    function format_bytes(bytes, decimals = 1) {
      if (bytes === 0) {
        return '0 Bytes';
      }
      const k = 1024;
      const sizes = ['Bytes', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'];
      const i = Math.floor(Math.log(bytes) / Math.log(k));
      decimals = decimals < 0 ? 0 : decimals;
      return parseFloat((bytes / Math.pow(k, i)).toFixed(decimals)) + ' ' + sizes[i];
    }

And the results (some lines reordered for legibility):
But for some reason this doesn't block the fonts or the CSS files that were requested to be blocked:
Sadly this doesn't block the tracking related parts:
Ideally, I'd still like to have a function within my control to block requests before they're initialised, because for many sites I might want to manually pass in a list of resources to block (e.g. certain scripts or images). Is there anything that can be done in this case?
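To make the request above concrete, here is a minimal sketch of a "function within my control": a predicate built from a list of resource types and URL patterns. The names `createRequestFilter` and `shouldBlock` are hypothetical and do not come from Playwright or the adblocker. The sketch covers only the manual blocking decision and does not solve combining it with the adblocker.

```javascript
// Hypothetical helper (not part of Playwright or @cliqz/adblocker-playwright).
// Builds a predicate that decides whether a request should be aborted.
function createRequestFilter({ resourceTypes = [], urlPatterns = [] } = {}) {
  const blockedTypes = new Set(resourceTypes);

  return function shouldBlock(resourceType, url) {
    // Block by resource type first (e.g. 'image', 'font').
    if (blockedTypes.has(resourceType)) {
      return true;
    }
    // Then block by URL: strings match as substrings, RegExps via test().
    // Avoid RegExps with the global flag here, since test() is stateful for them.
    return urlPatterns.some((pattern) =>
      typeof pattern === 'string' ? url.includes(pattern) : pattern.test(url)
    );
  };
}

const shouldBlock = createRequestFilter({
  resourceTypes: ['image', 'font'],
  urlPatterns: ['/tracker.js', /\.gif(\?|$)/],
});

// Possible use inside a Playwright route handler (left commented out so the
// sketch runs without a browser):
//
// await page.route('**/*', (route) => {
//   const request = route.request();
//   return shouldBlock(request.resourceType(), request.url())
//     ? route.abort()
//     : route.continue();
// });
```

Keeping the decision in a plain function means the per-site list can be tested without launching a browser, and the route handler stays a thin wrapper around it.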
-
It is weird that some of the resources are not blocked. If you want to know exactly how to replace the […]

There are multiple aspects to it, and currently it is not designed to be called with either custom request handlers or only the element hiding part, for example. If you only care about network blocking then it should be easier and you can replace […]

The code above should be all you need to use […]

Maybe one last note about this: "within my control to block requests before they're initialized because for many sites I might want to manually pass in a list of resources to block (e.g. certain scripts or images)". I think all of this should be possible to handle with the blocker engine itself, without any external logic, by creating new rules and updating the blocker with them. If you are interested in knowing more about it, I am happy to give you more pointers.
-
After having messed with this (and learned enough Node syntax to extend classes), it was very simple to replace the […]

    await PlaywrightBlocker.fromPrebuiltAdsAndTracking(fetch).then((blocker) => {

The README.md file should probably be updated to reflect this.

I would certainly be interested in being pointed in the right direction to add temporary rules to the adblocker engine for the duration of a single Playwright context (i.e. not applied to all future sessions). That would be very helpful indeed!
-
The following resources would be a good way to learn more about how to create your own rules: […]

And you can see which parts of the syntax are supported by the […]

Then you can always update the blocker instance with something like the following (rules are passed as an array of strings):

    blocker.updateFromDiff({
      added: ["/some/filter/$css"],
    });

The syntax of rules is pretty flexible, so in general you should be able to update the blocker to block or unblock specific requests instead of creating your own listener in Playwright.
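As a rough sketch of how the `updateFromDiff` suggestion could serve the "temporary rules" request earlier in the thread: the helper names `buildBlockRules` and `addTemporaryRules` below are hypothetical. The sketch also assumes that `updateFromDiff` accepts a `removed` array alongside `added`, and that the engine understands type options such as `script` and `image`. Neither is confirmed in this thread, so check the library documentation before relying on it.

```javascript
// Hypothetical helper: turn { pattern, types } entries into Adblock-style
// network filter strings such as "||example.com^$script,image".
function buildBlockRules(entries) {
  return entries.map(({ pattern, types = [] }) =>
    types.length > 0 ? pattern + '$' + types.join(',') : pattern
  );
}

// Hypothetical helper: add rules to a blocker and return a function that
// removes them again. ASSUMPTION: updateFromDiff accepts `removed` as well
// as `added`; only `added` is shown in the thread.
function addTemporaryRules(blocker, rules) {
  blocker.updateFromDiff({ added: rules });
  let active = true;

  return function removeTemporaryRules() {
    if (!active) {
      return; // Removing twice is a no-op.
    }
    active = false;
    blocker.updateFromDiff({ removed: rules });
  };
}

// Possible use with a real blocker (commented out so the sketch runs without
// Playwright installed):
//
// const rules = buildBlockRules([{ pattern: '||example.com^', types: ['script'] }]);
// const removeRules = addTemporaryRules(blocker, rules);
// try {
//   await page.goto('https://example.com/');
// } finally {
//   removeRules();
// }
```

Note that rules added this way apply to every page served by the same blocker instance while they are active. If that is too broad, creating a separate blocker instance per context may be a simpler alternative.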
-
Is it possible to use your own page.route rule to intercept requests while also using the adblocker?

For example, I would like to block all fonts and images because they are of no use to me (they simply consume bandwidth):

Unless the adblocker lines are commented out, the images all load (and in fact never pass through the page.route function).

I've only been using Playwright for a few days and am new to Node too, so it is quite possible that I am doing something fundamentally wrong, but there doesn't seem to be another way to abort requests in Playwright (whereas request.abort() in the page.on('request') handler could be used in Puppeteer).

Can anyone shed some light on this use case?

Thanks in advance, and thanks for making a great tool that can be plugged in so easily!