Improve CJK font detection #1641

Open
rviscomi opened this issue Dec 5, 2020 · 12 comments
Labels
analysis Querying the dataset question Further information is requested

@rviscomi
Member

rviscomi commented Dec 5, 2020

WebPageTest locally installs Noto fonts to better emulate Android devices, which come with them preinstalled. This means that analyzing network activity for fonts would miss Noto, because a locally resolved font never triggers a network request.

My knee-jerk reaction is that it would improve our font detection to ignore locally installed Noto at the WebPageTest level, so we can measure the number and size of font requests over the network. But it's more complicated than that. If we tested on native mobile hardware rather than emulated, using the locally installed fonts would be technically correct.

For example, if a website uses Helvetica, testing on Windows vs Mac would affect whether that font appears as a system vs web font, assuming it doesn't fall back to other system fonts like Arial or sans-serif.

This is something we should discuss and improve for 2021.

@rviscomi rviscomi added the analysis Querying the dataset label Dec 5, 2020
@rviscomi rviscomi added this to the 2020 Backlog milestone Dec 5, 2020
@rviscomi rviscomi added the question Further information is requested label Dec 5, 2020
@rsheeter

rsheeter commented Dec 5, 2020

Do we have enough data on @font-face use of local(), the breakdown of users by (browser, OS+version), and what system fonts exist on each OS to do some napkin math on how significantly we believe we might be skewing the result?

@rviscomi
Member Author

rviscomi commented Dec 6, 2020

> Do we have enough data on @font-face use of local()

Yes, we have more data than we know what to do with when it comes to stylesheet contents, and that's after having written 100+ queries for the CSS chapter! It's a bit tougher to extract granular data like local fonts within @font-face declarations, but possible.
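As a rough illustration of what extracting local fonts from @font-face declarations could look like, here is a minimal Python sketch. It is not the query HTTP Archive uses; the function name `local_fonts` and the regex-based parsing are assumptions made for illustration, and a real analysis would more likely rely on a proper CSS parser.

```python
import re

# Matches one @font-face block; assumes no nested braces inside it.
FONT_FACE_RE = re.compile(r"@font-face\s*\{([^}]*)\}", re.IGNORECASE)
# Matches local(Name), local("Name") or local('Name').
LOCAL_RE = re.compile(r"local\(\s*(['\"]?)([^'\")]+?)\1\s*\)", re.IGNORECASE)


def local_fonts(css: str) -> list[str]:
    """Return every font name referenced through local() in @font-face blocks."""
    names = []
    for block in FONT_FACE_RE.findall(css):
        # findall yields (quote, name) tuples because the pattern has two groups.
        names.extend(name.strip() for _, name in LOCAL_RE.findall(block))
    return names


# Hypothetical stylesheet used only to exercise the function.
sample_css = (
    '@font-face { font-family: "Noto Sans"; '
    'src: local("Noto Sans"), local(NotoSans-Regular), '
    'url(/fonts/noto.woff2) format("woff2"); }'
)
print(local_fonts(sample_css))
```

The regex approach ignores comments and any braces inside the block, which is part of why this kind of granular extraction is tougher than it first looks.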

> ...the breakdown of users by (browser, OS+version), and what system fonts exist on each OS to do some napkin math on how significantly we believe we might be skewing the result?

Not from HTTP Archive data (see the Methodology for more info). The dataset is based on the Chrome UX Report, which does include a coarse phone/tablet/desktop breakdown, but only includes usage from non-iOS Chrome browsers.

@rviscomi
Member Author

Since we're primarily interested in detecting web fonts by their network log, I'm inclined to explore the option of disabling the WPT functionality that emulates native mobile system fonts. This would arguably add unrealistic bytes and load time to the page, but I think the advantages outweigh that cost. @rsheeter do you agree with this approach?

@pmeenan how much flexibility do we have to turn off system fonts like Noto in WPT? Are there any other special-case fonts like it that may be skewing our font analysis?
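To make "detecting web fonts by their network log" concrete, here is a minimal Python sketch that counts font requests and bytes in a list of HAR-style entries. It assumes the standard HAR fields `request.url`, `response.content.mimeType` and `response.bodySize`; the helper names `is_font_request` and `font_bytes` and the extension list are illustrative choices, not part of WebPageTest or HTTP Archive.

```python
from urllib.parse import urlparse

# File extensions treated as fonts when the MIME type is not informative.
FONT_EXTENSIONS = (".woff2", ".woff", ".ttf", ".otf", ".eot")


def is_font_request(entry: dict) -> bool:
    """Guess whether a HAR entry is a font download."""
    mime = entry.get("response", {}).get("content", {}).get("mimeType", "").lower()
    # Covers font/woff2, application/font-woff, application/vnd.ms-fontobject, etc.
    if "font" in mime:
        return True
    path = urlparse(entry.get("request", {}).get("url", "")).path.lower()
    return path.endswith(FONT_EXTENSIONS)


def font_bytes(entries: list[dict]) -> int:
    """Sum response body sizes of font requests; unknown sizes (-1) count as 0."""
    return sum(
        max(e.get("response", {}).get("bodySize", 0), 0)
        for e in entries
        if is_font_request(e)
    )
```

A font that the browser resolves from a locally installed copy produces no entry at all, so a count like this is blind to it by construction.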

@rsheeter

I worry that disabling system font emulation entirely might change the results enough to matter. Rather than disabling it immediately, we should first try to estimate what impact it is having.

Or, perhaps even better, run an experiment where we gather a given run's data a second time with native font emulation disabled and see whether the results look alarmingly different?
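A sketch of how the two runs from such an experiment could be compared, assuming each run has already been reduced to a mapping from page URL to total font bytes. The function name `compare_runs`, the input shape, and the returned keys are all hypothetical.

```python
def compare_runs(baseline: dict[str, int], experiment: dict[str, int]) -> dict:
    """Compare per-page font bytes between two crawls of the same pages.

    baseline:   run with the system fonts installed (current setup).
    experiment: run with the system fonts removed.
    Only pages present in both runs are compared.
    """
    shared = baseline.keys() & experiment.keys()
    # Positive delta means the experiment downloaded more font bytes.
    deltas = {
        page: experiment[page] - baseline[page]
        for page in shared
        if experiment[page] != baseline[page]
    }
    return {
        "pages_compared": len(shared),
        "pages_changed": len(deltas),
        "share_changed": len(deltas) / len(shared) if shared else 0.0,
        "total_delta_bytes": sum(deltas.values()),
        "deltas": deltas,
    }
```

The share of pages whose font bytes change is probably the first number to look at: if it is small, the installed fonts are unlikely to be skewing the aggregate results much.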

@tunetheweb
Member

With Google Fonts dropping use of local(), it might not make as much difference as it used to...

@pmeenan
Member

pmeenan commented Apr 28, 2021

FWIW, it's not "emulation". The Noto fonts are installed on the VMs. The only way to disable them at the system level would be to uninstall them completely.

@rsheeter

> Google Fonts dropping use of local()

We still issue it in specific high-traffic cases, such as Roboto on Android.

@rsheeter

> The Noto fonts are installed on the VMs

If we install only Noto, as opposed to, say, the exact fonts available on some version of Android, that's likely going to over-represent Android's other system fonts. Less of an issue for iOS, as users don't usually fetch those fonts over the network.

Maybe we should back up and ask what environment the VM is meant to match. Initially I thought Android, but now I'm less sure.

@pmeenan
Member

pmeenan commented Apr 28, 2021

Specifically, the installed packages are:

ttf-mscorefonts-installer fonts-noto fonts-roboto fonts-open-sans

https://github.com/WPO-Foundation/wptagent-install/blob/master/debian.sh#L125

We test both desktop and mobile from the same VMs, so it is a mix of Windows, Android and CJK fonts. The goal at the time was to have a representative set of fonts that users in the relevant countries would likely have installed on their systems, so that we don't over-represent the font bytes downloaded when local fallbacks are used and frequently available.

@rsheeter

My gut reaction is that this sounds reasonable. We could try to have different VMs that install different fonts to approximate different environments, but I'm guessing that would be a significant nuisance.

It would be very interesting to know how much this is influencing the result. Can we tell from the archive data when a font resolves to a local font? If not, I suppose an experiment might be needed.

@rviscomi
Member Author

> Can we tell from the archive data when a font resolves to a local font?

This might be good enough for font-related analysis.

We could also scan all CSS for @font-face declarations but I'm not sure if that'd have too many false positives for sites that never use the font.
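One way to gauge the false-positive concern is to separate families that are only declared in @font-face from those that are also referenced by some `font-family` declaration elsewhere in the stylesheet. The Python sketch below does this with regexes; `declared_and_used` is a hypothetical helper, and it deliberately ignores the `font` shorthand, `!important`, comments, and whether any rendered text actually matches the rule.

```python
import re

# One @font-face block; assumes no nested braces inside it.
FONT_FACE_RE = re.compile(r"@font-face\s*\{[^}]*\}", re.IGNORECASE)
# The value of a font-family declaration, up to the next ';' or '}'.
FAMILY_RE = re.compile(r"font-family\s*:\s*([^;}]+)", re.IGNORECASE)


def _families(value: str) -> list[str]:
    """Split a font-family value into lowercase, unquoted family names."""
    return [
        part.strip().strip("'\"").lower()
        for part in value.split(",")
        if part.strip()
    ]


def declared_and_used(css: str) -> tuple[set[str], set[str]]:
    """Return (families declared via @font-face, subset referenced elsewhere)."""
    declared: set[str] = set()
    for block in FONT_FACE_RE.findall(css):
        for value in FAMILY_RE.findall(block):
            declared.update(_families(value))

    # Drop the @font-face blocks so only ordinary rules remain.
    rest = FONT_FACE_RE.sub("", css)
    referenced: set[str] = set()
    for value in FAMILY_RE.findall(rest):
        referenced.update(_families(value))

    return declared, declared & referenced
```

Families in the first set but not the second are candidates for the "declared but never used" false positives, though a reference in CSS still does not prove the font was applied to text on the page.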

@davelab6

@bramstein do you know the latest status on this? cc @charlesberret
