Improve CJK font detection #1641

rviscomi · 2020-12-05T20:47:02Z

WebPageTest locally installs Noto fonts to better emulate Android devices, which come with them preinstalled. This means that analyzing network activity for fonts would miss Noto.
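To make the mechanism concrete, here is a minimal Python sketch of how an `@font-face` src list resolves. It assumes a simplified model: `local()` matches by exact name against a set of installed fonts (real browsers match the font's full name or PostScript name), and the first `url()` entry is assumed to download successfully. The `resolve_font` helper, the font set, and the example.com URL are all hypothetical. The point is that the same stylesheet produces a network request on one machine and none on another.

```python
from typing import List, Optional, Set, Tuple

# One hypothetical @font-face src entry: ("local", font_name) or ("url", font_url).
Source = Tuple[str, str]


def resolve_font(sources: List[Source], installed: Set[str]) -> Tuple[str, Optional[str]]:
    """Walk a src list in order, roughly the way a browser does.

    Simplifications (assumptions): local() matches by exact name against
    `installed`, and the first url() entry is assumed to download successfully.
    """
    for kind, value in sources:
        if kind == "local" and value in installed:
            # Served from disk: nothing appears in the network log.
            return ("local", value)
        if kind == "url":
            # Fetched over the network: visible in the request log.
            return ("network", value)
    # No usable source: the browser falls back to the next font-family entry.
    return ("none", None)


src = [
    ("local", "Noto Sans CJK JP Regular"),
    ("url", "https://example.com/fonts/noto-sans-jp.woff2"),
]
test_vm = {"Noto Sans CJK JP Regular", "Roboto"}  # hypothetical VM font set

print(resolve_font(src, test_vm))  # ('local', 'Noto Sans CJK JP Regular')
print(resolve_font(src, set()))    # ('network', 'https://example.com/fonts/noto-sans-jp.woff2')
```

On the VM with Noto installed, the font never shows up in the request log, which is exactly the blind spot described above.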

My knee-jerk reaction is that it would improve our font detection to ignore locally installed Noto at the WebPageTest level, so we can measure the number and size of font requests over the network. But it's more complicated than that. If we tested on native mobile hardware rather than emulating it, using the locally installed fonts would be technically correct.

For example, if a website uses Helvetica, testing on Windows vs Mac would affect whether that font appears as a system vs web font, assuming it doesn't fall back to other system fonts like Arial or sans-serif.

This is something we should discuss and improve for 2021.

rsheeter · 2020-12-05T23:49:26Z

Do we have enough data on @font-face use of local(), the breakdown of users by (browser, OS+version), and what system fonts exist on each OS to do some napkin math on how significantly we believe we might be skewing the result?
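The napkin math could look something like the following Python sketch. It assumes a font that a page references with a `local()` source is downloaded only by users whose platform lacks it, so the population-average download is the font size times the share of views without it, while a single test VM records either all of the bytes or none. Every function name, platform share, and font set below is a made-up placeholder, not real usage data.

```python
from typing import Dict, Set


def installed_share(font: str, platform_share: Dict[str, float],
                    platform_fonts: Dict[str, Set[str]]) -> float:
    """Fraction of page views on platforms that ship `font` locally."""
    return sum(share for platform, share in platform_share.items()
               if font in platform_fonts.get(platform, set()))


def expected_bytes(font: str, size_bytes: int, platform_share: Dict[str, float],
                   platform_fonts: Dict[str, Set[str]]) -> float:
    """Average bytes downloaded per view if local() hits skip the download."""
    return size_bytes * (1.0 - installed_share(font, platform_share, platform_fonts))


def measured_bytes(font: str, size_bytes: int, vm_fonts: Set[str]) -> int:
    """What a single test VM records: all or nothing."""
    return 0 if font in vm_fonts else size_bytes


# Every number and name below is a made-up placeholder, not real usage data.
share = {"android": 0.6, "windows": 0.3, "mac": 0.1}
fonts = {"android": {"Noto Sans CJK"}, "windows": set(), "mac": set()}
vm = {"Noto Sans CJK"}
size = 1_000_000

exp = expected_bytes("Noto Sans CJK", size, share, fonts)
got = measured_bytes("Noto Sans CJK", size, vm)
print(f"expected ~{exp:.0f} B, measured {got} B, skew {got - exp:+.0f} B")
# expected ~400000 B, measured 0 B, skew -400000 B
```

With real inputs (`local()` usage from the CSS data, a browser/OS breakdown, and per-OS font lists), the same arithmetic summed over fonts would bound how far the VM's font set skews the totals.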

rviscomi · 2020-12-06T00:03:04Z

> Do we have enough data on @font-face use of local()

Yes, we have more data than we know what to do with when it comes to stylesheet contents, and that's after having written 100+ queries for the CSS chapter! It's a bit tougher to extract granular data like local fonts within @font-face declarations, but possible.
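As a rough illustration of pulling `local()` names out of `@font-face` declarations, here is a Python sketch. It uses naive regular expressions rather than a real CSS parser, so it assumes no comments, nested at-rules, or `url()` values containing `;` (such as `data:` URIs). The function and sample stylesheet are hypothetical and not the queries actually used for the CSS chapter.

```python
import re
from typing import List

# Naive regexes, not a CSS parser: they ignore comments, nested at-rules,
# and url() values that themselves contain ";" (e.g. data: URIs).
FONT_FACE_RE = re.compile(r"@font-face\s*\{([^}]*)\}", re.IGNORECASE)
SRC_RE = re.compile(r"(?:^|;)\s*src\s*:([^;]*)", re.IGNORECASE)
LOCAL_RE = re.compile(r"""local\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def local_fonts_in_css(css: str) -> List[str]:
    """Return every local() font name referenced inside @font-face src lists."""
    names: List[str] = []
    for block in FONT_FACE_RE.finditer(css):
        for src in SRC_RE.finditer(block.group(1)):
            names.extend(m.group(2).strip() for m in LOCAL_RE.finditer(src.group(1)))
    return names


sample_css = """
@font-face {
  font-family: "Noto Sans JP";
  src: local("Noto Sans CJK JP"), local(NotoSansCJKjp-Regular),
       url(/fonts/noto-sans-jp.woff2) format("woff2");
}
@font-face { font-family: Inter; src: url(/fonts/inter.woff2); }
body { font-family: "Noto Sans JP", sans-serif; }
"""

print(local_fonts_in_css(sample_css))  # ['Noto Sans CJK JP', 'NotoSansCJKjp-Regular']
```

Both quoted and unquoted `local()` arguments are handled; a production query would want a proper CSS parser to avoid the edge cases listed above.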

> ...the breakdown of users by (browser, OS+version), and what system fonts exist on each OS to do some napkin math on how significantly we believe we might be skewing the result?

Not from HTTP Archive data (see the Methodology for more info). The dataset is based on the Chrome UX Report, which does include a coarse phone/tablet/desktop breakdown, but only includes usage from non-iOS Chrome browsers.

rviscomi · 2021-04-28T04:06:51Z

Since we're primarily interested in detecting web fonts by their network log, I'm inclined to explore the option of disabling the WPT functionality that emulates native mobile system fonts. This would arguably add unrealistic bytes and load time to the page, but I think the advantages outweigh that. @rsheeter do you agree with this approach?

@pmeenan how much flexibility do we have to turn off system fonts like Noto in WPT? Are there any other special-case fonts like that which may be skewing our font analysis?

rsheeter · 2021-04-28T04:46:07Z

I worry that disabling system font emulation entirely might change the results enough to matter. That makes me think that, rather than immediately disabling it, we should try to estimate what impact the system font emulation is having.

Or, perhaps even better, run an experiment where we gather a given run's data a second time with native font emulation disabled and see if the results look alarmingly different?
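One way to summarize such an experiment is sketched below in Python. It assumes each run has been exported as a mapping from page URL to total font bytes seen in that page's network log; that export format and the `compare_runs` helper are hypothetical. Only pages present in both runs are compared, so crawl differences don't masquerade as font differences.

```python
from statistics import median
from typing import Dict


def compare_runs(baseline: Dict[str, int], no_local: Dict[str, int]) -> Dict[str, float]:
    """Compare per-page font bytes between a normal run and one without local fonts."""
    pages = sorted(baseline.keys() & no_local.keys())  # pages crawled in both runs
    deltas = [no_local[p] - baseline[p] for p in pages]
    changed = sum(1 for d in deltas if d != 0)
    return {
        "pages": len(pages),
        "changed_pages": changed,
        "changed_share": changed / len(pages) if pages else 0.0,
        "total_delta_bytes": sum(deltas),
        "median_delta_bytes": median(deltas) if deltas else 0.0,
    }


# Hypothetical per-page font bytes; "d" only appears in one run and is skipped.
baseline_run = {"a": 100_000, "b": 0, "c": 50_000}
no_local_run = {"a": 100_000, "b": 1_200_000, "c": 50_000, "d": 10}

print(compare_runs(baseline_run, no_local_run))
```

A small `changed_share` with a large `total_delta_bytes` would suggest a few CJK-heavy pages dominate the effect, which is the kind of "alarmingly different" signal the experiment is meant to surface.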

tunetheweb · 2021-04-28T07:09:39Z

With Google Fonts dropping use of local(), it might not make as much difference as it used to...

pmeenan · 2021-04-28T14:27:28Z

FWIW, it's not "emulation". The Noto fonts are installed on the VMs. The only way to disable them at the system level would be to completely uninstall them.

rsheeter · 2021-04-28T18:48:15Z

> Google Fonts dropping use of local()

We still issue it in specific high-traffic cases, such as Android Roboto.

rsheeter · 2021-04-28T18:51:44Z

> The Noto fonts are installed on the VM's

If we install only Noto, as opposed to, say, the exact fonts available on some version of Android, that's likely going to over-represent Android's other system fonts. This is less of an issue for iOS, as users don't usually fetch those fonts over the network.

Maybe we should back up and ask: what environment is the VM meant to match? Initially I thought Android, but now I'm less sure.

pmeenan · 2021-04-28T19:33:51Z

Specifically, the installed font packages are:

ttf-mscorefonts-installer fonts-noto fonts-roboto fonts-open-sans

https://github.com/WPO-Foundation/wptagent-install/blob/master/debian.sh#L125

We test both desktop and mobile from the same VMs, so it is a mix of Windows, Android and CJK fonts. The goal at the time was to have a representative set of fonts that users in the relevant countries would likely have installed on their systems, so we don't over-represent the font bytes downloaded when local fallbacks are used and frequently available.

rsheeter · 2021-04-29T04:08:16Z

My gut reaction is that sounds reasonable. We could try to have different VMs that install different fonts to approximate different environments but I'm guessing that would be a significant nuisance.

It would be very interesting to know how much this is influencing the result. Can we tell from the archive data when a font resolves to a local font? If not, I suppose an experiment might be needed?

rviscomi · 2021-04-29T21:27:36Z

> Can we tell from the archive data when a font resolves to a local font?

This might be good enough for font-related analysis.

We could also scan all CSS for @font-face declarations but I'm not sure if that'd have too many false positives for sites that never use the font.

davelab6 · 2024-09-23T20:28:41Z

@bramstein do you know the latest status on this? cc @charlesberret

rviscomi added the analysis (Querying the dataset) label Dec 5, 2020

rviscomi added this to the 2020 Backlog milestone Dec 5, 2020

rviscomi added the question (Further information is requested) label Dec 5, 2020
