Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script module data implementation #61658

Merged
merged 12 commits into from
May 30, 2024
91 changes: 91 additions & 0 deletions lib/experimental/script-modules.php
Original file line number Diff line number Diff line change
Expand Up @@ -197,3 +197,94 @@ function gutenberg_dequeue_module( $module_identifier ) {
_deprecated_function( __FUNCTION__, 'Gutenberg 17.6.0', 'wp_dequeue_script_module' );
wp_script_modules()->dequeue( $module_identifier );
}


/**
* Print data associated with Script Modules in Script tags.
*
* This embeds data in the page HTML so that it is available on page load.
*
* Data can be associated with a given Script Module by using the
* `scriptmoduledata_{$module_id}` filter.
*
* The data for a given Script Module will be JSON serialized in a script tag with an ID
* like `wp-scriptmodule-data_{$module_id}`.
*/
function gutenberg_print_script_module_data(): void {
$get_marked_for_enqueue = new ReflectionMethod( 'WP_Script_Modules', 'get_marked_for_enqueue' );
$get_import_map = new ReflectionMethod( 'WP_Script_Modules', 'get_import_map' );
Comment on lines +214 to +215
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the best way to access the necessary data right now, these methods are private.

It's easier in the Core PR (WordPress/wordpress-develop#6433) where the action can invoke a class method.

Copy link
Member

@gziolo gziolo May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://core.trac.wordpress.org/ticket/60597 - they should be exposed as public methods as explained by @johnbillion. It's exactly what you encountered when trying to enhance the current implementation.

See also: https://core.trac.wordpress.org/ticket/60597#comment:9

Alrighty let's bump this. I should be able to work around this via the magic of the reflection API for now.

So I guess, it's a way to go here as well for now.


$modules = array();
foreach ( array_keys( $get_marked_for_enqueue->invoke( wp_script_modules() ) ) as $id ) {
Copy link
Contributor

@ajlende ajlende May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting an error on this line. Seems it's because I'm running PHP 7.4. We support down to 7.2 according to #60680.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll prepare a fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$modules[ $id ] = true;
}
foreach ( array_keys( $get_import_map->invoke( wp_script_modules() )['imports'] ) as $id ) {
$modules[ $id ] = true;
}

foreach ( array_keys( $modules ) as $module_id ) {
/**
* Filters data associated with a given Script Module.
*
* Script Modules may require data that is required for initialization or is essential to
* have immediately available on page load. These are suitable use cases for this data.
*
* This is best suited to a minimal set of data and is not intended to replace the REST API.
*
* If the filter returns no data (an empty array), nothing will be embedded in the page.
*
* The data for a given Script Module, if provided, will be JSON serialized in a script tag
* with an ID like `wp-scriptmodule-data_{$module_id}`.
*
* The dynamic portion of the hook name, `$module_id`, refers to the Script Module ID that
* the data is associated with.
*
* @param array $data The data that should be associated with the array.
*/
$data = apply_filters( "scriptmoduledata_{$module_id}", array() );

if ( is_array( $data ) && ! empty( $data ) ) {
/*
* This data will be printed as JSON inside a script tag like this:
* <script type="application/json"></script>
*
* A script tag must be closed by a sequence beginning with `</`. It's impossible to
sirreal marked this conversation as resolved.
Show resolved Hide resolved
* close a script tag without using `<`. We ensure that `<` is escaped and `/` can
* remain unescaped, so `</script>` will be printed as `\u003C/script\u00E3`.
*
* - JSON_HEX_TAG: All < and > are converted to \u003C and \u003E.
* - JSON_UNESCAPED_SLASHES: Don't escape /.
*
* If the page will use UTF-8 encoding, it's safe to print unescaped unicode:
*
* - JSON_UNESCAPED_UNICODE: Encode multibyte Unicode characters literally (instead of as `\uXXXX`).
* - JSON_UNESCAPED_LINE_TERMINATORS: The line terminators are kept unescaped when
* JSON_UNESCAPED_UNICODE is supplied. It uses the same behaviour as it was
* before PHP 7.1 without this constant. Available as of PHP 7.1.0.
*
* The JSON specification requires encoding in UTF-8, so if the generated HTML page
* is not encoded in UTF-8 then it's not safe to include those literals. They must
* be escaped to avoid encoding issues.
*
* @see https://www.rfc-editor.org/rfc/rfc8259.html for details on encoding requirements.
* @see https://www.php.net/manual/en/json.constants.php for details on these constants.
* @see https://html.spec.whatwg.org/#script-data-state for details on script tag parsing.
*/
$json_encode_flags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS;
if ( 'UTF-8' !== get_option( 'blog_charset' ) ) {
$json_encode_flags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES;
}

wp_print_inline_script_tag(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the advantage of using a script tag per module? That's closer to what we existed before, but it also means that a similar logic would need to be present to every possible module that wants to consume some data. What if several modules need the same data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question and it came up on the proposal. I did share some ideas there, but I'm going to do some profiling so I can share some numbers and better understand the tradeoffs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two main thoughts about this, one is related to usage and the interface, the other is related to performance. I'll break them into comments, here's the usage/interface bit:

What if several modules need the same data?

If we look at how inline scripts are used to provide data to Core Scripts, it doesn't seem like there's a lot of overlap. If things are well encapsulated and provide the functionality we need, we don't need to send the same data to many scripts. apiFetch takes care of the API, other scripts use apiFetch, and internally they don't need to worry about the REST URL because apiFetch deals with that.

If a lot of different things need to know about some piece of data, maybe they should be getting it from the REST API.

Finally, if we reach a point where a lot of different modules really do need to access the same data, and that data needs to be immediately available, is essential, or it is very specific to the current page view, then we could do that within this model pretty easily by a "provider module". Consider:

add_filter( 'scriptmoduledata_@package/data-provider', function ( $data ) { $data[ 'veryInteresting' ] = 'Indeed!'; return $data } );
// @package/data-provider
let data = null
try {
  const text = dom.getElementById( 'wp-scriptmodule-data_@wordpress/interactivity' )?.textContent;
  if ( text ) {
    data = JSON.parse( text );
  }
} catch {}

export { data };

// @package/my-other-module
import { data } from '@package/data-provider';
if ( data?.veryInteresting ) {
  console.log( "Interesting? %s", data?.veryInteresting ); // "Interesting? Indeed!"
}

Here the data provider module can parse the data once and make it available to other scripts that depend on it. A single module is responsible for reaching into the DOM element, parsing the data, heck maybe it parses it with a schema 🤷 , but that module is responsible for dealing with the data from the server and providing it to other modules. This also provides a nice, single filter for that data that is sent into this module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part 2! I wanted to understand how the browser behaves differently in each of these cases. When there's very little data embedded in the page, it's unlikely to have any significant impact on the page.

I expected multiple script tags to perform better generally, but I was surprised by the results. I suspect there may be some unusual things going on with the large amount of memory in use.

In general, it seems like multiple script tags are parsed more quickly by the browser, the
individual JSONs are parsed faster, and even when parsing all of the JSON split up into multiple
script tags it outperforms a single script tag.

I'll do a part 3 with a smaller pages to see if these results are consistent.


I generated a few pages with the same data set either in a single script tag or divided in multiple script tags. The items in the data looked like this, each with 1000 properties:

{
  "878cac28b636400d": 0,
  "3dd5389abcb79f72": 1,
  "…": 1000
}

This was either in a single script tag or divided into multiple script tags:

<!-- single -->
<script id="single" type="application/json">
  {
    "1982e3a7bb241055": { "878cac28b636400d": 0, "…": 999 },
    "65cd25028f98f158": { "d0698444ed39c832": 0, "…": 999 }
    "…10,000 items": {}
  }
</script>

<!-- multiple -->
<script id="1982e3a7bb241055" type="application/json">
  { "878cac28b636400d": 0, "…": 999 }
</script>
<script id="65cd25028f98f158" type="application/json">
  { "d0698444ed39c832": 0, "…": 999 }
</script>
<script id="…10,000 tags" type="application/json">
  {}
</script>

The HTML was otherwise minimal. The pages contained a script module tag that would simulate getting embedded data from a script tag.

Some stats on the pages:

Scripts JSON total bytes HTML total bytes Script tags
Single 229,110,001 229,110,626 1
Multiple 228,897,112 229,550,577 1,000
difference -212,889 +439,951 +999

These were plain HTML pages stored on my computer and served via php -S localhost:9090.

I opened Chrome and did some page load performance profiling with a 6x CPU slowdown. I expected multiple to outperform single, but I was surprised by the results when I started testing so I added multiple-all to try to

  • Single - load the page, find the DOM element, parse its JSON and find a value (2 levels deep).
  • Multiple - load the page, find the DOM element, parse its JSON and find a value (1 levels deep).
  • Multiple-all - load the page, find all the script[type="application/json"] DOM elements, parse
    the JSON for each, get it's first key and add it to an array.
Scripts Network FCP DCL parse
Single 23.98s 27.25s 49.36s 21.45s
Multiple 5.711s 5.77s 5.77s 1.27ms
Multiple-all 5.681s 5.74s 23.04s ≈ 17.3s

I'm very surprised about the Network time, I'm not sure why the network request takes so long for
the single page. I tried several different servers, but it was consistent.

DCL (DOMContentLoaded) first after the module has executed, so at that point all parsing is
complete. We can clearly see that parsing all the data from a single JSON is very costly. I did not
calculate the sum of all the parse time on multiple-all, but I was surprised to see that the script
finishes much faster than the single script (23.04s - 5.74s ≈ 17.3s vs 21.45s) when it's doing more
work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a much smaller amount of total JSON, I observe similar results. This time with 10 objects of 1000 elements, ~228910 bytes of JSON. (network was negligible here)

Scripts FCP DCL parse
Single 91.0ms 54.1ms 12.38ms
Multiple 65.2ms 36.0ms 0.77ms
Multiple-all 89.8ms 53.0ms

Multiple, unsurprisingly, outperforms single. Multiple-all is similar to single but it still seems to perform better.

It seems clear to me that multiple script tags are the way to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent analysis. It all makes perfect sense now that you shared the summary of your findings. In particular having multiple JSON objects optimizes the access as it only needs to process the data that is necessary at a given time. I also now remember that there were some performance issues with parsing serialized very large JSON objects. That is probably reflected in multiple-all scenario.

What was again the reasoning behind using the script tag to share data? I was wondering if we could use instead the data attribute on the script tag since it has to be parsed anyway on-demand. We have data to support the
reasoning of being in favor the approach where we expose the data scoped for every individual ES Module. So essentially, I would be curious to see how consolidation of exposed data on the connected script tag would impact the result - smaller HTML size? Easier printing by connecting with a single script tag?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized one thing after some more thinking. When the module script is only listed in the import map, we don't print the script tag. So we rather should discard this idea, as it would only work better in the scenario when the script tag gets printed, but it would make it all more confusing otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think script tags are a better choice than attributes. The contents of these script tags are practically ignored by the browser, whereas attributes will be subject to more parsing rules. This makes the encoding/escaping more difficult and likely implies more work for the browser.

But the big thing is that there's no obvious place that attribute should go. Modules may appear in their own script tags (if enqueued) or in the importmap (if they're dependencies) or in both places. It's not clear which of those places they should go. Given that, I think there would probably need to be some other tag for this and <script type="application/json"> seems largely ideal to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised at some of the perf findings and asked @sgomes to check my findings. In summary, it looks like the performance differences I was finding were not as drastic as it seems, this was likely due to some deoptimizations from the Chrome devtools.

With some adjustments to the benchmarks and other testing methodology, results are much closer. Now it looks like multiple script tags and parsing all of the script tags is about ~6% slower than parsing a single big script tag. This is more in line with what we might expect. This is 6% slower on a very fast operation and not likely cause for concern (a is single, b is multi-all):

Test results for old version (http://127.0.0.1:38080/a.html)
Mean: 278.28ms
Standard deviation: 4.02ms
Slowest measurement: 289.7ms
Fastest measurement: 272.1ms

Test results for new version (http://127.0.0.1:38080/b.html)
Mean: 296.2ms
Standard deviation: 3.25ms
Slowest measurement: 303ms
Fastest measurement: 290.4ms


The new version appears to be SLOWER than the old version.
The new version appears to be 6.44% slower (takes 106.44% of the time of the old version).

Other reasons remain to prefer multiple script tags. For example, it's unlikely that all the data for all the modules will be used immediately on page load, in which case it makes sense to only parse the desired data and not parse data we're not interested in over and over.

wp_json_encode( $data, $json_encode_flags ),
array(
'type' => 'application/json',
'id' => "wp-scriptmodule-data_{$module_id}",
)
);
}
}
}

add_action( 'wp_footer', 'gutenberg_print_script_module_data' );
add_action( 'admin_print_footer_scripts', 'gutenberg_print_script_module_data' );
16 changes: 8 additions & 8 deletions packages/interactivity/src/store.ts
Original file line number Diff line number Diff line change
Expand Up @@ -319,15 +319,15 @@ export function store(
}

export const parseInitialData = ( dom = document ) => {
const storeTag = dom.querySelector(
`script[type="application/json"]#wp-interactivity-data`
);
if ( storeTag?.textContent ) {
const jsonDataScriptTag =
// Preferred Script Module data passing form
dom.getElementById( 'wp-scriptmodule-data_@wordpress/interactivity' ) ??
// Legacy form
dom.getElementById( 'wp-interactivity-data' );
sirreal marked this conversation as resolved.
Show resolved Hide resolved
if ( jsonDataScriptTag?.textContent ) {
try {
return JSON.parse( storeTag.textContent );
} catch ( e ) {
// Do nothing.
}
return JSON.parse( jsonDataScriptTag.textContent );
} catch {}
}
return {};
};
Expand Down
Loading