Content filtering #170

jaspermdegroot · 2015-09-14T16:46:41Z

This PR replaces #144

peterwilsoncc · 2015-09-14T22:25:58Z

wp-tevko-responsive-images.php

-
-	remove_filter( 'editor_max_image_size', 'tevkori_editor_image_size' );
+function tevkori_filter_content_images( $content ) {
+	preg_match_all( '/src="([^"]*)"/', $content, $matches );


Is it possible to skip images with srcset in this line, regexp isn't my strong point sorry.

@peterwilsoncc

I am not a regex expert either, but I don't think they should be used to check if something is not there. If it's possible to write a regex like that it will probably not perform very well.

We should limit this to image elements because iframes also have a src attribute.

jaspermdegroot · 2015-09-15T09:22:18Z

wp-tevko-responsive-images.php

-		// Build the data-sizes attribute if sizes were returned.
-		$sizes = tevkori_get_sizes( $id, $size );
-		$sizes = $sizes ? 'data-sizes="' . $sizes . '"' : '';
+	$attachments = tevkori_attachment_urls_to_loop( $urls );


The URLs that we pass to the function contain the size suffix (eg. -300x200) if present. I think we have to strip that like we do in https://github.com/ResponsiveImagesCG/wp-tevko-responsive-images/pull/170/files#diff-517abe0a5b7ee2c55c6b5659d0c9350eR52 but would be nice if we don't have to do it at two places.

jaspermdegroot · 2015-09-15T10:38:51Z

wp-tevko-responsive-images.php

-		return;
-	}
+	foreach ( $paths as $k => $path ) {
+		$image_path = parse_url( $path );


Here we could probably add the check if the file path matches with the upload dir and skip if not, as suggested by @joemcgill in #144 (comment)

jaspermdegroot · 2015-09-15T10:40:35Z

All tests pass but for some reason it doesn't work if I test on a local WordPress installation. I don't get any errors, but no srcset and sizes attributes are added.

jaspermdegroot · 2015-09-15T14:29:31Z

I now understand that our tests only test specific functions. We don't have a general test yet that checks if an image in a piece of content gets the correct srcset and sizes attributes (@joemcgill will work on that). So the fact that all test pass doesn't mean content filtering is working.

@peterwilsoncc - Did your code work for you or didn't have a chance to test it?

peterwilsoncc · 2015-09-15T22:11:29Z

@jaspermdegroot I tested it against a post with two images & it appeared to work. I'll investigate further this weekend. I'll throw it against various http & https & FORCE_SSL configs.

…cset attributes Fixes gh-83 Fixes gh-72

See commit 915d8c3 in content-filtering-2 branch.

jaspermdegroot · 2015-09-19T10:58:01Z

Currently we have the following approaches:

Get the ID and size name from the core classes in the image markup - commit 4d3b6ae
Use the image url to get ID and size info - commit 9bfea06
Use the image url (2), but get all attachments at once to reduce DB requests - commit 2e77803
Get the ID and size name from classes (1) and only use the image url (2) as fallback - PR Content filter: try parsing the HTML before a DB query #175

The reason we switched from the first to the second approach was because the core classes could be removed from the image markup. I don't know if there are filters to remove those before the markup is inserted, but I think it would be very bad practise if a plugin (or even worse, a theme) does this because it can't be reverted when removing the plugin or switching theme. Of course a user can remove classes in the editor after the image markup has been inserted, but I think we are talking about an edge case here.

The downside of using the image url to get the ID and size info is that we need to strip the size (-###x###) and "edited" (-e#############) suffixes from the url. That's tricky because the original file name could contain such a pattern.
Besides that there is the performance concern because it adds another DB request. The performance test results in the old PR (#144 (comment)) showed indeed that it is slower, but the difference is very small.

@peterwilsoncc has been working on the third approach which would reduce DB request and should improve performance, but the code didn't seem to work yet so there are no performance test results of this.

However, even if we could improve performance, I think that we should first choose the most reliable way to get the ID and size, and after that optimize for performance.
That's why I am in favor of the fourth approach as suggested by @joemcgill in PR #175, to try to get the ID and size name from the classes and only use the image url as fallback.

If everyone agrees with that approach I suggest the following next steps:

Merge PR Content filter: try parsing the HTML before a DB query #175
Check the old PR Use content filtering instead of the image editor #144 to see if all comments there have been addressed
Run performance tests against PHP 5.6 and 5.2 and display the results in a comment on this PR
Review the code and see if we can make more (performance) improvements

Update: After taking a closer look at PR #175 I noticed that with @joemcgill his solution we don't need to use a regex to strip the size part from the url to find the image in the database. So the only downside that remains is the extra DB request, but since it's only used as fallback that isn't really a problem.

joemcgill · 2015-09-19T19:53:10Z

Great summary @jaspermdegroot.

Another improvement I'd like to add to #175 is to use wp_get_attachment_metada() to get the size when we're able to get the ID by parsing the HTML but not the size. For example, if someone uses the resize handles in the editor to manually resize the image, the wp-image-{id} class will remain, but the size-{size} class will be removed.

peterwilsoncc · 2015-09-20T03:58:21Z

class-srcset-callback.php

+		$sizes = $srcset = '';
+
+		// Check if srcset attribute is not already present.
+		if ( false !== strpos( 'srcset="', $atts ) ) {


Correction:

if ( false === strpos( $atts, 'srcset="' ) ) {

peterwilsoncc · 2015-09-20T04:13:23Z

I've added some inline notes with bug fixes, you can cherry pick from 9cad921 in my fork.

I can add a pull request to this branch if you wish, but PR inception seems excessive for so few changes.

peterwilsoncc · 2015-09-20T05:27:24Z

Threw some SSL related tests at this: front end accessible over SSL, site scheme set to http in options. Brain dump follows.

Inserting the image in the editor using:

HTTP: srcset missed
relative scheme: srcset missed
HTTP: srcset added

set_url_scheme( $url, 'relative'); in WP_Ricg_Content_Filter's constructor and the callback recovered but it risks hitting externally hosted images, othersite.com/wp-contents/uploads/2015/09/kitten.gif could end up with the srcset intended for mysite.com/wp-contents/uploads/2015/09/kitten.gif

tevkori_attachment_urls_to_loop matches both the HTTP and HTTPS sources, it doesn't match the relative scheme source: parse_url doesn't set the scheme if it's relative (tested php 5.6.2).

I'm not sure of the solution as of yet, some wrappers around the WP scheme functions might be in order to better handle relative URLs.

joemcgill · 2015-09-22T02:43:37Z

@jaspermdegroot and @peterwilsoncc:

I don't want to lose the progress that has been made towards limiting the potential DB hits needed, so instead of merging the work in #175 into this branch, I've squashed those changes into #177 so it can be merged directly into dev.

In the event that we need to further improve the DB query approach in #177, we can come back to this branch for reference and see the full history.

Sound good?

jaspermdegroot · 2015-09-22T07:33:07Z

@joemcgill

Sounds good. I will address the issues mentioned by @peterwilsoncc in this branch.

peterwilsoncc · 2015-09-22T23:05:41Z

@joemcgill @jaspermdegroot

👍, sounds like a plan.

joemcgill · 2015-09-23T01:14:07Z

Closing in favor of #177, but we can reopen as needed.

jaspermdegroot · 2015-09-23T08:00:17Z

Ok. I pushed a few more commits to the branch to fix the issues that @peterwilsoncc mentioned, but there is still more work to do if we want to make this approach work. Since we decided to go with #177 let's not invest time in this now.

peterwilsoncc · 2015-09-24T00:39:35Z

Looks like the needle/haystack are still reversed on the srcset attribute check if you need to come back to it.

jaspermdegroot mentioned this pull request Sep 14, 2015

Use content filtering instead of the image editor #144

Closed

peterwilsoncc reviewed Sep 14, 2015
View reviewed changes

jaspermdegroot force-pushed the content-filtering branch 2 times, most recently from 7508cae to 8362131 Compare September 15, 2015 09:14

jaspermdegroot reviewed Sep 15, 2015
View reviewed changes

jaspermdegroot force-pushed the content-filtering branch 3 times, most recently from 0b29f60 to 7ffecaf Compare September 15, 2015 10:08

jaspermdegroot reviewed Sep 15, 2015
View reviewed changes

joemcgill mentioned this pull request Sep 16, 2015

Add tests for content filtering #171

Merged

jaspermdegroot added 2 commits September 16, 2015 16:53

Use content filtering instead of the image editor to add sizes and sr…

4d3b6ae

…cset attributes Fixes gh-83 Fixes gh-72

Use the image url to get the ID and size

9bfea06

jaspermdegroot force-pushed the content-filtering branch from 7ffecaf to cdfd417 Compare September 16, 2015 14:57

Add tests for content filtering

264222c

jaspermdegroot force-pushed the content-filtering branch from b176b9f to 567a2c3 Compare September 16, 2015 15:32

joemcgill mentioned this pull request Sep 18, 2015

Content filter: try parsing the HTML before a DB query #175

Closed

jaspermdegroot and others added 2 commits September 19, 2015 10:58

Correction in unit test

7c7c775

See commit 915d8c3 in content-filtering-2 branch.

Get all attachments at once

2e77803

jaspermdegroot force-pushed the content-filtering branch from 85a3cba to 2e77803 Compare September 19, 2015 08:59

peterwilsoncc reviewed Sep 20, 2015
View reviewed changes

joemcgill mentioned this pull request Sep 22, 2015

Add srcset and sizes in content with a filter #177

Merged

joemcgill closed this Sep 23, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Content filtering #170

Content filtering #170

jaspermdegroot commented Sep 14, 2015

peterwilsoncc Sep 14, 2015

jaspermdegroot Sep 14, 2015

jaspermdegroot Sep 14, 2015

jaspermdegroot Sep 15, 2015

jaspermdegroot Sep 15, 2015

jaspermdegroot commented Sep 15, 2015

jaspermdegroot commented Sep 15, 2015

peterwilsoncc commented Sep 15, 2015

jaspermdegroot commented Sep 19, 2015

joemcgill commented Sep 19, 2015

peterwilsoncc Sep 20, 2015

peterwilsoncc commented Sep 20, 2015

peterwilsoncc commented Sep 20, 2015

joemcgill commented Sep 22, 2015

jaspermdegroot commented Sep 22, 2015

peterwilsoncc commented Sep 22, 2015

joemcgill commented Sep 23, 2015

jaspermdegroot commented Sep 23, 2015

peterwilsoncc commented Sep 24, 2015

Content filtering #170

Content filtering #170

Conversation

jaspermdegroot commented Sep 14, 2015

peterwilsoncc Sep 14, 2015

Choose a reason for hiding this comment

jaspermdegroot Sep 14, 2015

Choose a reason for hiding this comment

jaspermdegroot Sep 14, 2015

Choose a reason for hiding this comment

jaspermdegroot Sep 15, 2015

Choose a reason for hiding this comment

jaspermdegroot Sep 15, 2015

Choose a reason for hiding this comment

jaspermdegroot commented Sep 15, 2015

jaspermdegroot commented Sep 15, 2015

peterwilsoncc commented Sep 15, 2015

jaspermdegroot commented Sep 19, 2015

joemcgill commented Sep 19, 2015

peterwilsoncc Sep 20, 2015

Choose a reason for hiding this comment

peterwilsoncc commented Sep 20, 2015

peterwilsoncc commented Sep 20, 2015

joemcgill commented Sep 22, 2015

jaspermdegroot commented Sep 22, 2015

peterwilsoncc commented Sep 22, 2015

joemcgill commented Sep 23, 2015

jaspermdegroot commented Sep 23, 2015

peterwilsoncc commented Sep 24, 2015