Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content filtering #170

Closed
wants to merge 5 commits into from
Closed

Content filtering #170

wants to merge 5 commits into from

Conversation

jaspermdegroot
Copy link
Member

This PR replaces #144

See #83


remove_filter( 'editor_max_image_size', 'tevkori_editor_image_size' );
function tevkori_filter_content_images( $content ) {
preg_match_all( '/src="([^"]*)"/', $content, $matches );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to skip images with srcset in this line, regexp isn't my strong point sorry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterwilsoncc

I am not a regex expert either, but I don't think they should be used to check if something is not there. If it's possible to write a regex like that it will probably not perform very well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should limit this to image elements because iframes also have a src attribute.

@jaspermdegroot jaspermdegroot force-pushed the content-filtering branch 2 times, most recently from 7508cae to 8362131 Compare September 15, 2015 09:14
// Build the data-sizes attribute if sizes were returned.
$sizes = tevkori_get_sizes( $id, $size );
$sizes = $sizes ? 'data-sizes="' . $sizes . '"' : '';
$attachments = tevkori_attachment_urls_to_loop( $urls );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URLs that we pass to the function contain the size suffix (eg. -300x200) if present. I think we have to strip that like we do in https://github.com/ResponsiveImagesCG/wp-tevko-responsive-images/pull/170/files#diff-517abe0a5b7ee2c55c6b5659d0c9350eR52 but would be nice if we don't have to do it at two places.

@jaspermdegroot jaspermdegroot force-pushed the content-filtering branch 3 times, most recently from 0b29f60 to 7ffecaf Compare September 15, 2015 10:08
return;
}
foreach ( $paths as $k => $path ) {
$image_path = parse_url( $path );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we could probably add the check if the file path matches with the upload dir and skip if not, as suggested by @joemcgill in #144 (comment)

@jaspermdegroot
Copy link
Member Author

All tests pass but for some reason it doesn't work if I test on a local WordPress installation. I don't get any errors, but no srcset and sizes attributes are added.

@jaspermdegroot
Copy link
Member Author

I now understand that our tests only test specific functions. We don't have a general test yet that checks if an image in a piece of content gets the correct srcset and sizes attributes (@joemcgill will work on that). So the fact that all test pass doesn't mean content filtering is working.

@peterwilsoncc - Did your code work for you or didn't have a chance to test it?

@peterwilsoncc
Copy link

@jaspermdegroot I tested it against a post with two images & it appeared to work. I'll investigate further this weekend. I'll throw it against various http & https & FORCE_SSL configs.

jaspermdegroot and others added 2 commits September 19, 2015 10:58
@jaspermdegroot
Copy link
Member Author

Currently we have the following approaches:

  1. Get the ID and size name from the core classes in the image markup - commit 4d3b6ae
  2. Use the image url to get ID and size info - commit 9bfea06
  3. Use the image url (2), but get all attachments at once to reduce DB requests - commit 2e77803
  4. Get the ID and size name from classes (1) and only use the image url (2) as fallback - PR Content filter: try parsing the HTML before a DB query #175

The reason we switched from the first to the second approach was because the core classes could be removed from the image markup. I don't know if there are filters to remove those before the markup is inserted, but I think it would be very bad practise if a plugin (or even worse, a theme) does this because it can't be reverted when removing the plugin or switching theme. Of course a user can remove classes in the editor after the image markup has been inserted, but I think we are talking about an edge case here.

The downside of using the image url to get the ID and size info is that we need to strip the size (-###x###) and "edited" (-e#############) suffixes from the url. That's tricky because the original file name could contain such a pattern.
Besides that there is the performance concern because it adds another DB request. The performance test results in the old PR (#144 (comment)) showed indeed that it is slower, but the difference is very small.

@peterwilsoncc has been working on the third approach which would reduce DB request and should improve performance, but the code didn't seem to work yet so there are no performance test results of this.

However, even if we could improve performance, I think that we should first choose the most reliable way to get the ID and size, and after that optimize for performance.
That's why I am in favor of the fourth approach as suggested by @joemcgill in PR #175, to try to get the ID and size name from the classes and only use the image url as fallback.

If everyone agrees with that approach I suggest the following next steps:

Update: After taking a closer look at PR #175 I noticed that with @joemcgill his solution we don't need to use a regex to strip the size part from the url to find the image in the database. So the only downside that remains is the extra DB request, but since it's only used as fallback that isn't really a problem.

@joemcgill
Copy link
Member

Great summary @jaspermdegroot.

Another improvement I'd like to add to #175 is to use wp_get_attachment_metada() to get the size when we're able to get the ID by parsing the HTML but not the size. For example, if someone uses the resize handles in the editor to manually resize the image, the wp-image-{id} class will remain, but the size-{size} class will be removed.

$sizes = $srcset = '';

// Check if srcset attribute is not already present.
if ( false !== strpos( 'srcset="', $atts ) ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction:

if ( false === strpos( $atts, 'srcset="' ) ) {

@peterwilsoncc
Copy link

I've added some inline notes with bug fixes, you can cherry pick from 9cad921 in my fork.

I can add a pull request to this branch if you wish, but PR inception seems excessive for so few changes.

@peterwilsoncc
Copy link

Threw some SSL related tests at this: front end accessible over SSL, site scheme set to http in options. Brain dump follows.

Inserting the image in the editor using:

  • HTTP: srcset missed
  • relative scheme: srcset missed
  • HTTP: srcset added

set_url_scheme( $url, 'relative'); in WP_Ricg_Content_Filter's constructor and the callback recovered but it risks hitting externally hosted images, othersite.com/wp-contents/uploads/2015/09/kitten.gif could end up with the srcset intended for mysite.com/wp-contents/uploads/2015/09/kitten.gif

tevkori_attachment_urls_to_loop matches both the HTTP and HTTPS sources, it doesn't match the relative scheme source: parse_url doesn't set the scheme if it's relative (tested php 5.6.2).

I'm not sure of the solution as of yet, some wrappers around the WP scheme functions might be in order to better handle relative URLs.

@joemcgill
Copy link
Member

@jaspermdegroot and @peterwilsoncc:

I don't want to lose the progress that has been made towards limiting the potential DB hits needed, so instead of merging the work in #175 into this branch, I've squashed those changes into #177 so it can be merged directly into dev.

In the event that we need to further improve the DB query approach in #177, we can come back to this branch for reference and see the full history.

Sound good?

@jaspermdegroot
Copy link
Member Author

@joemcgill

Sounds good. I will address the issues mentioned by @peterwilsoncc in this branch.

@peterwilsoncc
Copy link

@joemcgill @jaspermdegroot

👍, sounds like a plan.

@joemcgill
Copy link
Member

Closing in favor of #177, but we can reopen as needed.

@joemcgill joemcgill closed this Sep 23, 2015
@jaspermdegroot
Copy link
Member Author

Ok. I pushed a few more commits to the branch to fix the issues that @peterwilsoncc mentioned, but there is still more work to do if we want to make this approach work. Since we decided to go with #177 let's not invest time in this now.

@peterwilsoncc
Copy link

Looks like the needle/haystack are still reversed on the srcset attribute check if you need to come back to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants