Content filter: try parsing the HTML before a DB query #175

joemcgill · 2015-09-18T14:55:01Z

This is another approach to a content filter, based on the work in #170.

To cut down on the number of images it attempts to parse, this filter only matches images whose src includes the path to our uploads directory. For each match, we first try to determine the post ID and size by parsing the value of the class attribute, and we run a database query only as a last resort.

This still needs some perf testing, but I wanted to get feedback on the approach.
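As a rough illustration of the first step described above (only picking up images whose src points into the uploads directory), here is a minimal sketch. It is not the code from this PR: the function name `sketch_find_upload_images` and the example uploads URL are assumptions made for illustration, and a real plugin would more likely derive the base URL from `wp_upload_dir()`.

```php
<?php
// Hypothetical uploads base URL (assumption); in WordPress this would
// typically come from wp_upload_dir() rather than being hard-coded.
$uploads_url = 'http://example.com/wp-content/uploads';

/**
 * Return every <img> tag in $content whose src starts with $uploads_url.
 * Hypothetical helper, not part of the plugin.
 */
function sketch_find_upload_images( $content, $uploads_url ) {
	// preg_quote() escapes the URL so its dots and slashes match literally.
	$pattern = '#<img\s[^>]*src=["\']' . preg_quote( $uploads_url, '#' ) . '/[^"\']+["\'][^>]*>#i';
	preg_match_all( $pattern, $content, $matches );
	return $matches[0];
}

$content = '<p><img class="alignnone size-medium wp-image-42" src="http://example.com/wp-content/uploads/2015/09/cat-300x200.jpg" alt="" /></p>'
	. '<p><img src="http://cdn.example.org/dog.jpg" alt="" /></p>';

// Only the first image lives under the uploads URL, so only it is returned.
$images = sketch_find_upload_images( $content, $uploads_url );
```

Images hosted elsewhere never reach the more expensive parsing and lookup steps, which is the point of restricting the match.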

joemcgill · 2015-09-18T15:06:45Z

I think the problem is in the filter test rather than in the code, but I will confirm.

joemcgill · 2015-09-18T22:20:40Z

@jaspermdegroot I figured out what went wrong with the tests and went back and brought in a few more of your original ideas from #144. Mainly, it's possible for the order of the classes to be different depending on which image size you use (which is strange, but so it is).
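Because the classes can appear in any order, one way to read them is to split the class attribute and test each class on its own. The sketch below assumes the standard WordPress class names `wp-image-{ID}` and `size-{name}`. The helper name `sketch_parse_image_classes` is hypothetical, and this is not necessarily how the PR implements it.

```php
<?php
/**
 * Read the attachment ID and size name from an <img> tag's class attribute,
 * regardless of the order in which the classes appear.
 * Returns array( $id, $size ); a value is null when its class is missing.
 * Hypothetical helper, not part of the plugin.
 */
function sketch_parse_image_classes( $image ) {
	// Isolate the class attribute so other attributes cannot produce false matches.
	if ( ! preg_match( '/\sclass=["\']([^"\']*)["\']/i', $image, $attr ) ) {
		return array( null, null );
	}

	$id   = null;
	$size = null;

	foreach ( preg_split( '/\s+/', trim( $attr[1] ) ) as $class ) {
		if ( preg_match( '/^wp-image-([0-9]+)$/', $class, $m ) ) {
			$id = (int) $m[1]; // Cast to integer, since the value comes from markup.
		} elseif ( preg_match( '/^size-(.+)$/', $class, $m ) ) {
			$size = $m[1];
		}
	}

	return array( $id, $size );
}

// Both orderings yield the same ID; only the size name differs.
$first  = sketch_parse_image_classes( '<img class="alignnone size-medium wp-image-42" src="a.jpg" />' );
$second = sketch_parse_image_classes( '<img class="wp-image-42 alignnone size-full" src="a.jpg">' );
```

When either value comes back as null, the caller would fall back to the URL-based lookup.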

jaspermdegroot · 2015-09-19T11:27:29Z

@joemcgill

I like the approach of first trying to get the ID and size name from the classes in the image markup and only using the image URL as a fallback. I have written up the reasons why I am in favor of this in a comment on the "main" PR for content filtering (#170 (comment)) to keep the discussion in one place.

When I first looked at your code I had some concerns about the simplified way to get the ID and size name from the classes, but you already addressed those. I will review the code again and add inline comments.

I already applied your change in the unit test to the content-filtering branch (7c7c775) because I was curious if the tests would pass, but Travis seems to have issues at the moment so I still don't know.
We can squash your commits and then rebase against the content-filtering branch, so that the commit we merge contains only the changes to the code itself.

jaspermdegroot · 2015-09-19T11:29:14Z

wp-tevko-responsive-images.php

@@ -18,7 +18,6 @@
 // Don't load the plugin directly
 defined( 'ABSPATH' ) or die( "No script kiddies please!" );

-// List includes


The comment shouldn't be removed

jaspermdegroot · 2015-09-19T13:35:08Z

@joemcgill

At first I didn't understand why you didn't need to use a regex to strip the size part from the URL, but now that I have taken a closer look, I get it. So your solution is even better than I thought.
I tested the fallback, also with an edited image, and it works great!

Nice work!!

joemcgill · 2015-09-19T22:29:27Z

@jaspermdegroot I think I've addressed all the issues specific to the content filter. If you agree, I'll squash all of this into a single commit and resolve conflicts with the original content filtering branch so we can merge this into that branch. Once we've done so, we can do any final cleanup before merging into dev.

jaspermdegroot · 2015-09-20T11:48:48Z

@joemcgill

Sounds good!
The only thing I didn't see in the code is the early check for whether the image already has a srcset. Or did I miss something? I just want to make sure we don't forget to implement that.

joemcgill · 2015-09-20T19:55:52Z

Oh no, good point. I'll add the early rip cord when a srcset is already present.
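A minimal sketch of such an early bail-out, assuming the filter handles one `<img>` tag at a time. The function name `sketch_maybe_add_srcset` and the way the attribute is appended are assumptions for illustration, not the PR's actual code.

```php
<?php
/**
 * Add a srcset attribute to an <img> tag unless it already has one.
 * Hypothetical helper, not part of the plugin.
 */
function sketch_maybe_add_srcset( $image, $srcset ) {
	// Bail early: an image that already carries a srcset is returned untouched.
	if ( preg_match( '/\ssrcset=/i', $image ) ) {
		return $image;
	}

	// Strip the closing '>' (the space and slash before it are optional),
	// then re-close the tag with the new attribute appended.
	$tag = preg_replace( '#\s*/?>$#', '', $image );

	return $tag . ' srcset="' . htmlspecialchars( $srcset ) . '" />';
}

$added     = sketch_maybe_add_srcset( '<img src="cat.jpg">', 'cat.jpg 600w' );
$untouched = sketch_maybe_add_srcset( '<img src="cat.jpg" srcset="x.jpg 2x" />', 'cat.jpg 600w' );
```

Checking for srcset first means that images which already have one skip the ID lookup entirely.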

joemcgill · 2015-09-22T01:19:54Z

I just finished doing some performance profiling, using similar methods to what @jaspermdegroot used in #144. The test site is running locally in a VM running VVV.

Each test shows two results (times in milliseconds).
A: w/o content filter (average : median)
B: with content filter (average : median)

Single Image 225 words

A: 64 : 63
B: 69 : 66

6 images 2000 words (mixed filter methods)

A: 73.438 : 69
B: 89.5 : 85

20K words, 20 images using standard WP markup

A: 133.355 : 121
B: 197.688 : 185

20K words, 20 images with WP classes stripped (forcing a postmeta query)

A: 149.556 : 116
B: 225.552 : 191

It looks like this covers all of our use cases, and the overhead is similar to the ~3ms per image that @jaspermdegroot found with the earlier iteration of the filter.
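The per-image figure can be checked by dividing the difference between the B and A averages by the number of images in each test. The sketch below uses only the averages reported above; the array layout and labels are made up for illustration.

```php
<?php
// Average load times in milliseconds, copied from the results above:
// array( image count, A: without filter, B: with filter ).
$runs = array(
	'single image'        => array( 1, 64, 69 ),
	'6 images, mixed'     => array( 6, 73.438, 89.5 ),
	'20 images, standard' => array( 20, 133.355, 197.688 ),
	'20 images, stripped' => array( 20, 149.556, 225.552 ),
);

$per_image = array();
foreach ( $runs as $label => $run ) {
	list( $count, $without, $with ) = $run;
	// Overhead added by the filter, spread across the images in the post.
	$per_image[ $label ] = round( ( $with - $without ) / $count, 1 );
	printf( "%s: %.1f ms per image\n", $label, $per_image[ $label ] );
}
```

With these numbers the overhead works out to roughly 5.0, 2.7, 3.2, and 3.8 ms per image respectively, so the forced postmeta query costs a little more than the class-based path in this run.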

joemcgill · 2015-09-22T02:35:15Z

Replaced by #177

peterwilsoncc and others added 3 commits September 16, 2015 17:31

Get all attachments at once

567a2c3

Content filter: try parsing the HTML before a DB query

b7286db

Remove requirement for removed class-srcset-callback.php

7fcb70d

Updated the filter logic and tests

915d8c3

joemcgill added 2 commits September 18, 2015 20:10

Sanity check value of incoming $image variable

b29c09b

Update docs for tevkori_filter_content_images and callback

07ffe14

jaspermdegroot force-pushed the content-filtering branch from 85a3cba to 2e77803 Compare September 19, 2015 08:59

jaspermdegroot mentioned this pull request Sep 19, 2015

Content filtering #170

Closed

jaspermdegroot reviewed Sep 19, 2015

joemcgill added 5 commits September 19, 2015 09:56

Filter: Make closing space and slash optional in regex pattern.

bbcb69b

Follow WP standards for inline docs

758f6e9

Content filter: Try getting medata using an ID.

75f99ad

Cast as integer when retrieving from class name

8bb2f33

Replace '// List includes' that was removed

b5b0484

Content filter: Bail early if srcset present.

610974c

joemcgill force-pushed the content-filtering-2 branch from 5c90df8 to 610974c Compare September 20, 2015 22:14

joemcgill added 2 commits September 21, 2015 20:51

Tests: Account for all content filter patterns

b774f40

Tests: Add test for preexisting srcset attribute

334141b

joemcgill mentioned this pull request Sep 22, 2015

Add srcset and sizes in content with a filter #177

Merged

joemcgill closed this Sep 22, 2015

joemcgill deleted the content-filtering-2 branch October 4, 2015 22:42
