Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content filter: try parsing the HTML before a DB query #175

Closed
wants to merge 14 commits into from

Conversation

joemcgill
Copy link
Member

This is another approach to a content filter based off the work in #170.

This filter only matches images with a src including the path to our uploads directory to slim down on the number of images it attempts to parse. Then, we first try to determine the post ID and size by parsing the value of the class attribute and finally, run a database query only as a last resort.

This still needs some perf testing, but I wanted to get feedback on the approach.

@joemcgill
Copy link
Member Author

I think there's something wrong with the filter test, and not the code, but will confirm.

@joemcgill
Copy link
Member Author

@jaspermdegroot I figured out what went wrong with the tests and went back and brought in a few more of your original ideas from #144. Mainly, it's possible for the order of the classes to be different depending on which image size you use (which is strange, but so it is).

@jaspermdegroot
Copy link
Member

@joemcgill

I like the approach of first trying to get the ID and size name from the classes in the image markup and only use the image url as fallback. I have written the reasons why I am in favor of this in a comment on the "main" PR for content filtering (#170 (comment)) to keep the discussion at one place.

When I first looked at your code I had some concerns about the simplified way to get the ID and size name from the classes, but you already addressed those. I will review the code again and add inline comments.

I already applied your change in the unit test to the content-filtering branch (7c7c775) because I was curious if the tests would pass, but Travis seems to have issues at the moment so I still don't know.
We can squash your commits and then rebase against the content-filtering branch to make the commit that we will be merging only contain changes in the code itself.

@@ -18,7 +18,6 @@
// Don't load the plugin directly
defined( 'ABSPATH' ) or die( "No script kiddies please!" );

// List includes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment shouldn't be removed

@jaspermdegroot
Copy link
Member

@joemcgill

At first I didn't understand why you didn't need to use a regex to strip the size part from the url but now I have taken a closer look I get it. So your solution is even better than I thought.
I tested the fallback, also with an editted image, and it works great!

Nice work!!

@joemcgill
Copy link
Member Author

@jaspermdegroot I think I've addressed all the issues specific to the content filter. If you agree, I'll squash all of this into a single commit and resolve conflicts with the original content filtering branch so we can merge this into that branch. Once we've done so, we can do any final cleanup before merging into dev.

@jaspermdegroot
Copy link
Member

@joemcgill

Sounds good!
The only thing I didn't see in the code is the early check if the image already has a srcset. Or did I miss something? Just want to make sure we don't forget to implement that.

@joemcgill
Copy link
Member Author

Oh no, good point. I'll add the early rip cord when a srcset is already present.

@joemcgill
Copy link
Member Author

I just finished doing some performance profiling, using similar methods to what @jaspermdegroot used in #144. The test site is running locally in a VM running VVV.

Each test shows two results (times in milliseconds).
A: w/o content filter (average : median)
B: with content filter (average : median)

Single Image 225 words

A: 64 : 63
B: 69 : 66

6 images 2000 words (mixed filter methods)

A: 73.438 : 69
B: 89.5 : 85

20K words, 20 images using standard WP markup

A: 133.355 : 121
B: 197.688 : 185

20K words, 20 images with WP classes stripped (forcing a postmeta query)

A: 149.556 : 116
B: 225.552 : 191


Looks like this hits all of our use cases and shows similar results to the ~3ms per image results @jaspermdegroot found with the earlier iteration on the filter.

@joemcgill
Copy link
Member Author

Replaced by #177

@joemcgill joemcgill closed this Sep 22, 2015
@joemcgill joemcgill deleted the content-filtering-2 branch October 4, 2015 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants