Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why top 8x8 pixels of the Matrix are selected at PerceptualHash? #52

Open
nitotm opened this issue Feb 21, 2020 · 19 comments
Open

Why top 8x8 pixels of the Matrix are selected at PerceptualHash? #52

nitotm opened this issue Feb 21, 2020 · 19 comments

Comments

@nitotm
Copy link

nitotm commented Feb 21, 2020

The matrix is 32x32, are the first 8x8 ordered in any relevant way?

Also which hash is performing better for you?

@jenssegers
Copy link
Owner

I don't think only the first 8x8 is taken. The perceptual hash one is not fully ready yet. The others work pretty well.

@nitotm
Copy link
Author

nitotm commented Feb 22, 2020

What I understand is:
The final hash, is the first 8x8 pixels of the Matrix, compared against the median or average (of the same 8x8), which gives a 0 or 1. But matrix is 32x32.

The whole 32x32 matrix has to be calculated *(NOT TRUE, findings in next post) to produce good results, and the first 8x8 also is gives the best results, according to my empiric testing, not because I understand how the matrix is done (I'm still reading about Discrete Cosine Transformations to figure it out).

I was trying to figure out if it’s worth to store more "hash", from the matrix, if the pixels beyond the 8x8 could be relevant, since they are already calculated.
Now for what I understand is not worth it unless you want a second hash with less relevant and more specific data points to further order by similarity the images matched by the first hash.

What’s lacking to the perceptual hash to be ready? I believe according to my tests that is the best performer. I'm probably going to use it, actually I might integrate it to my app within a couple of days.

@nitotm
Copy link
Author

nitotm commented Feb 22, 2020

I was not seeing why the whole 32*32 matrix was calculated, and before I checked incorrectly that was necessary (at least for the code to produce the same results), so I tried again.

In short, now I'm producing only a 8x8 matrix, which gives the same exact final hash result, AND I reduced the total execution time by a +75%, from 210ms to 50ms for an image, or 12 sec to 2,8 sec for my entire benchmark test.

I'm just doing :
$matrix_size=8;
...
$rows[$y] = $this->calculateDCT($row,$matrix_size);
...
for ($x = 0; $x < $matrix_size; $x++) {
...
$matrix[$x] = $this->calculateDCT($col,$matrix_size);
...
function calculateDCT(array $matrix, $this_size=false){
...
$size_i = ( $this_size ?: $size );
for ($i = 0; $i < $size_i; $i++) {
...
$pixels = array_merge(...$matrix); // Removing the pixels 'for' loop, this is faster

After this findings I'm not confident on how the DCT matrix is done.

UPDATE: I can confirm the DCT is correct, I test it with an exemple ->" _a_Letter-a-8x8 "on wikipedia and I got the correct results, exemple column A I got (you have to add colors r+g+b raw):
https://en.wikipedia.org/wiki/File:Dct-table.png

        [0] => 6.1917162
        [1] => 0.2205060937849
        [2] => 1.0422963188867
        [3] => -0.23402753571438
        [4] => 0.2750022
        [5] => 0.065252207972911
        [6] => 0.31692732314466
        [7] => -0.29696260903528

https://en.wikipedia.org/wiki/Discrete_cosine_transform

@nitotm
Copy link
Author

nitotm commented Feb 23, 2020

Well I did some heat maps of the matrix and it showed expected result, so I'm guessing the DCT is working ok. Its supposed to give a top-let to bottom-right in diagonal result.

Also note that the first bit seems to be always 1, I don't know if theoretically but in practice, so it can be discarded, or changed by a 0 making it easier to handle on software without unsigned 64bit integer, or add the bit "65", so move it one place.
At phash.org, they start the matrix at 2x2 (1x1 if we start at 0) instead of 1x1, discarding first row and column, for example 1x8 which is not really low frequency, the idea was to discard the very low frequency data, since its less useful, but as said it’s supposed to advance diagonally, I have to say that in some heat maps I actually observed lower frequencies on row and column 1 but, not enough to start the matrix at 2x2.
My view on this is discard 1x1 for sure, and maybe 1x2 and 2x1, but not more.

As I said the order of the matrix is relevant, it is not the most optimal to just pick the first 8x8 from the 32x32 image, the reason for the 8x8 in the first place, is to get the lower frequencies of the DCT matrix (maybe discarding the super low) because the higher ones are not good for our intensions. So with that in mind, if the matrix frequency advances diagonally it only makes sense to order it diagonally, this will provide:

  • More information for our 8bytes of data, not only the bits will contain information, but the order too, more bang for our bucket.

  • This provides the possibility to partition a long hash, and you now the first bits are the most likely to match other similar images, not something everybody will apply in some form but now it’s a possibility.

  • Finally and more important, a difference on the first bits would be more relevant, so you could apply a formula to add more weight for earlier differences. You might not be able to do that on a SQL query but it can be done with a result set of hashes.

The green and yellow heatmap 32x32 image, is the one I made.

Low-Middle-and-High-frequency-distribution-in-a-DCT-block
Visual-speech-features-extraction-28-low-spatial-frequency-DCT-coefficients-are
my heatmap

@nitotm
Copy link
Author

nitotm commented Feb 23, 2020

function diagonalMatrix($mat,$size=11,$half=true){
    $mode = 0;
    $it = 0;
    $lower = 0; 
    $result=[];
    $max = ( $half ? ceil( (($size*$size)/2)+($size*0.5) ) : 0 );
    for ($t = 0; $t < (2 * $size - 1); $t++) { 
        $t1 = $t; 
        if ($t1 >= $size) { 
            $mode++; 
            $t1 = $size - 1; 
            $it--; 
            $lower++; 
        } 
        else { 
            $lower = 0; 
            $it++; 
        } 
        for ($i = $t1; $i >= $lower; $i--) { 
            if ( $half && count($result)>=$max){ return $result;}
            if ( ($t1 + $mode) % 2 == 0 ) { 
                $result[] = $mat[$i][$t1 + $lower - $i]; 
            } 
            else { 
                $result[] = $mat[$t1 + $lower - $i][$i]; 
            } 
        } 
    } 
    return $result; 
} 

@jenssegers
Copy link
Owner

My initial version of the phash was a port of some existing code I found. Looks like you got quite a good understanding of how the algorithm works! Do you see code improvements you could do?

@nitotm
Copy link
Author

nitotm commented Feb 23, 2020

Apart from the already commented (the unnecessary calculations, the diagonal order of the final hash, and discarding the first bit), now I'm looking at the median/average method:
I was ready to use the “median” and almost finish the job here, because it’s giving me better results than “average” and its what’s used on phash.org, but I realized that with median we are not really getting all 2^64 possibilities, it's inherit in the way the algorithm is done, it happens with average too but much more with median because 50% of bits will always be 0 and the other 1, so we are not getting the full range that 64bit can offer (not that I'm aiming for a 100% but I think it could be better), increasing the possibility of unwanted collisions.

I’m trying to think of solution that would provide more entropy without damaging the efficacy of finding similar images.
But I don't know if its possible, I tried to calculate against a transposed matrix (multiply and change sign), but happened what I expected, I gained entropy, but it lost too much accuracy, 82% to 77%, while both standard average and median are at +82%, so I'm not sure what to try next or if there's even a way.

That or use average, but for that I will need to increase the size of my benchmark to get more solid decisions.

@nitotm
Copy link
Author

nitotm commented Feb 24, 2020

Well I decided to just use average, I believe that in the long run will be the better choice, and accuracy results against the median are very similar.
Regarding the further changes I made, now I’m partially calculating an 11x11 matrix, the 11 rows, but then the final matrix I only calculates a little bit more than a half.
In order to do this, before the loop where the matrix is created, add:

$row_matrix_size=$matrix_size;
… and then
$matrix[$x] = $this->calculateDCT($col,$row_matrix_size);
$row_matrix_size--;

$pixels = diagonalMatrix($matrix,$matrix_size);
$pixels = array_slice($pixels,1,64); // I discard the first one and cut to size since we have 66 points here

By the way the second bit, 60% of cases is '1' in case anyone wants to discard it too.

And thats it, here is a heat map of the 11x11 matrix and then the only portion I calculate. Which I clarify, we are really making a 32*32 DCT matrix, but I only calculate what's necessary to get the data we want from it.
11x11 random img both

@nitotm
Copy link
Author

nitotm commented Feb 25, 2020

Well I integrated it and I realized it was slower than I thought, when I looked it in comparison with the other image manipulations that my application does, it was a lot.
After the >75% improvement where I went from 210ms to 50ms, with all the diagonal 11x11 I went up to 58ms.
So I pre-calculated everything in the DCT function, even I made an array for the cosines. That made me go from 58ms to 28ms, I still wanted to go lower. I wanted to work with an image of 26-22px, but I realized that some cosines are very near 0, basically 0, and when the DCT is calculated, it will make the info multiplied by it almost worthless, it still works, it's just a few, but it’s not the ideal matrix. The matrixes 16,32 and 64 don’t have any 0 cosines, every other matrix size has 0s cosines.
The case is 16x16 performed better than 26px, very very near of 32px, so I went with the evidence and I’m using 16x16, and I’m at 14ms, and with this 93% improvement from the start, now I'm done.

@nitotm
Copy link
Author

nitotm commented Feb 25, 2020

PD: finally discarted
I did one last change, I applied an “inverse” multiplication table I done with the heat tables, I got a small improvement but consistent results, up to +0.8% similarity accuracy, and down 1% (0.5% absolute) in different images. I was not sure how that would turn out, I was afraid that would “damage” the data the DCT calculates but apparently it turned ok. Important to note I do this before calculating the average.

@scratcher28
Copy link

@smithtek, could you please share the code and explain how to use it?

@nitotm
Copy link
Author

nitotm commented Mar 7, 2020

@smithtek, could you please share the code and explain how to use it?

Hi, which part?

I finally discarted the "inverse multiplication table", I was not sure of it, and on new benchmaks it performed worse.

If you are talking about the "pre-calculated", yes I could share, but its just calculating in advance to gain efficiency.

Other than that with the fragments I shared you can make the same system, I guess I could paste it all toghether, but keep in mind I changed some things from your code.

@nitotm
Copy link
Author

nitotm commented Mar 7, 2020

Here is a compilation of everything, you may need to make some small changes to work with your existing code.

At function __construct() you add:

$this->size_sqrt = sqrt(2 / $size);
$this->dct_11[32]=[ // Important to use key 32 so in case somebody changes size without precalculating it will show an error
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[0.99879546,0.98917651,0.97003125,0.94154407,0.90398929,0.85772861,0.80320753,0.74095113,0.67155895,0.5956993,0.51410274,0.42755509,0.33688985,0.24298018,0.14673047,0.04906767,-0.04906767,-0.14673047,-0.24298018,-0.33688985,-0.42755509,-0.51410274,-0.5956993,-0.67155895,-0.74095113,-0.80320753,-0.85772861,-0.90398929,-0.94154407,-0.97003125,-0.98917651,-0.99879546],
[0.99518473,0.95694034,0.88192126,0.77301045,0.63439328,0.47139674,0.29028468,0.09801714,-0.09801714,-0.29028468,-0.47139674,-0.63439328,-0.77301045,-0.88192126,-0.95694034,-0.99518473,-0.99518473,-0.95694034,-0.88192126,-0.77301045,-0.63439328,-0.47139674,-0.29028468,-0.09801714,0.09801714,0.29028468,0.47139674,0.63439328,0.77301045,0.88192126,0.95694034,0.99518473],
[0.98917651,0.90398929,0.74095113,0.51410274,0.24298018,-0.04906767,-0.33688985,-0.5956993,-0.80320753,-0.94154407,-0.99879546,-0.97003125,-0.85772861,-0.67155895,-0.42755509,-0.14673047,0.14673047,0.42755509,0.67155895,0.85772861,0.97003125,0.99879546,0.94154407,0.80320753,0.5956993,0.33688985,0.04906767,-0.24298018,-0.51410274,-0.74095113,-0.90398929,-0.98917651],
[0.98078528,0.83146961,0.55557023,0.19509032,-0.19509032,-0.55557023,-0.83146961,-0.98078528,-0.98078528,-0.83146961,-0.55557023,-0.19509032,0.19509032,0.55557023,0.83146961,0.98078528,0.98078528,0.83146961,0.55557023,0.19509032,-0.19509032,-0.55557023,-0.83146961,-0.98078528,-0.98078528,-0.83146961,-0.55557023,-0.19509032,0.19509032,0.55557023,0.83146961,0.98078528],
[0.97003125,0.74095113,0.33688985,-0.14673047,-0.5956993,-0.90398929,-0.99879546,-0.85772861,-0.51410274,-0.04906767,0.42755509,0.80320753,0.98917651,0.94154407,0.67155895,0.24298018,-0.24298018,-0.67155895,-0.94154407,-0.98917651,-0.80320753,-0.42755509,0.04906767,0.51410274,0.85772861,0.99879546,0.90398929,0.5956993,0.14673047,-0.33688985,-0.74095113,-0.97003125],
[0.95694034,0.63439328,0.09801714,-0.47139674,-0.88192126,-0.99518473,-0.77301045,-0.29028468,0.29028468,0.77301045,0.99518473,0.88192126,0.47139674,-0.09801714,-0.63439328,-0.95694034,-0.95694034,-0.63439328,-0.09801714,0.47139674,0.88192126,0.99518473,0.77301045,0.29028468,-0.29028468,-0.77301045,-0.99518473,-0.88192126,-0.47139674,0.09801714,0.63439328,0.95694034],
[0.94154407,0.51410274,-0.14673047,-0.74095113,-0.99879546,-0.80320753,-0.24298018,0.42755509,0.90398929,0.97003125,0.5956993,-0.04906767,-0.67155895,-0.98917651,-0.85772861,-0.33688985,0.33688985,0.85772861,0.98917651,0.67155895,0.04906767,-0.5956993,-0.97003125,-0.90398929,-0.42755509,0.24298018,0.80320753,0.99879546,0.74095113,0.14673047,-0.51410274,-0.94154407],
[0.92387953,0.38268343,-0.38268343,-0.92387953,-0.92387953,-0.38268343,0.38268343,0.92387953,0.92387953,0.38268343,-0.38268343,-0.92387953,-0.92387953,-0.38268343,0.38268343,0.92387953,0.92387953,0.38268343,-0.38268343,-0.92387953,-0.92387953,-0.38268343,0.38268343,0.92387953,0.92387953,0.38268343,-0.38268343,-0.92387953,-0.92387953,-0.38268343,0.38268343,0.92387953],
[0.90398929,0.24298018,-0.5956993,-0.99879546,-0.67155895,0.14673047,0.85772861,0.94154407,0.33688985,-0.51410274,-0.98917651,-0.74095113,0.04906767,0.80320753,0.97003125,0.42755509,-0.42755509,-0.97003125,-0.80320753,-0.04906767,0.74095113,0.98917651,0.51410274,-0.33688985,-0.94154407,-0.85772861,-0.14673047,0.67155895,0.99879546,0.5956993,-0.24298018,-0.90398929],
[0.88192126,0.09801714,-0.77301045,-0.95694034,-0.29028468,0.63439328,0.99518473,0.47139674,-0.47139674,-0.99518473,-0.63439328,0.29028468,0.95694034,0.77301045,-0.09801714,-0.88192126,-0.88192126,-0.09801714,0.77301045,0.95694034,0.29028468,-0.63439328,-0.99518473,-0.47139674,0.47139674,0.99518473,0.63439328,-0.29028468,-0.95694034,-0.77301045,0.09801714,0.88192126]
];

You precalcule "manually" this array with the following, you don't need it after that, I would add it as a comment, in case somebody wants to calculate 16 or 64:

print '$dct_11[32]=['."\n";
for ($i = 0; $i < 11; $i++) {
    print ( $i ? ",\n" : '').'[';
    for ($j = 0; $j < 32; $j++) {
        print ( $j ? ',' : '').round(cos($i * M_PI * ($j + 0.5) / 32),$decimals=8);
    }
    print "]"; 
} 
print "\n];";

Then on the for loops, keep in mind I'm using imagick directly.

$matrix_size=11;

for ($y = 0; $y < $this->size; $y++) {
	for ($x = 0; $x < $this->size; $x++) {
		$ipixel = $imagick->getImagePixelColor($x, $y);
		$rgb = $ipixel->getColor(); 
		$row[$x] = (int) floor(($rgb['r'] * 0.299) + ($rgb['g'] * 0.587) + ($rgb['b'] * 0.114)); 
	}
	$rows[$y] = $this->calculateDCT($row,$matrix_size);
}

$row_matrix_size=$matrix_size;

for ($x = 0; $x < $matrix_size; $x++) {
	for ($y = 0; $y < $this->size; $y++) {
		$col[$y] = $rows[$y][$x];
	}
	$matrix[$x] = $this->calculateDCT($col,$row_matrix_size);
	$row_matrix_size--;
}

$pixels = diagonalMatrix($matrix,$matrix_size);

$pixels = array_slice($pixels,1,64); // discart first and cut to size

$compare = $this->average($pixels);

Then the two functions you need, the diagonalMatrix() is already pasted on a previous comment, and the calculateDCT():

    protected function calculateDCT($matrix, $partial_size)
    {
        $dct_cos = $this->dct_11[$this->size];
        $transformed = [];
        
        for ($i = 0; $i < $partial_size; $i++) {
            $sum = 0;
            for ($j = 0; $j < $this->size; $j++) { 
                $sum += $matrix[$j] * $dct_cos[$i][$j];
            }
            $sum *= $this->size_sqrt; 
            if ($i === 0) {
                $sum *= 0.70710678118655; 
            }
            $transformed[$i] = $sum;
        }
        return $transformed;
    }

@b1rdex b1rdex mentioned this issue Jul 9, 2020
3 tasks
@b1rdex
Copy link

b1rdex commented Jul 9, 2020

@smithtek please review #58 if you have time. I applied all your suggestions and made a separate implementation w/ them.
Also, I generated DCTs for 16x and 64x. Would be great if you can validate them because generator code was for 32x and I just replaced all the 32s in it.

@nitotm
Copy link
Author

nitotm commented Jul 9, 2020

@smithtek please review #58 if you have time. I applied all your suggestions and made a separate implementation w/ them.
Also, I generated DCTs for 16x and 64x. Would be great if you can validate them because generator code was for 32x and I just replaced all the 32s in it.

Generated DCTs looks good, I checked. I probably won't have time to test your implementation.

VincentChalnot pushed a commit to VincentChalnot/imagehash that referenced this issue Nov 22, 2020
…rdex

See issue jenssegers#52

Squashed commit of the following:

commit 39baec8
Author: Anatoly Pashin <[email protected]>
Date:   Thu Jul 9 12:45:26 2020 +1000

    Fix cs

commit aa31683
Author: Anatoly Pashin <[email protected]>
Date:   Thu Jul 9 12:45:17 2020 +1000

    Fix cs

commit df4555d
Author: Anatoly Pashin <[email protected]>
Date:   Thu Jul 9 12:40:40 2020 +1000

    Fix typo

commit b422ea4
Author: Anatoly Pashin <[email protected]>
Date:   Thu Jul 9 12:39:20 2020 +1000

    Add PerceptualHash2
@freearhey
Copy link

@smithtek Sorry, but why didn't you just create a pull request with all the changes and instead describe all your steps here?

@VincentChalnot
Copy link
Contributor

I'm currently testing the PR made by @b1rdex using the code from @smithtek, I will report any progress here.

@b1rdex
Copy link

b1rdex commented Nov 23, 2020

FYI, we use #58 in production since I created the PR for 2-5M images. The number is dependent on the currently active goods number available to buy. We haven't detected any issues with it.

@VincentChalnot
Copy link
Contributor

VincentChalnot commented Nov 23, 2020

#58 seems to work really well, it seems to find more meaningfull duplicates up until a distance of 10 and sometimes even more.
Also, I find much less WTF similarities, event at higher distances.

Take for example:
image
The two images are actually very similar.

Still, here's a funny duplicate:
image

At higher distances, it still finds some similar pictures but consistency tends to drop around 11~12:
image
For this example, I don't understand how the two images on the top can be separated by the same distance as the two images on the bottom...

I think #58 is really good as-is, no need for additional improvements.

Additional thoughts in the discussion concerning collisions here #27.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants