Conflict between ngram implementation and comments #1048

Kyle-Kyle · 2021-07-28T14:29:26Z

It looks like there is some conflict between the implementation and the comments about ngram.
According to the comment:

      /* "For efficiency, we propose to hash the tuple as a key into the
         hit_count map as (prev_block_trans << 1) ^ curr_block_trans, where
         prev_block_trans = (block_trans_1 ^ ... ^ block_trans_(n-1)" */

It should be (prev_block_trans << 1) ^ curr_block_trans. But in the implementation, the << 1 part is missing.

In normal cases, PrevLoc stores previous curr_loc << 1 and PrevLocTrans = PrevLoc;, which means PrevLocTrans has << 1 already in edge coverage. But this is forgotten in ngram, making the algorithm prev_block_trans ^ curr_block_trans

The text was updated successfully, but these errors were encountered:

Kyle-Kyle · 2021-07-28T14:31:09Z

Also, I noticed that the ngram implementation is borrowed from here, which is also contradictory to the comment

vanhauser-thc · 2021-07-29T07:35:46Z

@Kyle-Kyle yes true. I basically just ripped the code from that repo :)
I think the comment is the right way to do it, the code is not. what do you think?
and @adrianherrera and @hexcoder- and @andreafioraldi ?

adrianherrera · 2021-07-30T02:45:50Z

Hi all,

Yes, I have noticed this contradiction also :)

As you noted, the implementation is largely taken from the qemu version implemented for the "be sensitive and collaborative" paper. And yes, this implementation does not strictly follow what is written in the paper (which is where my comment quotes from).

So I just picked the same implementation (because that's what I was evaluating at the time). Not sure what the correct answer is, or even if it matters.

Kyle-Kyle · 2021-07-30T02:53:57Z

If the original implementation is like this and it has been evaluated, I think it is fine we keep it as it is.
We just need to change the comment for clarification.

vanhauser-thc · 2021-07-30T03:22:37Z

I think I see an issue.
if prev_block_trans is a looping block, the block id stays the same and (depending on if ngram-3, ..-5, ...) will just xor with itself and potentially be 0.
IMHO the shift is necessary. what do you think?

Kyle-Kyle · 2021-07-30T03:28:01Z

This issue persists even with the left shift because prev_block_trans is just a xor of the bounded history. And if the history consists of loop addresses, it will lead to only two results.
But yes, adding the left shift will definitely lessen the issue to some degree.
relevant code is here:

      if (ngram_size)
        PrevLocTrans =
            IRB.CreateZExt(IRB.CreateXorReduce(PrevLoc), IRB.getInt32Ty());

vanhauser-thc · 2021-07-30T04:01:52Z

I think why it was not done like it was described in the paper is that left shift does not work. the map size is not necessarily 2^16, it can also be e.g. 2^15 if changed in config.h. a left shift + xor will result in values larger than the map and hence results in a crash. so it would need % MAP_SIZE but this would make the instrumentation slower.
so the best is to do a right shift instead, right?

adrianherrera · 2021-08-01T20:59:24Z

Yeah what you say makes sense. It could also just be a mistake :)

I will contact the authors and see what their reason is, because I am very curious to know!

domenukk · 2022-03-13T00:47:22Z

Any news on this? Can the issue be closed or will we change the impl?

* update commit ids * new experiment, new variant * new changes to cmplog and schedule * update cmplog, try notrim * cmplog combine variant * fix wrong commit * new cmplog experiment * new commit ids * remove cmplog debug output * new experiment * new cmplog variants, new experiment * fix builder files * default level for cmplog * fix for older afl++ * fix lint * one day I might learn python * weizz does not support the new option * final fixes * fix merge conflict * fix fuzzer.py * fix presubmit hopefully * remove libaflfuzzer * resubmit failed experiment Co-authored-by: Abhishek Arya <[email protected]>

vanhauser-thc · 2022-05-08T12:55:27Z

It looks like there is some conflict between the implementation and the comments about ngram. According to the comment:
      /* "For efficiency, we propose to hash the tuple as a key into the
         hit_count map as (prev_block_trans << 1) ^ curr_block_trans, where
         prev_block_trans = (block_trans_1 ^ ... ^ block_trans_(n-1)" */
It should be (prev_block_trans << 1) ^ curr_block_trans. But in the implementation, the << 1 part is missing.

In normal cases, PrevLoc stores previous curr_loc << 1 and PrevLocTrans = PrevLoc;, which means PrevLocTrans has << 1 already in edge coverage. But this is forgotten in ngram, making the algorithm prev_block_trans ^ curr_block_trans

currently looking at this and to me it looks like the << 1 is there, however when saving the prev_loc, not when using it?

      if (ngram_size) {

        Value *ShuffledPrevLoc = IRB.CreateShuffleVector(
            PrevLoc, UndefValue::get(PrevLocTy), PrevLocShuffleMask);
        Value *UpdatedPrevLoc = IRB.CreateInsertElement(
            ShuffledPrevLoc, IRB.CreateLShr(CurLoc, (uint64_t)1), (uint64_t)0);  // <= HERE is a left shift

        Store = IRB.CreateStore(UpdatedPrevLoc, AFLPrevLoc);
        Store->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));

      } else

adrianherrera mentioned this issue Aug 2, 2021

n-gram coverage discrepancy bitsecurerlab/afl-sensitive#4

Open

vanhauser-thc self-assigned this May 3, 2022

vanhauser-thc added the on my TODO list :) on my TODO list :) label May 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Conflict between ngram implementation and comments #1048

Conflict between ngram implementation and comments #1048

Kyle-Kyle commented Jul 28, 2021

Kyle-Kyle commented Jul 28, 2021

vanhauser-thc commented Jul 29, 2021

adrianherrera commented Jul 30, 2021

Kyle-Kyle commented Jul 30, 2021

vanhauser-thc commented Jul 30, 2021

Kyle-Kyle commented Jul 30, 2021

vanhauser-thc commented Jul 30, 2021

adrianherrera commented Aug 1, 2021

domenukk commented Mar 13, 2022

vanhauser-thc commented May 8, 2022

Conflict between ngram implementation and comments #1048

Conflict between ngram implementation and comments #1048

Comments

Kyle-Kyle commented Jul 28, 2021

Kyle-Kyle commented Jul 28, 2021

vanhauser-thc commented Jul 29, 2021

adrianherrera commented Jul 30, 2021

Kyle-Kyle commented Jul 30, 2021

vanhauser-thc commented Jul 30, 2021

Kyle-Kyle commented Jul 30, 2021

vanhauser-thc commented Jul 30, 2021

adrianherrera commented Aug 1, 2021

domenukk commented Mar 13, 2022

vanhauser-thc commented May 8, 2022