Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conflict between ngram implementation and comments #1048

Open
Kyle-Kyle opened this issue Jul 28, 2021 · 10 comments
Open

Conflict between ngram implementation and comments #1048

Kyle-Kyle opened this issue Jul 28, 2021 · 10 comments
Assignees
Labels
on my TODO list :) on my TODO list :)

Comments

@Kyle-Kyle
Copy link
Contributor

It looks like there is some conflict between the implementation and the comments about ngram.
According to the comment:

      /* "For efficiency, we propose to hash the tuple as a key into the
         hit_count map as (prev_block_trans << 1) ^ curr_block_trans, where
         prev_block_trans = (block_trans_1 ^ ... ^ block_trans_(n-1)" */

It should be (prev_block_trans << 1) ^ curr_block_trans. But in the implementation, the << 1 part is missing.

In normal cases, PrevLoc stores previous curr_loc << 1 and PrevLocTrans = PrevLoc;, which means PrevLocTrans has << 1 already in edge coverage. But this is forgotten in ngram, making the algorithm prev_block_trans ^ curr_block_trans

@Kyle-Kyle
Copy link
Contributor Author

Also, I noticed that the ngram implementation is borrowed from here, which is also contradictory to the comment

@vanhauser-thc
Copy link
Member

@Kyle-Kyle yes true. I basically just ripped the code from that repo :)
I think the comment is the right way to do it, the code is not. what do you think?
and @adrianherrera and @hexcoder- and @andreafioraldi ?

@adrianherrera
Copy link
Contributor

Hi all,

Yes, I have noticed this contradiction also :)

As you noted, the implementation is largely taken from the qemu version implemented for the "be sensitive and collaborative" paper. And yes, this implementation does not strictly follow what is written in the paper (which is where my comment quotes from).

So I just picked the same implementation (because that's what I was evaluating at the time). Not sure what the correct answer is, or even if it matters.

@Kyle-Kyle
Copy link
Contributor Author

If the original implementation is like this and it has been evaluated, I think it is fine we keep it as it is.
We just need to change the comment for clarification.

@vanhauser-thc
Copy link
Member

I think I see an issue.
if prev_block_trans is a looping block, the block id stays the same and (depending on if ngram-3, ..-5, ...) will just xor with itself and potentially be 0.
IMHO the shift is necessary. what do you think?

@Kyle-Kyle
Copy link
Contributor Author

This issue persists even with the left shift because prev_block_trans is just a xor of the bounded history. And if the history consists of loop addresses, it will lead to only two results.
But yes, adding the left shift will definitely lessen the issue to some degree.
relevant code is here:

      if (ngram_size)
        PrevLocTrans =
            IRB.CreateZExt(IRB.CreateXorReduce(PrevLoc), IRB.getInt32Ty());

@vanhauser-thc
Copy link
Member

I think why it was not done like it was described in the paper is that left shift does not work. the map size is not necessarily 2^16, it can also be e.g. 2^15 if changed in config.h. a left shift + xor will result in values larger than the map and hence results in a crash. so it would need % MAP_SIZE but this would make the instrumentation slower.
so the best is to do a right shift instead, right?

@adrianherrera
Copy link
Contributor

Yeah what you say makes sense. It could also just be a mistake :)

I will contact the authors and see what their reason is, because I am very curious to know!

@domenukk
Copy link
Member

Any news on this? Can the issue be closed or will we change the impl?

abertschi pushed a commit to mattweingarten/AFLplusplus that referenced this issue Apr 21, 2022
* update commit ids

* new experiment, new variant

* new changes to cmplog and schedule

* update cmplog, try notrim

* cmplog combine variant

* fix wrong commit

* new cmplog experiment

* new commit ids

* remove cmplog debug output

* new experiment

* new cmplog variants, new experiment

* fix builder files

* default level for cmplog

* fix for older afl++

* fix lint

* one day I might learn python

* weizz does not support the new option

* final fixes

* fix merge conflict

* fix fuzzer.py

* fix presubmit hopefully

* remove libaflfuzzer

* resubmit failed experiment

Co-authored-by: Abhishek Arya <[email protected]>
@vanhauser-thc vanhauser-thc self-assigned this May 3, 2022
@vanhauser-thc vanhauser-thc added the on my TODO list :) on my TODO list :) label May 3, 2022
@vanhauser-thc
Copy link
Member

It looks like there is some conflict between the implementation and the comments about ngram. According to the comment:

      /* "For efficiency, we propose to hash the tuple as a key into the
         hit_count map as (prev_block_trans << 1) ^ curr_block_trans, where
         prev_block_trans = (block_trans_1 ^ ... ^ block_trans_(n-1)" */

It should be (prev_block_trans << 1) ^ curr_block_trans. But in the implementation, the << 1 part is missing.

In normal cases, PrevLoc stores previous curr_loc << 1 and PrevLocTrans = PrevLoc;, which means PrevLocTrans has << 1 already in edge coverage. But this is forgotten in ngram, making the algorithm prev_block_trans ^ curr_block_trans

currently looking at this and to me it looks like the << 1 is there, however when saving the prev_loc, not when using it?

      if (ngram_size) {

        Value *ShuffledPrevLoc = IRB.CreateShuffleVector(
            PrevLoc, UndefValue::get(PrevLocTy), PrevLocShuffleMask);
        Value *UpdatedPrevLoc = IRB.CreateInsertElement(
            ShuffledPrevLoc, IRB.CreateLShr(CurLoc, (uint64_t)1), (uint64_t)0);  // <= HERE is a left shift

        Store = IRB.CreateStore(UpdatedPrevLoc, AFLPrevLoc);
        Store->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));

      } else

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
on my TODO list :) on my TODO list :)
Projects
None yet
Development

No branches or pull requests

4 participants