Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter out std string functions taking non-char character types #775

Closed
wants to merge 6 commits into from

Conversation

stbergmann
Copy link

This had caused an assert SIGABRT with LibreOffice (see https://git.libreoffice.org/core/+/e2c9ac71cec0f205b1d4864538e8158c22558296%5E%21 "ofz#30767 Build-Failure") at

#3 0x00007ffff7a07026 in GI___assert_fail (assertion=0x7ffff79c3b98 "isString() && "Not a string"", file=0x7ffff79c3800 "~/llvm/inst/include/llvm/IR/Constants.h", line=661, function=0x7ffff79c3bb5 "llvm::StringRef llvm::ConstantDataSequential::getAsString() const") at /usr/src/debug/glibc-2.32-37-g760e1d2878/assert/assert.c:101
#4 0x00007ffff79b9dd1 in llvm::ConstantDataSequential::getAsString (this=0xcb75f90) at ~/llvm/inst/include/llvm/IR/Constants.h:661
#5 0x00007ffff79b8645 in (anonymous namespace)::AFLdict2filePass::runOnModule (this=0xd175d50, M=...) at ~/AFLplusplus/instrumentation/afl-llvm-dict2file.so.cc:406
#6 0x000000000550fb63 in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=) at ~/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1550
#7 llvm::legacy::PassManagerImpl::run (this=0x9925a90, M=...) at ~/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541
#8 0x000000000550feb9 in llvm::legacy::PassManager::run (this=this@entry=0x7fffffff91c0, M=...) at ~/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1677
#9 0x000000000653efb3 in (anonymous namespace)::EmitAssemblyHelper::EmitAssembly (this=this@entry=0x7fffffff9670, Action=Action@entry=clang::Backend_EmitObj, OS=std::unique_ptrllvm::raw_pwrite_stream = {...}) at ~/llvm/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1015
#10 0x0000000006540856 in clang::EmitBackendOutput (Diags=..., HeaderOpts=..., CGOpts=..., TOpts=..., LOpts=..., TDesc=..., M=0x944b6f0, Action=, OS=...) at /usr/include/c++/10/bits/move.h:76
#11 0x000000000689383c in clang::BackendConsumer::HandleTranslationUnit (this=0x944a210, C=...) at ~/llvm/llvm-project/clang/include/clang/Basic/TargetInfo.h:1076
#12 0x00000000078fe1c9 in clang::ParseAST (S=..., PrintStats=, SkipFunctionBodies=) at ~/llvm/llvm-project/clang/lib/Parse/ParseAST.cpp:171
#13 0x00000000067b9729 in clang::FrontendAction::Execute (this=this@entry=0x941b1a0) at ~/llvm/llvm-project/clang/lib/Frontend/FrontendAction.cpp:949
#14 0x00000000066f6586 in clang::CompilerInstance::ExecuteAction (this=this@entry=0x940f390, Act=...) at ~/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:949
#15 0x000000000686ecfb in clang::ExecuteCompilerInvocation (Clang=Clang@entry=0x940f390) at /llvm/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:278
#16 0x00000000039f6f04 in cc1_main (Argv=..., Argv0=0x7fffffffcc0c "
/llvm/inst/bin/clang-13", MainAddr=MainAddr@entry=0x39f0a60 <GetExecutablePath[abi:cxx11](char const*, bool)>) at ~/llvm/llvm-project/clang/tools/driver/cc1_main.cpp:246
#17 0x00000000039f054d in ExecuteCC1Tool (ArgV=...) at ~/llvm/llvm-project/clang/tools/driver/driver.cpp:330
#18 0x00000000039f25c5 in main (argc
=, argc
@entry=145, argv_=, argv_@entry=0x7fffffffc3d8) at ~/llvm/llvm-project/clang/tools/driver/driver.cpp:407

when (in frame #5) FuncName is "_ZNKSt17basic_string_viewIDsSt11char_traitsIDsEE4findEPKDsm" (i.e.,

std::basic_string_view<char16_t, std::char_traits<char16_t> >::find(char16_t const*, unsigned long) const

) and thus isStdString is true.

vanhauser-thc and others added 6 commits February 24, 2021 12:02
This had caused an assert SIGABRT with LibreOffice (see
<https://git.libreoffice.org/core/+/
e2c9ac71cec0f205b1d4864538e8158c22558296%5E%21> "ofz#30767 Build-Failure") at

> AFLplusplus#3  0x00007ffff7a07026 in __GI___assert_fail (assertion=0x7ffff79c3b98 "isString() && \"Not a string\"", file=0x7ffff79c3800 "~/llvm/inst/include/llvm/IR/Constants.h", line=661, function=0x7ffff79c3bb5 "llvm::StringRef llvm::ConstantDataSequential::getAsString() const") at /usr/src/debug/glibc-2.32-37-g760e1d2878/assert/assert.c:101
> AFLplusplus#4  0x00007ffff79b9dd1 in llvm::ConstantDataSequential::getAsString (this=0xcb75f90) at ~/llvm/inst/include/llvm/IR/Constants.h:661
> AFLplusplus#5  0x00007ffff79b8645 in (anonymous namespace)::AFLdict2filePass::runOnModule (this=0xd175d50, M=...) at ~/AFLplusplus/instrumentation/afl-llvm-dict2file.so.cc:406
> AFLplusplus#6  0x000000000550fb63 in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=<optimized out>) at ~/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1550
> AFLplusplus#7  llvm::legacy::PassManagerImpl::run (this=0x9925a90, M=...) at ~/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541
> AFLplusplus#8  0x000000000550feb9 in llvm::legacy::PassManager::run (this=this@entry=0x7fffffff91c0, M=...) at ~/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1677
> AFLplusplus#9  0x000000000653efb3 in (anonymous namespace)::EmitAssemblyHelper::EmitAssembly (this=this@entry=0x7fffffff9670, Action=Action@entry=clang::Backend_EmitObj, OS=std::unique_ptr<llvm::raw_pwrite_stream> = {...}) at ~/llvm/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1015
> AFLplusplus#10 0x0000000006540856 in clang::EmitBackendOutput (Diags=..., HeaderOpts=..., CGOpts=..., TOpts=..., LOpts=..., TDesc=..., M=0x944b6f0, Action=<optimized out>, OS=...) at /usr/include/c++/10/bits/move.h:76
> AFLplusplus#11 0x000000000689383c in clang::BackendConsumer::HandleTranslationUnit (this=0x944a210, C=...) at ~/llvm/llvm-project/clang/include/clang/Basic/TargetInfo.h:1076
> AFLplusplus#12 0x00000000078fe1c9 in clang::ParseAST (S=..., PrintStats=<optimized out>, SkipFunctionBodies=<optimized out>) at ~/llvm/llvm-project/clang/lib/Parse/ParseAST.cpp:171
> AFLplusplus#13 0x00000000067b9729 in clang::FrontendAction::Execute (this=this@entry=0x941b1a0) at ~/llvm/llvm-project/clang/lib/Frontend/FrontendAction.cpp:949
> AFLplusplus#14 0x00000000066f6586 in clang::CompilerInstance::ExecuteAction (this=this@entry=0x940f390, Act=...) at ~/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:949
> AFLplusplus#15 0x000000000686ecfb in clang::ExecuteCompilerInvocation (Clang=Clang@entry=0x940f390) at ~/llvm/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:278
> AFLplusplus#16 0x00000000039f6f04 in cc1_main (Argv=..., Argv0=0x7fffffffcc0c "~/llvm/inst/bin/clang-13", MainAddr=MainAddr@entry=0x39f0a60 <GetExecutablePath[abi:cxx11](char const*, bool)>) at ~/llvm/llvm-project/clang/tools/driver/cc1_main.cpp:246
> AFLplusplus#17 0x00000000039f054d in ExecuteCC1Tool (ArgV=...) at ~/llvm/llvm-project/clang/tools/driver/driver.cpp:330
> AFLplusplus#18 0x00000000039f25c5 in main (argc_=<optimized out>, argc_@entry=145, argv_=<optimized out>, argv_@entry=0x7fffffffc3d8) at ~/llvm/llvm-project/clang/tools/driver/driver.cpp:407

when (in frame AFLplusplus#5) FuncName is
"_ZNKSt17basic_string_viewIDsSt11char_traitsIDsEE4findEPKDsm" (i.e.,

> std::basic_string_view<char16_t, std::char_traits<char16_t> >::find(char16_t const*, unsigned long) const

) and thus isStdString is true.
@vanhauser-thc vanhauser-thc changed the base branch from stable to dev March 2, 2021 09:22
@vanhauser-thc
Copy link
Member

Thanks for the notification!
Please note we only accept PRs to dev so I switched the base.

If I understand the issue correctly, the instrumentation stumbles over unicode?
then your fix would not be good, because those data bytes would be missed.
so instead of "continue" it needs different functionality to extract the data, hmmm.

(note to self: we have similar code in SanitizeCoverageLTO.cpp which also needs to be fixed.)

@stbergmann
Copy link
Author

If I understand the issue correctly, the instrumentation stumbles over unicode?

yes, see the "char16_t" in the demangled FuncName

@vanhauser-thc
Copy link
Member

I am deep in a different project, can you quickly help me with a piece of code that lets me replicate the issue?

#include <stdio.h>
#include <string>
#include <string_view>

int main() {
    const std::u16string_view deg = u"deg";
    const std::u16string_view grad = u"grad";
    const std::u16string_view rad = u"rad";
    std::u16string rString = u"grad";

    if (std::u16string_view::npos != rString.find(deg))
    {
        printf("x\n");
    }
    else if (std::u16string_view::npos != rString.find(grad))
    {
        printf("y\n");
    }
    else if (std::u16string_view::npos != rString.find(rad))
    {
        printf("z\n");
    }
    return 0;
}

tells me

test.c:6:16: error: ‘u16string_view’ in namespace ‘std’ does not name a type; did you mean ‘u16string’?
    6 |     const std::u16string_view deg = u"deg";

@stbergmann
Copy link
Author

test.c:6:16: error: ‘u16string_view’ in namespace ‘std’ does not name a type; did you mean ‘u16string’?
    6 |     const std::u16string_view deg = u"deg";

You presumably compile as C++14 or older, but std::u16string_view was only introduced in C++17.

@vanhauser-thc
Copy link
Member

@stbergmann
my example code does not crash and somehow seems to behave very different to yours. I blind-coded a fix, but sadly cannot test if it works (by not crashing and still obtaining the values).
Can you test it somehow? -> c269c39

@stbergmann
Copy link
Author

Can you test it somehow? -> c269c39

While that does not crash, I don't think it makes much sense: When run on LibreOffice's sax/source/tools/converter.cxx, it generates the same $AFL_LLVM_DICT2FILE output as my patch here, and that output does not contain any of the three relevant UTF-16 string literals u"deg", u"grad", u"rad". When

Str2 = Array->getRawDataValues().str();

(in AFLdict2filePass::runOnModule) now forces the content of such a UTF-16 string literal into a std::string, depending on target endianness, either Str2[0] or Str2[1] will be NUL, so the later

if (optLen < 3)  // too short? skip
  continue;

(also in AFLdict2filePass::runOnModule) will ignore it anyway.

@stbergmann
Copy link
Author

(Re "I don't think it makes much sense": Or maybe you're fine with that, producing endianness-dependent output for just those UTF-16 and UTF-32 string literals in the source that happen to not have a null byte among the leading bytes of their getRawDataValues() representation. I know too little what all this code is actually about.)

@vanhauser-thc
Copy link
Member

Can you test it somehow? -> c269c39

While that does not crash, I don't think it makes much sense: When run on LibreOffice's sax/source/tools/converter.cxx, it generates the same $AFL_LLVM_DICT2FILE output as my patch here, and that output does not contain any of the three relevant UTF-16 string literals u"deg", u"grad", u"rad". When

Str2 = Array->getRawDataValues().str();

(in AFLdict2filePass::runOnModule) now forces the content of such a UTF-16 string literal into a std::string, depending on target endianness, either Str2[0] or Str2[1] will be NUL, so the later

if (optLen < 3)  // too short? skip
  continue;

but this is not how optLen is generated:

          if (HasStr1)
            thestring = Str1;
          else
            thestring = Str2;

          optLen = thestring.length()

so 8 characters of unicode 16 = std::string with a length of 16 bytes, null bytes are just normal characters.

(Re "I don't think it makes much sense": Or maybe you're fine with that, producing endianness-dependent output for just those UTF-16 and UTF-32 string literals in the source that happen to not have a null byte among the leading bytes of their getRawDataValues() representation. I know too little what all this code is actually about.)

that is totally fine. you cannot run the instrumented binary on a different endian platform either :)

When I try to compile libreoffice/sax/source/tools/converter.cxx it does fail because an include is not found that does not exist in the checkout, I guess it is generated by the build system ...
sigh ... looks like I have to look further how to get those unicode strings ...

@stbergmann
Copy link
Author

but this is not how optLen is generated:

          if (HasStr1)
            thestring = Str1;
          else
            thestring = Str2;

          optLen = thestring.length()

so 8 characters of unicode 16 = std::string with a length of 16 bytes, null bytes are just normal characters.

From just having scanned through the code, I thought it was

// ensure we do not have garbage
size_t offset = thestring.find('\0', 0);
if (offset + 1 < optLen) optLen = offset + 1;
thestring = thestring.substr(0, optLen);

that would shrink both optLen and thestring to anything up to the first embedded null byte.

When I try to compile libreoffice/sax/source/tools/converter.cxx it does fail because an include is not found that does not exist in the checkout, I guess it is generated by the build system ...
sigh ... looks like I have to look further how to get those unicode strings ...

A very simple reproducer I came up with is

#include <string>
void f(std::u16string s) {
    s.find(u"foobar");
    s.find(u"\u0123\u4567\u89AB\uCDEF");
}

With your patch, it drops u"foobar" (see above) but includes something like "#\x01gE\xab\x89\xef\xcd\x00" for u"\u0123\u4567\u89AB\uCDEF" in the output.

@vanhauser-thc
Copy link
Member

From just having scanned through the code, I thought it was

// ensure we do not have garbage
size_t offset = thestring.find('\0', 0);
if (offset + 1 < optLen) optLen = offset + 1;
thestring = thestring.substr(0, optLen);

that would shrink both optLen and thestring to anything up to the first embedded null byte.

that what happens when you try to fix code while in deep in something else :) good catch.
I fixed this now all works perfectly, thank you!
will push the new commit id to oss-fuzz now.

A very simple reproducer I came up with is

#include <string>
void f(std::u16string s) {
    s.find(u"foobar");
    s.find(u"\u0123\u4567\u89AB\uCDEF");
}

that helped a lot!

With your patch, it drops u"foobar" (see above) but includes something like "#\x01gE\xab\x89\xef\xcd\x00" for u"\u0123\u4567\u89AB\uCDEF" in the output.

"\u0123\u4567\u89AB\uCDEF" => "#\x01gE\xab\x89\xef\xcd\x00"
is correct, only binary characters are encoded. e.g. \u0123 is \x23 \x01 and \x23 is #. \u4567 becomes gE etc.

@stbergmann
Copy link
Author

I fixed this now all works perfectly, thank you!
will push the new commit id to oss-fuzz now.

So lets abandon this one then. Thanks for your quick reaction!

@stbergmann stbergmann closed this Mar 2, 2021
abertschi pushed a commit to mattweingarten/AFLplusplus that referenced this pull request Apr 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants