Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring (step up module architecture, establish central location for common functionality, cleanup linux analyzer) #58

Open
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

luis261
Copy link
Contributor

@luis261 luis261 commented May 4, 2024

Module excecution improvements (for now only applied to the linux analyzer, other modules will be modified in dedicated PRs):

  • the simplest modules, which don't need any upfront input can simply be run via import
  • more involved modules shoud provide the following definitions:
    • run: a function that takes in all required arguments as parameters, which would otherwise be provided via sys.argv if running the module directly in a shell (instead of inside Python via import)
    • main: a function which wraps around run, executing it with sys.argv input
    • an if statement that runs the main function if __name__ == "__main__"
  • in qu1cksc0pe.py, we can then import modules as required without the need for subshells/os commands (if they need input, we pass it via <modulename>.run)
  • this architecture let's us avoid the need for workarounds to pass inputs to analyzers, e.g. via the .path_handler tempfile (instead, we just pass any such needed information to a given module via a parameter of its local run function)

Note

changes below were already communicated/agreed upon previously:

Conversion of Modules directory into a Python package

  • this is worth it I think alone for sharing functionality (see below), but there is more to it than just that:

Having had enough experience myself with Python's quirky import system, I think I have some understanding for why it might be preferable to avoid dealing with it in some circumstances. Can you tell me the specific reasons though why you execute modules via os.system instead of staying inside a single Python interpreter instance if there are any?

If you're ok with it, I'll slowly move us towards staying inside a single Python instance by leveraging import statements instead of invoking modules directly via the operating system, since I think the advantages outweigh/justify using the established import system:

  • performance (launching a subshell via os.system every time we execute a module is more expensive in terms of overhead instead of simply importing into the current Python thread)
  • security (less risk of accidentally exposing/running different files on the system that aren't part of the program)

ideally we would rename the directory to be lowercase as part of this change to conform to Python module naming conventions .. I held off on it for now, since it's not strictly necessary and I'm unsure of the implications in terms of the other locations we'd have to adjust

Extraction of utility function previously strictly local to main module into separate module

  • might not seem sensible for now given that it only houses a single, tiny function
  • I'm sure it'll pay off in the long haul/soon though
    • can continue extracting duplicate functionality there, to be reused across different modules

@luis261
Copy link
Contributor Author

luis261 commented May 4, 2024

Hello @CYB3RMX, I just need a yes/no from you for now on whether you're ok with my current trajectory so I can keep the changes coming. If not, let me know and I'll try to adjust.

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 4, 2024

Hello @luis261 !

Thank you for your work and support. I think this architecture will make the project's codebase more understandable and maintainable. I am also considering adjusting the codebase to make packaging this project easier for future processes.

@luis261
Copy link
Contributor Author

luis261 commented May 4, 2024

Hi @CYB3RMX (:

I'm excited to hear that! 🚀

I'll continue my work tomorrow and probably ping you in a few days when I've built up some more substantial changes (having the ability to import shared functionality definitely enables me to do that)

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 6, 2024

Hello again @luis261 !

When I was checking your changes, the code threw these errors.

image

Could you please also try to run program after your changes?

@luis261
Copy link
Contributor Author

luis261 commented May 6, 2024

Hi @CYB3RMX,

thank your for making the effort and trying to test the changes already!
Yes, you make a good point, there are still importing issues, as you discovered, this isn't ready to be merged yet.

Let me work on some other changes first, I already know the cause of the issue you're talking about.
That's what I meant with Python's quirky import system, it can be quite annoying to get it setup right.
(but once you do, it's worth it)

This issue is related to us running the modules via os.system at the moment instead of actual import's.
I'll get back to you once I've worked on some other, related changes in a few days. I estimate to get this in a state ready to be merged towards end of this week. I'll convert the PR from draft mode once it's ready (:

@luis261
Copy link
Contributor Author

luis261 commented May 8, 2024

I'm currently working on refactoring the linux analyzer as part of this overall change.

Just a small question only indirectly related to the refactoring: why do you consider it "bad" to have an NX bit?
I'm assuming you do because it is printed as red if present. Is that intentional?

nxstr = "[bold red]True"

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 9, 2024

I added this feature to check if the target sample has binary protections such as NX or PIE.

@luis261
Copy link
Contributor Author

luis261 commented May 9, 2024

Yes, I understand the point of the feature, I just don't get why we currently print "True" in bold red if NX is present, since it's a positive feature which increases security? Shouldn't it be green? I guess it's not that important, I'll address that point in a separate PR after the refactoring

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 9, 2024

I thought that if the malware sample has some binary protections, it wouldn't be good for us. By the way, did you run the program after your changes?

@luis261
Copy link
Contributor Author

luis261 commented May 9, 2024

The way I understood, having an NX bit is probably good overall and shouldn't affect the binary instrumentation process? https://ctf101.org/binary-exploitation/no-execute/

But I'm not sure, you know more about analysis than me and I guess it makes sense for it to remain as is.

Regarding the state of the program: yes, I did run it, although I don't have a great testing setup for Linux right now. I'm still working on refactoring the linux analyzer overall, to show you an example of what a refactored module could look like. I think I'll get that done today/tomorrow. After that, I'll fix the overall import situation (a simple, albeit annoying fix, because I'll need to modify the line in every module again)

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 9, 2024

The error is similar as the previous one. But dont worry we can fix this anyway.

image

@luis261
Copy link
Contributor Author

luis261 commented May 9, 2024

Yes, I know. I appreciate you trying to run it already, but currently there is no sense in trying that though. This branch will still be unstable until at least tomorrow. I will ping you though once it's ready ok?

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 9, 2024

Okay no problem. Thank you :)

@luis261
Copy link
Contributor Author

luis261 commented May 9, 2024

You're welcome, I like working on this project (:

Thank you for your patience and ongoing support!

@luis261
Copy link
Contributor Author

luis261 commented May 9, 2024

Just pushed a few more commits. Still not ready.

Making some progress though, the linux analyzer specific part of the refactoring is almost done now.

@luis261
Copy link
Contributor Author

luis261 commented May 9, 2024

In case you're reading this soon: please continue to ignore my changes for now. But it'll be ready soon, I'll probably keep working and move this out of the draft state before sunrise (we're in a similar timezone). Until then, I still have to:

  • finish cleaning up/fixing post-refactoring for the linux analyzer
  • take care of the import issue
  • explain considerations regarding the import setup
  • create a small summary/writeup for the refactoring so it's easier for you to review

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 24, 2024

Hello @luis261 !

I checked your changes and tried to perform a scan against a simple Linux sample. Everything worked properly in this case. After that, I installed Qu1cksc0pe on my system using the --install argument and tried to analyze the target sample again. After this, the program threw this error:

image

I think we should also do something for the utility/helper modules. We can import them from the sc0pe_base folder, I guess.

@luis261
Copy link
Contributor Author

luis261 commented May 24, 2024

Hello @CYB3RMX (:

thank you for your comprehensive testing, good finding, I'll look into it right away!

@luis261
Copy link
Contributor Author

luis261 commented May 24, 2024

@CYB3RMX

ok, so at first glance, I don't think I understand the installation process.

you copy the qu1cksc0pe.py file to /usr/bin/qu1cksc0pe here:

cd $sc0pe_path && cp qu1cksc0pe.py /usr/bin/qu1cksc0pe && chmod +x /usr/bin/qu1cksc0pe

but the Modules folder itself is never moved/copied right? So how can execution from the entrypoint in usr/bin ever succeed when modules are needed for carrying out analysis?

@luis261
Copy link
Contributor Author

luis261 commented May 24, 2024

Oh ok I understand now, it works via a config file:

if os.path.exists("/usr/bin/qu1cksc0pe") == True and os.path.exists(f"/etc/qu1cksc0pe.conf") == True:

That makes things a lot more complicated import-wise. I'll have to see if I can work something out this evening. But I don't think I can make it work without also copying the Modules directory over to /usr/bin. We'll see I guess, I might have some questions for you later.

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 24, 2024

Okay man :) Feel free to ask

@luis261
Copy link
Contributor Author

luis261 commented May 24, 2024

Thank you mate, I'll try to figure sth out once I'm off the clock and then present you with a list of options tonight/tomorrow (:

@luis261
Copy link
Contributor Author

luis261 commented May 25, 2024

@CYB3RMX

ok, so here's my current understanding of the install mechanism (Linux only, not taking Docker into account):

  • initially, the project obviously just sits at whatever location in the local filesystem which the installing user chose to clone it to
  • the user is expected to run setup.sh for proper setup
    • this bash script installs various requirements; Jadx among other things, which is downloaded to /home/$USER/sc0pe_Base/jadx in particular
  • when executing the cloned qu1cksc0pe.py file with the --install switch, the installer.sh script under ./Modules/ is executed
    • qu1cksc0pe.py passes the location of the cloned repo to the bash script
    • the installer script creates a config file /etc/qu1cksc0pe.conf, containing a value that references a directory of location /opt/Qu1cksc0pe
    • using the passed path, the cloned files are then copied to /opt/Qu1cksc0pe
    • a copy of the "main" Python entrypoint is finally written to /usr/bin/qu1cksc0pe and marked as executable, owned by the current user

This means for a full Qu1cksc0pe installation running on Linux, the intended entrypoint is /usr/bin/qu1cksc0pe, as seen above in your screenshot. On execution, it looks for the config file under /etc and if present, utilizes the referenced path to execute modules as needed, lying under /opt/Qu1cksc0pe/Modules/. If no such config file is present, this indicates no proper install was carried out. As such, the entrypoint just runs against the current working directory and assumes the modules to execute are located there (this works for running directly in the cloned location or running the copy under /opt/Qu1cksc0pe/).

The problem here is that the entrypoint under /usr/bin/ can't find the Modules package, which sits at a different location(s).

Based on this understanding, here are the options, odered by rising complexity/disruptiveness to the current way of doing things:

  1. since we're currently using "module-unaware" imports inside the Python modules anyway, we could just permanently stick to that way of importing. I'd have to remove the linAnalyzer import though and replace it with an execution via shell again. I'd also have to provide local copies of required utilities for the entrypoint (the analyzer modules can still share functionality via the locally importable module though). This would still be manageable, since the entrypoint is only using utils.err_exit, which could easily be defined inline instead.
  • advantages: no adjustments to the current way of doing things required
  • disadvantages: we'll be stuck on executing the Python modules in subshells, can never import them from the entrypoint directly; as such the entrypoint itself also won't have access to shared functionality defined in /opt/Qu1cksc0pe/Modules/utils.py
  1. sys path hack: in the entrypoint, I could append the known location of the Modules package under /opt/Qu1cksc0pe/ to sys.path. That way, Python will be able to find the Module package when we try to import it (the current working directory is part of sys.path anyway, so the other cases are taken care off by default anyway).
  • advantages: targeted modification at arguably the right location/source of the issue, without affecting the rest of the project and avoiding the need for further workarounds
  • disadvantages: I consider this to be an OK solution; however, it's not considered clean from a Python perspective
  1. place a simple bash entrypoint under /usr/bin/, which simply redirects to the actual location under /opt/, or falls back on the clone location
  • advantages: pragmatic solution, avoids duplication
  • disadvantages: unsure about the details of this, might introduce problems for passing around the file to analyze?
  1. copy all project files to /usr/bin/
  • advantages: pragmatic solution, avoids duplicated entrypoints
  • disadvantages: idk, not the proper UNIX way of doing things? might not work?

I suggest we go with option 2, but let me know what you think please. Then I can implement the chosen option (:

@luis261
Copy link
Contributor Author

luis261 commented May 25, 2024

@CYB3RMX

actually I took the liberty of just implementing the 2nd option already, that way you can see what I mean and if you're ok with the solution we can finally get this done (:

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 25, 2024

Exactly. The second option seems more logical. By the way, while adding updates, make sure to consider the --install part as well, my friend :)

@luis261
Copy link
Contributor Author

luis261 commented May 25, 2024

Awesome, thanks for the swift feedback 🤗

yes, I'll keep this projects ways of performing installation in mind for future changes.
Regarding this specific PR though, we should be fine now when it comes to the impact on the installation process right?

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 25, 2024

Absolutely. But our priority should always be ensuring that the program runs smoothly. This way, people using the program won't experience any issues :)

@luis261
Copy link
Contributor Author

luis261 commented May 25, 2024

That's true for sure, but I'm not sure I get what you're trying to say: do you mean just in general or is there something else I need to consider regarding this specific PR before we can hopefully get it merged?

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 25, 2024

What I mean is that we can approve the PR once we are sure everything is working correctly.

@luis261
Copy link
Contributor Author

luis261 commented May 25, 2024

That makes sense (:

Do you think you could run it again with the --install option for me, like you did above? Does not have to be now/today if you don't have time.

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 25, 2024

Alright. Send your changes so we can test them. If there are no problems, we'll complete the PR :)

@luis261
Copy link
Contributor Author

luis261 commented May 25, 2024

Awesome, thank you so much 🙏

I already pushed and committed ba52c65bf9e6a2d1fd2ab56e6457437b57f0267e this morning. That small change already constitutes the implementation of option 2, thus hopefully solving our problem (:

@luis261 luis261 changed the title WIP: refactoring Refactoring (step up module architecture, establish central location for common functionality, cleanup for linux analyzer) May 25, 2024
@luis261 luis261 changed the title Refactoring (step up module architecture, establish central location for common functionality, cleanup for linux analyzer) Refactoring (step up module architecture, establish central location for common functionality, cleanup linux analyzer) May 25, 2024
@CYB3RMX
Copy link
Owner

CYB3RMX commented May 27, 2024

Hello @luis261 !

I checked your code, and it seems to be working properly (LinuxAnalyzer module). I also need to check the other modules.

Note: I checked the --console mode. When we try to analyze a simple sample, it throws this error:
image

That one is for the Windows side:
image

I think we should add a default value for the report section

@luis261
Copy link
Contributor Author

luis261 commented May 27, 2024

@CYB3RMX thank you so much for your effort and attention to detail! Good to know the current status, I'm glad to hear that at least the actual analysis internals of the linux analyzer are stable again (:

As for the report issue when executing the module directly: yes, that's on me, careless mistake to access that index without a prior check, here's the situation:

  • there already is a default value for the actual functionality in the analyzer class' analyze method signature
  • it's just failing right now because I'm accessing in an unsafe manner, thus causing the IndexError
  • I've just pushed an update that includes a length check, this should fix it (780bbe3)
def analyze(self, indicators_by_category, emit_report=False):
    ...

@luis261
Copy link
Contributor Author

luis261 commented May 27, 2024

I'm a bit disappointed (in myself) regarding the integration changes and delay this refactoring is causing.
Thank you for your patience though. Going forwards, I'll be able to avoid changes across such a wide breadth of modules to circumvent the need for such involved testing (across different modules, different ways of execution etc).

This means the risk for upcoming changes will be more predictable/of smaller impact or scope. For now, there was no way to address some points of improvement without affecting all modules though.

@luis261

This comment was marked as outdated.

@luis261

This comment was marked as outdated.

@luis261

This comment was marked as outdated.

@luis261
Copy link
Contributor Author

luis261 commented May 27, 2024

@CYB3RMX

Sorry about the earlier information overload, I investigated this further in the meantime, here are the facts, as compact as I can manage:

  • when filtering out any occurences related to "file", only 14 occurences of raw sys.argv indexing remain
  • I analyzed the context for each of these code snippets: it's not problematic in these cases because all of these analyzers only get called from qu1cksc0pe.py in a manner where all expected values are present (so sys.argv is long enough => no IndexError)

Overall, we should be fine now. Again though, these IndexErrors weren't introduced by my refactoring, as evidenced by: https://github.com/CYB3RMX/Qu1cksc0pe/blob/master/Modules/winAnalyzer.py#L643 (main branch)
(so the errors should also be reproducible on main at the current version: e11dfe4)

Qu1cksc0pe>git grep "sys.argv\[" | python txtfilter.py -x Modules/utils.py file get_argv
Modules/VTwrapper.py: apikey = str(sys.argv[1])
Modules/andro_familydetect.py:targetApk = sys.argv[1]
Modules/apkAnalyzer.py:targetAPK = sys.argv[1]
Modules/apkAnalyzer.py: if sys.argv[3] == "JAR":
Modules/apkAnalyzer.py: if sys.argv[3] == "DEX":
Modules/apkAnalyzer.py: if sys.argv[2] == "True":
Modules/email_analyzer.py:target_eml = sys.argv[1]
Modules/hashScanner.py: if str(sys.argv[1]) == '--db_update':
Modules/hashScanner.py: if str(sys.argv[2]) == '--normal':
Modules/packerAnalyzer.py: if str(sys.argv[1]) == '--single':
Modules/packerAnalyzer.py: elif str(sys.argv[1]) == '--multiscan':
Modules/pcap_analyzer.py:target_pcap = sys.argv[1]
Modules/powershell_analyzer.py:target_pwsh = sys.argv[1]
Modules/windows_dynamic_analyzer.py:target_pid = int(sys.argv[1])

@CYB3RMX
Copy link
Owner

CYB3RMX commented May 27, 2024

No problem, man. I just meant to say let's come up with a solution for this too.

@luis261
Copy link
Contributor Author

luis261 commented May 27, 2024

Yes, thank you my friend (:

My most recent changes should fix the issues you found 🤞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants