Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dataflowengineoss] handle python imports in globalFromLiteral #4576

Merged
merged 2 commits into from
May 21, 2024

Conversation

xavierpinho
Copy link
Contributor

Noticed a regression after #4559, in which flows from imported members would not longer propagate, e.g.

from models import FOOBAR # defined as e.g. FOOBAR = "123" in `models.py`
print(FOOBAR)             # this FOOBAR (argument) would not be tainted by "123"

At play is the representation of import statements, which look like FOOBAR = import("models", "FOOBAR").

Prior to #4559, globalFromLiteral would yield FOOBAR in the aforementioned import statement, since it would recursively look for literals in assignments. That wasn't bullet-proof because then in x = foo(20) we would also yield x, leading to an extra (incorrect) data-flow from 20 to x (see #4549.)

In #4559, we changed that to only look for literals as the proper RHS of assignments, which then misses this particular representation of import statements.

So in this patch we extend globalFromLiteral to also account for literals found inside import statements.

@xavierpinho xavierpinho added bug Something isn't working python Relates to pysrc2cpg dataflow engine Relates to dataflowengineoss labels May 20, 2024
@xavierpinho xavierpinho force-pushed the xavierp/python/import-dataflow branch from d6e9e81 to 8c7f351 Compare May 20, 2024 14:52
Copy link
Collaborator

@DavidBakerEffendi DavidBakerEffendi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This was also picked up yesterday by others, good timing and nice tests

…ckage.scala


name -> nameExact

Co-authored-by: David Baker Effendi <[email protected]>
@xavierpinho xavierpinho merged commit 58c94fe into master May 21, 2024
5 checks passed
@xavierpinho xavierpinho deleted the xavierp/python/import-dataflow branch May 21, 2024 10:03
@DavidBakerEffendi
Copy link
Collaborator

@xavierpinho Seems this case is still failing: https://github.com/dbMundada/privado-python-test/blob/main/src/basic/tag-index-access-sources.py#L14

Likely, since these are represented by <operator>.dictLiteral, they are not conventional literals. Probably not a difficult fix but it's EOD for me at the time of this comment so won't look further into it immediately

@xavierpinho
Copy link
Contributor Author

xavierpinho commented May 21, 2024

@DavidBakerEffendi yeah... :( I stumbled in a similar issue and am currently investigating with the following sample:

x = foo("X")
def run():
  bar = {"x": x}
  print(10)

Here I'm querying for any literal that taints x (the one used in the dictionary). Currently this set is empty. The interesting bits so far are that:

  • if we elide the method run but keep its statements, we get results.
  • if we exclude the print(10) statement, we get results.
  • if we elide the foo call but keep the assignment to "X", we get results.

Thanks for pointing that sample. Will be useful to add a bunch more unit-tests around this stuff.

EDIT: A tentative fix could be to recover globalFromLiteral's original behaviour only for DdgGenerator, as the original issue with globalFromLiteral was when invoking reachableBy and computing the starting points....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working dataflow engine Relates to dataflowengineoss python Relates to pysrc2cpg
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants