package_managers.py uses ET.ElementTree against untrusted data #96

Open
Hritik14 opened this issue Jul 28, 2021 · 5 comments
Labels
bug Something isn't working

Comments

@Hritik14

The Python docs mention:

Warning

The xml.etree.ElementTree module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities.

ElementTree is used on line 265 here:
https://github.com/nexB/vulnerablecode/blob/369897fb947584e44581df075c6e76638737f2ca/vulnerabilities/package_managers.py#L250-L266

The docs further suggest using defusedxml instead:

defusedxml is a pure Python package with modified subclasses of all stdlib XML parsers that prevent any potentially malicious operation. Use of this package is recommended for any server code that parses untrusted XML data. The package also ships with example exploits and extended documentation on more XML exploits such as XPath injection.
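
To make the concern concrete, here is a minimal stdlib-only sketch, not the project's code and not a substitute for defusedxml. It assumes the risk in question is document-declared entities: the stdlib parser expands them, while defusedxml rejects entity declarations by default. The names `fromstring_no_entities` and `EntityDeclarationForbidden` are hypothetical, and the sketch skips the namespace handling that ElementTree normally provides.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class EntityDeclarationForbidden(ValueError):
    """Raised when untrusted XML declares its own entities (hypothetical name)."""


def fromstring_no_entities(text: str) -> ET.Element:
    """Parse XML text into an Element, refusing any <!ENTITY ...> declaration.

    Illustration only: tags keep their raw prefixes because namespace
    processing is not enabled on the expat parser.
    """
    builder = ET.TreeBuilder()
    parser = expat.ParserCreate()

    def reject_entity(name, *_details):
        # expat calls this for every entity declared in the DTD.
        raise EntityDeclarationForbidden(f"entity declaration not allowed: {name!r}")

    parser.EntityDeclHandler = reject_entity
    parser.StartElementHandler = builder.start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.data
    parser.Parse(text, True)
    return builder.close()


# A tiny stand-in for an entity-expansion payload.
doc = '<!DOCTYPE r [<!ENTITY a "aaaa">]><r>&a;&a;</r>'

# The stdlib parser expands the attacker-defined entity without complaint.
expanded = ET.fromstring(doc).text

# The guarded parser refuses the same document.
try:
    fromstring_no_entities(doc)
    rejected = False
except EntityDeclarationForbidden:
    rejected = True
```

In the actual codebase the simpler route is probably the one the docs recommend: import `defusedxml.ElementTree` and call its `fromstring` wherever `ET.fromstring` currently parses fetched data.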

@Hritik14
Author

I just realized that a lot of importers also use ElementTree. IMO all of them need to be refactored, since importer data is certainly not trusted data.

Here's a quick grep of the importers using ElementTree:
vulnerabilities/data_source.py:import xml.etree.ElementTree as ET
vulnerabilities/data_source.py:    def _fetch(self) -> Tuple[Mapping, Iterable[ET.ElementTree]]:
vulnerabilities/data_source.py:        Return a two-tuple of ({mapping of Package URL data}, it's ET.ElementTree)
vulnerabilities/data_source.py:    def get_data_from_xml_doc(self, xml_doc: ET.ElementTree, pkg_metadata={}) -> List[Advisory]:
vulnerabilities/data_source.py:        OVAL xml ElementTree into a list of `Advisory`.
vulnerabilities/importers/debian_oval.py:import xml.etree.ElementTree as ET
vulnerabilities/importers/debian_oval.py:                ET.ElementTree(ET.fromstring(resp.decode("utf-8"))),
vulnerabilities/importers/gentoo.py:import xml.etree.ElementTree as ET
vulnerabilities/importers/openssl.py:import xml.etree.ElementTree as ET
vulnerabilities/importers/ubuntu.py:import xml.etree.ElementTree as ET
vulnerabilities/importers/ubuntu.py:                ET.ElementTree(ET.fromstring(extracted.decode("utf-8"))),
vulnerabilities/lib_oval.py:    >>> tree = ElementTree()
vulnerabilities/lib_oval.py:    >>> tree = ElementTree()    
vulnerabilities/lib_oval.py:from xml.etree import ElementTree
vulnerabilities/lib_oval.py:from xml.etree.ElementTree import Element
vulnerabilities/lib_oval.py:        #         if not tree or not isinstance(tree, ElementTree):
vulnerabilities/lib_oval.py:            self.tree = ElementTree.ElementTree(root)
vulnerabilities/lib_oval.py:        Load an OVAL document from a filename and parse that into an ElementTree
vulnerabilities/lib_oval.py:                self.tree = ElementTree.parse(filename)
vulnerabilities/lib_oval.py:        Initializes the ElementTree by parsing the xmltext as XML
vulnerabilities/lib_oval.py:                root = ElementTree.fromstring(xmltext)
vulnerabilities/lib_oval.py:                self.tree = ElementTree(root)
vulnerabilities/lib_oval.py:        return ElementTree.tostring(root, "UTF-8", "xml").decode("utf-8")
vulnerabilities/lib_oval.py:        in the OVAL ElementTree.
vulnerabilities/lib_oval.py:        or None if there is no ElementTree or if a matching item could not be found
vulnerabilities/lib_oval.py:        Adds the element to the ElementTree for this OVAL document
vulnerabilities/lib_oval.py:        if the ElementTree does not already contain it.
vulnerabilities/lib_oval.py:        Get the raw xml.etree.ElementTree.Element for this node.  Can be used to directly manipulate the
vulnerabilities/lib_oval.py:            xml.etree.ElementTree.register_namespace("", namespace)
vulnerabilities/lib_oval.py:            xml.etree.ElementTree.register_namespace(
vulnerabilities/lib_oval.py:            xml.etree.ElementTree.register_namespace(
vulnerabilities/lib_oval.py:            xml.etree.ElementTree.register_namespace(
vulnerabilities/lib_oval.py:            # Create a new ElementTree with this element as the root
vulnerabilities/lib_oval.py:            tree = ElementTree(e)
vulnerabilities/lib_oval.py:            tree = ElementTree.ElementTree()
vulnerabilities/oval_parser.py:import xml.etree.ElementTree as ET
vulnerabilities/oval_parser.py:    def __init__(self, translations: Dict, oval_document: ET.ElementTree):
vulnerabilities/package_managers.py:import xml.etree.ElementTree as ET
vulnerabilities/package_managers.py:        xml_resp = ET.ElementTree(ET.fromstring(resp.decode("utf-8")))
vulnerabilities/package_managers.py:    def extract_versions(xml_response: ET.ElementTree) -> Set[str]:
vulnerabilities/tests/test_data_source.py:import xml.etree.ElementTree as ET
vulnerabilities/tests/test_debian_oval.py:import xml.etree.ElementTree as ET
vulnerabilities/tests/test_gentoo.py:import xml.etree.ElementTree as ET
vulnerabilities/tests/test_package_managers.py:import xml.etree.ElementTree as ET
vulnerabilities/tests/test_suse.py:import xml.etree.ElementTree as ET
vulnerabilities/tests/test_ubuntu.py:import xml.etree.ElementTree as ET

@pombredanne
Contributor

This makes 100% sense! Good catch.

@pombredanne
Contributor

@Hritik14 would you mind checking (maybe with a quick search) whether other repos are impacted?

@Hritik14
Author

@pombredanne I ran a quick test. The following repos are potentially affected:

  • fetchcode
  • pymaven
  • scancode-toolkit
  • vulnerablecode
Here's a quick grep of the codebases using ElementTree:
./fetchcode/src/fetchcode/vcs/pip/_internal/index/collector.py:    import xml.etree.ElementTree
./fetchcode/src/fetchcode/vcs/pip/_internal/index/collector.py:    HTMLElement = xml.etree.ElementTree.Element
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/__init__.py:          elementtree-like interface (known to work with ElementTree,
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/__init__.py:          cElementTree and lxml.etree).
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/__init__.py:        xml.etree.ElementTree or cElementTree (Currently applies to the "etree"
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/etree.py:def getETreeBuilder(ElementTreeImplementation):
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/etree.py:    ElementTree = ElementTreeImplementation
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/etree.py:    ElementTreeCommentType = ElementTree.Comment("asd").tag
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/etree.py:        """Given the particular ElementTree representation, this implementation,
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/etree.py:            elif node.tag == ElementTreeCommentType:
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree_lxml.py:        # Create the root document and add the ElementTree to it
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/__init__.py:          ElementTree-like interface, defaulting to xml.etree.cElementTree if
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/__init__.py:          available and xml.etree.ElementTree if not.
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/__init__.py:        types). A module implementing the tree type e.g. xml.etree.ElementTree
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/__init__.py:        or xml.etree.cElementTree.
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:def getETreeBuilder(ElementTreeImplementation, fullTree=False):
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:    ElementTree = ElementTreeImplementation
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:    ElementTreeCommentType = ElementTree.Comment("asd").tag
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:            self._element = ElementTree.Element(self._getETreeTag(name,
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:            self._element = ElementTree.Comment(data)
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:            elif element.tag == ElementTreeCommentType:
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:            if isinstance(element, ElementTree.ElementTree):
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:            elif element.tag == ElementTreeCommentType:
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:        implementation = ElementTreeImplementation
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/_utils.py:    import xml.etree.cElementTree as default_etree
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/_utils.py:    import xml.etree.ElementTree as default_etree
./vulnerablecode/vulnerabilities/importer.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/importer.py:    def _fetch(self) -> Tuple[Mapping, Iterable[ET.ElementTree]]:
./vulnerablecode/vulnerabilities/importer.py:        Return a two-tuple of ({mapping of Package URL data}, it's ET.ElementTree)
./vulnerablecode/vulnerabilities/importer.py:        self, xml_doc: ET.ElementTree, pkg_metadata={}
./vulnerablecode/vulnerabilities/importer.py:        OVAL xml ElementTree into a list of `Advisory`.
./vulnerablecode/vulnerabilities/importers/openssl.py:import defusedxml.ElementTree as DET
./vulnerablecode/vulnerabilities/importers/gentoo.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/importers/debian_oval.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/importers/debian_oval.py:                ET.ElementTree(ET.fromstring(resp.decode("utf-8"))),
./vulnerablecode/vulnerabilities/importers/ubuntu.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/importers/ubuntu.py:                ET.ElementTree(ET.fromstring(extracted.decode("utf-8"))),
./vulnerablecode/vulnerabilities/oval_parser.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/oval_parser.py:    def __init__(self, translations: Dict, oval_document: ET.ElementTree):
./vulnerablecode/vulnerabilities/tests/test_openssl.py:import defusedxml.ElementTree as DET
./vulnerablecode/vulnerabilities/tests/test_suse.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/tests/test_debian_oval.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/tests/test_data_source.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/tests/test_package_managers.py:        import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/tests/test_ubuntu.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/tests/test_gentoo.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/lib_oval.py:    >>> tree = ElementTree()
./vulnerablecode/vulnerabilities/lib_oval.py:    >>> tree = ElementTree()    
./vulnerablecode/vulnerabilities/lib_oval.py:from xml.etree import ElementTree
./vulnerablecode/vulnerabilities/lib_oval.py:from xml.etree.ElementTree import Element
./vulnerablecode/vulnerabilities/lib_oval.py:        #         if not tree or not isinstance(tree, ElementTree):
./vulnerablecode/vulnerabilities/lib_oval.py:            self.tree = ElementTree.ElementTree(root)
./vulnerablecode/vulnerabilities/lib_oval.py:        Load an OVAL document from a filename and parse that into an ElementTree
./vulnerablecode/vulnerabilities/lib_oval.py:                self.tree = ElementTree.parse(filename)
./vulnerablecode/vulnerabilities/lib_oval.py:        Initializes the ElementTree by parsing the xmltext as XML
./vulnerablecode/vulnerabilities/lib_oval.py:                root = ElementTree.fromstring(xmltext)
./vulnerablecode/vulnerabilities/lib_oval.py:                self.tree = ElementTree(root)
./vulnerablecode/vulnerabilities/lib_oval.py:        return ElementTree.tostring(root, "UTF-8", "xml").decode("utf-8")
./vulnerablecode/vulnerabilities/lib_oval.py:        in the OVAL ElementTree.
./vulnerablecode/vulnerabilities/lib_oval.py:        or None if there is no ElementTree or if a matching item could not be found
./vulnerablecode/vulnerabilities/lib_oval.py:        Adds the element to the ElementTree for this OVAL document
./vulnerablecode/vulnerabilities/lib_oval.py:        if the ElementTree does not already contain it.
./vulnerablecode/vulnerabilities/lib_oval.py:        Get the raw xml.etree.ElementTree.Element for this node.  Can be used to directly manipulate the
./vulnerablecode/vulnerabilities/lib_oval.py:            xml.etree.ElementTree.register_namespace("", namespace)
./vulnerablecode/vulnerabilities/lib_oval.py:            xml.etree.ElementTree.register_namespace(
./vulnerablecode/vulnerabilities/lib_oval.py:            xml.etree.ElementTree.register_namespace(
./vulnerablecode/vulnerabilities/lib_oval.py:            xml.etree.ElementTree.register_namespace(
./vulnerablecode/vulnerabilities/lib_oval.py:            # Create a new ElementTree with this element as the root
./vulnerablecode/vulnerabilities/lib_oval.py:            tree = ElementTree(e)
./vulnerablecode/vulnerabilities/lib_oval.py:            tree = ElementTree.ElementTree()
./vulnerablecode/vulnerabilities/package_managers.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/package_managers.py:            xml_resp = ET.ElementTree(ET.fromstring(response.decode("utf-8")))
./vulnerablecode/vulnerabilities/package_managers.py:    def extract_versions(xml_response: ET.ElementTree) -> Iterable[PackageVersion]:
./pymaven/pymaven/client.py:    from xml.etree import cElementTree as ElementTree
./pymaven/pymaven/client.py:    from xml.etree import ElementTree
./pymaven/pymaven/client.py:            metadata = ElementTree.parse(fh)
./scancode-toolkit/tests/licensedcode/data/datadriven/external/fossology-tests/BSD/__init__.py:   pre-processed text into an ElementTree.
./scancode-toolkit/tests/licensedcode/data/datadriven/external/fossology-tests/BSD/__init__.py:3. A bunch of "treeprocessors" are run against the ElementTree. One such
./scancode-toolkit/tests/licensedcode/data/datadriven/external/fossology-tests/BSD/__init__.py:   treeprocessor runs InlinePatterns against the ElementTree, detecting inline
./scancode-toolkit/tests/licensedcode/data/datadriven/external/fossology-tests/BSD/__init__.py:4. Some post-processors are run against the text after the ElementTree has
./scancode-toolkit/tests/cluecode/data/ics/markdown-markdown/html4.py:# Taken from ElementTree 1.3 preview with slight modifications
./scancode-toolkit/tests/cluecode/data/ics/markdown-markdown/html4.py:# The ElementTree toolkit is

@Hritik14
Author

Here's a better grep with tests excluded.
The following repos are affected:

  • fetchcode
  • pymaven
  • vulnerablecode
Here's a quick grep of the codebases using ElementTree:
./fetchcode/src/fetchcode/vcs/pip/_internal/index/collector.py:    import xml.etree.ElementTree
./fetchcode/src/fetchcode/vcs/pip/_internal/index/collector.py:    HTMLElement = xml.etree.ElementTree.Element
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/__init__.py:          elementtree-like interface (known to work with ElementTree,
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/__init__.py:          cElementTree and lxml.etree).
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/__init__.py:        xml.etree.ElementTree or cElementTree (Currently applies to the "etree"
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/etree.py:def getETreeBuilder(ElementTreeImplementation):
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/etree.py:    ElementTree = ElementTreeImplementation
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/etree.py:    ElementTreeCommentType = ElementTree.Comment("asd").tag
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/etree.py:        """Given the particular ElementTree representation, this implementation,
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treewalkers/etree.py:            elif node.tag == ElementTreeCommentType:
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree_lxml.py:        # Create the root document and add the ElementTree to it
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/__init__.py:          ElementTree-like interface, defaulting to xml.etree.cElementTree if
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/__init__.py:          available and xml.etree.ElementTree if not.
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/__init__.py:        types). A module implementing the tree type e.g. xml.etree.ElementTree
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/__init__.py:        or xml.etree.cElementTree.
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:def getETreeBuilder(ElementTreeImplementation, fullTree=False):
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:    ElementTree = ElementTreeImplementation
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:    ElementTreeCommentType = ElementTree.Comment("asd").tag
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:            self._element = ElementTree.Element(self._getETreeTag(name,
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:            self._element = ElementTree.Comment(data)
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:            elif element.tag == ElementTreeCommentType:
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:            if isinstance(element, ElementTree.ElementTree):
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:            elif element.tag == ElementTreeCommentType:
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/treebuilders/etree.py:        implementation = ElementTreeImplementation
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/_utils.py:    import xml.etree.cElementTree as default_etree
./fetchcode/src/fetchcode/vcs/pip/_vendor/html5lib/_utils.py:    import xml.etree.ElementTree as default_etree
./vulnerablecode/vulnerabilities/importer.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/importer.py:    def _fetch(self) -> Tuple[Mapping, Iterable[ET.ElementTree]]:
./vulnerablecode/vulnerabilities/importer.py:        Return a two-tuple of ({mapping of Package URL data}, it's ET.ElementTree)
./vulnerablecode/vulnerabilities/importer.py:        self, xml_doc: ET.ElementTree, pkg_metadata={}
./vulnerablecode/vulnerabilities/importer.py:        OVAL xml ElementTree into a list of `Advisory`.
./vulnerablecode/vulnerabilities/importers/openssl.py:import defusedxml.ElementTree as DET
./vulnerablecode/vulnerabilities/importers/gentoo.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/importers/debian_oval.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/importers/debian_oval.py:                ET.ElementTree(ET.fromstring(resp.decode("utf-8"))),
./vulnerablecode/vulnerabilities/importers/ubuntu.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/importers/ubuntu.py:                ET.ElementTree(ET.fromstring(extracted.decode("utf-8"))),
./vulnerablecode/vulnerabilities/oval_parser.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/oval_parser.py:    def __init__(self, translations: Dict, oval_document: ET.ElementTree):
./vulnerablecode/vulnerabilities/lib_oval.py:from xml.etree import ElementTree
./vulnerablecode/vulnerabilities/lib_oval.py:from xml.etree.ElementTree import Element
./vulnerablecode/vulnerabilities/package_managers.py:import xml.etree.ElementTree as ET
./vulnerablecode/vulnerabilities/package_managers.py:            xml_resp = ET.ElementTree(ET.fromstring(response.decode("utf-8")))
./vulnerablecode/vulnerabilities/package_managers.py:    def extract_versions(xml_response: ET.ElementTree) -> Iterable[PackageVersion]:
./pymaven/pymaven/client.py:    from xml.etree import cElementTree as ElementTree
./pymaven/pymaven/client.py:    from xml.etree import ElementTree
./pymaven/pymaven/client.py:            metadata = ElementTree.parse(fh)
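
The exact grep command is not shown in this thread. For anyone who wants to rerun the survey, a rough Python equivalent might look like the sketch below. The function name `find_elementtree_uses` and the rule for what counts as a test path are assumptions, so its output may not match the listings above line for line.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_elementtree_uses(root: Path, skip_tests: bool = True) -> Iterator[Tuple[Path, str]]:
    """Yield (relative path, line) for each line mentioning ElementTree in *.py files.

    With skip_tests=True, any file under a "tests"/"test" directory or named
    test_*.py is ignored (an assumed approximation of "tests excluded").
    """
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        in_tests = any(
            part in ("tests", "test") or part.startswith("test_")
            for part in rel.parts
        )
        if skip_tests and in_tests:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            # Unreadable files are skipped rather than aborting the scan.
            continue
        for line in text.splitlines():
            if "ElementTree" in line:
                yield rel, line
```

Calling `list(find_elementtree_uses(Path(".")))` from a directory containing the checked-out repos would give a starting list. Each hit still needs a manual look, since many matches (docstrings, type hints, vendored html5lib code) never parse untrusted input.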

@TG1999 TG1999 transferred this issue from aboutcode-org/vulnerablecode Jan 30, 2024