Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCons.Util.Flatten doesn't work for nested sets, None #4393

Open
mwichmann opened this issue Aug 10, 2023 · 7 comments
Open

SCons.Util.Flatten doesn't work for nested sets, None #4393

mwichmann opened this issue Aug 10, 2023 · 7 comments

Comments

@mwichmann
Copy link
Collaborator

This diff to UtilTests.py illustrates non-working combos (there might be more):

@@ -1108,6 +1108,23 @@ class flattenTestCase(unittest.TestCase):
         result = flatten(items.values())
         self.assertEqual(sorted(result), [1, 2, 3])
 
+    def test_nested(self):
+        """Test flattening nested lists"""
+        # just lists, just one level
+        with self.subTest():
+            result = list(flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
+            self.assertEqual(result, [1, 2, 3, 4, 5, 6, 7, 8, 9], result)
+        # mix list, tuple, set, str and multi-level
+        with self.subTest():
+            result = list(flatten([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"]))
+            self.assertEqual(result, [1, 2, 3, 4, 5, 6, 7, 8, "99"], result)
+
+    def test_None(self):
+        """Test that flattening None raises error"""
+        with self.assertRaises(TypeError):
+            result = list(flatten(None))
+
+
 
 class OsEnviron:
     """Used to temporarily mock os.environ"""

With this outcome:

$ ./runtest.py SCons/UtilTests.py
1/1 (100.00%) /home/mats/.pyenv/versions/venv-system/bin/python SCons/UtilTests.py
.........................................F.F.........
======================================================================
FAIL: test_None (__main__.flattenTestCase.test_None)
Test that flattening None raises error
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mats/github/scons/SCons/UtilTests.py", line 1124, in test_None
    with self.assertRaises(TypeError):
AssertionError: TypeError not raised

======================================================================
FAIL: test_nested (__main__.flattenTestCase.test_nested) (<subtest>)
Test flattening nested lists
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mats/github/scons/SCons/UtilTests.py", line 1120, in test_nested
    self.assertEqual(result, [1, 2, 3, 4, 5, 6, 7, 8, "99"], result)
AssertionError: Lists differ: [1, 2, 3, 4, {5, 6}, 7, 8, '99'] != [1, 2, 3, 4, 5, 6, 7, 8, '99']

First differing element 4:
{5, 6}
5

Second list contains 1 additional elements.
First extra element 8:
'99'

- [1, 2, 3, 4, {5, 6}, 7, 8, '99']
?              -    -

+ [1, 2, 3, 4, 5, 6, 7, 8, '99'] : [1, 2, 3, 4, {5, 6}, 7, 8, '99']

----------------------------------------------------------------------
Ran 53 tests in 0.007s

FAILED (failures=2)
@jcbrill
Copy link
Contributor

jcbrill commented Nov 2, 2023

@mwichmann Isn't the implementation behavior for None correct and the test expectation incorrect?

The docstring contains:

    Flatten a sequence to a non-nested list.

    Converts either a single scalar or a nested sequence to a non-nested list.
    Note that :func:`flatten` considers strings
    to be scalars instead of sequences like pure Python would.
    """

None is a scalar data type and value.

@jcbrill
Copy link
Contributor

jcbrill commented Nov 2, 2023

The behavior for sets is probably also correct as a set is not a sequence (i.e., sets cannot be indexed and are not ordered).

A namedtuple is a sequence and would be expanded/flattened which in most use cases is likely undesirable. A namedtuple is probably better handled as a scalar rather than a sequence. Food for thought.

Edit: namedtuple does serialize to json as a list though.

@mwichmann
Copy link
Collaborator Author

Interesting thoughts. Will have to chew on it some more. Don't trust the docstring too much, as I've adjusted it already trying to convey what I think it does. Old manpage used to say:

Takes a sequence (that is, a Python list or tuple) that may contain nested sequences and returns a flattened list containing all of the individual elements in any sequence. This can be helpful for collecting the lists returned by calls to Builders; other Builders will automatically flatten lists specified as input, but direct Python manipulation of these lists does not.

And old docstring:

Flatten() converts either a single scalar or a nested sequence to a non-nested list. Note that flatten() considers strings to be scalars instead of sequences like Python would.

The implication of those in combination is unclear: one says "takes a sequence" and the other scalar or sequence.

@jcbrill
Copy link
Contributor

jcbrill commented Nov 3, 2023

In the general case, a scalar or sequence argument is useful as the return value is always is a list which in certain circumstances could eliminate type-specific testing (i.e., sequence/string vs scalar) prior to invocation.

For example, calling flatten to get a list of system paths. A scalar string representing a path would be converted to a list within flatten and a lists of paths would be flattened as expected. Either way the result is a list without having to check if the argument passed in is a scalar/string/sequence.

Whether or not the scalar None is meaningful is context dependent. In some contexts, None might be a valid member of nested lists or as a scalar. In other cases, None might imply bad input in the code around the flatten call (which might be guarded by a test for evaluating as true: if scalar_or_sequence: flat_list = flatten(scalar_or_sequence)).

@bdbaddog
Copy link
Contributor

bdbaddog commented Nov 3, 2023

Not sure I agree that sets shouldn't be flattened.
There's no explicit guarantee that Flatten() retains order, so the fact that sets don't have an order or can be indexed seems not to matter.

That said, do any of our internal datastructures use or return sets currently?
Or is this only a concern if a user were to use a set instead of a list or a dictionary, or other container of items?

@jcbrill
Copy link
Contributor

jcbrill commented Nov 3, 2023

Keep in mind what follows is intended to be helpful.

Not sure I agree that sets shouldn't be flattened.

Sets probably should be flattened and/or have a boolean keyword argument indicating how sets should be processed.

There's no explicit guarantee that Flatten() retains order, so the fact that sets don't have an order or can be indexed seems not to matter.

I was just pointing out that the existing code type checks are for sequences. From a type perspective, sets and frozensets are not sequences which are ordered and indexible.

That said, do any of our internal datastructures use or return sets currently?

Above my pay grade.

Or is this only a concern if a user were to use a set instead of a list or a dictionary, or other container of items?

Sets are useful proxies for lists as they guarantee uniqueness. A dictionary is not going to be flattened.

The existing do_flatten, flatten, and flatten_sequence implementations have bug(s) lurking but they are not exposed due to their current invocation usage in SCons.

For example (annotated):

def do_flatten(  # pylint: disable=redefined-outer-name,redefined-builtin
    sequence,
    result,
    isinstance=isinstance,  # <-- possibly user-defined kwarg
    StringTypes=StringTypes,  # <-- possibly user-defined kwarg
    SequenceTypes=SequenceTypes,  # <-- possibly user-defined kwarg
) -> None:
    for item in sequence:
        if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
            result.append(item)
        else:
            do_flatten(item, result)  # <-- any user-defined kwargs are discarded

The same problem appears in flatten and flatten_sequence.

At a minimum, the flatten and flatten_sequence methods should probably accept None as the default keywords in some cases and explicitly set the desired type tuples as necessary. If this is done, then it is trivial to extend including sets/frozen sets as "sequences" for purposes of flattening.

The script below is also attached in a zip file. The current flatten routines were renamed with names ending in _original. The revised implementations make use of the existing keyword arguments with the bugs fixed.

Notes:

  • This was done quickly so there may be code issues.
  • namedtuple was not addressed.

flatten.py source file in attached flatten.zip:

from collections import UserString, UserList, deque
from collections.abc import MappingView

# proxy for import
StringTypes = (str, UserString)
SequenceTypes = (list, tuple, deque, UserList, MappingView)
SetTypes = (set, frozenset)

# shadowed in functions: save using internal names
_StringTypes = StringTypes
_SequenceTypes = SequenceTypes
_SetTypes = (set, frozenset)

# synthetic types
_FlattenTypes = _SequenceTypes + _SetTypes

##### ORIGINAL CODE

def do_flatten_original(  # pylint: disable=redefined-outer-name,redefined-builtin
    sequence,
    result,
    isinstance=isinstance,        # <-- possibly user-defined kwarg
    StringTypes=StringTypes,      # <-- possibly user-defined kwarg
    SequenceTypes=SequenceTypes,  # <-- possibly user-defined kwarg
) -> None:
    for item in sequence:
        if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
            result.append(item)
        else:
            do_flatten_original(item, result) # <- any user-defined kwargs are discarded

def flatten_original(  # pylint: disable=redefined-outer-name,redefined-builtin
    obj,
    isinstance=isinstance,        # <-- possibly user-defined kwarg
    StringTypes=StringTypes,      # <-- possibly user-defined kwarg
    SequenceTypes=SequenceTypes,  # <-- possibly user-defined kwarg
    do_flatten=do_flatten_original,
) -> list:
    """Flatten a sequence to a non-nested list.

    Converts either a single scalar or a nested sequence to a non-nested list.
    Note that :func:`flatten` considers strings
    to be scalars instead of sequences like pure Python would.
    """
    if isinstance(obj, StringTypes) or not isinstance(obj, SequenceTypes):
        return [obj]
    result = []
    # effectively do_flatten code: replace with do_flatten call
    for item in obj:
        if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
            result.append(item)
        else:
            do_flatten(item, result)  # <- any user-defined kwargs are discarded
    return result

def flatten_sequence_original(  # pylint: disable=redefined-outer-name,redefined-builtin
    sequence,
    isinstance=isinstance,        # <-- possibly user-defined kwarg
    StringTypes=StringTypes,      # <-- possibly user-defined kwarg
    SequenceTypes=SequenceTypes,  # <-- possibly user-defined kwarg
    do_flatten=do_flatten_original,
) -> list:
    """Flatten a sequence to a non-nested list.

    Same as :func:`flatten`, but it does not handle the single scalar case.
    This is slightly more efficient when one knows that the sequence
    to flatten can not be a scalar.
    """
    result = []
    # effectively do_flatten code: replace with do_flatten call
    for item in sequence:
        if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
            result.append(item)
        else:
            do_flatten(item, result)  # <- any user-defined kwargs are discarded
    return result

##### MODIFIED CODE

def do_flatten(  # pylint: disable=redefined-outer-name,redefined-builtin
    sequence,
    result,
    isinstance=isinstance,
    StringTypes=StringTypes,
    SequenceTypes=SequenceTypes,
) -> None:
    for item in sequence:
        if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
            result.append(item)
        else:
            do_flatten(
                item,
                result,
                isinstance=isinstance,
                StringTypes=StringTypes,
                SequenceTypes=SequenceTypes,
            )

def flatten(  # pylint: disable=redefined-outer-name,redefined-builtin
    obj,
    isinstance=isinstance,
    StringTypes=None,
    SequenceTypes=None,
    do_flatten=do_flatten,
) -> list:
    """Flatten a sequence to a non-nested list.

    Converts either a single scalar or a nested sequence to a non-nested list.
    Note that :func:`flatten` considers strings
    to be scalars instead of sequences like pure Python would.
    """
    if StringTypes is None:
        StringTypes = _StringTypes

    if SequenceTypes is None:
        SequenceTypes = _FlattenTypes

    result = []
    if isinstance(obj, StringTypes) or not isinstance(obj, SequenceTypes):
        result.append(obj)
    else:
        do_flatten(
            obj,
            result,
            isinstance=isinstance,
            StringTypes=StringTypes,
            SequenceTypes=SequenceTypes,
        )
    return result


def flatten_sequence(  # pylint: disable=redefined-outer-name,redefined-builtin
    sequence,
    isinstance=isinstance,
    StringTypes=None,
    SequenceTypes=None,
    do_flatten=do_flatten,
) -> list:
    """Flatten a sequence to a non-nested list.

    Same as :func:`flatten`, but it does not handle the single scalar case.
    This is slightly more efficient when one knows that the sequence
    to flatten can not be a scalar.
    """
    if StringTypes is None:
        StringTypes = _StringTypes

    if SequenceTypes is None:
        SequenceTypes = _FlattenTypes

    result = []
    do_flatten(
        sequence,
        result,
        isinstance=isinstance,
        StringTypes=StringTypes,
        SequenceTypes=SequenceTypes,
    )
    return result

if __name__ == '__main__':

# just lists, just one level

    result = list(flatten_original([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result

    result = list(flatten_sequence_original([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result

    result = list(flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result

    result = list(flatten_sequence([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result

# mix list, tuple, set, str and multi-level

    # illustrate bug in original code: kwarg type discarded during recursion

    try:
        result = list(flatten_original([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
        assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
    except AssertionError:
        pass
    else:
        raise RuntimeError('expected AssertionError')

    try:
        result = list(flatten_sequence_original([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
        assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
    except AssertionError:
        pass
    else:
        raise RuntimeError('expected AssertionError')

    # illustrate modified code preserves kwarg which treats sets as scalars

    try:
        result = list(flatten([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_SequenceTypes))
        assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
    except AssertionError:
        pass
    else:
        raise RuntimeError('expected AssertionError')

    try:
        result = list(flatten_sequence([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_SequenceTypes))
        assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
    except AssertionError:
        pass
    else:
        raise RuntimeError('expected AssertionError')

    # illustrate modified code handles sets as expected

    result = list(flatten([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

    result = list(flatten_sequence([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

    result = list(flatten([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

    result = list(flatten_sequence([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

# mix list, tuple, set containing tuple of frozen sets, str and multi-level

    result = list(flatten([[1, [2]], (3, 4, {(frozenset([5]), frozenset([6]))}, 7), 8, "99"]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

    result = list(flatten_sequence([[1, [2]], (3, 4, {(frozenset([5]), frozenset([6]))}, 7), 8, "99"]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

flatten.zip

@mwichmann
Copy link
Collaborator Author

I have long had a rewritten set of flatten util routines (although the name may say Flatten, that's really just a wrapper to expose it in the Environment), so I was aware of the current buggy nature. They've just never risen to the point where it seemed right to push the PR....

The way the usage is pitched this needn't be terribly comprehensive - there are places where you may call things acting on nodes, which produce a list of answers, and do so in a loop, so you get a list-of-lists. My impression is that's what Flatten was written for and everything else is just stretching it, for completeness, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants