Python automation.Stack.up() orphans _watch_logs thread #16095

Open
pgcamus opened this issue May 1, 2024 · 2 comments
Labels
area/automation-api, kind/bug, language/python

Comments


pgcamus commented May 1, 2024

What happened?

We're using the Python automation API.

One of our calls to Stack.up() leaves behind an orphaned _watch_logs thread, which then hangs shutdown of our driver process.

Example

from pulumi import automation

stack = automation.select_stack(stack_name=stackname, work_dir=workdir)
stack.up()

I don't have a good repro case, unfortunately. But by code inspection (see "Additional Context") I think I know what the bug is.
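Without a repro, one way to confirm which thread is holding up shutdown is to list the live non-daemon threads just before the driver exits, since the Python interpreter waits for those at exit. The sketch below is only a diagnostic aid: `blocking_threads` and the thread name `stuck-watcher` are hypothetical, and it assumes the leftover _watch_logs thread is a non-daemon thread, which is consistent with the hang described above but not confirmed here.

```python
import threading


def blocking_threads() -> list[str]:
    """Return names of live non-daemon threads other than the main thread."""
    main = threading.main_thread()
    return [
        t.name
        for t in threading.enumerate()
        if t is not main and not t.daemon and t.is_alive()
    ]


# Stand-in for an orphaned watcher: a thread that blocks until released.
release = threading.Event()
worker = threading.Thread(target=release.wait, name="stuck-watcher")
worker.start()

names_while_running = blocking_threads()  # includes "stuck-watcher"

release.set()
worker.join()
names_after_join = blocking_threads()     # no longer includes it

print(names_while_running, names_after_join)
```

Calling a helper like this right after stack.up() returns would show whether a watcher thread is still alive at that point.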

Output of pulumi about

CLI          
Version      3.108.1
Go Version   go1.22.0
Go Compiler  gc

Plugins
NAME           VERSION
command        0.9.2
gcp            7.17.0
google-native  0.32.0
grafana        0.3.0
kubernetes     4.9.1
python         unknown
random         4.16.0

Host     
OS       ubuntu
Version  22.04
Arch     aarch64

This project is written in python: executable='/home/builder/.cache/bazel/_bazel_builder/111d6a28ddfe7cb2f314c9d3cd6a7751/execroot/ritta/bazel-out/aarch64-dbg/bin/infra/pulumi/pulumi.runfiles/python3_10_aarch64-unknown-linux-gnu/bin/python3' version='3.10.9'

Backend        
Name           builderdev.pg.serengeti.localdomain
URL            gs://camus-infra-pulumistate/core
User           builder
Organizations  
Token type     personal

Dependencies:
NAME                         VERSION
absl-py                      1.4.0
debugpy                      1.8.1
dnspython                    2.6.1
gcsfs                        2024.3.1
google-cloud-asset           3.26.0
google-cloud-bigquery        3.20.1
google-cloud-bigtable        2.23.1
google-cloud-compute         1.18.0
google-cloud-pubsub          2.21.1
google-cloud-secret-manager  2.19.0
grafanalib                   0.7.1
influxdb                     5.3.1
Jinja2                       3.1.3
kubernetes                   24.2.0
Pint                         0.23
pip                          22.3.1
prometheus-client            0.19.0
pulumi-command               0.9.2
pulumi_gcp                   7.17.0
pulumi-google-native         0.32.0
pulumi_kubernetes            4.9.1
pulumi-policy                1.10.0
pulumi_random                4.16.0
pulumiverse-grafana          0.3.0
PySocks                      1.7.1
python-json-logger           2.0.7
python-socks                 2.4.4
stringcase                   1.2.0

Additional context

This is the definition of _watch_logs:

def _watch_logs(filename: str, callback: OnEvent):
    with open(filename, encoding="utf-8") as f:
        while True:
            line = f.readline()

            # sleep if file hasn't been updated
            if not line:
                time.sleep(0.1)
                continue

            event = EngineEvent.from_json(json.loads(line))
            callback(event)

            # if this is the cancel event, stop watching logs.
            if event.cancel_event:
                break

The behavior of f.readline() is as follows (quoting the Python tutorial):

This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
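The quoted behavior is easy to check in isolation. The short script below uses a throwaway temporary file (the file contents are made up for illustration) and shows that a blank line comes back as '\n', while every read at end of file returns the empty string.

```python
import os
import tempfile

# Write one event-like line followed by a blank line.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"sequence": 1}\n\n')
    path = tmp.name

with open(path, encoding="utf-8") as f:
    first = f.readline()      # the event line, trailing newline included
    blank = f.readline()      # a blank line is "\n", which is truthy
    at_eof = f.readline()     # end of file: the empty string
    still_eof = f.readline()  # and the same on every later call

os.remove(path)
print(repr(first), repr(blank), repr(at_eof), repr(still_eof))
```

So `if not line` in _watch_logs is true only at end of file, never for a blank line.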

My theory is that the process writing to the event log terminates without writing the cancel event JSON entry. From that point on, f.readline() returns the empty string on every call, so the loop sleeps and retries forever.
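If that theory holds, one possible mitigation is to give the watcher a second exit condition that the caller controls. The following is a minimal sketch, not Pulumi's implementation or a proposed patch: `watch_logs_until_stopped`, the `stop` event, and `poll_interval` are hypothetical names, events are handled as plain dicts rather than EngineEvent objects, and the `cancelEvent` key is an assumption about the JSON layout.

```python
import json
import os
import tempfile
import threading
import time
from typing import Callable


def watch_logs_until_stopped(
    filename: str,
    callback: Callable[[dict], None],
    stop: threading.Event,
    poll_interval: float = 0.1,
) -> None:
    """Tail filename and pass each parsed JSON line to callback.

    Returns on a cancel event, or once stop is set and the file has
    been read to the end.
    """
    with open(filename, encoding="utf-8") as f:
        while True:
            line = f.readline()

            if not line:
                # End of file: give up if the caller says the writer is
                # gone, otherwise wait for more data.
                if stop.is_set():
                    break
                time.sleep(poll_interval)
                continue

            event = json.loads(line)
            callback(event)

            # Assumed key name for the cancel event.
            if "cancelEvent" in event:
                break


# Simulate a writer that exits without ever writing a cancel event.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"sequence": 1}\n')
    path = tmp.name

seen: list = []
stop = threading.Event()
watcher = threading.Thread(
    target=watch_logs_until_stopped, args=(path, seen.append, stop, 0.01)
)
watcher.start()
stop.set()  # the caller knows the writing process has exited
watcher.join(timeout=5)
os.remove(path)
print(watcher.is_alive(), seen)
```

Because the stop flag is checked only at end of file, lines already written are still delivered to the callback before the thread exits.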

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

pgcamus added the kind/bug and needs-triage labels May 1, 2024
justinvp added the area/automation-api and language/python labels and removed the needs-triage label May 3, 2024
@justinvp
Member

justinvp commented May 3, 2024

Thanks for opening the issue and sorry for the trouble! The additional context you provided sounds plausible at first glance. We'll take a closer look.

@tgummerer
Collaborator

This might be related to #6768 as well.
