No logging in container for python scripts nodes #3222

Closed
paloma-rebuelta opened this issue Apr 9, 2024 · 4 comments · Fixed by #3227
Labels
component:pipeline-editor pipeline editor good first issue Good for newcomers kind:bug Something isn't working

Comments

@paloma-rebuelta
Contributor

Describe the issue
Hello! I have a problem with my Elyra pipelines not showing logs.
I have looked for issues on GitHub but couldn't find anything related, and I am not sure whether I am missing some needed config or whether this is a bug.
When I run an Elyra pipeline with two nodes, one a Jupyter notebook and one a Python script, only the Jupyter notebook node shows the logging. I am not sure why, as both are managed the same way.
The problem also occurs when the pipeline contains only Python scripts; I am just using this example because it shows the difference well.
I have added screenshots of my example: in the logs, the Jupyter notebook node clearly shows the expected output, whereas the Python script node shows nothing.
Any help would be really appreciated!

To Reproduce
Steps to reproduce the behavior:

  1. Create a pipeline with a Python script node. The script should contain print() calls or a logging logger.
  2. Trigger the pipeline using Kubeflow Pipelines and look at the logs: the logging output is missing.

Screenshots or log output
[Screenshots, 2024-04-08: pipeline setup and node logs; the notebook node shows output, the script node shows none]

Expected behavior
I would expect identical logging for both nodes, given that the code is the same. I haven't managed to get any output from the Python script nodes.

Deployment information
Describe what you've deployed and how:

  • Elyra version: [e.g. 3.15.0]
  • Installation source: custom container image
  • Deployment type: Kubernetes, Kubeflow [notebook server]
@shalberd
Contributor

shalberd commented Apr 17, 2024

A coworker of mine approached me today with the same problem. I will verify; I am having him submit both Python and Jupyter generic pipeline nodes/tasks as well. He asked the same question: why don't I see the log output in the Airflow web GUI task logs? That is very similar to the problem you describe here. The issue is independent of the runtime, be it Kubeflow Pipelines or Airflow.

Verified on our side, too:
no Python output, be it print(1), logger.info("sometext"), or raise "exception text", shows up in the runtime logs for Python nodes. In contrast, for Jupyter notebook nodes that output does show up.

PythonFileOp and RFileOp use similar syntax, subprocess.run, in both the KFP and Airflow bootstrapper.py:

https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L521

https://github.com/elyra-ai/elyra/blob/main/elyra/airflow/bootstrapper.py#L332

i.e.

    with open(r_script_output, "w") as log_file:
        subprocess.run(run_args, stdout=log_file, stderr=subprocess.STDOUT, check=True)

We only see the name of the processed file in stdout, and no logs, for Python or R nodes:

[Screenshot, 2024-04-17: Airflow task log showing only the processed file name]

whereas the ipynb NotebookFileOp

https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L357

probably handles cell logging differently, which is why we see its python output.

I found the following info on subprocess: https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module
subprocess.run has an argument capture_output=False by default.

> If capture_output is true, stdout and stderr will be captured. When used, the internal Popen object is automatically created with stdout and stderr both set to PIPE. The stdout and stderr arguments may not be supplied at the same time as capture_output. If you wish to capture and combine both streams into one, set stdout to PIPE and stderr to STDOUT, instead of using capture_output.
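Under the existing pattern, redirecting stdout into a file means nothing reaches the container's own stdout, which is why the pod/task logs stay empty. A minimal sketch of the alternative the docs describe, capturing both streams via PIPE/STDOUT and re-emitting them (the inline child script here is just an illustration):

```python
import subprocess
import sys

# Capture stdout and stderr of a child script combined into one stream,
# then re-emit it on our own stdout so the container runtime can collect it.
result = subprocess.run(
    [sys.executable, "-c",
     "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"],
    stdout=subprocess.PIPE,     # capture the child's stdout ...
    stderr=subprocess.STDOUT,   # ... and merge its stderr into the same pipe
    check=True,
)
combined = result.stdout.decode("utf-8")
print(combined)  # both lines are now visible in this process's stdout
```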

@kevin-bates @lresende FYI

@romeokienzler
Member

romeokienzler commented Apr 19, 2024

This is how we do it:

    result = subprocess.run(command_array, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = result.stdout.decode("utf-8")
    print("Output:", output)
    print("Return code:", result.returncode)

@romeokienzler romeokienzler added good first issue Good for newcomers kind:bug Something isn't working component:pipeline-editor pipeline editor and removed status:Needs Triage labels Apr 19, 2024
@shalberd
Contributor

shalberd commented Apr 24, 2024

Hi @paloma-rebuelta @romeokienzler @harshad16, a quick update on the code change I have tested so that we can see the actual logs for Python and R nodes processed in KFP or Airflow. I decided to let the called-process exception (on a non-zero return code) propagate in subprocess by keeping check=True, as it was before in that part of the Elyra code.
A PR will follow; I am open to any hints regarding good Python code style and practice, avoiding log-injection vulnerabilities, etc.

    t0 = time.time()

    run_args = ["python3", python_script]

    with open(python_script_output, "w") as log_file:
        print("Processing file:", python_script)
        try:
            result = subprocess.run(run_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=True)
            output = result.stdout.decode("utf-8")
            logger.info(f"Output: {output}")
            logger.info(f"Return code: {result.returncode}")
        except subprocess.CalledProcessError as e:
            logger.error("Output: %s", e.output.decode("utf-8"))
            logger.error("Return code: %s", e.returncode)

In terms of new behavior, here are some samples, in my case Airflow, using different Python content:

[Screenshots, 2024-04-24: Airflow task logs now showing script output for several Python examples]
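One caveat with the subprocess.run pattern: stdout is buffered until the script exits, so a long-running script still shows no logs while it runs. A hypothetical streaming variant using subprocess.Popen would forward lines as they are produced; this is a sketch (the helper name is mine, not the PR code):

```python
import subprocess
import sys

def run_script_streaming(run_args):
    """Run a script, forwarding each output line to our own stdout as it
    is produced, instead of buffering everything until the script exits."""
    proc = subprocess.Popen(run_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for raw_line in proc.stdout:
        sys.stdout.write(raw_line.decode("utf-8"))  # appears in pod logs immediately
    returncode = proc.wait()
    if returncode != 0:
        # mirror check=True behaviour: surface the failure to the caller
        raise subprocess.CalledProcessError(returncode, run_args)
    return returncode
```

Whether live streaming is worth the extra bookkeeping depends on how long typical node scripts run; for short scripts the buffered run() call is simpler.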

@romeokienzler @kevin-bates @lresende @harshad16 I wonder why the Elyra developers initially decided to write the script's stdout to a file and then upload that file to the Elyra S3 bucket ... in my PR, I will send script output to stdout and log it in whatever system it runs on, leaving log collection to that system. Do we want to keep writing Python, R, or notebook stdout to an S3 bucket file, as done here: https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L492? I am looking for architectural input here. I also posted this question in the Elyra Slack general channel.

[Screenshot, 2024-04-24: Slack post asking about the S3 log-file handling]
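For the architectural question about keeping the S3 log file, one middle ground would be a tee-style approach: keep writing the captured output to the log file (so the existing S3 upload still works) while also echoing it to stdout for the container runtime. A hypothetical sketch, where `python_script_output` stands in for the existing output-file path (here faked with a temp file) and the inline child script is just an illustration:

```python
import subprocess
import sys
import tempfile

# Stand-in for the real output path used by the S3 upload step.
python_script_output = tempfile.NamedTemporaryFile("w", suffix=".log", delete=False).name

result = subprocess.run(
    [sys.executable, "-c", "print('hello from the script')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    check=True,
)
output = result.stdout.decode("utf-8")

with open(python_script_output, "w") as log_file:
    log_file.write(output)   # preserved on disk for the S3 upload
sys.stdout.write(output)     # also visible in the pod / task logs
```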

Since I am now piping all script-run output to stdout, including stderr, I am getting the following, understandable, warnings after running make lint:

elyra/airflow/bootstrapper.py:334:53: F841 local variable 'log_file' is assigned to but never used
elyra/airflow/bootstrapper.py:374:48: F841 local variable 'log_file' is assigned to but never used
elyra/kfp/bootstrapper.py:486:53: F841 local variable 'log_file' is assigned to but never used
elyra/kfp/bootstrapper.py:528:48: F841 local variable 'log_file' is assigned to but never used

@shalberd
Contributor

shalberd commented May 13, 2024

@harshad16 @romeokienzler regarding the notebook logs and cell output that @paloma-rebuelta sees when executing Jupyter notebook code (see her first ipynb code example, which I also ran today): when the Airflow or KFP log level is set only to INFO (not DEBUG), it seems the log info or cell output is not written:

[2024-05-13, 17:31:49 UTC] {pod_manager.py:466} INFO - [base] [I 17:31:48.057] Output Notebook: notebook_log_test-output.ipynb
[2024-05-13, 17:31:49 UTC] {pod_manager.py:466} INFO - [base] 
Executing:   0%|          | 0/1 [00:00<?, ?cell/s][I 17:31:49.020] Executing notebook with kernel: python3
[2024-05-13, 17:31:49 UTC] {pod_manager.py:466} INFO - [base] 
Executing: 100%|██████████| 1/1 [00:00<00:00,  1.03cell/s]
Executing: 100%|██████████| 1/1 [00:01<00:00,  1.19s/cell]
[2024-05-13, 17:31:49 UTC] {pod_manager.py:466} INFO - [base] [I 17:31:49.259] 'NotebookLog':'notebook_log_test' - notebook execution completed (2.110 secs)

So it seems we have to do some more work in bootstrapper.py for NotebookFileOp, too, with regard to logger, stdout, and stderr ... I'll test this out. i.e.

nteract/papermill#433 (comment)

https://github.com/nteract/papermill/blob/main/papermill/engines.py#L221

Update:
OK, so for papermill's execute, if I set log_output to True, I get what Paloma saw, but now also with the overall Airflow log level at INFO:

[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] None
[I 19:01:56.440] Executing notebook with kernel: python3
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] [I 19:01:56.441] Executing Cell 1---------------------------------------
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] [W 19:01:56.463] INFO:test:this is a jupyter notebook
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] INFO:test:this is a jupyter notebook
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] [W 19:01:56.464] ERROR:test:this is a contrived error message in a jupyter notebook script
2024-05-13T19:01:56.466436267Z 
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] ERROR:test:this is a contrived error message in a jupyter notebook script
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] [I 19:01:56.466] this is a print statement in a jupyter notebook
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] this is a print statement in a jupyter notebook
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] [I 19:01:56.467] Ending Cell 1------------------------------------------
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] 
Executing: 100%|██████████| 1/1 [00:01<00:00,  1.00s/cell]
Executing: 100%|██████████| 1/1 [00:01<00:00,  1.22s/cell]
[2024-05-13, 19:01:58 UTC] {pod_manager.py:511} WARNING - Pod notebook-log-test-mp6spa42 log read interrupted but container base still running
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] None
[I 19:01:56.440] Executing notebook with kernel: python3
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] [I 19:01:56.441] Executing Cell 1---------------------------------------
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] [W 19:01:56.463] INFO:test:this is a jupyter notebook
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] INFO:test:this is a jupyter notebook
..

I still need to find out why the cell output got logged twice ("log read interrupted but container base still running"). But it is good to see that the three additional arguments to papermill.execute_notebook

log_output=True, stdout_file=sys.stdout, stderr_file=sys.stderr

ensure that cell content is logged correctly in any case, and that a raise or unknown error is logged too.
When I add

    papermill.execute_notebook(notebook, notebook_output, kernel_name=kernel_name, log_output=True, stdout_file=sys.stdout, stderr_file=sys.stderr)

at https://github.com/elyra-ai/elyra/blob/main/elyra/airflow/bootstrapper.py#L229C13-L229C91
(and of course in the KFP bootstrapper code, too), it works nicely. I will add that to the PR as well.
