Unsupported transition from state PREPROCESSED_FOR_TRANSFORMATION to state DISTILLED #28

Closed
cristianvasquez opened this issue Jun 25, 2024 · 5 comments

Comments

@cristianvasquez
Contributor

We are making a first run to test whether a transformation is triggered with package_cn_v1.9 (tag 1.0.0-rc.3) installed.

@Dragos0000 suggested reporting this here.

The run is for one day, Apr 1

[image: screenshot of the run]

The batches fail at the same step, notice_distillation_pipeline, with 'Unsupported transition from state PREPROCESSED_FOR_TRANSFORMATION to state DISTILLED'

I'm unsure if any of them went through

*** Reading local file: /opt/airflow/logs/dag_id=notice_processing_pipeline/run_id=manual__2024-06-25T10:36:56.911113+00:00/task_id=notice_distillation_pipeline/attempt=1.log
[2024-06-25, 10:58:11 UTC] {taskinstance.py:1083} INFO - Dependencies all met for <TaskInstance: notice_processing_pipeline.notice_distillation_pipeline manual__2024-06-25T10:36:56.911113+00:00 [queued]>
[2024-06-25, 10:58:11 UTC] {taskinstance.py:1083} INFO - Dependencies all met for <TaskInstance: notice_processing_pipeline.notice_distillation_pipeline manual__2024-06-25T10:36:56.911113+00:00 [queued]>
[2024-06-25, 10:58:11 UTC] {taskinstance.py:1279} INFO - 
--------------------------------------------------------------------------------
[2024-06-25, 10:58:11 UTC] {taskinstance.py:1280} INFO - Starting attempt 1 of 1
[2024-06-25, 10:58:11 UTC] {taskinstance.py:1281} INFO - 
--------------------------------------------------------------------------------
[2024-06-25, 10:58:11 UTC] {taskinstance.py:1300} INFO - Executing <Task(NoticeBatchPipelineOperator): notice_distillation_pipeline> on 2024-06-25 10:36:56.911113+00:00
[2024-06-25, 10:58:11 UTC] {standard_task_runner.py:55} INFO - Started process 25082 to run task
[2024-06-25, 10:58:11 UTC] {standard_task_runner.py:82} INFO - Running: ['airflow', 'tasks', 'run', 'notice_processing_pipeline', 'notice_distillation_pipeline', 'manual__2024-06-25T10:36:56.911113+00:00', '--job-id', '57619', '--raw', '--subdir', 'DAGS_FOLDER/reprocess_unpackaged_notices_from_backlog.py', '--cfg-path', '/tmp/tmp7_3ld40w']
[2024-06-25, 10:58:11 UTC] {standard_task_runner.py:83} INFO - Job 57619: Subtask notice_distillation_pipeline
[2024-06-25, 10:58:11 UTC] {warnings.py:109} WARNING - /home/airflow/.local/lib/python3.8/site-packages/airflow/settings.py:249: DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
  SQL_ALCHEMY_CONN = conf.get("database", "SQL_ALCHEMY_CONN")

[2024-06-25, 10:58:11 UTC] {task_command.py:388} INFO - Running <TaskInstance: notice_processing_pipeline.notice_distillation_pipeline manual__2024-06-25T10:36:56.911113+00:00 [running]> on host ip-10-68-154-169.eu-west-1.compute.internal
[2024-06-25, 10:58:11 UTC] {taskinstance.py:1507} INFO - Exporting the following env vars:
[email protected]
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=notice_processing_pipeline
AIRFLOW_CTX_TASK_ID=notice_distillation_pipeline
AIRFLOW_CTX_EXECUTION_DATE=2024-06-25T10:36:56.911113+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2024-06-25T10:36:56.911113+00:00
[2024-06-25, 10:58:11 UTC] {taskinstance.py:1768} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/airflow/dags/operators/DagBatchPipelineOperator.py", line 80, in execute
    self.batch_pipeline_callable(notice_ids=notice_ids, mongodb_client=mongodb_client))
  File "/opt/airflow/dags/pipelines/notice_batch_processor_pipelines.py", line 24, in notices_batch_distillation_pipeline
    notice.set_distilled_rdf_manifestation(
  File "/opt/airflow/ted_sws/core/model/notice.py", line 306, in set_distilled_rdf_manifestation
    self.update_status_to(NoticeStatus.DISTILLED)
  File "/opt/airflow/ted_sws/core/model/notice.py", line 479, in update_status_to
    raise UnsupportedStatusTransition(
ted_sws.core.model.notice.UnsupportedStatusTransition: Unsupported transition from state PREPROCESSED_FOR_TRANSFORMATION to state DISTILLED.
[2024-06-25, 10:58:11 UTC] {taskinstance.py:1318} INFO - Marking task as FAILED. dag_id=notice_processing_pipeline, task_id=notice_distillation_pipeline, execution_date=20240625T103656, start_date=20240625T105811, end_date=20240625T105811
[2024-06-25, 10:58:11 UTC] {standard_task_runner.py:100} ERROR - Failed to execute job 57619 for task notice_distillation_pipeline (Unsupported transition from state PREPROCESSED_FOR_TRANSFORMATION to state DISTILLED.; 25082)
[2024-06-25, 10:58:11 UTC] {local_task_job.py:208} INFO - Task exited with return code 1
[2024-06-25, 10:58:13 UTC] {taskinstance.py:2578} INFO - 0 downstream tasks scheduled from follow-on schedule check
@cristianvasquez
Contributor Author

7 Notices went through

@costezki
Contributor

Unsupported transition from state PREPROCESSED_FOR_TRANSFORMATION to state DISTILLED means that an important state, TRANSFORMED, has been skipped. The notices must be transformed first, before attempting to DISTILL them (i.e. duplicate entities in a Notice).
The pipeline operator is strongly encouraged to follow the instructions in the user manual in order to avoid such errors in the future.
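
To illustrate the kind of check behind this error, here is a minimal sketch of a status guard. It is not the actual ted_sws code: the transition table `ALLOWED_TRANSITIONS` and the reduced `Notice` class are assumptions made for illustration, and only the three states named in this thread are modelled. The names `NoticeStatus`, `update_status_to` and `UnsupportedStatusTransition` are taken from the traceback above.

```python
from enum import Enum


class NoticeStatus(Enum):
    # Only the states mentioned in this thread; the real enum is assumed to have more.
    PREPROCESSED_FOR_TRANSFORMATION = "PREPROCESSED_FOR_TRANSFORMATION"
    TRANSFORMED = "TRANSFORMED"
    DISTILLED = "DISTILLED"


class UnsupportedStatusTransition(Exception):
    """Raised when a requested status change is not in the transition table."""


# Hypothetical transition table: each state maps to the states reachable from it.
ALLOWED_TRANSITIONS = {
    NoticeStatus.PREPROCESSED_FOR_TRANSFORMATION: {NoticeStatus.TRANSFORMED},
    NoticeStatus.TRANSFORMED: {NoticeStatus.DISTILLED},
    NoticeStatus.DISTILLED: set(),
}


class Notice:
    """Reduced stand-in for a notice that only tracks its status."""

    def __init__(self, status):
        self.status = status

    def update_status_to(self, new_status):
        # Reject any change that is not listed for the current state.
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise UnsupportedStatusTransition(
                f"Unsupported transition from state {self.status.name} "
                f"to state {new_status.name}."
            )
        self.status = new_status


# Skipping TRANSFORMED is rejected.
notice = Notice(NoticeStatus.PREPROCESSED_FOR_TRANSFORMATION)
message = None
try:
    notice.update_status_to(NoticeStatus.DISTILLED)
except UnsupportedStatusTransition as error:
    message = str(error)

# Going through TRANSFORMED first is accepted.
notice.update_status_to(NoticeStatus.TRANSFORMED)
notice.update_status_to(NoticeStatus.DISTILLED)
```

Under these assumptions, a notice that is still in PREPROCESSED_FOR_TRANSFORMATION when the distillation step runs fails in the same way as in the log, regardless of how the preceding task was displayed in the UI.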

@cristianvasquez
Contributor Author

cristianvasquez commented Jul 1, 2024

Thank you for your response, but it doesn't help with this specific issue.

The pipeline, states and transitions you mention are defined in the Python code, and the code was not modified.

The pipeline was triggered by the DAG that processes notices per day, which triggered the processes shown in the screenshot. Notice that the notice_transformation_pipeline steps are depicted in green, which suggests that the notices are in the TRANSFORMED state.

Should I rename this issue to 'state TRANSFORMED has been missed'?
Would that help to track the problem?

@costezki
Contributor

costezki commented Jul 1, 2024

This is due to incorrect packages being loaded. Please remove the alpha packages from the system. Then, for notices that are in the backlog, please trigger the re-transform DAGs. The issues should then disappear.
Feel free to contact our colleagues directly for such concerns.

@costezki costezki closed this as completed Jul 1, 2024
@cristianvasquez
Contributor Author

Related issue: #27
