[BugFix] The hdfs directory is not synchronized when the spark resource is deleted #45886

Open
blanklin030 opened this issue May 20, 2024 · 0 comments · May be fixed by #45903
Labels
type/bug Something isn't working

Comments

Contributor

blanklin030 commented May 20, 2024

Steps to reproduce the behavior (Required)

    1. Create a Spark load job:
LOAD LABEL pre_stream.test_load_ly_2 (
DATA FROM TABLE test_list_dup_sr_external_h2s_foit_820240510
INTO TABLE test_list_dup_sr
TEMPORARY PARTITION(temp__p20230930_BR)
SET (
    `id` = `id`,
    `name` = `name`,
    `dt` = '2023-09-30',
    `country_code` = 'BR'
) 
)WITH RESOURCE 'spark_resource' (
  "spark.yarn.tags" = "xxx05131",
  "spark.dynamicAllocation.enabled" = "true",
  "spark.executor.memory" = "3g",
  "spark.executor.memoryOverhead" = "2g",
  "spark.streaming.batchDuration" = "5",
  "spark.executor.cores" = "1",
  "spark.yarn.executor.memoryOverhead" = "2g",
  "spark.speculation" = "false",
  "spark.dynamicAllocation.minExecutors" = "2",
  "spark.dynamicAllocation.maxExecutors" = "100"
) PROPERTIES (
  "timeout" = "72000",
  "spark_load_submit_timeout" = "7200"
)
;
    2. The repository files are uploaded to HDFS (from the FE log):
2024-05-14 01:42:12,013 INFO (pending_load_task_scheduler_pool-1|498) [SparkRepository.upload():302] finished to upload file, localPath=/home/hadoop/starrocks-current/fe/spark-dpp/spark-dpp-1.0.0-jar-with-dependencies.jar, remotePath=hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__db__tb_sr__1019adb1d38c/__archive_1.0.0/__lib__spark-dpp-1.0.0-jar-with-dependencies.jar


2024-05-14 01:42:12,077 INFO (pending_load_task_scheduler_pool-1|498) [SparkRepository.rename():316] finished to rename file, originPath=hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__db__tb_sr__1019adb1d38c/__archive_1.0.0/__lib__spark-dpp-1.0.0-jar-with-dependencies.jar, destPath=hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__db__tb_sr__1019adb1d38c/__archive_1.0.0/__lib_70688c469808112f344091125a860404_spark-dpp-1.0.0-jar-with-dependencies.jar
    3. Drop the Spark resource:
DROP RESOURCE spark_resource;
    4. The HDFS repository directory still exists after the Spark resource is deleted:
[hadoop@bigdata-starrocks-xxx ~]$ hdfs dfs -ls hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__spark_resource/__archive_1.0.0/
Found 2 items
-rw-r--r--   3 prod_xxx supergroup  394653421 2024-05-20 10:54 hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__spark_resource/__archive_1.0.0/__lib_62eff19a2751990e17b47aa258fb7623_spark-2x.zip
-rw-r--r--   3 prod_xxx supergroup    4013682 2024-05-20 10:53 hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__spark_resource/__archive_1.0.0/__lib_70688c469808112f344091125a860404_spark-dpp-1.0.0-jar-with-dependencies.jar
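The log lines and the `hdfs dfs -ls` output above imply a deterministic per-resource layout under the working root. The sketch below is only an illustration of that layout as inferred from the paths in this report (the variable names are assumptions, not StarRocks source); it reconstructs the directory that should be cleaned up when the resource is dropped:

```shell
# Reconstruct the repository path layout implied by the FE log and ls output.
# All names below are inferred from this report, not taken from StarRocks code.
ROOT="hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915"
RESOURCE="spark_resource"
VERSION="1.0.0"
MD5="70688c469808112f344091125a860404"
JAR="spark-dpp-1.0.0-jar-with-dependencies.jar"

# Per-resource repository directory: this is what DROP RESOURCE leaves behind.
REPO_DIR="${ROOT}/__spark_repository__${RESOURCE}"
# Full path of an uploaded library file after the md5 rename step.
LIB_PATH="${REPO_DIR}/__archive_${VERSION}/__lib_${MD5}_${JAR}"

echo "$REPO_DIR"
echo "$LIB_PATH"
```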

Expected behavior (Required)

Dropping the Spark resource should also delete the corresponding HDFS repository directory.

Real behavior (Required)

The Spark resource is dropped, but its HDFS repository directory is not removed.
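Until this is fixed, the orphaned directory can be removed by hand. A possible manual cleanup, assuming the path layout from the reproduction above and write permission on that HDFS path:

```shell
# Manual workaround: remove the repository directory left behind by
# DROP RESOURCE. The path is the one from this reproduction; adjust it
# for your own cluster and resource name before running.
hdfs dfs -rm -r "hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__spark_resource"
```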

StarRocks version (Required)

  • You can get the StarRocks version by executing SQL select current_version()
blanklin030 added the type/bug label on May 20, 2024