[Doc] Revise Backup Restore according to feedback (#53738)

(cherry picked from commit edd5009)
EsoragotoSpirit authored and mergify[bot] committed Dec 11, 2024
1 parent 0fd7d1b commit c32747e

Showing 6 changed files with 45 additions and 11 deletions.
@@ -43,13 +43,12 @@ StarRocks supports creating repositories in HDFS, AWS S3, and Google GCS.
- For AWS S3:
- "aws.s3.use_instance_profile": Whether or not to allow instance profile and assumed role as credential methods for accessing AWS S3. Default: `false`.

- If you use IAM user-based credential (Access Key and Secret Key) to access AWS S3, you don't need to specify this parameter, and you need to specify "aws.s3.access_key", "aws.s3.secret_key", and "aws.s3.endpoint".
- If you use IAM user-based credential (Access Key and Secret Key) to access AWS S3, you don't need to specify this parameter, and you need to specify "aws.s3.access_key", "aws.s3.secret_key", and "aws.s3.region".
- If you use Instance Profile to access AWS S3, you need to set this parameter to `true` and specify "aws.s3.region".
- If you use Assumed Role to access AWS S3, you need to set this parameter to `true` and specify "aws.s3.iam_role_arn" and "aws.s3.region".

- "aws.s3.access_key": The Access Key ID that you can use to access the Amazon S3 bucket.
- "aws.s3.secret_key": The Secret Access Key that you can use to access the Amazon S3 bucket.
- "aws.s3.endpoint": The endpoint that you can use to access the Amazon S3 bucket.
- "aws.s3.iam_role_arn": The ARN of the IAM role that has privileges on the AWS S3 bucket in which your data files are stored. If you want to use assumed role as the credential method for accessing AWS S3, you must specify this parameter. Then, StarRocks will assume this role when it analyzes your Hive data by using a Hive catalog.
- "aws.s3.region": The region in which your AWS S3 bucket resides. Example: `us-west-1`.

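The credential options described above can be combined in a `CREATE REPOSITORY` statement. Below is a minimal sketch of the Instance Profile path (set `"aws.s3.use_instance_profile"` to `true` and specify `"aws.s3.region"`); the bucket name and region are placeholders, not values from this commit:

```SQL
-- Sketch only: assumes the cluster nodes run with an instance profile
-- that already has access to the placeholder bucket `bucket_s3`.
CREATE REPOSITORY instance_profile_repo
WITH BROKER
ON LOCATION "s3a://bucket_s3/backup"
PROPERTIES(
    "aws.s3.use_instance_profile" = "true",
    "aws.s3.region" = "us-east-1"
);
```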
@@ -66,6 +65,11 @@ StarRocks supports creating repositories in HDFS, AWS S3, and Google GCS.
>
> StarRocks supports creating repositories in Google GCS only according to the S3A protocol. Therefore, when you create repositories in Google GCS, you must replace the prefix in the GCS URI you pass as a repository location in `ON LOCATION` with `s3a://`.
- For MinIO:
- "aws.s3.access_key": The Access Key that you can use to access the MinIO bucket.
- "aws.s3.secret_key": The Secret Key that you can use to access the MinIO bucket.
- "aws.s3.endpoint": The endpoint that you can use to access the MinIO bucket.

## Examples

Example 1: Create a repository named `hdfs_repo` in an Apache™ Hadoop® cluster.
@@ -89,7 +93,7 @@ ON LOCATION "s3a://bucket_s3/backup"
PROPERTIES(
"aws.s3.access_key" = "XXXXXXXXXXXXXXXXX",
"aws.s3.secret_key" = "yyyyyyyyyyyyyyyyy",
"aws.s3.endpoint" = "s3.us-east-1.amazonaws.com"
"aws.s3.region" = "us-east-1"
);
```

@@ -105,3 +109,16 @@ PROPERTIES(
"fs.s3a.endpoint" = "storage.googleapis.com"
);
```

Example 4: Create a repository named `minio_repo` in the MinIO bucket `bucket_minio`.

```SQL
CREATE REPOSITORY minio_repo
WITH BROKER
ON LOCATION "s3://bucket_minio/backup"
PROPERTIES(
"aws.s3.access_key" = "XXXXXXXXXXXXXXXXX",
"aws.s3.secret_key" = "yyyyyyyyyyyyyyyyy",
"aws.s3.endpoint" = "http://minio:9000"
);
```
@@ -255,4 +255,4 @@ MySQL > SHOW PARTITIONS FROM t_recharge_detail1;
- Currently, using CTAS to create tables configured with expression partitioning is not supported.
- Currently, using Spark Load to load data to tables that use expression partitioning is not supported.
- When the `ALTER TABLE <table_name> DROP PARTITION <partition_name>` statement is used to delete a partition created by using the column expression, data in the partition is directly removed and cannot be recovered.
- Currently, you cannot [backup and restore](../../administration/management/Backup_and_restore.md) partitions created by the expression partitioning.
- Currently, you cannot [backup and restore](../../administration/management/Backup_and_restore.md) tables created with the expression partitioning strategy.
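The DROP PARTITION caveat above can be illustrated with a hedged sketch; the table name is taken from the surrounding diff context, but the partition name is hypothetical:

```SQL
-- Irreversible for partitions created by expression or list partitioning:
-- the partition's data is removed directly and cannot be recovered.
ALTER TABLE t_recharge_detail1 DROP PARTITION p20240101;
```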
@@ -112,5 +112,5 @@ DISTRIBUTED BY HASH(`id`);
- List partitioning does not support dynamic partitioning and creating multiple partitions at a time.
- Currently, StarRocks's shared-data mode does not support this feature.
- When the `ALTER TABLE <table_name> DROP PARTITION <partition_name>;` statement is used to delete a partition created by using list partitioning, data in the partition is directly removed and cannot be recovered.
- Currently you cannot backup and restore partitions created by the list partitioning.
- Currently you cannot [backup and restore](../../administration/management/Backup_and_restore.md) tables created with the list partitioning strategy.
- From v3.3.5 onwards, StarRocks supports creating [asynchronous materialized views](../../using_starrocks/async_mv/Materialized_view.md) with base tables created with the list partitioning strategy.
@@ -35,7 +35,7 @@ PROPERTIES ("key"="value", ...)

**PROPERTIES**

StarRocks supports creating repositories in HDFS, AWS S3, Google GCS, Alibaba Cloud OSS, and Tencent Cloud COS.
StarRocks supports creating repositories in HDFS, AWS S3, Google GCS, Alibaba Cloud OSS, Tencent Cloud COS, and MinIO.

- HDFS:
- "username": The username used to access the NameNode of the HDFS cluster.
@@ -44,13 +43,12 @@ StarRocks supports creating repositories in HDFS, AWS S3, Google GCS, Alibaba Cloud OSS, and Tencent Cloud COS.
- S3:
- "aws.s3.use_instance_profile": Whether to use Instance Profile or Assumed Role as the credential method for accessing AWS S3. Default: `false`.

- If you use IAM user-based credentials (Access Key and Secret Key) to access AWS S3, you do not need to specify this parameter, and you need to specify "aws.s3.access_key", "aws.s3.secret_key", and "aws.s3.endpoint".
- If you use IAM user-based credentials (Access Key and Secret Key) to access AWS S3, you do not need to specify this parameter, and you need to specify "aws.s3.access_key", "aws.s3.secret_key", and "aws.s3.region".
- If you use Instance Profile to access AWS S3, you need to set this parameter to `true` and specify "aws.s3.region".
- If you use Assumed Role to access AWS S3, you need to set this parameter to `true` and specify "aws.s3.iam_role_arn" and "aws.s3.region".

- "aws.s3.access_key": The Access Key used to access the AWS S3 bucket.
- "aws.s3.secret_key": The Secret Key used to access the AWS S3 bucket.
- "aws.s3.endpoint": The endpoint used to access the AWS S3 bucket.
- "aws.s3.iam_role_arn": The ARN of the IAM role that has privileges on the AWS S3 bucket. If you use Instance Profile or Assumed Role as the credential method for accessing AWS S3, you must specify this parameter.
- "aws.s3.region": The region in which the AWS S3 bucket resides, for example, `us-west-1`.

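For the Assumed Role path described above, a repository definition might look like the following hedged sketch; the role ARN, bucket, and region are placeholders, not values from this commit:

```SQL
-- Sketch only: "arn:aws:iam::123456789012:role/backup_role" is a placeholder ARN
-- for a role with privileges on the placeholder bucket `bucket_s3`.
CREATE REPOSITORY assumed_role_repo
WITH BROKER
ON LOCATION "s3a://bucket_s3/backup"
PROPERTIES(
    "aws.s3.use_instance_profile" = "true",
    "aws.s3.iam_role_arn" = "arn:aws:iam::123456789012:role/backup_role",
    "aws.s3.region" = "us-west-1"
);
```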
@@ -77,6 +76,11 @@ StarRocks supports creating repositories in HDFS, AWS S3, Google GCS, Alibaba Cloud OSS, and Tencent Cloud COS.
- "fs.cosn.userinfo.secretKey": The SecretKey used to access the Tencent Cloud COS bucket, that is, the key used to encrypt the signature string and to verify it on the server side.
- "fs.cosn.bucket.endpoint_suffix": The endpoint used to access the Tencent Cloud COS bucket.

- MinIO:
- "aws.s3.access_key": The Access Key used to access the MinIO bucket.
- "aws.s3.secret_key": The Secret Key used to access the MinIO bucket.
- "aws.s3.endpoint": The endpoint used to access the MinIO bucket.

## Examples

Example 1: Create a repository named `hdfs_repo` in an Apache™ Hadoop® cluster.
@@ -100,7 +104,7 @@ ON LOCATION "s3a://bucket_s3/backup"
PROPERTIES(
"aws.s3.access_key" = "XXXXXXXXXXXXXXXXX",
"aws.s3.secret_key" = "yyyyyyyyyyyyyyyyy",
"aws.s3.endpoint" = "s3.us-east-1.amazonaws.com"
"aws.s3.region" = "us-east-1"
);
```

@@ -116,3 +120,16 @@ PROPERTIES(
"fs.s3a.endpoint" = "storage.googleapis.com"
);
```

Example 4: Create a repository named `minio_repo` in the MinIO bucket `bucket_minio`.

```SQL
CREATE REPOSITORY minio_repo
WITH BROKER
ON LOCATION "s3://bucket_minio/backup"
PROPERTIES(
"aws.s3.access_key" = "XXXXXXXXXXXXXXXXX",
"aws.s3.secret_key" = "yyyyyyyyyyyyyyyyy",
"aws.s3.endpoint" = "http://minio:9000"
);
```
@@ -239,5 +239,5 @@ MySQL > SHOW PARTITIONS FROM t_recharge_detail1;
- Currently, using CTAS to create tables with expression partitioning is not supported.
- Currently, using Spark Load to load data into tables that use expression partitioning is not supported.
- When `ALTER TABLE <table_name> DROP PARTITION <partition_name>` is used to delete a partition created by using a column expression, the partition is directly removed and cannot be recovered.
- Currently, partitions created by expression partitioning do not support [backup and restore](../../administration/management/Backup_and_restore.md).
- Currently, tables created with expression partitioning do not support [backup and restore](../../administration/management/Backup_and_restore.md).
- If you use expression partitioning, rollback is supported only to v2.5.4 and later versions.
@@ -112,5 +112,5 @@ DISTRIBUTED BY HASH(`id`);
- [Dynamic](./dynamic_partitioning.md) List partitioning is not supported.
- StarRocks shared-data mode supports this feature from v3.1.1 onwards.
- When `ALTER TABLE <table_name> DROP PARTITION <partition_name>;` is used, the partition is directly removed and cannot be recovered.
- Currently, List partitions do not support [backup and restore](../../administration/management/Backup_and_restore.md).
- Currently, tables created with List partitioning do not support [backup and restore](../../administration/management/Backup_and_restore.md).
- From v3.3.5 onwards, [asynchronous materialized views](../../using_starrocks/async_mv/Materialized_view.md) can be created based on base tables that use List partitioning.
