Search before asking
I had searched in the issues and found no similar issues.
Version
Doris 2.1.7
Hadoop 3.1.4
What's Wrong?
When tablets with multiple replicas are cooled down (cold backed up) to HDFS, it is common for some replicas of a tablet to fail, while other tablets have all of their replicas cooled down successfully. If the replica count of the partition is set to 1, the issue does not occur. The errors reported during cooldown to HDFS are mainly 'Blocklist for /data/10108/10110.0.meta has changed!' and 'cannot read cooldown meta: [INTERNAL_ERROR]malformed tablet meta'.
Below is the specific information.
Create table info:
CREATE TABLE IF NOT EXISTS example_tbl_by_default_t01
(
  timestamp DATETIME NOT NULL COMMENT "log time",
  type INT NOT NULL COMMENT "log type",
  error_code INT COMMENT "error code",
  error_msg VARCHAR(1024) COMMENT "detailed error message",
  op_id BIGINT COMMENT "owner id",
  op_time DATETIME COMMENT "handling time"
)
auto partition by list(error_msg)()
DISTRIBUTED BY HASH(type) BUCKETS 1
PROPERTIES (
  "replication_allocation" = "tag.location.default: 2"
);
CREATE RESOURCE "remote_hdfs_t01" PROPERTIES (
  "type"="hdfs",
  "fs.defaultFS"="qione01:9000"
);
CREATE STORAGE POLICY policy_hdfs_t01
PROPERTIES(
  "storage_resource" = "remote_hdfs_t01",
  "cooldown_ttl" = "60"
);
ALTER TABLE example_tbl_by_default_t01 set ("storage_policy" = "policy_hdfs_t01");
Detailed errors:
It has been confirmed that the meta file named in the error exists on HDFS and is in a normal state.
[hdfs_builder.cpp:60] java.io.IOException: Blocklist for /data/10108/10110.0.meta has changed!
at org.apache.hadoop.hdfs.DFSInputStream.fetchAndCheckLocatedBlocks(DFSInputStream.java:302)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:238)
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1012)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:952)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:930)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1128)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1496)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1705)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:259)
[tablet.cpp:2451] cannot read cooldown meta: [INTERNAL_ERROR]malformed tablet meta
, path=/data/24763/24765.0.meta
0# doris::Tablet::_read_cooldown_meta(std::shared_ptr<doris::io::RemoteFileSystem> const&, doris::TabletMetaPB*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:120
1# doris::Tablet::_follow_cooldowned_data() at /root/doris/be/src/common/status.h:491
2# doris::Tablet::cooldown(std::shared_ptr<doris::Rowset>) at /root/doris/be/src/common/status.h:491
3# std::_Function_handler<void (), doris::StorageEngine::_cooldown_tasks_producer_callback()::$_1>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
4# doris::WorkThreadPool::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
5# execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
6# start_thread
7# __clone
The second error above was captured during cooldown of a tablet from another table. Once it starts, the failure of some replicas of a tablet to cool down persists. After the BE is restarted, cooldown returns to normal and the errors above no longer occur.
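To see which tablets and replicas are affected, the two error signatures can be pulled out of the BE log. The following is only a minimal sketch: the function name `triage` and the `SIGNATURES` labels are hypothetical, and reading the path as `/data/<tablet_id>/<replica_id>.<cooldown_term>.meta` is an assumption inferred from the paths quoted in this issue, not something confirmed here.

```python
import re
from collections import defaultdict

# ASSUMPTION: remote meta paths look like
#   /data/<tablet_id>/<replica_id>.<cooldown_term>.meta
# This layout is inferred from the two paths quoted in the issue.
META_PATH = re.compile(r"/data/(\d+)/(\d+)\.(\d+)\.meta")

# Hypothetical labels for the two error messages reported above.
SIGNATURES = {
    "blocklist_changed": "Blocklist for",
    "malformed_meta": "malformed tablet meta",
}


def triage(lines):
    """Group cooldown errors by tablet id.

    Returns {tablet_id: [(error_label, replica_id, cooldown_term), ...]}.
    The path may sit on the same line as the error text or on a later
    line (as with ', path=...'), so the last seen error label is kept
    until a meta path shows up.
    """
    hits = defaultdict(list)
    pending = None
    for line in lines:
        for label, marker in SIGNATURES.items():
            if marker in line:
                pending = label
        match = META_PATH.search(line)
        if pending and match:
            tablet_id, replica_id, term = map(int, match.groups())
            hits[tablet_id].append((pending, replica_id, term))
            pending = None
    return dict(hits)


# The three log lines quoted in this issue.
sample = [
    "[hdfs_builder.cpp:60] java.io.IOException: Blocklist for /data/10108/10110.0.meta has changed!",
    "[tablet.cpp:2451] cannot read cooldown meta: [INTERNAL_ERROR]malformed tablet meta",
    ", path=/data/24763/24765.0.meta",
]
result = triage(sample)
print(result)
```

Running the same function over the full BE log (for example `triage(open(path))`, with `path` pointing at the log file) before and after a BE restart would show whether the same tablets keep failing, which is the behavior described above.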
What You Expected?
All replicas of a multi-replica tablet can be successfully cooled down to HDFS.
How to Reproduce?
No response
Anything Else?
No response
Are you willing to submit PR?
Code of Conduct