StarRocks Best Practices on Server type, Data Modeling, Query Performance, Data Loading (WIP) #22765

alberttwong · 2023-04-28T18:28:06Z

alberttwong
Apr 28, 2023

Selecting the optimal server type

For the Frontend (FE), we recommend a total CPU core count of 8 and over 16GB of memory. For the Backend (BE), a total CPU core count of 16, with over 64GB of memory, will work best. It's important to note that the CPU of the BE server needs to support the AVX2 instruction set to leverage StarRocks' vectorization technology.

In addition, your hardware should include a 10 Gigabit Ethernet card and a compatible 10 Gigabit switch for smooth networking.

Picking the right type of tables

First off, let's talk about the duplicated key table. This is StarRocks' default table and one of the most popular ones among users. Picture this: you've got a table creation statement on your right. See the DUPLICATE KEY keyword? After this, you can declare your sorting columns. It's a great idea to choose columns that are frequently used in your filter conditions.

Next up, we have the Aggregation table. This one is ideal when you have business scenarios involving aggregated indicator columns. Imagine you've created a data table using this table. What happens is that your indicator columns get neatly aggregated based on your definitions.

Finally, let's talk about the Primary Key table. This table shines when you're dealing with real-time or frequent updates. The Primary Key table ensures that for any given primary key, there's just one record, eliminating the need for Merge operations. This greatly improves query performance at the cost of a slight dip in write performance and memory usage.

Things to think about for query performance

Let's start with partition bucketing. This clever technique is all about slicing and dicing your data into digestible pieces to minimize data scanned. Imagine cutting your data into parts by range, then dividing these parts into tablets using a hash.

Next up, we have sorting columns, these bad boys can dramatically enhance your query efficiency. Let's say, your SQL query often uses 'site_id' and 'city_code' as filter conditions, why not use them as sorting columns? Makes sense, right?

Also enable the query cache. Typically it's not enabled by default. In certain situations, you can get a 10X in performance for your queries.

Picking the optimal data loading methodology

Now let's talk about the different import methods StarRocks offers. Got local CSV or JSON files? Use the stream load method to bring them in. HDFS files? No problem, Broker Load's got you covered. What about Kafka text or json files? Routine Load and StarRocks Kafka Connector is your friend. And if you have large-scale files with a Spark cluster, consider using SparkLoad to import data. Got data in Apache Iceberg, Apache Hive or Apache Hudi, connect to that external catalog and then use the select and insert method. But that's not all. StarRocks even allows for setting up external tables, and you can store external table data in StarRocks using the Insert into select method. This can even import MySQL tables into StarRock.

See more at https://github.com/StarRocks/starrocks/discussions/24660

Additional Content

https://www.bilibili.com/video/BV1SX4y1c7i4/?spm_id_from=333.337.search-card.all.click

alberttwong · 2023-06-02T23:35:33Z

alberttwong
Jun 2, 2023
Author

6/2/23 @ss892714028 is in process to update this.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

StarRocks Best Practices on Server type, Data Modeling, Query Performance, Data Loading (WIP) #22765

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment

{{title}}

Select a reply

StarRocks Best Practices on Server type, Data Modeling, Query Performance, Data Loading (WIP) #22765

alberttwong Apr 28, 2023

Selecting the optimal server type

Picking the right type of tables

Things to think about for query performance

Picking the optimal data loading methodology

Additional Content

Replies: 1 comment

alberttwong Jun 2, 2023 Author

alberttwong
Apr 28, 2023

alberttwong
Jun 2, 2023
Author