Run Apache Doris Analytics on Rocky/Alma 8

Analytical databases store and manage big data, including data mining, business, market and customer data for business intelligence (BI) analysis. Analytical databases are specially optimized to provide faster queries and are designed to efficiently handle large volumes of data.

There are several examples of analytical databases. The most popular ones include SAP HANA, Oracle Database, Microsoft SQL Server Analysis Services, Google BigQuery, Apache Cassandra and Amazon Redshift. Today we will walk through how to install and use the Apache Doris Analytics Database on Rocky / AlmaLinux 8. But before that, we need to know what Apache Doris is.

What is Apache Doris?

Apache Doris, formerly known as Palo, is an open-source, easy-to-use, high-performance and real-time distributed analytical data warehousing system. It was initially developed by Baidu and was later donated to the Apache Software Foundation.

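Doris is a self-contained MPP (massively parallel processing) system: it does not depend on Hadoop, HDFS or HBase to run, although it can load data from such external systems. It is compatible with the MySQL protocol and provides a SQL interface for data querying and analysis. It supports both batch and streaming data processing and can handle both structured and semi-structured data.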

The features and benefits associated with Apache Doris are:

  • Security and Access Control: It provides security and access control features to ensure that data is only accessed by authorized users.
  • Data Visualization: It supports a range of data visualization and reporting tools, including Tableau and Superset, making it easy to create visualizations and reports from data.
  • SQL Interface: Doris provides a SQL interface for data querying and analysis, making it easy for users to query and analyze data using familiar SQL syntax.
  • Multidimensional Data Modeling: Doris supports multidimensional data modelling and OLAP analysis, making it easy to analyze data across multiple dimensions.
  • Real-time Data Ingestion: It supports real-time data ingestion and processing, allowing users to perform analysis on up-to-date data.
  • Distributed Architecture: Doris uses a distributed architecture of FE and BE nodes, with no dependency on Hadoop or HBase, and scales horizontally as data volumes increase.
  • High Query Performance: It provides high-speed query performance, even on complex analytical queries, by using pre-aggregation and caching techniques.

Install Apache Doris Analytics on Rocky / Alma 8

The Apache Doris architecture comprises only two types of processes. These are:

  • Frontend (FE): user request access, query parsing and planning, metadata management, node management, etc.
  • Backend (BE): data storage and query plan execution

Both types of processes are horizontally scalable, and a single cluster can support up to hundreds of machines and tens of petabytes of storage capacity. Both also use consistency protocols to guarantee high availability of services and high reliability of data. This highly integrated architecture greatly reduces the operation and maintenance costs of a distributed system.

[Figure: Apache Doris architecture]

Now let’s plunge in! Doris requires the following:

  • Java 1.8 and above
  • GCC 4.8.2 and above
  • CentOS 7.1/Ubuntu 16.04 and above

1. Install Java runtime

Doris runs on a Linux environment with a Java runtime environment installed. The minimum JDK version required is 8; this guide installs OpenJDK 11.

sudo yum -y install java-11-openjdk java-11-openjdk-devel

Check the installed Java version with the command:

$ java -version
openjdk version "11.0.19" 2023-04-18 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-2) (build 11.0.19+7-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-2) (build 11.0.19+7-LTS, mixed mode, sharing)
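
If you want to script this check, the banner line can be parsed for its major version. The following is a minimal sketch; the function name java_major is our own and not part of Doris or the JDK. It assumes the banner has the usual `version "x.y.z"` form shown above.

```shell
# Print the major Java version parsed from a `java -version` banner line.
# Handles both the legacy "1.8.0_x" scheme and the modern "11.0.x" scheme.
java_major() {
    # $1: a line such as: openjdk version "11.0.19" 2023-04-18 LTS
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;   # 1.8.0_372 -> 8
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;   # 11.0.19   -> 11
    esac
}

# Usage (java prints its banner on stderr, hence 2>&1):
#   major=$(java_major "$(java -version 2>&1 | head -n 1)")
#   [ "$major" -ge 8 ] || echo "Java 8 or newer is required" >&2
```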

Check if your CPU supports AVX2:

lscpu | grep avx2
##OR
cat /proc/cpuinfo | grep avx2

If you see no output, your CPU does not support AVX2, so you need to download the noavx2 archive of Apache Doris in the next step.

2. Download Apache Doris

Once Java has been installed, download the latest binary version of Doris from the downloads page. You can also use the commands below to pull the binary that matches your CPU.

VERSION=1.2.6

##X64 ( avx2 )
wget https://apache-doris-releases.oss-accelerate.aliyuncs.com/apache-doris-$VERSION-bin-x64.tar.xz

##X64 ( no avx2 )
wget https://apache-doris-releases.oss-accelerate.aliyuncs.com/apache-doris-$VERSION-bin-x64-noavx2.tar.xz

##ARM64
wget https://apache-doris-releases.oss-accelerate.aliyuncs.com/apache-doris-$VERSION-bin-arm64.tar.xz
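
The choice between the three archives above can also be scripted. This is only a sketch: the helper name doris_archive is hypothetical, and it assumes the file names keep the pattern used in the wget commands above.

```shell
# Print the archive file name for a given version, architecture and CPU flag list.
doris_archive() {
    # $1: Doris version, $2: output of `uname -m`, $3: CPU flags line
    case "$2" in
        aarch64|arm64) suffix="arm64" ;;
        x86_64)
            # Pad with spaces so "avx2" only matches as a whole flag.
            case " $3 " in
                *" avx2 "*) suffix="x64" ;;
                *)          suffix="x64-noavx2" ;;
            esac ;;
        *) echo "unsupported architecture: $2" >&2; return 1 ;;
    esac
    printf 'apache-doris-%s-bin-%s.tar.xz\n' "$1" "$suffix"
}

# Usage:
#   doris_archive "$VERSION" "$(uname -m)" "$(grep -m1 '^flags' /proc/cpuinfo)"
```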

Once downloaded, extract the file:

tar xf apache-doris-*.tar.xz

3. Install and Configure Apache Doris FE

Once downloaded and extracted, we will start by installing the Apache Doris FrontEnd. Navigate to the directory:

cd apache-doris-*/fe

The configuration file is stored at conf/fe.conf, and we need to make some modifications to it. The two main parameters to modify are priority_networks and meta_dir.

vim conf/fe.conf

First, add the priority_networks parameter as shown:

priority_networks = 192.168.200.0/24 

Remember to replace the network above with the CIDR range of your own network, then add the metadata directory.

meta_dir = ${DORIS_HOME}/doris-meta
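
If you are unsure which value to use for priority_networks, it can be derived from the host's own address and prefix length. The sketch below is an illustration only; cidr_network is a hypothetical helper name, and it assumes a plain IPv4 address in IP/PREFIX form.

```shell
# Print the network in CIDR form for an address given as IP/PREFIX.
cidr_network() {
    # $1: e.g. 192.168.200.53/24
    ip=${1%/*}; prefix=${1#*/}
    # Split the dotted quad into four octets.
    o1=${ip%%.*}; rest=${ip#*.}
    o2=${rest%%.*}; rest=${rest#*.}
    o3=${rest%%.*}; o4=${rest#*.}
    addr=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
    # Build the netmask from the prefix length and apply it.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( addr & mask ))
    printf '%d.%d.%d.%d/%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 )) "$prefix"
}

# Usage: list the host's global IPv4 addresses and convert each one.
#   ip -4 -o addr show scope global | awk '{print $4}' | while read -r a; do cidr_network "$a"; done
```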

You can also modify the JAVA_OPTS to use the desired memory value:

JAVA_OPTS="-Xmx2048m ...

# For jdk 9+, this JAVA_OPTS will be used as default JVM options
JAVA_OPTS_FOR_JDK_9="-Xmx2048m ...

Once these changes have been made, save the file then allow the required ports through the firewall:

sudo firewall-cmd --add-port={8030/tcp,9020/tcp,9030/tcp,9010/tcp} --permanent
sudo firewall-cmd --reload

Now start the Apache Doris Front End services using the command:

./bin/start_fe.sh --daemon

Verify if the service is running:

$ curl http://127.0.0.1:8030/api/bootstrap
{"msg":"success","code":0,"data":{"replayedJournalId":0,"queryPort":0,"rpcPort":0,"version":""},"count":0}
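
The FE can take a few seconds to come up, so a script may want to poll this endpoint rather than call it once. The following is a minimal sketch under that assumption; the function names, the retry count and the sleep interval are our own choices.

```shell
# Succeed when a bootstrap response body reports success.
fe_bootstrap_ok() {
    # $1: JSON body returned by /api/bootstrap
    printf '%s' "$1" | grep -q '"msg"[[:space:]]*:[[:space:]]*"success"'
}

# Poll the endpoint until it reports success or the attempts run out.
wait_for_fe() {
    # $1: base URL (e.g. http://127.0.0.1:8030), $2: max attempts (default 30)
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        body=$(curl -s --max-time 5 "$1/api/bootstrap") && fe_bootstrap_ok "$body" && return 0
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}

# Usage:
#   wait_for_fe http://127.0.0.1:8030 || echo "FE did not become ready" >&2
```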

The success message confirms that the Apache Doris FE is running and serving HTTP on port 8030. Now we will access the service via the browser using the URL http://IP_Address:8030


You can now log in using the built-in user root with an empty password.


The system information page shows that no backend has been added to Apache Doris yet.

You can also connect to the Doris FE using a MySQL client. Ensure that it is installed on your machine before you proceed. On Rocky Linux 8 / AlmaLinux 8, use the command:

sudo yum install mysql

Now connect to Doris FE using the client:

$ mysql -uroot -P9030 -h127.0.0.1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 0
Server version: 5.7.99 Doris version doris-1.2.6-rc03-Unknown

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> 

Once connected, you can check the FE information:

mysql> show frontends\G
*************************** 1. row ***************************
             Name: 192.168.200.53_9010_1690022537740
               IP: 192.168.200.53
      EditLogPort: 9010
         HttpPort: 8030
        QueryPort: 9030
          RpcPort: 9020
             Role: FOLLOWER
         IsMaster: true
        ClusterId: 2025977646
             Join: true
            Alive: true
ReplayedJournalId: 214
    LastHeartbeat: 2023-07-22 06:53:54
         IsHelper: true
           ErrMsg: 
          Version: doris-1.2.6-rc03-Unknown
 CurrentConnected: Yes
1 row in set (0.02 sec)

4. Install and Configure Apache Doris BE

Now we need to install the Apache Doris backend, but first we need to adjust its configuration as we did for the FE. Navigate to the BE directory:

cd ../be

Once here, open the config file:

vim conf/be.conf

Now make the modifications as desired. The two main parameters here are priority_networks and storage_root_path.

Add the priority_networks parameter

priority_networks=192.168.200.0/24

Then add the BE data storage directory

storage_root_path = ${DORIS_HOME}/storage

Since version 1.2 the BE supports Java UDFs, so you also need to ensure that the JAVA_HOME environment variable is set and that the Java UDF functions have been installed. Now save the file and make the below setting:

$ sudo vim /etc/sysctl.conf
vm.max_map_count=2000000

Apply the changes:

sudo sysctl -p

Also, set the maximum number of open file descriptors on your system.

$ vim ~/.bashrc
ulimit -n 65536

Source the profile:

source ~/.bashrc
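
Before starting the BE, it can help to confirm that both limits actually took effect. The sketch below is one way to do that; the function names are hypothetical, and the thresholds are simply the values used in this guide.

```shell
# Succeed when an actual limit meets the required minimum.
meets_minimum() {
    # $1: actual value (a number or "unlimited"), $2: required minimum
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -ge "$2" ]
}

# Check the two limits configured above and report any that are too low.
be_preflight() {
    status=0
    if ! meets_minimum "$(cat /proc/sys/vm/max_map_count)" 2000000; then
        echo "vm.max_map_count is below 2000000" >&2; status=1
    fi
    if ! meets_minimum "$(ulimit -n)" 65536; then
        echo "open file limit is below 65536" >&2; status=1
    fi
    return "$status"
}

# Usage:
#   be_preflight && ./bin/start_be.sh --daemon
```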

Start Apache Doris BE.

./bin/start_be.sh --daemon

The next thing to do is add the backend to the cluster. First, connect to the FE using the MySQL client as we did earlier:

mysql -uroot -P9030 -h127.0.0.1

Then add the BE using the command:

ALTER SYSTEM ADD BACKEND "192.168.200.53:9050";

In the command, replace 192.168.200.53 with the IP address of the BE host (an address within the priority_networks range) and 9050 with the heartbeat_service_port set in be.conf.
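
When adding backends from a script, the statement can be generated and piped to the MySQL client instead of typed interactively. This is a small sketch; add_backend_sql is a hypothetical helper, and 9050 is assumed as the default port because it is the value used in this guide.

```shell
# Print the ALTER SYSTEM statement for one backend.
add_backend_sql() {
    # $1: BE host IP, $2: heartbeat_service_port (defaults to 9050)
    printf 'ALTER SYSTEM ADD BACKEND "%s:%s";\n' "$1" "${2:-9050}"
}

# Usage:
#   add_backend_sql 192.168.200.53 | mysql -uroot -P9030 -h127.0.0.1
```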

Once the command has been executed, you can check the status:

mysql> SHOW BACKENDS\G
*************************** 1. row ***************************
              BackendId: 10003
                Cluster: default_cluster
                     IP: 192.168.200.53
          HeartbeatPort: 9050
                 BePort: 9060
               HttpPort: 8040
               BrpcPort: 8060
          LastStartTime: 2023-07-22 08:28:27
          LastHeartbeat: 2023-07-22 08:28:50
                  Alive: true
   SystemDecommissioned: false
  ClusterDecommissioned: false
              TabletNum: 0
       DataUsedCapacity: 0.000 
          AvailCapacity: 21.705 GB
          TotalCapacity: 35.022 GB
                UsedPct: 38.03 %
         MaxDiskUsedPct: 38.03 %
     RemoteUsedCapacity: 0.000 
                    Tag: {"location" : "default"}
                 ErrMsg: 
                Version: doris-1.2.6-rc03-Unknown
                 Status: {"lastSuccessReportTabletsTime":"2023-07-22 08:28:51","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
HeartbeatFailureCounter: 0
               NodeRole: mix
1 row in set (0.01 sec)
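
The field to watch in this output is Alive. As a rough sketch, the vertical output can be filtered with awk to count live backends; the helper name alive_backends is our own, and it assumes the `Field: value` layout shown above.

```shell
# Count backends whose "Alive" field is true in `SHOW BACKENDS\G` output read from stdin.
alive_backends() {
    awk -F': *' '
        { gsub(/^ +/, "", $1) }                 # strip the left padding of the field name
        $1 == "Alive" && $2 ~ /^true/ { n++ }
        END { print n + 0 }
    '
}

# Usage:
#   mysql -uroot -P9030 -h127.0.0.1 -e 'SHOW BACKENDS\G' | alive_backends
```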

The backend now also appears in the web UI.

5. Using Apache Doris Analytics Database

Now we are set to use the Apache Doris Analytics Database as desired. First, create a database using the command:

create database demo;

You can then create a table in the database:

use demo;

CREATE TABLE IF NOT EXISTS demo.example_tbl
(
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "",
    `city` VARCHAR(20) COMMENT "",
    `age` SMALLINT COMMENT "",
    `sex` TINYINT COMMENT "",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT ""
)
AGGREGATE KEY(`user_id`, `date`, `city`, `age`, `sex`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);
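
With AGGREGATE KEY, rows that share the same key columns are merged, and each value column is combined according to its aggregation type (REPLACE, SUM, MAX or MIN). The awk sketch below imitates that merge on CSV rows to show the effect. It is not how Doris implements it, and it makes one simplifying assumption: for REPLACE, the later row in the input wins.

```shell
# Merge CSV rows that share the five key columns, mimicking the table definition above:
# REPLACE for last_visit_date, SUM for cost, MAX and MIN for the dwell times.
aggregate_rows() {
    awk -F, -v OFS=, '
    {
        key = $1 OFS $2 OFS $3 OFS $4 OFS $5
        if (!(key in cost)) {
            order[++n] = key                 # remember first-seen order of keys
            cost[key] = 0; maxd[key] = $8; mind[key] = $9
        }
        last[key] = $6                                 # REPLACE (assumed: later row wins)
        cost[key] += $7                                # SUM
        if ($8 + 0 > maxd[key] + 0) maxd[key] = $8     # MAX
        if ($9 + 0 < mind[key] + 0) mind[key] = $9     # MIN
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            print k, last[k], cost[k], maxd[k], mind[k]
        }
    }'
}

# Usage with two hypothetical rows for the same user, date, city, age and sex:
#   printf '%s\n' \
#     '10000,2017-10-01,Nairobi,20,0,2017-10-01 06:00:00,20,10,10' \
#     '10000,2017-10-01,Nairobi,20,0,2017-10-01 07:00:00,15,2,2' | aggregate_rows
```

The two rows collapse into one, with the costs added together and the dwell times reduced to their maximum and minimum.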

Exit the MySQL shell and create sample data in CSV format:

$ vim test.csv
10000,2017-10-01,Nairobi,20,0,2017-10-01 06:00:00,20,10,10
10006,2017-10-01,Nairobi,20,0,2017-10-01 07:00:00,15,2,2
10001,2017-10-01,Nairobi,30,1,2017-10-01 17:05:45,2,22,22
10002,2017-10-02,Mombasa,20,1,2017-10-02 12:59:12,200,5,5
10003,2017-10-02,Dodoma,32,0,2017-10-02 11:20:00,30,11,11
10004,2017-10-01,Kampala,35,0,2017-10-01 10:00:15,100,3,3
10004,2017-10-03,Kigali,35,0,2017-10-03 10:20:22,11,6,6

Import the created data

curl --location-trusted -u root: -T test.csv -H "column_separator:," http://127.0.0.1:8030/api/demo/example_tbl/_stream_load

The command returns a JSON response describing the load. A successful import reports a Status of Success along with the number of rows loaded.
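
In a script, the response can be checked rather than read by eye. The sketch below assumes the response contains a "Status" field whose value is "Success" on a successful load; stream_load_ok is a hypothetical helper name.

```shell
# Succeed when a Stream Load response read from stdin reports "Status": "Success".
stream_load_ok() {
    grep -q '"Status"[[:space:]]*:[[:space:]]*"Success"'
}

# Usage:
#   curl -s --location-trusted -u root: -T test.csv -H "column_separator:," \
#       http://127.0.0.1:8030/api/demo/example_tbl/_stream_load | stream_load_ok \
#       || echo "stream load failed" >&2
```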

Query data on Doris Analytics Database

We can now read the imported data using the commands:

mysql> use demo;
Database changed
mysql> select * from example_tbl;
+---------+------------+---------+------+------+---------------------+------+----------------+----------------+
| user_id | date       | city    | age  | sex  | last_visit_date     | cost | max_dwell_time | min_dwell_time |
+---------+------------+---------+------+------+---------------------+------+----------------+----------------+
| 10000   | 2017-10-01 | Nairobi |   20 |    0 | 2017-10-01 06:00:00 |   20 |             10 |             10 |
| 10001   | 2017-10-01 | Nairobi |   30 |    1 | 2017-10-01 17:05:45 |    2 |             22 |             22 |
| 10002   | 2017-10-02 | Mombasa |   20 |    1 | 2017-10-02 12:59:12 |  200 |              5 |              5 |
| 10003   | 2017-10-02 | Dodoma  |   32 |    0 | 2017-10-02 11:20:00 |   30 |             11 |             11 |
| 10004   | 2017-10-01 | Kampala |   35 |    0 | 2017-10-01 10:00:15 |  100 |              3 |              3 |
| 10004   | 2017-10-03 | Kigali  |   35 |    0 | 2017-10-03 10:20:22 |   11 |              6 |              6 |
| 10006   | 2017-10-01 | Nairobi |   20 |    0 | 2017-10-01 07:00:00 |   15 |              2 |              2 |
+---------+------------+---------+------+------+---------------------+------+----------------+----------------+
7 rows in set (0.09 sec)

mysql> select * from example_tbl where city='Mombasa';
+---------+------------+---------+------+------+---------------------+------+----------------+----------------+
| user_id | date       | city    | age  | sex  | last_visit_date     | cost | max_dwell_time | min_dwell_time |
+---------+------------+---------+------+------+---------------------+------+----------------+----------------+
| 10002   | 2017-10-02 | Mombasa |   20 |    1 | 2017-10-02 12:59:12 |  200 |              5 |              5 |
+---------+------------+---------+------+------+---------------------+------+----------------+----------------+
1 row in set (0.04 sec)

mysql> select city, sum(cost) as total_cost from example_tbl group by city;
+---------+------------+
| city    | total_cost |
+---------+------------+
| Nairobi |         37 |
| Mombasa |        200 |
| Dodoma  |         30 |
| Kampala |        100 |
| Kigali  |         11 |
+---------+------------+
5 rows in set (0.05 sec)

You can also run the above queries from the web UI.

6. Manage Apache Doris Analytics Services

In case you need to manage the services, you can use the below commands from the appropriate directory:

  • To stop the Doris FE, switch to the fe directory and run:
./bin/stop_fe.sh
  • To stop the Doris BE, switch to the be directory and run:
./bin/stop_be.sh
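
If you would rather manage both services from the top of the extracted archive, the script paths can be built from the component name. This is only a sketch; doris_script is a hypothetical helper, and it assumes the fe/ and be/ layout used throughout this guide.

```shell
# Print the path of a Doris control script relative to the extracted archive.
doris_script() {
    # $1: action (start|stop), $2: component (fe|be)
    case "$1:$2" in
        start:fe|stop:fe|start:be|stop:be)
            printf '%s/bin/%s_%s.sh\n' "$2" "$1" "$2" ;;
        *)
            echo "usage: doris_script start|stop fe|be" >&2; return 1 ;;
    esac
}

# Usage, from the extracted apache-doris-* directory:
#   "./$(doris_script stop be)"
#   "./$(doris_script start be)" --daemon
```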

Final Thoughts

That marks the end of this guide on how to install and use Apache Doris Analytics Database on Rocky / AlmaLinux 8. You can now use it to manage big data, such as data mining, business, market and customer data for business intelligence (BI) analysis. I hope this was informative!
