Saturday, November 7, 2015

How to Configure a WSO2 DAS Fully Distributed Setup (Cluster Setup)

This blog post describes how to configure a WSO2 DAS fully distributed setup (cluster setup).
WSO2 Data Analytics Server 3.0.0 combines real-time, batch, interactive, and predictive (via machine learning) analysis of data into one integrated platform to support the multiple demands of Internet of Things (IoT) solutions, as well as mobile and Web apps. For more information, see the WSO2 DAS documentation.
The following diagram describes the fully distributed deployment pattern. This pattern is used as a high-availability deployment.
Prerequisites:
  • Download and extract WSO2 DAS 3.0.0 (one pack per node, 7 nodes in total)
  • An Apache HBase and Apache HDFS cluster
  • A MySQL setup
  • An SVN server to use as the deployment synchronizer
DAS is designed to handle millions of events per second and is capable of handling Big Data volumes. Therefore we are using Apache HBase and Apache HDFS as the underlying Data Access Layer (DAL) in DAS. The HBase DAL component uses Apache HBase for storing events (Analytics Record Store), and HDFS (the distributed file system used by Apache Hadoop) for storing index information (Analytics File System). To use the HBase DAL component, a pre-configured installation of Apache HBase (version 1.0.0 or later) running on top of Apache Hadoop (version 2.6.0 or later) is required. All HBase/HDFS nodes and all DAS nodes must be time synced.
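Before going further it is worth verifying these prerequisites. The commands below are a minimal sketch; they assume the hadoop and hbase binaries are on the PATH of the HBase/HDFS nodes and that NTP is used for time synchronization.

hadoop version        # run on an HBase/HDFS node, expect 2.6.0 or later
hbase version         # run on an HBase/HDFS node, expect 1.0.0 or later
ntpstat || timedatectl status   # run on every DAS and HBase/HDFS node to confirm the clock is synced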
If you are not interested in using Apache HBase and Apache HDFS as the data store, you can use an RDBMS instead. In this blog post I'm only focusing on Apache HBase and Apache HDFS as the DAS data store.
Please note that I use one DAS pack per node (i.e. 7 DAS packs on 7 nodes), so no port offset is needed.
You need to offset each node only if you set up the DAS cluster on a single machine or want to host multiple nodes on a single machine. To avoid port conflicts, change the following property in carbon.xml.
 <DAS_HOME>/repository/conf/carbon.xml    
 <Offset>0</Offset>
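For example, if a second DAS pack is hosted on the same machine, it could use an offset of 1. A minimal sketch (the second pack's path and the offset value are just examples):

sed -i 's|<Offset>0</Offset>|<Offset>1</Offset>|' <DAS_HOME_NODE2>/repository/conf/carbon.xml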

Database configuration

We are using MySQL for all Carbon-related databases, the analytics processed record store, and the metrics DB.
1. Create all the necessary databases.
create database dasreceiver1; -- DAS receiver node 1 local database
create database dasreceiver2; -- DAS receiver node 2 local database
create database dasanalyzer1; -- DAS analyzer node 1 local database
create database dasanalyzer2; -- DAS analyzer node 2 local database
create database dasindexer1; -- DAS indexer node 1 local database
create database dasindexer2; -- DAS indexer node 2 local database
create database dasdashboard; -- DAS dashboard node local database
create database regdb; -- registry DB that is mounted to all DAS nodes
create database userdb; -- user DB that is shared by all DAS nodes
create database metrics; -- metrics DB used by WSO2 Carbon Metrics
create database analytics_processed_data_store; -- used to store analytics processed records
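The examples in this post connect to MySQL as root/root. If you prefer a dedicated database user, the following is a minimal sketch (the user name, password, and host wildcard are hypothetical; repeat the GRANT statement for each of the databases created above):

mysql -h 10.100.7.53 -u root -p -e "CREATE USER 'dasuser'@'%' IDENTIFIED BY 'daspassword'; GRANT ALL PRIVILEGES ON dasreceiver1.* TO 'dasuser'@'%'; FLUSH PRIVILEGES;"
mysql -h 10.100.7.53 -u dasuser -p -e "SHOW DATABASES;"   # verify connectivity from each DAS node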

2. On each node (all 7 nodes), add the following database configuration.
Open master-datasources.xml:

DAS_HOME/repository/conf/datasources/master-datasources.xml
Please note that when configuring WSO2_CARBON_DB, each node must use its own database (for example, the receiver1 node uses the dasreceiver1 DB as WSO2_CARBON_DB).

<datasources-configuration xmlns:svns="http://org.wso2.securevault/configuration">
  
    <providers>
        <provider>org.wso2.carbon.ndatasource.rdbms.RDBMSDataSourceReader</provider>
    </providers>
  
    <datasources>
      <datasource>
            <name>WSO2_CARBON_DB</name>
            <description>The datasource used for registry and user manager</description>
            <jndiConfig>
                <name>jdbc/WSO2CarbonDB</name>
            </jndiConfig>
            <definition type="RDBMS">
                <configuration>
                   <url>jdbc:mysql://10.100.7.53:3306/dasreceiver1</url>
                    <username>root</username>
                    <password>root</password>
                    <driverClassName>com.mysql.jdbc.Driver</driverClassName>
                    <maxActive>50</maxActive>
                    <maxWait>60000</maxWait>
                    <testOnBorrow>true</testOnBorrow>
                    <validationQuery>SELECT 1</validationQuery>
                    <validationInterval>30000</validationInterval>
                    <defaultAutoCommit>false</defaultAutoCommit>
                </configuration>
            </definition>
        </datasource>

  <datasource>
            <name>WSO2_DAS_UM</name>
            <description>The datasource used for the user manager</description>
            <jndiConfig>
                <name>jdbc/WSO2_DAS_UM</name>
            </jndiConfig>
            <definition type="RDBMS">
                <configuration>
                    <url>jdbc:mysql://10.100.7.53:3306/userdb</url>
                    <username>root</username>
                    <password>root</password>
                    <driverClassName>com.mysql.jdbc.Driver</driverClassName>
                    <maxActive>50</maxActive>
                    <maxWait>60000</maxWait>
                    <testOnBorrow>true</testOnBorrow>
                    <validationQuery>SELECT 1</validationQuery>
                    <validationInterval>30000</validationInterval>
                    <defaultAutoCommit>false</defaultAutoCommit>
                </configuration>
            </definition>
        </datasource>

 <datasource>
            <name>WSO2_DAS_REG</name>
            <description>The datasource used for the registry</description>
            <jndiConfig>
                <name>jdbc/WSO2_DAS_REG</name>
            </jndiConfig>
            <definition type="RDBMS">
                <configuration>
                    <url>jdbc:mysql://10.100.7.53:3306/regdb</url>
                    <username>root</username>
                    <password>root</password>
                    <driverClassName>com.mysql.jdbc.Driver</driverClassName>
                    <maxActive>50</maxActive>
                    <maxWait>60000</maxWait>
                    <testOnBorrow>true</testOnBorrow>
                    <validationQuery>SELECT 1</validationQuery>
                    <validationInterval>30000</validationInterval>
                    <defaultAutoCommit>false</defaultAutoCommit>
                </configuration>
            </definition>
        </datasource>
    </datasources>

</datasources-configuration>
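Two practical notes on the configuration above: the only value that changes from node to node is the database name in the WSO2_CARBON_DB JDBC URL, and the MySQL JDBC driver is not shipped with the product, so it has to be copied into every pack. A minimal sketch for the analyzer1 node (the connector jar version is just an example):

cp mysql-connector-java-5.1.36-bin.jar <DAS_HOME>/repository/components/lib/
sed -i 's|3306/dasreceiver1|3306/dasanalyzer1|' <DAS_HOME>/repository/conf/datasources/master-datasources.xml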
  • Open metrics-datasources.xml and add the following configuration.
DAS_HOME/repository/conf/datasources/metrics-datasources.xml  


<datasources-configuration xmlns:svns="http://org.wso2.securevault/configuration">

    <providers>
        <provider>org.wso2.carbon.ndatasource.rdbms.RDBMSDataSourceReader</provider>
    </providers>

    <datasources>
    <!-- MySQL -->
          <datasource>
            <name>WSO2_METRICS_DB</name>
            <jndiConfig>
                <name>jdbc/WSO2MetricsDB</name>
            </jndiConfig>
            <definition type="RDBMS">
                 <configuration>
                   <url>jdbc:mysql://10.100.7.53:3306/metrics</url>
                    <username>root</username>
                    <password>root</password>
                    <driverClassName>com.mysql.jdbc.Driver</driverClassName>
                    <maxActive>60</maxActive>
                    <maxWait>60000</maxWait>
                    <testOnBorrow>true</testOnBorrow>
                    <validationQuery>SELECT 1</validationQuery>
                    <validationInterval>30000</validationInterval>
                    <defaultAutoCommit>false</defaultAutoCommit>
                </configuration>
            </definition>
        </datasource>
    </datasources>
</datasources-configuration>
  • Open analytics-datasources.xml and add the following configuration.

DAS_HOME/repository/conf/datasources/analytics-datasources.xml  
Uncomment the HDFSDataSourceReader and HBaseDataSourceReader providers as shown below.


  <providers>
     <provider>org.wso2.carbon.ndatasource.rdbms.RDBMSDataSourceReader</provider>
     <provider>org.wso2.carbon.datasource.reader.hadoop.HDFSDataSourceReader</provider>
     <provider>org.wso2.carbon.datasource.reader.hadoop.HBaseDataSourceReader</provider>
     <!--<provider>org.wso2.carbon.datasource.reader.cassandra.CassandraDataSourceReader</provider>-->
    </providers>
By specifying the datasource configuration as follows, you can configure the HBase datasource (i.e. set up the connection to the remote HBase instance). Please comment out the RDBMS-specific configuration for WSO2_ANALYTICS_RS_DB_HBASE.


        <datasource>
            <name>WSO2_ANALYTICS_RS_DB_HBASE</name>
            <description>The datasource used for the analytics record store</description>
            <jndiConfig>
                <name>jdbc/WSO2HBaseDB</name>
            </jndiConfig>
            <definition type="HBASE">
                <configuration>
                    <property>
                        <name>hbase.master</name>
                        <value>das300-hdfs-master:60000</value>
                    </property>
                     <property>
                        <name>hbase.zookeeper.quorum</name>
                        <value>das300-hdfs-master,das300-hdfs-slave1,das300-hdfs-slave2</value>
                    </property>
                    <property>
                        <name>fs.hdfs.impl</name>
                        <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
                    </property>
                    <property>
                        <name>fs.file.impl</name>
                        <value>org.apache.hadoop.fs.LocalFileSystem</value>
                    </property>
                </configuration>
            </definition>
        </datasource>
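Before starting DAS, you can check that the HBase master and ZooKeeper quorum configured above are actually reachable. A minimal sketch, assuming the HBase client is installed on the machine you run it from:

echo "status" | hbase shell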
By specifying the datasource configuration as follows, you can configure the HDFS datasource (i.e. set up the connection to the remote HDFS instance). Please comment out the RDBMS-specific configuration for WSO2_ANALYTICS_FS_DB_HDFS.


    <datasource>
            <name>WSO2_ANALYTICS_FS_DB_HDFS</name>
            <description>The datasource used for analytics file system</description>
            <jndiConfig>
                <name>jdbc/WSO2HDFSDB</name>
            </jndiConfig>
            <definition type="HDFS">
                <configuration>
                    <property>
                        <name>fs.default.name</name>
                        <value>hdfs://das300-hdfs-master:9000</value>
                    </property>
                    <property>
                        <name>dfs.data.dir</name>
                        <value>/dfs/data</value>
                    </property>
                    <property>
                        <name>fs.hdfs.impl</name>
                        <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
                    </property>
                    <property>
                        <name>fs.file.impl</name>
                        <value>org.apache.hadoop.fs.LocalFileSystem</value>
                    </property>
                </configuration>
            </definition>
        </datasource>
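Similarly, you can confirm that the HDFS NameNode answers on the URL configured in fs.default.name. A minimal sketch, assuming the Hadoop client is available:

hdfs dfs -ls hdfs://das300-hdfs-master:9000/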
By specifying the datasource configuration as follows, you can configure an RDBMS datasource for the analytics processed data store.


        <datasource>
            <name>WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB</name>
            <description>The datasource used for the analytics processed data store</description>
            <definition type="RDBMS">
                <configuration>
                    <url>jdbc:mysql://10.100.7.53:3306/analytics_processed_data_store</url>
                    <username>root</username>
                    <password>root</password>
                    <driverClassName>com.mysql.jdbc.Driver</driverClassName>
                    <maxActive>50</maxActive>
                    <maxWait>60000</maxWait>
                    <testOnBorrow>true</testOnBorrow>
                    <validationQuery>SELECT 1</validationQuery>
                    <validationInterval>30000</validationInterval>
                    <defaultAutoCommit>false</defaultAutoCommit>
                </configuration>
            </definition>
        </datasource>
  • On each node, map the HBase/HDFS host names and IPs in the /etc/hosts file.

192.168.48.167 das300-hdfs-master
192.168.48.172 das300-hdfs-slave2
192.168.48.168 das300-hdfs-slave1
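A minimal sketch of adding these mappings on a Linux DAS node (requires root; adjust the IPs and host names to match your HBase/HDFS cluster):

sudo tee -a /etc/hosts <<'EOF'
192.168.48.167 das300-hdfs-master
192.168.48.168 das300-hdfs-slave1
192.168.48.172 das300-hdfs-slave2
EOF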
Other configurations
1. Open the carbon.xml file and make the following changes.
 <DAS_HOME>/repository/conf/carbon.xml
  • Add the necessary host names

<HostName>das.qa.wso2.receiver1</HostName>
<MgtHostName>mgt.das.qa.wso2.receiver</MgtHostName>
  • The following changes must be made to set up deployment synchronization. Note that the AutoCommit option is set to true on only one receiver node; it should be false on all other nodes.

    <DeploymentSynchronizer>
        <Enabled>true</Enabled>
        <AutoCommit>false</AutoCommit>
        <AutoCheckout>true</AutoCheckout>
        <RepositoryType>svn</RepositoryType>
        <SvnUrl>http://xxx.xx.x/svn/das300rc_repo</SvnUrl>
        <SvnUser>xxx</SvnUser>
        <SvnPassword>xxx</SvnPassword>
        <SvnUrlAppendTenantId>true</SvnUrlAppendTenantId>
    </DeploymentSynchronizer>
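Each node must be able to reach the SVN repository. A quick, optional check from a node (using the placeholder URL and user from the configuration above):

svn info http://xxx.xx.x/svn/das300rc_repo --username xxx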
2. Open the axis2.xml file and make the following changes.
DAS_HOME/repository/conf/axis2/axis2.xml
  • Enable Hazelcast clustering

<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent" enable="true">
  • Change the membershipScheme to wka

<parameter name="membershipScheme">wka</parameter>
  • Change the domain; all cluster nodes should join the same domain.

<parameter name="domain">wso2.qa.das.domain</parameter>
  • Change the local member host by adding the IP of each node with its port.

<parameter name="localMemberHost">192.168.48.205</parameter>
<parameter name="localMemberPort">4000</parameter>
  • Add all the other well-known members with their ports (the IPs and ports of the other 6 nodes).

        <members>
            <member>
                <hostName>192.168.48.21</hostName>
                <port>4000</port>
            </member>
            <member>
                <hostName>192.168.48.22</hostName>
                <port>4000</port>
            </member>
            <member>
                <hostName>192.168.48.23</hostName>
                <port>4000</port>
            </member>
            <member>
                <hostName>192.168.48.24</hostName>
                <port>4000</port>
            </member>
            <member>
                <hostName>192.168.48.25</hostName>
                <port>4000</port>
            </member>
        </members>
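Hazelcast uses these well-known members to form the cluster, so each node should be able to reach the others on their local member ports. A minimal sketch using netcat (adjust the IP list to your nodes):

for host in 192.168.48.21 192.168.48.22 192.168.48.23 192.168.48.24 192.168.48.25; do
  nc -z -w 3 "$host" 4000 && echo "$host:4000 reachable" || echo "$host:4000 NOT reachable"
done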
3. Open the registry.xml file and make the following changes for registry mounting.
 <DAS_HOME>/repository/conf/registry.xml   


<wso2registry>
 <currentDBConfig>wso2registry</currentDBConfig>
    <readOnly>false</readOnly>
    <enableCache>true</enableCache>
    <registryRoot>/</registryRoot>

    <dbConfig name="wso2registry">
        <dataSource>jdbc/WSO2CarbonDB</dataSource>
    </dbConfig>

    <dbConfig name="govregistry">
       <dataSource>jdbc/WSO2_DAS_REG</dataSource>
    </dbConfig>
  <remoteInstance url="https://localhost">
    <id>gov</id>
    <cacheId>root@jdbc:mysql://10.100.7.53:3306/regdb</cacheId>
    <dbConfig>govregistry</dbConfig>
    <readOnly>false</readOnly>
    <enableCache>true</enableCache>
    <registryRoot>/</registryRoot>
 </remoteInstance>

  <mount path="/_system/governance" overwrite="true">
   <instanceId>gov</instanceId>
   <targetPath>/_system/governance</targetPath>
  </mount>
 <mount path="/_system/config" overwrite="true">
   <instanceId>gov</instanceId>
   <targetPath>/_system/config</targetPath>
 </mount>
4. Open the user-mgt.xml file and change the dataSource property of the realm configuration as follows.
 <DAS_HOME>/repository/conf/user-mgt.xml


<Property name="dataSource">jdbc/WSO2_DAS_UM</Property>
Analyzer node related configurations (for Apache Spark)
Here we are using 2 analyzer nodes. Since this is a highly available setup, we need 2 Spark master nodes. For that, set the master count to 2 in the spark-defaults.conf file of both analyzer nodes.

<DAS_home>/repository/conf/analytics/spark/spark-defaults.conf
(Basically, when one node goes down, the other node automatically takes over as the Spark master; this covers the failover situation.)

carbon.spark.master.count  2
In a clustered DAS deployment you need to create a symbolic link on each DAS node, because the directory path for the Spark classpath differs from node to node depending on the location of <DAS_HOME>. The symbolic link redirects the Spark driver application to the relevant directory on each node when it creates the Spark classpath. Therefore, add the symbolic link property to spark-defaults.conf on both analyzer nodes.

carbon.das.symbolic.link /home/das_symlink
Note: on Linux you can create the symbolic link using the following command.

ln -s /path/to/file /path/to/symlink
e.g. on analyzer node 1:
sudo ln -s /home/analyzer1/analyzer  /home/das_symlink
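On the second analyzer node the same link name should point at that node's own DAS directory (the path below is hypothetical):

sudo ln -s /home/analyzer2/analyzer  /home/das_symlink
ls -l /home/das_symlink   # verify the link resolves to the local DAS directory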
Starting the cluster
When starting the instances, you can provide the predefined profiles to start them as receiver nodes, analyzer nodes, or indexer nodes.
./wso2server.sh -receiverNode start
./wso2server.sh -analyzerNode start
./wso2server.sh -indexerNode start
When starting the dashboard node, you need to add some startup parameters to the wso2server.sh file.

<DAS_home>/bin/wso2server.sh
Add the following -D parameters (the disable flags towards the end of the command below) and start the server.

$JAVACMD \
-Xbootclasspath/a:"$CARBON_XBOOTCLASSPATH" \
-Xms256m -Xmx1024m -XX:MaxPermSize=256m \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath="$CARBON_HOME/repository/logs/heap-dump.hprof" \
$JAVA_OPTS \
-Dcom.sun.management.jmxremote \
-classpath "$CARBON_CLASSPATH" \
-Djava.endorsed.dirs="$JAVA_ENDORSED_DIRS" \
-Djava.io.tmpdir="$CARBON_HOME/tmp" \
-Dcatalina.base="$CARBON_HOME/lib/tomcat" \
-Dwso2.server.standalone=true \
-Dcarbon.registry.root=/ \
-Djava.command="$JAVACMD" \
-Dcarbon.home="$CARBON_HOME" \
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager \
-Dcarbon.config.dir.path="$CARBON_HOME/repository/conf" \
-Djava.util.logging.config.file="$CARBON_HOME/repository/conf/etc/logging-bridge.properties" \
-Dcomponents.repo="$CARBON_HOME/repository/components/plugins" \
-Dconf.location="$CARBON_HOME/repository/conf" \
-Dcom.atomikos.icatch.file="$CARBON_HOME/lib/transactions.properties" \
-Dcom.atomikos.icatch.hide_init_file_path=true \
-Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false \
-Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true \
-Dcom.sun.jndi.ldap.connect.pool.authentication=simple \
-Dcom.sun.jndi.ldap.connect.pool.timeout=3000 \
-Dorg.terracotta.quartz.skipUpdateCheck=true \
-Djava.security.egd=file:/dev/./urandom \
-Dfile.encoding=UTF8 \
-Djava.net.preferIPv4Stack=true \
-Dcom.ibm.cacheLocalHost=true \
-DdisableAnalyticsStats=true \
-DdisableEventSink=true \
-DdisableIndexThrottling=true \
-DenableAnalyticsStats=true \
-DdisableAnalyticsEngine=true \
-DdisableAnalyticsExecution=true \
-DdisableIndexing=true \
-DdisableDataPurging=true \
-DdisableAnalyticsSparkCtx=true \
org.wso2.carbon.bootstrap.Bootstrap $*
status=$?
done
If you have not already created the necessary tables in the DBs, you can start the servers with the -Dsetup option. Eg:

./wso2server.sh -receiverNode -Dsetup start

Start the servers in the following sequence.
1. Receiver nodes
2. Analyzer nodes
3. Indexer nodes
4. Dashboard node

If the cluster is set up successfully, "Member joined" messages can be seen in the carbon log:

TID: [-1] [] [2015-10-21 12:07:50,815]  INFO {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme} -  Member joined [6974cb1c-8403-4711-9408-9de0cfaadda2]: /192.168.48.25:4000 {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme}
TID: [-1] [] [2015-10-21 12:08:05,043]  INFO {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme} -  Member joined [3ebbg27b-91db-4d98-8c8a-95e2604e3a9c]: /192.168.48.25:4000
