Error code: 1 message: Configuration error: Error: Could not alloc node id at XXX Connection done from wrong host ip

Caused by com.mysql.clusterj.ClusterJDatastoreException: Datastore exception on connectString '10.229.84.24' nodeId 0; Return code: -1 error code: 1 message: Configuration error: Error: Could not alloc node id at 10.229.84.24 port 1186: Connection done from wrong host ip 10.229.84.24…
2023-12-29 10:14:40,259 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
io.hops.exception.StorageInitializtionException: Error getting connection to cluster
at io.hops.metadata.ndb.NdbStorageFactory.setConfiguration(NdbStorageFactory.java:73)
at io.hops.metadata.HdfsStorageFactory.setConfiguration(HdfsStorageFactory.java:123)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:565)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:820)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:808)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1219)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1245)
2023-12-29 10:14:40,261 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: io.hops.exception.StorageInitializtionException: Error getting connection to cluster
2023-12-29 10:14:40,266 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

PS: My Linux system information is as below:
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core

My package versions are as below:
hops-metadata-dal-3.2.0.4
hops-metadata-dal-impl-ndb-3.2.0.4
mysql-cluster-8.2.0-linux-glibc2.17-x86_64

My Hops install version is master (https://github.com/hopshadoop/hops).

My config.ini file on the management node:
/var/lib/mysql-cluster/config.ini

[ndbd default]
# Options affecting ndbd processes on all data nodes:
NoOfReplicas=2    # Number of fragment replicas
DataMemory=80M    # How much memory to allocate for data storage
IndexMemory=18M   # How much memory to allocate for index storage
                  # For DataMemory and IndexMemory, we have used the
                  # default values. Since the "world" database takes up
                  # only about 500KB, this should be more than enough for
                  # this example NDB Cluster setup.
                  # NOTE: IndexMemory is deprecated in NDB 7.6 and later; in
                  # these versions, resources for all data and indexes are
                  # allocated by DataMemory and any that are set for IndexMemory
                  # are added to the DataMemory resource pool
#ServerPort=2202  # This is the default value; however, you can use any
                  # port that is free for all the hosts in the cluster
                  # Note1: It is recommended that you do not specify the port
                  # number at all and simply allow the default value to be used
                  # instead
                  # Note2: The port was formerly specified using the PortNumber
                  # TCP parameter; this parameter is no longer available in NDB
                  # Cluster 7.5.

[ndb_mgmd]
# Management process options:
HostName=10.229.84.24            # Hostname or IP address of management node
DataDir=/var/lib/mysql-cluster   # Directory for management node log files

[ndbd]
# Options for data node "A":
# (one [ndbd] section per data node)
HostName=10.229.84.25            # Hostname or IP address
NodeId=2                         # Node ID for this data node
DataDir=/usr/local/mysql/data    # Directory for this data node's data files

[ndbd]
# Options for data node "B":
HostName=10.229.84.26            # Hostname or IP address
NodeId=3                         # Node ID for this data node
DataDir=/usr/local/mysql/data    # Directory for this data node's data files

[mysqld]
# SQL node options:
HostName=10.229.84.22            # Hostname or IP address
                                 # (additional mysqld connections can be
                                 # specified for this node for various
                                 # purposes such as running ndb_restore)

[mysqld]
# SQL node options:
HostName=10.229.84.23            # Hostname or IP address
                                 # (additional mysqld connections can be
                                 # specified for this node for various
                                 # purposes such as running ndb_restore)

[Computer]
HostName=10.229.84.24
Id=wt

My hops-ndb-config.properties:
com.mysql.clusterj.connectstring=10.229.84.24
com.mysql.clusterj.database=hops_db
com.mysql.clusterj.connection.pool.size=1
com.mysql.clusterj.max.transactions=1024

#determines the number of seconds to wait until the first "live" node is detected.
#If this amount of time is exceeded with no live nodes detected,
#then the method immediately returns a negative value. Default=30
com.mysql.clusterj.connect.timeout.before=30

#determines the number of seconds to wait after the first "live" node is
#detected for all nodes to become active. If this amount of time is exceeded
#without all nodes becoming active, then the method immediately returns a
#value greater than zero. Default=20
com.mysql.clusterj.connect.timeout.after=20

#The number of seconds to wait for all sessions to be closed when reconnecting a SessionFactory
#due to network failures. The default, 0, indicates that the automatic reconnection to the cluster
#due to network failures is disabled. Reconnection can be enabled by using the method
#SessionFactory.reconnect(int timeout) and specifying a new timeout value.
com.mysql.clusterj.connection.reconnect.timeout=5

#clusterj caching
#set io.hops.enable.clusterj.dto.cache and io.hops.enable.clusterj.session.cache to use dto and session caching provided by clusterj
io.hops.enable.clusterj.dto.cache=false
io.hops.enable.clusterj.session.cache=false

com.mysql.clusterj.max.cached.instances=0
com.mysql.clusterj.max.cached.sessions=0
com.mysql.clusterj.warmup.cached.sessions=0

io.hops.metadata.ndb.mysqlserver.data_source_class_name = com.mysql.cj.jdbc.MysqlDataSource
io.hops.metadata.ndb.mysqlserver.host=10.229.84.22
io.hops.metadata.ndb.mysqlserver.port=3306
io.hops.metadata.ndb.mysqlserver.username= username
io.hops.metadata.ndb.mysqlserver.password= password
io.hops.metadata.ndb.mysqlserver.connection_pool_size=1
io.hops.metadata.ndb.mysqlserver.useSSL=false

#size of the session pool. Should be at least as big as the number of active RPC-handling threads in the system.
io.hops.session.pool.size=1000

#A session is reused Random.getNextInt(0, io.hops.session.reuse.count) times and then it is GCed.
#Use smaller values if using Java 6.
#If you use Java 7 or higher then use G1GC and there is no need to close sessions; use Int.MAX_VALUE.
io.hops.session.reuse.count=2147483647
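
(A quick sanity check of the SQL-node settings above, reusing the host, port, user, and database from this file; substitute the real username. This only verifies that hops_db is reachable with these credentials:)

mysql -h 10.229.84.22 -P 3306 -u username -p hops_db -e "SHOW TABLES;"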

We just followed the instructions manually from https://github.com/hopshadoop/hops.
We have been blocked here for more than a week now; could you give us any suggestions?

Hello,

Looking at the config.ini you posted, it seems you are missing API slots ([api] sections) to be used by the NameNode.
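
For example (a minimal sketch; how many slots you add is up to you, and leaving HostName unset lets any host claim a slot), you could append sections like these to config.ini:

[api]
# Empty API slot usable by the HopsFS NameNode's ClusterJ connection;
# add HostName=<namenode-ip> to pin it to a specific host.

[api]
# Spare slot for additional API connections (e.g. ndb_restore).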

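After changing config.ini, stop the running ndb_mgmd and restart it so the new slots take effect, then verify them; something like this, run on the management node with the paths from your setup:

ndb_mgmd -f /var/lib/mysql-cluster/config.ini --reload   # re-read config.ini rather than the cached binary config
ndb_mgm -e show                                          # free [api] slots appear as "not connected"
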
Regards,
Mahmoud

Hi, thank you for your helpful suggestion, it's OK now!! :grinning: :heart:
However, we are currently facing another issue as below; could you give us any suggestions?

Package versions:
clusterj-hops-fix-7.6.10.jar
hadoop-common-3.2.0.4.jar
hadoop-common-3.2.0.4-tests.jar
hadoop-kms-3.2.0.4.jar
hadoop-nfs-3.2.0.4.jar
hops-metadata-dal-3.2.0.4.jar
hops-metadata-dal-impl-ndb-3.2.0.4.jar
mysql-connector-java-8.0.11.jar

Test steps:
We installed NDB Cluster on node23 (NDB SQL server node), node24 (NDB management node), and node25 (NDB data node); all of them are started.
Then we installed the DAL and Hadoop on node23.
Then we configured hops-ndb-config.properties.
And last we ran 'start-dfs.sh' to start HopsFS, roughly as sketched below.
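
(For reference, the bring-up we describe is roughly the following; a sketch assuming a standard Hadoop tarball layout, where the format step, which initializes the HopsFS metadata schema in NDB, runs only once:)

$HADOOP_HOME/bin/hdfs namenode -format   # one-time: initialize HopsFS metadata in hops_db
$HADOOP_HOME/sbin/start-dfs.sh           # start the NameNode(s) and DataNode(s)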

But as we don't have any doc to follow, we have no idea how to continue now:

//ERROR LOG:
2024-01-05 16:04:25,825 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting
2024-01-05 16:04:25,827 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Leader Node RPC up at: 192.168.1.23/192.168.1.23:9000
2024-01-05 16:04:25,934 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000, call Call#1476 Retry#0 org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.versionRequest from 192.168.1.23:41326
java.io.IOException: Leader Node still not started
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1372)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.versionRequest(NameNodeRpcServer.java:1106)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.versionRequest(DatanodeProtocolServerSideTranslatorPB.java:277)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:36000)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:868)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:814)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1821)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2900)
2024-01-05 16:04:26,018 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
2024-01-05 16:04:26,018 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest edits from old active before taking over writer role in edits logs
2024-01-05 16:04:26,018 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all datandoes as stale
2024-01-05 16:04:26,018 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication and invalidation queues
2024-01-05 16:04:26,018 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: initializing replication queues
2024-01-05 16:04:26,026 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
2024-01-05 16:04:26,064 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: processMisReplicated read 0/50000 in the Ids range [0 - 50000] (max inodeId when the process started: 1)

Hoping for your kind suggestions!