Hello,
This is a VM installation of Hopsworks version 1.2.
The following services are DEAD and I can't start them successfully. Could you help with this issue?
Thank you
2020-08-26 23:25:52 INFO [agent/setupLogging] Hops-Kagent started.
2020-08-26 23:25:52 INFO [agent/setupLogging] Heartbeat URL: https://10.0.2.15:8181/hopsworks-api/api/agentresource?action=heartbeat
2020-08-26 23:25:52 INFO [agent/setupLogging] Host Id: hopsworks0.logicalclocks.com
2020-08-26 23:25:52 INFO [agent/setupLogging] Hostname: hopsworks0.logicalclocks.com
2020-08-26 23:25:52 INFO [agent/setupLogging] Public IP: 10.0.2.15
2020-08-26 23:25:52 INFO [agent/setupLogging] Private IP: 10.0.2.15
2020-08-26 23:25:52 INFO [service/alive] Service ndb_mgmd is alive
2020-08-26 23:25:52 INFO [service/alive] Service prometheus is alive
2020-08-26 23:25:52 INFO [service/alive] Service alertmanager is alive
2020-08-26 23:25:52 INFO [service/alive] Service node_exporter is alive
2020-08-26 23:25:52 ERROR [service/alive] Service ndbmtd is DEAD.
2020-08-26 23:25:52 ERROR [service/alive] Service mysqld is DEAD.
2020-08-26 23:25:52 INFO [service/alive] Service mysqld_exporter is alive
2020-08-26 23:25:52 ERROR [service/alive] Service airflow-webserver is DEAD.
2020-08-26 23:25:52 ERROR [service/alive] Service airflow-scheduler is DEAD.
2020-08-26 23:25:52 INFO [service/alive] Service glassfish-domain1 is alive
2020-08-26 23:25:52 INFO [service/alive] Service kagent is alive
2020-08-26 23:25:52 ERROR [service/alive] Service namenode is DEAD.
2020-08-26 23:25:52 INFO [service/alive] Service sqoop is alive
2020-08-26 23:25:52 INFO [service/alive] Service zookeeper is alive
2020-08-26 23:25:52 INFO [service/alive] Service influxdb is alive
2020-08-26 23:25:52 INFO [service/alive] Service grafana is alive
2020-08-26 23:25:52 INFO [service/alive] Service elasticsearch is alive
2020-08-26 23:25:52 INFO [service/alive] Service elastic_exporter is alive
2020-08-26 23:25:52 INFO [service/alive] Service datanode is alive
2020-08-26 23:25:52 ERROR [service/alive] Service kafka is DEAD.
2020-08-26 23:25:52 ERROR [service/alive] Service epipe is DEAD.
2020-08-26 23:25:52 INFO [service/alive] Service historyserver is alive
2020-08-26 23:25:52 ERROR [service/alive] Service resourcemanager is DEAD.
2020-08-26 23:25:52 INFO [service/alive] Service logstash is alive
2020-08-26 23:25:52 INFO [service/alive] Service kibana is alive
2020-08-26 23:25:52 ERROR [service/alive] Service hivemetastore is DEAD.
2020-08-26 23:25:52 INFO [service/alive] Service hiveserver2 is alive
2020-08-26 23:25:52 INFO [service/alive] Service livy is alive
2020-08-26 23:25:52 INFO [service/alive] Service flinkhistoryserver is alive
2020-08-26 23:25:52 ERROR [service/alive] Service nodemanager is DEAD.
2020-08-26 23:25:52 ERROR [service/alive] Service sparkhistoryserver is DEAD.
2020-08-26 23:25:52 INFO [service/alive] Service filebeat-beamjobservercluster is alive
2020-08-26 23:25:52 INFO [service/alive] Service filebeat-beamjobserverlocal is alive
2020-08-26 23:25:52 INFO [service/alive] Service filebeat-beamsdkworker is alive
2020-08-26 23:25:52 INFO [service/alive] Service filebeat-spark is alive
2020-08-26 23:25:52 INFO [service/alive] Service filebeat-kagent is alive
2020-08-26 23:25:52 INFO [service/alive] Service filebeat-tf-serving is alive
2020-08-26 23:25:52 INFO [service/alive] Service filebeat-sklearn-serving is alive
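To make the list above easier to scan, here is a minimal sketch that pulls the DEAD service names out of kagent output. It assumes the line format shown in this log; the helper name `dead_services` and the `sample` text are made up for illustration and are not part of kagent.

```python
import re

# Matches kagent lines such as:
#   2020-08-26 23:25:52 ERROR [service/alive] Service mysqld is DEAD.
# The pattern is an assumption based only on the log lines pasted above.
DEAD_LINE = re.compile(r"ERROR \[service/alive\] Service (\S+) is DEAD")


def dead_services(log_text):
    """Return names of services reported DEAD, in order of first appearance."""
    seen = []
    for line in log_text.splitlines():
        match = DEAD_LINE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Three lines copied from the kagent log above.
sample = """\
2020-08-26 23:25:52 INFO [service/alive] Service ndb_mgmd is alive
2020-08-26 23:25:52 ERROR [service/alive] Service ndbmtd is DEAD.
2020-08-26 23:25:52 ERROR [service/alive] Service mysqld is DEAD.
"""

print(dead_services(sample))  # prints ['ndbmtd', 'mysqld']
```

Run against the full log above, this would list ndbmtd first. That may matter because mysqld and several of the other DEAD services probably depend on the NDB data node being up.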
The ndbd log shows the following error:
2020-08-26 22:57:12 [ndbd] INFO – Not initial start
2020-08-26 22:57:12 [ndbd] INFO – Local sysfile: Node restorable on its own, gci: 0, version: 70603
2020-08-26 22:57:12 [ndbd] INFO – Start phase 0 completed
2020-08-26 22:57:12 [ndbd] INFO – Phase 0 has made some file system initialisations
2020-08-26 22:57:12 [ndbd] WARNING – Failed to memlock pages, error: 12 (Cannot allocate memory)
2020-08-26 22:57:12 [ndbd] INFO – Watchdog KillSwitch off.
2020-08-26 22:57:12 [ndbd] INFO – Starting QMGR phase 1
2020-08-26 22:57:12 [ndbd] INFO – Starting with m_restart_seq set to 33
2020-08-26 22:57:12 [ndbd] INFO – DIH reported normal start, now starting the Node Inclusion Protocol
2020-08-26 22:57:12 [ndbd] INFO – Include node protocol completed, phase 1 in QMGR completed
2020-08-26 22:57:12 [ndbd] INFO – Start phase 1 completed
2020-08-26 22:57:12 [ndbd] INFO – Phase 1 initialised some variables and included node in cluster, locked memory if configured to do so
2020-08-26 22:57:12 [ndbd] INFO – Starting with m_restart_seq set to 33
2020-08-26 22:57:12 [ndbd] INFO – Asking master node to accept our start (we are master, GCI = 4209500)
2020-08-26 22:57:12 [ndbd] INFO – System Restart: master node: 1, num starting: 1, gci: 4209500
2020-08-26 22:57:12 [ndbd] INFO – CNTR_START_CONF: started: 0000000000000000
2020-08-26 22:57:12 [ndbd] INFO – CNTR_START_CONF: starting: 0000000000000002
2020-08-26 22:57:12 [ndbd] INFO – NDBCNTR master accepted us into cluster, start NDB start phase 1
2020-08-26 22:57:12 [ndbd] INFO – We are performing a restart of the cluster, restoring GCI = 4209500
2020-08-26 22:57:12 [ndbd] INFO – LDM(1): Started LDM restart phase 1 (read REDO log page headers to init REDO log data)
2020-08-26 22:57:12 [ndbd] INFO – Schema file initialisation Starting
2020-08-26 22:57:12 [ndbd] INFO – Schema file initialisation Completed
2020-08-26 22:57:12 [ndbd] INFO – NDB start phase 1 completed
2020-08-26 22:57:12 [ndbd] INFO – Start phase 2 completed
2020-08-26 22:57:12 [ndbd] INFO – Phase 2 did more initialisations, master accepted our start, we started REDO log initialisations
2020-08-26 22:57:12 [ndbd] INFO – Grant nodes to start phase: 3, nodes: 0000000000000002
2020-08-26 22:57:12 [ndbd] INFO – Start NDB start phase 2
2020-08-26 22:57:12 [ndbd] INFO – NDB start phase 2 completed
2020-08-26 22:57:12 [ndbd] INFO – Start phase 3 completed
2020-08-26 22:57:12 [ndbd] INFO – Phase 3 performed local connection setups
2020-08-26 22:57:12 [ndbd] INFO – Grant nodes to start phase: 4, nodes: 0000000000000002
2020-08-26 22:57:12 [ndbd] INFO – Start NDB start phase 3
2020-08-26 22:57:12 [ndbd] INFO – NDB start phase 3 completed
2020-08-26 22:57:12 [ndbd] INFO – Restart recreating table with id = 71
2020-08-26 22:57:13 [ndbd] INFO – Restart recreating table with id = 73
2020-08-26 22:57:13 [ndbd] INFO – LDM(1):Ready to start execute REDO log phase, prepare REDO log phase completed
2020-08-26 22:57:13 [ndbd] INFO – Restart recreating table with id = 72
2020-08-26 22:57:13 [ndbd] INFO – Restart recreating table with id = 74
2020-08-26 22:57:13 [ndbd] INFO – Restart recreating table with id = 1125
error: [ code: 1509 line: 24299 node: 1 count: 1 status: 0 key: 0 name: ‘’ ]
2020-08-26 22:57:13 [ndbd] INFO – Failed to restore schema during restart, error 1509.
2020-08-26 22:57:13 [ndbd] INFO – DBDICT (Line: 4824) 0x00000002
2020-08-26 22:57:13 [ndbd] INFO – Error handler shutting down system
2020-08-26 22:57:13 [ndbd] ALERT – Node 1: Forced node shutdown completed. Occured during startphase 4. Caused by error 2355: ‘Failure to restore schema(Resource configuration error). Permanent error, external action needed’.