I use the installer.sh method every time. Here is the latest attempt; it fails at the same step again.
Uninstall
[fmarines@per320-2 cluster]$ ./hopsworks-installer.sh
Karamel/Hopsworks Installer, Copyright© 2020 Logical Clocks AB. All rights reserved.
This program can install Karamel/Chef and/or Hopsworks.
To cancel installation at any time, press CONTROL-C
You appear to have following setup on this host:
- available memory: 46
- available disk space (on '/' root partition): 18G
- available disk space (under '/mnt' partition):
- available CPUs: 20
- available GPUS: 4
- your ip is: 192.168.0.230
- installation user: fmarines
- linux distro: centos
- cluster defn branch: https://raw.githubusercontent.com/logicalclocks/karamel-chef/1.3
- hopsworks-chef branch: logicalclocks/hopsworks-chef/1.3
WARNING: We recommend at least 60GB of disk space on the root partition. Minimum is 50GB of available disk.
You have 18G space on '/', and no space on '/mnt'.
./hopsworks-installer.sh: line 213: -1: substring expression < 0
-------------------- Installation Options --------------------
What would you like to do?
(1) Install a single-host Hopsworks cluster.
(2) Install a single-host Hopsworks cluster with TLS enabled.
(3) Install a multi-host Hopsworks cluster with TLS enabled.
(4) Install an Enterprise Hopsworks cluster.
(5) Install an Enterprise Hopsworks cluster with Kubernetes
(6) Install and start Karamel.
(7) Install Nvidia drivers and reboot server.
(8) Purge (uninstall) Hopsworks from this host.
(9) Purge (uninstall) Hopsworks from ALL hosts.
Please enter your choice 1, 2, 3, 4, 5, 6, 7, 8, 9, q (quit), or h (help) : 8
Press ENTER to continue
Shutting down services…
2020-08-13 13:30:23 INFO [agent/setupLogging] Hops-Kagent started.
2020-08-13 13:30:23 INFO [agent/setupLogging] Heartbeat URL: https://hopsworks.glassfish.service.consul:443/hopsworks-api/api/agentresource?action=heartbeat
2020-08-13 13:30:23 INFO [agent/setupLogging] Host Id: PER320-2
2020-08-13 13:30:23 INFO [agent/setupLogging] Hostname: PER320-2
2020-08-13 13:30:23 INFO [agent/setupLogging] Public IP: 192.168.0.230
2020-08-13 13:30:23 INFO [agent/setupLogging] Private IP: 192.168.0.230
2020-08-13 13:30:24 INFO [service/stop] Stopped service: namenode
2020-08-13 13:30:24 INFO [service/stop] Stopped service: sqoop
2020-08-13 13:30:24 INFO [service/stop] Stopped service: elastic_exporter
2020-08-13 13:30:25 INFO [service/stop] Stopped service: elasticsearch
2020-08-13 13:30:25 INFO [service/stop] Stopped service: grafana
2020-08-13 13:30:25 INFO [service/stop] Stopped service: influxdb
2020-08-13 13:30:25 INFO [service/stop] Stopped service: consul
2020-08-13 13:30:25 INFO [service/stop] Stopped service: kagent
2020-08-13 13:30:31 INFO [service/stop] Stopped service: glassfish-domain1
2020-08-13 13:30:32 INFO [service/stop] Stopped service: airflow-scheduler
2020-08-13 13:32:02 INFO [service/stop] Stopped service: airflow-webserver
2020-08-13 13:32:02 INFO [service/stop] Stopped service: mysqld_exporter
2020-08-13 13:32:07 INFO [service/stop] Stopped service: mysqld
2020-08-13 13:32:08 INFO [service/stop] Stopped service: ndbmtd
2020-08-13 13:32:08 INFO [service/stop] Stopped service: nvml_monitor
2020-08-13 13:32:08 INFO [service/stop] Stopped service: node_exporter
2020-08-13 13:32:08 INFO [service/stop] Stopped service: prometheus
2020-08-13 13:32:08 INFO [service/stop] Stopped service: alertmanager
2020-08-13 13:32:09 INFO [service/stop] Stopped service: ndb_mgmd
Killing karamel…
Removing karamel…
Removing cookbooks…
Purging old installation…
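The installer banner above warns that the root partition has 18G free against a stated minimum of 50GB. A minimal sketch of the same comparison, which could be run before the next attempt; the helper names `size_to_gb` and `has_enough_space` are made up for illustration and are not part of hopsworks-installer.sh:

```shell
# Minimum free space, taken from the installer's own warning text.
MIN_GB=50

# Turn a df-style size such as "18G" into a bare number; empty input counts as 0.
size_to_gb() {
    set -- "${1%G}"
    printf '%s\n' "${1:-0}"
}

# Succeeds only if the given size is at least MIN_GB.
has_enough_space() {
    [ "$(size_to_gb "$1")" -ge "$MIN_GB" ]
}

# Example use on a GNU/Linux host (not run here):
#   has_enough_space "$(df -BG --output=avail / | tail -n 1 | tr -d ' ')" \
#       || echo "less than ${MIN_GB}G free on /"
```

Treating an empty value as 0 means a missing partition (like '/mnt' here) is reported as too small rather than causing an error.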
[fmarines@per320-2 cluster]$ systemctl |grep failed
● airflow-webserver.service loaded failed failed Airflow webserver daemon
● consul.service loaded failed failed "HashiCorp Consul - A service mesh solution"
● elasticsearch.service loaded failed failed Elasticsearch daemon.
● flinkhistoryserver.service loaded failed failed Flink historyserver
● namenode.service loaded failed failed NameNode server for HDFS.
● sqoop.service loaded failed failed Sqoop server
[fmarines@per320-2 cluster]$ sudo systemctl disable airflow-webserver.service
Removed symlink /etc/systemd/system/multi-user.target.wants/airflow-webserver.service.
[fmarines@per320-2 cluster]$ sudo systemctl disable consul.service
Removed symlink /etc/systemd/system/multi-user.target.wants/consul.service.
[fmarines@per320-2 cluster]$ sudo systemctl disable sqoop.service
Removed symlink /etc/systemd/system/multi-user.target.wants/sqoop.service.
[fmarines@per320-2 cluster]$ sudo systemctl disable namenode.service
Removed symlink /etc/systemd/system/multi-user.target.wants/namenode.service.
[fmarines@per320-2 cluster]$ sudo systemctl disable flinkhistoryserver.service
Removed symlink /etc/systemd/system/multi-user.target.wants/flinkhistoryserver.service.
[fmarines@per320-2 cluster]$ sudo systemctl disable elasticsearch.service
Removed symlink /etc/systemd/system/multi-user.target.wants/elasticsearch.service.
[fmarines@per320-2 cluster]$
[fmarines@per320-2 cluster]$ systectl |grep failed
bash: systectl: command not found…
[fmarines@per320-2 cluster]$ systemctl |grep failed
● airflow-webserver.service loaded failed failed Airflow webserver daemon
● consul.service loaded failed failed "HashiCorp Consul - A service mesh solution"
● elasticsearch.service loaded failed failed Elasticsearch daemon.
● flinkhistoryserver.service loaded failed failed Flink historyserver
● namenode.service loaded failed failed NameNode server for HDFS.
● sqoop.service loaded failed failed Sqoop server
[fmarines@per320-2 cluster]$ sudo systemctl reset-failed
[fmarines@per320-2 cluster]$ more /etc/init.d/
devtoolset-8-stap-server functions netconsole README
devtoolset-8-systemtap jexec network
[fmarines@per320-2 cluster]$ more /etc/init.d/
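The six separate `systemctl disable` calls above could be collapsed into one pass over the failed units. A rough sketch, assuming the unit name is the first field ending in `.service`; `failed_units` is a hypothetical helper name:

```shell
# Print the unit name from each `systemctl | grep failed`-style line on stdin.
failed_units() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /\.service$/) { print $i; break } }'
}

# Example use (not run here; needs sudo and GNU xargs for -r):
#   systemctl --failed --plain --no-legend | failed_units \
#       | xargs -r sudo systemctl disable
#   sudo systemctl reset-failed
```

Disabling only removes the boot-time symlinks; the units stay listed as failed until `reset-failed` is run, which matches the second `systemctl |grep failed` output above.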
Re-install
Found karamel
Running command from /extend1/cluster/karamel-0.6:
setsid ./bin/karamel -headless -launch …/cluster-defns/hopsworks-installer-active.yml > …/installation.log 2>&1 &
Installation has started, but may take 1 hour or more…
The Karamel installer UI will soon start at: http://192.168.0.230:9090/index.html
Note: port 9090 must be open for external traffic and Karamel will shutdown when installation finishes.
=====================================================================
You can view the installation logs with this command:
tail -f installation.log
[fmarines@per320-2 cluster]$ tail -f installation.log
Some time later, from the installation.log file:
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH…
20/08/13 15:10:59 WARN util.NativeCodeLoader: Loaded the native-hadoop library
20/08/13 15:10:59 WARN ha.FailoverProxyHelper: Failed to get list of NN from default NN. Default NN was hdfs://rpc.namenode.service.consul:8020
20/08/13 15:10:59 WARN hdfs.DFSUtil: Could not resolve Service
com.logicalclocks.servicediscoverclient.exceptions.ServiceNotFoundException: Error: host not found Could not find service ServiceQuery(name=rpc.namenode.service.consul, tags=[])
at com.logicalclocks.servicediscoverclient.resolvers.DnsResolver.getSRVRecordsInternal(DnsResolver.java:112)
at com.logicalclocks.servicediscoverclient.resolvers.DnsResolver.getSRVRecords(DnsResolver.java:98)
at com.logicalclocks.servicediscoverclient.resolvers.DnsResolver.getService(DnsResolver.java:71)
at org.apache.hadoop.hdfs.DFSUtil.getNameNodesRPCAddressesFromServiceDiscovery(DFSUtil.java:822)
at org.apache.hadoop.hdfs.DFSUtil.getNameNodesRPCAddressesAsURIs(DFSUtil.java:772)
at org.apache.hadoop.hdfs.DFSUtil.getNameNodesRPCAddressesAsURIs(DFSUtil.java:764)
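The stack trace shows the HDFS client failing to resolve `rpc.namenode.service.consul` through DNS. Names under `.service.consul` are answered by Consul's DNS interface, so one thing worth checking is whether the host can resolve each name the log complains about. A small sketch that pulls those names out of the log; `missing_services` is a made-up helper, and the `dig` call assumes the bind-utils package is installed:

```shell
# Print each distinct service name reported as not found in a log on stdin.
missing_services() {
    sed -n 's/.*Could not find service ServiceQuery(name=\([^,)]*\).*/\1/p' | sort -u
}

# Example use (not run here):
#   missing_services < installation.log | while read -r name; do
#       echo "== $name"; dig +short "$name" SRV
#   done
```

An empty answer for a name would be consistent with the ServiceNotFoundException, but it does not by itself say why the record is missing.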