Error in Hopsworks on-premise installation - hops::nn

Hi,

I’m getting an error while installing Hopsworks (using the hopsworks-installer.sh script). It keeps failing on the hops::nn task.

Previously, all the components were installed (although the Karamel UI status showed disconnected), but I wasn’t able to open the Hopsworks UI, so I started the installation again (after purging the previous installation with ./hopsworks-installer.sh -i purge -ni).

Logs:
Generic options supported are
-conf specify an application configuration file
-D <property=value> use value for given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides ‘fs.defaultFS’ property from configurations.
-jt <local|resourcemanager:port> specify a ResourceManager
-files specify comma separated files to be copied to the map reduce cluster
-libjars specify comma separated jar files to include in the classpath.
-archives specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
command [genericOptions] [commandOptions]

Usage: hadoop fs [generic options] -chmod [-R] <MODE[,MODE]… | OCTALMODE> PATH…
---- End output of “bash” “/tmp/chef-script20201006-6349-4s17t7” ----
Ran “bash” “/tmp/chef-script20201006-6349-4s17t7” returned 255
com.logicalclocks.servicediscoverclient.exceptions.ServiceNotFoundException: Error: timed out Could not find service ServiceQuery(name=rpc.namenode.service.consul, tags=[])
at com.logicalclocks.servicediscoverclient.resolvers.DnsResolver.getSRVRecordsInternal(DnsResolver.java:112)
at com.logicalclocks.servicediscoverclient.resolvers.DnsResolver.getSRVRecords(DnsResolver.java:98)
at com.logicalclocks.servicediscoverclient.resolvers.DnsResolver.getService(DnsResolver.java:71)
at org.apache.hadoop.hdfs.DFSUtil.getNameNodesRPCAddressesFromServiceDiscovery(DFSUtil.java:822)
at org.apache.hadoop.hdfs.DFSUtil.getNameNodesRPCAddressesAsURIs(DFSUtil.java:772)
at org.apache.hadoop.hdfs.DFSUtil.getNameNodesRPCAddressesAsURIs(DFSUtil.java:764)
at org.apache.hadoop.hdfs.DFSUtil.getNameNodesRPCAddressesAsURIs(DFSUtil.java:757)
at org.apache.hadoop.hdfs.server.namenode.ha.FailoverProxyHelper.getActiveNamenodes(FailoverProxyHelper.java:100)
at org.apache.hadoop.hdfs.server.namenode.ha.HopsRandomStickyFailoverProxyProvider.<init>(HopsRandomStickyFailoverProxyProvider.java:99)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:473)
at org.apache.hadoop.hdfs.NameNodeProxies.createHopsRandomStickyProxy(NameNodeProxies.java:503)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:353)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:288)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:161)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2704)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:96)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2738)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2720)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:399)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:179)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:382)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:245)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:228)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:103)
at org.apache.hadoop.fs.shell.Command.run(Command.java:175)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:378)
20/10/06 12:23:40 WARN ha.HopsRandomStickyFailoverProxyProvider: HopsRandomStickyFailoverProxyProvider (1754662105) no new namenodes were found
-mkdir: java.net.UnknownHostException: rpc.namenode.service.consul
Usage: hadoop fs [generic options]
[-appendToFile … ]
[-cat [-ignoreCrc] …]
[-checksum …]
[-chgrp [-R] GROUP PATH…]
[-chmod [-R] <MODE[,MODE]… | OCTALMODE> PATH…]
[-chown [-R] [OWNER][:[GROUP]] PATH…]
[-copyFromLocal [-f] [-p] [-l] [-d] [-t num] … ]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] … ]
[-count [-q] [-h] [-v] [-t []] …]
[-cp [-f] [-p | -p[topax]] [-d] … ]
[-createSnapshot []]
[-deleteSnapshot ]
[-df [-h] [ …]]
[-du [-s] [-h] [-x] …]
[-expunge]
[-find … …]
[-get [-f] [-p] [-ignoreCrc] [-crc] … ]
[-getfacl [-R] ]
[-getfattr [-R] {-n name | -d} [-e en] ]
[-getmerge [-nl] [-skip-empty-file] ]
[-help [cmd …]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [ …]]
[-mkdir [-p] …]
[-moveFromLocal … ]
[-moveToLocal ]
[-mv … ]
[-put [-f] [-p] [-l] [-d] [-t num] … ]
[-renameSnapshot ]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] …]
[-rmdir [--ignore-fail-on-non-empty] …]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} ]|[--set <acl_spec> ]]
[-setfattr {-n name [-v value] | -x name} ]
[-setrep [-R] [-w] …]
[-stat [format] …]
[-tail [-f] ]
[-test -[defsz] ]
[-text [-ignoreCrc] …]
[-touchz …]
[-truncate [-w] …]
[-usage [cmd …]]

Generic options supported are
-conf specify an application configuration file
-D <property=value> use value for given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides ‘fs.defaultFS’ property from configurations.
-jt <local|resourcemanager:port> specify a ResourceManager
-files specify comma separated files to be copied to the map reduce cluster
-libjars specify comma separated jar files to include in the classpath.
-archives specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
command [genericOptions] [commandOptions]

Usage: hadoop fs [generic options] -mkdir [-p] …
20/10/06 12:23:51 WARN util.NativeCodeLoader: Loaded the native-hadoop library
20/10/06 12:24:01 WARN ha.FailoverProxyHelper: Failed to get list of NN from default NN. Default NN was hdfs://rpc.namenode.service.consul:8020
20/10/06 12:24:09 WARN hdfs.DFSUtil: Could not resolve Service
com.logicalclocks.servicediscoverclient.exceptions.ServiceNotFoundException: Error: timed out Could not find service ServiceQuery(name=rpc.namenode.service.consul, tags=[])
20/10/06 12:24:09 WARN ha.HopsRandomStickyFailoverProxyProvider: HopsRandomStickyFailoverProxyProvider (1754662105) no new namenodes were found
-mkdir: java.net.UnknownHostException: rpc.namenode.service.consul
20/10/06 12:24:10 WARN util.NativeCodeLoader: Loaded the native-hadoop library
20/10/06 12:24:20 WARN ha.FailoverProxyHelper: Failed to get list of NN from default NN. Default NN was hdfs://rpc.namenode.service.consul:8020
20/10/06 12:24:28 WARN hdfs.DFSUtil: Could not resolve Service
com.logicalclocks.servicediscoverclient.

Hi!

The recipe fails because it is trying to perform a filesystem operation, but the filesystem is not working. Can you please check whether the NameNode is running with sudo systemctl status namenode? Also check the NameNode’s logs in /srv/hops/hadoop/logs for any exception or error message.
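
A minimal sketch of that check (the unit name namenode and the log directory /srv/hops/hadoop/logs are the ones used in this thread; they may differ on other setups):

```shell
#!/bin/sh
# Sketch of the health check above: ask systemd whether a unit is active.
# Unit name and log path are assumptions taken from this thread.

nn_active() {
  systemctl is-active --quiet "$1"   # exit 0 only if the unit is active
}

# Example usage on the installer host:
#   nn_active namenode && echo "NameNode is running"
#   ls -lt /srv/hops/hadoop/logs     # then inspect the newest log file
```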

Thanks.

Hi Antonios,

Result of the command sudo systemctl status namenode

● namenode.service - NameNode server for HDFS.
Loaded: loaded (/lib/systemd/system/namenode.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/namenode.service.d
└─limits.conf
Active: active (running) since Tue 2020-10-06 12:25:14 IST; 2h 43min ago
Main PID: 17284 (java)
Tasks: 185 (limit: 4915)
CGroup: /system.slice/namenode.service
└─17284 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Dproc_namenode -Xmx1000m -XX:MaxDirectMemorySize=1000m -XX:MaxDirectMemorySize=1000m -XX:MaxDirectMemorySize=1000m -XX:MaxDirectMemorySize=10

Oct 06 12:25:14 Goals10109 start-nn.sh[17206]: /************************************************************
Oct 06 12:25:14 Goals10109 start-nn.sh[17206]: STARTUP_MSG: Starting NameNode
Oct 06 12:25:14 Goals10109 start-nn.sh[17206]: STARTUP_MSG: user = hdfs
Oct 06 12:25:14 Goals10109 start-nn.sh[17206]: STARTUP_MSG: host = Goals10109/192.168.1.3
Oct 06 12:25:14 Goals10109 start-nn.sh[17206]: STARTUP_MSG: args = []
Oct 06 12:25:14 Goals10109 start-nn.sh[17206]: STARTUP_MSG: version = 2.8.2.10-RC1
Oct 06 12:25:14 Goals10109 start-nn.sh[17206]: STARTUP_MSG: classpath = /srv/hops/hadoop/etc/hadoop:/srv/hops/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/srv/hops/hadoop/share/hadoop/com
Oct 06 12:25:14 Goals10109 start-nn.sh[17206]: STARTUP_MSG: build = git@github.com:hopshadoop/hops.git -r abc3abcdf67a1e49fd6d0b8b381a23a677996a18; compiled by ‘gautier’ on 2020-06-12T13:18Z
Oct 06 12:25:14 Goals10109 start-nn.sh[17206]: STARTUP_MSG: java = 1.8.0_265
Oct 06 12:25:14 Goals10109 systemd[1]: Started NameNode server for HDFS…

Also, there are no errors or warnings in the NameNode’s logs in /srv/hops/hadoop/logs.

Regards
Ashish Kamboj

Then the problem is with the service discovery system.

  • What OS are you using?
  • Can you dig glassfish.service.consul? It should return the IP of that service.
  • Can you also check the logs of consul and dnsmasq with journalctl -f -u consul and journalctl -f -u dnsmasq?
  • Try restarting them with sudo systemctl restart SERVICE_NAME and execute the dig command above
  • Please paste here, if possible, the content of /etc/resolv.conf
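
Those checks can be scripted roughly as follows (a sketch; glassfish.service.consul is the service name used in this thread):

```shell
#!/bin/sh
# Rough sketch of the service-discovery checks listed above.

check_consul_dns() {
  # Resolve a Consul service name with dig and report whether an
  # A record came back.
  out=$(dig +short "$1" 2>/dev/null)
  if [ -n "$out" ]; then
    echo "OK: $1 -> $out"
  else
    echo "FAIL: $1 did not resolve"
    return 1
  fi
}

# Example usage:
#   check_consul_dns glassfish.service.consul
#   sudo journalctl -u consul -n 50      # recent consul logs
#   sudo journalctl -u dnsmasq -n 50     # recent dnsmasq logs
#   sudo systemctl restart consul dnsmasq && check_consul_dns glassfish.service.consul
```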

Hi Antonios,

Below are the answers for the queries-

  • What OS are you using?
    Ubuntu 18.04.4 LTS

  • Can you dig glassfish.service.consul ? It should return the IP of that service.
    ; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> glassfish.service.consul
    ;; global options: +cmd
    ;; connection timed out; no servers could be reached

  • Can you also check the logs of consul and dnsmasq with journalctl -f -u consul and journalctl -f -u dnsmasq ?

journalctl -f -u consul
-- Logs begin at Mon 2020-06-22 15:33:05 IST. --
Oct 06 14:41:59 Goals10109 consul[18665]: 2020-10-06T14:41:59.873+0530 [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader=
Oct 06 14:41:59 Goals10109 consul[18665]: 2020-10-06T14:41:59.873+0530 [INFO] agent.server.raft: entering candidate state: node=“Node at 192.168.1.3:8300 [Candidate]” term=3
Oct 06 14:41:59 Goals10109 consul[18665]: 2020-10-06T14:41:59.879+0530 [INFO] agent.server.raft: election won: tally=1
Oct 06 14:41:59 Goals10109 consul[18665]: 2020-10-06T14:41:59.879+0530 [INFO] agent.server.raft: entering leader state: leader=“Node at 192.168.1.3:8300 [Leader]”
Oct 06 14:41:59 Goals10109 consul[18665]: 2020-10-06T14:41:59.879+0530 [INFO] agent.server: cluster leadership acquired
Oct 06 14:41:59 Goals10109 consul[18665]: 2020-10-06T14:41:59.879+0530 [INFO] agent.server: New leader elected: payload=Goals10109
Oct 06 14:41:59 Goals10109 consul[18665]: 2020-10-06T14:41:59.901+0530 [INFO] agent.leader: started routine: routine=“CA root pruning”
Oct 06 14:42:00 Goals10109 consul[18665]: 2020-10-06T14:42:00.161+0530 [INFO] agent: Synced node info
Oct 06 14:42:05 Goals10109 consul[18665]: 2020-10-06T14:42:05.415+0530 [ERROR] agent: Newer Consul version available: new_version=1.8.4 current_version=1.7.0
Oct 06 14:48:30 Goals10109 consul[18665]: 2020-10-06T14:48:30.399+0530 [INFO] agent: Synced check: check=airflow-webserver-check

journalctl -f -u dnsmasq
-- Logs begin at Mon 2020-06-22 15:33:05 IST. --
Oct 05 19:09:59 Goals10109 dnsmasq[1491]: dnsmasq: syntax check OK.
Oct 05 19:09:59 Goals10109 systemd[1]: Starting dnsmasq - A lightweight DHCP and caching DNS server…
Oct 05 19:09:59 Goals10109 systemd[1]: dnsmasq.service: Control process exited, code=exited status=2
Oct 05 19:09:59 Goals10109 systemd[1]: dnsmasq.service: Failed with result ‘exit-code’.
Oct 05 19:09:59 Goals10109 systemd[1]: Failed to start dnsmasq - A lightweight DHCP and caching DNS server.
-- Reboot --
Oct 06 11:13:34 Goals10109 dnsmasq[1424]: dnsmasq: syntax check OK.
Oct 06 11:13:34 Goals10109 systemd[1]: Starting dnsmasq - A lightweight DHCP and caching DNS server…
Oct 06 11:13:34 Goals10109 systemd[1]: dnsmasq.service: Control process exited, code=exited status=2
Oct 06 11:13:34 Goals10109 systemd[1]: dnsmasq.service: Failed with result ‘exit-code’.
Oct 06 11:13:34 Goals10109 systemd[1]: Failed to start dnsmasq - A lightweight DHCP and caching DNS server.

  • Try restarting them with sudo systemctl restart SERVICE_NAME and execute the dig command above
    Restarted using command sudo systemctl restart namenode.service

After running dig glassfish.service.consul
; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> glassfish.service.consul
;; global options: +cmd
;; connection timed out; no servers could be reached

  • Please paste here, if possible, the content of /etc/resolv.conf
    nameserver 127.0.0.53
    options edns0
    search domain.name

There is something off with dnsmasq. Could you post the content of (a) /etc/systemd/resolved.conf and (b) /etc/dnsmasq.d/default?

Also, the output of the command ip addr show

Hi Antonios,

Below are the outputs -

(a) /etc/systemd/resolved.conf
[Resolve]
DNS=127.0.0.2
#FallbackDNS=
Domains=~consul
#LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#Cache=yes

(b) /etc/dnsmasq.d/default
port=53
no-resolv
bind-interfaces
listen-address=127.0.0.2,192.168.1.3
server=/consul/127.0.0.1#8600

(c) ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether 8c:04:ba:2a:1c:ae brd ff:ff:ff:ff:ff:ff
3: gpd0: <POINTOPOINT,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 500
link/none
4: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether c0:b5:d7:7e:60:19 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.3/24 brd 192.168.1.255 scope global dynamic noprefixroute wlp2s0
valid_lft 239037sec preferred_lft 239037sec
inet6 fe80::a255:c483:d478:7174/64 scope link noprefixroute
valid_lft forever preferred_lft forever
5: br-18fff45fd5f8: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:9c:5e:3f:1f brd ff:ff:ff:ff:ff:ff
inet 172.19.0.1/16 brd 172.19.255.255 scope global br-18fff45fd5f8
valid_lft forever preferred_lft forever
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:66:56:f7:c0 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
7: br-448a48f8ef72: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:04:41:21:63 brd ff:ff:ff:ff:ff:ff
inet 172.23.0.1/16 brd 172.23.255.255 scope global br-448a48f8ef72
valid_lft forever preferred_lft forever
8: br-62160400d47e: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:46:20:2a:9e brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-62160400d47e
valid_lft forever preferred_lft forever
9: br-911c2adc9f23: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:7a:9a:40:fb brd ff:ff:ff:ff:ff:ff
inet 172.20.0.1/16 brd 172.20.255.255 scope global br-911c2adc9f23
valid_lft forever preferred_lft forever
10: br-e6b22af915d0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:03:cf:b4:f4 brd ff:ff:ff:ff:ff:ff
inet 172.22.0.1/16 brd 172.22.255.255 scope global br-e6b22af915d0
valid_lft forever preferred_lft forever
11: br-09ebbb97491e: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:3f:26:a0:37 brd ff:ff:ff:ff:ff:ff
inet 172.21.0.1/16 brd 172.21.255.255 scope global br-09ebbb97491e
valid_lft forever preferred_lft forever

I can’t tell what the problem is from what I’ve seen so far.
Watch the logs of dnsmasq with sudo journalctl -f -u dnsmasq and on another terminal sudo systemctl restart dnsmasq.

Also, are you installing on a local VM, AWS, Azure or something else?

Yup, I’m installing on my own system. Previously I had installed all the components but didn’t open the UI, so I tried installing again, and now I’m getting this error.

Can you please also help: once I’m able to install all the components successfully, how will I open or log in to the Hopsworks UI in order to use all the components?

What’s on the logs of dnsmasq while restarting it?
Once installation is done Hopsworks will be available at https://YOUR_PUBLIC_IP/hopsworks

Below are the logs while restarting the dnsmasq

Oct 06 17:30:44 Goals10109 systemd[1]: Stopping dnsmasq - A lightweight DHCP and caching DNS server…
Oct 06 17:30:44 Goals10109 dnsmasq[12594]: exiting on receipt of SIGTERM
Oct 06 17:30:44 Goals10109 systemd[1]: Stopped dnsmasq - A lightweight DHCP and caching DNS server.
Oct 06 17:30:44 Goals10109 systemd[1]: Starting dnsmasq - A lightweight DHCP and caching DNS server…
Oct 06 17:30:44 Goals10109 dnsmasq[5661]: dnsmasq: syntax check OK.
Oct 06 17:30:44 Goals10109 dnsmasq[5693]: started, version 2.79 cachesize 150
Oct 06 17:30:44 Goals10109 dnsmasq[5693]: compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset auth DNSSEC loop-detect inotify
Oct 06 17:30:44 Goals10109 dnsmasq[5693]: using nameserver 127.0.0.1#8600 for domain consul
Oct 06 17:30:44 Goals10109 dnsmasq[5693]: read /etc/hosts - 12 addresses
Oct 06 17:30:44 Goals10109 systemd[1]: Started dnsmasq - A lightweight DHCP and caching DNS server.
Oct 06 17:30:46 Goals10109 dnsmasq[5853]: nameserver 127.0.0.1 refused to do a recursive query

From what you’ve pasted I see that the error isn’t there anymore. Could it be that dig glassfish.service.consul returns the IP now?

Now it returns a result after executing dig glassfish.service.consul:

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> glassfish.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53214
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;glassfish.service.consul. IN A

;; ANSWER SECTION:
glassfish.service.consul. 0 IN A 192.168.1.3

;; Query time: 1 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Oct 06 18:31:03 IST 2020
;; MSG SIZE rcvd: 69

OK so now you may continue the installation.

Karamel is an application we use for orchestrating the deployments. Its web UI listens on port 9090, so if you’re behind a firewall, forward the port with ssh -L9090:localhost:9090 USER@HOST_IP

Then go to your browser and visit http://localhost:9090/index.html. Click on Terminal > Status and you should see your failed recipe. Click retry and it will move forward with the rest of the installation.


Hi Antonios,

I was able to install Hopsworks successfully. The Hopsworks UI also opens, but I’m not sure about the credentials for login (it asks for an email and password). When I click on Register and fill in all the details, nothing appears in the Security Question dropdown to select, so I’m not able to register either.

Can you please help from where I’ll get the login details?

Regards
Ashish Kamboj

Hi. Good to hear that you made progress.
If you didn’t change anything, Hopsworks comes pre-configured with an administrative account. You can find more information in our documentation here.

I’m getting an error while creating a New Project. Logs from /srv/hops/domains/domain1/logs/server.log:

[#|2020-10-07T18:16:01.165+0530|WARNING|Payara 4.1|javax.enterprise.web|_ThreadID=27;_ThreadName=http-thread-pool::http-listener-2(1);_TimeMillis=1602074761165;_LevelValue=900;|StandardWrapperValve[microprofile-metrics-resource]: Servlet.service() for servlet microprofile-metrics-resource threw exception
java.lang.RuntimeException: javax.management.InstanceNotFoundException: amx:pp=/mon/server-mon[server],type=managed-executor-service-mon,name=executorService/concurrent/hopsExecutorService
at fish.payara.microprofile.metrics.jmx.MBeanExpression.getNumberValue(MBeanExpression.java:127)
at fish.payara.microprofile.metrics.writer.PrometheusExporter.exportGauge(PrometheusExporter.java:154)
at fish.payara.microprofile.metrics.writer.PrometheusWriter.writeMetricMap(PrometheusWriter.java:176)
at fish.payara.microprofile.metrics.writer.PrometheusWriter.writeMetrics(PrometheusWriter.java:134)
at fish.payara.microprofile.metrics.writer.PrometheusWriter.write(PrometheusWriter.java:126)
at fish.payara.microprofile.metrics.rest.MetricsResource.processRequest(MetricsResource.java:107)
at fish.payara.microprofile.metrics.rest.MetricsResource.doGet(MetricsResource.java:177)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1692)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:258)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:160)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:654)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:593)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:159)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:654)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:593)
at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:368)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:238)
at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:483)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:180)
at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:206)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:180)
at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:235)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:284)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:201)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:133)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:539)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:117)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:56)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:137)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:593)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:573)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.management.InstanceNotFoundException: amx:pp=/mon/server-mon[server],type=managed-executor-service-mon,name=executorService/concurrent/hopsExecutorService
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:643)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at fish.payara.microprofile.metrics.jmx.MBeanExpression.getAttribute(MBeanExpression.java:103)
at fish.payara.microprofile.metrics.jmx.MBeanExpression.getNumberValue(MBeanExpression.java:117)

Hi.

That’s not an error, it’s just a warning. Could you try creating a Project and send us the relevant log lines from the application server?