Hopsworks installation error on GCP again!

Hi,
I was able to install Hopsworks on a single GCP host after it failed the first time (I guess because of SELinux issues).

I tried to install it again on Red Hat 7 on GCP, and it failed with a similar error.
Installation Log (tail -f installation.log)

INFO [2020-04-23 03:45:35,735] se.kth.karamel.backend.machines.SshMachine: 10.142.0.4: Running task: hops::nn
ERROR [2020-04-23 03:46:51,693] se.kth.karamel.backend.dag.DagNode: Failed 'hops::nn on 10.142.0.4' because '10.142.0.4: Command did not complete: mkdir -p /home/orchestralgk/.karamel/install ; cd /home/orchestralgk/.karamel/install; echo $$ > pid; echo '#!/bin/bash
set -eo pipefail
echo $(date '+%H:%M:%S'): 'hops__nn' >> order
cat > hops__nn.json <<-'END_OF_FILE'
{
json file
}

sudo chef-solo -c /home/orchestralgk/.karamel/install/solo.rb -j /home/orchestralgk/.karamel/install/hops__nn.json
2>&1 | tee hops__nn.log
echo 'https://github.com/logicalclocks/hops-hadoop-chef/tree/master/hops::nn' >> succeed_list
' > hops__nn.sh ; chmod +x hops__nn.sh ; ./hops__nn.sh
', DAG is stuck here :(
INFO [2020-04-23 03:46:56,308] se.kth.karamel.backend.machines.MachinesMonitor: Sending pause signal to all machines

Here is hops__nn.log

  • hops_hdfs_directory[/tmp] action create_as_superuser
    • bash[mk-dir-/tmp] action run

      ================================================================================
      Error executing action `run` on resource 'bash[mk-dir-/tmp]'

      Mixlib::ShellOut::ShellCommandFailed

      Expected process to exit with [0], but received '255'
      ---- Begin output of "bash" "/tmp/chef-script20200423-27242-19h7lqv" ----

      STDOUT:
      STDERR: 20/04/23 03:46:23 WARN util.NativeCodeLoader: Loaded the native-hadoop library
      20/04/23 03:46:24 WARN ha.FailoverProxyHelper: Failed to get list of NN from default NN. Default NN was hdfs://rpc.namenode.service.consul:8020
      20/04/23 03:46:24 WARN hdfs.DFSUtil: Could not resolve Service



System Info:
------------
chef_version=14.10.9
platform=redhat
platform_version=7.8
ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
program_name=/bin/chef-solo
executable=/opt/chefdk/bin/chef-solo

================================================================================
Error executing action `create_as_superuser` on resource 'hops_hdfs_directory[/tmp]'
================================================================================

Mixlib::ShellOut::ShellCommandFailed
------------------------------------
bash[mk-dir-/tmp] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 88) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '255'
---- Begin output of "bash" "/tmp/chef-script20200423-27242-19h7lqv" ----
STDOUT:
STDERR: 20/04/23 03:46:23 WARN util.NativeCodeLoader: Loaded the native-hadoop library
20/04/23 03:46:24 WARN ha.FailoverProxyHelper: Failed to get list of NN from default NN. Default NN was
hdfs://rpc.namenode.service.consul:8020
20/04/23 03:46:24 WARN hdfs.DFSUtil: Could not resolve Service

Have you made any changes recently?

regards,
Vivek

Hello,

We haven’t made any changes. Can you install bind-utils and run dig namenode.service.consul? It should answer with the IP address for this domain. Make sure that the namenode is up and running with systemctl status namenode. Otherwise, restart it and wait until dig answers with the IP. It might take about 20 seconds until the domain is available again.
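
If you would rather script the "wait until the domain answers" step than rerun dig by hand, here is a minimal Python sketch. It is only an illustration, not part of Hopsworks or Karamel: the function name wait_for_dns and its parameters are made up for this post, and it assumes the host's system resolver is the one that forwards .consul lookups (which is what the Hadoop client in the log above relies on).

```python
import socket
import time


def wait_for_dns(hostname, timeout_s=60.0, interval_s=2.0,
                 resolve=socket.gethostbyname, sleep=time.sleep,
                 clock=time.monotonic):
    """Poll until `hostname` resolves. Return its IPv4 address, or None on timeout.

    `resolve`, `sleep` and `clock` are injectable only to make the helper
    easy to test; the defaults use the system resolver and real time.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return resolve(hostname)
        except OSError:
            # socket.gaierror (a subclass of OSError) means the name
            # does not resolve yet, so retry until the deadline passes.
            if clock() >= deadline:
                return None
            sleep(interval_s)


# Hypothetical usage on the Hopsworks host:
#   ip = wait_for_dns("namenode.service.consul")
#   print(ip if ip else "still not resolving; check systemctl status namenode")
```

The default timeout of 60 seconds is an arbitrary choice that leaves some margin over the roughly 20 seconds mentioned above. If the function returns None, the namenode service is probably not registered yet, so check its status before retrying the installation.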

Hello @vivek and @antonios,
I’m having the same problem. I was doing an installation and troubleshooting another issue, decided to start over, and now I get this same error. May I ask how you resolved it?
Regards