Installer crashes on fresh install

Trying to install on a fresh Ubuntu 18.04, I ran into a couple of issues. I could fix them by manually fixing some paths and permissions. However, now the installer stops here:

Recipe: hops::default
  * template[/srv/hops/hadoop/etc/hadoop/log4j.properties] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/core-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/hadoop-env.sh] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/jmxremote.access] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/jmxremote.password] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/yarn-jmxremote.password] action create (up to date)
  * template[/srv/hops/hadoop/sbin/set-env.sh] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/hdfs-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/erasure-coding-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/yarn-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/resource-types.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/container-executor.cfg] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/yarn-env.sh] action create (up to date)
  * bash[remove-hadoop-log-copy-cron] action run
    - execute "bash"  "/tmp/chef-script20201008-8976-1wpzz6m"
  * cron[copy_hadoop_logs] action create
    - add crontab entry for cron[copy_hadoop_logs]
  * cron[delete_hadoop_logs] action create
    - add crontab entry for cron[delete_hadoop_logs]
  * cookbook_file[/srv/hops/hadoop/etc/hadoop/namenode.yaml] action create (up to date)
Recipe: hops::format
  * hops_ndb[format-nn] action format_nn
    * bash[format-nn] action run (skipped due to not_if)
     (up to date)
  * bash[validate_formatting] action run

    ================================================================================
    Error executing action `run` on resource 'bash[validate_formatting]'
    ================================================================================

    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    Expected process to exit with [0], but received '1'
    ---- Begin output of "bash"  "/tmp/chef-script20201008-8976-n5p8oo" ----
    STDOUT:
    STDERR: mysql: [Warning] Using a password on the command line interface can be insecure.
    ---- End output of "bash"  "/tmp/chef-script20201008-8976-n5p8oo" ----
    Ran "bash"  "/tmp/chef-script20201008-8976-n5p8oo" returned 1

    Resource Declaration:
    ---------------------
    # In /tmp/chef-solo/cookbooks/hops/recipes/format.rb

     22:     bash "validate_formatting" do
     23:       user "root"
     24:       code <<-EOF
     25:        #{exec} hops -e 'select count(*) from hdfs_variables' | tail -n 1 | egrep -v "^0$"
     26:       EOF
     27:     end
     28:   rescue

    Compiled Resource:
    ------------------
    # Declared in /tmp/chef-solo/cookbooks/hops/recipes/format.rb:22:in `from_file'

    bash("validate_formatting") do
      action [:run]
      default_guard_interpreter :default
      command nil
      backup 5
      interpreter "bash"
      declared_type :bash
      cookbook_name "hops"
      recipe_name "format"
      user "root"
      code "       /srv/hops/mysql-cluster/ndb/scripts/mysql-client.sh hops -e 'select count(*) from hdfs_variables' | tail -n 1 | egrep -v \"^0$\"\n"
      domain nil
    end

    System Info:
    ------------
    chef_version=14.10.9
    platform=ubuntu
    platform_version=18.04
    ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
    program_name=/usr/bin/chef-solo
    executable=/opt/chefdk/bin/chef-solo

Running handlers:
[2020-10-08T07:30:07+00:00] ERROR: Running exception handlers
Running handlers complete
[2020-10-08T07:30:07+00:00] ERROR: Exception handlers complete
Chef Client failed. 16 resources updated in 08 seconds
[2020-10-08T07:30:07+00:00] FATAL: Stacktrace dumped to /tmp/chef-solo/chef-stacktrace.out
[2020-10-08T07:30:07+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-10-08T07:30:07+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: bash[validate_formatting] (hops::format line 22) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash"  "/tmp/chef-script20201008-8976-n5p8oo" ----
STDOUT:
STDERR: mysql: [Warning] Using a password on the command line interface can be insecure.
---- End output of "bash"  "/tmp/chef-script20201008-8976-n5p8oo" ----
Ran "bash"  "/tmp/chef-script20201008-8976-n5p8oo" returned 1
ERROR [2020-10-08 07:30:12,290] se.kth.karamel.backend.machines.SshMachine: -------------------------------------------------------------------------------

ERROR [2020-10-08 07:30:12,290] se.kth.karamel.backend.machines.SshMachine: End Log for Failed: 'hops::ndb' '10.0.22.49'

It looks like the WARNING is interpreted as an error?

Hi Manuel,

Are you using the latest installer and what choices did you make during the installation? You can check that you have the latest installer by opening the hopsworks-installer.sh and checking that the value of HOPSWORKS_BRANCH is 1.4. I will see if I can get some details about that error.
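If you prefer not to open the file, the check can be sketched from the command line. This assumes HOPSWORKS_BRANCH is set with a plain shell assignment in hopsworks-installer.sh; the `installer_branch` helper is a made-up name for illustration.

```shell
#!/usr/bin/env bash
# Print the value assigned to HOPSWORKS_BRANCH in the given installer script.
# Assumption: the script contains a line like HOPSWORKS_BRANCH=1.4 (optionally quoted).
installer_branch() {
  grep -m 1 -E '^[[:space:]]*HOPSWORKS_BRANCH=' "$1" | cut -d= -f2- | tr -d "\"' "
}

# Only runs when the installer is in the current directory.
if [ -f ./hopsworks-installer.sh ]; then
  echo "HOPSWORKS_BRANCH is $(installer_branch ./hopsworks-installer.sh)"
fi
```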

If you already had an installation on that machine, I would start over from scratch by cleaning up with this command:
./hopsworks-installer.sh -i purge -ni

The installation is failing because the state of the hdfs_variables table in the hops database is not correct. It’s better to wipe and start over. Like Steffen mentioned, make sure you are installing 1.4.
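On the question of the warning: the `mysql` warning on STDERR is probably not what fails the step. In bash, a pipeline's exit status is that of its last command (unless `pipefail` is set, which I am assuming the recipe does not do), so the `1` would come from `egrep -v "^0$"` having nothing to print. Here is a minimal sketch of that behaviour. The `validate_count` and `report` functions are made up for illustration, and canned text replaces the real query output.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the end of the validate_formatting pipeline:
#   <mysql-client.sh> hops -e 'select count(*) from hdfs_variables' | tail -n 1 | egrep -v "^0$"
# It reads the query output from stdin instead of calling mysql.
validate_count() {
  tail -n 1 | grep -Ev '^0$'
}

# Report whether Chef would see the step pass or fail for a given query output.
report() {
  if printf '%s' "$1" | validate_count >/dev/null; then
    echo "exit 0: step passes"
  else
    echo "exit $?: step fails"
  fi
}

report $'count(*)\n8\n'   # table has rows
report $'count(*)\n0\n'   # table is empty: the "0" line is filtered out
report ''                 # query printed nothing at all
```

The empty STDOUT in the log fits the last two cases: either hdfs_variables has no rows or the query returned nothing. Running the query by hand (the command on the `code` line of the Compiled Resource) should tell which.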

Yes, I’m using the latest, 1.4.
I am installing on 3 AWS instances.

I’ve started from scratch now as suggested by Jim_Dowling with the following choices:
(3) Install a multi-host Hopsworks cluster with TLS enabled.
Platform:
(2) AWS.

Still running into the same error.

You have to purge each machine individually. Or just re-create the instances.
You also need a head VM, and you need to set up passwordless SSH between the machines.

Have you done this?
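A rough way to verify both points from the head VM is sketched below. The host list comes from the command line, and the location of hopsworks-installer.sh on the remote machines (the login user's home directory) is an assumption. The purge command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Return 0 if we can log in to the host without being prompted for a password.
# BatchMode=yes makes ssh fail instead of asking interactively.
can_ssh_without_password() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

# Print the purge command for one host. The installer path on the remote
# machine is an assumption (home directory of the login user).
purge_command() {
  printf "ssh %s './hopsworks-installer.sh -i purge -ni'\n" "$1"
}

# Usage: pass the hostnames or IPs of the machines as arguments.
for host in "$@"; do
  if can_ssh_without_password "$host"; then
    echo "$host: passwordless SSH OK"
    purge_command "$host"   # printed only; run it yourself once it looks right
  else
    echo "$host: passwordless SSH NOT working"
  fi
done
```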

Yes, of course. I followed all the steps as described in the documentation and also started with clean instances again.

Hi,

Could you send us the content of /home/ubuntu/.karamel/hops/logs/<MASTER_IP>/hops__ndb.log, where MASTER_IP is the IP of the node on which you are running the script? Also, please check the namenode logs (/srv/hops/hadoop/logs/hadoop-hdfs-namenode-<HOST_NAME>.log) for any error messages.
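The Chef logs contain terminal colour codes and sometimes NUL bytes, which make them hard to read when pasted. A small sketch for cleaning a log and pulling out the error lines before posting; `clean_log` and `error_lines` are made-up helper names, and the grep patterns are only a guess at what is interesting.

```shell
#!/usr/bin/env bash
# Strip NUL bytes and ANSI colour sequences from a log file, writing to stdout.
clean_log() {
  tr -d '\000' < "$1" | sed $'s/\033\\[[0-9;]*m//g'
}

# Show only the lines that look like errors, with their line numbers.
error_lines() {
  clean_log "$1" | grep -nE 'ERROR|FATAL|Error executing action'
}

# Usage: pass the log path as the first argument, for example the
# hops__ndb.log under your own <MASTER_IP> directory.
log=${1:-}
if [ -n "$log" ] && [ -f "$log" ]; then
  error_lines "$log" || echo "no error lines found"
fi
```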

@Gautier Here we go, sorry for the delay. Content of /home/ubuntu/.karamel/hops/logs/<MASTER_IP>/hops__ndb.log:

Starting Chef Client, version 14.10.9ESC[0m
resolving cookbooks for run list: ["hops::ndb"]ESC[0m
Synchronizing Cookbooks:ESC[0m
  - hops (1.4.0)ESC[0m
  - java (7.0.0)ESC[0m
  - magic_shell (1.0.0)ESC[0m
  - sysctl (1.0.5)ESC[0m
  - cmake (0.3.0)ESC[0m
  - kagent (1.4.0)ESC[0m
  - ndb (1.4.0)ESC[0m
  - conda (1.4.0)ESC[0m
  - kzookeeper (1.4.0)ESC[0m
  - elastic (1.4.0)ESC[0m
  - consul (1.4.0)ESC[0m
  - homebrew (5.0.8)ESC[0m
  - windows (7.0.2)ESC[0m
  - openssl (4.4.0)ESC[0m
  - ohai (5.3.0)ESC[0m
  - hostsfile (2.4.6)ESC[0m
  - ntp (2.0.3)ESC[0m
  - sudo (4.0.1)ESC[0m
  - ulimit (1.4.0)ESC[0m
  - ulimit2 (0.2.0)ESC[0m
  - elasticsearch (4.0.6)ESC[0m
  - chef-sugar (5.1.8)ESC[0m
  - apt (7.2.0)ESC[0m
  - yum (5.1.0)ESC[0m
  - ark (5.0.0)ESC[0m
  - seven_zip (3.1.2)ESC[0m
Installing Cookbook Gems:ESC[0m
Compiling Cookbooks...ESC[0m
Converging 52 resourcesESC[0m
Recipe: hops::ndbESC[0m
  * directory[/srv/hops/ndb-hops-3.2.0.0-RC4-7.6.12] action create (up to date)
  * link[/srv/hops/ndb-hops] action create (up to date)
  * remote_file[/tmp/chef-solo/flyway-commandline-5.0.3-linux-x64.tar.gz] action create_if_missing (up to date)
  * bash[unpack_flyway] action run (skipped due to not_if)
  * template[/srv/hops/ndb-hops/flyway/conf/flyway.conf] action create (up to date)
  * directory[/srv/hops/ndb-hops/flyway/undo] action create (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V0.0.2__initial_tables.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.2__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.3__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.4__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.5__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.6__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.7__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.8__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.9__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.10__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V3.2.0.0__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/ndb-dal-3.2.0.0-RC4-7.6.12.jar] action create_if_missing (up to date)
  * hops_ndb[extract_ndb_hops] action install_ndb_hops
    * link[/srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/common/lib/ndb-dal.jar] action delete
      ESC[32m- delete link to file at /srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/common/lib/ndb-dal.jarESC[0m
ESC[0m    * link[/srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/common/lib/ndb-dal.jar] action create
      ESC[32m- create symlink at /srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/common/lib/ndb-dal.jar to /srv/hops/ndb-hops/ndb-dal-3.2.0.0-RC4-7.6.12.jarESC[0m
      ESC[32m- change owner from 'root' to 'hdfs'ESC[0m
      ESC[32m- change group from 'root' to 'hadoop'ESC[0m
ESC[0m    * link[/srv/hops/hadoop-3.2.0.0-RC4/lib/native/libndbclient.so] action delete
      ESC[32m- delete link to file at /srv/hops/hadoop-3.2.0.0-RC4/lib/native/libndbclient.soESC[0m
ESC[0m    * link[/srv/hops/hadoop-3.2.0.0-RC4/lib/native/libndbclient.so] action create
      ESC[32m- create symlink at /srv/hops/hadoop-3.2.0.0-RC4/lib/native/libndbclient.so to /srv/hops/mysql/lib/libndbclient.soESC[0m
ESC[0m
ESC[0m  * link[/srv/hops/ndb-hops/ndb-dal.jar] action create (up to date)
  * template[/srv/hops/hadoop-3.2.0.0-RC4/etc/hadoop/ndb.props] action create (up to date)
  * hops_ndb[install] action install_hops
    * ndb_waiter[wait_mysql_started] action wait_until_cluster_ready
      * bash[wait_mysql_started] action run
        ESC[32m- execute "bash"  "/tmp/chef-script20201012-19666-wgs158"ESC[0m
ESC[0m
ESC[0m    * ndb_mysql_basic[mysqld_start_hop_install] action wait_until_started
      * bash[remove_mycnf_mysqld_start_hop_install] action run (skipped due to only_if)
      * bash[wait_mysqld_started] action run
        ESC[32m- execute "bash"  "/tmp/chef-script20201012-19666-wuarf4"ESC[0m
ESC[0m
ESC[0m    * bash[mysql-install-hops] action run
      ESC[32m- execute "bash"  "/tmp/chef-script20201012-19666-m7ir6k"ESC[0m
ESC[0m    * template[/srv/hops/ndb-hops/flyway.sql] action create (up to date)
    * bash[flyway_baseline] action run
      ESC[32m- execute "bash"  "/tmp/chef-script20201012-19666-whfavc"ESC[0m
ESC[0m    * bash[flyway_migrate] action run
      ESC[32m- execute "bash"  "/tmp/chef-script20201012-19666-x6neue"ESC[0m
ESC[0m
ESC[0m  * template[/srv/hops/hadoop-3.2.0.0-RC4/sbin/start-nn.sh] action create
    ESC[32m- change mode from '0550' to '0700'ESC[0m
    ESC[32m- change group from 'metaserver' to 'hadoop'ESC[0m
ESC[0m  * template[/srv/hops/hadoop-3.2.0.0-RC4/sbin/stop-nn.sh] action create
    ESC[32m- change mode from '0550' to '0700'ESC[0m
    ESC[32m- change group from 'metaserver' to 'hadoop'ESC[0m
ESC[0m  * template[/srv/hops/hadoop-3.2.0.0-RC4/sbin/restart-nn.sh] action create
    ESC[32m- change mode from '0550' to '0700'ESC[0m
    ESC[32m- change group from 'metaserver' to 'hadoop'ESC[0m
ESC[0m  * template[/srv/hops/hadoop-3.2.0.0-RC4/sbin/format-nn.sh] action create
    ESC[32m- change mode from '0550' to '0700'ESC[0m
    ESC[32m- change group from 'metaserver' to 'hadoop'ESC[0m
ESC[0mRecipe: java::notifyESC[0m
  * log[jdk-version-changed] action nothing (skipped due to action :nothing)
Recipe: java::openjdkESC[0m
  * apt_package[openjdk-8-jdk, openjdk-8-jre-headless] action install (up to date)
  * java_alternatives[set-java-alternatives] action set (up to date)
Recipe: java::default_java_symlinkESC[0m
  * link[/usr/lib/jvm/default-java] action create (up to date)
Recipe: java::set_java_homeESC[0m
  * directory[/etc/profile.d] action create (up to date)
  * template[/etc/profile.d/jdk.sh] action create (up to date)
  * ruby_block[Set JAVA_HOME in /etc/environment] action run
    ESC[32m- execute the ruby block Set JAVA_HOME in /etc/environmentESC[0m
ESC[0mRecipe: hops::defaultESC[0m
  * template[/srv/hops/hadoop/etc/hadoop/log4j.properties] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/core-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/hadoop-env.sh] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/jmxremote.access] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/jmxremote.password] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/yarn-jmxremote.password] action create (up to date)
  * template[/srv/hops/hadoop/sbin/set-env.sh] action create
    ESC[32m- change mode from '0550' to '0750'ESC[0m
ESC[0m  * template[/srv/hops/hadoop/etc/hadoop/hdfs-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/erasure-coding-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/yarn-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/resource-types.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/container-executor.cfg] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/yarn-env.sh] action create (up to date)
  * bash[remove-hadoop-log-copy-cron] action run
    ESC[32m- execute "bash"  "/tmp/chef-script20201012-19666-q4i28o"ESC[0m
ESC[0m  * cron[copy_hadoop_logs] action create
    ESC[32m- add crontab entry for cron[copy_hadoop_logs]ESC[0m
ESC[0m  * cron[delete_hadoop_logs] action create
    ESC[32m- add crontab entry for cron[delete_hadoop_logs]ESC[0m
ESC[0m  * cookbook_file[/srv/hops/hadoop/etc/hadoop/namenode.yaml] action create (up to date)
Recipe: hops::formatESC[0m
  * hops_ndb[format-nn] action format_nn
    * bash[format-nn] action run (skipped due to not_if)
     (up to date)
  * bash[validate_formatting] action run
    ESC[0m
    ================================================================================ESC[0m
    ESC[31mError executing action `run` on resource 'bash[validate_formatting]'ESC[0m
    ================================================================================ESC[0m

ESC[0m    Mixlib::ShellOut::ShellCommandFailedESC[0m
    ------------------------------------ESC[0m
    Expected process to exit with [0], but received '1'
ESC[0m    ---- Begin output of "bash"  "/tmp/chef-script20201012-19666-1n29bmb" ----
ESC[0m    STDOUT:
ESC[0m    STDERR: mysql: [Warning] Using a password on the command line interface can be insecure.
ESC[0m    ---- End output of "bash"  "/tmp/chef-script20201012-19666-1n29bmb" ----
ESC[0m    Ran "bash"  "/tmp/chef-script20201012-19666-1n29bmb" returned 1ESC[0m

ESC[0m    Resource Declaration:ESC[0m
    ---------------------ESC[0m
    # In /tmp/chef-solo/cookbooks/hops/recipes/format.rb
ESC[0m
ESC[0m     22:     bash "validate_formatting" do
ESC[0m     23:       user "root"
ESC[0m     24:       code <<-EOF
ESC[0m     25:        #{exec} hops -e 'select count(*) from hdfs_variables' | tail -n 1 | egrep -v "^0$"
ESC[0m     26:       EOF
ESC[0m     27:     end
ESC[0m     28:   rescue
ESC[0m
ESC[0m    Compiled Resource:ESC[0m
    ------------------ESC[0m
    # Declared in /tmp/chef-solo/cookbooks/hops/recipes/format.rb:22:in `from_file'
ESC[0m
ESC[0m    bash("validate_formatting") do
ESC[0m      action [:run]
ESC[0m      default_guard_interpreter :default
ESC[0m      command nil
ESC[0m      backup 5
ESC[0m      interpreter "bash"
ESC[0m      declared_type :bash
ESC[0m      cookbook_name "hops"
ESC[0m      recipe_name "format"
ESC[0m      user "root"
ESC[0m      code "       /srv/hops/mysql-cluster/ndb/scripts/mysql-client.sh hops -e 'select count(*) from hdfs_variables' | tail -n 1 | egrep -v \"^0$\"\n"
ESC[0m      domain nil
ESC[0m    end
ESC[0m
ESC[0m    System Info:ESC[0m
    ------------ESC[0m
    chef_version=14.10.9
ESC[0m    platform=ubuntu
ESC[0m    platform_version=18.04
ESC[0m    ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
ESC[0m    program_name=/usr/bin/chef-solo
ESC[0m    executable=/opt/chefdk/bin/chef-soloESC[0m

ESC[0mESC[0m
Running handlers:ESC[0m
[2020-10-12T10:08:31+00:00] ERROR: Running exception handlers
Running handlers complete
ESC[0m[2020-10-12T10:08:31+00:00] ERROR: Exception handlers complete
Chef Client failed. 22 resources updated in 42 secondsESC[0m
[2020-10-12T10:08:31+00:00] FATAL: Stacktrace dumped to /tmp/chef-solo/chef-stacktrace.out
[2020-10-12T10:08:31+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-10-12T10:08:31+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: bash[validate_formatting] (hops::format line 22) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1
---- Begin output of "bash"  "/tmp/chef-script20201012-19666-1n29bmb" ----
STDOUT:
STDERR: mysql: [Warning] Using a password on the command line interface can be insecure.
---- End output of "bash"  "/tmp/chef-script20201012-19666-1n29bmb" ----
Ran "bash"  "/tmp/chef-script20201012-19666-1n29bmb" returned 1
resolving cookbooks for run list: ["hops::ndb"]ESC[0m
Synchronizing Cookbooks:ESC[0m
  - hops (1.4.0)ESC[0m
  - java (7.0.0)ESC[0m
  - magic_shell (1.0.0)ESC[0m
  - sysctl (1.0.5)ESC[0m
  - cmake (0.3.0)ESC[0m
  - kagent (1.4.0)ESC[0m
  - ndb (1.4.0)ESC[0m
  - conda (1.4.0)ESC[0m
  - kzookeeper (1.4.0)ESC[0m
  - elastic (1.4.0)ESC[0m
  - consul (1.4.0)ESC[0m
  - homebrew (5.0.8)ESC[0m
  - windows (7.0.2)ESC[0m
  - ohai (5.3.0)ESC[0m
  - openssl (4.4.0)ESC[0m
  - hostsfile (2.4.6)ESC[0m
  - ntp (2.0.3)ESC[0m
  - sudo (4.0.1)ESC[0m
  - ulimit (1.4.0)ESC[0m
  - ulimit2 (0.2.0)ESC[0m
  - elasticsearch (4.0.6)ESC[0m
  - chef-sugar (5.1.8)ESC[0m
  - apt (7.2.0)ESC[0m
  - yum (5.1.0)ESC[0m
  - ark (5.0.0)ESC[0m
  - seven_zip (3.1.2)ESC[0m
Installing Cookbook Gems:ESC[0m
Compiling Cookbooks...ESC[0m
Converging 52 resourcesESC[0m
Recipe: hops::ndbESC[0m
  * directory[/srv/hops/ndb-hops-3.2.0.0-RC4-7.6.12] action create (up to date)
  * link[/srv/hops/ndb-hops] action create (up to date)
  * remote_file[/tmp/chef-solo/flyway-commandline-5.0.3-linux-x64.tar.gz] action create_if_missing (up to date)
  * bash[unpack_flyway] action run (skipped due to not_if)
  * template[/srv/hops/ndb-hops/flyway/conf/flyway.conf] action create (up to date)
  * directory[/srv/hops/ndb-hops/flyway/undo] action create (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V0.0.2__initial_tables.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.2__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.3__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.4__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.5__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.6__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.7__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.8__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.9__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V2.8.2.10__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/flyway/sql/V3.2.0.0__hops.sql] action create_if_missing (up to date)
  * remote_file[/srv/hops/ndb-hops/ndb-dal-3.2.0.0-RC4-7.6.12.jar] action create_if_missing (up to date)
  * hops_ndb[extract_ndb_hops] action install_ndb_hops
    * link[/srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/common/lib/ndb-dal.jar] action delete
      ESC[32m- delete link to file at /srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/common/lib/ndb-dal.jarESC[0m
ESC[0m    * link[/srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/common/lib/ndb-dal.jar] action create
      ESC[32m- create symlink at /srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/common/lib/ndb-dal.jar to /srv/hops/ndb-hops/ndb-dal-3.2.0.0-RC4-7.6.12.jarESC[0m
      ESC[32m- change owner from 'root' to 'hdfs'ESC[0m
      ESC[32m- change group from 'root' to 'hadoop'ESC[0m
ESC[0m    * link[/srv/hops/hadoop-3.2.0.0-RC4/lib/native/libndbclient.so] action delete
      ESC[32m- delete link to file at /srv/hops/hadoop-3.2.0.0-RC4/lib/native/libndbclient.soESC[0m
ESC[0m    * link[/srv/hops/hadoop-3.2.0.0-RC4/lib/native/libndbclient.so] action create
      ESC[32m- create symlink at /srv/hops/hadoop-3.2.0.0-RC4/lib/native/libndbclient.so to /srv/hops/mysql/lib/libndbclient.soESC[0m
ESC[0m
ESC[0m  * link[/srv/hops/ndb-hops/ndb-dal.jar] action create (up to date)
  * template[/srv/hops/hadoop-3.2.0.0-RC4/etc/hadoop/ndb.props] action create (up to date)
  * hops_ndb[install] action install_hops
    * ndb_waiter[wait_mysql_started] action wait_until_cluster_ready
      * bash[wait_mysql_started] action run
        ESC[32m- execute "bash"  "/tmp/chef-script20201012-21230-6c584y"ESC[0m
ESC[0m
ESC[0m    * ndb_mysql_basic[mysqld_start_hop_install] action wait_until_started
      * bash[remove_mycnf_mysqld_start_hop_install] action run (skipped due to only_if)
      * bash[wait_mysqld_started] action run
        ESC[32m- execute "bash"  "/tmp/chef-script20201012-21230-1y1rs76"ESC[0m
ESC[0m
ESC[0m    * bash[mysql-install-hops] action run
      ESC[32m- execute "bash"  "/tmp/chef-script20201012-21230-k1kzap"ESC[0m
ESC[0m    * template[/srv/hops/ndb-hops/flyway.sql] action create (up to date)
    * bash[flyway_baseline] action run (skipped due to not_if)
    * bash[flyway_migrate] action run
      ESC[32m- execute "bash"  "/tmp/chef-script20201012-21230-mgm3i2"ESC[0m
ESC[0m
ESC[0m  * template[/srv/hops/hadoop-3.2.0.0-RC4/sbin/start-nn.sh] action create (up to date)
  * template[/srv/hops/hadoop-3.2.0.0-RC4/sbin/stop-nn.sh] action create (up to date)
  * template[/srv/hops/hadoop-3.2.0.0-RC4/sbin/restart-nn.sh] action create (up to date)
  * template[/srv/hops/hadoop-3.2.0.0-RC4/sbin/format-nn.sh] action create (up to date)
Recipe: java::notifyESC[0m
  * log[jdk-version-changed] action nothing (skipped due to action :nothing)
Recipe: java::openjdkESC[0m
  * apt_package[openjdk-8-jdk, openjdk-8-jre-headless] action install (up to date)
  * java_alternatives[set-java-alternatives] action set (up to date)
Recipe: java::default_java_symlinkESC[0m
  * link[/usr/lib/jvm/default-java] action create (up to date)
Recipe: java::set_java_homeESC[0m
  * directory[/etc/profile.d] action create (up to date)
  * template[/etc/profile.d/jdk.sh] action create (up to date)
  * ruby_block[Set JAVA_HOME in /etc/environment] action run
ESC[0mRecipe: hops::defaultESC[0m
  * template[/srv/hops/hadoop/etc/hadoop/log4j.properties] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/core-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/hadoop-env.sh] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/jmxremote.access] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/jmxremote.password] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/yarn-jmxremote.password] action create (up to date)
  * template[/srv/hops/hadoop/sbin/set-env.sh] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/hdfs-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/erasure-coding-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/yarn-site.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/resource-types.xml] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/container-executor.cfg] action create (up to date)
  * template[/srv/hops/hadoop/etc/hadoop/yarn-env.sh] action create (up to date)
  * bash[remove-hadoop-log-copy-cron] action run
    ESC[32m- execute "bash"  "/tmp/chef-script20201012-21230-1hgfqtj"ESC[0m
ESC[0m  * cron[copy_hadoop_logs] action create
    ESC[32m- add crontab entry for cron[copy_hadoop_logs]ESC[0m
ESC[0m  * cron[delete_hadoop_logs] action create
    ESC[32m- add crontab entry for cron[delete_hadoop_logs]ESC[0m
ESC[0m  * cookbook_file[/srv/hops/hadoop/etc/hadoop/namenode.yaml] action create (up to date)
Recipe: hops::formatESC[0m
  * hops_ndb[format-nn] action format_nn
    * bash[format-nn] action run (skipped due to not_if)
     (up to date)
  * bash[validate_formatting] action run
    ESC[0m
    ================================================================================ESC[0m
    ESC[31mError executing action `run` on resource 'bash[validate_formatting]'ESC[0m
    ================================================================================ESC[0m

ESC[0m    Mixlib::ShellOut::ShellCommandFailedESC[0m
    ------------------------------------ESC[0m
    Expected process to exit with [0], but received '1'
ESC[0m    ---- Begin output of "bash"  "/tmp/chef-script20201012-21230-19qu2fk" ----
ESC[0m    STDOUT:
ESC[0m    STDERR: mysql: [Warning] Using a password on the command line interface can be insecure.
ESC[0m    ---- End output of "bash"  "/tmp/chef-script20201012-21230-19qu2fk" ----
ESC[0m    Ran "bash"  "/tmp/chef-script20201012-21230-19qu2fk" returned 1ESC[0m

ESC[0m    Resource Declaration:ESC[0m
    ---------------------ESC[0m
    # In /tmp/chef-solo/cookbooks/hops/recipes/format.rb
ESC[0m
ESC[0m     22:     bash "validate_formatting" do
ESC[0m     23:       user "root"
ESC[0m     24:       code <<-EOF
ESC[0m     25:        #{exec} hops -e 'select count(*) from hdfs_variables' | tail -n 1 | egrep -v "^0$"
ESC[0m     26:       EOF
ESC[0m     27:     end
ESC[0m     28:   rescue
ESC[0m
ESC[0m    Compiled Resource:ESC[0m
    ------------------ESC[0m
    # Declared in /tmp/chef-solo/cookbooks/hops/recipes/format.rb:22:in `from_file'
ESC[0m
ESC[0m    bash("validate_formatting") do
ESC[0m      action [:run]
ESC[0m      default_guard_interpreter :default
ESC[0m      command nil
ESC[0m      backup 5
ESC[0m      interpreter "bash"
ESC[0m      declared_type :bash
ESC[0m      cookbook_name "hops"
ESC[0m      recipe_name "format"
ESC[0m      user "root"
ESC[0m      code "       /srv/hops/mysql-cluster/ndb/scripts/mysql-client.sh hops -e 'select count(*) from hdfs_variables' | tail -n 1 | egrep -v \"^0$\"\n"
ESC[0m      domain nil
ESC[0m    end
ESC[0m
ESC[0m    System Info:ESC[0m
    ------------ESC[0m
    chef_version=14.10.9
ESC[0m    platform=ubuntu
ESC[0m    platform_version=18.04
ESC[0m    ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
ESC[0m    program_name=/usr/bin/chef-solo
ESC[0m    executable=/opt/chefdk/bin/chef-soloESC[0m

ESC[0mESC[0m
Running handlers:ESC[0m
[2020-10-12T10:08:47+00:00] ERROR: Running exception handlers
Running handlers complete
ESC[0m[2020-10-12T10:08:47+00:00] ERROR: Exception handlers complete
Chef Client failed. 16 resources updated in 08 secondsESC[0m
[2020-10-12T10:08:47+00:00] FATAL: Stacktrace dumped to /tmp/chef-solo/chef-stacktrace.out
[2020-10-12T10:08:47+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-10-12T10:08:47+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: bash[validate_formatting] (hops::format line 22) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1
---- Begin output of "bash"  "/tmp/chef-script20201012-21230-19qu2fk" ----
STDOUT:
STDERR: mysql: [Warning] Using a password on the command line interface can be insecure.
---- End output of "bash"  "/tmp/chef-script20201012-21230-19qu2fk" ----
Ran "bash"  "/tmp/chef-script20201012-21230-19qu2fk" returned 1

There is no file /srv/hops/hadoop/logs/hadoop-hdfs-namenode-<HOST_NAME>.log, but there is a /srv/hops/hadoop/logs/hadoop-hdfs-namenode.log, and it contains no errors.

Sorry, my mistake: the logs for formatting the namenode should be in /srv/hops/hadoop/logs/hadoop.log. Could you check whether any error is logged in that file?

The logs you sent me above look like you have retried running all or part of the process, which may hide the original problem. Have you clicked retry at some point or tried to run the script again after it failed?

@Manuel I suspect the issue you are seeing is not related to the format itself; that’s just how it manifests.
Could you please check if the database is healthy? When the platform is up and running, you will be able to do so from Grafana, but as you are early in the installation you can run the following script:

/srv/hops/mysql-cluster/ndb/scripts/mgm-client.sh -e "show"

You should see something that looks like this:

Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=1	@192.168.215.102  (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0, *)
id=2	@192.168.215.103  (mysql-5.7.25 ndb-7.6.9, Nodegroup: 1)

[ndb_mgmd(MGM)]	1 node(s)
id=49	@192.168.215.101  (mysql-5.7.25 ndb-7.6.9)

[mysqld(API)]	16 node(s)
id=52	@192.168.215.102  (mysql-5.7.25 ndb-7.6.9)

The IDs, IPs and node groups will probably be different; just make sure the nodes are running (a node that is not running is shown as "not connected, accepting connect from ...").
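If you want to script this check, something like the following could work. It counts data nodes in the [ndbd(NDB)] section that are reported as not connected. The "not connected" wording is an assumption about the management client's output, and `count_disconnected_data_nodes` is a made-up helper. As far as I know, unused [mysqld(API)] slots also show up as not connected, which is normal, so only the data node section is counted.

```shell
#!/usr/bin/env bash
# Count data nodes reported as not connected.
# Reads the output of `mgm-client.sh -e "show"` from stdin.
count_disconnected_data_nodes() {
  awk '
    /^\[ndbd\(NDB\)\]/ { in_ndbd = 1; next }   # start of the data node section
    /^\[/              { in_ndbd = 0 }         # any other section header ends it
    in_ndbd && /not connected/ { n++ }
    END { print n + 0 }
  '
}

MGM_CLIENT=/srv/hops/mysql-cluster/ndb/scripts/mgm-client.sh
if [ -x "$MGM_CLIENT" ]; then
  n=$("$MGM_CLIENT" -e "show" | count_disconnected_data_nodes)
  echo "data nodes not connected: $n"
fi
```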

Can you also zip all the logs in /srv/hops/hadoop/logs and send them here?

If the database is not running, can you send the logs in /srv/hops/mysql-cluster/log?
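One possible way to bundle the two log directories is sketched below. It uses tar with gzip rather than zip, and the output file names are arbitrary.

```shell
#!/usr/bin/env bash
# Pack a directory of logs into a gzip-compressed tarball.
# Usage: bundle_logs <log-dir> <output.tar.gz>
bundle_logs() {
  tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Paths from this thread; each archive is only created if the directory exists.
if [ -d /srv/hops/hadoop/logs ]; then
  bundle_logs /srv/hops/hadoop/logs hadoop-logs.tar.gz
fi
if [ -d /srv/hops/mysql-cluster/log ]; then
  bundle_logs /srv/hops/mysql-cluster/log ndb-logs.tar.gz
fi
```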


Fabio

No, no errors in /srv/hops/hadoop/logs/hadoop.log.

Yes, I did run the script again, by running setsid ./bin/karamel -headless -launch ../cluster-defns/hopsworks-installation.yml > ../installation.log 2>&1 &. Is that not recommended? There is very little information in the installation documentation about this (while there is a lot of redundant information).

@Fabio I did not see the other machines there, but that gave me an idea. My security group only opened port 22 on the machines, but I assume other ports are required as well. Would be good to specify that in the documentation. In any case, I have now opened all ports and am starting from scratch, with fresh instances.
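For anyone hitting the same thing, connectivity between the machines can be checked without extra tools, roughly like this. The thread does not list the ports Hopsworks needs, so ports are passed as arguments; 1186 is the default NDB management port and is only an example. `port_open` is a made-up helper.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's built-in /dev/tcp redirection and the coreutils `timeout` command.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage: ./check-ports.sh <host> <port> [<port> ...]
# Example: ./check-ports.sh 10.0.22.49 22 1186
host=${1:-}
if [ -n "$host" ]; then
  shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
    fi
  done
fi
```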
The first error is the same as always:

[2020-10-16T07:32:38+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: bash[run_conda_installer_#{d}] (conda::install line 112) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash"  "/tmp/chef-script20201016-20745-bjhwal" ----
STDOUT:
STDERR: mv: cannot stat '/srv/hops/anaconda/anaconda-py37-4.8.3/envs': No such file or directory
---- End output of "bash"  "/tmp/chef-script20201016-20745-bjhwal" ----
Ran "bash"  "/tmp/chef-script20201016-20745-bjhwal" returned 1
ERROR [2020-10-16 07:32:43,341] se.kth.karamel.backend.machines.SshMachine: -------------------------------------------------------------------------------

ERROR [2020-10-16 07:32:43,341] se.kth.karamel.backend.machines.SshMachine: End Log for Failed: 'conda::install' '10.0.57.158'
ERROR [2020-10-16 07:32:43,341] se.kth.karamel.backend.machines.SshMachine: -------------------------------------------------------------------------------

So far, I’ve always just manually fixed it by creating this directory.
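For reference, the manual workaround amounts to roughly the following. The path is taken from the error message above; who should own the directory is not something I know, so this sketch leaves ownership at the default. `ensure_envs_dir` is a made-up helper.

```shell
#!/usr/bin/env bash
# Create the envs directory that the conda recipe's `mv` expects, if missing.
# Usage: ensure_envs_dir <anaconda-install-dir>
ensure_envs_dir() {
  if [ ! -d "$1/envs" ]; then
    mkdir -p "$1/envs"
    echo "created $1/envs"
  fi
}

# Needs write access to the install directory (typically root), so it is only
# attempted when that directory is writable.
CONDA_DIR=/srv/hops/anaconda/anaconda-py37-4.8.3
if [ -w "$CONDA_DIR" ]; then
  ensure_envs_dir "$CONDA_DIR"
fi
```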

This time, I also noticed there was an earlier error:

Recipe: java::set_java_home
  * directory[/etc/profile.d] action create (up to date)
  * template[/etc/profile.d/jdk.sh] action create (up to date)
  * ruby_block[Set JAVA_HOME in /etc/environment] action run
    - execute the ruby block Set JAVA_HOME in /etc/environment
Recipe: hops::docker
  * apt_package[docker.io] action install

    ================================================================================
    Error executing action `install` on resource 'apt_package[docker.io]'
    ================================================================================

    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    Expected process to exit with [0], but received '100'
    ---- Begin output of ["apt-get", "-q", "-y", "--allow-downgrades", "-o", "Dpkg::Options::=--force-confdef", "-o", "Dpkg::Options::=--force-confold", "install", "docker.io=19.03.6-0ubuntu1~18.04.1"] ----
    STDOUT: Reading package lists...
    Building dependency tree...
    Reading state information...
    STDERR: E: Version '19.03.6-0ubuntu1~18.04.1' for 'docker.io' was not found
    ---- End output of ["apt-get", "-q", "-y", "--allow-downgrades", "-o", "Dpkg::Options::=--force-confdef", "-o", "Dpkg::Options::=--force-confold", "install", "docker.io=19.03.6-0ubuntu1~18.04.1"] ----
    Ran ["apt-get", "-q", "-y", "--allow-downgrades", "-o", "Dpkg::Options::=--force-confdef", "-o", "Dpkg::Options::=--force-confold", "install", "docker.io=19.03.6-0ubuntu1~18.04.1"] returned 100

I don’t think I had seen it before, or maybe I missed it.
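A quick way to see whether a pinned package version is still offered by the configured repositories is to look at `apt-cache madison`. The sketch below checks the version string from the failing apt-get call; `has_version` is a made-up helper that parses the "package | version | source" columns. A likely cause is that the archive replaced the pinned version with a newer one, but that is a guess.

```shell
#!/usr/bin/env bash
# Check whether an exact package version appears in `apt-cache madison` output.
# Reads the madison output from stdin; columns are "package | version | source".
has_version() {
  awk -F'|' -v want="$1" '
    { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); if (v == want) found = 1 }
    END { exit found ? 0 : 1 }
  '
}

PINNED='19.03.6-0ubuntu1~18.04.1'   # version string from the failing apt-get call
if command -v apt-cache >/dev/null 2>&1; then
  if apt-cache madison docker.io | has_version "$PINNED"; then
    echo "docker.io $PINNED is available"
  else
    echo "docker.io $PINNED is NOT available; candidates are:"
    apt-cache madison docker.io
  fi
fi
```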

Which version of Ubuntu are you running?

Never mind, it’s an issue on our side. Pushing a fix ASAP.

@Manuel: Bumped the dependency version of docker, should be working now.

Yes, I can confirm the docker issue is fixed now! Any plans to fix the other problem with the missing directory /srv/hops/anaconda/anaconda-py37-4.8.3/envs? It’s not a stopper, but a bit annoying.