On-Premise (Ubuntu 18.04) installer fails on tensorflow

I attempted to use the Hopsworks installer on a new Ubuntu 18.04 machine to do a single-host installation. The installation failed at the tensorflow step.

Install command:
./hopsworks-installer.sh -pwd changeme

Options Selected:
(1) Install a single-host Hopsworks cluster.
(1) On-premises or private cloud.

tensorflow__install.log:

[sudo] password for nbowyer: Starting Chef Client, version 14.10.9
resolving cookbooks for run list: ["tensorflow::install"]
Synchronizing Cookbooks:
  - tensorflow (1.4.0)
  - java (7.0.0)
  - magic_shell (1.0.0)
  - build-essential (8.2.1)
  - zip (1.1.0)
  - apt (7.2.0)
  - homebrew (5.0.8)
  - kagent (1.4.0)
  - ndb (1.4.0)
  - hops (1.4.0)
  - conda (1.4.0)
  - windows (7.0.2)
  - seven_zip (3.1.2)
  - openssl (4.4.0)
  - mingw (2.1.1)
  - hostsfile (2.4.6)
  - ntp (2.0.3)
  - sudo (4.0.1)
  - consul (1.4.0)
  - ulimit (1.4.0)
  - sysctl (1.0.5)
  - cmake (0.3.0)
  - kzookeeper (1.4.0)
  - elastic (1.4.0)
  - chef-sugar (5.1.8)
  - ohai (5.3.0)
  - ulimit2 (0.2.0)
  - elasticsearch (4.0.6)
  - yum (5.1.0)
  - ark (5.0.0)
Installing Cookbook Gems:
Compiling Cookbooks...
Converging 9 resources
Recipe: tensorflow::install
  * apt_package[pkg-config, zip, g++, zlib1g-dev, unzip, swig, git, build-essential, cmake, unzip, libopenblas-dev, liblapack-dev, linux-image-4.15.0-121-generic, linux-headers-4.15.0-121-generic, python2.7, python2.7-numpy, python2.7-dev, python-pip, python2.7-lxml, python-pillow, libcupti-dev, libcurl3-dev, python-wheel, python-six, pciutils] action install
    - install version 0.29.1-0ubuntu2 of package pkg-config
    - install version 3.0-11build1 of package zip
    - install version 1:1.2.11.dfsg-0ubuntu2 of package zlib1g-dev
    - install version 3.0.12-1 of package swig
    - install version 3.10.2-1ubuntu2.18.04.1 of package cmake
    - install version 0.2.20+ds-4 of package libopenblas-dev
    - install version 3.7.1-4ubuntu1 of package liblapack-dev
    - install version 1:1.13.3-2ubuntu1 of package python2.7-numpy
    - install version 9.0.1-2.3~ubuntu1.18.04.3 of package python-pip
    - install version 4.2.1-1ubuntu0.1 of package python2.7-lxml
    - install version 5.1.0-1ubuntu0.3 of package python-pillow
    - install version 9.1.85-3ubuntu1 of package libcupti-dev
    - install version 7.58.0-2ubuntu3.10 of package libcurl3-dev
    - install version 0.30.0-0.2 of package python-wheel
    - install version 1.11.0-2 of package python-six
Recipe: java::notify
  * log[jdk-version-changed] action nothing (skipped due to action :nothing)
Recipe: java::openjdk
  * apt_package[openjdk-8-jdk, openjdk-8-jre-headless] action install (up to date)
  * java_alternatives[set-java-alternatives] action set (up to date)
Recipe: java::default_java_symlink
  * link[/usr/lib/jvm/default-java] action create (up to date)
Recipe: java::set_java_home
  * directory[/etc/profile.d] action create (up to date)
  * template[/etc/profile.d/jdk.sh] action create (up to date)
  * ruby_block[Set JAVA_HOME in /etc/environment] action run
    - execute the ruby block Set JAVA_HOME in /etc/environment
Recipe: tensorflow::install
  * magic_shell_environment[HADOOP_HDFS_HOME] action add
  Recipe: <Dynamically Defined Resource>
    * file[/etc/profile.d/HADOOP_HDFS_HOME.sh] action create
      - create new file /etc/profile.d/HADOOP_HDFS_HOME.sh
      - update content in file /etc/profile.d/HADOOP_HDFS_HOME.sh from none to 5e6d2a
      --- /etc/profile.d/HADOOP_HDFS_HOME.sh	2020-10-15 19:56:54.116977346 +0000
      +++ /etc/profile.d/.chef-HADOOP_HDFS_HOME20201015-22945-1bif74l.sh	2020-10-15 19:56:54.116977346 +0000
      @@ -1 +1,7 @@
      +    #
      +    # This file was generated by Chef for fstore
      +    # Do NOT modify this file by hand!
      +    #
      +
      +    export HADOOP_HDFS_HOME="/srv/hops/hadoop"
      - change mode from '' to '0755'
      - change owner from '' to 'root'
      - change group from '' to 'root'
     (up to date)

Running handlers:
Running handlers complete

Deprecated features used!
  Resource openssl_dhparam from a cookbook is overriding the resource from the client. Please upgrade your cookbook or remove the cookbook from your run_list before the next major release of Chef. at 1 location:
    - /opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/log.rb:51:in `caller_location'
   See https://docs.chef.io/deprecations_map_collision.html for further details.
  Resource openssl_rsa_key from a cookbook is overriding the resource from the client. Please upgrade your cookbook or remove the cookbook from your run_list before the next major release of Chef. at 1 location:
    - /opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/log.rb:51:in `caller_location'
   See https://docs.chef.io/deprecations_map_collision.html for further details.
  Resource sudo from a cookbook is overriding the resource from the client. Please upgrade your cookbook or remove the cookbook from your run_list before the next major release of Chef. at 1 location:
    - /opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/log.rb:51:in `caller_location'
   See https://docs.chef.io/deprecations_map_collision.html for further details.
  Resource sysctl_param from a cookbook is overriding the resource from the client. Please upgrade your cookbook or remove the cookbook from your run_list before the next major release of Chef. at 1 location:
    - /opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/log.rb:51:in `caller_location'
   See https://docs.chef.io/deprecations_map_collision.html for further details.

Chef Client finished, 3/10 resources updated in 01 minutes 05 seconds

tensorflow__default.log:

[sudo] password for nbowyer: Starting Chef Client, version 14.10.9
resolving cookbooks for run list: ["tensorflow::default"]
Synchronizing Cookbooks:
  - tensorflow (1.4.0)
  - java (7.0.0)
  - magic_shell (1.0.0)
  - zip (1.1.0)
  - build-essential (8.2.1)
  - apt (7.2.0)
  - homebrew (5.0.8)
  - kagent (1.4.0)
  - ndb (1.4.0)
  - hops (1.4.0)
  - conda (1.4.0)
  - windows (7.0.2)
  - seven_zip (3.1.2)
  - mingw (2.1.1)
  - openssl (4.4.0)
  - hostsfile (2.4.6)
  - ntp (2.0.3)
  - sudo (4.0.1)
  - consul (1.4.0)
  - ulimit (1.4.0)
  - sysctl (1.0.5)
  - cmake (0.3.0)
  - kzookeeper (1.4.0)
  - elastic (1.4.0)
  - chef-sugar (5.1.8)
  - ohai (5.3.0)
  - ulimit2 (0.2.0)
  - elasticsearch (4.0.6)
  - yum (5.1.0)
  - ark (5.0.0)
Installing Cookbook Gems:
Compiling Cookbooks...
Converging 8 resources
Recipe: tensorflow::default
  * remote_file[/tmp/chef-solo/demo-1.4.1.tar.gz] action create (up to date)
  * bash[extract_notebooks] action run
    - execute "bash"  "/tmp/chef-script20201016-19082-x890bs"
  * hops_hdfs_directory[/user/hdfs/tensorflow_demo] action create_as_superuser
    * bash[mk-dir-/user/hdfs/tensorflow_demo] action run (skipped due to not_if)
     (up to date)
  * hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] action replace_as_superuser
    * hops_hdfs_directory[/user/hdfs/tensorflow_demo] action rm_as_superuser
      * bash[rm-/user/hdfs/tensorflow_demo] action run
        - execute "bash"  "/tmp/chef-script20201016-19082-1r2v52x"

    * hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] action put_as_superuser
      * bash[hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] action run

        ================================================================================
        Error executing action `run` on resource 'bash[hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo]'
        ================================================================================

        Mixlib::ShellOut::CommandTimeout
        --------------------------------
        Command timed out after 3600s:
        Command exceeded allowed execution time, process terminated
        ---- Begin output of "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" ----
        STDOUT: 
        STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
        copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
        copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
        copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
        copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
        copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
        copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
        copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
        ---- End output of "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" ----
        Ran "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" returned 

        Resource Declaration:
        ---------------------
        # In /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb

         53:   bash "hdfs-put-dir-#{new_resource.name}" do
         54:     user node['hops']['hdfs']['user']
         55:     group node['hops']['group']
         56:     code <<-EOF
         57:      EXISTS=1
         58:      . #{node['hops']['base_dir']}/sbin/set-env.sh
         59:      if [ -z $ISDIR ] ; then
         60:         #{node['hops']['base_dir']}/bin/hdfs dfs -test -e #{new_resource.dest}
         61:         EXISTS=$?
         62:      else
         63:         #{node['hops']['base_dir']}/bin/hdfs dfs -test -f #{new_resource.dest}
         64:         EXISTS=$?
         65:      fi
         66:      if ([ $EXISTS -ne 0 ] || [ #{new_resource.isDir} ]) ; then
         67:         #{node['hops']['base_dir']}/bin/hdfs dfs -copyFromLocal #{new_resource.name} #{new_resource.dest}
         68:         #{node['hops']['base_dir']}/bin/hdfs dfs -chown #{new_resource.owner} #{new_resource.dest}
         69:         #{node['hops']['base_dir']}/bin/hdfs dfs -chgrp #{new_resource.group} #{new_resource.dest}
         70:         if [ "#{new_resource.mode}" != "" ] ; then
         71:            #{node['hops']['base_dir']}/bin/hadoop fs -chmod #{new_resource.mode} #{new_resource.dest}
         72:         fi
         73:      fi
         74:     EOF
         75:   end
         76: end

        Compiled Resource:
        ------------------
        # Declared in /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb:53:in `block in class_from_file'

        bash("hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo") do
          action [:run]
          default_guard_interpreter :default
          command nil
          backup 5
          interpreter "bash"
          declared_type :bash
          cookbook_name "tensorflow"
          user "hdfs"
          code "     EXISTS=1\n     . /srv/hops/hadoop/sbin/set-env.sh\n     if [ -z $ISDIR ] ; then\n        /srv/hops/hadoop/bin/hdfs dfs -test -e /user/hdfs/tensorflow_demo\n        EXISTS=$?\n     else\n        /srv/hops/hadoop/bin/hdfs dfs -test -f /user/hdfs/tensorflow_demo\n        EXISTS=$?\n     fi\n     if ([ $EXISTS -ne 0 ] || [ false ]) ; then\n        /srv/hops/hadoop/bin/hdfs dfs -copyFromLocal /tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo /user/hdfs/tensorflow_demo\n        /srv/hops/hadoop/bin/hdfs dfs -chown hdfs /user/hdfs/tensorflow_demo\n        /srv/hops/hadoop/bin/hdfs dfs -chgrp hadoop /user/hdfs/tensorflow_demo\n        if [ \"1755\" != \"\" ] ; then\n           /srv/hops/hadoop/bin/hadoop fs -chmod 1755 /user/hdfs/tensorflow_demo\n        fi\n     fi\n"
          domain nil
          group "hadoop"
        end

        System Info:
        ------------
        chef_version=14.10.9
        platform=ubuntu
        platform_version=18.04
        ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
        program_name=/usr/bin/chef-solo
        executable=/opt/chefdk/bin/chef-solo

      ================================================================================
      Error executing action `put_as_superuser` on resource 'hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo]'
      ================================================================================

      Mixlib::ShellOut::CommandTimeout
      --------------------------------
      bash[hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::CommandTimeout: Command timed out after 3600s:
      Command exceeded allowed execution time, process terminated
      ---- Begin output of "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" ----
      STDOUT: 
      STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
      copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
      copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
      copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
      copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
      copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
      copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
      copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
      ---- End output of "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" ----
      Ran "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" returned 

      Resource Declaration:
      ---------------------
      # In /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb

      143:   hops_hdfs_directory "#{new_resource.name}" do
      144:     owner "#{new_resource.owner}"
      145:     group "#{new_resource.group}"
      146:     mode "#{new_resource.mode}"
      147:     dest "#{new_resource.dest}"
      148:     action :put_as_superuser
      149:   end
      150: 

      Compiled Resource:
      ------------------
      # Declared in /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb:143:in `block in class_from_file'

      hops_hdfs_directory("/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo") do
        action [:put_as_superuser]
        default_guard_interpreter :default
        declared_type :hops_hdfs_directory
        cookbook_name "tensorflow"
        owner "hdfs"
        group "hadoop"
        mode "1755"
        dest "/user/hdfs/tensorflow_demo"
      end

      System Info:
      ------------
      chef_version=14.10.9
      platform=ubuntu
      platform_version=18.04
      ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
      program_name=/usr/bin/chef-solo
      executable=/opt/chefdk/bin/chef-solo

    ================================================================================
    Error executing action `replace_as_superuser` on resource 'hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo]'
    ================================================================================

    Mixlib::ShellOut::CommandTimeout
    --------------------------------
    hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 143) had an error: Mixlib::ShellOut::CommandTimeout: bash[hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::CommandTimeout: Command timed out after 3600s:
    Command exceeded allowed execution time, process terminated
    ---- Begin output of "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" ----
    STDOUT: 
    STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    ---- End output of "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" ----
    Ran "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" returned 

    Resource Declaration:
    ---------------------
    # In /tmp/chef-solo/cookbooks/tensorflow/recipes/default.rb

     37:   hops_hdfs_directory "#{Chef::Config['file_cache_path']}/#{node['tensorflow']['hopstfdemo_dir']}-#{node['tensorflow']['examples_version']}/#{node['tensorflow']['hopstfdemo_dir']}" do
     38:     action :replace_as_superuser
     39:     owner node['hops']['hdfs']['user']
     40:     group node['hops']['group']
     41:     mode "1755"
     42:     dest "/user/#{node['hops']['hdfs']['user']}/#{node['tensorflow']['hopstfdemo_dir']}"
     43:   end
     44: 

    Compiled Resource:
    ------------------
    # Declared in /tmp/chef-solo/cookbooks/tensorflow/recipes/default.rb:37:in `from_file'

    hops_hdfs_directory("/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo") do
      action [:replace_as_superuser]
      updated true
      updated_by_last_action true
      default_guard_interpreter :default
      declared_type :hops_hdfs_directory
      cookbook_name "tensorflow"
      recipe_name "default"
      owner "hdfs"
      group "hadoop"
      mode "1755"
      dest "/user/hdfs/tensorflow_demo"
    end

    System Info:
    ------------
    chef_version=14.10.9
    platform=ubuntu
    platform_version=18.04
    ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
    program_name=/usr/bin/chef-solo
    executable=/opt/chefdk/bin/chef-solo

Running handlers:
[2020-10-16T15:19:14+00:00] ERROR: Running exception handlers
Running handlers complete
[2020-10-16T15:19:14+00:00] ERROR: Exception handlers complete
Chef Client failed. 3 resources updated in 01 hours 01 minutes 03 seconds
[2020-10-16T15:19:14+00:00] FATAL: Stacktrace dumped to /tmp/chef-solo/chef-stacktrace.out
[2020-10-16T15:19:14+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-10-16T15:19:14+00:00] FATAL: Mixlib::ShellOut::CommandTimeout: hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (tensorflow::default line 37) had an error: Mixlib::ShellOut::CommandTimeout: hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 143) had an error: Mixlib::ShellOut::CommandTimeout: bash[hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::CommandTimeout: Command timed out after 3600s:
Command exceeded allowed execution time, process terminated
---- Begin output of "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" ----
STDOUT: 
STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
---- End output of "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" ----
Ran "bash"  "/tmp/chef-script20201016-19082-1c0a6xc" returned 

These logs are from after I attempted a “Retry” on the tensorflow item through the Karamel web UI, but the errors look essentially the same as what I saw in the logs from the initial install attempt.

Hi @nathanb. These logs suggest that something has gone wrong with the installation of the filesystem. Could you please check the logs in /srv/hops/hadoop/logs? Since you’re doing a single-node installation, both the namenode and datanode logs should exist there in the form hadoop-hdfs-{namenode,datanode}-HOSTNAME.log. Is there anything worrying in these logs?
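
For example, something along these lines should surface the most recent problems (the exact filenames depend on your hostname, so adjust as needed):

  cd /srv/hops/hadoop/logs
  tail -n 100 hadoop-hdfs-namenode-$(hostname).log
  tail -n 100 hadoop-hdfs-datanode-$(hostname).log
  grep -iE 'error|exception' hadoop-hdfs-*.log | tail -n 50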

Kind regards,
Antonios

It looks like hadoop-hdfs-namenode-fstore.log has already been logrotated out since the day of the install. It’s been repeating the following:

2020-10-16 14:31:19,955 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in whirlingLikeASufi
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(192.168.100.250:50010, datanodeUuid=c997c317-a0c4-4932-a835-ac73a8cced42, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-50;cid=CID-23887b42-d391-4421-92d1-a2c1b639f6eb;nsid=911;c=1602792561278) is attempting to report storage ID c997c317-a0c4-4932-a835-ac73a8cced42. Node 127.0.0.1:50010 is expected to serve this storage.
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:511)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4300)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5521)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1073)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:240)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:35613)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:868)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:814)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1821)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2900)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1523)
        at org.apache.hadoop.ipc.Client.call(Client.java:1469)
        at org.apache.hadoop.ipc.Client.call(Client.java:1379)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy18.blockReceivedAndDeleted(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:267)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReceivedAndDeleted(BPServiceActor.java:619)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService$2.doAction(BPOfferService.java:1320)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.doActorActionWithRetry(BPOfferService.java:1373)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.blockReceivedAndDeletedWithRetry(BPOfferService.java:1317)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.reportReceivedDeletedBlocks(BPOfferService.java:878)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.whirlingLikeASufi(BPOfferService.java:805)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.run(BPOfferService.java:1284)
        at java.lang.Thread.run(Thread.java:748)
2020-10-16 14:31:20,956 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sending blockReceivedAndDeletedWithRetry for blocks [ 10000blk_10000_1001, status: RECEIVED_BLOCK, delHint:  10001blk_10001_1001, status: RECEIVED_BLOCK, delHint:  10002blk_10002_1001, status: RECEIVED_BLOCK, delHint:  10003blk_10003_1001, status: RECEIVED_BLOCK, delHint:  10004blk_10004_1001, status: RECEIVED_BLOCK, delHint:  10005blk_10005_1001, status: RECEIVED_BLOCK, delHint:  10006blk_10006_1001, status: RECEIVED_BLOCK, delHint:  10007blk_10007_1001, status: RECEIVED_BLOCK, delHint:  10008blk_10008_1001, status: RECEIVED_BLOCK, delHint:  10009blk_10009_1001, status: RECEIVED_BLOCK, delHint:  10010blk_10010_1001, status: RECEIVED_BLOCK, delHint:  10011blk_10011_1001, status: RECEIVED_BLOCK, delHint:  10012blk_10012_1001, status: RECEIVED_BLOCK, delHint:  10013blk_10013_1001, status: RECEIVED_BLOCK, delHint:  10014blk_10014_1001, status: RECEIVED_BLOCK, delHint:  10015blk_10015_1001, status: RECEIVED_BLOCK, delHint:  10016blk_10016_1001, status: RECEIVED_BLOCK, delHint:  10017blk_10017_1001, status: RECEIVED_BLOCK, delHint:  10018blk_10018_1001, status: RECEIVED_BLOCK, delHint:  10019blk_10019_1001, status: RECEIVED_BLOCK, delHint:  10020blk_10020_1001, status: RECEIVED_BLOCK, delHint:  10021blk_10021_1001, status: RECEIVED_BLOCK, delHint:  10022blk_10022_1001, status: RECEIVED_BLOCK, delHint:  10023blk_10023_1001, status: RECEIVED_BLOCK, delHint:  10024blk_10024_1001, status: RECEIVED_BLOCK, delHint:  10025blk_10025_1001, status: RECEIVED_BLOCK, delHint:  10026blk_10026_1001, status: RECEIVED_BLOCK, delHint:  10027blk_10027_1001, status: RECEIVED_BLOCK, delHint:  10028blk_10028_1001, status: RECEIVED_BLOCK, delHint: ]

hadoop-hdfs-namenode-fstore.log.2:

STARTUP_MSG:   build = git@github.com:hopshadoop/hops.git -r 5e3672f34a246afc247b6b89176465874b9dd48e; compiled by 'jenkins' on 2020-10-02T09:51Z
STARTUP_MSG:   java = 1.8.0_265
************************************************************/
2020-10-15 21:09:34,407 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2020-10-15 21:09:34,467 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
2020-10-15 21:09:34,531 WARN org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-namenode.properties,hadoop-metrics2.properties
2020-10-15 21:09:34,558 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-10-15 21:09:34,558 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2020-10-15 21:09:34,656 INFO io.hops.resolvingcache.Cache: starting Resolving Cache [InMemoryCache]
2020-10-15 21:09:34,688 INFO io.hops.metadata.ndb.ClusterjConnector: Database connect string: 192.168.100.250:1186 
2020-10-15 21:09:34,688 INFO io.hops.metadata.ndb.ClusterjConnector: Database name: hops
2020-10-15 21:09:34,688 INFO io.hops.metadata.ndb.ClusterjConnector: Max Transactions: 1024
2020-10-15 21:09:34,688 INFO io.hops.metadata.ndb.DBSessionProvider: Database connect string: 192.168.100.250:1186 
2020-10-15 21:09:34,688 INFO io.hops.metadata.ndb.DBSessionProvider: Database name: hops
2020-10-15 21:09:34,689 INFO io.hops.metadata.ndb.DBSessionProvider: Max Transactions: 1024
2020-10-15 21:09:35,789 INFO io.hops.security.UsersGroups: UsersGroups Initialized.
2020-10-15 21:09:35,907 INFO org.apache.hadoop.hdfs.DFSUtil: Starting Web-server for hdfs at: http://0.0.0.0:50070
2020-10-15 21:09:35,923 INFO org.eclipse.jetty.util.log: Logging initialized @2235ms
2020-10-15 21:09:36,006 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2020-10-15 21:09:36,008 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.namenode is not defined
2020-10-15 21:09:36,015 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2020-10-15 21:09:36,016 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
2020-10-15 21:09:36,016 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2020-10-15 21:09:36,016 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2020-10-15 21:09:36,034 INFO org.apache.hadoop.http.HttpServer2: Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
2020-10-15 21:09:36,035 INFO org.apache.hadoop.http.HttpServer2: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
2020-10-15 21:09:36,037 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 50070
2020-10-15 21:09:36,038 INFO org.eclipse.jetty.server.Server: jetty-9.3.24.v20180605, build timestamp: 2018-06-05T17:11:56Z, git hash: 84205aa28f11a4f31f2a3b86d1bba2cc8ab69827
2020-10-15 21:09:36,060 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@482d776b{/logs,file:///srv/hops/hadoop-3.2.0.0-RC4/logs/,AVAILABLE}
2020-10-15 21:09:36,060 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@132ddbab{/static,file:///srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/hdfs/webapps/static/,AVAILABLE}
2020-10-15 21:09:36,139 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.w.WebAppContext@47289387{/,file:///srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/hdfs/webapps/hdfs/,AVAILABLE}{/hdfs}
2020-10-15 21:09:36,143 INFO org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@7caa550{HTTP/1.1,[http/1.1]}{0.0.0.0:50070}
2020-10-15 21:09:36,143 INFO org.eclipse.jetty.server.Server: Started @2456ms
2020-10-15 21:09:36,161 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: No KeyProvider found.
2020-10-15 21:09:36,199 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
2020-10-15 21:09:36,199 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2020-10-15 21:09:36,201 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2020-10-15 21:09:36,201 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: The block deletion will start around 2020 Oct 15 21:09:36
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: dfs.block.access.token.enable=false
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: defaultReplication         = 3
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplication             = 512
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: minReplication             = 1
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplicationStreams      = 50
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: replicationRecheckInterval = 3000
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: encryptDataTransfer        = false
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: slicerBatchSize            = 500
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: misReplicatedNoOfBatchs    = 20
2020-10-15 21:09:36,206 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: slicerNbOfBatchs           = 20
2020-10-15 21:09:36,212 INFO com.zaxxer.hikari.HikariDataSource: HikariCP pool HikariPool-0 is starting.
2020-10-15 21:09:36,461 WARN io.hops.common.IDsGeneratorFactory: Called setConfiguration more than once.
2020-10-15 21:09:36,465 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner             = hdfs (auth:SIMPLE)
2020-10-15 21:09:36,465 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: superGroup          = hdfs
2020-10-15 21:09:36,465 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled = true
2020-10-15 21:09:36,466 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: true
2020-10-15 21:09:36,519 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: ACLs enabled? true
2020-10-15 21:09:36,519 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: XAttrs enabled? true
2020-10-15 21:09:36,520 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Maximum size of an xattr: 1039755
2020-10-15 21:09:36,520 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: The maximum number of xattrs per inode is set to 32
2020-10-15 21:09:36,520 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2020-10-15 21:09:36,526 INFO org.apache.hadoop.hdfs.server.namenode.top.metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2020-10-15 21:09:36,526 INFO org.apache.hadoop.hdfs.server.namenode.top.metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2020-10-15 21:09:36,526 INFO org.apache.hadoop.hdfs.server.namenode.top.metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2020-10-15 21:09:36,532 INFO org.apache.hadoop.hdfs.server.namenode.NameCache: initialized with 0 entries 0 lookups
2020-10-15 21:09:36,641 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: RPC server is binding to 0.0.0.0:8020
2020-10-15 21:09:36,647 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 12000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2020-10-15 21:09:36,657 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8020
2020-10-15 21:09:36,657 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #2 for port 8020
2020-10-15 21:09:36,657 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #3 for port 8020
2020-10-15 21:09:36,794 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2020-10-15 21:09:36,810 INFO io.hops.leaderElection.LETransaction: LE Status: id 2 I am a NON_LEADER process 
2020-10-15 21:09:38,825 INFO io.hops.leaderElection.LETransaction: LE Status: id 2 I can be the leader but I have weak locks. Retry with stronger lock
2020-10-15 21:09:38,825 INFO io.hops.leaderElection.LETransaction: LE Status: id 2 periodic update. Stronger locks requested in next round
2020-10-15 21:09:38,827 INFO io.hops.leaderElection.LETransaction: LE Status: id 2 I am the new LEADER. 
2020-10-15 21:09:38,912 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemState MBean
2020-10-15 21:09:39,938 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: cealring the safe blocks tabl, this may take some time.
2020-10-15 21:09:39,945 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2020-10-15 21:09:39,945 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
2020-10-15 21:09:39,945 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
2020-10-15 21:09:39,953 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Number of blocks under construction: 0
2020-10-15 21:09:39,960 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 3 secs
2020-10-15 21:09:39,962 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2020-10-15 21:09:39,963 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2020-10-15 21:09:39,964 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: cealring the safe blocks tabl, this may take some time.
2020-10-15 21:09:39,970 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
2020-10-15 21:09:39,996 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2020-10-15 21:09:39,996 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8020: starting
2020-10-15 21:09:40,005 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Leader Node RPC up at: fstore/127.0.1.1:8020
2020-10-15 21:09:40,144 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
2020-10-15 21:09:40,144 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest edits from old active before taking over writer role in edits logs
2020-10-15 21:09:40,144 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all datandoes as stale
2020-10-15 21:09:40,144 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication and invalidation queues
2020-10-15 21:09:40,144 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: initializing replication queues
2020-10-15 21:09:40,155 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
2020-10-15 21:09:40,166 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: processMisReplicated read  0/10000 in the Ids range [0 - 10000] (max inodeId when the process started: 7)
2020-10-15 21:09:40,216 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of blocks            = 0
2020-10-15 21:09:40,216 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid blocks          = 0
2020-10-15 21:09:40,216 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of under-replicated blocks = 0
2020-10-15 21:09:40,216 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of  over-replicated blocks = 0
2020-10-15 21:09:40,216 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks being written    = 0
2020-10-15 21:09:40,216 INFO org.apache.hadoop.hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 68 msec
2020-10-15 21:09:40,708 INFO org.apache.hadoop.fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 60 minutes.
2020-10-15 21:10:14,089 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(192.168.100.250:50010, datanodeUuid=c997c317-a0c4-4932-a835-ac73a8cced42, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-50;cid=CID-23887b42-d391-4421-92d1-a2c1b639f6eb;nsid=911;c=1602792561278) storage c997c317-a0c4-4932-a835-ac73a8cced42
2020-10-15 21:10:14,090 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
2020-10-15 21:10:14,091 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.100.250:50010
2020-10-15 21:10:14,144 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(127.0.0.1:50010, datanodeUuid=c997c317-a0c4-4932-a835-ac73a8cced42, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-50;cid=CID-23887b42-d391-4421-92d1-a2c1b639f6eb;nsid=911;c=1602792561278) storage c997c317-a0c4-4932-a835-ac73a8cced42
2020-10-15 21:10:14,145 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode: 192.168.100.250:50010 is replaced by DatanodeRegistration(127.0.0.1:50010, datanodeUuid=c997c317-a0c4-4932-a835-ac73a8cced42, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-50;cid=CID-23887b42-d391-4421-92d1-a2c1b639f6eb;nsid=911;c=1602792561278) with the same storageID c997c317-a0c4-4932-a835-ac73a8cced42
2020-10-15 21:10:14,145 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.100.250:50010
2020-10-15 21:10:14,146 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010
2020-10-15 21:10:14,175 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(192.168.100.250:50010, datanodeUuid=c997c317-a0c4-4932-a835-ac73a8cced42, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-50;cid=CID-23887b42-d391-4421-92d1-a2c1b639f6eb;nsid=911;c=1602792561278) is attempting to report storage ID c997c317-a0c4-4932-a835-ac73a8cced42. Node 127.0.0.1:50010 is expected to serve this storage.
2020-10-15 21:10:14,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 20 on 8020, call Call#5 Retry#0 org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.getNextNamenodeToSendBlockReport from 127.0.0.1:37374
org.apache.hadoop.hdfs.protocol.UnregisteredNodeException: Data node DatanodeRegistration(192.168.100.250:50010, datanodeUuid=c997c317-a0c4-4932-a835-ac73a8cced42, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-50;cid=CID-23887b42-d391-4421-92d1-a2c1b639f6eb;nsid=911;c=1602792561278) is attempting to report storage ID c997c317-a0c4-4932-a835-ac73a8cced42. Node 127.0.0.1:50010 is expected to serve this storage.
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:511)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getNextNamenodeToSendBlockReport(NameNode.java:1337)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getNextNamenodeToSendBlockReport(NameNodeRpcServer.java:1256)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.getNextNamenodeToSendBlockReport(DatanodeProtocolServerSideTranslatorPB.java:332)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:35625)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:868)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:814)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1821)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2900)
2020-10-15 21:10:14,197 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0

That log begins repeating the “expected to serve this storage” exception.

Should I purge and start over, or is it useful to troubleshoot from this point? From reading other messages, it sounds like hitting Retry in Karamel isn’t advised.
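
In case it’s relevant: the registrations above flip between 192.168.100.250:50010 and 127.0.0.1:50010 for the same storage ID, and the namenode came up at fstore/127.0.1.1:8020, so maybe the host is resolving its own name to a loopback address? I’m guessing the checks would be something like the following; happy to post the output if that helps:

  hostname -f
  getent hosts $(hostname -f)
  cat /etc/hosts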

It is worth trying the following.

  1. sudo systemctl stop datanode
  2. sudo rm -rf /srv/hops/hopsdata/hdfs/dn/disk/*

Then retry the recipe from Karamel.
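
Before retrying, it may be worth double-checking that the datanode comes back and registers cleanly. Something along these lines should do it (assuming the default /srv/hops layout and the datanode systemd unit used above, and sourcing the same environment the recipe uses):

  sudo systemctl start datanode
  systemctl status datanode
  sudo -u hdfs bash -c '. /srv/hops/hadoop/sbin/set-env.sh && /srv/hops/hadoop/bin/hdfs dfsadmin -report'

The report should show exactly one live datanode; if it still shows none, the copyFromLocal step will keep failing the same way.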

The install had died and it wasn’t clear how to restart it from where it left off, so I wiped the system and loaded a fresh Ubuntu 18.04. This time the install appears to fail in hadoop_spark::yarn.

hadoop_spark__yarn.log:

[sudo] password for nbowyer: Starting Chef Client, version 14.10.9
resolving cookbooks for run list: ["hadoop_spark::yarn"]
Synchronizing Cookbooks:
  - hadoop_spark (1.4.0)
  - java (7.0.0)
  - magic_shell (1.0.0)
  - conda (1.4.0)
  - kagent (1.4.0)
  - ndb (1.4.0)
  - hops (1.4.0)
  - hive2 (1.4.0)
  - hopsmonitor (1.4.0)
  - homebrew (5.0.8)
  - windows (7.0.2)
  - ulimit (1.4.0)
  - openssl (4.4.0)
  - hostsfile (2.4.6)
  - ntp (2.0.3)
  - sudo (4.0.1)
  - consul (1.4.0)
  - sysctl (1.0.5)
  - cmake (0.3.0)
  - kzookeeper (1.4.0)
  - elastic (1.4.0)
  - compat_resource (12.19.1)
  - authbind (0.1.10)
  - tensorflow (1.4.0)
  - hops_airflow (1.4.0)
  - chef-sugar (5.1.8)
  - ohai (5.3.0)
  - ulimit2 (0.2.0)
  - elasticsearch (4.0.6)
  - build-essential (8.2.1)
  - zip (1.1.0)
  - apt (7.2.0)
  - poise-python (1.7.0)
  - yum (5.1.0)
  - ark (5.0.0)
  - seven_zip (3.1.2)
  - mingw (2.1.1)
  - poise (2.8.2)
  - poise-languages (2.1.2)
  - poise-archive (1.5.0)
Installing Cookbook Gems:
Compiling Cookbooks...
Converging 26 resources
Recipe: hadoop_spark::yarn
  * directory[/srv/hops/spark-2.4.3.2-bin-without-hadoop-with-hive-with-r/logs] action create (up to date)
  * template[/srv/hops/spark/conf/metrics.properties] action create (up to date)
  * template[/srv/hops/spark-2.4.3.2-bin-without-hadoop-with-hive-with-r/conf/hive-site.xml] action create (up to date)
  * template[/srv/hops/spark-2.4.3.2-bin-without-hadoop-with-hive-with-r/conf/spark-defaults.conf] action create (up to date)
  * hops_hdfs_directory[/user] action create_as_superuser
    * bash[mk-dir-/user] action run (skipped due to not_if)
     (up to date)
  * hops_hdfs_directory[/user/spark] action create_as_superuser
    * bash[mk-dir-/user/spark] action run (skipped due to not_if)
     (up to date)
  * hops_hdfs_directory[/user/spark/eventlog] action create_as_superuser
    * bash[mk-dir-/user/spark/eventlog] action run (skipped due to not_if)
     (up to date)
  * hops_hdfs_directory[/user/spark/spark-warehouse] action create_as_superuser
    * bash[mk-dir-/user/spark/spark-warehouse] action run (skipped due to not_if)
     (up to date)
  * hops_hdfs_directory[/user/spark/share/lib] action create_as_superuser
    * bash[mk-dir-/user/spark/share/lib] action run (skipped due to not_if)
     (up to date)
  * remote_file[/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] action create (up to date)
  * hops_hdfs_directory[/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] action replace_as_superuser
    * hops_hdfs_directory[/user/spark/hops-verification-assembly-1.4.0.jar] action rm_as_superuser
      * bash[rm-/user/spark/hops-verification-assembly-1.4.0.jar] action run (skipped due to only_if)
       (up to date)
    * hops_hdfs_directory[/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] action put_as_superuser
      * bash[hdfs-put-dir-/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] action run

        ================================================================================
        Error executing action `run` on resource 'bash[hdfs-put-dir-/tmp/chef-solo/hops-verification-assembly-1.4.0.jar]'
        ================================================================================

        Mixlib::ShellOut::ShellCommandFailed
        ------------------------------------
        Expected process to exit with [0], but received '1'
        ---- Begin output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
        STDOUT: 
        STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
        chown: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
        chgrp: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
        chmod: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
        ---- End output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
        Ran "bash"  "/tmp/chef-script20201021-21294-13cbpl4" returned 1

        Resource Declaration:
        ---------------------
        # In /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb

         53:   bash "hdfs-put-dir-#{new_resource.name}" do
         54:     user node['hops']['hdfs']['user']
         55:     group node['hops']['group']
         56:     code <<-EOF
         57:      EXISTS=1
         58:      . #{node['hops']['base_dir']}/sbin/set-env.sh
         59:      if [ -z $ISDIR ] ; then
         60:         #{node['hops']['base_dir']}/bin/hdfs dfs -test -e #{new_resource.dest}
         61:         EXISTS=$?
         62:      else
         63:         #{node['hops']['base_dir']}/bin/hdfs dfs -test -f #{new_resource.dest}
         64:         EXISTS=$?
         65:      fi
         66:      if ([ $EXISTS -ne 0 ] || [ #{new_resource.isDir} ]) ; then
         67:         #{node['hops']['base_dir']}/bin/hdfs dfs -copyFromLocal #{new_resource.name} #{new_resource.dest}
         68:         #{node['hops']['base_dir']}/bin/hdfs dfs -chown #{new_resource.owner} #{new_resource.dest}
         69:         #{node['hops']['base_dir']}/bin/hdfs dfs -chgrp #{new_resource.group} #{new_resource.dest}
         70:         if [ "#{new_resource.mode}" != "" ] ; then
         71:            #{node['hops']['base_dir']}/bin/hadoop fs -chmod #{new_resource.mode} #{new_resource.dest}
         72:         fi
         73:      fi
         74:     EOF
         75:   end
         76: end

        Compiled Resource:
        ------------------
        # Declared in /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb:53:in `block in class_from_file'

        bash("hdfs-put-dir-/tmp/chef-solo/hops-verification-assembly-1.4.0.jar") do
          action [:run]
          default_guard_interpreter :default
          command nil
          backup 5
          interpreter "bash"
          declared_type :bash
          cookbook_name "hadoop_spark"
          user "hdfs"
          group "hadoop"
          code "     EXISTS=1\n     . /srv/hops/hadoop/sbin/set-env.sh\n     if [ -z $ISDIR ] ; then\n        /srv/hops/hadoop/bin/hdfs dfs -test -e /user/spark/hops-verification-assembly-1.4.0.jar\n        EXISTS=$?\n     else\n        /srv/hops/hadoop/bin/hdfs dfs -test -f /user/spark/hops-verification-assembly-1.4.0.jar\n        EXISTS=$?\n     fi\n     if ([ $EXISTS -ne 0 ] || [ false ]) ; then\n        /srv/hops/hadoop/bin/hdfs dfs -copyFromLocal /tmp/chef-solo/hops-verification-assembly-1.4.0.jar /user/spark/hops-verification-assembly-1.4.0.jar\n        /srv/hops/hadoop/bin/hdfs dfs -chown spark /user/spark/hops-verification-assembly-1.4.0.jar\n        /srv/hops/hadoop/bin/hdfs dfs -chgrp hadoop /user/spark/hops-verification-assembly-1.4.0.jar\n        if [ \"1755\" != \"\" ] ; then\n           /srv/hops/hadoop/bin/hadoop fs -chmod 1755 /user/spark/hops-verification-assembly-1.4.0.jar\n        fi\n     fi\n"
          domain nil
        end

        System Info:
        ------------
        chef_version=14.10.9
        platform=ubuntu
        platform_version=18.04
        ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
        program_name=/usr/bin/chef-solo
        executable=/opt/chefdk/bin/chef-solo

      ================================================================================
      Error executing action `put_as_superuser` on resource 'hops_hdfs_directory[/tmp/chef-solo/hops-verification-assembly-1.4.0.jar]'
      ================================================================================

      Mixlib::ShellOut::ShellCommandFailed
      ------------------------------------
      bash[hdfs-put-dir-/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
      ---- Begin output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
      STDOUT: 
      STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
      chown: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
      chgrp: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
      chmod: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
      ---- End output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
      Ran "bash"  "/tmp/chef-script20201021-21294-13cbpl4" returned 1

      Resource Declaration:
      ---------------------
      # In /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb

      143:   hops_hdfs_directory "#{new_resource.name}" do
      144:     owner "#{new_resource.owner}"
      145:     group "#{new_resource.group}"
      146:     mode "#{new_resource.mode}"
      147:     dest "#{new_resource.dest}"
      148:     action :put_as_superuser
      149:   end
      150: 

      Compiled Resource:
      ------------------
      # Declared in /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb:143:in `block in class_from_file'

      hops_hdfs_directory("/tmp/chef-solo/hops-verification-assembly-1.4.0.jar") do
        action [:put_as_superuser]
        default_guard_interpreter :default
        declared_type :hops_hdfs_directory
        cookbook_name "hadoop_spark"
        owner "spark"
        group "hadoop"
        mode "1755"
        dest "/user/spark/hops-verification-assembly-1.4.0.jar"
      end

      System Info:
      ------------
      chef_version=14.10.9
      platform=ubuntu
      platform_version=18.04
      ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
      program_name=/usr/bin/chef-solo
      executable=/opt/chefdk/bin/chef-solo

    ================================================================================
    Error executing action `replace_as_superuser` on resource 'hops_hdfs_directory[/tmp/chef-solo/hops-verification-assembly-1.4.0.jar]'
    ================================================================================

    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    hops_hdfs_directory[/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 143) had an error: Mixlib::ShellOut::ShellCommandFailed: bash[hdfs-put-dir-/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
    ---- Begin output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
    STDOUT: 
    STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    chown: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
    chgrp: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
    chmod: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
    ---- End output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
    Ran "bash"  "/tmp/chef-script20201021-21294-13cbpl4" returned 1

    Resource Declaration:
    ---------------------
    # In /tmp/chef-solo/cookbooks/hadoop_spark/recipes/yarn.rb

    110:   hops_hdfs_directory "#{Chef::Config['file_cache_path']}/#{hopsVerification}" do
    111:     owner node['hadoop_spark']['user']
    112:     group node['hops']['group']
    113:     mode "1755"
    114:     dest "/user/#{node['hadoop_spark']['user']}/#{hopsVerification}"
    115:     action :replace_as_superuser
    116:   end
    117: 

    Compiled Resource:
    ------------------
    # Declared in /tmp/chef-solo/cookbooks/hadoop_spark/recipes/yarn.rb:110:in `from_file'

    hops_hdfs_directory("/tmp/chef-solo/hops-verification-assembly-1.4.0.jar") do
      action [:replace_as_superuser]
      default_guard_interpreter :default
      declared_type :hops_hdfs_directory
      cookbook_name "hadoop_spark"
      recipe_name "yarn"
      owner "spark"
      group "hadoop"
      mode "1755"
      dest "/user/spark/hops-verification-assembly-1.4.0.jar"
    end

    System Info:
    ------------
    chef_version=14.10.9
    platform=ubuntu
    platform_version=18.04
    ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
    program_name=/usr/bin/chef-solo
    executable=/opt/chefdk/bin/chef-solo

Running handlers:
[2020-10-21T23:58:07+00:00] ERROR: Running exception handlers
Running handlers complete
[2020-10-21T23:58:07+00:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 07 minutes 24 seconds
[2020-10-21T23:58:07+00:00] FATAL: Stacktrace dumped to /tmp/chef-solo/chef-stacktrace.out
[2020-10-21T23:58:07+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-10-21T23:58:07+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: hops_hdfs_directory[/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] (hadoop_spark::yarn line 110) had an error: Mixlib::ShellOut::ShellCommandFailed: hops_hdfs_directory[/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 143) had an error: Mixlib::ShellOut::ShellCommandFailed: bash[hdfs-put-dir-/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
STDOUT: 
STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
chown: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
chgrp: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
chmod: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
---- End output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
Ran "bash"  "/tmp/chef-script20201021-21294-13cbpl4" returned 1

chef-stacktrace.out:

Generated at 2020-10-21 23:58:07 +0000
Mixlib::ShellOut::ShellCommandFailed: hops_hdfs_directory[/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] (hadoop_spark::yarn line 110) had an error: Mixlib::ShellOut::ShellCommandFailed: hops_hdfs_directory[/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 143) had an error: Mixlib::ShellOut::ShellCommandFailed: bash[hdfs-put-dir-/tmp/chef-solo/hops-verification-assembly-1.4.0.jar] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
STDOUT: 
STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
chown: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
chgrp: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
chmod: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
---- End output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
Ran "bash"  "/tmp/chef-script20201021-21294-13cbpl4" returned 1
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:297:in `invalid!'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:284:in `error!'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/mixin/shell_out.rb:202:in `shell_out_compacted!'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/mixin/shell_out.rb:124:in `shell_out!'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider/execute.rb:58:in `block in action_run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/mixin/why_run.rb:51:in `add_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:227:in `converge_by'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider/execute.rb:56:in `action_run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider/script.rb:64:in `action_run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:182:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource.rb:578:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:70:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block (2 levels) in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `each'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:97:in `converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:237:in `compile_and_converge_action'
(eval):2:in `action_put_as_superuser'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:182:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource.rb:578:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:70:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block (2 levels) in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `each'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:97:in `converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:237:in `compile_and_converge_action'
(eval):2:in `action_replace_as_superuser'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:182:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource.rb:578:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:70:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block (2 levels) in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `each'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:97:in `converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/client.rb:720:in `block in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/client.rb:715:in `catch'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/client.rb:715:in `converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/client.rb:754:in `converge_and_save'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/client.rb:286:in `run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application.rb:303:in `run_with_graceful_exit_option'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application.rb:279:in `block in run_chef_client'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/local_mode.rb:44:in `with_server_connectivity'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application.rb:261:in `run_chef_client'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application/client.rb:444:in `run_application'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application.rb:66:in `run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application/solo.rb:224:in `run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/bin/chef-solo:24:in `<top (required)>'
/usr/bin/chef-solo:306:in `load'
/usr/bin/chef-solo:306:in `<main>'

>>>> Caused by Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
STDOUT: 
STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
chown: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
chgrp: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
chmod: `/user/spark/hops-verification-assembly-1.4.0.jar': No such file or directory
---- End output of "bash"  "/tmp/chef-script20201021-21294-13cbpl4" ----
Ran "bash"  "/tmp/chef-script20201021-21294-13cbpl4" returned 1
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:297:in `invalid!'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:284:in `error!'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/mixin/shell_out.rb:202:in `shell_out_compacted!'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/mixin/shell_out.rb:124:in `shell_out!'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider/execute.rb:58:in `block in action_run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/mixin/why_run.rb:51:in `add_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:227:in `converge_by'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider/execute.rb:56:in `action_run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider/script.rb:64:in `action_run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:182:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource.rb:578:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:70:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block (2 levels) in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `each'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:97:in `converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:237:in `compile_and_converge_action'
(eval):2:in `action_put_as_superuser'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:182:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource.rb:578:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:70:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block (2 levels) in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `each'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:97:in `converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:237:in `compile_and_converge_action'
(eval):2:in `action_replace_as_superuser'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/provider.rb:182:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource.rb:578:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:70:in `run_action'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block (2 levels) in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `each'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:98:in `block in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/runner.rb:97:in `converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/client.rb:720:in `block in converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/client.rb:715:in `catch'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/client.rb:715:in `converge'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/client.rb:754:in `converge_and_save'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/client.rb:286:in `run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application.rb:303:in `run_with_graceful_exit_option'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application.rb:279:in `block in run_chef_client'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/local_mode.rb:44:in `with_server_connectivity'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application.rb:261:in `run_chef_client'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application/client.rb:444:in `run_application'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application.rb:66:in `run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/application/solo.rb:224:in `run'
/opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/bin/chef-solo:24:in `<top (required)>'
/usr/bin/chef-solo:306:in `load'
/usr/bin/chef-solo:306:in `<main>'

I’m afraid you have a problem with the NameNode again. Can you check its logs?

What are your machine’s specifications (RAM and CPU)?

I will try to reproduce your problem.
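
(For reference: on a single-host install the HDFS daemon logs appear to live under the Hadoop log directory shown in the pasted datanode output, /srv/hops/hadoop-3.2.0.0-RC4/logs, and the Chef scripts use /srv/hops/hadoop as the base dir. A rough sketch of the checks, with log file names assumed to follow the usual hadoop-&lt;user&gt;-&lt;daemon&gt;-&lt;host&gt;.log pattern:)

# List and tail the HDFS daemon logs (paths taken from the logs above)
ls /srv/hops/hadoop-3.2.0.0-RC4/logs/
tail -n 100 /srv/hops/hadoop-3.2.0.0-RC4/logs/*namenode*.log
tail -n 100 /srv/hops/hadoop-3.2.0.0-RC4/logs/*datanode*.log

# "not enough replicas" usually means the NameNode sees no usable datanodes;
# this mirrors how the Chef script invokes hdfs (as the hdfs user, after sourcing set-env.sh)
sudo -u hdfs bash -c '. /srv/hops/hadoop/sbin/set-env.sh && /srv/hops/hadoop/bin/hdfs dfsadmin -report'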

I ran the installer script on my own machine and it finished correctly. Could you make sure you give enough resources to your VM?
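
(A quick way to confirm what resources the machine actually has available, for example:)

nproc      # CPU cores visible to the OS
free -h    # RAM and swap
df -h /    # free space on the root filesystem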

This is a bare-metal install; I’m not currently using a VM.

Intel® Xeon® CPU E3-1230 V2 @ 3.30GHz
32 GB RAM with an 8 GB swap partition
250 GB HDD

The relevant bits of the datanode and namenode logs seem to be:

datanode:

2020-10-21 23:42:23,637 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = fstore/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 3.2.0.0-RC4
<TRIMMED PATH FOR MESSAGE LENGTH>
STARTUP_MSG:   build = git@github.com:hopshadoop/hops.git -r 5e3672f34a246afc247b6b89176465874b9dd48e; compiled by 'jenkins' on 2020-10-02T09:51Z
STARTUP_MSG:   java = 1.8.0_265
************************************************************/
2020-10-21 23:42:23,643 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2020-10-21 23:42:24,034 WARN org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-datanode.properties,hadoop-metrics2.properties
2020-10-21 23:42:24,050 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-10-21 23:42:24,051 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2020-10-21 23:42:24,055 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Disabled block scanner.
2020-10-21 23:42:24,056 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is fstore
2020-10-21 23:42:24,060 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 0
2020-10-21 23:42:24,077 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
2020-10-21 23:42:24,078 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2020-10-21 23:42:24,079 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
2020-10-21 23:42:24,110 INFO org.eclipse.jetty.util.log: Logging initialized @1236ms
2020-10-21 23:42:24,191 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2020-10-21 23:42:24,194 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined
2020-10-21 23:42:24,202 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2020-10-21 23:42:24,204 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2020-10-21 23:42:24,204 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2020-10-21 23:42:24,204 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2020-10-21 23:42:24,222 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 39311
2020-10-21 23:42:24,223 INFO org.eclipse.jetty.server.Server: jetty-9.3.24.v20180605, build timestamp: 2018-06-05T17:11:56Z, git hash: 84205aa28f11a4f31f2a3b86d1bba2cc8ab69827
2020-10-21 23:42:24,279 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@1e81f160{/logs,file:///srv/hops/hadoop-3.2.0.0-RC4/logs/,AVAILABLE}
2020-10-21 23:42:24,279 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@6986852{/static,file:///srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/hdfs/webapps/static/,AVAILABLE}
2020-10-21 23:42:24,340 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.w.WebAppContext@16eccb2e{/,file:///srv/hops/hadoop-3.2.0.0-RC4/share/hadoop/hdfs/webapps/datanode/,AVAILABLE}{/datanode}
2020-10-21 23:42:24,343 INFO org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@7d446ed1{HTTP/1.1,[http/1.1]}{localhost:39311}
2020-10-21 23:42:24,343 INFO org.eclipse.jetty.server.Server: Started @1470ms
2020-10-21 23:42:24,669 INFO org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Listening HTTP traffic on /0.0.0.0:50075
2020-10-21 23:42:24,673 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2020-10-21 23:42:24,700 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hdfs
2020-10-21 23:42:24,700 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = hdfs
2020-10-21 23:42:24,740 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 1000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2020-10-21 23:42:24,757 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2020-10-21 23:42:24,757 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #2 for port 50020
2020-10-21 23:42:24,757 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #3 for port 50020
2020-10-21 23:42:24,902 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
2020-10-21 23:42:25,084 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to /192.168.100.249:8020 starting to offer service
2020-10-21 23:42:25,098 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2020-10-21 23:42:25,099 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2020-10-21 23:42:25,317 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /srv/hops/hopsdata/hdfs/dn/in_use.lock acquired by nodename 12720@fstore
2020-10-21 23:42:25,319 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /srv/hops/hopsdata/hdfs/dn is not formatted for BP-1218611357-127.0.1.1-1603319801320
2020-10-21 23:42:25,319 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2020-10-21 23:42:25,406 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-1218611357-127.0.1.1-1603319801320
2020-10-21 23:42:25,406 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /srv/hops/hopsdata/hdfs/dn/current/BP-1218611357-127.0.1.1-1603319801320
2020-10-21 23:42:25,407 INFO org.apache.hadoop.hdfs.server.common.Storage: Block pool storage directory /srv/hops/hopsdata/hdfs/dn/current/BP-1218611357-127.0.1.1-1603319801320 is not formatted for BP-1218611357-127.0.1.1-1603319801320
2020-10-21 23:42:25,407 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2020-10-21 23:42:25,407 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-1218611357-127.0.1.1-1603319801320 directory /srv/hops/hopsdata/hdfs/dn/current/BP-1218611357-127.0.1.1-1603319801320/current
2020-10-21 23:42:25,452 INFO org.apache.hadoop.hdfs.server.common.Storage: Restored 0 block files from trash.
2020-10-21 23:42:25,477 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=911;bpid=BP-1218611357-127.0.1.1-1603319801320;lv=-50;nsInfo=lv=-55;cid=CID-0dcfe208-8599-4cd4-b816-a48677cb81d3;nsid=911;c=1603319801273;bpid=BP-1218611357-127.0.1.1-1603319801320
2020-10-21 23:42:25,519 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Generated and persisted new Datanode UUID 3dce914b-0353-4f3f-bd39-3c955847cc2b
2020-10-21 23:42:25,535 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-1bceeccc-5353-4aba-8d9e-e01ceb743646
2020-10-21 23:42:25,535 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /srv/hops/hopsdata/hdfs/dn/current, StorageType: DISK
2020-10-21 23:42:25,537 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
2020-10-21 23:42:25,543 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Periodic Directory Tree Verification scan starting at 1603325950543 with interval 21600000
2020-10-21 23:42:25,543 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-1218611357-127.0.1.1-1603319801320
2020-10-21 23:42:25,543 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1218611357-127.0.1.1-1603319801320 on volume /srv/hops/hopsdata/hdfs/dn/current...
2020-10-21 23:42:25,576 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-1218611357-127.0.1.1-1603319801320 on /srv/hops/hopsdata/hdfs/dn/current: 33ms
2020-10-21 23:42:25,576 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to scan all replicas for block pool BP-1218611357-127.0.1.1-1603319801320: 33ms
2020-10-21 23:42:25,577 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-1218611357-127.0.1.1-1603319801320 on volume /srv/hops/hopsdata/hdfs/dn/current...
2020-10-21 23:42:25,577 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice: Replica Cache file: /srv/hops/hopsdata/hdfs/dn/current/BP-1218611357-127.0.1.1-1603319801320/current/replicas doesn't exist 
2020-10-21 23:42:25,577 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1218611357-127.0.1.1-1603319801320 on volume /srv/hops/hopsdata/hdfs/dn/current: 0ms
2020-10-21 23:42:25,577 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to add all replicas to map: 1ms
2020-10-21 23:42:25,630 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-1218611357-127.0.1.1-1603319801320 (Datanode Uuid 3dce914b-0353-4f3f-bd39-3c955847cc2b) service to /192.168.100.249:8020 successfully registered with NN
2020-10-21 23:42:25,640 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block Report skipped as other BPServiceActors are connected to the namenodes 
2020-10-21 23:42:25,640 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1218611357-127.0.1.1-1603319801320 (Datanode Uuid 3dce914b-0353-4f3f-bd39-3c955847cc2b) service to 127.0.1.1/127.0.1.1:8020 starting to offer service
2020-10-21 23:42:25,640 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1218611357-127.0.1.1-1603319801320 (Datanode Uuid 3dce914b-0353-4f3f-bd39-3c955847cc2b) service to /192.168.100.249:8020
2020-10-21 23:42:25,647 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-1218611357-127.0.1.1-1603319801320 (Datanode Uuid 3dce914b-0353-4f3f-bd39-3c955847cc2b) service to 127.0.1.1/127.0.1.1:8020 successfully registered with NN
2020-10-21 23:42:25,647 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode 127.0.1.1/127.0.1.1:8020 using BLOCKREPORT_INTERVAL of 3600000msec CACHEREPORT_INTERVAL of 10000msec Initial delay: 0msec; heartBeatInterval=3000
2020-10-21 23:42:25,678 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in whirlingLikeASufi
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(192.168.100.249:50010, datanodeUuid=3dce914b-0353-4f3f-bd39-3c955847cc2b, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-50;cid=CID-0dcfe208-8599-4cd4-b816-a48677cb81d3;nsid=911;c=1603319801273) is attempting to report storage ID 3dce914b-0353-4f3f-bd39-3c955847cc2b. Node 127.0.0.1:50010 is expected to serve this storage.
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:511)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.getNextNamenodeToSendBlockReport(NameNode.java:1337)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getNextNamenodeToSendBlockReport(NameNodeRpcServer.java:1256)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.getNextNamenodeToSendBlockReport(DatanodeProtocolServerSideTranslatorPB.java:332)
	at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:35625)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:868)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:814)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1821)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2900)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1523)
	at org.apache.hadoop.ipc.Client.call(Client.java:1469)
	at org.apache.hadoop.ipc.Client.call(Client.java:1379)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy18.getNextNamenodeToSendBlockReport(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.getNextNamenodeToSendBlockReport(DatanodeProtocolClientSideTranslatorPB.java:366)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.nextNNForBlkReport(BPServiceActor.java:648)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.nextNNForBlkReport(BPOfferService.java:1448)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.blockReport(BPOfferService.java:1007)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.whirlingLikeASufi(BPOfferService.java:808)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.run(BPOfferService.java:1284)
	at java.lang.Thread.run(Thread.java:748)
2020-10-21 23:42:26,692 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in whirlingLikeASufi
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(192.168.100.249:50010, datanodeUuid=3dce914b-0353-4f3f-bd39-3c955847cc2b, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-50;cid=CID-0dcfe208-8599-4cd4-b816-a48677cb81d3;nsid=911;c=1603319801273) is attempting to report storage ID 3dce914b-0353-4f3f-bd39-3c955847cc2b. Node 127.0.0.1:50010 is expected to serve this storage.
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:511)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.getNextNamenodeToSendBlockReport(NameNode.java:1337)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getNextNamenodeToSendBlockReport(NameNodeRpcServer.java:1256)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.getNextNamenodeToSendBlockReport(DatanodeProtocolServerSideTranslatorPB.java:332)
	at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:35625)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:868)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:814)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1821)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2900)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1523)
	at org.apache.hadoop.ipc.Client.call(Client.java:1469)
	at org.apache.hadoop.ipc.Client.call(Client.java:1379)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy18.getNextNamenodeToSendBlockReport(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.getNextNamenodeToSendBlockReport(DatanodeProtocolClientSideTranslatorPB.java:366)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.nextNNForBlkReport(BPServiceActor.java:648)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.nextNNForBlkReport(BPOfferService.java:1448)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.blockReport(BPOfferService.java:1007)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.whirlingLikeASufi(BPOfferService.java:808)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.run(BPOfferService.java:1284)
	at java.lang.Thread.run(Thread.java:748)

I couldn’t fit enough of the namenode log here while still capturing the relevant context, so I put the contents into a gist at https://gist.github.com/bowyern/87ddd5529a978019f292e0b63e1d810f
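
(The UnregisteredNodeException above, where the datanode registers as 192.168.100.249 but the NameNode expects 127.0.0.1 to serve that storage, suggests the host resolves its own name to a loopback address. A minimal way to check how the machine resolves itself:)

hostname                      # short hostname (fstore in the logs above)
hostname -f                   # fully qualified hostname
getent hosts "$(hostname)"    # which address the hostname resolves to
cat /etc/hosts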

I tried on a VM on different physical hardware: 8 cores, 32 GB RAM, 150 GB SSD, Ubuntu 18.04. The install failed near hive2::default. The namenode/datanode logs look similar to those from the bare-metal install I attempted earlier. I followed exactly the same procedure for installing the guest OS and Hopsworks.

I was watching memory and CPU usage on the host (Proxmox), and neither appeared depleted around the time the install failed.

What guest OS and VM specs are you using on your successful install? Are you setting up passwordless sudo or providing the install script a password via -pwd?
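
(For reference, “passwordless sudo” here just means a NOPASSWD sudoers entry for the account that runs the installer. A minimal sketch, using the user name from the sudo prompt in the tensorflow log purely as an example:)

# Create a sudoers drop-in for the installing user (replace nbowyer with your user)
echo 'nbowyer ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/nbowyer
sudo chmod 0440 /etc/sudoers.d/nbowyer
sudo -n true && echo "passwordless sudo works"   # should not prompt for a password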

I destroyed that VM and tried again with the same specs, but set up passwordless sudo this time. It failed in the same place.

Last part of install.log:

Recipe: hive2::tez
  * group[hadoop] action create (skipped due to not_if)
  * linux_user[tez] action create (skipped due to not_if)
  * group[hadoop] action modify (up to date)
  * remote_file[/tmp/apache-tez-0.9.1.3.tar.gz] action create_if_missing (up to date)
  * bash[extract-tez] action run (skipped due to not_if)
  * hops_hdfs_directory[/apps/tez] action create_as_superuser (skipped due to not_if)
  * hops_hdfs_directory[/tmp/apache-tez-0.9.1.3.tar.gz] action put_as_superuser
    * bash[hdfs-put-dir-/tmp/apache-tez-0.9.1.3.tar.gz] action run

      ================================================================================
      Error executing action `run` on resource 'bash[hdfs-put-dir-/tmp/apache-tez-0.9.1.3.tar.gz]'
      ================================================================================

      Mixlib::ShellOut::ShellCommandFailed
      ------------------------------------
      Expected process to exit with [0], but received '1'
      ---- Begin output of "bash"  "/tmp/chef-script20201025-11364-lk9d98" ----
      STDOUT:
      STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
      chown: `/apps/tez/apache-tez-0.9.1.3.tar.gz': No such file or directory
      chgrp: `/apps/tez/apache-tez-0.9.1.3.tar.gz': No such file or directory
      chmod: `/apps/tez/apache-tez-0.9.1.3.tar.gz': No such file or directory
      ---- End output of "bash"  "/tmp/chef-script20201025-11364-lk9d98" ----
      Ran "bash"  "/tmp/chef-script20201025-11364-lk9d98" returned 1

      Resource Declaration:
      ---------------------
      # In /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb

       53:   bash "hdfs-put-dir-#{new_resource.name}" do
       54:     user node['hops']['hdfs']['user']
       55:     group node['hops']['group']
       56:     code <<-EOF
       57:      EXISTS=1
       58:      . #{node['hops']['base_dir']}/sbin/set-env.sh
       59:      if [ -z $ISDIR ] ; then
       60:         #{node['hops']['base_dir']}/bin/hdfs dfs -test -e #{new_resource.dest}
       61:         EXISTS=$?
       62:      else
       63:         #{node['hops']['base_dir']}/bin/hdfs dfs -test -f #{new_resource.dest}
       64:         EXISTS=$?
       65:      fi
       66:      if ([ $EXISTS -ne 0 ] || [ #{new_resource.isDir} ]) ; then
       67:         #{node['hops']['base_dir']}/bin/hdfs dfs -copyFromLocal #{new_resource.name} #{new_resource.dest}
       68:         #{node['hops']['base_dir']}/bin/hdfs dfs -chown #{new_resource.owner} #{new_resource.dest}
       69:         #{node['hops']['base_dir']}/bin/hdfs dfs -chgrp #{new_resource.group} #{new_resource.dest}
       70:         if [ "#{new_resource.mode}" != "" ] ; then
       71:            #{node['hops']['base_dir']}/bin/hadoop fs -chmod #{new_resource.mode} #{new_resource.dest}
       72:         fi
       73:      fi
       74:     EOF
       75:   end
       76: end

      Compiled Resource:
      ------------------
      # Declared in /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb:53:in `block in class_from_file'

      bash("hdfs-put-dir-/tmp/apache-tez-0.9.1.3.tar.gz") do
        action [:run]
        default_guard_interpreter :default
        command nil
        backup 5
        interpreter "bash"
        declared_type :bash
        cookbook_name "hive2"
        user "hdfs"
        group "hadoop"
        code "     EXISTS=1\n     . /srv/hops/hadoop/sbin/set-env.sh\n     if [ -z $ISDIR ] ; then\n        /srv/hops/hadoop/bin/hdfs dfs -test -e /apps/tez/apache-tez-0.9.1.3.tar.gz\n        EXISTS=$?\n     else\n        /srv/hops/hadoop/bin/hdfs dfs -test -f /apps/tez/apache-tez-0.9.1.3.tar.gz\n        EXISTS=$?\n     fi\n     if ([ $EXISTS -ne 0 ] || [ false ]) ; then\n        /srv/hops/hadoop/bin/hdfs dfs -copyFromLocal /tmp/apache-tez-0.9.1.3.tar.gz /apps/tez/apache-tez-0.9.1.3.tar.gz\n        /srv/hops/hadoop/bin/hdfs dfs -chown hdfs /apps/tez/apache-tez-0.9.1.3.tar.gz\n        /srv/hops/hadoop/bin/hdfs dfs -chgrp hadoop /apps/tez/apache-tez-0.9.1.3.tar.gz\n        if [ \"775\" != \"\" ] ; then\n           /srv/hops/hadoop/bin/hadoop fs -chmod 775 /apps/tez/apache-tez-0.9.1.3.tar.gz\n        fi\n     fi\n"
        domain nil
      end

      System Info:
      ------------
      chef_version=14.10.9
      platform=ubuntu
      platform_version=18.04
      ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
      program_name=/usr/bin/chef-solo
      executable=/opt/chefdk/bin/chef-solo


    ================================================================================
    Error executing action `put_as_superuser` on resource 'hops_hdfs_directory[/tmp/apache-tez-0.9.1.3.tar.gz]'
    ================================================================================

    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    bash[hdfs-put-dir-/tmp/apache-tez-0.9.1.3.tar.gz] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
    ---- Begin output of "bash"  "/tmp/chef-script20201025-11364-lk9d98" ----
    STDOUT:
    STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    chown: `/apps/tez/apache-tez-0.9.1.3.tar.gz': No such file or directory
    chgrp: `/apps/tez/apache-tez-0.9.1.3.tar.gz': No such file or directory
    chmod: `/apps/tez/apache-tez-0.9.1.3.tar.gz': No such file or directory
    ---- End output of "bash"  "/tmp/chef-script20201025-11364-lk9d98" ----
    Ran "bash"  "/tmp/chef-script20201025-11364-lk9d98" returned 1

    Resource Declaration:
    ---------------------
    # In /tmp/chef-solo/cookbooks/hive2/recipes/tez.rb

     66: hops_hdfs_directory cached_package_filename do
     67:   action :put_as_superuser
     68:   owner node['hops']['hdfs']['user']
     69:   group node['hops']['group']
     70:   dest "#{node['tez']['hopsfs_dir']}/#{base_package_filename}"
     71:   mode "775"
     72: end
     73:

    Compiled Resource:
    ------------------
    # Declared in /tmp/chef-solo/cookbooks/hive2/recipes/tez.rb:66:in `from_file'

    hops_hdfs_directory("/tmp/apache-tez-0.9.1.3.tar.gz") do
      action [:put_as_superuser]
      default_guard_interpreter :default
      declared_type :hops_hdfs_directory
      cookbook_name "hive2"
      recipe_name "tez"
      owner "hdfs"
      group "hadoop"
      mode "775"
      dest "/apps/tez/apache-tez-0.9.1.3.tar.gz"
    end

    System Info:
    ------------
    chef_version=14.10.9
    platform=ubuntu
    platform_version=18.04
    ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
    program_name=/usr/bin/chef-solo
    executable=/opt/chefdk/bin/chef-solo


Running handlers:
[2020-10-25T05:43:04+00:00] ERROR: Running exception handlers
Running handlers complete
[2020-10-25T05:43:04+00:00] ERROR: Exception handlers complete
Chef Client failed. 11 resources updated in 07 minutes 24 seconds
[2020-10-25T05:43:04+00:00] FATAL: Stacktrace dumped to /tmp/chef-solo/chef-stacktrace.out
[2020-10-25T05:43:04+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-10-25T05:43:04+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: hops_hdfs_directory[/tmp/apache-tez-0.9.1.3.tar.gz] (hive2::tez line 66) had an error: Mixlib::ShellOut::ShellCommandFailed: bash[hdfs-put-dir-/tmp/apache-tez-0.9.1.3.tar.gz] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash"  "/tmp/chef-script20201025-11364-lk9d98" ----
STDOUT:
STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
chown: `/apps/tez/apache-tez-0.9.1.3.tar.gz': No such file or directory
chgrp: `/apps/tez/apache-tez-0.9.1.3.tar.gz': No such file or directory
chmod: `/apps/tez/apache-tez-0.9.1.3.tar.gz': No such file or directory
---- End output of "bash"  "/tmp/chef-script20201025-11364-lk9d98" ----
Ran "bash"  "/tmp/chef-script20201025-11364-lk9d98" returned 1

ERROR [2020-10-25 05:43:08,846] se.kth.karamel.backend.machines.SshMachine: -------------------------------------------------------------------------------

ERROR [2020-10-25 05:43:08,846] se.kth.karamel.backend.machines.SshMachine: End Log for Failed: 'hive2::default' '192.168.1.122'
ERROR [2020-10-25 05:43:08,846] se.kth.karamel.backend.machines.SshMachine: -------------------------------------------------------------------------------

INFO  [2020-10-25 05:43:12,349] se.kth.karamel.backend.machines.MachinesMonitor: Sending pause signal to all machines

Hi.

That’s really strange, especially the fact that you’re getting this error consistently. The VM I used had 8 vCores and 32 GB of RAM, and the OS was Ubuntu 18.04, same as yours. Whether you use a passwordless sudo account doesn’t matter at this stage.

I would suspect there is some garbage left behind from previous installations, but you said you tried on a fresh, clean machine.

My next suspicion is your hostname settings. Can you paste, or send me privately, the contents of /etc/hosts?

@antonios pointed out that I should have my hostname resolve to my private IP address. By default, Ubuntu points the hostname to a loopback address on install. When I changed that to 192.168.100.250, I was able to complete the installation.
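
For anyone hitting the same error, the fix amounts to editing /etc/hosts so the machine’s hostname maps to its LAN address instead of the loopback entry Ubuntu creates by default. The hostname and addresses below are the ones from this thread (the datanode log shows the default mapping of fstore to 127.0.1.1); substitute your own:

# /etc/hosts - Ubuntu default: hostname mapped to a loopback address
127.0.0.1        localhost
127.0.1.1        fstore

# /etc/hosts - after the fix: hostname mapped to the machine's private IP
127.0.0.1        localhost
192.168.100.250  fstore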

Huge thanks to @antonios for patiently working through this with me.
