I attempted a single-host installation with the Hopsworks installer on a fresh Ubuntu 18.04 machine. The installation failed at the tensorflow step.
Install command:
./hopsworks-installer.sh -pwd changeme
Options Selected:
(1) Install a single-host Hopsworks cluster.
(1) On-premises or private cloud.
tensorflow__install.log:
[sudo] password for nbowyer: Starting Chef Client, version 14.10.9
resolving cookbooks for run list: ["tensorflow::install"]
Synchronizing Cookbooks:
  - tensorflow (1.4.0)
  - java (7.0.0)
  - magic_shell (1.0.0)
  - build-essential (8.2.1)
  - zip (1.1.0)
  - apt (7.2.0)
  - homebrew (5.0.8)
  - kagent (1.4.0)
  - ndb (1.4.0)
  - hops (1.4.0)
  - conda (1.4.0)
  - windows (7.0.2)
  - seven_zip (3.1.2)
  - openssl (4.4.0)
  - mingw (2.1.1)
  - hostsfile (2.4.6)
  - ntp (2.0.3)
  - sudo (4.0.1)
  - consul (1.4.0)
  - ulimit (1.4.0)
  - sysctl (1.0.5)
  - cmake (0.3.0)
  - kzookeeper (1.4.0)
  - elastic (1.4.0)
  - chef-sugar (5.1.8)
  - ohai (5.3.0)
  - ulimit2 (0.2.0)
  - elasticsearch (4.0.6)
  - yum (5.1.0)
  - ark (5.0.0)
Installing Cookbook Gems:
Compiling Cookbooks...
Converging 9 resources
Recipe: tensorflow::install
  * apt_package[pkg-config, zip, g++, zlib1g-dev, unzip, swig, git, build-essential, cmake, unzip, libopenblas-dev, liblapack-dev, linux-image-4.15.0-121-generic, linux-headers-4.15.0-121-generic, python2.7, python2.7-numpy, python2.7-dev, python-pip, python2.7-lxml, python-pillow, libcupti-dev, libcurl3-dev, python-wheel, python-six, pciutils] action install
    - install version 0.29.1-0ubuntu2 of package pkg-config
    - install version 3.0-11build1 of package zip
    - install version 1:1.2.11.dfsg-0ubuntu2 of package zlib1g-dev
    - install version 3.0.12-1 of package swig
    - install version 3.10.2-1ubuntu2.18.04.1 of package cmake
    - install version 0.2.20+ds-4 of package libopenblas-dev
    - install version 3.7.1-4ubuntu1 of package liblapack-dev
    - install version 1:1.13.3-2ubuntu1 of package python2.7-numpy
    - install version 9.0.1-2.3~ubuntu1.18.04.3 of package python-pip
    - install version 4.2.1-1ubuntu0.1 of package python2.7-lxml
    - install version 5.1.0-1ubuntu0.3 of package python-pillow
    - install version 9.1.85-3ubuntu1 of package libcupti-dev
    - install version 7.58.0-2ubuntu3.10 of package libcurl3-dev
    - install version 0.30.0-0.2 of package python-wheel
    - install version 1.11.0-2 of package python-six
Recipe: java::notify
  * log[jdk-version-changed] action nothing (skipped due to action :nothing)
Recipe: java::openjdk
  * apt_package[openjdk-8-jdk, openjdk-8-jre-headless] action install (up to date)
  * java_alternatives[set-java-alternatives] action set (up to date)
Recipe: java::default_java_symlink
  * link[/usr/lib/jvm/default-java] action create (up to date)
Recipe: java::set_java_home
  * directory[/etc/profile.d] action create (up to date)
  * template[/etc/profile.d/jdk.sh] action create (up to date)
  * ruby_block[Set JAVA_HOME in /etc/environment] action run
    - execute the ruby block Set JAVA_HOME in /etc/environment
Recipe: tensorflow::install
  * magic_shell_environment[HADOOP_HDFS_HOME] action add
Recipe: <Dynamically Defined Resource>
  * file[/etc/profile.d/HADOOP_HDFS_HOME.sh] action create
    - create new file /etc/profile.d/HADOOP_HDFS_HOME.sh
    - update content in file /etc/profile.d/HADOOP_HDFS_HOME.sh from none to 5e6d2a
    --- /etc/profile.d/HADOOP_HDFS_HOME.sh	2020-10-15 19:56:54.116977346 +0000
    +++ /etc/profile.d/.chef-HADOOP_HDFS_HOME20201015-22945-1bif74l.sh	2020-10-15 19:56:54.116977346 +0000
    @@ -1 +1,7 @@
    + #
    + # This file was generated by Chef for fstore
    + # Do NOT modify this file by hand!
    + #
    +
    + export HADOOP_HDFS_HOME="/srv/hops/hadoop"
    - change mode from '' to '0755'
    - change owner from '' to 'root'
    - change group from '' to 'root'
  (up to date)

Running handlers:
Running handlers complete
Deprecated features used!
  Resource openssl_dhparam from a cookbook is overriding the resource from the client. Please upgrade your cookbook or remove the cookbook from your run_list before the next major release of Chef. at 1 location:
    - /opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/log.rb:51:in `caller_location'
   See https://docs.chef.io/deprecations_map_collision.html for further details.
  Resource openssl_rsa_key from a cookbook is overriding the resource from the client. Please upgrade your cookbook or remove the cookbook from your run_list before the next major release of Chef. at 1 location:
    - /opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/log.rb:51:in `caller_location'
   See https://docs.chef.io/deprecations_map_collision.html for further details.
  Resource sudo from a cookbook is overriding the resource from the client. Please upgrade your cookbook or remove the cookbook from your run_list before the next major release of Chef. at 1 location:
    - /opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/log.rb:51:in `caller_location'
   See https://docs.chef.io/deprecations_map_collision.html for further details.
  Resource sysctl_param from a cookbook is overriding the resource from the client. Please upgrade your cookbook or remove the cookbook from your run_list before the next major release of Chef. at 1 location:
    - /opt/chefdk/embedded/lib/ruby/gems/2.5.0/gems/chef-14.10.9/lib/chef/log.rb:51:in `caller_location'
   See https://docs.chef.io/deprecations_map_collision.html for further details.

Chef Client finished, 3/10 resources updated in 01 minutes 05 seconds
tensorflow__default.log:
[sudo] password for nbowyer: Starting Chef Client, version 14.10.9
resolving cookbooks for run list: ["tensorflow::default"]
Synchronizing Cookbooks:
  - tensorflow (1.4.0)
  - java (7.0.0)
  - magic_shell (1.0.0)
  - zip (1.1.0)
  - build-essential (8.2.1)
  - apt (7.2.0)
  - homebrew (5.0.8)
  - kagent (1.4.0)
  - ndb (1.4.0)
  - hops (1.4.0)
  - conda (1.4.0)
  - windows (7.0.2)
  - seven_zip (3.1.2)
  - mingw (2.1.1)
  - openssl (4.4.0)
  - hostsfile (2.4.6)
  - ntp (2.0.3)
  - sudo (4.0.1)
  - consul (1.4.0)
  - ulimit (1.4.0)
  - sysctl (1.0.5)
  - cmake (0.3.0)
  - kzookeeper (1.4.0)
  - elastic (1.4.0)
  - chef-sugar (5.1.8)
  - ohai (5.3.0)
  - ulimit2 (0.2.0)
  - elasticsearch (4.0.6)
  - yum (5.1.0)
  - ark (5.0.0)
Installing Cookbook Gems:
Compiling Cookbooks...
Converging 8 resources
Recipe: tensorflow::default
  * remote_file[/tmp/chef-solo/demo-1.4.1.tar.gz] action create (up to date)
  * bash[extract_notebooks] action run
    - execute "bash" "/tmp/chef-script20201016-19082-x890bs"
  * hops_hdfs_directory[/user/hdfs/tensorflow_demo] action create_as_superuser
    * bash[mk-dir-/user/hdfs/tensorflow_demo] action run (skipped due to not_if)
  (up to date)
  * hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] action replace_as_superuser
    * hops_hdfs_directory[/user/hdfs/tensorflow_demo] action rm_as_superuser
      * bash[rm-/user/hdfs/tensorflow_demo] action run
        - execute "bash" "/tmp/chef-script20201016-19082-1r2v52x"
    * hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] action put_as_superuser
      * bash[hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] action run

    ================================================================================
    Error executing action `run` on resource 'bash[hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo]'
    ================================================================================

    Mixlib::ShellOut::CommandTimeout
    --------------------------------
    Command timed out after 3600s:
    Command exceeded allowed execution time, process terminated
    ---- Begin output of "bash" "/tmp/chef-script20201016-19082-1c0a6xc" ----
    STDOUT:
    STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    ---- End output of "bash" "/tmp/chef-script20201016-19082-1c0a6xc" ----
    Ran "bash" "/tmp/chef-script20201016-19082-1c0a6xc" returned

    Resource Declaration:
    ---------------------
    # In /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb

     53: bash "hdfs-put-dir-#{new_resource.name}" do
     54:   user node['hops']['hdfs']['user']
     55:   group node['hops']['group']
     56:   code <<-EOF
     57:     EXISTS=1
     58:     . #{node['hops']['base_dir']}/sbin/set-env.sh
     59:     if [ -z $ISDIR ] ; then
     60:       #{node['hops']['base_dir']}/bin/hdfs dfs -test -e #{new_resource.dest}
     61:       EXISTS=$?
     62:     else
     63:       #{node['hops']['base_dir']}/bin/hdfs dfs -test -f #{new_resource.dest}
     64:       EXISTS=$?
     65:     fi
     66:     if ([ $EXISTS -ne 0 ] || [ #{new_resource.isDir} ]) ; then
     67:       #{node['hops']['base_dir']}/bin/hdfs dfs -copyFromLocal #{new_resource.name} #{new_resource.dest}
     68:       #{node['hops']['base_dir']}/bin/hdfs dfs -chown #{new_resource.owner} #{new_resource.dest}
     69:       #{node['hops']['base_dir']}/bin/hdfs dfs -chgrp #{new_resource.group} #{new_resource.dest}
     70:       if [ "#{new_resource.mode}" != "" ] ; then
     71:         #{node['hops']['base_dir']}/bin/hadoop fs -chmod #{new_resource.mode} #{new_resource.dest}
     72:       fi
     73:     fi
     74:   EOF
     75: end
     76: end

    Compiled Resource:
    ------------------
    # Declared in /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb:53:in `block in class_from_file'

    bash("hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo") do
      action [:run]
      default_guard_interpreter :default
      command nil
      backup 5
      interpreter "bash"
      declared_type :bash
      cookbook_name "tensorflow"
      user "hdfs"
      code " EXISTS=1\n . /srv/hops/hadoop/sbin/set-env.sh\n if [ -z $ISDIR ] ; then\n /srv/hops/hadoop/bin/hdfs dfs -test -e /user/hdfs/tensorflow_demo\n EXISTS=$?\n else\n /srv/hops/hadoop/bin/hdfs dfs -test -f /user/hdfs/tensorflow_demo\n EXISTS=$?\n fi\n if ([ $EXISTS -ne 0 ] || [ false ]) ; then\n /srv/hops/hadoop/bin/hdfs dfs -copyFromLocal /tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo /user/hdfs/tensorflow_demo\n /srv/hops/hadoop/bin/hdfs dfs -chown hdfs /user/hdfs/tensorflow_demo\n /srv/hops/hadoop/bin/hdfs dfs -chgrp hadoop /user/hdfs/tensorflow_demo\n if [ \"1755\" != \"\" ] ; then\n /srv/hops/hadoop/bin/hadoop fs -chmod 1755 /user/hdfs/tensorflow_demo\n fi\n fi\n"
      domain nil
      group "hadoop"
    end

    System Info:
    ------------
    chef_version=14.10.9
    platform=ubuntu
    platform_version=18.04
    ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
    program_name=/usr/bin/chef-solo
    executable=/opt/chefdk/bin/chef-solo

    ================================================================================
    Error executing action `put_as_superuser` on resource 'hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo]'
    ================================================================================

    Mixlib::ShellOut::CommandTimeout
    --------------------------------
    bash[hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::CommandTimeout: Command timed out after 3600s:
    Command exceeded allowed execution time, process terminated
    ---- Begin output of "bash" "/tmp/chef-script20201016-19082-1c0a6xc" ----
    STDOUT:
    STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    ---- End output of "bash" "/tmp/chef-script20201016-19082-1c0a6xc" ----
    Ran "bash" "/tmp/chef-script20201016-19082-1c0a6xc" returned

    Resource Declaration:
    ---------------------
    # In /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb

    143: hops_hdfs_directory "#{new_resource.name}" do
    144:   owner "#{new_resource.owner}"
    145:   group "#{new_resource.group}"
    146:   mode "#{new_resource.mode}"
    147:   dest "#{new_resource.dest}"
    148:   action :put_as_superuser
    149: end
    150:

    Compiled Resource:
    ------------------
    # Declared in /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb:143:in `block in class_from_file'

    hops_hdfs_directory("/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo") do
      action [:put_as_superuser]
      default_guard_interpreter :default
      declared_type :hops_hdfs_directory
      cookbook_name "tensorflow"
      owner "hdfs"
      group "hadoop"
      mode "1755"
      dest "/user/hdfs/tensorflow_demo"
    end

    System Info:
    ------------
    chef_version=14.10.9
    platform=ubuntu
    platform_version=18.04
    ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
    program_name=/usr/bin/chef-solo
    executable=/opt/chefdk/bin/chef-solo

    ================================================================================
    Error executing action `replace_as_superuser` on resource 'hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo]'
    ================================================================================

    Mixlib::ShellOut::CommandTimeout
    --------------------------------
    hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 143) had an error: Mixlib::ShellOut::CommandTimeout: bash[hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::CommandTimeout: Command timed out after 3600s:
    Command exceeded allowed execution time, process terminated
    ---- Begin output of "bash" "/tmp/chef-script20201016-19082-1c0a6xc" ----
    STDOUT:
    STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    copyFromLocal: Unable to close file because the last block does not have enough number of replicas.
    ---- End output of "bash" "/tmp/chef-script20201016-19082-1c0a6xc" ----
    Ran "bash" "/tmp/chef-script20201016-19082-1c0a6xc" returned

    Resource Declaration:
    ---------------------
    # In /tmp/chef-solo/cookbooks/tensorflow/recipes/default.rb

     37: hops_hdfs_directory "#{Chef::Config['file_cache_path']}/#{node['tensorflow']['hopstfdemo_dir']}-#{node['tensorflow']['examples_version']}/#{node['tensorflow']['hopstfdemo_dir']}" do
     38:   action :replace_as_superuser
     39:   owner node['hops']['hdfs']['user']
     40:   group node['hops']['group']
     41:   mode "1755"
     42:   dest "/user/#{node['hops']['hdfs']['user']}/#{node['tensorflow']['hopstfdemo_dir']}"
     43: end
     44:

    Compiled Resource:
    ------------------
    # Declared in /tmp/chef-solo/cookbooks/tensorflow/recipes/default.rb:37:in `from_file'

    hops_hdfs_directory("/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo") do
      action [:replace_as_superuser]
      updated true
      updated_by_last_action true
      default_guard_interpreter :default
      declared_type :hops_hdfs_directory
      cookbook_name "tensorflow"
      recipe_name "default"
      owner "hdfs"
      group "hadoop"
      mode "1755"
      dest "/user/hdfs/tensorflow_demo"
    end

    System Info:
    ------------
    chef_version=14.10.9
    platform=ubuntu
    platform_version=18.04
    ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
    program_name=/usr/bin/chef-solo
    executable=/opt/chefdk/bin/chef-solo

Running handlers:
[2020-10-16T15:19:14+00:00] ERROR: Running exception handlers
Running handlers complete
[2020-10-16T15:19:14+00:00] ERROR: Exception handlers complete
Chef Client failed. 3 resources updated in 01 hours 01 minutes 03 seconds
[2020-10-16T15:19:14+00:00] FATAL: Stacktrace dumped to /tmp/chef-solo/chef-stacktrace.out
[2020-10-16T15:19:14+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-10-16T15:19:14+00:00] FATAL: Mixlib::ShellOut::CommandTimeout: hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (tensorflow::default line 37) had an error: Mixlib::ShellOut::CommandTimeout: hops_hdfs_directory[/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 143) had an error: Mixlib::ShellOut::CommandTimeout: bash[hdfs-put-dir-/tmp/chef-solo/tensorflow_demo-1.4.1/tensorflow_demo] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 53) had an error: Mixlib::ShellOut::CommandTimeout: Command timed out after 3600s: Command exceeded allowed execution time, process terminated ---- Begin output of "bash" "/tmp/chef-script20201016-19082-1c0a6xc" ---- STDOUT: STDERR: copyFromLocal: Unable to close file because the last block does not have enough number of replicas. copyFromLocal: Unable to close file because the last block does not have enough number of replicas. copyFromLocal: Unable to close file because the last block does not have enough number of replicas. copyFromLocal: Unable to close file because the last block does not have enough number of replicas. copyFromLocal: Unable to close file because the last block does not have enough number of replicas. copyFromLocal: Unable to close file because the last block does not have enough number of replicas. copyFromLocal: Unable to close file because the last block does not have enough number of replicas. copyFromLocal: Unable to close file because the last block does not have enough number of replicas. ---- End output of "bash" "/tmp/chef-script20201016-19082-1c0a6xc" ---- Ran "bash" "/tmp/chef-script20201016-19082-1c0a6xc" returned
These logs are from after I clicked "Retry" on the tensorflow item in the Karamel web UI, but the errors are essentially the same as those in the logs from the initial install attempt.
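Since the failing step dies on `copyFromLocal: Unable to close file because the last block does not have enough number of replicas` (which I understand usually means HDFS cannot finish replicating the last block of a file, e.g. because no DataNode is live or the NameNode is in safe mode), these are the kinds of checks I can run on the host and post the output of. This is just a sketch: the `/srv/hops/hadoop` paths and the `set-env.sh` sourcing are taken from the Chef logs above, and I am assuming the commands should run as the `hdfs` superuser.

```shell
# Diagnostic sketch -- run as the hdfs user on the single host.
# Paths assumed from the Chef log above (/srv/hops/hadoop).
. /srv/hops/hadoop/sbin/set-env.sh

# How many DataNodes are live, and what capacity do they report?
/srv/hops/hadoop/bin/hdfs dfsadmin -report

# Is the NameNode stuck in safe mode (writes would hang)?
/srv/hops/hadoop/bin/hdfs dfsadmin -safemode get

# Any under-replicated or corrupt blocks under the failing path?
/srv/hops/hadoop/bin/hdfs fsck /user/hdfs -files -blocks
```

Happy to attach the output of any of these, plus the DataNode and NameNode logs, if that helps.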