Vagrant install - tensorflow not installing

Hello,

I've tried to install a new VM on two different physical servers, both running CentOS 7.0 with no firewall activated. Both Hopsworks versions (1.2 and 1.3) get stuck at the same point:

==> default: END_OF_FILE
==> default: sudo chef-solo -c /home/vagrant/.karamel/install/solo.rb -j /home/vagrant/.karamel/install/tensorflow__install.json 2>&1 | tee tensorflow__install.log
==> default: echo 'https://github.com/logicalclocks/tensorflow-chef/tree/1.2/tensorflow::install' >> succeed_list
==> default: ' > tensorflow__install.sh ; chmod +x tensorflow__install.sh ; ./tensorflow__install.sh
==> default: ', DAG is stuck here :(
==> default: INFO [2020-07-19 03:58:33,901] se.kth.karamel.backend.machines.MachinesMonitor: Sending pause signal to all machines

I saw this in other threads, but there it was related to a bare-metal installation. Those threads mention a problem reaching GitHub, but I can reach both locations (https://github.com/logicalclocks/tensorflow-chef/tree/1.2/tensorflow & https://github.com/logicalclocks/tensorflow-chef/tree/1.3/tensorflow ) without any problem.

I checked the install 6 hours later, and this is the end of the output:

==> default: INFO  [2020-07-19 03:58:33,901] se.kth.karamel.backend.machines.MachinesMonitor: Sending pause signal to all machines
==> default: INFO  [2020-07-19 19:20:33,648] se.kth.karamel.webservice.KaramelServiceApplication: Bye! Cleaning up first....
==> default: INFO  [2020-07-19 19:20:33,657] org.eclipse.jetty.server.ServerConnector: Stopped karamel-core@19ae6bb{HTTP/1.1}{0.0.0.0:9090}
==> default: INFO  [2020-07-19 19:20:33,659] org.eclipse.jetty.server.handler.ContextHandler: Stopped i.d.j.MutableServletContextHandler@5cad8b7d{/admin,null,UNAVAILABLE}
==> default: INFO  [2020-07-19 19:20:33,659] org.eclipse.jetty.server.handler.ContextHandler: Stopped i.d.j.MutableServletContextHandler@41de5768{/,null,UNAVAILABLE}
==> default: STDERR:
==> default: ---- End output of "bash"  "/tmp/chef-script20200719-13136-cwn5p" ----
==> default: Ran "bash"  "/tmp/chef-script20200719-13136-cwn5p" returned
==> default: [2020-07-19T19:20:36+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
Chef never successfully completed! Any errors should be visible in the
output above. Please fix your recipes so that they properly complete.

Hi @Fernando_Marines,

Inside the VM, check tensorflow__install.log under /home/[install user]/.karamel/install for a more detailed reason for the failure. If you paste the last 20-30 lines of it here, we might be able to help you further.
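
A rough sketch of how to grab those lines, to be run inside the VM (for example after `vagrant ssh`). The `log_tail` helper name is only illustrative, and the path in the usage note assumes the install user is `vagrant`, as in the paths printed in the output above:

```shell
# Hypothetical helper: print the last N lines (default 30) of a log file.
log_tail() {
    file="$1"
    count="${2:-30}"
    # Fail early with a message if the log is missing or unreadable.
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    tail -n "$count" "$file"
}
```

Usage, assuming the install user is `vagrant`: `log_tail /home/vagrant/.karamel/install/tensorflow__install.log 30`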

Thank you @Alex for your reply

Reviewing the log, I was able to understand what I did wrong: it turns out there are no GPUs in the VM, so the NVIDIA test fails.

platform=centos
platform_version=7.8.2003
ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
program_name=/bin/chef-solo
executable=/opt/chefdk/bin/chef-solo

Running handlers:
[2020-07-20T06:32:41+00:00] ERROR: Running exception handlers
Running handlers complete
[2020-07-20T06:32:41+00:00] ERROR: Exception handlers complete
Chef Client failed. 1 resources updated in 13 seconds
[2020-07-20T06:32:41+00:00] FATAL: Stacktrace dumped to /tmp/chef-solo/chef-stacktrace.out
[2020-07-20T06:32:41+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-07-20T06:32:41+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: bash[test_nvidia] (tensorflow::install line 216) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash" "/tmp/chef-script20200720-30809-l2w8gh" ----
STDOUT:
STDERR:
---- End output of "bash" "/tmp/chef-script20200720-30809-l2w8gh" ----
Ran "bash" "/tmp/chef-script20200720-30809-l2w8gh" returned 1
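
For anyone hitting the same bash[test_nvidia] failure, a quick way to confirm whether the VM actually sees an NVIDIA device before starting the install. This is only a sketch: it assumes `lspci` (from pciutils) is available in the guest, the `has_nvidia_device` helper is hypothetical, and it is not the check the tensorflow::install recipe itself runs.

```shell
# Hypothetical helper: succeeds if the PCI listing on stdin mentions an NVIDIA device.
# Reading from stdin lets it be fed live `lspci` output or a saved listing.
has_nvidia_device() {
    grep -qi 'nvidia'
}

# Inside the VM (assumes lspci from pciutils is installed):
if command -v lspci >/dev/null 2>&1; then
    if lspci | has_nvidia_device; then
        echo "NVIDIA device visible in this VM"
    else
        echo "no NVIDIA device visible in this VM"
    fi
else
    echo "lspci not found; install pciutils to run this check"
fi
```

If no device shows up, the GPU has not been passed through to the guest, which matches the situation described above.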