i’ve tried to install a new VM on 2 different physical servers, both servers without firewall,activated using Centos 7.0 and both hopsworks version (1.2 & 1.3) stuck to the same point
==> default: END_OF_FILE
==> default: sudo chef-solo -c /home/vagrant/.karamel/install/solo.rb -j /home/vagrant/.karamel/install/tensorflow__install.json 2>&1 | tee tensorflow__install.log
==> default: echo ‘https://github.com/logicalclocks/tensorflow-chef/tree/1.2/tensorflow::install’ >> succeed_list
==> default: ’ > tensorflow__install.sh ; chmod +x tensorflow__install.sh ; ./tensorflow__install.sh
==> default: ', DAG is stuck here
==> default: INFO [2020-07-19 03:58:33,901] se.kth.karamel.backend.machines.MachinesMonitor: Sending pause signal to all machines
i checked the install 6 hrs later and this is the end of the install
==> default: INFO [2020-07-19 03:58:33,901] se.kth.karamel.backend.machines.MachinesMonitor: Sending pause signal to all machines
==> default: INFO [2020-07-19 19:20:33,648] se.kth.karamel.webservice.KaramelServiceApplication: Bye! Cleaning up first....
==> default: INFO [2020-07-19 19:20:33,657] org.eclipse.jetty.server.ServerConnector: Stopped karamel-core@19ae6bb{HTTP/1.1}{0.0.0.0:9090}
==> default: INFO [2020-07-19 19:20:33,659] org.eclipse.jetty.server.handler.ContextHandler: Stopped i.d.j.MutableServletContextHandler@5cad8b7d{/admin,null,UNAVAILABLE}
==> default: INFO [2020-07-19 19:20:33,659] org.eclipse.jetty.server.handler.ContextHandler: Stopped i.d.j.MutableServletContextHandler@41de5768{/,null,UNAVAILABLE}
==> default: STDERR:
==> default: ---- End output of "bash" "/tmp/chef-script20200719-13136-cwn5p" ----
==> default: Ran "bash" "/tmp/chef-script20200719-13136-cwn5p" returned
==> default: [2020-07-19T19:20:36+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
Chef never successfully completed! Any errors should be visible in the
output above. Please fix your recipes so that they properly complete.
You can check (inside the vm) under /home/[install user]/.karamel/install the tensorflow__install.log for a more detailed reason for the failure. If you paste the last 20-30 lines of it here, we might be able to help you further.
^[[0m^[[0m
Running handlers:^[[0m
[2020-07-20T06:32:41+00:00] ERROR: Running exception handlers
Running handlers complete
^[[0m[2020-07-20T06:32:41+00:00] ERROR: Exception handlers complete
Chef Client failed. 1 resources updated in 13 seconds^[[0m
[2020-07-20T06:32:41+00:00] FATAL: Stacktrace dumped to /tmp/chef-solo/chef-stacktrace.out
[2020-07-20T06:32:41+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-07-20T06:32:41+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: bash[test_nvidia] (tensorflow::install line 216) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1’
---- Begin output of “bash” “/tmp/chef-script20200720-30809-l2w8gh” ----
STDOUT:
STDERR:
---- End output of “bash” “/tmp/chef-script20200720-30809-l2w8gh” ----
Ran “bash” “/tmp/chef-script20200720-30809-l2w8gh” returned 1