I’m trying to deploy Hopsworks on a cluster (1 master node and 2 worker nodes), but the installation of “kagent::default” fails on the worker nodes.
The log information is as follows:
Section 1:
template[/srv/hops/kagent/kagent/bin/edit-config-ini-inplace.py] action create (up to date)
template[/srv/hops/kagent/kagent/bin/edit-and-start.sh] action create (up to date)
template[/srv/hops/kagent/etc/config.ini] action create (up to date)
bash[chown_/srv/hops/kagent/host-certs] action run
- execute "bash" "/tmp/chef-script20210105-18166-z0wv3p"
* template[/srv/hops/kagent/host-certs/keystore.sh] action create (up to date)
kagent_hopsify[Register Host] action register_hostcerts
- bash[Register Host with Hopsworks] action run

================================================================================
Error executing action `run` on resource 'bash[Register Host with Hopsworks]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of "bash" "/tmp/chef-script20210105-18166-ncisi8" ----
STDOUT: time="2021-01-05T10:17:02+08:00" level=info msg="Executing host command"
time="2021-01-05T10:17:02+08:00" level=info msg="Server url https://10.12.9.220:443"
time="2021-01-05T10:17:02+08:00" level=info msg="Successfully logged in"
time="2021-01-05T10:17:02+08:00" level=error msg="Failed to perform HTTP operation - status: 404 Retrying… {"type":"restApiJsonResponse","errorCode":100025,"errorMsg":"Host was not found.","usrMsg":"hostname: dwfainode1"}"
Section 2:
Ran "bash" "/tmp/chef-script20210105-18166-ncisi8" returned 1
Resource Declaration:
---------------------
# In /tmp/chef-solo/cookbooks/kagent/providers/hopsify.rb

  6: bash "Register Host with Hopsworks" do
  7:   user node['kagent']['certs_user']
  8:   group node['kagent']['group']
  9:   puts node['kagent']['certs_user']
 10:   code <<-EOH
 11:     #{node["kagent"]["certs_dir"]}/hopsify --config #{node['kagent']['etc']}/config.ini #{hopsworks_alt_url} host
 12:   EOH
 13: end
 14: end

Compiled Resource:
------------------
# Declared in /tmp/chef-solo/cookbooks/kagent/providers/hopsify.rb:6:in `block in class_from_file'

bash("Register Host with Hopsworks") do
  action [:run]
  default_guard_interpreter :default
  command nil
  backup 5
  interpreter "bash"
  declared_type :bash
  cookbook_name "kagent"
  code " /srv/hops/kagent/host-certs/hopsify --config /srv/hops/kagent/etc/config.ini --alt-url https://10.12.9.220:443 host\n"
  domain nil
  user "certs"
  group "kagent"
end

System Info:
------------
chef_version=14.10.9
platform=centos
platform_version=7.9.2009
ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
program_name=/bin/chef-solo
executable=/opt/chefdk/bin/chef-solo
================================================================================
Error executing action `register_host` on resource 'kagent_hopsify[Register Host]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
bash[Register Host with Hopsworks] (/tmp/chef-solo/cookbooks/kagent/providers/hopsify.rb line 6) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash" "/tmp/chef-script20210105-18166-ncisi8" ----
STDOUT: time="2021-01-05T10:17:02+08:00" level=info msg="Executing host command"
time="2021-01-05T10:17:02+08:00" level=info msg="Server url https://10.12.9.220:443"
time="2021-01-05T10:17:02+08:00" level=info msg="Successfully logged in"
time="2021-01-05T10:17:02+08:00" level=error msg="Failed to perform HTTP operation - status: 404 Retrying… {"type":"restApiJsonResponse","errorCode":100025,"errorMsg":"Host was not found.","usrMsg":"hostname: dwfainode1"}"
Port 443 is used by Glassfish. It looks like the worker nodes cannot access the Glassfish services deployed on the master node, or perhaps some Glassfish services did not start.
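To test the connectivity guess, a minimal sketch like the one below could be run on a worker node. It only checks whether a TCP connection to the master can be opened. The function name `can_connect` and the timeout are my own choices for illustration; the address and port are the ones from the log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

On a worker node, `can_connect("10.12.9.220", 443)` would report whether the port is reachable at all. Since the log already shows "Successfully logged in", a `True` result would only confirm basic reachability and would not by itself explain the "Host was not found." response for hostname dwfainode1.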
Has anyone encountered this problem?