I’m trying to deploy Hopsworks on a cluster (1 master node and 2 worker nodes), but the installation of “kagent::default” fails on the worker nodes.
The log output is as follows:
Section 1:
template[/srv/hops/kagent/kagent/bin/edit-config-ini-inplace.py] action create (up to date)
template[/srv/hops/kagent/kagent/bin/edit-and-start.sh] action create (up to date)
template[/srv/hops/kagent/etc/config.ini] action create (up to date)
bash[chown_/srv/hops/kagent/host-certs] action run
- execute "bash" "/tmp/chef-script20210105-18166-z0wv3p"
* template[/srv/hops/kagent/host-certs/keystore.sh] action create (up to date)
bash[Register Host with Hopsworks] action run
================================================================================
Error executing action run on resource 'bash[Register Host with Hopsworks]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of "bash" "/tmp/chef-script20210105-18166-ncisi8" ----
STDOUT: time="2021-01-05T10:17:02+08:00" level=info msg="Executing host command"
time="2021-01-05T10:17:02+08:00" level=info msg="Server url https://10.12.9.220:443"
time="2021-01-05T10:17:02+08:00" level=info msg="Successfully logged in"
time="2021-01-05T10:17:02+08:00" level=error msg="Failed to perform HTTP operation - status: 404 Retrying… {"type":"restApiJsonResponse","errorCode":100025,"errorMsg":"Host was not found.","usrMsg":"hostname: dwfainode1"}"
Section 2:
Ran "bash" "/tmp/chef-script20210105-18166-ncisi8" returned 1
Resource Declaration:
---------------------
# In /tmp/chef-solo/cookbooks/kagent/providers/hopsify.rb

 6: bash "Register Host with Hopsworks" do
 7:   user node['kagent']['certs_user']
 8:   group node['kagent']['group']
 9:   puts node['kagent']['certs_user']
10:   code <<-EOH
11:     #{node["kagent"]["certs_dir"]}/hopsify --config #{node['kagent']['etc']}/config.ini #{hopsworks_alt_url} host
12:   EOH
13: end
14: end

Compiled Resource:
------------------
# Declared in /tmp/chef-solo/cookbooks/kagent/providers/hopsify.rb:6:in `block in class_from_file'

bash("Register Host with Hopsworks") do
  action [:run]
  default_guard_interpreter :default
  command nil
  backup 5
  interpreter "bash"
  declared_type :bash
  cookbook_name "kagent"
  code " /srv/hops/kagent/host-certs/hopsify --config /srv/hops/kagent/etc/config.ini --alt-url https://10.12.9.220:443 host\n"
  domain nil
  user "certs"
  group "kagent"
end

System Info:
------------
chef_version=14.10.9
platform=centos
platform_version=7.9.2009
ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
program_name=/bin/chef-solo
executable=/opt/chefdk/bin/chef-solo
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
bash[Register Host with Hopsworks] (/tmp/chef-solo/cookbooks/kagent/providers/hopsify.rb line 6) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash" "/tmp/chef-script20210105-18166-ncisi8" ----
STDOUT: time="2021-01-05T10:17:02+08:00" level=info msg="Executing host command"
time="2021-01-05T10:17:02+08:00" level=info msg="Server url https://10.12.9.220:443"
time="2021-01-05T10:17:02+08:00" level=info msg="Successfully logged in"
time="2021-01-05T10:17:02+08:00" level=error msg="Failed to perform HTTP operation - status: 404 Retrying… {"type":"restApiJsonResponse","errorCode":100025,"errorMsg":"Host was not found.","usrMsg":"hostname: dwfainode1"}"
Port 443 is in use by Glassfish, which is deployed on the master node. It looks like the worker nodes cannot reach the Glassfish services, or perhaps some Glassfish services did not start.
The IP address and host name of every node are configured in “/etc/hosts” on each node, and every node in the cluster can ssh into the others without a password.
However, the IP addresses and host names have not been added to the DNS server. Do I have to configure the mapping between IP address and host name in the DNS server as well?
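One way to narrow down the name-resolution side of this on each node is sketched below. It is a minimal sketch, not part of the Hopsworks installer: the function names `lookup_host` and `check_own_hostname` are hypothetical, and the only assumption is that “/etc/hosts” uses the standard `IP name [alias...]` layout. It checks whether the name a node reports for itself (for example dwfainode1 in the log above) has an entry in the hosts file.

```shell
#!/usr/bin/env bash
# Print the first IP address mapped to a hostname in a hosts-format file.
# Usage: lookup_host <hostname> [hosts_file]   (hosts_file defaults to /etc/hosts)
lookup_host() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v h="$name" '
    { sub(/#.*/, "") }                      # drop comments
    { for (i = 2; i <= NF; i++)             # fields after the IP are names
        if ($i == h) { print $1; exit } }
  ' "$file"
}

# Report whether the name this node calls itself is present in the hosts file.
check_own_hostname() {
  local name ip
  name="$(hostname)"
  ip="$(lookup_host "$name" "${1:-/etc/hosts}")"
  if [ -n "$ip" ]; then
    echo "$name -> $ip"
  else
    echo "$name not found" >&2
    return 1
  fi
}
```

Running `check_own_hostname` on the master and on each worker shows which names are mapped locally. `getent hosts "$(hostname)"` gives a complementary view, since it goes through the resolver order configured in /etc/nsswitch.conf rather than reading the file directly. Neither check says anything about what Hopsworks has registered on its side; they only rule out local resolution problems.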
Hi Jim,
Reading the source code (class HostsController, method findByHostname), I found that the host name is looked up in the MySQL cluster. So I would like to know at what point the host name is written to the database.
I remember that the installation script asked me to enter the information for each worker node, and I entered the IP address. Should I have entered the host name at that point?
Additional information:
I have just read hopsworks-installer.sh, and the add_worker function asks only for the IP address of each worker:
add_worker()
{
    if [ "$WORKER_DEFAULTS" != "true" ] ; then
        printf 'Please enter the IP of the worker you want to add: '
        read WORKER_IP
    fi
    ssh -t -o StrictHostKeyChecking=no $WORKER_IP "whoami" > /dev/null
    if [ $? -ne 0 ] ; then
        echo "Failed to ssh using public into: ${USER}@${WORKER_IP}"
        echo "Cannot add worker node, as you need to be able to ssh into it using your public key"
        echo ""
        echo ""
        echo "You can setup passwordless SSH to setup to ${USER}@${WORKER_IP} by entering the password."
        echo "Running ssh-copy-id.... "
        ssh-copy-id -i ${HOME}/.ssh/id_rsa.pub ${USER}@${WORKER_IP}
        if [ $? -ne 0 ] ; then
            exit_error "Problem setting up passwordless SSH to ${USER}@${WORKER_IP}"
        fi
    fi
Hi Jim,
The problem has been solved, and I have installed the cluster successfully.
The immediate cause was that some host information had not been written to the “Hosts” table correctly, so I corrected those rows manually.
However, I still don’t know why the table was not populated correctly in the first place.
BTW, could you please tell me whether Hopsworks provides APIs for custom development?
Hi Jim,
We may build our own applications on top of Hopsworks in the future, so I would like to know whether Hopsworks provides an SDK or a RESTful API that would let us develop them quickly.
I have looked through the official website, but I couldn’t find the relevant information.
Hi Jim,
Thank you for the detailed answer. These links are really helpful to us.
However, when I tried to test these RESTful APIs (version 1.4), I ran into an authorization problem.
For example, I called the endpoint https://Host_IP/hopsworks-api/api/admin/projects, and the returned JSON was:
I noticed an “Api keys” menu under “Settings”, so I generated an API key with all scopes and set it as a header parameter. The returned JSON was:
{
  "type": "restApiJsonResponse",
  "errorCode": 200003,
  "errorMsg": "Invalidated Api key."
}
I tested with Postman, and the header parameter was named “Authorization”.
Could you tell me how to obtain and pass the correct authorization value?
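For comparison with the Postman setup, here is a minimal sketch of the same request made with curl. It rests on an assumption that should be checked against the documentation for the installed version: that the “Authorization” header value must carry an `ApiKey ` scheme prefix before the key, rather than the bare key. The function names `api_key_header` and `call_api` are hypothetical, and `--insecure` is only there on the assumption that the server uses a self-signed certificate.

```shell
#!/usr/bin/env bash
# Build the Authorization header for an API key.
# Assumption: the server expects the "ApiKey " scheme prefix before the key.
api_key_header() {
  printf 'Authorization: ApiKey %s' "$1"
}

# Send a GET request to a path under /hopsworks-api/api using the API key.
# Usage: call_api <host> <path> <api_key>
call_api() {
  local host="$1" path="$2" key="$3"
  # --insecure skips certificate verification (assumed self-signed certificate).
  curl --silent --insecure \
       -H "$(api_key_header "$key")" \
       "https://${host}/hopsworks-api/api${path}"
}
```

A call would look like `call_api Host_IP /admin/projects "$API_KEY"`, with `API_KEY` holding the key generated under “Settings”. Whether the /admin endpoints accept API keys at all, as opposed to a logged-in administrator session, is not confirmed here.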