Error executing action `create_as_superuser` on resource 'hops_hdfs_directory[/tmp]'

When I executed hops__nn.sh, I encountered the following two errors:

Error executing action `run` on resource 'bash[mk-dir-/tmp]'

Error executing action `create_as_superuser` on resource 'hops_hdfs_directory[/tmp]'

Some of the error messages are as follows:

  • ruby_block[wait_until_nn_started] action run

    • execute the ruby block wait_until_nn_started
  • hops_hdfs_directory[/tmp] action create_as_superuser

    • bash[mk-dir-/tmp] action run

      ================================================================================
      Error executing action run on resource ‘bash[mk-dir-/tmp]’

      Mixlib::ShellOut::ShellCommandFailed

      Expected process to exit with [0], but received ‘1’
      ---- Begin output of “bash” “/tmp/chef-script20210526-35426-if9my9” ----
      STDOUT:
      STDERR: mkdir: Could not authenticate client with CN aicore.goldwind.com.cn remote IP /10.12.3.189 and username hdfs for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol
      mkdir: Could not authenticate client with CN aicore.goldwind.com.cn remote IP /10.12.3.189 and username hdfs for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol
      chown: Could not authenticate client with CN aicore.goldwind.com.cn remote IP /10.12.3.189 and username hdfs for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol
      chgrp: Could not authenticate client with CN aicore.goldwind.com.cn remote IP /10.12.3.189 and username hdfs for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol
      chmod: Could not authenticate client with CN aicore.goldwind.com.cn remote IP /10.12.3.189 and username hdfs for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol
      ---- End output of “bash” “/tmp/chef-script20210526-35426-if9my9” ----
      Ran “bash” “/tmp/chef-script20210526-35426-if9my9” returned 1

      Resource Declaration:

      In /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb

      87: bash “mk-dir-#{new_resource.name}” do
      88: user node[‘hops’][‘hdfs’][‘user’]
      89: group node[‘hops’][‘group’]
      90: retries 1
      91: code <<-EOF
      92: . #{node[‘hops’][‘base_dir’]}/sbin/set-env.sh
      93: #{node[‘hops’][‘base_dir’]}/bin/hdfs dfs -mkdir #{recursive} #{new_resource.name}
      94: if [ $? -ne 0 ] ; then
      95: sleep 10
      96: #{node[‘hops’][‘base_dir’]}/bin/hdfs dfs -mkdir #{recursive} #{new_resource.name}
      97: fi
      98: #{node[‘hops’][‘base_dir’]}/bin/hdfs dfs -chown #{new_resource.owner} #{new_resource.name}
      99: #{node[‘hops’][‘base_dir’]}/bin/hdfs dfs -chgrp #{new_resource.group} #{new_resource.name}
      100: if [ “#{new_resource.mode}” != “” ] ; then
      101: #{node[‘hops’][‘base_dir’]}/bin/hadoop fs -chmod #{new_resource.mode} #{new_resource.name}
      102: fi
      103: EOF
      104: not_if “su #{node[‘hops’][‘hdfs’][‘user’]} -c “. #{node[‘hops’][‘base_dir’]}/sbin/set-env.sh && #{node[‘hops’][‘base_dir’]}/bin/hdfs dfs -test -d #{new_resource.name}””
      105: end
      106:

      Compiled Resource:

      Declared in /tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb:87:in `block in class_from_file’

      bash(“mk-dir-/tmp”) do
      action [:run]
      default_guard_interpreter :default
      command nil
      backup 5
      interpreter “bash”
      declared_type :bash
      cookbook_name “hops”
      user “hdfs”
      group “hadoop”
      code " . /srv/hops/hadoop/sbin/set-env.sh\n /srv/hops/hadoop/bin/hdfs dfs -mkdir -p /tmp\n if [ $? -ne 0 ] ; then\n sleep 10\n /srv/hops/hadoop/bin/hdfs dfs -mkdir -p /tmp\n fi\n /srv/hops/hadoop/bin/hdfs dfs -chown hdfs /tmp\n /srv/hops/hadoop/bin/hdfs dfs -chgrp hadoop /tmp\n if [ “1775” != “” ] ; then\n /srv/hops/hadoop/bin/hadoop fs -chmod 1775 /tmp\n fi\n"
      domain nil
      retries 1
      not_if “su hdfs -c “. /srv/hops/hadoop/sbin/set-env.sh && /srv/hops/hadoop/bin/hdfs dfs -test -d /tmp””
      end

      System Info:

      chef_version=14.10.9
      platform=centos
      platform_version=7.9.2009
      ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
      program_name=/bin/chef-solo
      executable=/opt/chefdk/bin/chef-solo

    ================================================================================
    Error executing action create_as_superuser on resource ‘hops_hdfs_directory[/tmp]’

    Mixlib::ShellOut::ShellCommandFailed

    bash[mk-dir-/tmp] (/tmp/chef-solo/cookbooks/hops/providers/hdfs_directory.rb line 87) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received ‘1’
    ---- Begin output of “bash” “/tmp/chef-script20210526-35426-if9my9” ----
    STDOUT:
    STDERR: mkdir: Could not authenticate client with CN aicore.goldwind.com.cn remote IP /10.12.3.189 and username hdfs for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol
    mkdir: Could not authenticate client with CN aicore.goldwind.com.cn remote IP /10.12.3.189 and username hdfs for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol
    chown: Could not authenticate client with CN aicore.goldwind.com.cn remote IP /10.12.3.189 and username hdfs for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol
    chgrp: Could not authenticate client with CN aicore.goldwind.com.cn remote IP /10.12.3.189 and username hdfs for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol
    chmod: Could not authenticate client with CN aicore.goldwind.com.cn remote IP /10.12.3.189 and username hdfs for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol
    ---- End output of “bash” “/tmp/chef-script20210526-35426-if9my9” ----
    Ran “bash” “/tmp/chef-script20210526-35426-if9my9” returned 1

    Resource Declaration:

    In /tmp/chef-solo/cookbooks/hops/recipes/nn.rb

    178: hops_hdfs_directory d do
    179: action :create_as_superuser
    180: owner node[‘hops’][‘hdfs’][‘user’]
    181: group node[‘hops’][‘group’]
    182: mode “1775”
    183: end
    184: end

    Compiled Resource:

    Declared in /tmp/chef-solo/cookbooks/hops/recipes/nn.rb:178:in `block in from_file’

    hops_hdfs_directory("/tmp") do
    action [:create_as_superuser]
    default_guard_interpreter :default
    declared_type :hops_hdfs_directory
    cookbook_name “hops”
    recipe_name “nn”
    owner “hdfs”
    group “hadoop”
    mode “1775”
    end

    System Info:

    chef_version=14.10.9
    platform=centos
    platform_version=7.9.2009
    ruby=ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
    program_name=/bin/chef-solo
    executable=/opt/chefdk/bin/chef-solo

  • service[namenode] action enable

    • enable service service[namenode]
  • service[namenode] action restart

    • restart service service[namenode]

Has anyone ever had this problem? Any comments will be much appreciated.

@Freeman This usually happens when there is an issue with the FQDN/hostname configuration. The error usually means that the client is using a certificate that is not valid for aicore.goldwind.com.cn. Can you make sure that your /etc/hosts is correct?

Can you also post the output of this command: keytool -v -list -keystore /srv/hops/super_crypto/hdfs/hdfs__kstore.jks
When it asks for the password, don't enter anything, just press Enter.

What I'm looking for is that the Owner of the certificate is something like:

Owner: CN=aicore.goldwind.com.cn, OU=0, L=hdfs, ST=Sweden, C=SE
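
If you would rather not compare this by eye, here is a minimal Python sketch of the same check. It pulls the CN out of a keytool "Owner:" line and prints it next to the FQDN the machine reports. The helper name owner_cn is made up for this example and is not part of Hops or keytool, and the comma split is naive (it would not handle a DN containing escaped commas).

```python
import socket


def owner_cn(owner_line):
    """Return the CN value from a keytool 'Owner:' line, or None if absent."""
    # Everything after "Owner:" is the distinguished name, e.g. "CN=..., OU=0, ..."
    _, _, dn = owner_line.partition("Owner:")
    for part in dn.split(","):
        key, _, value = part.strip().partition("=")
        if key == "CN":
            return value
    return None


if __name__ == "__main__":
    # Sample line in the format keytool prints; paste your own here.
    line = "Owner: CN=aicore.goldwind.com.cn, OU=0, L=hdfs, ST=Sweden, C=SE"
    print("certificate CN:", owner_cn(line))
    # socket.getfqdn() is the fully qualified name this machine reports for itself.
    print("local FQDN:    ", socket.getfqdn())
```

If the two printed names differ, the certificate was most likely issued for a different hostname than the one the machine currently uses.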

@Fabio,
Sorry for the late reply. Here is the output of the relevant commands:

[appadm@aicore ~]$ hostname
aicore.goldwind.com.cn
[appadm@aicore ~]$ cat /etc/hosts
127.0.0.1 aicore.goldwind.com.cn localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.12.3.189 aicore.goldwind.com.cn
10.12.60.21 aicoreworker1
10.12.60.22 aicoreworker2
10.12.60.23 aicoreworker3
10.12.60.24 aicoreworker4

aicore.goldwind.com.cn is the hostname of the master node. It is also an internal domain name bound to the master node.
aicoreworker1, aicoreworker2, aicoreworker3 and aicoreworker4 are the hostnames of the four worker nodes.

I found the following in the output of the command keytool -v -list -keystore …:

Certificate[1]:
Owner: CN=aicore.goldwind.com.cn, OU=0, L=hdfs, ST=Sweden, C=SE

The full output is as follows:

[appadm@aicore ~]$ sudo keytool -v -list -keystore /srv/hops/super_crypto/hdfs/hdfs__kstore.jks
Enter keystore password:

***************** WARNING WARNING WARNING *****************
* The integrity of the information stored in your keystore *
* has NOT been verified! In order to verify its integrity, *
* you must provide your keystore password.                 *
***************** WARNING WARNING WARNING *****************

Keystore type: jks
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: aicore.goldwind.com.cn
Creation date: May 26, 2021
Entry type: PrivateKeyEntry
Certificate chain length: 2
Certificate[1]:
Owner: CN=aicore.goldwind.com.cn, OU=0, L=hdfs, ST=Sweden, C=SE
Issuer: CN=HopsIntermediateCA, O=SICS, ST=Sweden, C=SE
Serial number: 100f
Valid from: Wed May 26 22:55:26 CST 2021 until: Sat May 24 22:55:26 CST 2031
Certificate fingerprints:
SHA1: 92:E3:40:DF:82:B5:F8:A3:D3:28:A8:AF:2E:06:3B:CD:A9:A4:14:82
SHA256: 7C:2B:16:35:C9:AF:DA:C0:5C:AB:3F:0F:AC:68:5F:1C:E1:63:65:A5:36:3E:6A:DB:36:90:24:E5:46:4E:94:C8
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key (3)
Version: {10}

Extensions:

#1: ObjectId: 2.16.840.1.113730.1.13 Criticality=false
0000: 16 24 4F 70 65 6E 53 53 4C 20 47 65 6E 65 72 61 .$OpenSSL Genera
0010: 74 65 64 20 43 6C 69 65 6E 74 20 43 65 72 74 69 ted Client Certi
0020: 66 69 63 61 74 65 ficate

#2: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 0E EE 42 0B F6 67 AD 79 24 7E 4A DC 9B F5 1D A2 …B…g.y$.J…
0010: E1 42 74 E2 .Bt.
]
]

#3: ObjectId: 2.5.29.19 Criticality=false
BasicConstraints:[
CA:false
PathLen: undefined
]

#4: ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
clientAuth
emailProtection
serverAuth
]

#5: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
DigitalSignature
Non_repudiation
Key_Encipherment
]

#6: ObjectId: 2.16.840.1.113730.1.1 Criticality=false
NetscapeCertType [
SSL client
SSL server
S/MIME
]

#7: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 36 2E 0C 2F 79 3A 44 8C D1 F8 92 A7 C9 C6 2B ED 6…/y:D…+.
0010: 99 E0 EB 8C …
]
]

Certificate[2]:
Owner: CN=HopsIntermediateCA, O=SICS, ST=Sweden, C=SE
Issuer: CN=HopsRootCA, O=SICS, L=Stockholm, ST=Sweden, C=SE
Serial number: 1000
Valid from: Mon May 24 21:46:33 CST 2021 until: Thu May 22 21:46:33 CST 2031
Certificate fingerprints:
SHA1: 4B:60:0B:B4:35:87:CA:B1:BB:AE:42:62:0B:F0:4A:46:4D:80:CD:C9
SHA256: 76:72:2A:81:7C:79:DA:66:49:B5:73:4E:50:5F:EE:AB:E4:A1:31:99:85:49:39:CB:FD:44:B7:36:0C:20:10:E3
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 4096-bit RSA key (3)
Version: {10}

Extensions:

#1: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 74 58 B9 C4 D9 6E 2C 09 AD 57 61 AC A1 25 DD 89 tX…n,…Wa…%…
0010: A8 72 C7 CE .r…
]
]

#2: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
CA:true
PathLen:0
]

#3: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
DigitalSignature
Key_CertSign
Crl_Sign
]

#4: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 0E EE 42 0B F6 67 AD 79 24 7E 4A DC 9B F5 1D A2 …B…g.y$.J…
0010: E1 42 74 E2 .Bt.
]
]



Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using “keytool -importkeystore -srckeystore /srv/hops/super_crypto/hdfs/hdfs__kstore.jks -destkeystore /srv/hops/super_crypto/hdfs/hdfs__kstore.jks -deststoretype pkcs12”.

@Freeman Can you try removing aicore.goldwind.com.cn from the 127.0.0.1 line in /etc/hosts and try again?

Dear @Fabio,
I removed aicore.goldwind.com.cn from the first line in /etc/hosts and tried again, and the issue is now solved.

[appadm@aicore install]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.12.3.189 aicore.goldwind.com.cn
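
For anyone else hitting this, here is a small Python sketch that flags the pattern I had in my old file: a real hostname listed on a loopback line. The function name loopback_names and the two sample strings are only for illustration. My understanding, which nobody has confirmed in this thread, is that lookups which read /etc/hosts top to bottom would have resolved aicore.goldwind.com.cn to 127.0.0.1 before reaching the 10.12.3.189 line.

```python
import ipaddress

# The two versions of /etc/hosts from this thread (worker lines omitted).
BEFORE = """\
127.0.0.1 aicore.goldwind.com.cn localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.12.3.189 aicore.goldwind.com.cn
"""

AFTER = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.12.3.189 aicore.goldwind.com.cn
"""


def loopback_names(hosts_text):
    """Map each non-'localhost*' name found on a loopback line to that address."""
    flagged = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        address, *names = line.split()
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # skip lines whose first field is not a plain IP address
        if not ip.is_loopback:
            continue
        for name in names:
            if not name.startswith("localhost"):
                flagged.setdefault(name, address)
    return flagged


if __name__ == "__main__":
    print("before:", loopback_names(BEFORE))
    print("after: ", loopback_names(AFTER))
```

An empty result for the current file means no real hostname is bound to a loopback address any more.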

But I have forgotten why I had to modify this section. Could you explain the reasons in detail?

Thank you very much for your help. Please forgive me for not replying sooner; it is because of the time difference.