Hopsworks 1.4 installation

Hi all,
I installed Hopsworks 1.4 on a single node (CentOS 7.6). The installation went fine, but when I started my first Spark job, I found this error in the YARN logs:

User:	demo_spark_admin000__meb10000
Name:	SparkPi
Application Type:	Hopsworks-Yarn
Application Tags:	
Application Priority:	0 (Higher Integer value indicates higher priority)
YarnApplicationState:	FAILED
Queue:	default
FinalStatus Reported by AM:	FAILED
Started:	Tue Oct 13 12:15:34 +0000 2020
Launched:	Tue Oct 13 12:15:36 +0000 2020
Finished:	Tue Oct 13 12:15:43 +0000 2020
Elapsed:	8sec
Tracking URL:	History
Log Aggregation Status:	SUCCEEDED
Application Timeout (Remaining Time):	Unlimited

Diagnostics:	Application application_1602591219032_0001 failed 2 times due to AM Container for appattempt_1602591219032_0001_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-10-13 12:15:43.583]Exception from container-launch.
Container id: container_e02_1602591219032_0001_02_000001
Exit code: 1
Exception message: Launch container failed
Shell output: main : command provided 4
main : run as user is yarnapp
main : requested yarn user is demo_spark_admin000__meb10000
ca4c598d5a4ea49decd1b686315b51e4f8441b67b037def0cd16a318368c654c
Creating script paths...
Creating local dirs...
Getting exit code file...
Changing effective user to root...
Inspecting docker container...
Docker inspect command: /bin/docker inspect --format {{.State.Pid}} container_e02_1602591219032_0001_02_000001
pid from docker inspect: 30540
Writing pid file...
Writing to tmp file /opt/giotto/hopsdata/tmp/nm-local-dir/nmPrivate/application_1602591219032_0001/container_e02_1602591219032_0001_02_000001/container_e02_1602591219032_0001_02_000001.pid.tmp
Waiting for docker container to finish.
Obtaining the exit code...
Docker inspect command: /bin/docker inspect --format {{.State.ExitCode}} container_e02_1602591219032_0001_02_000001
Exit code from docker inspect: 1
Wrote the exit code 1 to /opt/giotto/hopsdata/tmp/nm-local-dir/nmPrivate/application_1602591219032_0001/container_e02_1602591219032_0001_02_000001/container_e02_1602591219032_0001_02_000001.pid.exitcode
[2020-10-13 12:15:43.612]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Error files: stderr, stderr.txt.
Last 4096 bytes of stderr :
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 7 more
[2020-10-13 12:15:43.613]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Error files: stderr, stderr.txt.
Last 4096 bytes of stderr :
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 7 more
For more detailed output, check the application tracking page: http://resourcemanager.service.consul:8088/cluster/app/application_1602591219032_0001 Then click on links to logs of each attempt.
. Failing the application.

Could you help me?

Thanks a lot,
Antony

Hi Antony,

Could you please share the cluster definition you used to build this VM?

Cheers,
Alex

Hi Alex,
Thanks for the reply. Here is the cluster definition:

name: hops14SingleNode
baremetal:
    username: centos

cookbooks:
  hopsworks:
    github: logicalclocks/hopsworks-chef
    branch: 1.4

attrs:
  install:
    dir: /opt/hops
  hops:
    rmappsecurity:
      actor_class: "org.apache.hadoop.yarn.server.resourcemanager.security.DevHopsworksRMAppSecurityActions"
  alertmanager:
    email:
      to: hops@hops.it
      from: hops@hops.it
      smtp_host: mail.hops.it
  prometheus:
    retention_time: "2h"
  hopsworks:
    featurestore_online: true
    kagent_liveness:
      enabled: true
      threshold: "40s"
  elastic:
    opendistro_security:
      jwt:
        exp_ms: "1800000"
      audit:
        enable_rest: "true"
        enable_transport: "false"
groups:
  namenodes:
    size: 1
    baremetal:
      ip: 10.206.195.42
    recipes:
      - kagent
      - conda
      - ndb::mgmd
      - ndb::ndbd
      - ndb::mysqld
      - hops::ndb
      - hops::rm
      - hops::nn
      - hops::jhs
      - hadoop_spark::yarn
      - hadoop_spark::historyserver
      - flink::yarn
      - flink::historyserver
      - elastic
      - livy
      - kzookeeper
      - kkafka
      - epipe
      - hopsworks
      - hopsmonitor
      - hopslog
      - hopslog::_filebeat-spark
      - hopslog::_filebeat-serving
      - hopslog::_filebeat-beam
      - hopslog::_filebeat-jupyter
      - hops::dn
      - hops::nm
      - tensorflow
      - hive2
      - hops_airflow
      - hops_airflow::sqoop
      - hopsmonitor::prometheus
      - hopsmonitor::alertmanager
      - hopsmonitor::node_exporter
      - consul::master
      - hops::docker_registry

Thanks a lot,
Antony

Hi Antony,

We think there is a mismatch within the Docker image, caused by changing the install path from its default. We are currently looking into a fix.
In the meantime, as a workaround, you can try installing again, this time with the default install location: set attrs -> install -> dir to /srv/hops.
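
As a minimal sketch, assuming the rest of the cluster definition stays exactly as posted above, the only part that would change is the install block under attrs:

```yaml
attrs:
  install:
    # Default install location suggested as the workaround;
    # the original definition used /opt/hops here.
    dir: /srv/hops
```

All other attrs and the groups section are left untouched in this sketch.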

Regards,
Alex