Hello,
I am constantly getting the following error when I try to force remove a project:
<2021-03-02T21:20:28.726>*** SUCCESS ***Project found in the database *custfeatures*
<2021-03-02T21:20:28.797>*** SUCCESS ***Updated team role *custfeatures*
<2021-03-02T21:35:29.285>*** SUCCESS ***Killed Yarn jobs *custfeatures*
<2021-03-02T21:35:29.302>*** SUCCESS ***Removed Jupyter *custfeatures*
<2021-03-02T21:35:29.313>*** SUCCESS ***Logged project removal *custfeatures*
<2021-03-02T21:35:33.243>*** SUCCESS ***Changed ownership of dummy inode *custfeatures*
<2021-03-02T21:35:33.466>*** SUCCESS ***Removed Kafka topics *custfeatures*
<2021-03-02T21:35:37.708>*** SUCCESS ***Removed quotas *custfeatures*
<2021-03-02T21:35:37.715>*** SUCCESS ***Fixed shared datasets *custfeatures*
<2021-03-02T21:35:42.147>*** SUCCESS ***Removed ElasticSearch *custfeatures*
<2021-03-02T21:35:42.267>*** SUCCESS ***Removed HDFS Groups and Users *custfeatures*
<2021-03-02T21:35:42.273>*** SUCCESS ***Removed local TensorBoards *custfeatures*
<2021-03-02T21:35:42.28>*** SUCCESS ***Removed servings *custfeatures*
<2021-03-02T21:35:42.298>*** SUCCESS ***Removed Airflow DAGs and security references *custfeatures*
<2021-03-02T21:35:42.395>*** SUCCESS ***Removed all X.509 certificates related to the Project from CertificateMaterializer *custfeatures*
<2021-03-02T21:35:42.441>*** SUCCESS ***Removed conda envs *custfeatures*
<2021-03-02T21:35:42.449>*** SUCCESS ***Removed dummy Inode *custfeatures*
<2021-03-02T21:35:29.268>*** ERROR ***Error when reading YARN apps during project cleanup *custfeatures*
<2021-03-02T21:35:29.268>*** ERROR ***Retry interrupted *custfeatures*
<2021-03-02T21:35:29.302>*** ERROR ***Error when getting Yarn logs during project cleanup *custfeatures*
<2021-03-02T21:35:29.303>*** ERROR ***null *custfeatures*
<2021-03-02T21:35:33.238>*** ERROR ***Error when changing ownership of root Project dir during project cleanup *custfeatures*
<2021-03-02T21:35:33.238>*** ERROR ***Cannot set owner for /Projects/custfeatures. Name node is in safe mode. Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:893)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1008)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:526)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:868)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:814)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1821)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2900) *custfeatures*
<2021-03-02T21:35:37.706>*** ERROR ***Error when removing project-related files during project cleanup *custfeatures*
<2021-03-02T21:35:37.706>*** ERROR ***Cannot delete /user/yarn/logs/custfeatures__coreysto. Name node is in safe mode. Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:893)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3622)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:748)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:725)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:868)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:814)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1821)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2900) *custfeatures*
<2021-03-02T21:35:41.077>*** ERROR ***Error when removing hive db during project cleanup *custfeatures*
<2021-03-02T21:35:41.077>*** ERROR ***Cannot delete /tmp/hive/custfeatures__mikemoun. Name node is in safe mode. Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:893)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3622)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:748)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:725)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:868)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:814)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1821)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2900) *custfeatures*
<2021-03-02T21:35:46.094>*** ERROR ***Error when removing root Project dir during project cleanup *custfeatures*
<2021-03-02T21:35:46.095>*** ERROR ***Cannot delete /Projects/custfeatures. Name node is in safe mode. Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:893)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3622)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:748)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:725)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:868)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:814)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1821)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2900) *custfeatures*
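Every one of the failed cleanup steps seems to come down to the same underlying exception: the NameNode is in safe mode because it is low on resources. I assume the usual way to confirm that and to see how much space is left would be something like this (the /srv/hops path is just a guess at where the NameNode keeps its data on my install, not something from the log):

hdfs dfsadmin -safemode get      # reports whether the NameNode is currently in safe mode
hdfs dfsadmin -report            # configured vs. remaining DFS capacity per DataNode
df -h /srv/hops                  # free space on the NameNode host's local volumes (path is a placeholder)
hdfs dfsadmin -safemode leave    # only after freeing up space, given the warning in the log

I have held off on forcing it out of safe mode, since the log warns it will just drop straight back in until more resources are freed.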
I also notice that the YARN ResourceManager service is in a BAD state, and every time I try to restart it, it just shuts down again.
I have already tried stopping and restarting all services.
Any suggestions?