The hdfs_quota_update table grows unexpectedly

We have a 5-node HopsFS cluster with 2 namenodes and 3 datanodes, and about 10 PySpark jobs running continuously that write to HopsFS. Suddenly, the hdfs_quota_update table has begun to grow, accumulating lots of records every second. The growth does not seem to stop, so I am asking why this table is growing and whether it is possible to delete its content.

These are the software versions:

  • hopsworks: 1.0.0
  • hopsfs:

File system changes, such as data written, deleted, or moved, are recorded in the hdfs_quota_update table. The content of this table is consumed by a thread on the leader namenode; after processing, the rows are deleted. I suspect that for some reason (a bug) this thread is not making any progress. The first course of action would be to restart the namenode to see if it helps. You can restart the namenode from the command line:

```
systemctl restart namenode
```

or you can do it using the admin panel:

Admin → Services → HDFS → namenode → Start/Stop

Hi @salman,
Thank you for your answer and your support. Unfortunately, even after restarting the namenodes, the table continues to grow; I can see many writes every second and, I think, no deletes. Is it possible to clean up this table or disable this HDFS quota mechanism?

Thank you

Yes, it is possible to disable the quota system. You will have to modify the hdfs-site.xml file in the /srv/hops/hadoop/etc/hadoop folder and set the property dfs.namenode.quota.enabled to false.


You only need to change this parameter on the machine where namenode is running.
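For reference, a sketch of what that entry would look like in hdfs-site.xml; the property name is the one given above, and the surrounding `<configuration>` element is the standard Hadoop config layout:

```xml
<configuration>
  <!-- Disable the HopsFS quota system (property name as given above) -->
  <property>
    <name>dfs.namenode.quota.enabled</name>
    <value>false</value>
  </property>
</configuration>
```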

Would it be possible for you to share the logs for the namenode so that we can identify the problem and fix it? The logs I am interested in are in /srv/hops/hadoop/logs folder. I would need logs for the namenode which are named hadoop-hdfs-namenode*

Hi @salman,
You can download the logs here HopsFSNNlogs

Is it possible to truncate the table once I have set the flag to false?

Thank you for your support

Yes, you can truncate the table. Shut down the namenode before truncating the hdfs_quota_update table.
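A sketch of that procedure, assuming the table lives in a `hops` schema on the NDB cluster and that the mysql client and socket paths match a default HopsFS install (all three are assumptions; adjust them for your deployment):

```shell
# 1. Stop the namenode first so no new quota updates are written
#    (restart command from earlier in the thread)
systemctl stop namenode

# 2. Truncate the table via the MySQL client.
#    Client path, socket, and "hops" schema are assumed defaults; verify yours.
/srv/hops/mysql/bin/mysql -u root -S /srv/hops/mysql-cluster/mysql.sock \
  -e "TRUNCATE TABLE hops.hdfs_quota_update;"

# 3. Start the namenode again
systemctl start namenode
```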

@arosc Thanks for the logs.

@arosc OK, I know why it is failing. This is a configuration issue ([HOPS-1632] Reduce quota manager batch size · hopshadoop/hops@3ab7194 · GitHub).

If you have not truncated the table and still want to use the quota system, then you can fix this by adding the following to hdfs-site.xml on all the namenodes:


Hi @salman,
I changed the configuration according to your suggestion, but the table keeps growing. Do you have any other ideas? Otherwise I will proceed with deleting the content. Thanks so much again.

Oei! Based on the logs, it was failing because of the large batch size for quota updates. If reducing the batch size does not help, then it could be some other bug.
Proceed with disabling quota. I will try to reproduce the problem.
How big is the hdfs_quota_update table now? Would it be possible to provide the latest logs after you changed the configuration? Thanks for your help.
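To answer the size question, a simple row count against the table should do. As above, the mysql client path, socket, and `hops` schema name are assumptions based on a typical HopsFS install:

```shell
# Count rows in the quota-update table
# (adjust client path / socket / schema to your install)
/srv/hops/mysql/bin/mysql -u root -S /srv/hops/mysql-cluster/mysql.sock \
  -e "SELECT COUNT(*) FROM hops.hdfs_quota_update;"
```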

@salman the table now has 6,893,268 records :slight_smile: Here are the logs: logs

Thank you for your support

PS. What about this error? SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock