[FAIL] Create more than 10000 files on Hopsworks

I used a t3a.2xlarge instance (with 64 GB storage) to deploy a single-node Hops cluster. However, when I create more than 10000 files in HopsFS, it throws an OutOfMemoryError.


sudo ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op create -threads 1 -files 10000 -filesPerDir 100000 -keepResults -logLevel INFO
20/03/12 15:36:03 INFO namenode.NameNode: createNameNode []
20/03/12 15:36:03 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-namenode.properties,hadoop-metrics2.properties
20/03/12 15:36:03 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
20/03/12 15:36:03 INFO impl.MetricsSystemImpl: NameNode metrics system started
20/03/12 15:36:03 WARN util.NativeCodeLoader: Loaded the native-hadoop library
20/03/12 15:36:03 INFO resolvingcache.Cache: starting Resolving Cache [InMemoryCache]
20/03/12 15:36:03 INFO ndb.ClusterjConnector: Database connect string: 10.0.0.243:1186 
20/03/12 15:36:03 INFO ndb.ClusterjConnector: Database name: hops
20/03/12 15:36:03 INFO ndb.ClusterjConnector: Max Transactions: 1024
Database connect string: 10.0.0.243:1186 
Database name: hops
Max Transactions: 1024
HopsFS created a ClusterJ 7.6.12 sesseion factory.
20/03/12 15:36:05 INFO security.UsersGroups: UsersGroups Initialized.
20/03/12 15:36:05 INFO hdfs.DFSUtil: Starting Web-server for hdfs at: http://ip-10-0-0-243.ec2.internal:50070
20/03/12 15:36:05 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
20/03/12 15:36:05 INFO http.HttpRequestLog: Http request log for http.requests.namenode is not defined
20/03/12 15:36:05 INFO http.HttpServer3: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer3$QuotingInputFilter)
20/03/12 15:36:05 INFO http.HttpServer3: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
20/03/12 15:36:05 INFO http.HttpServer3: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
20/03/12 15:36:05 INFO http.HttpServer3: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
20/03/12 15:36:05 INFO http.HttpServer3: Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
20/03/12 15:36:05 INFO http.HttpServer3: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
20/03/12 15:36:05 INFO http.HttpServer3: Jetty bound to port 50070
20/03/12 15:36:05 INFO mortbay.log: jetty-6.1.26
20/03/12 15:36:05 INFO mortbay.log: Started HttpServer3$SelectChannelConnectorWithSafeStartup@ip-10-0-0-243.ec2.internal:50070
20/03/12 15:36:05 INFO namenode.FSNamesystem: No KeyProvider found.
20/03/12 15:36:05 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
20/03/12 15:36:05 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
20/03/12 15:36:05 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
20/03/12 15:36:05 INFO blockmanagement.BlockManager: The block deletion will start around 2020 Mar 12 15:36:05
20/03/12 15:36:05 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
20/03/12 15:36:05 INFO blockmanagement.BlockManager: defaultReplication         = 3
20/03/12 15:36:05 INFO blockmanagement.BlockManager: maxReplication             = 512
20/03/12 15:36:05 INFO blockmanagement.BlockManager: minReplication             = 1
20/03/12 15:36:05 INFO blockmanagement.BlockManager: maxReplicationStreams      = 50
20/03/12 15:36:05 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
20/03/12 15:36:05 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
20/03/12 15:36:05 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
20/03/12 15:36:05 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
20/03/12 15:36:05 INFO blockmanagement.BlockManager: slicerBatchSize            = 500
20/03/12 15:36:05 INFO blockmanagement.BlockManager: misReplicatedNoOfBatchs    = 20
20/03/12 15:36:05 INFO blockmanagement.BlockManager: slicerNbOfBatchs           = 20
20/03/12 15:36:06 INFO hikari.HikariDataSource: HikariCP pool HikariPool-0 is starting.
20/03/12 15:36:06 WARN common.IDsGeneratorFactory: Called setConfiguration more than once.
20/03/12 15:36:06 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
20/03/12 15:36:06 INFO namenode.FSNamesystem: superGroup          = hdfs
20/03/12 15:36:06 INFO namenode.FSNamesystem: isPermissionEnabled = true
20/03/12 15:36:06 INFO namenode.FSNamesystem: Append Enabled: true
20/03/12 15:36:06 INFO namenode.FSDirectory: Added new root inode
20/03/12 15:36:06 INFO namenode.FSDirectory: ACLs enabled? false
20/03/12 15:36:06 INFO namenode.FSDirectory: XAttrs enabled? true
20/03/12 15:36:06 INFO namenode.FSDirectory: Maximum size of an xattr: 13755
20/03/12 15:36:06 INFO namenode.NameNode: The maximum number of xattrs per inode is set to 32
20/03/12 15:36:06 INFO namenode.NameNode: Caching file names occuring more than 10 times
20/03/12 15:36:06 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
20/03/12 15:36:06 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
20/03/12 15:36:06 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
20/03/12 15:36:06 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
20/03/12 15:36:06 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
20/03/12 15:36:06 INFO namenode.NameCache: initialized with 0 entries 0 lookups
20/03/12 15:36:06 INFO namenode.NameNode: RPC server is binding to ip-10-0-0-243.ec2.internal:8020
20/03/12 15:36:06 INFO ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 12000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
20/03/12 15:36:06 INFO ipc.Server: Starting Socket Reader #1 for port 8020
20/03/12 15:36:06 INFO ipc.Server: Starting Socket Reader #2 for port 8020
20/03/12 15:36:06 INFO ipc.Server: Starting Socket Reader #3 for port 8020
20/03/12 15:36:06 INFO util.JvmPauseMonitor: Starting JVM pause monitor
20/03/12 15:36:06 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/12 15:36:06 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/12 15:36:06 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/12 15:36:06 INFO namenode.FSNamesystem: Registered FSNamesystemState MBean
20/03/12 15:36:07 WARN namenode.FSNamesystem: cealring the safe blocks tabl, this may take some time.
20/03/12 15:36:07 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
20/03/12 15:36:07 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
20/03/12 15:36:07 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
20/03/12 15:36:07 INFO namenode.LeaseManager: Number of blocks under construction: 0
20/03/12 15:36:07 INFO hdfs.StateChange: STATE* Leaving safe mode after 2 secs
20/03/12 15:36:07 INFO hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
20/03/12 15:36:07 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
20/03/12 15:36:07 WARN namenode.FSNamesystem: cealring the safe blocks tabl, this may take some time.
20/03/12 15:36:08 INFO blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
20/03/12 15:36:08 INFO ipc.Server: IPC Server Responder: starting
20/03/12 15:36:08 INFO ipc.Server: IPC Server listener on 8020: starting
20/03/12 15:36:08 INFO namenode.NameNode: Leader Node RPC up at: ip-10-0-0-243.ec2.internal/10.0.0.243:8020
20/03/12 15:36:08 INFO namenode.FSNamesystem: Starting services required for active state
20/03/12 15:36:08 INFO namenode.FSNamesystem: Catching up to latest edits from old active before taking over writer role in edits logs
20/03/12 15:36:08 INFO blockmanagement.DatanodeManager: Marking all datandoes as stale
20/03/12 15:36:08 INFO namenode.FSNamesystem: Reprocessing replication and invalidation queues
20/03/12 15:36:08 INFO namenode.FSNamesystem: initializing replication queues
20/03/12 15:36:08 INFO blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
20/03/12 15:36:08 INFO blockmanagement.BlockManager: processMisReplicated read  0/10000 in the Ids range [0 - 10000] (max inodeId when the process started: 1)
20/03/12 15:36:08 INFO blockmanagement.BlockManager: Total number of blocks            = 0
20/03/12 15:36:08 INFO blockmanagement.BlockManager: Number of invalid blocks          = 0
20/03/12 15:36:08 INFO blockmanagement.BlockManager: Number of under-replicated blocks = 0
20/03/12 15:36:08 INFO blockmanagement.BlockManager: Number of  over-replicated blocks = 0
20/03/12 15:36:08 INFO blockmanagement.BlockManager: Number of blocks being written    = 0
20/03/12 15:36:08 INFO hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 54 msec
20/03/12 15:36:08 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 60 minutes.
20/03/12 15:36:08 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 60 minutes.
20/03/12 15:36:08 INFO namenode.NNThroughputBenchmark: Starting benchmark: create
20/03/12 15:36:08 INFO hdfs.StateChange: STATE* Safe mode is already OFF
20/03/12 15:36:08 INFO namenode.NNThroughputBenchmark: Generate 10000 intputs for create
20/03/12 15:36:08 FATAL namenode.NNThroughputBenchmark: Log level = INFO
20/03/12 15:36:08 INFO namenode.NNThroughputBenchmark: Starting 10000 create(s).

20/03/12 15:39:31 WARN handler.RequestHandler: START_FILE TX Failed. TX Time: 677 ms, RetryCount: 0, TX Stats -- Setup: 0ms, AcquireLocks: -1ms, InMemoryProcessing: -1ms, CommitTime: -1ms. Locks: INodeLock {paths=[/nnThroughputBenchmark/create/ThroughputBenchDir0/ThroughputBench7015], lockType=WRITE_ON_TARGET_AND_PARENT }. java.lang.OutOfMemoryError: Direct buffer memory
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:695)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
        at com.mysql.clusterj.tie.FixedByteBufferPoolImpl.borrowBuffer(FixedByteBufferPoolImpl.java:109)
        at com.mysql.clusterj.tie.NdbRecordImpl.newBuffer(NdbRecordImpl.java:306)
        at com.mysql.clusterj.tie.NdbRecordOperationImpl.allocateValueBuffer(NdbRecordOperationImpl.java:336)
        at com.mysql.clusterj.tie.NdbRecordScanOperationImpl.nextResultCopyOut(NdbRecordScanOperationImpl.java:239)
        at com.mysql.clusterj.tie.NdbRecordScanResultDataImpl.next(NdbRecordScanResultDataImpl.java:135)
        at com.mysql.clusterj.core.query.QueryDomainTypeImpl.getResultList(QueryDomainTypeImpl.java:190)
        at com.mysql.clusterj.core.query.QueryImpl.getResultList(QueryImpl.java:153)
        at io.hops.metadata.ndb.wrapper.HopsQuery.getResultList(HopsQuery.java:46)
        at io.hops.metadata.ndb.dalimpl.hdfs.LeasePathClusterj.findByHolderId(LeasePathClusterj.java:117)
        at io.hops.transaction.context.LeasePathContext.findByHolderId(LeasePathContext.java:144)
        at io.hops.transaction.context.LeasePathContext.findList(LeasePathContext.java:88)
        at io.hops.transaction.context.TransactionContext.findList(TransactionContext.java:150)
        at io.hops.transaction.EntityManager.findList(EntityManager.java:93)
        at io.hops.transaction.lock.Lock.acquireLockList(Lock.java:120)
        at io.hops.transaction.lock.LeasePathLock.acquireLeasePaths(LeasePathLock.java:85)
        at io.hops.transaction.lock.LeasePathLock.acquire(LeasePathLock.java:68)
        at io.hops.transaction.lock.HdfsTransactionalLockAcquirer.acquire(HdfsTransactionalLockAcquirer.java:32)
        at io.hops.transaction.handler.TransactionalRequestHandler.execute(TransactionalRequestHandler.java:89)
        at io.hops.transaction.handler.HopsTransactionalRequestHandler.execute(HopsTransactionalRequestHandler.java:50)
        at io.hops.transaction.handler.RequestHandler.handle(RequestHandler.java:68)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2124)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:534)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$CreateFileStats.executeOp(NNThroughputBenchmark.java:633)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.benchmarkOne(NNThroughputBenchmark.java:453)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:436)
Exception in thread "StatsDaemon-0" java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:695)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
        at com.mysql.clusterj.tie.FixedByteBufferPoolImpl.borrowBuffer(FixedByteBufferPoolImpl.java:109)
        at com.mysql.clusterj.tie.NdbRecordImpl.newBuffer(NdbRecordImpl.java:306)
        at com.mysql.clusterj.tie.NdbRecordOperationImpl.allocateValueBuffer(NdbRecordOperationImpl.java:336)
        at com.mysql.clusterj.tie.NdbRecordScanOperationImpl.nextResultCopyOut(NdbRecordScanOperationImpl.java:239)
        at com.mysql.clusterj.tie.NdbRecordScanResultDataImpl.next(NdbRecordScanResultDataImpl.java:135)
        at com.mysql.clusterj.core.query.QueryDomainTypeImpl.getResultList(QueryDomainTypeImpl.java:190)
        at com.mysql.clusterj.core.query.QueryImpl.getResultList(QueryImpl.java:153)
        at io.hops.metadata.ndb.wrapper.HopsQuery.getResultList(HopsQuery.java:46)
        at io.hops.metadata.ndb.dalimpl.hdfs.LeasePathClusterj.findByHolderId(LeasePathClusterj.java:117)
        at io.hops.transaction.context.LeasePathContext.findByHolderId(LeasePathContext.java:144)
        at io.hops.transaction.context.LeasePathContext.findList(LeasePathContext.java:88)
        at io.hops.transaction.context.TransactionContext.findList(TransactionContext.java:150)
        at io.hops.transaction.EntityManager.findList(EntityManager.java:93)
        at io.hops.transaction.lock.Lock.acquireLockList(Lock.java:120)
        at io.hops.transaction.lock.LeasePathLock.acquireLeasePaths(LeasePathLock.java:85)
        at io.hops.transaction.lock.LeasePathLock.acquire(LeasePathLock.java:68)
        at io.hops.transaction.lock.HdfsTransactionalLockAcquirer.acquire(HdfsTransactionalLockAcquirer.java:32)
        at io.hops.transaction.handler.TransactionalRequestHandler.execute(TransactionalRequestHandler.java:89)
        at io.hops.transaction.handler.HopsTransactionalRequestHandler.execute(HopsTransactionalRequestHandler.java:50)
        at io.hops.transaction.handler.RequestHandler.handle(RequestHandler.java:68)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2124)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:534)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$CreateFileStats.executeOp(NNThroughputBenchmark.java:633)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.benchmarkOne(NNThroughputBenchmark.java:453)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:436)

This is because the namenode is not configured properly. We will release a new AMI soon with a fix.
In the meantime you can fix the issue by changing the parameters manually.

Edit the file /srv/hops/hadoop/etc/hadoop/hadoop-env.sh
and, at line 52, increase the DirectMemorySize from 50 to 500. If you are writing a lot of files, I would recommend increasing both DirectMemorySize and HADOOP_HEAP (line 48), for example, set both to 1000.
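For reference, after the change the two variables might look roughly like this (variable names as used above; the exact line numbers and the way the values are expanded into JVM flags may differ between AMI versions):

HADOOP_HEAP=1000            # ends up as -Xmx1000m on the namenode process
DirectMemorySize=1000       # ends up as -XX:MaxDirectMemorySize=1000m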


After the above changes, restart the namenode:

systemctl restart namenode
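You can check that the new limits have been picked up by the running namenode, for example:

ps -ax | grep java

and look for -Xmx and -XX:MaxDirectMemorySize in the namenode command line.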


I changed both HADOOP_HEAP and DirectMemorySize to 1000M. It works now.

However, when I

created 100 files, the throughput was 65 ops/sec
created 1000 files, the throughput was 100 ops/sec
created 10000 files, the throughput was 26 ops/sec
created 100000 files, I got "Detected pause in JVM or host machine (eg GC)".

Just curious, are there any data structures or system components in HopsFS that occupy the memory?
If I set the heap size to 1 GB, HDFS can easily support 1 million files.

HDFS and HopsFS have very different architectures. HDFS supports very fast metadata operations because all the metadata is stored in the memory of the metadata server, the namenode. However, HDFS is not scalable, as it only supports a single metadata server. In HopsFS the metadata is stored in an external database, which makes the file system scalable, but at the same time the file system operations are a bit slower (compared to HDFS) because each operation has to fetch the metadata from the external database and store the metadata back in the database.

NNThroughputBenchmark:

In NNThroughputBenchmark (NNTBM) for HDFS, the benchmark clients (-threads) and the namenode run in the same process. That is why, when you run NNTBM with one thread, you get very good performance for HDFS. However, with multiple threads the performance does not increase linearly because of bottlenecks in the HDFS namenode architecture. It is hard to get more than 5 - 10 thousand ops/sec using NNTBM on an 8-core machine like the one in your setup.

For HopsFS, in NNTBM, the clients and the namenode are in the same process as in HDFS, but the namenode communicates with an external database over the network for each operation, so the per-operation latency is higher than in HDFS. For greater throughput you need to run the benchmark with multiple clients, for example, -threads 20. With more threads, the performance will increase. I ran the benchmark with 30 and 100 threads and got around 600 - 900 ops/sec. However, the performance is not very impressive for NNTBM because of the suboptimal setup. Basically, the machine is running a lot of processes, that is, the namenode, the NDB database, Hopsworks, Hive, … For better results, it is recommended to run the database on separate node(s) and bind the database threads to CPU cores.
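Purely as an illustration, binding the NDB data node threads to CPU cores is usually done with the ThreadConfig parameter in the cluster's config.ini; the section, thread counts and CPU IDs below are hypothetical and have to be adapted to the actual core layout and NDB version:

[ndbd default]
# Hypothetical example: pin the LDM and TC threads to dedicated cores,
# away from the cores used by the namenode and Hopsworks JVMs.
ThreadConfig=ldm={count=2,cpubind=2,3},tc={count=1,cpubind=4},main={cpubind=5},io={cpubind=5}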

If both HDFS and HopsFS are optimally set up, then HDFS will run at around 80 - 100 thousand ops/sec and HopsFS will run at around 20 - 25 thousand ops/sec per namenode. The throughput of HopsFS will scale as more namenodes are added, whereas the performance of HDFS will not increase because of the limitations of the HDFS namenode architecture.

Now the question of why we need 1000M XMX and 1000M DirectBufferMemory.
If you want to perform a large file system operation (for example, deleting millions of files), a lot of metadata is read from the database. You need a large XMX to fit all that metadata in memory while the operation is being performed.

Try again with more threads and let me know. I managed to write 200,000 files using 100 threads without GC issues.

Tip: use the -keepResults param, otherwise for a large number of files the benchmark might fail when it tries to delete these files. This happens because of the limited resources allocated to the database. You can call ‘hdfs namenode -format’ to delete the junk files created by the benchmark. If you do not remove these files, they will be overwritten in the next run. Sometimes operations fail because of transaction deadlocks in the database. In normal setups, these failures are handled by the clients and the operations are retried. If you see failed operations in the benchmark, it is because NNTBM does not use the DFSClients that are responsible for retrying operations that failed due to transient exceptions.

./hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark  -op create -threads 100 -files 100000 -filesPerDir 1000 -logLevel INFO
...
INFO namenode.NNThroughputBenchmark:  Ops per sec: 900.000349000
...

hdfs namenode -format

./hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark  -op create -threads 30 -files 100000 -filesPerDir 1000 -logLevel INFO 
....
INFO namenode.NNThroughputBenchmark:  Ops per sec: 630.00063000063



For 630 ops/sec, did you run the database on a separate node?

I also ran the command

./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op create -threads 1 -files 100000 -filesPerDir 10000000 -keepResults -logLevel INFO

It failed.

20/03/13 04:07:47 FATAL namenode.NNThroughputBenchmark: Log level = INFO
20/03/13 04:07:47 INFO namenode.NNThroughputBenchmark: Starting 100000 create(s).
20/03/13 04:11:41 INFO util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1360ms
No GCs detected
20/03/13 05:11:09 ERROR leaderElection.LeaderElection: LE Status: id 1 LeaderElection: Update Tx took very long time to update: 1949, time_perid is 1000
20/03/13 07:25:58 WARN handler.RequestHandler: START_FILE TX Failed. TX Time: 571 ms, RetryCount: 0, TX Stats -- Setup: 0ms, AcquireLocks: 568ms, InMemoryProcessing: 1ms, CommitTime: -1ms. Locks: INodeLock {paths=[/nnThroughputBenchmark/create/ThroughputBenchDir0/ThroughputBench59101], lockType=WRITE_ON_TARGET_AND_PARENT }. io.hops.exception.LockUpgradeException: Trying to update inode id=2 acquired lock was READ_COMMITTED
io.hops.exception.LockUpgradeException: Trying to update inode id=2 acquired lock was READ_COMMITTED
        at io.hops.transaction.context.INodeContext.prepare(INodeContext.java:160)
        at io.hops.transaction.context.TransactionContext.commit(TransactionContext.java:79)
        at io.hops.transaction.EntityManager.commit(EntityManager.java:71)
        at io.hops.transaction.handler.TransactionalRequestHandler.execute(TransactionalRequestHandler.java:120)
        at io.hops.transaction.handler.HopsTransactionalRequestHandler.execute(HopsTransactionalRequestHandler.java:50)
        at io.hops.transaction.handler.RequestHandler.handle(RequestHandler.java:68)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2124)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:534)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$CreateFileStats.executeOp(NNThroughputBenchmark.java:633)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.benchmarkOne(NNThroughputBenchmark.java:453)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:436)
20/03/13 07:25:58 ERROR namenode.NNThroughputBenchmark: StatsDaemon 0 failed: 
io.hops.exception.LockUpgradeException: Trying to update inode id=2 acquired lock was READ_COMMITTED
        at io.hops.transaction.context.INodeContext.prepare(INodeContext.java:160)
        at io.hops.transaction.context.TransactionContext.commit(TransactionContext.java:79)
        at io.hops.transaction.EntityManager.commit(EntityManager.java:71)
        at io.hops.transaction.handler.TransactionalRequestHandler.execute(TransactionalRequestHandler.java:120)
        at io.hops.transaction.handler.HopsTransactionalRequestHandler.execute(HopsTransactionalRequestHandler.java:50)
        at io.hops.transaction.handler.RequestHandler.handle(RequestHandler.java:68)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2124)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:534)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$CreateFileStats.executeOp(NNThroughputBenchmark.java:633)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.benchmarkOne(NNThroughputBenchmark.java:453)
        at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:436)

20/03/13 07:26:03 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:26:03 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:26:05 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:26:05 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:26:05 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:26:05 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 1000 new TP: 1100
20/03/13 07:26:20 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:26:20 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:26:22 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:26:22 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:26:22 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:26:22 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 1100 new TP: 1200
20/03/13 07:26:35 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:26:35 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:26:37 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:26:37 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:26:37 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:26:37 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 1200 new TP: 1300
20/03/13 07:26:52 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:26:52 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:26:54 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:26:54 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:26:54 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:26:54 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 1300 new TP: 1400
20/03/13 07:27:08 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:27:08 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:27:11 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:27:11 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:27:11 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:27:11 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 1400 new TP: 1500
20/03/13 07:27:23 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:27:23 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:27:26 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:27:26 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:27:26 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:27:26 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 1500 new TP: 1600
20/03/13 07:27:41 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:27:41 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:27:44 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:27:44 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:27:44 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:27:44 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 1600 new TP: 1700
20/03/13 07:27:56 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:27:56 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:27:59 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:27:59 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:27:59 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:27:59 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 1700 new TP: 1800
20/03/13 07:28:12 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:28:12 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:28:15 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:28:15 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:28:15 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:28:15 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 1800 new TP: 1900
20/03/13 07:28:29 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:28:29 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:28:32 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:28:32 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:28:32 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:28:32 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 1900 new TP: 2000
20/03/13 07:28:44 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:28:44 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:28:48 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:28:48 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:28:48 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:28:48 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 2000 new TP: 2100
20/03/13 07:29:01 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:29:01 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:29:05 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:29:05 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:29:05 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:29:05 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 2100 new TP: 2200
20/03/13 07:29:16 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:29:16 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:29:21 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:29:21 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:29:21 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:29:21 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 2200 new TP: 2300
20/03/13 07:29:32 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1
20/03/13 07:29:32 INFO leaderElection.LETransaction: LE Status: id 1 I am a NON_LEADER process 
20/03/13 07:29:37 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/13 07:29:37 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/13 07:29:37 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/13 07:29:37 WARN leaderElection.LETransaction: LE Status: id 1 I am LEADER and I am updating the time period. Old Tp: 2300 new TP: 2400
20/03/13 07:29:49 WARN leaderElection.LETransaction: LE Status: id 1 I was kicked out. Old Id was 1

My question: when I set threads = 1 and files = 100000, why did it fail? It looks like there is some leader election problem. When I increase the number of threads, why does it work?

./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op create -threads 8 -files 100000 -filesPerDir 10000000 -keepResults -logLevel INFO

Ops/sec: 22.74

./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op create -threads 16 -files 100000 -filesPerDir 10000000 -keepResults -logLevel INFO

Ops/sec: 42.88

./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op create -threads 32 -files 100000 -filesPerDir 10000000 -keepResults -logLevel INFO

Ops/sec: 73.97

./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op create -threads 64 -files 100000 -filesPerDir 10000000 -keepResults -logLevel INFO

Ops/sec: 114.08

threads = 64, files = 100000, output:

20/03/14 00:48:51 INFO ndb.ClusterjConnector: Database connect string: 10.0.0.243:1186 
20/03/14 00:48:51 INFO ndb.ClusterjConnector: Database name: hops
20/03/14 00:48:51 INFO ndb.ClusterjConnector: Max Transactions: 1024
Database connect string: 10.0.0.243:1186 
Database name: hops
Max Transactions: 1024
HopsFS created a ClusterJ 7.6.12 sesseion factory.
20/03/14 00:48:52 INFO security.UsersGroups: UsersGroups Initialized.
20/03/14 00:48:52 INFO hdfs.DFSUtil: Starting Web-server for hdfs at: http://ip-10-0-0-243.ec2.internal:50070
20/03/14 00:48:52 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
20/03/14 00:48:52 INFO http.HttpRequestLog: Http request log for http.requests.namenode is not defined
20/03/14 00:48:52 INFO http.HttpServer3: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer3$QuotingInputFilter)
20/03/14 00:48:52 INFO http.HttpServer3: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
20/03/14 00:48:52 INFO http.HttpServer3: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
20/03/14 00:48:52 INFO http.HttpServer3: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
20/03/14 00:48:52 INFO http.HttpServer3: Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
20/03/14 00:48:52 INFO http.HttpServer3: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
20/03/14 00:48:52 INFO http.HttpServer3: Jetty bound to port 50070
20/03/14 00:48:52 INFO mortbay.log: jetty-6.1.26
20/03/14 00:48:53 INFO mortbay.log: Started HttpServer3$SelectChannelConnectorWithSafeStartup@ip-10-0-0-243.ec2.internal:50070
20/03/14 00:48:53 INFO namenode.FSNamesystem: No KeyProvider found.
20/03/14 00:48:53 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
20/03/14 00:48:53 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
20/03/14 00:48:53 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
20/03/14 00:48:53 INFO blockmanagement.BlockManager: The block deletion will start around 2020 Mar 14 00:48:53
20/03/14 00:48:53 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
20/03/14 00:48:53 INFO blockmanagement.BlockManager: defaultReplication         = 3
20/03/14 00:48:53 INFO blockmanagement.BlockManager: maxReplication             = 512
20/03/14 00:48:53 INFO blockmanagement.BlockManager: minReplication             = 1
20/03/14 00:48:53 INFO blockmanagement.BlockManager: maxReplicationStreams      = 50
20/03/14 00:48:53 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
20/03/14 00:48:53 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
20/03/14 00:48:53 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
20/03/14 00:48:53 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
20/03/14 00:48:53 INFO blockmanagement.BlockManager: slicerBatchSize            = 500
20/03/14 00:48:53 INFO blockmanagement.BlockManager: misReplicatedNoOfBatchs    = 20
20/03/14 00:48:53 INFO blockmanagement.BlockManager: slicerNbOfBatchs           = 20
20/03/14 00:48:53 INFO hikari.HikariDataSource: HikariCP pool HikariPool-0 is starting.
20/03/14 00:48:53 WARN common.IDsGeneratorFactory: Called setConfiguration more than once.
20/03/14 00:48:53 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
20/03/14 00:48:53 INFO namenode.FSNamesystem: superGroup          = hdfs
20/03/14 00:48:53 INFO namenode.FSNamesystem: isPermissionEnabled = true
20/03/14 00:48:53 INFO namenode.FSNamesystem: Append Enabled: true
20/03/14 00:48:54 INFO namenode.FSDirectory: Added new root inode
20/03/14 00:48:54 INFO namenode.FSDirectory: ACLs enabled? false
20/03/14 00:48:54 INFO namenode.FSDirectory: XAttrs enabled? true
20/03/14 00:48:54 INFO namenode.FSDirectory: Maximum size of an xattr: 13755
20/03/14 00:48:54 INFO namenode.NameNode: The maximum number of xattrs per inode is set to 32
20/03/14 00:48:54 INFO namenode.NameNode: Caching file names occuring more than 10 times
20/03/14 00:48:54 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
20/03/14 00:48:54 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
20/03/14 00:48:54 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
20/03/14 00:48:54 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
20/03/14 00:48:54 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
20/03/14 00:48:54 INFO namenode.NameCache: initialized with 0 entries 0 lookups
20/03/14 00:48:54 INFO namenode.NameNode: RPC server is binding to ip-10-0-0-243.ec2.internal:8020
20/03/14 00:48:54 INFO ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 12000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
20/03/14 00:48:54 INFO ipc.Server: Starting Socket Reader #1 for port 8020
20/03/14 00:48:54 INFO ipc.Server: Starting Socket Reader #2 for port 8020
20/03/14 00:48:54 INFO ipc.Server: Starting Socket Reader #3 for port 8020
20/03/14 00:48:54 INFO util.JvmPauseMonitor: Starting JVM pause monitor
20/03/14 00:48:54 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/14 00:48:54 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/14 00:48:54 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER. 
20/03/14 00:48:54 INFO namenode.FSNamesystem: Registered FSNamesystemState MBean
20/03/14 00:48:55 WARN namenode.FSNamesystem: cealring the safe blocks tabl, this may take some time.
20/03/14 00:48:55 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
20/03/14 00:48:55 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
20/03/14 00:48:55 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
20/03/14 00:48:55 INFO namenode.LeaseManager: Number of blocks under construction: 0
20/03/14 00:48:55 INFO hdfs.StateChange: STATE* Leaving safe mode after 2 secs
20/03/14 00:48:55 INFO hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
20/03/14 00:48:55 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
20/03/14 00:48:55 WARN namenode.FSNamesystem: cealring the safe blocks tabl, this may take some time.
20/03/14 00:48:55 INFO blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
20/03/14 00:48:55 INFO ipc.Server: IPC Server Responder: starting
20/03/14 00:48:55 INFO ipc.Server: IPC Server listener on 8020: starting
20/03/14 00:48:55 INFO namenode.NameNode: Leader Node RPC up at: ip-10-0-0-243.ec2.internal/10.0.0.243:8020
20/03/14 00:48:55 INFO namenode.FSNamesystem: Starting services required for active state
20/03/14 00:48:55 INFO namenode.FSNamesystem: Catching up to latest edits from old active before taking over writer role in edits logs
20/03/14 00:48:55 INFO blockmanagement.DatanodeManager: Marking all datandoes as stale
20/03/14 00:48:55 INFO namenode.FSNamesystem: Reprocessing replication and invalidation queues
20/03/14 00:48:55 INFO namenode.FSNamesystem: initializing replication queues
20/03/14 00:48:55 INFO blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
20/03/14 00:48:55 INFO blockmanagement.BlockManager: processMisReplicated read  0/10000 in the Ids range [0 - 10000] (max inodeId when the process started: 1)
20/03/14 00:48:55 INFO blockmanagement.BlockManager: Total number of blocks            = 0
20/03/14 00:48:55 INFO blockmanagement.BlockManager: Number of invalid blocks          = 0
20/03/14 00:48:55 INFO blockmanagement.BlockManager: Number of under-replicated blocks = 0
20/03/14 00:48:55 INFO blockmanagement.BlockManager: Number of  over-replicated blocks = 0
20/03/14 00:48:55 INFO blockmanagement.BlockManager: Number of blocks being written    = 0
20/03/14 00:48:55 INFO hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 33 msec
20/03/14 00:48:56 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 60 minutes.
20/03/14 00:48:56 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 60 minutes.
20/03/14 00:48:56 INFO namenode.NNThroughputBenchmark: Starting benchmark: create
20/03/14 00:48:56 INFO hdfs.StateChange: STATE* Safe mode is already OFF
20/03/14 00:48:56 INFO namenode.NNThroughputBenchmark: Generate 100000 intputs for create
20/03/14 00:48:56 FATAL namenode.NNThroughputBenchmark: Log level = INFO
20/03/14 00:48:56 INFO namenode.NNThroughputBenchmark: Starting 100000 create(s).
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 274, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
20/03/14 00:48:58 WARN handler.RequestHandler: APPLY_QUOTA_UPDATE TX Failed. TX Time: 1652 ms, RetryCount: 0, TX Stats -- Setup: 0ms, AcquireLocks: -1ms, InMemoryProcessing: -1ms, CommitTime: -1ms. Locks: Individual InodeLock = { ID: 1 Name: null Lock: WRITE }. io.hops.exception.TransientStorageException: com.mysql.clusterj.ClusterJDatastoreException: Error in NdbJTie: returnCode -1, code 274, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
io.hops.exception.TransientStorageException: com.mysql.clusterj.ClusterJDatastoreException: Error in NdbJTie: returnCode -1, code 274, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
        at io.hops.metadata.ndb.wrapper.HopsExceptionHelper.wrap(HopsExceptionHelper.java:31)
        at io.hops.metadata.ndb.wrapper.HopsQuery.getResultList(HopsQuery.java:48)
        at io.hops.metadata.ndb.dalimpl.hdfs.INodeClusterj.findInodeByIdFTIS(INodeClusterj.java:251)
        at io.hops.metadata.ndb.dalimpl.hdfs.INodeClusterj.findInodeByIdFTIS(INodeClusterj.java:51)
        at io.hops.metadata.adaptor.INodeDALAdaptor.findInodeByIdFTIS(INodeDALAdaptor.java:58)
        at io.hops.metadata.adaptor.INodeDALAdaptor.findInodeByIdFTIS(INodeDALAdaptor.java:45)
        at io.hops.transaction.context.INodeContext.findByInodeIdFTIS(INodeContext.java:227)
        at io.hops.transaction.context.INodeContext.find(INodeContext.java:78)
        at io.hops.transaction.context.INodeContext.find(INodeContext.java:48)
        at io.hops.transaction.context.TransactionContext.find(TransactionContext.java:139)
        at io.hops.transaction.EntityManager.find(EntityManager.java:98)
        at io.hops.transaction.lock.BaseINodeLock.find(BaseINodeLock.java:250)
        at io.hops.transaction.lock.IndividualINodeLock.acquire(IndividualINodeLock.java:62)
        at io.hops.transaction.lock.HdfsTransactionalLockAcquirer.acquire(HdfsTransactionalLockAcquirer.java:32)
        at io.hops.transaction.handler.TransactionalRequestHandler.execute(TransactionalRequestHandler.java:89)
        at io.hops.transaction.handler.HopsTransactionalRequestHandler.execute(HopsTransactionalRequestHandler.java:50)
        at io.hops.transaction.handler.RequestHandler.handle(RequestHandler.java:68)
        at org.apache.hadoop.hdfs.server.namenode.QuotaUpdateManager.applyBatchedUpdate(QuotaUpdateManager.java:301)
        at org.apache.hadoop.hdfs.server.namenode.QuotaUpdateManager.processNextUpdateBatch(QuotaUpdateManager.java:199)
        at org.apache.hadoop.hdfs.server.namenode.QuotaUpdateManager.access$400(QuotaUpdateManager.java:51)
        at org.apache.hadoop.hdfs.server.namenode.QuotaUpdateManager$QuotaUpdateMonitor.run(QuotaUpdateManager.java:125)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.mysql.clusterj.ClusterJDatastoreException: Error in NdbJTie: returnCode -1, code 274, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
        at com.mysql.clusterj.tie.Utility.throwError(Utility.java:1340)
        at com.mysql.clusterj.tie.Utility.throwError(Utility.java:1326)
        at com.mysql.clusterj.tie.NdbRecordScanResultDataImpl.next(NdbRecordScanResultDataImpl.java:167)
        at com.mysql.clusterj.core.query.QueryDomainTypeImpl.getResultList(QueryDomainTypeImpl.java:190)
        at com.mysql.clusterj.core.query.QueryImpl.getResultList(QueryImpl.java:153)
        at io.hops.metadata.ndb.wrapper.HopsQuery.getResultList(HopsQuery.java:46)
        ... 20 more
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
Mar 14, 2020 12:48:58 AM com.mysql.clusterj.tie.Utility throwError
SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, status 1, classification 10, message Time-out in NDB, probably caused by deadlock .
20/03/14 01:03:32 INFO hdfs.StateChange: STATE* Safe mode is already OFF
20/03/14 01:03:32 INFO namenode.NNThroughputBenchmark: 
20/03/14 01:03:32 INFO namenode.NNThroughputBenchmark: --- create inputs ---
20/03/14 01:03:32 INFO namenode.NNThroughputBenchmark: nrFiles = 100000
20/03/14 01:03:32 INFO namenode.NNThroughputBenchmark: nrThreads = 64
20/03/14 01:03:32 INFO namenode.NNThroughputBenchmark: nrFilesPerDir = 10000000
20/03/14 01:03:32 INFO namenode.NNThroughputBenchmark: --- create stats  ---
20/03/14 01:03:32 INFO namenode.NNThroughputBenchmark: # operations: 100000
20/03/14 01:03:32 INFO namenode.NNThroughputBenchmark: Elapsed Time: 876529
20/03/14 01:03:32 INFO namenode.NNThroughputBenchmark:  Ops per sec: 114.08635652670932
20/03/14 01:03:32 INFO namenode.NNThroughputBenchmark: Average Time: 558
20/03/14 01:03:32 INFO namenode.FSNamesystem: Stopping services started for active state
20/03/14 01:03:32 INFO erasure_coding.ErasureCodingManager: ErasureCodingMonitor stopped
20/03/14 01:03:32 ERROR namenode.FSNamesystem: Exception in RetryCacheCleaner: 
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$RetryCacheCleaner.run(FSNamesystem.java:9078)
        at java.lang.Thread.run(Thread.java:748)
20/03/14 01:03:32 WARN leaderElection.LeaderElection: LE Status: id 1 got Interrupted java.lang.InterruptedException: sleep interrupted
20/03/14 01:03:32 INFO ipc.Server: Stopping server on 8020
20/03/14 01:03:32 INFO ipc.Server: Stopping IPC Server listener on 8020
20/03/14 01:03:32 INFO blockmanagement.BlockManager: Stopping ReplicationMonitor.
20/03/14 01:03:32 INFO ipc.Server: Stopping IPC Server Responder
20/03/14 01:03:33 INFO namenode.FSNamesystem: Stopping services started for active state
20/03/14 01:03:33 INFO erasure_coding.ErasureCodingManager: ErasureCodingMonitor stopped
20/03/14 01:03:33 WARN namenode.MDCleaner: java.lang.InterruptedException: sleep interrupted
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.MDCleaner$Monitor.run(MDCleaner.java:72)
        at java.lang.Thread.run(Thread.java:748)
20/03/14 01:03:33 INFO mortbay.log: Stopped HttpServer3$SelectChannelConnectorWithSafeStartup@ip-10-0-0-243.ec2.internal:50070
20/03/14 01:03:33 INFO impl.MetricsSystemImpl: Stopping NameNode metrics system...
20/03/14 01:03:33 INFO impl.MetricsSystemImpl: NameNode metrics system stopped.
20/03/14 01:03:33 INFO impl.MetricsSystemImpl: NameNode metrics system shutdown complete.

For 630 ops/sec, did you run the database on a separate node?

No, everything was running on one node.

Deadlock Exceptions

At any given time only one client can update an inode. For example, in this case, one thread is trying to update the quota values for the directory and another thread is trying to create new files in the same directory. As both operations are trying to update the same inode, these operations are serialized. Some of these operations might time out and throw deadlock exceptions. HopsFS clients (DFSClient.java) are responsible for catching these exceptions and retrying the operations. However, in NNThroughputBenchmark these clients are not used. NNThroughputBenchmark is currently broken for HopsFS because it does not use HopsFS clients, that is, it calls file system internal APIs in ways that we do not support.

I can look into NNThroughputBenchmark and change it to use HopsFS clients. Due to the current workload this might take a couple of weeks.

Regarding GC:
Can you make sure that the XMX is properly applied to the process?

ps -ax | grep java

Yes, I am sure:

20357 ?        Sl    17:56 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx1000m -XX:MaxDirectMemorySize=1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/srv/hops/hadoop-2.8.2.9/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/srv/hops/hadoop-2.8.2.9 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,console -Djava.library.path=/srv/hops/hadoop-2.8.2.9/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op delete -threads 64 -files 100000 -filesPerDir 10000000 -keepResults -logLevel INFO

That is because you changed the HDFS global lock into fine-grained locks. The HDFS community has had lots of similar ideas, but it’s very hard to make it stable.

Yes, please. However, even if you can retry the client’s requests after the deadlock exception, I don’t think the overall throughput will change.

Just to make sure: when I specify 1 thread and one namenode, why do you have to do leader election?
To the best of my knowledge, leader election only happens when there are multiple namenodes, where you want to use Paxos or ZooKeeper (leader/slave mode) to keep the metadata consistent.