I used a t3a.2xlarge instance (with 64 GB of storage) to deploy a single-node Hops cluster. However, when I create more than 10,000 files in HopsFS, it throws a java.lang.OutOfMemoryError: Direct buffer memory. The benchmark command and the full NameNode log are below:
```
sudo ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op create -threads 1 -files 10000 -filesPerDir 100000 -keepResults -logLevel INFO
20/03/12 15:36:03 INFO namenode.NameNode: createNameNode []
20/03/12 15:36:03 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-namenode.properties,hadoop-metrics2.properties
20/03/12 15:36:03 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
20/03/12 15:36:03 INFO impl.MetricsSystemImpl: NameNode metrics system started
20/03/12 15:36:03 WARN util.NativeCodeLoader: Loaded the native-hadoop library
20/03/12 15:36:03 INFO resolvingcache.Cache: starting Resolving Cache [InMemoryCache]
20/03/12 15:36:03 INFO ndb.ClusterjConnector: Database connect string: 10.0.0.243:1186
20/03/12 15:36:03 INFO ndb.ClusterjConnector: Database name: hops
20/03/12 15:36:03 INFO ndb.ClusterjConnector: Max Transactions: 1024
Database connect string: 10.0.0.243:1186
Database name: hops
Max Transactions: 1024
HopsFS created a ClusterJ 7.6.12 sesseion factory.
20/03/12 15:36:05 INFO security.UsersGroups: UsersGroups Initialized.
20/03/12 15:36:05 INFO hdfs.DFSUtil: Starting Web-server for hdfs at: http://ip-10-0-0-243.ec2.internal:50070
20/03/12 15:36:05 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
20/03/12 15:36:05 INFO http.HttpRequestLog: Http request log for http.requests.namenode is not defined
20/03/12 15:36:05 INFO http.HttpServer3: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer3$QuotingInputFilter)
20/03/12 15:36:05 INFO http.HttpServer3: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
20/03/12 15:36:05 INFO http.HttpServer3: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
20/03/12 15:36:05 INFO http.HttpServer3: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
20/03/12 15:36:05 INFO http.HttpServer3: Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
20/03/12 15:36:05 INFO http.HttpServer3: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
20/03/12 15:36:05 INFO http.HttpServer3: Jetty bound to port 50070
20/03/12 15:36:05 INFO mortbay.log: jetty-6.1.26
20/03/12 15:36:05 INFO mortbay.log: Started HttpServer3$SelectChannelConnectorWithSafeStartup@ip-10-0-0-243.ec2.internal:50070
20/03/12 15:36:05 INFO namenode.FSNamesystem: No KeyProvider found.
20/03/12 15:36:05 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
20/03/12 15:36:05 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
20/03/12 15:36:05 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
20/03/12 15:36:05 INFO blockmanagement.BlockManager: The block deletion will start around 2020 Mar 12 15:36:05
20/03/12 15:36:05 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
20/03/12 15:36:05 INFO blockmanagement.BlockManager: defaultReplication = 3
20/03/12 15:36:05 INFO blockmanagement.BlockManager: maxReplication = 512
20/03/12 15:36:05 INFO blockmanagement.BlockManager: minReplication = 1
20/03/12 15:36:05 INFO blockmanagement.BlockManager: maxReplicationStreams = 50
20/03/12 15:36:05 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
20/03/12 15:36:05 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
20/03/12 15:36:05 INFO blockmanagement.BlockManager: encryptDataTransfer = false
20/03/12 15:36:05 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
20/03/12 15:36:05 INFO blockmanagement.BlockManager: slicerBatchSize = 500
20/03/12 15:36:05 INFO blockmanagement.BlockManager: misReplicatedNoOfBatchs = 20
20/03/12 15:36:05 INFO blockmanagement.BlockManager: slicerNbOfBatchs = 20
20/03/12 15:36:06 INFO hikari.HikariDataSource: HikariCP pool HikariPool-0 is starting.
20/03/12 15:36:06 WARN common.IDsGeneratorFactory: Called setConfiguration more than once.
20/03/12 15:36:06 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
20/03/12 15:36:06 INFO namenode.FSNamesystem: superGroup = hdfs
20/03/12 15:36:06 INFO namenode.FSNamesystem: isPermissionEnabled = true
20/03/12 15:36:06 INFO namenode.FSNamesystem: Append Enabled: true
20/03/12 15:36:06 INFO namenode.FSDirectory: Added new root inode
20/03/12 15:36:06 INFO namenode.FSDirectory: ACLs enabled? false
20/03/12 15:36:06 INFO namenode.FSDirectory: XAttrs enabled? true
20/03/12 15:36:06 INFO namenode.FSDirectory: Maximum size of an xattr: 13755
20/03/12 15:36:06 INFO namenode.NameNode: The maximum number of xattrs per inode is set to 32
20/03/12 15:36:06 INFO namenode.NameNode: Caching file names occuring more than 10 times
20/03/12 15:36:06 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
20/03/12 15:36:06 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
20/03/12 15:36:06 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
20/03/12 15:36:06 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
20/03/12 15:36:06 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
20/03/12 15:36:06 INFO namenode.NameCache: initialized with 0 entries 0 lookups
20/03/12 15:36:06 INFO namenode.NameNode: RPC server is binding to ip-10-0-0-243.ec2.internal:8020
20/03/12 15:36:06 INFO ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 12000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
20/03/12 15:36:06 INFO ipc.Server: Starting Socket Reader #1 for port 8020
20/03/12 15:36:06 INFO ipc.Server: Starting Socket Reader #2 for port 8020
20/03/12 15:36:06 INFO ipc.Server: Starting Socket Reader #3 for port 8020
20/03/12 15:36:06 INFO util.JvmPauseMonitor: Starting JVM pause monitor
20/03/12 15:36:06 INFO leaderElection.LETransaction: LE Status: id 1 I can be the leader but I have weak locks. Retry with stronger lock
20/03/12 15:36:06 INFO leaderElection.LETransaction: LE Status: id 1 periodic update. Stronger locks requested in next round
20/03/12 15:36:06 INFO leaderElection.LETransaction: LE Status: id 1 I am the new LEADER.
20/03/12 15:36:06 INFO namenode.FSNamesystem: Registered FSNamesystemState MBean
20/03/12 15:36:07 WARN namenode.FSNamesystem: cealring the safe blocks tabl, this may take some time.
20/03/12 15:36:07 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
20/03/12 15:36:07 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
20/03/12 15:36:07 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
20/03/12 15:36:07 INFO namenode.LeaseManager: Number of blocks under construction: 0
20/03/12 15:36:07 INFO hdfs.StateChange: STATE* Leaving safe mode after 2 secs
20/03/12 15:36:07 INFO hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
20/03/12 15:36:07 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
20/03/12 15:36:07 WARN namenode.FSNamesystem: cealring the safe blocks tabl, this may take some time.
20/03/12 15:36:08 INFO blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
20/03/12 15:36:08 INFO ipc.Server: IPC Server Responder: starting
20/03/12 15:36:08 INFO ipc.Server: IPC Server listener on 8020: starting
20/03/12 15:36:08 INFO namenode.NameNode: Leader Node RPC up at: ip-10-0-0-243.ec2.internal/10.0.0.243:8020
20/03/12 15:36:08 INFO namenode.FSNamesystem: Starting services required for active state
20/03/12 15:36:08 INFO namenode.FSNamesystem: Catching up to latest edits from old active before taking over writer role in edits logs
20/03/12 15:36:08 INFO blockmanagement.DatanodeManager: Marking all datandoes as stale
20/03/12 15:36:08 INFO namenode.FSNamesystem: Reprocessing replication and invalidation queues
20/03/12 15:36:08 INFO namenode.FSNamesystem: initializing replication queues
20/03/12 15:36:08 INFO blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
20/03/12 15:36:08 INFO blockmanagement.BlockManager: processMisReplicated read 0/10000 in the Ids range [0 - 10000] (max inodeId when the process started: 1)
20/03/12 15:36:08 INFO blockmanagement.BlockManager: Total number of blocks = 0
20/03/12 15:36:08 INFO blockmanagement.BlockManager: Number of invalid blocks = 0
20/03/12 15:36:08 INFO blockmanagement.BlockManager: Number of under-replicated blocks = 0
20/03/12 15:36:08 INFO blockmanagement.BlockManager: Number of over-replicated blocks = 0
20/03/12 15:36:08 INFO blockmanagement.BlockManager: Number of blocks being written = 0
20/03/12 15:36:08 INFO hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 54 msec
20/03/12 15:36:08 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 60 minutes.
20/03/12 15:36:08 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 60 minutes.
20/03/12 15:36:08 INFO namenode.NNThroughputBenchmark: Starting benchmark: create
20/03/12 15:36:08 INFO hdfs.StateChange: STATE* Safe mode is already OFF
20/03/12 15:36:08 INFO namenode.NNThroughputBenchmark: Generate 10000 intputs for create
20/03/12 15:36:08 FATAL namenode.NNThroughputBenchmark: Log level = INFO
20/03/12 15:36:08 INFO namenode.NNThroughputBenchmark: Starting 10000 create(s).
20/03/12 15:39:31 WARN handler.RequestHandler: START_FILE TX Failed. TX Time: 677 ms, RetryCount: 0, TX Stats -- Setup: 0ms, AcquireLocks: -1ms, InMemoryProcessing: -1ms, CommitTime: -1ms. Locks: INodeLock {paths=[/nnThroughputBenchmark/create/ThroughputBenchDir0/ThroughputBench7015], lockType=WRITE_ON_TARGET_AND_PARENT }. java.lang.OutOfMemoryError: Direct buffer memory
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:695)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at com.mysql.clusterj.tie.FixedByteBufferPoolImpl.borrowBuffer(FixedByteBufferPoolImpl.java:109)
at com.mysql.clusterj.tie.NdbRecordImpl.newBuffer(NdbRecordImpl.java:306)
at com.mysql.clusterj.tie.NdbRecordOperationImpl.allocateValueBuffer(NdbRecordOperationImpl.java:336)
at com.mysql.clusterj.tie.NdbRecordScanOperationImpl.nextResultCopyOut(NdbRecordScanOperationImpl.java:239)
at com.mysql.clusterj.tie.NdbRecordScanResultDataImpl.next(NdbRecordScanResultDataImpl.java:135)
at com.mysql.clusterj.core.query.QueryDomainTypeImpl.getResultList(QueryDomainTypeImpl.java:190)
at com.mysql.clusterj.core.query.QueryImpl.getResultList(QueryImpl.java:153)
at io.hops.metadata.ndb.wrapper.HopsQuery.getResultList(HopsQuery.java:46)
at io.hops.metadata.ndb.dalimpl.hdfs.LeasePathClusterj.findByHolderId(LeasePathClusterj.java:117)
at io.hops.transaction.context.LeasePathContext.findByHolderId(LeasePathContext.java:144)
at io.hops.transaction.context.LeasePathContext.findList(LeasePathContext.java:88)
at io.hops.transaction.context.TransactionContext.findList(TransactionContext.java:150)
at io.hops.transaction.EntityManager.findList(EntityManager.java:93)
at io.hops.transaction.lock.Lock.acquireLockList(Lock.java:120)
at io.hops.transaction.lock.LeasePathLock.acquireLeasePaths(LeasePathLock.java:85)
at io.hops.transaction.lock.LeasePathLock.acquire(LeasePathLock.java:68)
at io.hops.transaction.lock.HdfsTransactionalLockAcquirer.acquire(HdfsTransactionalLockAcquirer.java:32)
at io.hops.transaction.handler.TransactionalRequestHandler.execute(TransactionalRequestHandler.java:89)
at io.hops.transaction.handler.HopsTransactionalRequestHandler.execute(HopsTransactionalRequestHandler.java:50)
at io.hops.transaction.handler.RequestHandler.handle(RequestHandler.java:68)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2124)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:534)
at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$CreateFileStats.executeOp(NNThroughputBenchmark.java:633)
at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.benchmarkOne(NNThroughputBenchmark.java:453)
at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:436)
Exception in thread "StatsDaemon-0" java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:695)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at com.mysql.clusterj.tie.FixedByteBufferPoolImpl.borrowBuffer(FixedByteBufferPoolImpl.java:109)
at com.mysql.clusterj.tie.NdbRecordImpl.newBuffer(NdbRecordImpl.java:306)
at com.mysql.clusterj.tie.NdbRecordOperationImpl.allocateValueBuffer(NdbRecordOperationImpl.java:336)
at com.mysql.clusterj.tie.NdbRecordScanOperationImpl.nextResultCopyOut(NdbRecordScanOperationImpl.java:239)
at com.mysql.clusterj.tie.NdbRecordScanResultDataImpl.next(NdbRecordScanResultDataImpl.java:135)
at com.mysql.clusterj.core.query.QueryDomainTypeImpl.getResultList(QueryDomainTypeImpl.java:190)
at com.mysql.clusterj.core.query.QueryImpl.getResultList(QueryImpl.java:153)
at io.hops.metadata.ndb.wrapper.HopsQuery.getResultList(HopsQuery.java:46)
at io.hops.metadata.ndb.dalimpl.hdfs.LeasePathClusterj.findByHolderId(LeasePathClusterj.java:117)
at io.hops.transaction.context.LeasePathContext.findByHolderId(LeasePathContext.java:144)
at io.hops.transaction.context.LeasePathContext.findList(LeasePathContext.java:88)
at io.hops.transaction.context.TransactionContext.findList(TransactionContext.java:150)
at io.hops.transaction.EntityManager.findList(EntityManager.java:93)
at io.hops.transaction.lock.Lock.acquireLockList(Lock.java:120)
at io.hops.transaction.lock.LeasePathLock.acquireLeasePaths(LeasePathLock.java:85)
at io.hops.transaction.lock.LeasePathLock.acquire(LeasePathLock.java:68)
at io.hops.transaction.lock.HdfsTransactionalLockAcquirer.acquire(HdfsTransactionalLockAcquirer.java:32)
at io.hops.transaction.handler.TransactionalRequestHandler.execute(TransactionalRequestHandler.java:89)
at io.hops.transaction.handler.HopsTransactionalRequestHandler.execute(HopsTransactionalRequestHandler.java:50)
at io.hops.transaction.handler.RequestHandler.handle(RequestHandler.java:68)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2124)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:534)
at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$CreateFileStats.executeOp(NNThroughputBenchmark.java:633)
at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.benchmarkOne(NNThroughputBenchmark.java:453)
at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:436)
```
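From the stack trace, the allocation that fails is in ClusterJ's FixedByteBufferPoolImpl, which borrows direct (off-heap) ByteBuffers while scanning lease paths, so it is the NameNode's direct-memory ceiling rather than its heap that is exhausted. As far as I know, the JVM caps direct buffers at roughly the maximum heap size unless -XX:MaxDirectMemorySize is set explicitly. Below is a minimal sketch of the workaround I am considering, assuming the NameNode picks its JVM options up from HADOOP_NAMENODE_OPTS in etc/hadoop/hadoop-env.sh as stock Hadoop does (I have not confirmed HopsFS keeps this mechanism), with a guessed 4g value:

```
# etc/hadoop/hadoop-env.sh -- hypothetical tuning; 4g is a guess, not a
# recommended value. This raises the off-heap (direct buffer) ceiling that
# ClusterJ's FixedByteBufferPoolImpl allocates from, independently of -Xmx.
export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -XX:MaxDirectMemorySize=4g"

# After restarting the NameNode, check that the flag took effect. jcmd's
# VM.flags command prints the non-default -XX flags of a running JVM, so
# the new limit should now appear in its output.
jcmd $(pgrep -f org.apache.hadoop.hdfs.server.namenode.NameNode) VM.flags \
  | tr ' ' '\n' | grep MaxDirectMemorySize
```

That said, hitting the direct-memory limit after only ~7,000 of 10,000 single-threaded creates (the failing path is ThroughputBench7015) makes me suspect the buffers borrowed in nextResultCopyOut are never returned to the pool, in which case raising the limit would only delay the error rather than fix it.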