Tomcat server; the open-file limit for the user is 4096. The Java service becomes unresponsive, either shortly after startup or after running for a day or more.

The log file contains many “too many open files” errors.

Investigation:

lsof | grep java | wc -l

The result is more than 5000 open files for the user.

lsof | grep java | grep protocol | wc -l

The count for this command is 1321. A sample matching line:

java      30758   bbuser 4714u     sock                0,6       0t0 1366904313 can't identify protocol

The number keeps growing when the command above is run again a few minutes later, and it eventually uses up the limit for the user.
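The same growth can also be watched from inside the JVM. The following is a minimal sketch, not something taken from the affected server: FdMonitor is a hypothetical class name, and it assumes a HotSpot/OpenJDK runtime on a Unix-like system, where the JDK-specific com.sun.management.UnixOperatingSystemMXBean is available. The value will not match the lsof line count exactly, because lsof also lists entries that are not file descriptors (for example memory-mapped files).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper: reports the file descriptor usage of the current JVM process. */
public class FdMonitor {

    /** Returns the number of open file descriptors, or -1 if the platform does not expose it. */
    public static long openFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1; // assumption: non-Unix or non-HotSpot runtime
    }

    /** Returns the file descriptor limit of the process, or -1 if the platform does not expose it. */
    public static long maxFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        // Log both values; calling this periodically shows whether the count keeps climbing.
        System.out.println("open fds: " + openFdCount() + " / limit: " + maxFdCount());
    }
}
```

Logging these two values every minute or so would show the count climbing toward the limit well before the “too many open files” errors start.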

 

Root cause for this case:

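More than 39000 users need to be synchronized to other servers. The sync task runs every minute, but that many users cannot be synchronized within one minute. Because the task is asynchronous, the next run starts without waiting for the previous one to finish, so unfinished runs overlap and keep their sockets open.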
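One possible way to keep runs from overlapping is to skip a scheduled run while the previous one is still in flight. The Java sketch below is an assumption, not the fix that was applied on this server: SyncGuard and tryStart are hypothetical names, and it assumes the asynchronous sync task can be represented as a CompletableFuture.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical guard: allows at most one asynchronous sync run at a time. */
public class SyncGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /**
     * Starts the task only if no earlier run is still in progress.
     * Returns true if the task was started, false if this invocation was skipped.
     */
    public boolean tryStart(Supplier<CompletableFuture<Void>> asyncTask) {
        // Atomically claim the "running" flag; give up if another run holds it.
        if (!running.compareAndSet(false, true)) {
            return false;
        }
        try {
            // Release the flag when the run completes, whether it succeeded or failed.
            asyncTask.get().whenComplete((result, error) -> running.set(false));
            return true;
        } catch (RuntimeException e) {
            // The task could not even be started: release the flag and rethrow.
            running.set(false);
            throw e;
        }
    }
}
```

The scheduler would call tryStart once a minute; while a run over all users is still going, the call returns false and no new sockets are opened for a second run. This only limits the pile-up. Any sockets the sync code itself fails to close would still need to be fixed separately.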