Project

General

Profile

Rocko, possible memory leak?

Added by V J over 4 years ago

Hi

I have built the Rocko version with 4.9.76-rt61-ltsi RT-kernel and with a custom FPGA and device tree. We use two Ethernet ports.

Running "free" shows that memory used is increasing with approximately 2.4 MB each hour. I.e. it seems that there is a memory leak here.

Test start at time at 01:30:

root@mitysom-c5:~# date;free;ps aux --sort -rss | head -40 
Mon Mar  9 01:30:40 UTC 2019

              total        used        free      shared  buff/cache   available
Mem:         768732       16280      730960         176       21492      740344
Swap:             0           0           0

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1083  0.0  0.3  14576  2824 ?        Ssl  01:30   0:00 /usr/sbin/tcf-agent -d -L- -l0
root      1102  0.1  0.3   3296  2576 ttyS0    Ss   01:30   0:00 -sh
29        1017  0.0  0.2   2928  2180 ?        Ss   01:30   0:00 /usr/sbin/rpc.statd
root      1062  0.4  0.2   2576  2072 ?        Ss   01:30   0:00 /sbin/klogd
root      1088  0.0  0.2   2828  2060 ?        Ss   01:30   0:00 /bin/sh /bin/start_getty 115200 ttyS0 vt102
root       809  0.3  0.2   3184  1732 ?        Ss   01:30   0:00 /sbin/udevd -d
root      1109  0.0  0.2   3176  1712 ttyS0    R+   01:30   0:00 ps aux --sort -rss
root      1045  0.0  0.2   3108  1632 ?        Ss   01:30   0:00 /usr/sbin/rpc.mountd
root      1008  0.0  0.2   4916  1628 ?        Ss   01:30   0:00 /usr/sbin/sshd
rpc       1012  0.0  0.2   2212  1620 ?        Ss   01:30   0:00 /usr/sbin/rpcbind
message+   959  0.0  0.2   3212  1572 ?        Ss   01:30   0:00 /usr/bin/dbus-daemon --system
root      1065  0.0  0.1   2496  1524 ?        Ss   01:30   0:00 /usr/sbin/crond
root      1022  0.0  0.1   2172  1492 ?        S    01:30   0:00 /usr/sbin/atd -f
root      1089  0.0  0.1   2732  1484 tty1     Ss+  01:30   0:00 /sbin/getty 38400 tty1
root      1060  0.1  0.1   1944  1428 ?        Ss   01:30   0:00 /sbin/syslogd
root         1  3.2  0.1   1716  1172 ?        Ss   01:30   0:01 init [5]
root      1071  0.0  0.1   1804  1064 ?        S    01:30   0:00 /usr/sbin/ifplugd -i eth0 -fwI -u0 -d10
root       952  0.0  0.1   2732   888 ?        Ss   01:30   0:00 udhcpc -R -b -p /var/run/udhcpc.eth0.pid -i eth0
root      1110  0.0  0.1   1728   880 ttyS0    S+   01:30   0:00 head -40
root         2  0.0  0.0      0     0 ?        S    01:30   0:00 [kthreadd]
root         3  0.1  0.0      0     0 ?        S    01:30   0:00 [ksoftirqd/0]
root         4  0.1  0.0      0     0 ?        S    01:30   0:00 [ktimersoftd/0]
root         5  0.0  0.0      0     0 ?        S    01:30   0:00 [kworker/0:0]
root         6  0.0  0.0      0     0 ?        S<   01:30   0:00 [kworker/0:0H]
root         7  0.0  0.0      0     0 ?        S    01:30   0:00 [kworker/u4:0]

Test end at time at 06:03:

root@mitysom-c5:~# date;free;ps aux --sort -rss | head -40 
Mon Mar  9 06:03:49 UTC 2019
              total        used        free      shared  buff/cache   available
Mem:         768732       27116      720044         200       21572      729544
Swap:             0           0           0

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1083  0.0  0.3  14576  2824 ?        Ssl  01:30   0:05 /usr/sbin/tcf-agent -d -L- -l0
root      1102  0.0  0.3   3296  2576 ttyS0    Ss   01:30   0:00 -sh
29        1017  0.0  0.2   2928  2180 ?        Ss   01:30   0:00 /usr/sbin/rpc.statd
root      1062  0.0  0.2   2576  2072 ?        Ss   01:30   0:00 /sbin/klogd
root      1088  0.0  0.2   2828  2060 ?        Ss   01:30   0:00 /bin/sh /bin/start_getty 115200 ttyS0 vt102
root      1148  0.0  0.2   3176  1776 ttyS0    R+   06:03   0:00 ps aux --sort -rss
root       809  0.0  0.2   3184  1732 ?        Ss   01:30   0:00 /sbin/udevd -d
root      1045  0.0  0.2   3108  1632 ?        Ss   01:30   0:00 /usr/sbin/rpc.mountd
root      1008  0.0  0.2   4916  1628 ?        Ss   01:30   0:00 /usr/sbin/sshd
rpc       1012  0.0  0.2   2212  1620 ?        Ss   01:30   0:00 /usr/sbin/rpcbind
message+   959  0.0  0.2   3212  1572 ?        Ss   01:30   0:00 /usr/bin/dbus-daemon --system
root      1065  0.0  0.1   2496  1524 ?        Ss   01:30   0:00 /usr/sbin/crond
root      1022  0.0  0.1   2172  1492 ?        S    01:30   0:00 /usr/sbin/atd -f
root      1089  0.0  0.1   2732  1484 tty1     Ss+  01:30   0:00 /sbin/getty 38400 tty1
root      1060  0.0  0.1   1944  1428 ?        Ss   01:30   0:00 /sbin/syslogd
root         1  0.0  0.1   1716  1172 ?        Ss   01:30   0:01 init [5]
root      1071  0.0  0.1   1804  1064 ?        S    01:30   0:00 /usr/sbin/ifplugd -i eth0 -fwI -u0 -d10
root       952  0.0  0.1   2732   888 ?        Ss   01:30   0:00 udhcpc -R -b -p /var/run/udhcpc.eth0.pid -i eth0
root      1149  0.0  0.1   1728   884 ttyS0    S+   06:03   0:00 head -40
root         2  0.0  0.0      0     0 ?        S    01:30   0:00 [kthreadd]
root         3  0.1  0.0      0     0 ?        S    01:30   0:20 [ksoftirqd/0]
root         4  0.1  0.0      0     0 ?        S    01:30   0:18 [ktimersoftd/0]
root         5  0.0  0.0      0     0 ?        S    01:30   0:00 [kworker/0:0]
root         6  0.0  0.0      0     0 ?        S<   01:30   0:00 [kworker/0:0H]
root         8  0.0  0.0      0     0 ?        S    01:30   0:02 [rcu_preempt]

During these 4hours and 30min approximately 11MB (27116-16280) memory is "lost".
There is only a small amount of memory difference caused by "shared" and "buff/cache".
"ps" show no increase in memory use for the running applications.
Only the "default" applications are running when testing this.
Note: We have allocated 768MB (and not 1GB) for linux, the rest is allocated to the FPGA.

To see the impact this memory issues has on the system I have run a test where the allocated memory for linux has been deceased from 758MB to 75MB.
Result: After a few hours the system becomes unresponsive, probably because of lack of memory.

Are you aware of any memory leak issues on either Rocko, kernel 4.9 or RT?

I do not see this behavior on the original distribution which came with evaluation board.

BR
VJ


Replies (10)

RE: Rocko, possible memory leak? - Added by Daniel Vincelette over 4 years ago

Hello VJ,

We are not aware of memory leaks on the rocko or the 4.9 kernel. Did you by chance run the original distribution while using the 4.9 kernel (so older filesystem with newer kernel)?

I have a test running on my bench to recreate what you are seeing now.

Dan

RE: Rocko, possible memory leak? - Added by V J over 4 years ago

Hello Dan, thank you for your reply.

We have tested the original SD card where we replaced kernel 3.16 with 4.9.78-ltsi (non RT). We also needed to use a device tree from the rocko build to be able to boot properly.

The results are the same as with the rocko build, memory is “lost”:

Test start, uptime 36 min:

root@mitysom-5csx:~# uptime;free;ps aux --sort -rss | head -n 30
 23:32:36 up 36 min,  1 user,  load average: 0.00, 0.00, 0.00
             total       used       free     shared    buffers     cached
Mem:       1029480      45928     983552          0       4824      12384
-/+ buffers/cache:      28720    1000760
Swap:            0          0          0
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1072  0.0  0.2   4152  2516 ?        Ss   22:56   0:00 /usr/sbin/sshd
root      1158  0.0  0.2   2840  2480 ttyS0    Ss   22:56   0:00 -sh
avahi     1115  0.0  0.2   3248  2472 ?        S    22:56   0:00 avahi-daemon: running [mitysom-5csx.local]
root      1108  0.0  0.2   2660  2132 ?        Ss   22:56   0:00 /sbin/klogd
root       793  0.0  0.2   2800  2072 ?        Ss   22:55   0:00 /lib/udev/udevd -d
root       906  0.0  0.1   2728  1776 ?        S    22:55   0:00 /lib/udev/udevd -d
root       905  0.0  0.1   2728  1700 ?        S    22:55   0:00 /lib/udev/udevd -d
root      1038  0.0  0.1   2224  1624 ?        Ss   22:56   0:00 /usr/sbin/rpcbind
root      1188  0.0  0.1   2324  1592 ttyS0    R+   23:32   0:00 ps aux --sort -rss
997       1065  0.0  0.1   2608  1564 ?        Ss   22:56   0:00 /usr/bin/dbus-daemon --system
root      1145  0.0  0.1   2468  1512 ?        Ss   22:56   0:00 /usr/sbin/crond
root      1097  0.0  0.1   2368  1460 ?        Ss   22:56   0:00 /usr/sbin/rpc.mountd -f /etc/exports
root      1055  0.0  0.1   1856  1416 ?        S    22:56   0:00 /usr/sbin/atd -f
root      1099  0.0  0.1   1912  1408 ?        Ss   22:56   0:00 /usr/sbin/rpc.statd
root      1106  0.0  0.1   1776  1388 ?        Ss   22:56   0:00 /sbin/syslogd
avahi     1116  0.0  0.1   3104  1336 ?        S    22:56   0:00 avahi-daemon: chroot helper
root      1159  0.0  0.1   2196  1324 tty1     Ss+  22:56   0:00 /sbin/getty 38400 tty1
root      1130  0.0  0.1   1932  1264 ?        S    22:56   0:00 /usr/sbin/avahi-dnsconfd -D
root      1142  0.0  0.1   2232  1260 ?        S    22:56   0:00 /usr/sbin/lighttpd -f /etc/lighttpd.conf
root         1  0.0  0.1   1688  1176 ?        Ss   22:55   0:01 init [5]
root      1151  0.0  0.1   1740  1072 ?        S    22:56   0:00 /usr/sbin/ifplugd -i eth0 -fwI -u0 -d10
root      1189  0.0  0.0   1696   936 ttyS0    S+   23:32   0:00 head -n 30
root      1033  0.0  0.0   2192   752 ?        Ss   22:56   0:00 udhcpc -R -n -p /var/run/udhcpc.eth0.pid -i eth0
root         2  0.0  0.0      0     0 ?        S    22:55   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    22:55   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   22:55   0:00 [kworker/0:0H]
root         7  0.0  0.0      0     0 ?        S    22:55   0:00 [rcu_sched]
root         8  0.0  0.0      0     0 ?        S    22:55   0:00 [rcu_bh]
root         9  0.0  0.0      0     0 ?        S    22:55   0:00 [migration/0]

Test stop, uptime 3h 24 min:

root@mitysom-5csx:~# uptime;free;ps aux --sort -rss | head -n 30
 02:19:56 up  3:24,  1 user,  load average: 0.00, 0.00, 0.00
             total       used       free     shared    buffers     cached
Mem:       1029480      52896     976584          0       4924      12488
-/+ buffers/cache:      35484     993996
Swap:            0          0          0
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1072  0.0  0.2   4152  2516 ?        Ss   Nov05   0:00 /usr/sbin/sshd
root      1158  0.0  0.2   2844  2484 ttyS0    Ss   Nov05   0:00 -sh
avahi     1115  0.0  0.2   3248  2476 ?        S    Nov05   0:01 avahi-daemon: running [mitysom-5csx.local]
root      1108  0.0  0.2   2660  2132 ?        Ss   Nov05   0:00 /sbin/klogd
root       793  0.0  0.2   2800  2072 ?        Ss   Nov05   0:00 /lib/udev/udevd -d
root       906  0.0  0.1   2728  1776 ?        S    Nov05   0:00 /lib/udev/udevd -d
root       905  0.0  0.1   2728  1700 ?        S    Nov05   0:00 /lib/udev/udevd -d
root      1038  0.0  0.1   2224  1624 ?        Ss   Nov05   0:00 /usr/sbin/rpcbind
root      1236  0.0  0.1   2324  1604 ttyS0    R+   02:19   0:00 ps aux --sort -rss
997       1065  0.0  0.1   2608  1564 ?        Ss   Nov05   0:00 /usr/bin/dbus-daemon --system
root      1145  0.0  0.1   2468  1512 ?        Ss   Nov05   0:00 /usr/sbin/crond
root      1097  0.0  0.1   2368  1460 ?        Ss   Nov05   0:00 /usr/sbin/rpc.mountd -f /etc/exports
root      1055  0.0  0.1   1856  1416 ?        S    Nov05   0:00 /usr/sbin/atd -f
root      1099  0.0  0.1   1912  1408 ?        Ss   Nov05   0:00 /usr/sbin/rpc.statd
root      1106  0.0  0.1   1776  1388 ?        Ss   Nov05   0:00 /sbin/syslogd
avahi     1116  0.0  0.1   3104  1336 ?        S    Nov05   0:00 avahi-daemon: chroot helper
root      1159  0.0  0.1   2196  1324 tty1     Ss+  Nov05   0:00 /sbin/getty 38400 tty1
root      1130  0.0  0.1   1932  1264 ?        S    Nov05   0:00 /usr/sbin/avahi-dnsconfd -D
root      1142  0.0  0.1   2232  1260 ?        S    Nov05   0:00 /usr/sbin/lighttpd -f /etc/lighttpd.conf
root         1  0.0  0.1   1688  1176 ?        Ss   Nov05   0:01 init [5]
root      1151  0.0  0.1   1740  1072 ?        S    Nov05   0:00 /usr/sbin/ifplugd -i eth0 -fwI -u0 -d10
root      1237  0.0  0.0   1696   916 ttyS0    S+   02:19   0:00 head -n 30
root      1033  0.0  0.0   2192   752 ?        Ss   Nov05   0:00 udhcpc -R -n -p /var/run/udhcpc.eth0.pid -i eth0
root         2  0.0  0.0      0     0 ?        S    Nov05   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Nov05   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   Nov05   0:00 [kworker/0:0H]
root         7  0.0  0.0      0     0 ?        S    Nov05   0:02 [rcu_sched]
root         8  0.0  0.0      0     0 ?        S    Nov05   0:00 [rcu_bh]
root         9  0.0  0.0      0     0 ?        S    Nov05   0:00 [migration/0]

During these 2h 48min approximately 7MB (52896-45928) memory is "lost". I.e. 2.5MB/hour.

BR
VJ

RE: Rocko, possible memory leak? - Added by Daniel Vincelette over 4 years ago

I enabled CONFIG_DEBUG_KMEMLEAK in the 4.9 kernel, which is supposed to report any kernel memory leaks it finds. I ran it all night and it didn't report any. I am seeing about a 1MB leak an hour through free though.

This leads me to believe it's something on the filesystem causing this issue. My next option is to try the warrior or zeus branch of yocto to see if the issue still persists.

Dan

RE: Rocko, possible memory leak? - Added by V J over 4 years ago

Hi Dan
Thanks for looking into this.
I have run some test during the weekend, and the results are a bit inconclusive. However, it seems that Ethernet has some impact:
Tests where "free" output is logged:
1: No Ethernet cables connected. No increased memory use.
2: Ethernet cables connected, but no dedicated traffic or connections (like ssh) to the board. Sometimes increased memory use, sometimes not.
3a: Ethernet cables connected, but no dedicated traffic or connections (like ssh) to the board, except that the board is pinged every sec. Increased memory use, ca 2.5 MB/hour
3b: Turning of pinging: The memory use stops.
The statistics behind these measurements are poor, so it is only observations, not conclusions.
BR
VJ

RE: Rocko, possible memory leak? - Added by Daniel Vincelette over 4 years ago

Hi VJ

Those are definitely interesting findings leads me to believe like yourself some part of the ethernet subsystem is causing it. Would you say your test for 3b and 2 are the same? If not what are the differences?

I switched to the zeus branch for yocto, hoping it was an old bug in maybe busybox and still see the used memory increase. I'm trying to compile in some tools to give me some more insight.

Dan

RE: Rocko, possible memory leak? - Added by V J over 4 years ago

Hi Dan
Yes, 2 and 3b are the same.
I have done some more monitoring where Eth0 is connected but not used and Eth1 is pinged every sec.
Next eth0 and eth1 are taken down after 11h 30m:

mitysom-c5:~# uptime;free
 17:29:00 up 11:27,  1 user,  load average: 0.00, 0.00, 0.00
              total        used        free      shared  buff/cache   available
Mem:         768724       70256      651328         300       47140      683816
Swap:             0           0

mitysom-c5:~# ifconfig eth0 down
mitysom-c5:~# uptime;free
 17:30:47 up 11:29,  1 user,  load average: 0.00, 0.00, 0.00
              total        used        free      shared  buff/cache   available
Mem:         768724       65000      656608         304       47116      689068
Swap:             0           0  

mitysom-c5:~# ifconfig eth1 down
mitysom-c5:~#
mitysom-c5:~# uptime;free
 17:31:24 up 11:29,  1 user,  load average: 0.00, 0.00, 0.00
              total        used        free      shared  buff/cache   available
Mem:         768724       55624      666040         304       47060      698444
Swap:             0           0           0

This frees up approximately 5MB for eth0 and 10MB for eth1 so it seems that the ethernet driver is quite memory hungry...

Note: kmemleak is enabled so the used memory is a bit higher at startup compared to the previous measurements.

BR
VJ

RE: Rocko, possible memory leak? - Added by V J over 4 years ago

Hi Dan.
I have monitored the memory use over the weekend. See attached plot where "free" is logged every 10 minutes.
Test steps:
1: Booting the board. Ethernet cables are connected to Eth0 and Eth1, but no dedicated traffic or connections (like ssh) to the board.
2: At approximately 1300 minutes (130*10): Starting Eth1 ping every 10ms.
3: At approximately 1800 minutes (180*10): Stopping ping.
Used memory every 10 minutes
It seems that the Ethernet requires a fair amount of memory which is allocated in a slowly increasing (and interesting?) pattern when starting pinging.
However, this memory is slowly freed when the pinging stops.

The first approximately 450 minutes (45*10) has a similar slowly increasing memory use which is partly freed.

Maybe this is just how the Ethernet interface works?
BR
VJ

picture95-1.png (6.03 KB) picture95-1.png Used memory every 10 minutes

RE: Rocko, possible memory leak? - Added by Michael Williamson over 4 years ago

Hello Dan/VJ,

Have you guys seen this stack overflow post about tracking socket buffers /
memory moving through the network stacks?

https://unix.stackexchange.com/questions/419518/how-to-tell-how-much-memory-tcp-buffers-are-actually-using

Might provide some insight....

-Mike

On Mon, Mar 23, 2020 at 6:56 AM <> wrote:

RE: Rocko, possible memory leak? - Added by V J over 4 years ago

Dan, based upon the measurements I have done and input from Mike I think I have to write this off as normal behavior.
I have done some stress test where I stream data over Ethernet and the memory use does not increase uncontrolled.
BR
VJ

RE: Rocko, possible memory leak? - Added by Daniel Vincelette over 4 years ago

VJ,

I agree after looking at your data and with the knowledge that I've run 5CSX SOMs for weeks plugged into the network without having them dead locking.

It looks like the kernel uses a fair amount of memory for ethernet, though it can properly garbage collect it when need be.

Dan

    (1-10/10)
    Go to top
    Add picture from clipboard (Maximum size: 1 GB)