Memory Bandwidth » Tinymembench_Results.txt

Job Sava, 03/31/2025 03:55 PM

 
Job Sava
Tinymembench Results

RUN1:

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   1147.9 MB/s (1.3%)
 C copy backwards (32 byte blocks)                    :   1121.7 MB/s (1.5%)
 C copy backwards (64 byte blocks)                    :    995.3 MB/s (1.3%)
 C copy                                               :   1082.9 MB/s (1.0%)
 C copy prefetched (32 bytes step)                    :    903.5 MB/s (0.9%)
 C copy prefetched (64 bytes step)                    :    954.8 MB/s (0.4%)
 C 2-pass copy                                        :    858.6 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    597.4 MB/s (0.2%)
 C 2-pass copy prefetched (64 bytes step)             :    304.4 MB/s
 C fill                                               :   2151.3 MB/s (0.1%)
 C fill (shuffle within 16 byte blocks)               :   2149.3 MB/s
 C fill (shuffle within 32 byte blocks)               :   2147.5 MB/s
 C fill (shuffle within 64 byte blocks)               :   2151.7 MB/s (0.2%)
 ---
 standard memcpy                                      :   1065.0 MB/s (0.9%)
 standard memset                                      :   2151.4 MB/s (0.1%)
 ---
 NEON LDP/STP copy                                    :   1175.8 MB/s (0.5%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    790.2 MB/s (0.6%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    958.6 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   1192.3 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   1190.4 MB/s
 NEON LD1/ST1 copy                                    :   1094.0 MB/s (1.0%)
 NEON STP fill                                        :   2151.3 MB/s (0.2%)
 NEON STNP fill                                       :   2068.2 MB/s (3.8%)
 ARM LDP/STP copy                                     :   1181.9 MB/s (0.9%)
 ARM STP fill                                         :   2151.5 MB/s (0.1%)
 ARM STNP fill                                        :   2076.2 MB/s (6.9%)
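
Note 2 in the banner can be checked against the RUN1 figures above. The sketch below is a minimal, hypothetical Python helper: the regular expression and the name parse_bandwidth are illustrative assumptions and are not part of tinymembench. It parses result lines and doubles a copy rate to estimate combined read plus write traffic. For RUN1 this puts C copy in the same range as the write-only C fill rate, which would be consistent with both being limited by the same memory path, although that is an inference and not something the tool reports.

```python
import re

# Two result lines copied from the RUN1 output above.
SAMPLE = """\
 C copy                                               :   1082.9 MB/s (1.0%)
 C fill                                               :   2151.3 MB/s (0.1%)
"""

# Test name, rate in MB/s, and the optional standard deviation in brackets.
# This pattern is an assumption based on the layout seen in this log.
LINE_RE = re.compile(r"^\s*(.+?)\s*:\s*([\d.]+) MB/s(?:\s*\(([\d.]+)%\))?\s*$")


def parse_bandwidth(text):
    """Return {test name: MB/s} for every bandwidth result line in text."""
    rates = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rates[match.group(1)] = float(match.group(2))
    return rates


rates = parse_bandwidth(SAMPLE)
# Note 2: a copy reads and writes every byte, so bus traffic is twice the rate.
copy_traffic = 2 * rates["C copy"]
print(f"C copy total traffic: {copy_traffic:.1f} MB/s")
print(f"C fill (write only):  {rates['C fill']:.1f} MB/s")
```

Separator lines such as " ---" and the latency rows do not contain "MB/s", so the pattern skips them.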

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.6 ns
    131072 :    6.7 ns          /    10.7 ns
    262144 :    8.1 ns          /    12.6 ns
    524288 :   12.1 ns          /    18.3 ns
   1048576 :  127.3 ns          /   195.4 ns
   2097152 :  189.5 ns          /   250.4 ns
   4194304 :  226.5 ns          /   272.7 ns
   8388608 :  245.3 ns          /   282.3 ns
  16777216 :  255.6 ns          /   287.1 ns
  33554432 :  262.7 ns          /   289.7 ns
  67108864 :  275.4 ns          /   309.8 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.3 ns
    131072 :    6.7 ns          /    10.8 ns
    262144 :    8.1 ns          /    11.9 ns
    524288 :   12.3 ns          /    17.0 ns
   1048576 :  126.9 ns          /   194.9 ns
   2097152 :  187.7 ns          /   248.4 ns
   4194304 :  218.2 ns          /   264.8 ns
   8388608 :  234.2 ns          /   270.6 ns
  16777216 :  242.2 ns          /   272.8 ns
  33554432 :  246.1 ns          /   273.7 ns
  67108864 :  248.0 ns          /   274.0 ns
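
Note 2 of the latency banner gives a way to read the two columns: if the memory subsystem could not overlap requests, a dual random read would take as long as two single reads. A rough sketch of that check, using a few rows from the RUN1 MADV_NOHUGEPAGE table, follows. The function name is hypothetical, and the ratio is taken on the reported extra times only, so it ignores the L1 latency that Note 1 says must be added.

```python
# (block size in bytes, single read ns, dual read ns), RUN1 MADV_NOHUGEPAGE.
ROWS = [
    (1048576, 127.3, 195.4),
    (4194304, 226.5, 272.7),
    (16777216, 255.6, 287.1),
    (67108864, 275.4, 309.8),
]


def dual_to_single_ratio(single_ns, dual_ns):
    """Near 2.0: the two reads serialise. Near 1.0: they overlap almost fully."""
    return dual_ns / single_ns


for size, single_ns, dual_ns in ROWS:
    ratio = dual_to_single_ratio(single_ns, dual_ns)
    print(f"{size // 1024:>6} KiB: dual/single = {ratio:.2f}")
```

For these rows the ratio stays well below 2, which suggests that the two accesses overlap to a large degree on this board.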

RUN2:

root@mitysom-am62x:~# tinymembench
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   1148.8 MB/s (1.8%)
 C copy backwards (32 byte blocks)                    :   1102.7 MB/s (1.0%)
 C copy backwards (64 byte blocks)                    :   1001.1 MB/s (1.5%)
 C copy                                               :   1070.9 MB/s (0.8%)
 C copy prefetched (32 bytes step)                    :    900.7 MB/s (1.0%)
 C copy prefetched (64 bytes step)                    :    959.1 MB/s (0.5%)
 C 2-pass copy                                        :    859.6 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    594.2 MB/s (0.2%)
 C 2-pass copy prefetched (64 bytes step)             :    304.6 MB/s
 C fill                                               :   2147.6 MB/s
 C fill (shuffle within 16 byte blocks)               :   2148.8 MB/s
 C fill (shuffle within 32 byte blocks)               :   2150.6 MB/s
 C fill (shuffle within 64 byte blocks)               :   2150.5 MB/s
 ---
 standard memcpy                                      :   1054.6 MB/s (1.0%)
 standard memset                                      :   2149.1 MB/s
 ---
 NEON LDP/STP copy                                    :   1183.2 MB/s (0.8%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    786.0 MB/s (0.8%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    957.9 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   1191.6 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   1188.3 MB/s
 NEON LD1/ST1 copy                                    :   1094.8 MB/s (0.9%)
 NEON STP fill                                        :   2148.5 MB/s
 NEON STNP fill                                       :   2078.3 MB/s (2.6%)
 ARM LDP/STP copy                                     :   1181.0 MB/s (0.7%)
 ARM STP fill                                         :   2149.4 MB/s
 ARM STNP fill                                        :   2075.5 MB/s (3.8%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.5 ns
    131072 :    6.7 ns          /    10.8 ns
    262144 :    8.1 ns          /    12.4 ns
    524288 :   12.1 ns          /    18.4 ns
   1048576 :  127.0 ns          /   195.4 ns
   2097152 :  189.6 ns          /   250.5 ns
   4194304 :  226.6 ns          /   272.9 ns
   8388608 :  245.4 ns          /   282.3 ns
  16777216 :  255.7 ns          /   287.2 ns
  33554432 :  262.8 ns          /   289.6 ns
  67108864 :  275.0 ns          /   309.2 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.3 ns
    131072 :    6.7 ns          /    10.8 ns
    262144 :    8.1 ns          /    12.4 ns
    524288 :   12.0 ns          /    18.2 ns
   1048576 :  127.0 ns          /   195.2 ns
   2097152 :  187.8 ns          /   248.6 ns
   4194304 :  218.4 ns          /   265.0 ns
   8388608 :  234.4 ns          /   270.8 ns
  16777216 :  242.4 ns          /   273.0 ns
  33554432 :  246.3 ns          /   273.9 ns
  67108864 :  248.2 ns          /   274.2 ns
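
The latency banner names the TLB as one contributor for large buffers, and the two RUN2 tables above differ only in the madvise hint. One way to look at that contribution is to subtract the MADV_HUGEPAGE column from the MADV_NOHUGEPAGE column. The sketch below does this for four block sizes. The helper name is hypothetical, and attributing the whole difference to TLB misses is an assumption, not something tinymembench reports.

```python
# Block size (bytes) -> single random read extra latency (ns), RUN2 tables.
NOHUGEPAGE = {1048576: 127.0, 8388608: 245.4, 33554432: 262.8, 67108864: 275.0}
HUGEPAGE = {1048576: 127.0, 8388608: 234.4, 33554432: 246.3, 67108864: 248.2}


def hugepage_saving(no_huge, huge):
    """Return {block size: ns saved with MADV_HUGEPAGE} for sizes in both tables."""
    return {
        size: round(no_huge[size] - huge[size], 1)
        for size in sorted(no_huge)
        if size in huge
    }


for size, saved in hugepage_saving(NOHUGEPAGE, HUGEPAGE).items():
    print(f"{size // (1024 * 1024):>3} MiB: {saved:.1f} ns saved per access")
```

The saving is zero at 1 MiB and grows with the buffer size, which fits the banner's remark that TLB effects matter more as the buffer gets larger.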

RUN3:
root@mitysom-am62x:~# tinymembench
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   1128.8 MB/s (1.6%)
 C copy backwards (32 byte blocks)                    :   1113.4 MB/s (1.7%)
 C copy backwards (64 byte blocks)                    :   1006.5 MB/s (2.0%)
 C copy                                               :   1079.4 MB/s (1.4%)
 C copy prefetched (32 bytes step)                    :    902.2 MB/s (1.1%)
 C copy prefetched (64 bytes step)                    :    956.9 MB/s (0.3%)
 C 2-pass copy                                        :    859.2 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    594.9 MB/s (0.2%)
 C 2-pass copy prefetched (64 bytes step)             :    304.2 MB/s
 C fill                                               :   2150.6 MB/s (0.2%)
 C fill (shuffle within 16 byte blocks)               :   2150.3 MB/s (0.1%)
 C fill (shuffle within 32 byte blocks)               :   2148.0 MB/s
 C fill (shuffle within 64 byte blocks)               :   2149.0 MB/s
 ---
 standard memcpy                                      :   1059.1 MB/s (0.9%)
 standard memset                                      :   2150.6 MB/s (0.1%)
 ---
 NEON LDP/STP copy                                    :   1185.5 MB/s (0.7%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    787.7 MB/s (0.4%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    959.0 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   1192.0 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   1189.7 MB/s (0.1%)
 NEON LD1/ST1 copy                                    :   1098.0 MB/s (1.0%)
 NEON STP fill                                        :   2148.2 MB/s
 NEON STNP fill                                       :   2074.4 MB/s (1.4%)
 ARM LDP/STP copy                                     :   1174.4 MB/s (0.4%)
 ARM STP fill                                         :   2151.2 MB/s (0.1%)
 ARM STNP fill                                        :   2074.4 MB/s (4.1%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.5 ns
    131072 :    6.7 ns          /    10.7 ns
    262144 :    8.1 ns          /    12.0 ns
    524288 :   12.4 ns          /    18.5 ns
   1048576 :  127.5 ns          /   195.3 ns
   2097152 :  189.6 ns          /   250.5 ns
   4194304 :  226.6 ns          /   272.9 ns
   8388608 :  245.4 ns          /   282.4 ns
  16777216 :  255.7 ns          /   287.3 ns
  33554432 :  262.9 ns          /   289.6 ns
  67108864 :  274.4 ns          /   307.5 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.4 ns
    131072 :    6.7 ns          /    10.8 ns
    262144 :    8.1 ns          /    12.5 ns
    524288 :   12.3 ns          /    18.1 ns
   1048576 :  127.0 ns          /   195.2 ns
   2097152 :  187.9 ns          /   248.6 ns
   4194304 :  218.5 ns          /   265.0 ns
   8388608 :  234.4 ns          /   270.8 ns
  16777216 :  242.4 ns          /   273.0 ns
  33554432 :  246.3 ns          /   273.8 ns
  67108864 :  248.2 ns          /   274.2 ns

RUN4: (U-Boot changes: TCR/ASR disabled)

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   1067.3 MB/s (2.1%)
 C copy backwards (32 byte blocks)                    :   1038.0 MB/s (1.9%)
 C copy backwards (64 byte blocks)                    :    945.6 MB/s (2.1%)
 C copy                                               :   1016.2 MB/s (1.6%)
 C copy prefetched (32 bytes step)                    :    851.0 MB/s (0.4%)
 C copy prefetched (64 bytes step)                    :    903.4 MB/s (0.4%)
 C 2-pass copy                                        :    823.9 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    575.5 MB/s (0.6%)
 C 2-pass copy prefetched (64 bytes step)             :    292.2 MB/s (0.1%)
 C fill                                               :   2143.0 MB/s
 C fill (shuffle within 16 byte blocks)               :   2145.5 MB/s
 C fill (shuffle within 32 byte blocks)               :   2145.1 MB/s
 C fill (shuffle within 64 byte blocks)               :   2146.2 MB/s
 ---
 standard memcpy                                      :    978.3 MB/s (1.2%)
 standard memset                                      :   2146.9 MB/s (0.2%)
 ---
 NEON LDP/STP copy                                    :   1079.3 MB/s (0.9%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    741.7 MB/s (0.6%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    902.3 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   1115.8 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   1111.9 MB/s (0.2%)
 NEON LD1/ST1 copy                                    :   1034.6 MB/s (1.0%)
 NEON STP fill                                        :   2145.8 MB/s
 NEON STNP fill                                       :   2054.4 MB/s (1.7%)
 ARM LDP/STP copy                                     :   1072.7 MB/s (0.8%)
 ARM STP fill                                         :   2144.7 MB/s
 ARM STNP fill                                        :   2058.5 MB/s (1.4%)
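
RUN4 is the first run labelled with a configuration change, so its bandwidth figures can be compared with the earlier runs. The sketch below computes the percent change of three RUN4 results against the mean of RUN1 to RUN3. Treating RUN1 to RUN3 as an unmodified baseline is an assumption, since the log labels only RUN4. The helper name is hypothetical, and three baseline samples are too few to support a firm statistical claim.

```python
from statistics import mean

# MB/s copied from the logs above: RUN1..RUN3 (unlabelled) and RUN4.
BASELINE = {
    "standard memcpy": [1065.0, 1054.6, 1059.1],
    "NEON LDP/STP copy": [1175.8, 1183.2, 1185.5],
    "C fill": [2151.3, 2147.6, 2150.6],
}
RUN4 = {"standard memcpy": 978.3, "NEON LDP/STP copy": 1079.3, "C fill": 2143.0}


def percent_change(baseline_runs, new_value):
    """Percent change of new_value relative to the mean of baseline_runs."""
    base = mean(baseline_runs)
    return 100.0 * (new_value - base) / base


for test, runs in BASELINE.items():
    print(f"{test:<20} {percent_change(runs, RUN4[test]):+.1f}%")
```

On these figures the copy tests drop by several percent in RUN4 while the fill test moves by well under one percent.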

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.3 ns
    131072 :    6.7 ns          /    10.8 ns
    262144 :    8.1 ns          /    12.0 ns
    524288 :   12.9 ns          /    17.7 ns
   1048576 :  130.9 ns          /   201.5 ns
   2097152 :  195.2 ns          /   259.6 ns
   4194304 :  234.1 ns          /   283.6 ns
   8388608 :  253.7 ns          /   290.5 ns
  16777216 :  263.3 ns          /   295.9 ns
  33554432 :  268.3 ns          /   301.2 ns
  67108864 :  283.6 ns          /   323.3 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.5 ns
    131072 :    6.7 ns          /    10.8 ns
    262144 :    8.1 ns          /    12.4 ns
    524288 :   12.5 ns          /    18.4 ns
   1048576 :  130.8 ns          /   201.4 ns
   2097152 :  193.2 ns          /   257.4 ns
   4194304 :  222.4 ns          /   276.4 ns
   8388608 :  237.1 ns          /   283.0 ns
  16777216 :  244.8 ns          /   285.3 ns
  33554432 :  248.7 ns          /   286.0 ns
  67108864 :  250.7 ns          /   286.4 ns

RUN5:
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   1043.6 MB/s (1.4%)
 C copy backwards (32 byte blocks)                    :   1019.6 MB/s (1.5%)
 C copy backwards (64 byte blocks)                    :    940.8 MB/s (1.6%)
 C copy                                               :   1039.5 MB/s (1.2%)
 C copy prefetched (32 bytes step)                    :    844.6 MB/s (0.4%)
 C copy prefetched (64 bytes step)                    :    905.3 MB/s (0.5%)
 C 2-pass copy                                        :    824.2 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    573.5 MB/s (0.2%)
 C 2-pass copy prefetched (64 bytes step)             :    291.8 MB/s
 C fill                                               :   2144.6 MB/s
 C fill (shuffle within 16 byte blocks)               :   2146.1 MB/s
 C fill (shuffle within 32 byte blocks)               :   2147.4 MB/s (0.2%)
 C fill (shuffle within 64 byte blocks)               :   2146.8 MB/s
 ---
 standard memcpy                                      :    980.4 MB/s (0.9%)
 standard memset                                      :   2147.9 MB/s (0.5%)
 ---
 NEON LDP/STP copy                                    :   1077.2 MB/s (0.6%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    734.3 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    901.2 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   1115.3 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   1111.1 MB/s
 NEON LD1/ST1 copy                                    :   1037.7 MB/s (1.4%)
 NEON STP fill                                        :   2147.2 MB/s (0.2%)
 NEON STNP fill                                       :   2050.7 MB/s (1.6%)
 ARM LDP/STP copy                                     :   1076.5 MB/s (0.6%)
 ARM STP fill                                         :   2146.2 MB/s (0.1%)
 ARM STNP fill                                        :   2050.6 MB/s (2.8%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================
502

    
block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.4 ns
    131072 :    6.7 ns          /    11.2 ns
    262144 :    8.1 ns          /    12.0 ns
    524288 :   12.4 ns          /    18.3 ns
   1048576 :  130.7 ns          /   201.8 ns
   2097152 :  195.0 ns          /   259.5 ns
   4194304 :  234.0 ns          /   283.5 ns
   8388608 :  253.7 ns          /   290.4 ns
  16777216 :  263.3 ns          /   295.8 ns
  33554432 :  268.3 ns          /   301.1 ns
  67108864 :  282.3 ns          /   321.4 ns
    
block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.6 ns
    131072 :    6.7 ns          /    10.7 ns
    262144 :    8.1 ns          /    11.9 ns
    524288 :   12.3 ns          /    18.5 ns
   1048576 :  130.8 ns          /   201.6 ns
   2097152 :  193.2 ns          /   257.3 ns
   4194304 :  222.5 ns          /   276.2 ns
   8388608 :  237.1 ns          /   282.9 ns
  16777216 :  244.8 ns          /   285.2 ns
  33554432 :  248.7 ns          /   286.0 ns
  67108864 :  250.6 ns          /   286.3 ns
    
RUN6:
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)
    
==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================
    
 C copy backwards                                     :   1034.0 MB/s (1.1%)
 C copy backwards (32 byte blocks)                    :   1009.6 MB/s (1.9%)
 C copy backwards (64 byte blocks)                    :    942.8 MB/s (1.2%)
 C copy                                               :   1020.0 MB/s (1.5%)
 C copy prefetched (32 bytes step)                    :    847.1 MB/s (0.6%)
 C copy prefetched (64 bytes step)                    :    899.7 MB/s (0.4%)
 C 2-pass copy                                        :    825.8 MB/s (0.2%)
 C 2-pass copy prefetched (32 bytes step)             :    575.8 MB/s (0.2%)
 C 2-pass copy prefetched (64 bytes step)             :    292.4 MB/s (0.1%)
 C fill                                               :   2146.5 MB/s (0.2%)
 C fill (shuffle within 16 byte blocks)               :   2144.4 MB/s
 C fill (shuffle within 32 byte blocks)               :   2146.4 MB/s
 C fill (shuffle within 64 byte blocks)               :   2146.5 MB/s
 ---
 standard memcpy                                      :    976.3 MB/s (1.2%)
 standard memset                                      :   2146.8 MB/s (0.3%)
 ---
 NEON LDP/STP copy                                    :   1070.0 MB/s (0.6%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    744.5 MB/s (0.8%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    902.0 MB/s (0.1%)
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   1115.7 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   1111.5 MB/s (0.2%)
 NEON LD1/ST1 copy                                    :   1028.8 MB/s (1.5%)
 NEON STP fill                                        :   2145.2 MB/s
 NEON STNP fill                                       :   2062.5 MB/s (8.3%)
 ARM LDP/STP copy                                     :   1069.0 MB/s (0.6%)
 ARM STP fill                                         :   2146.5 MB/s
 ARM STNP fill                                        :   2060.2 MB/s (3.9%)
    
==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================
    
block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.3 ns
    131072 :    6.7 ns          /    10.7 ns
    262144 :    8.1 ns          /    12.0 ns
    524288 :   12.5 ns          /    18.6 ns
   1048576 :  130.7 ns          /   201.5 ns
   2097152 :  195.1 ns          /   259.5 ns
   4194304 :  234.0 ns          /   283.4 ns
   8388608 :  253.7 ns          /   290.5 ns
  16777216 :  263.3 ns          /   295.9 ns
  33554432 :  268.4 ns          /   301.3 ns
  67108864 :  281.3 ns          /   320.1 ns
    
block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.4 ns
    131072 :    6.7 ns          /    10.9 ns
    262144 :    8.1 ns          /    12.4 ns
    524288 :   12.8 ns          /    17.7 ns
   1048576 :  130.8 ns          /   201.5 ns
   2097152 :  193.2 ns          /   257.4 ns
   4194304 :  222.4 ns          /   276.3 ns
   8388608 :  237.2 ns          /   283.0 ns
  16777216 :  244.8 ns          /   285.3 ns
  33554432 :  248.7 ns          /   286.0 ns
  67108864 :  250.6 ns          /   286.4 ns
    
    
RUN7: (ECC enabled)
==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================
    
 C copy backwards                                     :    849.3 MB/s (1.3%)
 C copy backwards (32 byte blocks)                    :    870.2 MB/s (2.0%)
 C copy backwards (64 byte blocks)                    :    859.6 MB/s (1.2%)
 C copy                                               :    863.9 MB/s (1.5%)
 C copy prefetched (32 bytes step)                    :    687.7 MB/s (0.4%)
 C copy prefetched (64 bytes step)                    :    772.1 MB/s (0.4%)
 C 2-pass copy                                        :    795.9 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    519.2 MB/s (1.2%)
 C 2-pass copy prefetched (64 bytes step)             :    281.5 MB/s (0.2%)
 C fill                                               :   1910.1 MB/s (0.1%)
 C fill (shuffle within 16 byte blocks)               :   1910.6 MB/s (0.2%)
 C fill (shuffle within 32 byte blocks)               :   1910.6 MB/s (0.1%)
 C fill (shuffle within 64 byte blocks)               :   1909.2 MB/s
 ---
 standard memcpy                                      :    879.2 MB/s (0.6%)
 standard memset                                      :   1910.2 MB/s (0.3%)
 ---
 NEON LDP/STP copy                                    :    895.4 MB/s (0.8%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    622.2 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    753.3 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   1002.9 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   1002.4 MB/s
 NEON LD1/ST1 copy                                    :    877.8 MB/s (1.2%)
 NEON STP fill                                        :   1908.2 MB/s
 NEON STNP fill                                       :   1852.8 MB/s (2.1%)
 ARM LDP/STP copy                                     :    894.8 MB/s (0.4%)
 ARM STP fill                                         :   1911.3 MB/s (0.1%)
 ARM STNP fill                                        :   1850.2 MB/s (0.9%)
    
==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================
    
block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.6 ns
    131072 :    6.7 ns          /    10.7 ns
    262144 :    8.1 ns          /    12.5 ns
    524288 :   12.8 ns          /    18.1 ns
   1048576 :  139.2 ns          /   215.9 ns
   2097152 :  207.9 ns          /   279.4 ns
   4194304 :  247.2 ns          /   305.5 ns
   8388608 :  267.7 ns          /   315.2 ns
  16777216 :  280.3 ns          /   321.5 ns
  33554432 :  288.0 ns          /   326.0 ns
  67108864 :  302.1 ns          /   347.4 ns
    
block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.3 ns
    131072 :    6.7 ns          /    10.7 ns
    262144 :    8.1 ns          /    11.9 ns
    524288 :   12.6 ns          /    18.8 ns
   1048576 :  139.0 ns          /   215.6 ns
   2097152 :  206.0 ns          /   277.1 ns
   4194304 :  239.0 ns          /   296.4 ns
   8388608 :  255.0 ns          /   303.5 ns
  16777216 :  263.1 ns          /   306.7 ns
  33554432 :  267.3 ns          /   308.2 ns
  67108864 :  269.3 ns          /   309.0 ns
    
RUN8:
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)
    
==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================
    
 C copy backwards                                     :    881.9 MB/s (2.6%)
 C copy backwards (32 byte blocks)                    :    859.2 MB/s (1.6%)
 C copy backwards (64 byte blocks)                    :    847.6 MB/s (2.3%)
 C copy                                               :    869.8 MB/s (2.3%)
 C copy prefetched (32 bytes step)                    :    688.7 MB/s (0.7%)
 C copy prefetched (64 bytes step)                    :    770.7 MB/s (0.5%)
 C 2-pass copy                                        :    799.4 MB/s (0.2%)
 C 2-pass copy prefetched (32 bytes step)             :    520.7 MB/s (0.5%)
 C 2-pass copy prefetched (64 bytes step)             :    281.3 MB/s
 C fill                                               :   1911.5 MB/s (0.1%)
 C fill (shuffle within 16 byte blocks)               :   1911.2 MB/s (0.2%)
 C fill (shuffle within 32 byte blocks)               :   1911.5 MB/s (0.1%)
 C fill (shuffle within 64 byte blocks)               :   1911.2 MB/s (0.2%)
 ---
 standard memcpy                                      :    877.9 MB/s (0.7%)
 standard memset                                      :   1908.8 MB/s
 ---
 NEON LDP/STP copy                                    :    891.0 MB/s (0.5%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    624.9 MB/s (0.3%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    753.3 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   1002.9 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   1003.0 MB/s
 NEON LD1/ST1 copy                                    :    883.1 MB/s (1.6%)
 NEON STP fill                                        :   1910.8 MB/s (0.1%)
 NEON STNP fill                                       :   1853.0 MB/s (1.2%)
 ARM LDP/STP copy                                     :    889.2 MB/s
 ARM STP fill                                         :   1910.9 MB/s (0.3%)
 ARM STNP fill                                        :   1852.6 MB/s (1.0%)
    
==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================
    
block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.4 ns
    131072 :    6.7 ns          /    10.9 ns
    262144 :    8.1 ns          /    12.7 ns
    524288 :   12.6 ns          /    19.5 ns
   1048576 :  139.1 ns          /   215.9 ns
   2097152 :  207.8 ns          /   279.3 ns
   4194304 :  247.1 ns          /   305.5 ns
   8388608 :  267.5 ns          /   314.9 ns
  16777216 :  280.3 ns          /   321.4 ns
  33554432 :  288.0 ns          /   326.0 ns
  67108864 :  303.2 ns          /   350.0 ns
    
block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.6 ns
    131072 :    6.7 ns          /    10.7 ns
    262144 :    8.1 ns          /    11.9 ns
    524288 :   12.8 ns          /    17.6 ns
   1048576 :  139.2 ns          /   215.4 ns
   2097152 :  205.9 ns          /   277.0 ns
   4194304 :  239.0 ns          /   296.3 ns
   8388608 :  254.9 ns          /   303.4 ns
  16777216 :  263.1 ns          /   306.7 ns
  33554432 :  267.2 ns          /   308.2 ns
  67108864 :  269.2 ns          /   308.9 ns
    
RUN9:
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)
    
==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================
    
 C copy backwards                                     :    876.8 MB/s (1.7%)
 C copy backwards (32 byte blocks)                    :    860.7 MB/s (2.2%)
 C copy backwards (64 byte blocks)                    :    855.5 MB/s (1.7%)
 C copy                                               :    862.2 MB/s (1.4%)
 C copy prefetched (32 bytes step)                    :    684.3 MB/s (0.4%)
 C copy prefetched (64 bytes step)                    :    773.2 MB/s (0.4%)
 C 2-pass copy                                        :    797.4 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    506.8 MB/s (0.6%)
 C 2-pass copy prefetched (64 bytes step)             :    281.4 MB/s
 C fill                                               :   1910.6 MB/s (0.2%)
 C fill (shuffle within 16 byte blocks)               :   1910.6 MB/s (0.1%)
 C fill (shuffle within 32 byte blocks)               :   1911.3 MB/s (0.1%)
 C fill (shuffle within 64 byte blocks)               :   1910.4 MB/s
 ---
 standard memcpy                                      :    881.4 MB/s (0.8%)
 standard memset                                      :   1909.8 MB/s (0.6%)
 ---
 NEON LDP/STP copy                                    :    894.7 MB/s (0.5%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    626.9 MB/s (0.3%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    752.7 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   1003.3 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   1002.3 MB/s
 NEON LD1/ST1 copy                                    :    874.1 MB/s (0.9%)
 NEON STP fill                                        :   1910.3 MB/s (0.1%)
 NEON STNP fill                                       :   1847.1 MB/s (1.4%)
 ARM LDP/STP copy                                     :    895.6 MB/s (0.7%)
 ARM STP fill                                         :   1908.8 MB/s
 ARM STNP fill                                        :   1844.5 MB/s
    
==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================
    
block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.7 ns
    131072 :    6.7 ns          /    11.2 ns
    262144 :    8.1 ns          /    12.8 ns
    524288 :   12.8 ns          /    19.2 ns
   1048576 :  139.0 ns          /   215.8 ns
   2097152 :  207.9 ns          /   279.4 ns
   4194304 :  247.0 ns          /   305.3 ns
   8388608 :  267.5 ns          /   315.0 ns
  16777216 :  280.3 ns          /   321.6 ns
  33554432 :  288.1 ns          /   326.0 ns
  67108864 :  301.5 ns          /   346.0 ns
    
block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.6 ns
    131072 :    6.7 ns          /    11.2 ns
    262144 :    8.0 ns          /    12.8 ns
    524288 :   13.0 ns          /    19.4 ns
   1048576 :  139.1 ns          /   215.7 ns
   2097152 :  206.0 ns          /   277.1 ns
   4194304 :  239.0 ns          /   296.4 ns
   8388608 :  255.0 ns          /   303.5 ns
  16777216 :  263.2 ns          /   306.7 ns
  33554432 :  267.2 ns          /   308.2 ns
  67108864 :  269.3 ns          /   308.9 ns
    
RUN10: (U-Boot changes: TCR/ASR disabled and ECC enabled)
    
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)
    
==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================
    
 C copy backwards                                     :    812.7 MB/s (1.6%)
 C copy backwards (32 byte blocks)                    :    807.0 MB/s (1.5%)
 C copy backwards (64 byte blocks)                    :    817.1 MB/s (1.5%)
 C copy                                               :    814.8 MB/s (1.6%)
 C copy prefetched (32 bytes step)                    :    651.8 MB/s (0.6%)
 C copy prefetched (64 bytes step)                    :    726.0 MB/s
 C 2-pass copy                                        :    770.5 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    486.5 MB/s (0.7%)
 C 2-pass copy prefetched (64 bytes step)             :    269.8 MB/s
 C fill                                               :   1906.5 MB/s (0.2%)
 C fill (shuffle within 16 byte blocks)               :   1907.1 MB/s (1.2%)
 C fill (shuffle within 32 byte blocks)               :   1906.9 MB/s (0.2%)
 C fill (shuffle within 64 byte blocks)               :   1908.7 MB/s (0.1%)
 ---
 standard memcpy                                      :    836.3 MB/s (0.6%)
 standard memset                                      :   1905.6 MB/s
 ---
 NEON LDP/STP copy                                    :    849.5 MB/s (0.5%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    591.6 MB/s (0.4%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    713.3 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :    947.3 MB/s (0.1%)
 NEON LDP/STP copy pldl1keep (64 bytes step)          :    947.9 MB/s
 NEON LD1/ST1 copy                                    :    831.5 MB/s (1.2%)
 NEON STP fill                                        :   1906.0 MB/s
 NEON STNP fill                                       :   1843.3 MB/s (2.2%)
 ARM LDP/STP copy                                     :    849.5 MB/s (0.4%)
 ARM STP fill                                         :   1905.8 MB/s
 ARM STNP fill                                        :   1842.1 MB/s (1.1%)
    
    
==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================
    
block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.6 ns
    131072 :    6.7 ns          /    10.7 ns
    262144 :    8.1 ns          /    11.9 ns
    524288 :   12.9 ns          /    18.1 ns
   1048576 :  143.2 ns          /   223.5 ns
   2097152 :  215.7 ns          /   290.8 ns
   4194304 :  255.7 ns          /   314.0 ns
   8388608 :  275.4 ns          /   327.5 ns
  16777216 :  285.6 ns          /   336.9 ns
  33554432 :  291.5 ns          /   341.8 ns
  67108864 :  305.3 ns          /   360.3 ns

    
block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.6 ns
    131072 :    6.7 ns          /    11.2 ns
    262144 :    8.1 ns          /    12.8 ns
    524288 :   12.7 ns          /    19.1 ns
   1048576 :  143.1 ns          /   223.4 ns
   2097152 :  213.7 ns          /   288.5 ns
   4194304 :  248.2 ns          /   308.0 ns
   8388608 :  264.1 ns          /   313.7 ns
  16777216 :  271.6 ns          /   315.9 ns
  33554432 :  275.2 ns          /   316.8 ns
  67108864 :  277.0 ns          /   317.3 ns

    
RUN11:
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

    
 C copy backwards                                     :    825.2 MB/s (2.2%)
 C copy backwards (32 byte blocks)                    :    822.6 MB/s (1.5%)
 C copy backwards (64 byte blocks)                    :    809.6 MB/s (1.4%)
 C copy                                               :    817.7 MB/s (1.3%)
 C copy prefetched (32 bytes step)                    :    655.5 MB/s (0.5%)
 C copy prefetched (64 bytes step)                    :    728.1 MB/s (0.3%)
 C 2-pass copy                                        :    770.6 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    493.9 MB/s (1.4%)
 C 2-pass copy prefetched (64 bytes step)             :    269.8 MB/s
 C fill                                               :   1907.0 MB/s
 C fill (shuffle within 16 byte blocks)               :   1905.8 MB/s (0.2%)
 C fill (shuffle within 32 byte blocks)               :   1903.7 MB/s
 C fill (shuffle within 64 byte blocks)               :   1907.7 MB/s (0.2%)
 ---
 standard memcpy                                      :    838.0 MB/s (0.6%)
 standard memset                                      :   1907.6 MB/s (0.1%)
 ---
 NEON LDP/STP copy                                    :    846.3 MB/s (0.5%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    590.8 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    713.0 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :    946.7 MB/s (0.2%)
 NEON LDP/STP copy pldl1keep (64 bytes step)          :    946.5 MB/s
 NEON LD1/ST1 copy                                    :    838.6 MB/s (1.0%)
 NEON STP fill                                        :   1907.1 MB/s (0.1%)
 NEON STNP fill                                       :   1846.0 MB/s (1.2%)
 ARM LDP/STP copy                                     :    847.6 MB/s (0.5%)
 ARM STP fill                                         :   1904.7 MB/s
 ARM STNP fill                                        :   1848.7 MB/s (1.2%)

    
==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

    
block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.4 ns
    131072 :    6.7 ns          /    10.9 ns
    262144 :    8.1 ns          /    11.9 ns
    524288 :   13.1 ns          /    18.9 ns
   1048576 :  143.4 ns          /   224.1 ns
   2097152 :  216.0 ns          /   291.1 ns
   4194304 :  256.0 ns          /   314.5 ns
   8388608 :  275.6 ns          /   327.5 ns
  16777216 :  286.0 ns          /   337.5 ns
  33554432 :  291.8 ns          /   342.3 ns
  67108864 :  308.0 ns          /   364.8 ns

    
block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.4 ns
    131072 :    6.7 ns          /    10.8 ns
    262144 :    8.1 ns          /    12.5 ns
    524288 :   13.1 ns          /    19.5 ns
   1048576 :  143.4 ns          /   223.6 ns
   2097152 :  213.8 ns          /   288.7 ns
   4194304 :  248.5 ns          /   308.1 ns
   8388608 :  264.3 ns          /   313.9 ns
  16777216 :  271.8 ns          /   316.1 ns
  33554432 :  275.4 ns          /   317.0 ns
  67108864 :  277.2 ns          /   317.6 ns

    
RUN12:
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

    
 C copy backwards                                     :    817.2 MB/s (2.0%)
 C copy backwards (32 byte blocks)                    :    821.4 MB/s (1.7%)
 C copy backwards (64 byte blocks)                    :    812.7 MB/s (1.2%)
 C copy                                               :    815.8 MB/s (1.2%)
 C copy prefetched (32 bytes step)                    :    651.9 MB/s (0.5%)
 C copy prefetched (64 bytes step)                    :    729.5 MB/s (0.3%)
 C 2-pass copy                                        :    771.5 MB/s
 C 2-pass copy prefetched (32 bytes step)             :    499.8 MB/s (2.0%)
 C 2-pass copy prefetched (64 bytes step)             :    269.8 MB/s
 C fill                                               :   1905.1 MB/s
 C fill (shuffle within 16 byte blocks)               :   1907.2 MB/s (0.2%)
 C fill (shuffle within 32 byte blocks)               :   1907.4 MB/s (0.2%)
 C fill (shuffle within 64 byte blocks)               :   1908.2 MB/s (0.1%)
 ---
 standard memcpy                                      :    839.3 MB/s (0.4%)
 standard memset                                      :   1909.1 MB/s (0.3%)
 ---
 NEON LDP/STP copy                                    :    848.9 MB/s (0.5%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    592.0 MB/s (0.3%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    712.9 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :    947.4 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :    946.3 MB/s
 NEON LD1/ST1 copy                                    :    829.7 MB/s (1.2%)
 NEON STP fill                                        :   1907.5 MB/s (0.2%)
 NEON STNP fill                                       :   1840.8 MB/s (2.3%)
 ARM LDP/STP copy                                     :    849.9 MB/s (0.5%)
 ARM STP fill                                         :   1908.4 MB/s (0.2%)
 ARM STNP fill                                        :   1843.1 MB/s (0.5%)

    
==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

    
block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.6 ns
    131072 :    6.7 ns          /    11.2 ns
    262144 :    8.1 ns          /    12.8 ns
    524288 :   12.8 ns          /    18.4 ns
   1048576 :  143.2 ns          /   223.6 ns
   2097152 :  215.7 ns          /   290.9 ns
   4194304 :  255.7 ns          /   314.0 ns
   8388608 :  275.4 ns          /   327.2 ns
  16777216 :  285.7 ns          /   337.1 ns
  33554432 :  291.5 ns          /   342.0 ns
  67108864 :  307.0 ns          /   363.3 ns

    
block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    4.2 ns          /     7.6 ns
    131072 :    6.7 ns          /    11.2 ns
    262144 :    8.1 ns          /    12.8 ns
    524288 :   12.8 ns          /    19.6 ns
   1048576 :  143.3 ns          /   223.8 ns
   2097152 :  213.6 ns          /   288.4 ns
   4194304 :  248.2 ns          /   308.0 ns
   8388608 :  264.1 ns          /   313.7 ns
  16777216 :  271.6 ns          /   315.9 ns
  33554432 :  275.3 ns          /   316.8 ns
  67108864 :  277.0 ns          /   317.3 ns
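
The bandwidth notes in each run state that 1MB = 1000000 bytes (Note 1) and that 'copy' results count only the bytes copied, so read plus written traffic is twice the reported figure (Note 2). The following is a minimal sketch of that arithmetic for post-processing the result lines above, assuming Python is used for the analysis. The helper names `parse_bandwidth_line`, `to_mib_per_s` and `copy_traffic` are hypothetical and are not part of tinymembench; the regular expression is an assumption based only on the line layout shown in this file.

```python
import re

# Pattern for bandwidth lines such as:
# " standard memcpy                                      :    839.3 MB/s (0.4%)"
# Assumption: layout inferred from the output in this file, not from tinymembench's source.
LINE_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?P<rate>\d+(?:\.\d+)?) MB/s"
    r"(?:\s+\((?P<dev>\d+(?:\.\d+)?)%\))?\s*$"
)


def parse_bandwidth_line(line):
    """Return (test name, MB/s, std-dev percent or None); None if the line is not a bandwidth result."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    dev = float(m.group("dev")) if m.group("dev") else None
    return m.group("name"), float(m.group("rate")), dev


def to_mib_per_s(mb_per_s):
    """Convert decimal MB/s (Note 1: 1 MB = 1,000,000 bytes) to MiB/s."""
    return mb_per_s * 1_000_000 / (1024 * 1024)


def copy_traffic(mb_per_s):
    """Note 2: bytes read plus bytes written are twice the reported copy rate."""
    return 2 * mb_per_s


if __name__ == "__main__":
    sample = " standard memcpy                                      :    839.3 MB/s (0.4%)"
    name, rate, dev = parse_bandwidth_line(sample)
    print(name, rate, dev)
    print(round(to_mib_per_s(rate), 1), "MiB/s")
    print(copy_traffic(rate), "MB/s read + write")
```

Latency table rows and separator lines do not contain "MB/s", so `parse_bandwidth_line` returns `None` for them and they can be skipped when feeding a whole run through the parser.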