The Riesel Problem : Optimal number of instances on i5 and i7 for Riesel problem
I have several i5s and i7s running the Riesel problem when they aren't doing something else. What is the optimum number of threads to run on the i5s and on the i7s for maximum throughput? (I generally have four running on each.)
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
If you're not also running a GPU in those computers, my advice would be as follows:
1) On the i5s, run all 4 cores
2) On the i7s, set BOINC to use 50% of the CPUs (i.e., 4 cores), *OR* disable hyperthreading in the BIOS and run 100% (also 4 cores).
3) Use app_config to run TRP in multi-threaded mode, using 4 threads for the task.
You will then be running a single TRP task on all 4 cores (all four "full" cores in the case of the i7s). That's the most efficient way to run TRP on those computers.
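BOINC's "Use at most X% of the CPUs" preference counts logical CPUs, so the core counts above translate into percentages as sketched below. This is a hypothetical helper, assuming 4-core i5s without hyperthreading and 4-core/8-thread i7s; it does not control which physical cores the OS actually schedules the threads onto.

```python
# Hypothetical helper: convert "how many cores should BOINC use" into the
# percentage for BOINC's "Use at most X% of the CPUs" computing preference.
def boinc_cpu_percent(cores_to_use: int, logical_cpus: int) -> float:
    """Percentage of logical CPUs BOINC should be allowed to use."""
    if not 0 < cores_to_use <= logical_cpus:
        raise ValueError("cores_to_use must be between 1 and logical_cpus")
    return 100.0 * cores_to_use / logical_cpus


if __name__ == "__main__":
    # Assumed CPUs: 4-core i5 without HT, 4-core/8-thread i7.
    print(boinc_cpu_percent(4, 4))  # i5, all 4 cores
    print(boinc_cpu_percent(4, 8))  # i7 with HT on, 4 "full" cores
    print(boinc_cpu_percent(3, 4))  # no HT, one core left free for a GPU
```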
app_config.xml should look like this:
<app_config>
  <app>
    <name>llrTRP</name>
    <fraction_done_exact/>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>llrTRP</app_name>
    <cmdline>-t 4</cmdline>
    <avg_ncpus>4</avg_ncpus>
  </app_version>
</app_config>
The app_config.xml file should go in C:\ProgramData\BOINC\projects\www.primegrid.com\, assuming you're running a normal Windows installation.
If you ARE also running a GPU, you *might* want to leave a CPU core free to service the GPU. You'll get more done on the GPU, but less on the CPU. Note that in this scenario you might want to leave hyperthreading enabled on the i7s and use the hyperthreads to service the GPU: leave HT on and set the number of CPUs to 50%.
To leave a (full) core free for a GPU when there's no hyperthreading (either the i5s, or the i7s with hyperthreading turned off), set the number of CPUs to 75% and change "4" to "3" in the two lines near the end of the app_config.xml file.
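If you switch thread counts often (4 threads, or 3 to leave a core for the GPU), a small script can regenerate the file so the "-t" value and avg_ncpus never drift apart. This is a hypothetical Python sketch, not something BOINC or PrimeGrid provides; it only builds the same XML as the example in this post and leaves saving it to the projects directory to you.

```python
# Hypothetical helper (not part of BOINC or PrimeGrid): build the TRP
# app_config.xml from this post for an arbitrary LLR thread count, so the
# "-t N" command line and avg_ncpus always match.
def trp_app_config(threads: int) -> str:
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return (
        "<app_config>\n"
        "  <app>\n"
        "    <name>llrTRP</name>\n"
        "    <fraction_done_exact/>\n"
        "    <max_concurrent>1</max_concurrent>\n"
        "  </app>\n"
        "  <app_version>\n"
        "    <app_name>llrTRP</app_name>\n"
        f"    <cmdline>-t {threads}</cmdline>\n"
        f"    <avg_ncpus>{threads}</avg_ncpus>\n"
        "  </app_version>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # 3 threads: leaves one full core free to feed a GPU on a 4-core CPU.
    print(trp_app_config(3))
```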
____________
My lucky number is 75898^524288+1
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 885 ID: 370496 Credit: 334,085,845 RAC: 0
That's... not an easy question. There are many things that affect performance; just knowing whether it's an i5 or an i7 isn't really helpful. And while we can look at your PCs to see their clock speeds and generations, that tells us nothing about the RAM, which is also a major factor. Even then, it's still hard to guess performance from specs alone; that's how difficult a question this is.
If you want as precise an answer as possible, download the beta of Prime95 29.2 and use its benchmark feature. Look at the screenshot below:
http://i.imgur.com/aCwQ5lJ.png
*EDIT: in the "number of workers to benchmark" field, it should be 1,2,4, not 1,4. Oops, I made a typo in the screenshot.
On the right are the benchmark settings you should use (for Riesel, 864K FFT). On the left are the results of a quick run I did on one of my PCs. The first number is for 4 cores crunching a single WU. The second is for 2 cores processing one unit, with 2 units running at the same time. The last is for each core running its own WU. On my PC, 4 cores on 1 unit seems to be best; on yours, it might be 2 units with 2 cores each, or maybe even 1 core per unit. Who knows?
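To compare the configurations, look at total throughput rather than per-worker timings. Prime95 prints the time per iteration (in ms) for each worker; a minimal sketch, assuming total throughput in iter/sec is the sum of 1000/ms over all workers, reproduces (within rounding) the iter/sec figures in the i7-8700K results posted later in this thread, which are used as the example input. The function names are hypothetical.

```python
# Assumption: each worker does 1000/ms iterations per second, and the
# machine's total throughput is the sum over all workers. Recomputing from
# Prime95's rounded timings lands within a few iter/sec of its own figure.
def throughput(ms_per_iter: list[float]) -> float:
    return sum(1000.0 / ms for ms in ms_per_iter)


def best_config(results: dict[str, list[float]]) -> str:
    """Return the configuration label with the highest total throughput."""
    return max(results, key=lambda label: throughput(results[label]))


if __name__ == "__main__":
    # i7-8700K, 960K FFT, no HT (timings copied from this thread).
    results = {
        "6 cores, 1 worker": [0.61],
        "6 cores, 2 workers": [1.25, 1.24],
        "6 cores, 6 workers": [5.27, 5.20, 5.22, 5.21, 5.16, 5.22],
    }
    for label, timings in results.items():
        print(f"{label}: {throughput(timings):.0f} iter/sec")
    print("best:", best_config(results))
```

The same comparison works for the 1, 2 and 4 worker runs on a 4-core i5 or i7: paste each run's per-worker timings into the dictionary.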
There are good answers already. While not directly relevant here, this may become relevant in the near future as CPUs get ever more cores. There seems to be some inefficiency when running a single task on more than around 8 cores, and in that situation also using the additional threads beyond the real cores seemed to help. Safest option: try it all and see how it actually responds.
The Prime95 benchmark is a good indicator, but I suspect things can run a little differently in LLR, so don't rely purely on P95.
Multithreaded results:
CPU time ~= Real computation time * Core count
Run time = ... WTF?
You're seeing stacked run times because more than 1 WU was grabbed at the start of a multithreaded run. This is a known BOINC issue, not PG's fault.
If you're trying to get an average run time, either ignore the obviously high ones or work through them and subtract the times of the ones done before each.
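The "subtract the times of the ones done before it" approach can be sketched as follows. This assumes the tasks of one stacked batch are listed in completion order and that each reported run time includes the run times of every task in that batch that finished before it; the function name and the run times in the test are made up for illustration.

```python
# Assumption: within one stacked batch, each task's reported run time also
# contains the run times of the tasks that finished before it. Subtracting
# those earlier run times recovers each task's own run time.
def unstack_run_times(reported: list[float]) -> list[float]:
    """reported: run times in seconds, in order of completion."""
    actual: list[float] = []
    for r in reported:
        own = r - sum(actual)
        if own <= 0:
            raise ValueError("run times don't look stacked; check the order")
        actual.append(own)
    return actual


def average(xs: list[float]) -> float:
    return sum(xs) / len(xs)
```

Only feed it the tasks from the batch that was stacked; tasks started normally afterwards already report their own run time.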
w/o HT:
Timings for 960K all-complex FFT length (6 cores, 1 worker): 0.61 ms. Throughput: 1637.29 iter/sec.
Timings for 960K all-complex FFT length (6 cores, 2 workers): 1.25, 1.24 ms. Throughput: 1610.60 iter/sec.
Timings for 960K all-complex FFT length (6 cores, 3 workers): 2.26, 2.25, 2.25 ms. Throughput: 1332.79 iter/sec.
Timings for 960K all-complex FFT length (6 cores, 4 workers): 4.72, 4.64, 2.32, 2.36 ms. Throughput: 1281.87 iter/sec.
Timings for 960K all-complex FFT length (6 cores, 5 workers): 4.88, 4.85, 4.86, 4.86, 2.30 ms. Throughput: 1257.76 iter/sec.
Timings for 960K all-complex FFT length (6 cores, 6 workers): 5.27, 5.20, 5.22, 5.21, 5.16, 5.22 ms. Throughput: 1150.77 iter/sec.
with HT:
Timings for 1280K all-complex FFT length (6 cores, 1 worker): 0.81 ms. Throughput: 1230.40 iter/sec.
Timings for 1280K all-complex FFT length (6 cores hyperthreaded, 1 worker): 0.77 ms. Throughput: 1296.76 iter/sec.
large FFT with HT:
Timings for 1920K all-complex FFT length (6 cores, 1 worker): 1.32 ms. Throughput: 760.07 iter/sec.
Timings for 1920K all-complex FFT length (6 cores hyperthreaded, 1 worker): 1.39 ms. Throughput: 719.25 iter/sec.
P.S. Run on an i7-8700K locked at 4000 MHz.
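The hyperthreading results above point in opposite directions depending on FFT size. A minimal sketch of the comparison, using the 1-worker iter/sec figures from this post (the function name is hypothetical), shows roughly +5% with HT at 1280K and roughly -5% at 1920K on this one CPU; two data points from one machine are not a general rule.

```python
# Relative throughput change from enabling hyperthreading, using Prime95's
# 1-worker, 6-core iter/sec figures for the i7-8700K from this post.
def ht_gain_percent(without_ht: float, with_ht: float) -> float:
    """Positive = hyperthreading helps, negative = it hurts."""
    return 100.0 * (with_ht / without_ht - 1.0)


if __name__ == "__main__":
    runs = {
        "1280K": (1230.40, 1296.76),  # (without HT, with HT)
        "1920K": (760.07, 719.25),
    }
    for fft, (no_ht, ht) in runs.items():
        print(f"{fft}: {ht_gain_percent(no_ht, ht):+.1f}% with HT")
```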
TRP (Credit = 4,061)
1. Intel(R) Core(TM) i7-3720QM CPU @ 2.6GHz
Using AVX FFT length 864K, Pass1=384, Pass2=2304, 4 threads
Real Time = 19,523 s
CPU Time = 76,300 s
PPD ~= 18k
CPU Power = 30W
Efficiency math = 1730
Efficiency power = 600
2. Intel(R) Core(TM) i7-4600U CPU @ 2.2GHz
Using FMA3 FFT length 864K, Pass1=384, Pass2=2304, 2 threads
Real Time = 31,010 s
CPU Time = 61,307 s
PPD ~= 11.3k
CPU Power = 17W
Efficiency math = 2568
Efficiency power = 665
3. Intel(R) Core(TM) i7-8700K CPU @ 4.0GHz
Using FMA3 FFT length 864K, Pass1=384, Pass2=2304, 6 threads
Real Time = 5,105 s
CPU Time = 30,237 s
PPD ~= 68.7k
CPU Power = 115W
Efficiency math = 2862
Efficiency power = 600
P.S. Efficiency math = PPD / (Frequency * Cores), Efficiency power = PPD / Power
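The figures above can be recomputed from the raw timings. This sketch assumes PPD was derived as credit per task * 86,400 / real time with one task running at a time, which matches the posted 18k, 11.3k and 68.7k; frequency is in GHz and "cores" means the thread count LLR used. Results land close to, but not exactly on, the post's efficiency numbers because the post used rounded PPD. The CPU time / real time ratio also gives a quick check of "CPU time ~= Real computation time * Core count" (about 3.9, 2.0 and 5.9 here). The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def ppd(credit_per_task: float, real_time_s: float) -> float:
    """Points per day, assuming one task runs at a time."""
    return credit_per_task * SECONDS_PER_DAY / real_time_s


def efficiency_math(ppd_value: float, ghz: float, cores: int) -> float:
    return ppd_value / (ghz * cores)


def efficiency_power(ppd_value: float, watts: float) -> float:
    return ppd_value / watts


if __name__ == "__main__":
    CREDIT = 4061  # TRP credit per task, from the post above
    cpus = [
        # name, GHz, threads, real time (s), CPU time (s), CPU power (W)
        ("i7-3720QM", 2.6, 4, 19_523, 76_300, 30),
        ("i7-4600U", 2.2, 2, 31_010, 61_307, 17),
        ("i7-8700K", 4.0, 6, 5_105, 30_237, 115),
    ]
    for name, ghz, cores, real_s, cpu_s, watts in cpus:
        p = ppd(CREDIT, real_s)
        print(
            f"{name}: PPD {p:,.0f}, math {efficiency_math(p, ghz, cores):.0f}, "
            f"power {efficiency_power(p, watts):.0f}, "
            f"CPU/real time {cpu_s / real_s:.2f}"  # roughly the thread count
        )
```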