Author |
Message |
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
-For Linux 64bit Intel or AMD-
I have converted part of the AP26 application to SIMD SSE2 code.
The application is here: http://www.megaupload.com/?d=SCEBSEOE
I would like to know the speeds on various kinds of processors. If you can, please run the application on an idle computer with the command line:
time ./ap26-x86_64sse2-linux 366384 366384 0
and compare the results with this file: http://www.megaupload.com/?d=GKOG5SSR
using the command:
cmp SOL-AP26.txt TEST-366384.txt
The results should match exactly. Please post your speeds and results compared to the current version 1.03 official application. Thanks!
-Bryan Little
____________
|
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
Test computer: Intel Core2 Quad Q6600 @ 2.4 GHz
new app: 3m 42s
1.03 app: 4m 30s
results file matches
____________
|
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
OK, after testing on my AMD computer I realized I was getting some register spilling, so I cut the SIMD code back some. Updated version 2 is here:
http://www.megaupload.com/?d=OSXO7YME
Original version in post 1 has been removed.
Test computer: AMD Athlon 64x2 6000+ @ 3.0 GHz
new app: 3m 22s
1.03 app: 4m 2s
results file matches
---------------------------------------------------------------------------------
Test computer: Intel Core2 Quad Q6600 @ 2.4 GHz
new app: 3m 13s
1.03 app: 4m 30s
results file matches
____________
|
|
|
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
 Send message
Joined: 5 Feb 08 Posts: 1188 ID: 18646 Credit: 490,016,651 RAC: 5,051
                    
|
Test computer: AMD Phenom @ 2.2 GHz running Ubuntu Linux 64-bit (server)
new app stdout:
real 3m43.335s
user 3m41.230s
sys 0m0.010s
regular PrimeGrid app stdout:
real 5m1.110s
user 4m58.820s
sys 0m0.000s
result file matches
____________
Sysadm@Nbg
my current lucky number: 3749*2^1555697+1
PSA-PRPNet-Stats-URL: http://u-g-f.de/PRPNet/
|
|
|
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
 Send message
Joined: 5 Feb 08 Posts: 1188 ID: 18646 Credit: 490,016,651 RAC: 5,051
                    
|
Test computer: Intel Core2Duo @ 2.2 GHz running Ubuntu Linux 64-bit (server)
new app stdout:
real 3m31.543s
user 3m28.920s
sys 0m0.070s
regular PrimeGrid app stdout:
real 4m57.438s
user 4m54.470s
sys 0m0.110s
result file matches
____________
Sysadm@Nbg
my current lucky number: 3749*2^1555697+1
PSA-PRPNet-Stats-URL: http://u-g-f.de/PRPNet/
|
|
|
vasm Volunteer tester
 Send message
Joined: 6 Dec 08 Posts: 47 ID: 32604 Credit: 990,892 RAC: 0
                
|
Test computer: Intel C2D Mobile T7500 @ 2.2GHz
new app: 3m 31s
1.03 app: 4m 56s
results file matches |
|
|
Vato Volunteer tester
 Send message
Joined: 2 Feb 08 Posts: 785 ID: 18447 Credit: 262,879,636 RAC: 16,009
                     
|
I can't download the app:
Unfortunately, the link you have clicked is not available.
Reasons for this may include:
- Invalid link
- The file has been deleted because it was violating our Terms of service.
____________
|
|
|
blah Volunteer tester Send message
Joined: 27 Sep 08 Posts: 19 ID: 29724 Credit: 3,454,807 RAC: 0
         
|
Test computer: Intel C2Q Q6600 @ 3.15 GHz running ubuntu linux 64-bit
New App
real 2m26.771s
user 2m24.769s
sys 0m0.008s
1.03 App
real 3m25.230s
user 3m23.241s
sys 0m0.012s
result file matches |
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
I can't download the app:
Sorry, I changed the application location more than 60 minutes after creating this thread, so I was unable to edit the first post. See post 3:
http://www.megaupload.com/?d=OSXO7YME
Original version in post 1 has been removed.
Thanks to all who have tested the application so far; the results are looking very good. It's good to see AMD processors running faster, too. This app could be used as the official one if Rytis wants, as all Intel and AMD 64-bit processors have SSE2. This is a BOINC-ready application, built and statically linked with Intel Compiler 11.1.
Does anyone have a 64bit Pentium 4 to test? It would be interesting to see the speed increase.
____________
|
|
|
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
 Send message
Joined: 5 Feb 08 Posts: 1188 ID: 18646 Credit: 490,016,651 RAC: 5,051
                    
|
Does anyone have a 64bit Pentium 4 to test? It would be interesting to see the speed increase.
Not me, sorry.
Am I right that you used SIMD to run more work in parallel? Does this mean an increase in temperatures, because of more parallel work on the CPU?
____________
Sysadm@Nbg
my current lucky number: 3749*2^1555697+1
PSA-PRPNet-Stats-URL: http://u-g-f.de/PRPNet/
|
|
|
Vato Volunteer tester
 Send message
Joined: 2 Feb 08 Posts: 785 ID: 18447 Credit: 262,879,636 RAC: 16,009
                     
|
Does anyone have a 64bit Pentium 4 to test? It would be interesting to see the speed increase.
I have a 64bit Pentium D - I will try to test this next time I can get to it.
____________
|
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
Am I right that you used SIMD to run more work in parallel? Does this mean an increase in temperatures, because of more parallel work on the CPU?
More work in parallel, AND the 16 additional registers available to the application mean less spilling from registers to cache. The code changes were made in accessing the OKOK arrays in SITO.H, which sits in an innermost loop where the program spends considerable time.
I do not know about any increase in CPU temperature. Since SSE2 uses a part of the CPU that wasn't being used much before, it is possible. Previous applications only used compiler-generated SSE2 vectorization for setting up the OK arrays, which is only done 3 times per workunit. So really, there was no major speed increase until now, when SSE2 was manually applied to more important code that the compiler cannot auto-vectorize.
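To illustrate the kind of change described above, here is a minimal sketch of hand-written SSE2 in C. It is not the actual SITO.H code: the function names `and3_scalar` and `and3_sse2`, the array arguments, and the choice of a three-way bitwise AND are assumptions made purely for illustration. Only the intrinsics from `<emmintrin.h>` are real.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: one 64-bit AND chain per loop iteration. */
void and3_scalar(const uint64_t *a, const uint64_t *b, const uint64_t *c,
                 uint64_t *out, size_t n)
{
    assert(a && b && c && out);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] & b[i] & c[i];
}

/* SSE2 version (hypothetical): two 64-bit words per iteration,
   held together in one 128-bit XMM register. */
void and3_sse2(const uint64_t *a, const uint64_t *b, const uint64_t *c,
               uint64_t *out, size_t n)
{
    assert(a && b && c && out);
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        /* Unaligned loads, so the arrays need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        __m128i r  = _mm_and_si128(_mm_and_si128(va, vb), vc);
        _mm_storeu_si128((__m128i *)(out + i), r);
    }
    /* Scalar tail for an odd element count. */
    for (; i < n; i++)
        out[i] = a[i] & b[i] & c[i];
}
```

Both functions should produce identical output for the same input, which is the same property the `cmp` check on the results file verifies for the whole application.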
____________
|
|
|
Vato Volunteer tester
 Send message
Joined: 2 Feb 08 Posts: 785 ID: 18447 Credit: 262,879,636 RAC: 16,009
                     
|
Ok - here we go:
Intel(R) Pentium(R) D CPU 3.00GHz Ubuntu 9.04 amd64
ap26-x86_64sse2-linux_v2
344.70 user
0.05 system
5:47.29 elapsed
primegrid_ap26_1.03_x86_64-pc-linux-gnu
411.79 user
0.02 system
6:54.29 elapsed
Results matched
I'll run this on these when I can get to them unless you say you don't need them:
AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ @ 2.10GHz Ubuntu 8.04 amd64
AMD Athlon(tm) 64 X2 Dual Core Processor 3600+ @ 1.90GHz Ubuntu 9.04 amd64
Intel Pentium(R) Dual-Core CPU E5300 @ 2.60GHz Ubuntu 9.04 amd64
Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz Ubuntu 9.04 amd64
____________
|
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
Thanks for checking the Pentium D. Excellent results.
Don't really need the others to be tested, we have at least one of every chip architecture tested already. Thank you.
____________
|
|
|
Rytis Volunteer moderator Project administrator
 Send message
Joined: 22 Jun 05 Posts: 2649 ID: 1 Credit: 26,363,112 RAC: 695
                    
|
I'm making this version public. Thanks to everyone.
____________
|
|
|
|
All I can say is WOW!
Even on this ol' dinosaur - AMD Athlon(tm) 64 Processor 3000+ [Ubuntu 9.04] this makes for one heck of an improvement in run-time.
A BOINC managed w/u with 1.03 app averaged 922 seconds; with the 1.04 app the BOINC managed (and happily validated) w/u completed in 720 seconds.
____________
There's someone in our head but it's not us. |
|
|
Vato Volunteer tester
 Send message
Joined: 2 Feb 08 Posts: 785 ID: 18447 Credit: 262,879,636 RAC: 16,009
                     
|
Good stuff all round!
Does the 32bit version have any cpu checking that would allow use of SSE2 or SSE if the chip supported it?
Or is the app not structured that way? (or has this already been done?)
____________
|
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
Good stuff all round!
Yes, right now the Linux 64-bit app is about twice as fast as the Windows 64-bit app on Intel Core2 :)
Does the 32bit version have any cpu checking that would allow use of SSE2 or SSE if the chip supported it?
Or is the app not structured that way? (or has this already been done?)
I plan to work on a 32bit version, without dropping support for older, non-SSE processors.
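One possible way to keep older, non-SSE processors supported is to choose the code path at run time. The sketch below is only an assumption about how that could look, not the approach the PrimeGrid app takes: `and2_generic`, `and2_sse2`, and `select_and2` are hypothetical names, and the check relies on the GCC/Clang builtin `__builtin_cpu_supports`, so it would need a different mechanism under other compilers.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

typedef void (*and2_fn)(const uint64_t *a, const uint64_t *b,
                        uint64_t *out, size_t n);

/* Portable fallback for CPUs without SSE2. */
static void and2_generic(const uint64_t *a, const uint64_t *b,
                         uint64_t *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] & b[i];
}

/* SSE2 path. The target attribute lets this one function use SSE2
   even when the rest of the file is built for a plain i386 baseline. */
static __attribute__((target("sse2")))
void and2_sse2(const uint64_t *a, const uint64_t *b,
               uint64_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_and_si128(va, vb));
    }
    for (; i < n; i++)  /* scalar tail for an odd element count */
        out[i] = a[i] & b[i];
}

/* Pick the implementation once, based on what the CPU reports. */
and2_fn select_and2(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? and2_sse2 : and2_generic;
}
```

The caller would fetch the function pointer once at startup with `select_and2()` and use it inside the hot loop, so the CPU check is not repeated per call.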
____________
|
|
|
vasm Volunteer tester
 Send message
Joined: 6 Dec 08 Posts: 47 ID: 32604 Credit: 990,892 RAC: 0
                
|
Since this new app was introduced I've started getting some errored out workunits (roughly 1 every 20).
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGSEGV: segmentation violation
Is anyone else noticing something similar or is my laptop cracking up?
|
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
Any idea what temperatures the CPU is running?
just checked my 3 hosts and I have one wu that had a similar error.
____________
|
|
|
Vato Volunteer tester
 Send message
Joined: 2 Feb 08 Posts: 785 ID: 18447 Credit: 262,879,636 RAC: 16,009
                     
|
SEGV normally means a pointer into invalid memory space. It is often a programming error (e.g. a pointer to 0 or -1) but is sometimes caused by memory corruption due to temperature or similar.
____________
|
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
If this proves to be a problem, then a GCC-compiled app can be tested.
Since the SSE2 code is very simple (about 45 lines of code), I don't think that's the problem. I think the problem may be with the binary generated by the Intel Compiler. I have compiled with GCC 4.3.2; here are the results compared to PrimeGrid's GCC-compiled app and various others:
--------------------------------------------------------------
CPU: Intel Core2 Quad Q6600 @ 2.4 GHz
primegrid_ap26_1.00_x86_64-pc-linux-gnu : 5m 23s
Dec '08 source compiled with GCC 4.3.2: 6m 7s
March '09 source compiled with GCC 4.3.2: 4m 56s
SSE2 modified source, Intel compiler: 3m 13s
SSE2 modified source, GCC compiler: 3m 25s
---------------------------------------------------------------
CPU: AMD Athlon64 6000+ @ 3.0 GHz
primegrid_ap26_1.00_x86_64-pc-linux-gnu : 3m 43s
SSE2 modified source, Intel compiler: 3m 23s
SSE2 modified source, GCC compiler: 3m 38s
---------------------------------------------------------------
It appears that the Intel Compiler increases speed a little over GCC on both AMD and Intel CPUs with the SSE2 code. Also, Intel CPUs gain a huge advantage from SSE2 with both compilers, while this AMD processor is just a few seconds faster.
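To put numbers on that comparison, the timings can be converted to seconds and divided. This is just a small arithmetic sketch; the helper names `to_seconds` and `speedup` are made up for illustration.

```c
#include <assert.h>

/* Convert a wall-clock time given as "Xm Ys" into seconds. */
double to_seconds(int minutes, double seconds)
{
    assert(minutes >= 0 && seconds >= 0.0);
    return minutes * 60.0 + seconds;
}

/* Speedup factor: how many times faster the new run is than the old one. */
double speedup(double old_seconds, double new_seconds)
{
    assert(new_seconds > 0.0);
    return old_seconds / new_seconds;
}
```

For the Q6600, comparing the March '09 source with the SSE2 source (both built with GCC) gives `speedup(to_seconds(4, 56), to_seconds(3, 25))`, which is 296/205, or roughly 1.44. For the Athlon64, comparing the PrimeGrid 1.00 app with the SSE2 GCC build gives 223/218, or roughly 1.02.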
Comparing the ldd of the GCC-SSE2 app and PrimeGrid app:
ldd primegrid_ap26_1.00_x86_64-pc-linux-gnu
linux-vdso.so.1 => (0x00007fffcc1fe000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003692e00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003684400000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003691200000)
libc.so.6 => /lib64/libc.so.6 (0x0000003683800000)
libm.so.6 => /lib64/libm.so.6 (0x0000003683c00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003683400000)
ldd ap26-gcc-sse2-x86_64
linux-vdso.so.1 => (0x00007fff50dfe000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003684400000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003692e00000)
libm.so.6 => /lib64/libm.so.6 (0x0000003683c00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003691200000)
libc.so.6 => /lib64/libc.so.6 (0x0000003683800000)
/lib64/ld-linux-x86-64.so.2 (0x0000003683400000)
The library dependencies match, so using the dynamically linked GCC app should be OK, it seems.
____________
|
|
|
vasm Volunteer tester
 Send message
Joined: 6 Dec 08 Posts: 47 ID: 32604 Credit: 990,892 RAC: 0
                
|
Any idea what temperatures the CPU is running?
just checked my 3 hosts and I have one wu that had a similar error.
The CPU was running at about 72C (quite hot, but during this summer I've seen it peak at 77-78C doing LLR and it didn't give any errors).
I set it single-core and let it run for 24 hrs (at 63-64C) and there were no errors.
I guess I'll give the fan a good clean-up, re-enable both cores and see if the errors return or not. |
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
Ok, hopefully that will fix the errors you were having.
I've been monitoring my 3 hosts running the app and they are running fine so far. I also browsed around the top hosts list and there seem to be no errors on those machines, either.
____________
|
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
I've noticed a trend in the hosts having random SIGSEGV errors. All the ones I have seen, including mine, are running a Linux kernel 2.6.28 (most likely Ubuntu).
Saw this post on another BOINC forum:
Hello!
I do have "some" problems with my Linux 64-bit host. All seem to end in a SIGSEGV, error code 193, on this host!
When looking further into some of the WUs, others have problems too; could it be some error in the work units?
Posted by Crunch3r:
No, it's not the fault of the WU. The cause of this is Ubuntu 9.04 and its kernel.
It's messed up.
The same behavior has been seen over at S@H, running optimized apps...
This isn't a very technical explanation, but hopefully it will help anyone having problems.
____________
|
|
|
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
 Send message
Joined: 5 Feb 08 Posts: 1188 ID: 18646 Credit: 490,016,651 RAC: 5,051
                    
|
I am running Ubuntu 64-bit Kernel 2.6.28-15-server and have not seen many SIGSEGV errors.
Can you specify the faulty kernel version more exactly?
____________
Sysadm@Nbg
my current lucky number: 3749*2^1555697+1
PSA-PRPNet-Stats-URL: http://u-g-f.de/PRPNet/
|
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
I am running Ubuntu 64-bit Kernel 2.6.28-15-server and have not seen many SIGSEGV errors.
Can you specify the faulty kernel version more exactly?
My Q6600 running Ubuntu reads "2.6.28-11-generic", and every host I've seen with SIGSEGV errors has the same kernel. The Q6600 development box running Fedora 10 has not had the problem. My AMD running Ubuntu 2.6.28-11-generic has not had a problem. So it's all speculation, really. Maybe it only happens with Intel processors on that kernel.
____________
|
|
|
Vato Volunteer tester
 Send message
Joined: 2 Feb 08 Posts: 785 ID: 18447 Credit: 262,879,636 RAC: 16,009
                     
|
Try updating the kernel to the current version, because 2.6.28-11 is old and 2.6.28-15 is current.
I have no issues with the Intel/Ubuntu combination for PrimeGrid.
update-manager is your friend!
____________
|
|
|
|
I had a few segmentation violation errors with the Intel / Ubuntu 2.6.28-15-generic combination out of hundreds of tasks.
A few older ones are already purged.
____________
|
|
|
|
I've noticed a trend in the hosts having random SIGSEGV errors. All the ones I have seen, including mine, are running a Linux kernel 2.6.28 (most likely Ubuntu).
I've had a few, here's one http://www.primegrid.com/result.php?resultid=126711310
Host is running Ubuntu 9.04 64 (2.6.28-15-generic) on a T5800 C2D.
Not too worried, as the WUs are processed so quickly now (thanks Bryan) that one error once in a while is not a problem.
Pete.
____________
35 x 2^3587843+1 is prime! |
|
|
mfl0p Project administrator Volunteer developer Send message
Joined: 5 Apr 09 Posts: 224 ID: 38042 Credit: 860,116,790 RAC: 9,599
                      
|
I switched a Q6600 from Ubuntu to Fedora 12 alpha (2.6.31-0.125.4.2.rc5.git2.fc12.x86_64) and have not had an error yet. So something strange is going on with the Ubuntu 2.6.28 kernel. But it's not a big deal; it was only 4 errors out of hundreds of workunits.
____________
|
|
|