PrimeGrid

Message boards : Number crunching : Alternative Platforms

Redstar3894 (Project donor)
Joined: 23 Mar 07
Posts: 32
ID: 6678
Credit: 866,812
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,064); Cullen LLR Bronze: Earned 10,000 credits (11,089); PPS LLR Bronze: Earned 10,000 credits (10,066); PSP LLR Bronze: Earned 10,000 credits (10,478); SoB LLR Bronze: Earned 10,000 credits (65,077); SGS LLR Bronze: Earned 10,000 credits (10,970); TRP LLR Bronze: Earned 10,000 credits (11,428); Woodall LLR Bronze: Earned 10,000 credits (10,531); 321 Sieve Bronze: Earned 10,000 credits (30,910); Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (20,339); PPS Sieve Silver: Earned 100,000 credits (136,406); Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (22,850); TRP Sieve (suspended) Bronze: Earned 10,000 credits (43,120); AP 26/27 Silver: Earned 100,000 credits (103,767); PSA Silver: Earned 100,000 credits (368,620)
Message 23866 - Posted: 18 May 2010 | 17:40:49 UTC

Hi All,

Just looking for some feedback on what my options are here....


Is there any variant of LLR or sieve that will compile/run on PPC/Linux?

I have a 1.6 GHz PowerPC G5 running Gentoo, and right now I'm primarily using it as my server, but I would like to see if there is the possibility of running PG and/or PRPNet on it...otherwise my only other option is SETI@Home and I would MUCH rather contribute to PG... :)

I've managed to get the PRPNet client software to compile and execute successfully, but I was unable to get any implementation of LLR or the ppsieve test to compile...probably because it is x86 specific and I don't know anywhere near enough C to make it work on PPC...

Also, I was unable to find the source for Genefer or PFGW. I would be willing to try to create PPC Linux builds for those programs in order to run PRPNet if the source was available somewhere....in hindsight, it probably is available, I'm just not looking in the right places... ;)


So, is this just a pipe dream, or would it be feasible/practical to have a non-x86 (PPC) LLR or sieve implementation? I'm fairly certain (based on the applications list) that an OS X/PPC sieve (probably sr2sieve) implementation does exist, so that should at least be relatively easy to port to Linux (correct me if I'm wrong).

Again, I apologize for this, but I don't know nearly enough C to make this work on my own, so I would GREATLY appreciate any help that could be thrown my way, and if not, I understand, I know all the devs are rather busy... :)


Sorry if that was a little long, just trying to get all my thoughts in here...Thanks! :)

rogue
Volunteer developer
Joined: 8 Sep 07
Posts: 1196
ID: 12001
Credit: 18,565,548
RAC: 0
PPS LLR Bronze: Earned 10,000 credits (31,229); PSA Jade: Earned 10,000,000 credits (18,533,435)
Message 23869 - Posted: 18 May 2010 | 17:50:52 UTC - in response to Message 23866.


Welcome to the project!

Get phrot source from here: http://home.roadrunner.com/~mrodenkirch/home/Phrot.html
Get genefer source from here: http://home.roadrunner.com/~mrodenkirch/home/Genefer.html

Both can be built and run on PPC. I run both on MacPPC (since I maintain them), so LinuxPPC shouldn't be difficult to build on. PFGW cannot be built on PPC (it's x86 only), so don't worry about it. If you need help building either, let me know.

Redstar3894
Message 23870 - Posted: 18 May 2010 | 18:00:14 UTC - in response to Message 23869.


Wow, thanks for the quick response!


I have some time today, so I'll take a crack at building genefer and phrot....I'll post the results a little later...

Thanks again!

Redstar3894
Message 23872 - Posted: 18 May 2010 | 19:27:38 UTC - in response to Message 23870.

In an attempt to troubleshoot myself, I'm recompiling GCC after installing FFTW....so I'll try again later today after that's done...

Rogue, I sent you a PM with the details of the various errors I encountered during my attempts to compile Phrot and Genefer

I'll post any updates to this thread if I have some time later today

Redstar3894
Message 23895 - Posted: 19 May 2010 | 19:09:17 UTC

Okay, after a lot of trial and error and quite a bit of assistance from rogue, I have succeeded in compiling Phrot for 64-bit Linux/PPC....It seems to be working well as of this writing.... :)

I'll take a crack at genefer next, but just so I'm clear, there is not any implementation of LLR for RISC architectures worth mentioning, correct?

And does anyone know if a sieve application exists for PowerPC?
AFAIK, I would imagine that it would likely be very efficient due to the number of registers on the PPC and other RISC processors....but I could be totally misinterpreting the concept...

rogue
Message 23896 - Posted: 19 May 2010 | 19:33:16 UTC - in response to Message 23895.

Make sure you run some of the provided tests with phrot to verify that it is working correctly.

Believe it or not, outside of base 2, phrot on PPC compares fairly well to LLR on x86 (at the same clock rate). Jean is working on an LLR implementation that is built on FFTW, but I would be surprised if it is anywhere near as fast as phrot. The only real advantage it would have is the primality testing that phrot does not do.

Sieve applications do exist for PPC. Look here: http://sites.google.com/site/geoffreywalterreynolds/programs/. These can all be built on PPC. RISC is faster for some things that require fused multiply-add instructions in the FPU, but the extra registers don't help as much as you would expect, due to the number of cycles needed for some instructions and the expense of converting between FPU and INT.

Redstar3894
Message 23897 - Posted: 19 May 2010 | 20:11:48 UTC - in response to Message 23896.


Makes sense, the floating-point advantage of RISC would be essentially negated when converting to INT because of the extra cycles required (I'm only an amateur, so forgive me if my interpretation of this is way off)

LLR using FFTW...that would be interesting...any idea on the expected time frame of such a release?

I'll have to check out building the PPC sieve applications next, after I'm done messing around with phrot and genefer....


I did run the tests included with phrot and they completed successfully and matched the baseline results :)

I only had one question, in both the readme:
http://pgllr.mine.nu/software/phrot/readme_phrot.txt
and in the mersenneforum thread:
http://www.mersenneforum.org/showpost.php?p=130777&postcount=11
You mention the use of -DUNROLLED_MR in the makefile and setting UNROLLED_MR in phrot.c. Would this produce any measurable benefit on the PPC if set? I'm also unsure what value it should be set to...

Thanks!

Redstar3894
Message 23898 - Posted: 19 May 2010 | 20:27:01 UTC - in response to Message 23896.
Last modified: 19 May 2010 | 20:36:45 UTC


I actually just ran a comparison of the two....
cllr.exe -d -q"7843*2^134274+1"
Starting Proth prime test of 7843*2^134274+1
Using all-complex FFT length 12K, a = 3
7843*2^134274+1 is not prime.  Proth RES64: B72949BC5FFC727A  Time : 80.173 sec.


^^^That was on my i7-920 Win7 64-bit^^^


7843*2^134274+1 is composite LLR64=b72949bc5ffc727a. (e=0.03516 (0.0587399~3.90193e-16@0.000) t=108.05s)
[2010-05-19 20:17:05 GMT] PPSE10k: 7843*2^134274+1 is not prime. Residue b72949bc5ffc727a


^^^ and that was the G5 1.8 GHz with kernel 2.6.32^^^


So considering that the i7 is clocked almost 1 GHz faster, with a difference of only ~30sec I feel like the G5 held its own...


EDIT: Crud, just realized this was base 2... :p I'll have to try another test later with a different number....

rogue
Message 23899 - Posted: 19 May 2010 | 20:51:53 UTC - in response to Message 23898.

The UNROLLED_MR setting unrolls the main loops to gain performance on CPUs with many registers. PPC is one such CPU. This is in phrot.c:

#if defined(__ppc__)
#define UNROLLED_MR
#endif

So if __ppc__ is defined on your box, it will take advantage of it.
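A quick way to see whether __ppc__ is actually defined on a given box is to compile a tiny file that reports which PowerPC-identifying macros the compiler predefines. This is a sketch under assumptions: __ppc__ and __ppc64__ are the names Apple's toolchain has traditionally used, while GCC on Linux is generally documented to define __powerpc__ and __powerpc64__ (whether it also defines __ppc__ may depend on the compiler build and version). The function names are hypothetical; unrolled_mr_would_be_on() simply mirrors the phrot.c guard quoted above. GCC can also dump every predefined macro with gcc -dM -E - < /dev/null.

```c
#include <stdio.h>

/* Mirrors the phrot.c guard: UNROLLED_MR is only defined when __ppc__ is. */
#if defined(__ppc__)
#define UNROLLED_MR_WOULD_BE_ON 1
#else
#define UNROLLED_MR_WOULD_BE_ON 0
#endif

/* Returns 1 if phrot.c's "#if defined(__ppc__)" check would fire here. */
int unrolled_mr_would_be_on(void)
{
    return UNROLLED_MR_WOULD_BE_ON;
}

/* Prints which PowerPC-identifying macros this compiler predefines and
   returns how many were found. The list of names is an assumption;
   different compilers use different spellings. */
int report_ppc_macros(void)
{
    int count = 0;
#ifdef __ppc__
    puts("__ppc__ is defined");
    count++;
#endif
#ifdef __ppc64__
    puts("__ppc64__ is defined");
    count++;
#endif
#ifdef __powerpc__
    puts("__powerpc__ is defined");
    count++;
#endif
#ifdef __powerpc64__
    puts("__powerpc64__ is defined");
    count++;
#endif
    printf("%d PowerPC macro(s) found; UNROLLED_MR would be %s\n",
           count, unrolled_mr_would_be_on() ? "on" : "off");
    return count;
}
```

If this reports __powerpc__ but not __ppc__, the phrot.c check above would leave UNROLLED_MR off even on PPC hardware.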

The FPU to INT conversion is a problem with both FFTs and sieves, so those programs try to do as much math work as possible without doing a conversion. IIRC the conversion between the two is much more expensive on PPC than x86.
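To make the sieve side of this concrete, here is a sketch of a floating-point-assisted modular multiply of the general kind sieve programs use. It is a generic technique, not code taken from any of the programs mentioned in this thread. Each call converts its operands to double, estimates the quotient in the FPU, and converts that estimate back to an integer, so the INT/FPU round trips described above sit right in the inner loop. mulmod_fpu is a hypothetical name, and the sketch assumes p < 2^50 and 0 <= a, b < p, which keeps the quotient estimate within one of the true value.

```c
#include <stdint.h>

/* Hypothetical helper: (a * b) mod p using a floating-point quotient estimate.
   Assumes p < 2^50, a < p, b < p, and inv_p == 1.0 / (double)p. */
uint64_t mulmod_fpu(uint64_t a, uint64_t b, uint64_t p, double inv_p)
{
    /* INT -> FPU: a and b are exact as doubles because they are < 2^53. */
    double estimate = (double)a * (double)b * inv_p;

    /* FPU -> INT: truncate; with the bounds above this may be off by one. */
    uint64_t q = (uint64_t)estimate;

    /* Remainder candidate, computed exactly modulo 2^64 (unsigned wrap). */
    uint64_t r = a * b - q * p;

    /* Correct the off-by-one: a "negative" r appears as a huge unsigned value. */
    if (r >= (UINT64_C(1) << 63))
        r += p;
    else if (r >= p)
        r -= p;
    return r;
}
```

In a sieve, inv_p would be computed once per prime and reused, so the division stays out of the inner loop while the int/double conversions do not.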

LLR 3.8.1 might be on par with phrot, since it uses a special modular reduction that George added specifically for LLR and PFGW. I haven't compared them since LLR 3.8 came out. Even if phrot is now slower than LLR for other bases, I think it is fairly impressive, since it has almost no asm in it.

vasm (Project donor)
Volunteer tester
Joined: 6 Dec 08
Posts: 47
ID: 32604
Credit: 990,892
RAC: 0
321 LLR Bronze: Earned 10,000 credits (10,190); Cullen LLR Bronze: Earned 10,000 credits (23,978); PPS LLR Bronze: Earned 10,000 credits (50,765); PSP LLR Bronze: Earned 10,000 credits (11,013); SoB LLR Bronze: Earned 10,000 credits (16,508); SR5 LLR Bronze: Earned 10,000 credits (10,703); SGS LLR Bronze: Earned 10,000 credits (10,005); TRP LLR Bronze: Earned 10,000 credits (10,303); Woodall LLR Bronze: Earned 10,000 credits (11,537); 321 Sieve Bronze: Earned 10,000 credits (21,060); Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (22,668); PPS Sieve Silver: Earned 100,000 credits (260,686); Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (40,185); TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,280); AP 26/27 Silver: Earned 100,000 credits (149,718); GFN Bronze: Earned 10,000 credits (45,568); PSA Silver: Earned 100,000 credits (271,875)
Message 23900 - Posted: 19 May 2010 | 21:25:01 UTC - in response to Message 23898.

Was any other process hindering the i7 test? For that number it looks too slow. My Win7 x64 2.66GHz Core2 Q9450 gave:

cllr.exe -d -q"7843*2^134274+1"
Starting Proth prime test of 7843*2^134274+1
Using all-complex FFT length 12K, a = 3
7843*2^134274+1 is not prime.  Proth RES64: B72949BC5FFC727A  Time : 29.368 sec.


I would expect a 2.66GHz i7-920 to be even quicker.

Redstar3894
Message 23904 - Posted: 20 May 2010 | 7:54:48 UTC - in response to Message 23900.


Correct, that was my mistake again....the original test was done with the i7 running at 100%, courtesy of BOINC....

Here's a 're-do' at idle:

cllr.exe -d -q"7843*2^134274+1"
Starting Proth prime test of 7843*2^134274+1
Using all-complex FFT length 12K, a = 3
7843*2^134274+1 is not prime.  Proth RES64: B72949BC5FFC727A  Time : 22.393 sec.
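Since the comparison earlier in the thread was framed "at the same clock rate", one rough way to compare the two machines is to scale each wall-clock time by the clock speed, giving an approximate cycle count per test. This sketch assumes a constant 2.66 GHz for the i7-920 (its stock clock, ignoring Turbo Boost and any overclock) and takes the G5 clock as a parameter, because the thread gives both 1.6 GHz and 1.8 GHz for it. It is a single base-2 test, so it says little about other bases. gigacycles and cycle_ratio are hypothetical helper names.

```c
/* Approximate cycles spent on one test, in units of 10^9 cycles.
   Assumes the CPU ran at a constant clock for the whole test. */
double gigacycles(double seconds, double clock_ghz)
{
    return seconds * clock_ghz;
}

/* How many times more cycles machine B needed than machine A
   for the same test. */
double cycle_ratio(double sec_a, double ghz_a, double sec_b, double ghz_b)
{
    return gigacycles(sec_b, ghz_b) / gigacycles(sec_a, ghz_a);
}
```

With the idle i7 time above and the 108.05 s G5 run posted earlier, cycle_ratio(22.393, 2.66, 108.05, 1.8) comes out at roughly 3.3 (about 2.9 if the G5 runs at 1.6 GHz).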


Redstar3894
Message 23905 - Posted: 20 May 2010 | 8:25:46 UTC - in response to Message 23899.
Last modified: 20 May 2010 | 8:40:44 UTC

the UNROLLED_MR is a setting that will unroll the main loops to gain performance on CPUs with many registers. PPC is one such CPU. This is in phrot.c

#if defined(__ppc__)
#define UNROLLED_MR
#endif

so if __ppc__ is defined on your box, then it will take advantage of it.

I'll have to play around with this...because even with the modifications I made to the makefile, UNROLLED_MR should still have been defined....but wouldn't '-funroll-loops' via invoking '-O3' in gcc do essentially the same thing?
(Again, forgive me if I'm way off here)

Because that would be great (obviously) if there were some additional optimizations I could use on phrot....I would like to stretch this G5 as far and fast as I possibly can....

The FPU to INT conversion is a problem with both FFTs and sieves, so those programs try to do as much math work as possible without doing a conversion. IIRC the conversion between the two is much more expensive on PPC than x86.

Oh well, I guess that's just the price I pay for keeping my G5 around... ;)

LLR 3.8.1 might be on par with phrot since it is using special modular reduction which George added specifically for LLR and PFGW. I haven't compared them since LLR 3.8 came out. Even if phrot is now slower than LLR for other bases, I think that it is fairly impressive since it has almost no asm in it.

With what little knowledge of this area I have, I agree: the fact that phrot still compares very favorably to LLR in many situations/bases, without much asm to speak of, is very impressive....IMHO everyone involved in producing and maintaining phrot definitely deserves a round of applause for that! :)

rogue
Message 23912 - Posted: 20 May 2010 | 12:36:44 UTC - in response to Message 23905.

There is a BIG difference between -funroll-loops and UNROLLED_MR. -funroll-loops is a compiler optimization that can take relatively simple loops (a few lines of C) and unroll them. UNROLLED_MR is an optimization coded by hand in the source that takes some fairly complex loops and unrolls them.
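As a rough illustration of the difference (this is not phrot's code, which isn't shown in the thread), hand unrolling usually means restructuring the loop yourself, for example splitting one accumulator into several independent ones so a register-rich CPU can keep more operations in flight. A compiler's -funroll-loops mostly replicates the body, and for floating-point code it generally cannot split accumulators this way without reassociation flags. Note also that GCC's -O3 does not turn on -funroll-loops by itself; it has to be requested explicitly. The function names are hypothetical, and integer arithmetic is used so both versions give bit-identical results.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward loop: one accumulator, so every add waits on the last. */
uint64_t dot_rolled(const uint64_t *x, const uint64_t *y, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Hand-unrolled by 4 with independent accumulators: more live registers,
   shorter dependency chains. Arithmetic wraps mod 2^64, so the result is
   identical to dot_rolled. */
uint64_t dot_unrolled4(const uint64_t *x, const uint64_t *y, size_t n)
{
    uint64_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i]     * y[i];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        acc0 += x[i] * y[i];
    return (acc0 + acc1) + (acc2 + acc3);
}
```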

Redstar3894
Message 23916 - Posted: 20 May 2010 | 16:34:43 UTC - in response to Message 23912.

I had a feeling it was something along those lines....I'll have to take a look to see if that was explicitly defined when I built phrot....
(Again, I apologize for my relative ignorance when it comes to these things...)



In other news, I have succeeded in building genefer! :)

I took your suggestion and added a new #elif with a CPU_TARGET for linux/ppc/ppc64, and that did the trick :)

I can provide a diff with my modifications if you like, or just post them here....

Also, when building genefer, I had to use the '-pipe' CFLAG because I kept getting error messages about temporary files....not sure if that was a general error or something I was doing incorrectly, but it works now. I've checked it against several sources and it appears to be running correctly :)

If you had any recommendations about optimizing genefer, I would greatly appreciate them, what I ended up using is
'-O3 -pipe -mcpu=970 -mtune=970 -maltivec -Wl,-lm -O2'
(where the last two are passed to the linker) In hindsight, I probably didn't need to use '-lm' since it's not linking external libraries, but it seemed to work okay....

And PRPNet has been humming along rather smoothly for the last few hours, it's not the quickest in the world (at least compared to the i7 that I'm used to), but it gets the job done... :)


So I just wanted to especially thank rogue for his support and patience in helping me figure this out! I definitely couldn't have done it without your help, Thank You! :)

Next I guess I'll try re-working some sieve applications to see if they can be built on PPC/Linux

But that will probably have to wait until I have some more free time... ;)

Redstar3894
Message 23918 - Posted: 20 May 2010 | 17:24:25 UTC - in response to Message 23912.
Last modified: 20 May 2010 | 17:35:00 UTC


I went in and added another #ifdef to phrot.c, specifically ppc64:
#if defined(__ppc__) || defined(__ppc64__)
#define UNROLLED_MR
#endif


I think the results, compared to my earlier run, speak for themselves:
AzureDragon phrot # ./phrot.g5 -d -q"7843*2^134274+1"
Phil Carmody's Phrot (0.72)
Input 7843*2^134274+1 : Actually testing 128499712*1048576^6713+1 (witness=3 6715/14336 limbs)
7843*2^134274+1 [-223680,289213,462807,360724] is composite LLR64=b72949bc5ffc727a. (e=0.03516 (0.0587399~3.90193e-16@0.000) t=91.61s)


With my earlier test, when UNROLLED_MR was probably NOT defined, I got these results:
7843*2^134274+1 is composite LLR64=b72949bc5ffc727a. (e=0.03516 (0.0587399~3.90193e-16@0.000) t=108.05s)
[2010-05-19 20:17:05 GMT] PPSE10k: 7843*2^134274+1 is not prime. Residue b72949bc5ffc727a


It really is amazing how that makes such a difference....I'll check to see if I can tweak it a little more and will update as time permits

rogue
Message 23923 - Posted: 20 May 2010 | 20:56:16 UTC - in response to Message 23918.

Did you run genefer with the switch that verifies residues? I suggest that you read this thread regarding genefer compiler options: http://www.primegrid.com/forum_thread.php?id=1800. -O3 causes problems. I haven't looked into it yet. -ffast-math should work, but it doesn't. I really need to spend some time on it. I should be able to do that Memorial Day weekend.
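Because -ffast-math is easy to pick up by accident from a global CFLAGS setting, one defensive option is to have the program report it. GCC and Clang predefine __FAST_MATH__ when -ffast-math is in effect; note that the check only sees the flags used for the file it is compiled in. The sketch below only warns, and both function names are hypothetical; a project that knows it cannot tolerate the flag could use #error instead.

```c
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with -ffast-math
   (GCC and Clang predefine __FAST_MATH__ in that case), else 0. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Prints a warning to stderr if fast-math was enabled; returns the flag. */
int warn_if_fast_math(const char *program)
{
    int fast = built_with_fast_math();
    if (fast)
        fprintf(stderr,
                "%s: built with -ffast-math; verify residues before trusting results\n",
                program);
    return fast;
}
```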

Redstar3894
Message 23932 - Posted: 21 May 2010 | 17:10:48 UTC - in response to Message 23923.
Last modified: 21 May 2010 | 17:12:31 UTC

I discovered this problem later (though I caught it before I was sent any GFN workunits)... It's a shame, because -O3 and -ffast-math make it SO much faster....

I should have some time to kill this weekend, so maybe I'll try building genefer while selectively enabling the subflags of -O3 and -ffast-math. From what I read on that thread, it seems like one or two of those flags are the culprit, probably -fassociative-math and -funsafe-math-optimizations, which would need to be disabled... at least on the PS3 (and yet to be determined on the G5).

From the GCC Manual:
-fassociative-math
Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on a code which relies on rounding behavior like (x + 2**52) - 2**52). May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required. This option requires that both -fno-signed-zeros and -fno-trapping-math be in effect. Moreover, it doesn't make much sense with -frounding-math.

-funsafe-math-optimizations
Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations.
This option is not turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.
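The manual's (x + 2**52) - 2**52 example is worth spelling out, because FFT-based testers typically recover integer convolution outputs by rounding floating-point values, and that kind of rounding is exactly what re-association can silently break. The Python sketch below evaluates both the expression as written and the regrouped form -fassociative-math is allowed to produce; Python does no such rewriting, so both behaviours can be seen side by side. Whether genefer uses this exact idiom is an assumption, not something established in this thread.

```python
C = 2.0 ** 52  # constant from the GCC manual's (x + 2**52) - 2**52 example


def round_as_written(x: float) -> float:
    """Evaluate (x + C) - C exactly as written, with IEEE round-to-nearest.

    For 0 <= x < 2**51 the sum lands where adjacent doubles are 1.0 apart,
    so the addition itself rounds x to the nearest integer (ties to even).
    """
    return (x + C) - C


def round_reassociated(x: float) -> float:
    """The regrouping -fassociative-math permits: x + (C - C), which is just x."""
    return x + (C - C)


print(round_as_written(3.7), round_reassociated(3.7))  # 4.0 3.7
print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))  # 0.6000000000000001 0.6
```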


I would try taking a look through the code itself, but as I said before, my knowledge of C is rudimentary at best, so maybe if I'm feeling adventurous later...I'll post the results here if I find some problems/answers... ;)

On the other hand, as you said in the other thread http://www.primegrid.com/forum_thread.php?id=1800&nowrap=true#22784, phrot seems to be running VERY well when compiled with -O3 (though not with -ffast-math)... I'm getting ~94s per PPSE10k unit, ~100s per PPSE11k unit, and ~2100s per PPSE unit...

I was also able to shave about 2s off each PPSE10k/11k WU by compiling phrot with -floop-block and -ftree-loop-distribution (I'm using GCC 4.4.3), though I'm not sure whether that is just chance or whether the Graphite/PPL loop optimizations actually benefit the code....


I didn't launch this effort with any major expectations to speak of, so I'm very satisfied with the results and if I were able to get genefer fully optimized, that would just be the icing on the cake... :)


Rogue, seriously, thank you so much for all your help and patience with me here, I really appreciate it!
____________

Redstar3894 (Project donor)
Message 24094 - Posted: 3 Jun 2010 | 20:31:35 UTC
Last modified: 3 Jun 2010 | 20:35:15 UTC

Okay, sorry it's been a while, though I doubt anyone was holding their breath... ;) I got a new Nexus One and I've been enjoying all the Android splendor so I've been otherwise occupied for the last week or so with that to distract me from PG...


Here's an update on how PRPNet has been going on my G5 (single processor 1.8 GHz w/1.2G RAM running Gentoo with kernel 2.6.32) :


I've been able to get phrot tuned pretty well for PPSE10k and 11k: 10k work units take ~94 sec and 11k units take ~105 sec.

PPSE n>450K work units, on the other hand, take ~2000 sec on average...
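To put those timings side by side, here is a quick throughput calculation in Python using the approximate per-unit times above. It assumes the single-core G5 runs one unit at a time with no idle time, so treat the results as upper bounds.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def units_per_day(seconds_per_unit: float) -> float:
    """Work units one core can finish per day at a given time per unit."""
    return SECONDS_PER_DAY / seconds_per_unit


# Approximate times reported above for the 1.8 GHz G5 (one unit at a time).
reported = {"PPSE 10k": 94, "PPSE 11k": 105, "PPSE n>450K": 2000}
for name, secs in reported.items():
    print(f"{name}: ~{units_per_day(secs):.0f} units/day")
# PPSE 10k: ~919 units/day
# PPSE 11k: ~823 units/day
# PPSE n>450K: ~43 units/day
```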

ESP works, but it takes several hours (I haven't kept track of exactly how long, but I believe it's around 6-8)...so I have that set to a low % in prpclient.ini...

Same for SGS, though I ended up commenting that line out because it was taking about 10-12 hours/WU to complete

27121 and GCW13 are also working, but I never completed a WU from either of those projects; I aborted both tasks after about 24 hours (I tried 27121 first, then GCW) because they were less than half complete.

But then, after I got my Nexus One and was otherwise occupied, I left the new 121 project crunching, and it finished the first WU after about 36 hours... so I would assume that 27121 and GCW would take about the same amount of time (give or take 2 hours). Still, the credit for 121 seems fairly high for one work unit... I would be interested in finding out roughly how long it's taking others to finish a WU on x86....

So I'm not sure whether the extremely long run times on the larger numbers are due to my hardware limitations (probably) or just the way I optimized phrot... I'll have to play around with optimizations a little more and see what GCC can come up with. My problem is that I'm not willing to wait long enough (damn ADD rears its ugly head) for the longer test units to finish, so most of my tests were on the (relatively) smaller PPSE10k and 11k WUs... but now I have the new phone to distract me while these tests run, so I should be good ;)


Genefer seems to be working very well on GFN65536, but on GFN32768 it detects a roundoff error almost immediately and finishes the test with phrot (similar to what happens on x86 Windows, except that the x86 Windows client can then usually finish the test with the 80-bit version of genefer instead).
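The fallback order described here can be modelled as a short chain. This is a hypothetical Python sketch of the behaviour as described in this thread (genefer first, then the 80-bit genefer80 on x86 only, then phrot), not PRPNet's actual client code; the platform strings and program names are just labels.

```python
def fallback_chain(platform: str) -> list[str]:
    """Programs tried in order for a GFN test (hypothetical model of this thread)."""
    chain = ["genefer"]
    if platform == "x86":
        chain.append("genefer80")  # x87 80-bit build, only available on x86
    chain.append("phrot")
    return chain


def finishing_program(platform: str, roundoff_in: set[str]) -> str:
    """First program in the chain that does not report a round-off error."""
    for program in fallback_chain(platform):
        if program not in roundoff_in:
            return program
    raise RuntimeError("every program in the chain hit a round-off error")


print(finishing_program("ppc", {"genefer"}))  # phrot
print(finishing_program("x86", {"genefer"}))  # genefer80
```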


Mark: regarding your post above, I was able to compile genefer with "-O3", but it only seemed to work correctly (producing correct residues) with the following CFLAG added:

-ftree-loop-distribution
Again, I'm not quite sure why this is the case, but it is working correctly when compiled with that CFLAG in addition to the others:
-O3 -mcpu=970 -mtune=970 -maltivec -pipe -Wl,-lm -O2
It does provide a slight but noticeable performance benefit (at least in the built in benchmarks) when running Genefer


I should have some free time either tomorrow night or a couple days next week, during which I should be able to start working on test packages of PRPNet for LinuxPPC (32- and 64-bit), which, once completed, could hopefully be included on the PRPNet download page! :)
____________

rogue
Volunteer developer
Message 24096 - Posted: 3 Jun 2010 | 22:22:34 UTC - in response to Message 24094.

Some of the projects have workunits that take a long time on PPC. Workunits on GCW shouldn't take quite as long now that the base 13 tests are done. They will probably vary from five to ten hours, possibly longer, on your computer. I would recommend against SGS because the large k will make phrot inefficient. 27121, PPSE, and GFN65536 are also good choices, even though 27121 has longer workunits.

As for GFN32768, forget about it. The bases are too large for genefer and will trigger round-off errors. Since genefer80 can still handle the larger bases, I would avoid it until the bases get too large even for genefer80. The residues for genefer and phrot/llr/pfgw are not compatible, so unless phrot were to find a PRP, the test would be wasted (unless the server is not configured to do double-checks).
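A back-of-envelope way to see why larger bases (and longer transforms) run into round-off trouble, and why the 80-bit genefer80 gets further: with balanced digits |d| <= b/2, each output of a length-2^n convolution is at most 2^n * (b/2)^2 in magnitude, and that has to stay inside the floating-point significand (53 bits for doubles, 64 for the x87 80-bit format) for rounding to recover the exact integer. The Python sketch below solves that bound for b. It assumes the transform length equals the GFN exponent and uses a user-chosen `safety_bits` margin; real programs use much less pessimistic error estimates plus run-time round-off checks, so these numbers are not genefer's actual limits.

```python
def max_base_estimate(log2_len: int, significand_bits: int, safety_bits: float = 0.0) -> float:
    """Rough worst-case limit on the base b for a length-2**log2_len FFT squaring.

    Assumes balanced digits |d| <= b/2, so every convolution output is bounded by
    2**log2_len * (b/2)**2, and requires that bound to fit in the significand
    minus `safety_bits` of headroom for FFT rounding error. Solving
        2**log2_len * (b/2)**2 <= 2**(significand_bits - safety_bits)
    for b gives the value returned below.
    """
    exponent = (significand_bits - safety_bits - log2_len) / 2
    return 2.0 * 2.0 ** exponent


for bits, label in ((53, "double"), (64, "x87 80-bit")):
    print(label, round(max_base_estimate(15, bits)))  # GFN32768: 2**15 digits
```

With no safety margin, doubles give b up to 2**20 for GFN32768 (n = 15); the 11 extra significand bits of the x87 format raise that by a factor of 2**5.5 (about 45), and doubling the transform length lowers it by a factor of sqrt(2).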

-ftree-loop-distribution is not a compiler option on MacPPC. I still haven't had time to work on this issue. The reason the other build with -O3 worked is that you also had -O2 later on the command line, which overrode the -O3.
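GCC's documentation states that when several -O options are given, the last one is the one that takes effect, which is why the trailing -O2 won. A minimal Python sketch of that rule, assuming only plain -O / -O<level> spellings (a bare -O means -O1, and arguments starting with -Wl, are linker pass-throughs rather than -O options):

```python
import shlex


def effective_opt_level(cmdline: str) -> str:
    """Return the -O option GCC honours: the last one on the command line.

    Simplified: only plain -O / -O<level> spellings are recognised, a bare -O
    means -O1, and -Wl,... arguments (linker pass-throughs) are never counted.
    """
    level = "-O0"  # GCC's default when no -O option is given
    for arg in shlex.split(cmdline):
        if arg.startswith("-O"):
            level = "-O1" if arg == "-O" else arg
    return level


print(effective_opt_level("-O3 -mcpu=970 -mtune=970 -maltivec -pipe -Wl,-lm -O2"))  # -O2
```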

rogue
Volunteer developer
Message 24097 - Posted: 3 Jun 2010 | 22:59:29 UTC - in response to Message 24096.

I did a quick test and see that -funsafe-math-optimizations, which is set by -ffast-math, is the problem on MacPPC. Using -ffast-math with -fno-unsafe-math-optimizations works, albeit not much faster than just -O3 without -ffast-math. I might be able to re-arrange parts of the code so that I can use -ffast-math, but I don't know yet. Here is gcc's take on -ffast-math:

This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
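Two of the sub-options that -funsafe-math-optimizations turns on (per the GCC text quoted earlier in the thread) are -freciprocal-math and -fno-signed-zeros. The Python sketch below shows, with ordinary IEEE doubles, how each permitted rewrite can change a result. The helper names are hypothetical, and which of these rewrites (if any) actually breaks genefer has not been pinned down here.

```python
import math


def divide(x: float, y: float) -> float:
    """x / y: a single, correctly rounded division."""
    return x / y


def divide_via_reciprocal(x: float, y: float) -> float:
    """x * (1 / y): the rewrite -freciprocal-math allows; it rounds twice."""
    return x * (1.0 / y)


def add_zero(x: float) -> float:
    """x + 0.0 as written; IEEE gives +0.0 when x is -0.0."""
    return x + 0.0


def add_zero_folded(x: float) -> float:
    """The simplification -fno-signed-zeros allows: x + 0.0 -> x."""
    return x


print(divide(49.0, 49.0), divide_via_reciprocal(49.0, 49.0))  # 1.0 0.9999999999999999
print(math.copysign(1.0, add_zero(-0.0)), math.copysign(1.0, add_zero_folded(-0.0)))  # 1.0 -1.0
```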

Redstar3894 (Project donor)
Message 24098 - Posted: 3 Jun 2010 | 23:30:55 UTC - in response to Message 24096.

Hmmmm....I haven't tried GCW in a while....maybe I'll add that to the rotation and see how it runs...

But yeah, I've more or less written off SGS and GFN32768 due to the above reasons...probably going to add 27121 to that list as well...

I've been looking over the programmer's notes for the PowerPC 970FX, and based on what I've learned and my (limited) knowledge of programming, it doesn't seem that there will be any easy way to get similar extended precision (i.e. Genefer80) running on the PPC, since (as you know) the 80-bit floating-point format is unique to the x87 instruction set and therefore to the x86 architecture... though since the PPC 970 does have two FPUs as well as the AltiVec units, something could possibly be done with those... or not. Again, forgive my interpretations if I'm way off; some of this is way over my head...

Interesting, so the MacPPC version does not include the graphite loop optimizations....
Also, I was under the impression that since I specified -Wl, all options following it would be passed to the linker... so that's why I placed -O2 there... but again, I'm probably wrong. Thanks for being patient here....
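For reference, GCC documents -Wl,option as passing option (split at commas) to the linker; it does not capture the arguments that follow it, so the trailing -O2 stayed a compiler option. A simplified Python sketch of that split, treating everything that isn't -Wl,... as compiler-side (an assumption for illustration; the gcc driver also forwards things like -l<lib> to the linker on its own):

```python
import shlex


def split_compiler_and_linker_args(cmdline: str) -> tuple[list[str], list[str]]:
    """Separate compiler-side arguments from what -Wl,... forwards to the linker.

    -Wl,<opts> forwards only its own comma-separated list; the arguments after
    it are unaffected. Simplification (assumption): every other argument is
    treated as compiler-side.
    """
    compiler: list[str] = []
    linker: list[str] = []
    for arg in shlex.split(cmdline):
        if arg.startswith("-Wl,"):
            linker.extend(arg[len("-Wl,"):].split(","))
        else:
            compiler.append(arg)
    return compiler, linker


compiler, linker = split_compiler_and_linker_args(
    "-O3 -mcpu=970 -mtune=970 -maltivec -pipe -Wl,-lm -O2"
)
print(linker)  # ['-lm']
print(compiler[-1])  # -O2
```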

I knew I forgot something...I'll see if -ffast-math with -fno-unsafe-math-optimizations makes any difference and I'll play around with the Graphite Loop Optimizations because it seemed like those helped phrot (not to mention allowing genefer to compile with -O3)...I'll try and read up some more on all this and see if I can take a look at moving things around in the code as well
____________

rogue
Volunteer developer
Message 24099 - Posted: 3 Jun 2010 | 23:48:23 UTC - in response to Message 24098.

AltiVec just isn't overly useful for these programs because it is only single precision. Double precision combined with the fused multiply-add instruction is where you get most of the "bang for the buck".
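To make that concrete: Python floats are IEEE doubles, so single precision can be emulated by round-tripping through struct's 32-bit format, and a fused multiply-add (one rounding of the exact a*b + c) can be emulated with exact rational arithmetic. This is only an illustration of the precision argument, not how phrot or genefer are written.

```python
import struct
from fractions import Fraction


def to_float32(x: float) -> float:
    """Round a double to IEEE single precision (24-bit significand) and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mul_add_separate(a: float, b: float, c: float) -> float:
    """a*b + c with two roundings: after the multiply and after the add."""
    return a * b + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


print(to_float32(16777217.0))  # 16777216.0: 2**24 + 1 does not fit in single precision
a = 1.0 + 2.0 ** -27
print(mul_add_separate(a, a, -1.0) == 2.0 ** -26)  # True: the 2**-54 term is lost
print(mul_add_fused(a, a, -1.0) == 2.0 ** -26 + 2.0 ** -54)  # True: the fused result keeps it
```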

Redstar3894 (Project donor)
Message 24228 - Posted: 9 Jun 2010 | 21:39:46 UTC - in response to Message 24099.
Last modified: 9 Jun 2010 | 21:40:51 UTC

Okay, I've been doing some tweaking with the compiler options (again, I wish my programming knowledge was a little better so I could do more work with the code itself) and I've been able to further optimize phrot and genefer....

Keep in mind, I haven't done any extensive testing or data collection to prove this hypothesis, it is just my general impression from how both were running through PRPNet

I'm not sure whether it is a product of how the code is written or just the way I ran it through GCC, but the time it took to produce results for both programs seemed to follow a parabola (if hypothetically plotted against the base): smaller bases would take proportionally longer to produce a result, as would larger bases. What I believe I've managed to do is make the larger bases slightly faster than they were with my original build (e.g. ESP, GCW and 27121 seem to run faster), while the smaller PPSE WUs take about the same amount of time....

Hope that made sense, so here's what worked for me...

Phrot (and YEAFFT) was compiled with the following:

-O3 -ftree-loop-distribution -ffast-math -fno-unsafe-math-optimizations -funroll-loops -mcpu=970 -mtune=970 -maltivec -mpowerpc64 -mpowerpc-gpopt -mpowerpc-gfxopt -mdouble-float

Genefer was compiled with the following:
-O3 -ftree-loop-distribution -ffast-math -fno-unsafe-math-optimizations -mcpu=970 -mtune=970 -maltivec -mpowerpc64 -mpowerpc-gpopt -mpowerpc-gfxopt -mdouble-float
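Comparing the two flag sets above: the only difference is -funroll-loops for phrot, and in both, -fno-unsafe-math-optimizations comes after -ffast-math. For GCC option pairs like these the later one generally takes precedence, so that ordering is what switches the unsafe subset back off; treat this as an assumption worth checking against the GCC manual for your version. A small Python check:

```python
import shlex

PHROT = shlex.split(
    "-O3 -ftree-loop-distribution -ffast-math -fno-unsafe-math-optimizations "
    "-funroll-loops -mcpu=970 -mtune=970 -maltivec -mpowerpc64 "
    "-mpowerpc-gpopt -mpowerpc-gfxopt -mdouble-float"
)
GENEFER = shlex.split(
    "-O3 -ftree-loop-distribution -ffast-math -fno-unsafe-math-optimizations "
    "-mcpu=970 -mtune=970 -maltivec -mpowerpc64 "
    "-mpowerpc-gpopt -mpowerpc-gfxopt -mdouble-float"
)


def only_in(a: list[str], b: list[str]) -> list[str]:
    """Flags present in `a` but not in `b`, in their original order."""
    return [flag for flag in a if flag not in b]


def comes_after(flags: list[str], later: str, earlier: str) -> bool:
    """True if `later` appears after `earlier` on the command line."""
    return flags.index(later) > flags.index(earlier)


print(only_in(PHROT, GENEFER))  # ['-funroll-loops']
print(comes_after(GENEFER, "-fno-unsafe-math-optimizations", "-ffast-math"))  # True
```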


I've also compiled the new PRPNet release (3.3.0alpha) and it seems to be running fairly well...

I'm currently running a few more tests and then I'll be putting together a package for distribution...I'll upload it later and post the link here...I'll copy the scripts and prpclient.ini from the regular linux client...

My current builds of both phrot and genefer are tweaked for the G5 (PPC-970) and 64-bit Linux...I would have no problem attempting to create a build for the G4 if requested (although I have no way to reliably test this)
I'm not sure how many people out there are running PPC Linux, or how many of them would be interested in running PRPNet, but hopefully this contribution will help somehow... :)
____________

Redstar3894 (Project donor)
Message 24237 - Posted: 10 Jun 2010 | 2:58:01 UTC
Last modified: 10 Jun 2010 | 3:01:39 UTC

Okay, so at long last and to relatively little fanfare...I would like to present to the PG community a new build of PRPNet!

prpclient-3.3.0alpha-PPC_Linux64

This was tested and working on a 64-bit installation of Gentoo Linux running on a G5 1.8 GHz single processor.

It includes:
Phrot 0.72, which performs PRP tests in a similar fashion to LLR.
Genefer 2.2.0 for RISC, which performs PRP tests on Generalized Fermat Numbers.

Works well on PPSE10 and 11k, GFN65536, ESP, 121, 27121, and GCW. See the above posts for more information on why GFN32768 and SGS are specifically NOT recommended.

I've posted the download link above, as well as the direct link here.

I would especially like to thank Mark (rogue) for all his help in getting this put together, I couldn't have done it without him!

Please PM me or email me (redstar3894 AT gmail DOT com) directly if anyone has any questions.
____________

trigggl
Message 24475 - Posted: 22 Jun 2010 | 15:33:43 UTC - in response to Message 24237.
Last modified: 22 Jun 2010 | 15:46:58 UTC

Congrats on the PPC64 app. I have an IBM Power 3 (ppc64) running pps sieve at the moment, running Gentoo Linux, of course.

I was wondering if you could compile a generic app without AltiVec? IBM POWER doesn't support it. Is your app able to run the PPS LLR work units?

http://www.primegrid.com/show_host_detail.php?hostid=107509

I managed to squeeze out one work unit for the just completed Summer Solstice Challenge.
____________
6r39 7ri99

Beware the dual headed Gentoo with Wine!

Redstar3894 (Project donor)
Message 24476 - Posted: 22 Jun 2010 | 19:35:56 UTC - in response to Message 24475.
Last modified: 22 Jun 2010 | 19:36:30 UTC

I should be able to cook one up fairly easily for you, but it may take me a couple days...there was a major power outage in my area with all the storms in the Midwest US the last few days and the UPS on my G5 failed to shut it down cleanly, which caused more than a few problems...so, long story short, I have to rebuild my Gentoo installation....I'm sure you know how fun that usually is... :p

Did you just want it without altivec or are there other CFLAGS you know of that work well on the POWER3? (I'm somewhat familiar with the POWER series but most of my experience has been building for the PowerPC architecture)

As for your other question, phrot and genefer, as well as the larger PRPNet platform, are used in prime-finding projects that are not part of BOINC. There is more information in the following threads:

http://www.primegrid.com/forum_thread.php?id=1215
http://www.primegrid.com/forum_thread.php?id=1537

So unfortunately it would not be able to run PPS LLR workunits... As referenced earlier in this thread, and IIRC, LLR is currently only available for the x86 (and x86_64) architecture, though according to Mark there is a version of LLR in development using FFTW, which should allow it to be compiled for RISC architectures...

Take a look over the rest of the thread for more information on this. Hopefully I'll have enough time to get Gentoo rebuilt and running on the G5 before the week is out, so I'll keep you updated on any new developments :)

And Welcome to PRPNet! :)
____________

Redstar3894 (Project donor)
Message 24567 - Posted: 25 Jun 2010 | 18:55:12 UTC

Okay, finally got the G5 back up and running *somewhat* smoothly... :p

So now that my build platform is back up and running, I should have some new build(s) created in the next few days (depending on available time, of course), also incorporating any fixes (as referenced in the PRPNet Discussion Thread) from Mark and Lennart.

The prpclient-3.3.(x)alpha-ppc_linux_64 will be renamed to prpclient-(x.x.x)alpha-ppc_linux_G5 (ppc 970/970FX) which will remain a 64-bit build compiled with AltiVec (though the benefits remain to be seen at this point) and it should also work on POWER6 (if anyone will ever be running PRPNet on one)

I will also add the following builds to the list of supported architectures:


    prpclient-(x.x.x)alpha-ppc_linux_G4 (ppc 7xxx) 32-bit build for PPC with AltiVec
    prpclient-(x.x.x)alpha-power_linux_64 (POWER3,4,5) 64-bit build for IBM POWER processors (same as ppc 970 but without AltiVec)




Please let me know if I'm off on the naming scheme here, and I'll change the build designations accordingly.

Like I said earlier, this is dependent on both my time and any changes Mark makes to the client source. And also please note that the two new builds I am adding will be run AT YOUR OWN RISK as I have no way to reliably test them on the architectures they will be built for.
____________

Redstar3894 (Project donor)
Message 24598 - Posted: 28 Jun 2010 | 18:49:23 UTC

I have updated the PRPNet builds to the latest version (as of this writing, 3.3.2)

prpclient-3.3.2alpha-PPC_Linux_G5 - 64-bit PowerPC build with AltiVec

prpclient-3.3.2alpha-Power_Linux_64 - generic 64-bit PPC/POWER build (without AltiVec)


These were tested and working on a 64-bit installation of Gentoo Linux running on a G5 1.8 GHz single processor.

It includes:
Phrot 0.72, which performs PRP tests in a similar fashion to LLR.
Genefer 2.2.0 for RISC, which performs PRP tests on Generalized Fermat Numbers.

Both builds work well on PPSE10 and 11k, GFN65536, ESP, 121, 27121, and GCW. See the above posts for more information on why GFN32768 and SGS are specifically NOT recommended. The generic build is slightly slower (from the benchmarks I ran) than the AltiVec-enabled build for just about all tests, but should still work fairly well on machines without the benefit of those added instructions.

I've posted the download links above, and the links on the PRPNet thread should be updated shortly.

I would especially like to thank Mark (rogue) for all his help in getting this put together, I couldn't have done it without him! Also, I would be remiss if I didn't thank Lennart and John for administering PRPNet and PrimeGrid itself, as well as Rytis for making it all possible. :)

Please PM me or email me directly (redstar3894 AT gmail DOT com) if there are any questions/issues or further requests.


NOTE:
When rebuilding my system, I neglected to build the powerpc 32-bit (G4) toolchain, so the previously announced G4 build will be somewhat delayed, but I will update this thread and/or this post with any details as they come up.
____________

Copyright © 2005 - 2020 Rytis Slatkevičius (contact) and PrimeGrid community. Server load 0.00, 0.00, 0.00
Generated 2 Dec 2020 | 19:31:45 UTC