Hey, I parallelized this code with OpenMP, since there's no point leaving cores idle when I have a quad-core. There is a catch, though: the OpenMP build gives deterministic but slightly different output from the same program compiled without OpenMP.
Here is the code:

    PROGRAM bob
      USE OMP_lib
      IMPLICIT NONE
      INTEGER, PARAMETER :: P=140067, M=70000
      ! u must be declared: implicit typing would make it single precision
      REAL(8) :: sum, out(M), data(P), VAR, P2, tmp, u
      INTEGER :: k, j

      P2 = P
      ! read the series
      open (unit = 2, file = "data.txt")
      do k = 1, P
        read(2,*) data(k)
      end do
      close(2)

      ! mean
      sum = 0
      do k = 1, P
        sum = sum + data(k)
      end do
      u = sum/P

      ! sample variance
      sum = 0
      do k = 1, P
        sum = sum + (data(k)-u)*(data(k)-u)
      end do
      VAR = sum/(P-1)

      write (*,*) "STANDARD ERROR:"
      write (*,*) 1.96d0/SQRT(P2)

      ! autocorrelation for lags k = 1..M; k, j and tmp are private to each thread
      !$OMP PARALLEL PRIVATE(k,tmp,j) DEFAULT(SHARED)
      !$OMP DO
      do k = 1, M
        tmp = 0
        do j = k+1, P
          tmp = tmp + (data(j)-u)*(data(j-k)-u)
        end do
      !$OMP CRITICAL
        out(k) = tmp/(VAR*P)
      !$OMP END CRITICAL
      end do
      !$OMP END DO
      !$OMP END PARALLEL

      open (unit = 7, file = "CORR.txt")
      do k = 1, M
        write (7,*) out(k)
      end do
      close(7)
    END PROGRAM

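Two general OpenMP points about the parallel loop above (neither is a diagnosis of the discrepancy): each iteration writes only its own out(k), so the CRITICAL section shouldn't be needed, and since iteration k does P-k inner steps, the usual default static schedule gives the first thread noticeably more work than the last. A minimal sketch of the loop as a module subroutine with both changes; the names acf_mod and autocorr and the chunk size of 100 are hypothetical choices, not from the original program:

```fortran
! Hypothetical module: acf_mod and autocorr are illustrative names.
module acf_mod
  implicit none
contains
  ! out(k) = sum_{j=k+1}^{n} (x(j)-u)*(x(j-k)-u) / (var*n), k = 1..size(out),
  ! the same formula as the loop in the question. u and var are assumed to be
  ! the mean and sample variance, already computed in double precision.
  subroutine autocorr(x, u, var, out)
    real(8), intent(in)  :: x(:), u, var
    real(8), intent(out) :: out(:)
    integer :: k, j, n
    real(8) :: tmp
    n = size(x)
    ! Each iteration writes a distinct out(k), so no CRITICAL is used.
    ! Early k have the longest inner loops; a dynamic schedule hands out
    ! chunks of 100 lags on demand (100 is an arbitrary choice).
    !$omp parallel do private(k, j, tmp) schedule(dynamic, 100)
    do k = 1, size(out)
      tmp = 0.0d0
      do j = k + 1, n
        tmp = tmp + (x(j) - u) * (x(j - k) - u)
      end do
      out(k) = tmp / (var * n)
    end do
    !$omp end parallel do
  end subroutine autocorr
end module acf_mod
```

Without -fopenmp the directives are plain comments, so the same source also builds as serial code.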
If I compile it with OpenMP, the first element of the result is:

    1.369807396630835E-003

If I compile it without OpenMP, it is:

    1.369807396632110E-003

Every other element seems to be the same. The two values agree to 12 significant digits (a relative difference of about 9.3e-13), and I can't tell whether this is a big problem or nothing.
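Last-digit differences of this size are what floating-point rounding produces when the same terms get added in a different order. One possible source (a guess, not something checked against this program) is the compiler optimizing the inner j loop differently with and without -fopenmp. Floating-point addition isn't associative, which this small sketch demonstrates; the module and function names are hypothetical:

```fortran
! Hypothetical module: sum_order, sum_forward, sum_reverse and rel_diff
! are illustrative names, not part of the original program.
module sum_order
  implicit none
contains
  ! Adds x(1) + x(2) + ... + x(n), left to right.
  pure function sum_forward(x) result(s)
    real(8), intent(in) :: x(:)
    real(8) :: s
    integer :: i
    s = 0.0d0
    do i = 1, size(x)
      s = s + x(i)
    end do
  end function sum_forward

  ! Adds the same terms in the opposite order, x(n) + ... + x(1).
  pure function sum_reverse(x) result(s)
    real(8), intent(in) :: x(:)
    real(8) :: s
    integer :: i
    s = 0.0d0
    do i = size(x), 1, -1
      s = s + x(i)
    end do
  end function sum_reverse

  ! Relative difference |a - b| / max(|a|, |b|); zero when both are zero.
  pure function rel_diff(a, b) result(r)
    real(8), intent(in) :: a, b
    real(8) :: r
    if (a == 0.0d0 .and. b == 0.0d0) then
      r = 0.0d0
    else
      r = abs(a - b) / max(abs(a), abs(b))
    end if
  end function rel_diff
end module sum_order
```

With x = [0.1d0, 0.2d0, 0.3d0] on IEEE double-precision hardware (and no reassociating flags such as -ffast-math), the forward sum lands one unit in the last place above the reverse sum. For the two values in the question, rel_diff works out to roughly 9.3e-13; whether that matters depends on how many digits the analysis actually needs.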
To get around Fortran's limitations here (P and M are hard-coded as PARAMETERs, so the array sizes are fixed at compile time), I wrote a Perl script that takes the inputs, writes out a Fortran source file, compiles it and runs it. It's a bit goofy, but now I can reuse this code without editing it by hand every time, and it doesn't slow things down appreciably.
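If the limitation in question is the compile-time array sizes (an assumption), Fortran 2003's GET_COMMAND_ARGUMENT and ALLOCATABLE arrays let P and M be chosen at run time without regenerating the source. A minimal sketch; run_config, int_arg and read_series are hypothetical names:

```fortran
! Hypothetical helper module: run_config, int_arg and read_series are
! illustrative names, not part of the original program.
module run_config
  implicit none
contains
  ! Returns command-line argument number i as an integer, or fallback
  ! when that argument is missing, too long, or not a valid integer.
  function int_arg(i, fallback) result(v)
    integer, intent(in) :: i, fallback
    integer :: v, length, istat, ios
    character(len=32) :: buf
    v = fallback
    call get_command_argument(i, buf, length, istat)
    if (istat /= 0 .or. length == 0) return
    read(buf, *, iostat=ios) v
    if (ios /= 0) v = fallback
  end function int_arg

  ! Allocates x(n) and reads n values, one per line, from the named file.
  subroutine read_series(fname, n, x)
    character(len=*), intent(in) :: fname
    integer, intent(in) :: n
    real(8), allocatable, intent(out) :: x(:)
    integer :: unit, k
    allocate(x(n))
    open(newunit=unit, file=fname, status="old", action="read")
    do k = 1, n
      read(unit, *) x(k)
    end do
    close(unit)
  end subroutine read_series
end module run_config
```

In the main program, data and out would become REAL(8), ALLOCATABLE :: data(:), out(:), and the PARAMETER line would give way to something like P = int_arg(1, 140067), M = int_arg(2, 70000), call read_series("data.txt", P, data) and allocate(out(M)). Whether this covers everything the Perl script does depends on what else it generates.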
Here is the timing for the parallel code on a data set of about 140k points with a maximum lag (M) of 70k:

    real    0m3.086s
    user    0m9.237s
    sys     0m0.168s

And here is the serial version, as it ran before:

    real    0m7.776s
    user    0m7.628s
    sys     0m0.148s
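From the wall-clock (real) times above, the speedup and the parallel efficiency on four cores work out to:

```latex
S = \frac{t_{\text{serial}}}{t_{\text{parallel}}}
  = \frac{7.776\,\text{s}}{3.086\,\text{s}} \approx 2.52,
\qquad
E = \frac{S}{4} \approx 0.63
```

The same numbers show (user + sys) / real of about 9.4 / 3.1, or roughly three cores busy on average, and about 21% more total user time than the serial run. An uneven split of the triangular inner loop or OpenMP overhead could account for part of that, though neither was measured here.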