#cuda seems to be faster in 32-bit, due to smaller pointers -> more free registers; C++/SSE is faster in 64-bit due to more registers (16 XMM instead of 8). damn
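
Rough sketch of the pointer half of that, assuming 32-bit-wide GPU registers (so an 8-byte pointer takes two of them). `registers_per_pointer` and `kRegisterBytes` are made-up names for illustration, not part of any CUDA API:

```cpp
#include <cstddef>

// Assumption: each GPU register holds 4 bytes (32 bits).
constexpr std::size_t kRegisterBytes = 4;

// Hypothetical helper: number of 32-bit registers needed to hold one pointer
// of the given size, rounded up.
constexpr std::size_t registers_per_pointer(std::size_t pointer_bytes) {
    return (pointer_bytes + kRegisterBytes - 1) / kRegisterBytes;
}

// Extra registers a kernel spends when it keeps `live_pointers` pointers in
// registers and moves from 4-byte to 8-byte pointers.
constexpr std::size_t extra_registers_in_64bit(std::size_t live_pointers) {
    return live_pointers * (registers_per_pointer(8) - registers_per_pointer(4));
}
```

Calling `registers_per_pointer(sizeof(void*))` gives the count for whatever build you compile it in; a kernel juggling, say, 6 pointers would hold 6 more registers in a 64-bit build under these assumptions.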