Author Topic: Primenumbergenerator  (Read 2549 times)

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4954
Primenumbergenerator
« on: October 12, 2006, 01:34:11 AM »
Well, I really wanted to program something today, so I made a prime number generator. I figured it might come in handy for some of the crypto stuff I'm doing lately since having an easily accessible list of small primes around can kinda be handy for implementing or speeding up certain algorithms.

The code used to be fairly readable, but then I optimized it to use less memory and run a little faster. I also have ugly status output in the calculations so you have some idea what's going on while it's working.

Anyways it works using the sieve of eratosthenes. I'd actually come up with the idea a while back while trying to figure out how to list prime numbers quickly, and even got some cool bounds on what you have to test to get all the prime numbers up to a certain point... only to find someone beat me to the idea by about 4000 years or so. But hey, at least I figured something out that people found worthy of remembering for 4000 years.  :blink:

So, I generated a file of over the first 26 million primes. The file size was a little over 100 MB. It listed all primes below 500,000,000. The last three are as follows for anyone who wants to know:

499999909 499999931 499999993


Runtime on the final optimized version took about a minute (maybe two) to generate the list of all primes under 500,000,000. I used an 800 MHz Celeron with a lot less memory than I'd like to have. Let me know if you're interested in the source code.  :)
 
« Last Edit: October 12, 2006, 01:36:19 AM by Hooman »

Offline Eddy-B

  • Hero Member
  • *****
  • Posts: 1186
    • http://www.eddy-b.com
Primenumbergenerator
« Reply #1 on: October 12, 2006, 03:05:11 AM »
Me want :)
Rule #1:  Eddy is always right
Rule #2: If you think he's wrong, see rule #1
--------------------

Outpost : Renegades - Eddy-B.com - Electronics Pit[/siz

Offline TH300

  • Hero Member
  • *****
  • Posts: 1404
    • http://op3game.net
Primenumbergenerator
« Reply #2 on: October 12, 2006, 05:27:04 AM »
I am interested.

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4954
Primenumbergenerator
« Reply #3 on: October 13, 2006, 02:46:44 AM »
And the last 3 primes under 1 billion are:
999999893 999999929 999999937

I didn't have as much stuff running, and actually had the 200 MB free to write out the list of primes to a file this time. There are over 50 million primes below 1 billion.  :) (Last time I had memory hog programs running and only 150 MB free disk space or so). The total running time for all primes below 1,000,000,000 was about 6 minutes. About 3 minutes to sieve all the numbers, and another 3 minutes waiting for windows to enlarge my page file and to write them all into an array before being stored to disk. I used up about 215 MB of RAM during the operation (and my system only has 192 MB of RAM). Note: The sieving process only took about 15 MB, but the list of prime numbers it generated took up the 200 MB. I used 1 bit per number in the sieve part since I'm only marking if the number is prime or composite. Hence using a bit vector for storage. The list of primes written to the file was a list of 32 ints however. It might have been better if I'd just stored the bit vector.  :unsure:


Also note, if you've marked all multiples of all primes up to n, then the smallest composite number which has not yet been marked as composite (everything defaults to prime) is greater than n^2. (That is, it has at least 2 factors since it's composite, and each factor is greater than n, since otherwise it would have been marked as composite during a sieve of all multiples of a prime factor less than or equal to n). This is why the outer loop only has to iterate up to the square root of the max value for primes you're looking for. It's also why you can start sieving out composites starting at the square of a new prime that has been found. (Although not required for correctness, these two facts contribute greatly to the speed of the algorithm).

Oh, and to save memory during the sieve and a bit of running time, I special cased 2. That way I only have to check odd multiples of new primes. (Start at p^2, and increment by 2p for the inner loop). Note that if p is odd, then so it p^2, and so p^2 + 2p*n is also odd (since 2p is even). This is important for correctness since I didn't reserve storage space for multiples of 2 (it's clear 2 is the only even prime number) and the indexing doesn't distinguish between even and odd values. (That is, and index of 6 or 7 would both point to the same bit). You wouldn't want to accidentally mark 7 as composite by trying to mark 6 as composite. But this is just an optimization issue. It's only important because of my implementation to save memory (and a bit of time).





The code!

Code: [Select]
#include <memory.h>
#include <math.h>
#include <iostream.h>
#include <fstream.h>


#define ByteOfBit(i) (list[(i) / (sizeof(int) * 8 * 2)])
#define Bit(i)   ( 1 << (((i)/2) & (sizeof(int)*8 - 1)) )
#define IsPrime(i)  ((ByteOfBit(i) & Bit(i)) == 0)


unsigned int* GenerateListOfPrimes(unsigned int max, unsigned int &numPrimes)
{
unsigned int i, j;

// Initialize the number of primes found
numPrimes = 0;
if (max < 2)
  return 0;    // We are done

// Display status
cerr << "Allocating " << ((max  / (sizeof(int)*8*2))+1)*32 << " bits (" << ((max  / (sizeof(int)*8*2))+1) << " bytes)" << endl;
// Allocate space for a list of numbers to sieve
unsigned int *list = new unsigned int[((max  / (sizeof(int)*8*2))+1)];
// Check for memory allocation error
if (list == 0)
{
  cerr << "Out of memory (creating number list to sieve)" << endl;
  return 0;
}


// Display status
cerr << "Initializing memory" << endl;
// Initialize the list to all primes
memset(list, 0, ((max  / (sizeof(int)*8*2))+1)*sizeof(int));


unsigned int maxSieve = (int)sqrt(max);
// Display status
cerr << "Starting sieve for primes under " << max << " (" << maxSieve << " iterations required) ..." << endl;
numPrimes = 1;    // We've found the prime 2
// Sieve the list of numbers for primes
for(i = 3; i < maxSieve; i += 2)
{
  // Status update
  if ((i & (i-2)) == 1)
   cerr << "Processed up to: " << i-1 << endl;

  // Check if the current number is prime
  if (IsPrime(i))
  {
   // New prime found
   numPrimes++;
   // Cross off all multiples of the prime
   for (j = i*i; j < max; j += (i + i))
    ByteOfBit(j) |= Bit(j); // Mark number as composite
  }
}
cerr << "Done sieving. Counting primes in tail of list..." << endl;
// Finish counting primes
for (; i < max; i += 2)
  // Check if the current number is prime
  if (IsPrime(i))
   numPrimes++;   // New prime found

cerr << "Allocating space for list of primes" << endl;
// Allocate space to hold the list of primes
unsigned int *primeList = new unsigned int[numPrimes];
// Check for allocation error
if (primeList == 0)
{
  cerr << "Error allocating memory to store list of primes (" << numPrimes << " primes found)" << endl;
  delete [] list;    // Free the sieved list of numbers
  return 0;
}


// Add 2 to the list of primes
primeList[0] = 2;
j = 1;

// Scan numbers for primes and add to the list
for (i = 3; i < max; i += 2)
  if (IsPrime(i))    // Check if the current number is prime
  {
   primeList[j] = i;  // Add the number to the list
   j++;
  }

delete [] list;     // Free the temporary list of numbers that was sieved
return primeList;    // Return the list of primes
}




int main()
{
unsigned int numPrimes;

// Generate the list of primes
unsigned int *primes = GenerateListOfPrimes(1000, numPrimes);


// Display how many primes were found
cerr << endl << "Number of primes found:" << numPrimes << endl;

// Display the list of primes
if (numPrimes < 100)
{
  cout << "The first few primes are:" << endl;
  unsigned int i;
  for (i = 0; i < numPrimes; i++)
   cout << primes[i] << " ";
}
if (numPrimes > 100)
{
  cout << "The last few primes are:" << endl;
  unsigned int i;
  for (i = numPrimes-100; i < numPrimes; i++)
   cout << primes[i] << " ";
}


cout << endl << "Writing primes to file..." << endl;
// Write the list of primes to a file
ofstream outFile("Primes");
outFile.write((char*)primes, numPrimes*sizeof(int));
outFile.close();
cout << "Done." << endl;


return 0;
}

Just copy and paste into a C++ file. It mostly looks big because of all my output code to see what the algorithm is doing. I didn't want to get stuck waiting for hours for it to finish since I had now idea how fast it would be.
« Last Edit: October 13, 2006, 02:49:33 AM by Hooman »

Offline dm-horus

  • Banned
  • Hero Member
  • *****
  • Posts: 1042
Primenumbergenerator
« Reply #4 on: October 13, 2006, 06:12:49 AM »
Nice work Hooman.

Offline TH300

  • Hero Member
  • *****
  • Posts: 1404
    • http://op3game.net
Primenumbergenerator
« Reply #5 on: October 13, 2006, 03:50:25 PM »
Really nice.

Too bad that there's a limit on how large an integer number can be :/

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4954
Primenumbergenerator
« Reply #6 on: October 13, 2006, 04:56:53 PM »
There isn't really a limit on the bit vector other than memory constraints. The main limitation for the sieving process is the size of the index variables. If you wanted to go larger, replace the int with a 64 bit quantity. I think MSVC allows __int64 as a data type. (Or something similar to that).

I guess I should mention that I didn't try sieving beyond 1 billion since overflow errors should start to cause the algorithm to fail probably somewhere between 1 billion and 1.5 billion. (Probably at around MAXINT / 3, give or take 1 or so). My reasons for this is when a prime p is found, it starts crossing off composites with that prime as a factor starting at p^p and incrementing by 2p. If Adding 2p overflows the index size, it won't terminate because the index wraps around and is less than the max, and then it'll probably start knocking out smaller primes and think they're composite. This could happen once (sqrt(max) ^ 2) + 2*max > MAXINT, which is about 3*max > MAXINT.

Mind you generating the list of primes in 32 bit int format from the bit vector wastes a lot of memory. You'd probably want to get rid of that part and just return the bit vector if you wanted to go much larger. Replacing the unsigned ints with unsigned __int64 should increase the limit up to about 2^64/3 (which is about 6,148,914,691,236,517,205) and would require about 96,076,792,050,570,581 bytes of memory for the bit vector.

I think you'd have to rewrite the algorithm to work in iterative chunks well before the 64 bit index limit was exceeded to avoid running out of memory. Not to mention the expected run time to produce all primes up to 2^64/3 is well over thousands of years. (A really rough estimate assuming the asymtotic run time is linear, which is probably a low guess, is over 32000 years on my computer).