java.lang.OutOfMemoryError: Java heap space to find all strings of length n

I want to generate all possible strings of length n with alphabets a,b, and c .. I am getting this error `java.lang.OutOfMemoryError`: Java heap space. my heap size is 512m . Can you suggst me alternatives for this ..

Counting Strings not containing substring "ABC" can be solved with dynamic programming.

First, lets name the number of all strings of length n as `S(n)`. Note that `S(n)` is easy to compute. `S(n) = pow(size_of_the_alphabet,n)`

Let's name the number all strings containing "ABC" of length n as `A(n)` and let's name the number of all strings having first ocurrence of "ABC" on k-th position as `A(n,k)`

Now note that: `A(n,k) = (S(k-1) - A(k-1)) * S(n - k - 3)` (since k is the first position where "ABC" occurs, each of these strings has a substring without "ABC" before this position and any substring after the first "ABC").

Note that `A(n) = sum A(n,k) for k in [0..n-3]`

So now we can calculate `A(n)` computing each value of `A(n)` starting from 0.

Base case is simple, as `A(n) = 0 for n = 0,1,2`

`A(3) = (S(0) - A(0))*S(0) = 1` (namely "ABC")

etc..

Once you have `A(n)` you can get the number you're looking for using the formula `S(n) - A(n)`.

Pseudojava pseudocode:

`public class Counter {`

``` public int count(int aSize, int n) { long[] a = new long[n+1]; // n + 1 elements since a[i] contains # of strings containing "ABC" a[0] = 0; a[1] = 0; a[2] = 0; for (int i = 3; i <= n; ++i){ long sum = 0; for (int k = 0; k <= i-3; ++k) { sum += (pow(aSize, k) - a[k]) * pow(aSize, i - k - 3); } a[i] = sum; } return a[n]; } public static void main(String... args) { int aSize = 3; //size of the alphabet int n = 30; // length of the strings //final result long result = pow(aSize, n) - count(aSize, n); ```

```} } ```

Running time is `O(n^2)`, assuming pow is O(1). If it's not then you can save some time precalculating S(i).

Space requirement is `O(n)`.

Note that this computes number of all strings with length == n. If you want length <= n then the modification is obvious (you just sum all elements in a).

The number of strings you are currently computing is

```3^30 = 205 891 132 094 649 ```

Which is quite a lot...

Knowing that each String contains three bytes:

```3^30 * 3 = 617 673 396 283 947 ```

Plus the 32-bit or 64-bit pointers of the two dimensional array.

```3^30 * (3 + 4) = 1 441 237 924 662 540 // 32-bit Java VM 3^30 * (3 + 8) = 2 264 802 453 041 140 // 64-bit Java VM ```

Which is

```2 109 261 GB = 2059 TB // 64-bit JVM ```

I guess that is the problem.

With your limit of 500 MB you can solve this equation:

``` 3^x * (3 + 8) = 524 288 000 3^x = 47662545 x = log(47662545) / log(3) x = 7 / 0.477121254719662 x = 14.67 ```

So, if I forgot nothing, your tests should work for `n <= 14`. Of course, it won't work until you delete this code:

```List<String> result = new ArrayList<String>(); for (char[] permutation : table) { result.add(new String(permutation)); } ```

This code duplicates all the data! Which means that you need twice the amount of memory. Try to print it immediately.

I noted in your comment you have a different question. You want to find all the strings of length 30 with a-z which don't contains a-c. This is the count of all the string of length 30 which are d-z. The count is (26-3)^30.

```System.out.printf("%,d%n", BigInteger.valueOf(26-3).pow(30)); ```

prints

```71,094,348,791,151,363,024,389,554,286,420,996,798,449 ```

Instead of remembering every string, you can encode every possible String as a number. In your case you can use a `long`.

```public static void main(String... args) { String letters = "abc"; int len = 30; long combinations = (long) Math.pow(letters.length(), len); System.out.printf("There are %,d strings%n", combinations); for (long i = 0; i < 10; i++) System.out.println(fromLong(i, letters, len)); System.out.println("... some numbers skipped ..."); for (long i = combinations-10; i < combinations; i++) System.out.println(fromLong(i, letters, len)); }```

``` ```

```public static String fromLong(long n, String letters, int len) { StringBuilder sb = new StringBuilder(); for (int i = 0; i < len; i++) { sb.append(letters.charAt((int) (n % letters.length()))); n /= letters.length(); } return sb.reverse().toString(); } ```

prints

```There are 205,891,132,094,649 strings aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaab aaaaaaaaaaaaaaaaaaaaaaaaaaaaac aaaaaaaaaaaaaaaaaaaaaaaaaaaaba aaaaaaaaaaaaaaaaaaaaaaaaaaaabb aaaaaaaaaaaaaaaaaaaaaaaaaaaabc aaaaaaaaaaaaaaaaaaaaaaaaaaaaca aaaaaaaaaaaaaaaaaaaaaaaaaaaacb aaaaaaaaaaaaaaaaaaaaaaaaaaaacc aaaaaaaaaaaaaaaaaaaaaaaaaaabaa ... some numbers skipped ... cccccccccccccccccccccccccccbcc ccccccccccccccccccccccccccccaa ccccccccccccccccccccccccccccab ccccccccccccccccccccccccccccac ccccccccccccccccccccccccccccba ccccccccccccccccccccccccccccbb ccccccccccccccccccccccccccccbc ccccccccccccccccccccccccccccca cccccccccccccccccccccccccccccb cccccccccccccccccccccccccccccc ```

This can print every possible String from 0 to 3^30-1. You don't need to store all the encoded values because you know all the possible values are in a continuous range.

Are you trying to compute the strings or count the strings?

EDIT: I see from the later comment that you know how many Strings you're talking about: If you are only counting, that's a standard permutation closed form: `26^n` = 26 raised to the nth power (assuming that you're only using lower case alphabet from 'a' to 'z').

If you are truly trying to enumerate every string, I strongly recommend that you ensure that you do not retain a reference to each `String`. If you end up with a dangling reference to each of those Strings, you're going to run out of memory after about six characters.

Category:java Views:0 Time:2011-12-20