From: Godmar Back (gback@cs.utah.edu)
Date: Tue Jan 05 1999 - 12:28:46 EST
One of the reasons why hacking kaffe is so much fun is that it's still
really easy to get speedups in the 15% range just by removing some of
the silliness that's still in there...
For instance, previously, when two threads ran out of memory (say the
main thread and the finalizer), both would signal the collector. The
collector would then do two full gc passes. The second pass, of course,
would just be a gratuitous walk of the complete heap for no reason
whatsoever. I fixed that and voila, it took only 8 instead of 9.5
seconds because it was only one instead of two gcs.
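The fix amounts to something like the following sketch. This is not the
actual collector code (I'm using a plain pthread mutex here just for
illustration, and all the names are made up); the idea is simply to
remember which collection a thread had seen when it failed to allocate,
and to skip its request if a newer collection has finished since:

  #include <pthread.h>

  static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
  static unsigned long gc_count;        /* completed collections so far */

  static void do_full_gc(void) { /* mark roots, sweep heap (elided) */ }

  /* 'seen' is the value of gc_count the caller read before it failed
     to allocate.  If another thread has collected since then, that
     pass already freed whatever there was to free, so a second walk
     of the heap would be gratuitous. */
  void request_gc(unsigned long seen)
  {
      pthread_mutex_lock(&gc_lock);
      if (gc_count == seen) {           /* nobody collected in between */
          do_full_gc();
          gc_count++;
      }
      pthread_mutex_unlock(&gc_lock);
  }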
Still, there is the question of gc strategy, which is utterly unclear
at this point. Let me raise a few points here.
+ First, the "initial size of the heap" parameter (essentially
the -ms switch value) is also the amount of memory that we try
to get from the system if we need more. It's the gc_heap_allocation_size
var in gc-mem.c. This seems wrong to me. In fact, if you look at
the JNI interface, you'll see different parameters for minHeapSize
and allocHeapSize. We've been using allocHeapSize as minHeapSize
(cause it would be the amount of memory allocated from the system
when we get memory the first time.)
The question then becomes how to estimate the amount of memory to
get from the system if we run out. Clearly, we want it to be large
enough to delay the next gc until enough garbage has accumulated, but
on the other hand we want to avoid growing the heap gratuitously.
(I sketch one possible formula after these points.)
+ Second, there used to be a criterion that said to only signal the gc if the
"alloced memory since last gc attempt > size of initial_heap/2".
The "alloc > gc_heap_allocation_size/2" criterion didn't make any sense to
me whatsoever, so I took it out. In fact, I think it was wrong in that
it led us to throw OutOfMemory where we could have gc'd. For instance, if
gc_heap_allocation_size == 32MB, we must be able to allocate at least 16MB to
cause a gc according to this criterion. What if we have 24MB of live data and
allocate 8MB and then run out of memory?
+ Third, if there are no free blocks when an allocation is attempted, should
there be an immediate gc? The broken criterion I mentioned above attempted
to find situations where the answer to this question was no.
I can see only one circumstance where the answer would be no:
Applications that grow to a certain amount of long-lived data quickly, then
stay there and produce only a comparatively small amount of short-lived
objects over the course of their lifetime. For these applications, a gc
during the growth phase would be a waste: it would only find that everything
needs to stay alive and that it needed to get memory from the system anyway.
The problem here is to recognize such a situation, which is hard to do.
In addition, few applications would have such a regular growth pattern.
Also, for those applications, it should be possible to hint at a favourable
initial heap size -- using the -ms switch.
Therefore, I feel that we should always collect if we run out.
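To make this concrete, here's a rough sketch of the allocation path I
have in mind. This is not the real gc-mem.c; the names and the growth
estimate (keep roughly as much headroom as there was live data after
the last gc) are just one hypothetical possibility:

  #include <stddef.h>
  #include <stdlib.h>

  static size_t heap_size;            /* bytes currently mapped          */
  static size_t live_after_last_gc;   /* bytes found live by the last gc */
  static size_t alloc_increment;      /* allocHeapSize-style lower bound */

  static void *find_free_block(size_t n) { return malloc(n); } /* stand-in */
  static void  collect(void)             { /* mark/sweep elided */ }
  static int   grow_heap(size_t n)       { heap_size += n; return 1; }

  /* Aim for roughly as much free space as live data after a collection,
     so the next gc is delayed without growing the heap gratuitously. */
  static size_t grow_amount(size_t request)
  {
      size_t target = 2 * live_after_last_gc;
      size_t grow = target > heap_size ? target - heap_size : 0;
      if (grow < alloc_increment) grow = alloc_increment;
      if (grow < request)         grow = request;
      return grow;
  }

  void *heap_alloc(size_t request)
  {
      void *p = find_free_block(request);
      if (p) return p;

      collect();                      /* always collect if we run out;    */
      p = find_free_block(request);   /* no "alloc > size/2" precondition */
      if (p) return p;

      if (grow_heap(grow_amount(request)))
          p = find_free_block(request);
      return p;                       /* NULL => throw OutOfMemoryError   */
  }

The factor of two in grow_amount is arbitrary; what the right amount is
is exactly the open question above.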
I'll look at fixing the first issue, cause it seems clearly broken to me.
I'd like to mention another strange phenomenon I observed in connection with
the third issue: the more memory we give kaffe, the more it tends to
keep alive. I'm talking about the jit here, btw.
For instance, plain kaffe (-ms 1m) runs the iute test with a 5MB heap and ca.
3.8 MB of live data after each gc. In fact, the curve goes like this:
(This is the amount of live data after a gc!)
-ms 1M
3399K 3400K 4117K 4032K 3626K 3749K 3954K 3925K 3804K 3833K 3901K 3908K 3866K
3868K 3882K 3891K 3887K 3883K 3887K 3887K 3886K 3887K 3887K 3886K 3887K 3887K
3886K 3887K 3887K 3886K 3887K 3887K 3886K 3887K 3887K 3886K 3887K 3887K 3886K
3887K 3887K
Now I tried other sizes to reduce gc frequency:
-ms 4M
3399K 3400K 5780K 6390K 4645K 4488K 5599K 5860K 5061K 5005K 5365K 5560K 5378K
5288K 5373K 5458K
-ms 8M
5619K 6704K 4555K 4165K 5648K 6023K 4939K 4847K 5281K 5590K 5410K 5247K 5316K
5455K
-ms 16M
10048K 12984K 7181K 6125K
-ms 32M
18910K
This is the live data after a gc.
In other words, in a program that has ca. 3.8MB of live data, we managed to
keep 18MB alive in one gc!
Btw, even after increasing the run-time, this amount doesn't go down:
-ms 32M
18910K 25544K 12420K 10045K 19090K 21376K 14777K 14205K 16861K 18749K 17641K
16660K 17246K 18003K 17850K 17427K 17540K 17796K 17808K 17651K 17649K 17699K
17754K
Same for 16M
-ms 16M
10048K 12984K 7181K 6125K 10127K 11141K 8215K 7974K 9169K 9978K 9468K 9054K
9286K 9634K 9606K 9404K 9432K 9548K 9569K 9488K 9489K 9524K 9536K 9522K 9515K
9515K 9526K 9533K 9524K 9516K 9515K 9526K 9533K 9524K 9516K 9515K 9526K 9533K
9524K 9516K 9515K 9526K 9533K 9524K 9516K 9515K 9526K 9533K 9524K 9516K 9515K
9526K 9533K
One possible explanation is that if we allow for more memory, object
addresses are less likely to be reused. With kaffe's excessive spilling
of references, a lot of these addresses will end up on the C stack.
In addition, a method such as callMethodV allocates half a K
(MAXARGS*sizeof(jvalue), with MAXARGS==64) and will leave those 512 bytes
of stack space behind after each call (although that may not be as big of
a problem for the JIT).
On the other hand, looking at the test I would expect the very same
stack addresses to be reused over and over.
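To illustrate what I mean, here's a hypothetical example (not Kaffe
code; only the jvalue/MAXARGS names are borrowed from above) of how a
dead argument buffer can retain objects under conservative stack
scanning:

  #include <string.h>

  #define MAXARGS 64

  typedef union { void *l; long i; double d; } jvalue;  /* simplified */

  static void dispatch(jvalue *args, int nargs) { (void)args; (void)nargs; }

  void call_like_callMethodV(void *obj, void *arg1)
  {
      jvalue args[MAXARGS];            /* 64 * sizeof(jvalue) = 512 bytes */
      memset(args, 0, sizeof args);
      args[0].l = obj;                 /* references copied onto the C stack */
      args[1].l = arg1;
      dispatch(args, 2);
  }
  /* After the call returns, those 512 bytes are dead but not cleared.
     If a conservative stack scan runs before this region is overwritten
     by deeper calls, it still sees obj and arg1 there and keeps them
     (and everything reachable from them) alive as floating garbage. */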
I will do some measurements to find out just how much floating garbage
is kept alive by the various stacks. The finalizer stack is also a likely
culprit here.
Finally, there's also the possibility that our accounting is wrong.
Let me know if you have any comments/insights/opinions.
(Especially on the allocHeapSize issue, i.e., the amount by which to grow
the heap when we really run out --- maybe we can find some dynamic formula)
- Godmar