Re: class gc & future gc interface


From: Godmar Back (gback@marker.cs.utah.edu)
Date: Sun Jan 03 1999 - 21:09:14 EST


 Hi Tim,

welcome back.

> I suppose there's some chance that starting tomorrow I'll be able to
> catch up on what's been happening in the 3 weeks I've been away (never mind a
> week being a long time in politics - in free software it's an eon).

Let's hope you'll like the changes. The two major changes were Archie's
string/utf8 stuff and the changes relating to class gc.

If I may be a bit philosophical here: in essence, your original approach
to kaffevm was to write it like Java code: in particular, exceptions
could be thrown from C code, and C data structures would be subject to
conservative garbage collection. There is a certain elegance to this
approach, which I actually appreciate a lot.

Unfortunately, it appears that it didn't work: exceptions caused deadlocks,
memory leaks, and data corruption, and they made correct error handling hard,
especially given C's lack of language support for `catch' and `finally'
clauses. Relying on conservative collection of C data structures turned out
to be unnecessarily expensive.
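
To make the deadlock and leak problem concrete, here is a minimal sketch in C, assuming that throwing an exception from C code is implemented by longjmp-ing back to a handler frame. All names (`throwError`, `loadClassThrowing`, `loadClassChecked`, and the `lockHeld` flag standing in for a real mutex) are hypothetical, not Kaffe's API. When the helper throws, the unlock and free that follow the call are skipped, so the lock stays held (the next thread that tries to take it blocks forever) and the buffer leaks; the error-return version runs its cleanup on every path.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical global state standing in for a VM lock and live allocations. */
static int lockHeld = 0;
static int liveBuffers = 0;
static jmp_buf handler;              /* where a "thrown" exception lands */

static void lockClassTable(void)   { lockHeld = 1; }
static void unlockClassTable(void) { lockHeld = 0; }

static void *allocBuffer(void) { liveBuffers++; return malloc(64); }
static void freeBuffer(void *p) { assert(liveBuffers > 0); liveBuffers--; free(p); }

/* "Throwing" from C: jump straight back to the handler frame. */
static void throwError(void) { longjmp(handler, 1); }

/* A helper that fails, e.g. on a malformed class file. */
static void parseConstantPool(int malformed) {
    if (malformed)
        throwError();
}

/* Exception style: code after the call is skipped when it throws. */
static void loadClassThrowing(int malformed) {
    lockClassTable();
    void *buf = allocBuffer();
    parseConstantPool(malformed);    /* may longjmp out of this frame */
    freeBuffer(buf);                 /* never reached on error: leak */
    unlockClassTable();              /* never reached on error: lock stays held */
}

/* Runs the throwing version under a handler; returns 1 if an error was caught. */
static int tryLoadThrowing(int malformed) {
    if (setjmp(handler) != 0)
        return 1;                    /* landed here via throwError() */
    loadClassThrowing(malformed);
    return 0;
}

/* Error-return style: the helper reports failure instead of throwing. */
static int parseConstantPoolChecked(int malformed) { return malformed ? 0 : 1; }

static int loadClassChecked(int malformed) {
    lockClassTable();
    void *buf = allocBuffer();
    int ok = parseConstantPoolChecked(malformed);
    /* cleanup runs whether or not parsing succeeded */
    freeBuffer(buf);
    unlockClassTable();
    return ok;
}
```

Without `finally', the throwing style would need a handler in every frame that holds a resource, which is exactly the bookkeeping C gives no help with.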

So the current direction is in essence a loss of elegance, which (hopefully)
is offset by gains in functionality, correctness and performance.

Clearly, the usual trade-off between the fragility of explicit memory management
and the costs of automatic memory management applies. If you look
at functions such as destroyClass (which is only half finished), you'll
see how tedious explicit management is.

In the case of utf8s, we were able to reduce the fragility of explicit
memory management a great deal by following the rules of COM reference counting.
Wilson's 1992 survey paper on uniprocessor gc techniques even counts ref
counting as a technique for automatic memory management. Its main drawbacks
are the inability to cope with cycles and the cost of updating ref counts
frequently. I believe that in the case of utf8s, however, the ref counts
are not updated frequently, and cycles are not an issue. This is why I
believe it was definitely the way to go in this case.
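
A minimal sketch of how the COM-style rules might apply to utf8 constants. `Utf8Const`, `utf8ConstNew`, `utf8ConstAddRef`, `utf8ConstRelease`, `utf8ConstAssign` and the `liveUtf8Consts` counter are illustrative names, not necessarily what the Kaffe tree uses, and interning of constants in a hash table is left out. The rules are: creation returns one reference, every stored copy of the pointer takes a reference, and dropping a copy releases it. Because a utf8 constant contains no pointers to other objects, it can never be part of a cycle, so ref counting's main weakness does not apply.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted UTF-8 constant following COM's AddRef/Release rules. */
typedef struct Utf8Const {
    int refCount;
    char data[];                 /* NUL-terminated UTF-8 bytes */
} Utf8Const;

static int liveUtf8Consts = 0;   /* for illustration only: constants not yet freed */

/* Creation hands the caller one reference. */
static Utf8Const *utf8ConstNew(const char *s) {
    size_t len = strlen(s);
    Utf8Const *c = malloc(sizeof *c + len + 1);
    if (c == NULL)
        return NULL;
    c->refCount = 1;
    memcpy(c->data, s, len + 1);
    liveUtf8Consts++;
    return c;
}

/* Whoever stores the pointer (class, field or method table) takes a reference. */
static Utf8Const *utf8ConstAddRef(Utf8Const *c) {
    c->refCount++;
    return c;
}

/* Dropping a stored pointer releases its reference; the last release frees it. */
static void utf8ConstRelease(Utf8Const *c) {
    assert(c->refCount > 0);
    if (--c->refCount == 0) {
        free(c);
        liveUtf8Consts--;
    }
}

/* Overwriting a stored pointer: take the new reference before dropping the old. */
static void utf8ConstAssign(Utf8Const **slot, Utf8Const *c) {
    if (c != NULL)
        utf8ConstAddRef(c);
    if (*slot != NULL)
        utf8ConstRelease(*slot);
    *slot = c;
}
```

Taking the new reference before releasing the old one keeps `utf8ConstAssign(&slot, slot)` safe even when the slot holds the last reference.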

As an aside, somebody reported success running Apache's JServ with
-mx 5m for extended periods of time without exploding; this is a good
indication that class gc is working (and this was even before the utf8
fixes).

>
> Anyway, related to the GC issue - how exactly do other GC systems handle
> the walking of known data structures? I'm sure there must be a number of
> free ones out there that we could look at before picking an interface, and
> we should try to avoid picking one which prohibits the use of specific
> algorithms or implementations if possible. The one that usually springs
> to mind at this point is the Bohm-Weiner collector (or possibly not, I can
> never remember the name). I've toyed with the idea of using it with Kaffe
> for at least 12 months but haven't found the time to do it. It's used in
> Toba, I think, and shows significant speed improvements over Kaffe's GC.
>

I'll ask Jason some more about Boehm's collector. I'll look at it some
myself, too. Clearly, it always springs to mind first. But I do have a
gut feeling that it may not necessarily be the way to go, for the following
reasons:

+ with Jason's new allocation scheme and the ongoing work to make the heap
  walking precise, we've improved a lot. Hence, speed improvements
  may no longer be as significant. Plus, if we really wanted to get into
  machine-specific optimizations, there's plenty to do. For instance,
  I bet that using asm versions of "find first bit" would speed up walkObject.
  I had already mentioned that not using memset for zeroing out objects
  sped up allocation on the x86. I believe that we can make Kaffe's GC
  as fast as Boehm's, frankly.

+ Boehm's collector is just that: a conservative collector.
  My long term vision is a precise collector, including precise scanning of
  the stacks. Because Boehm's is a conservative collector, it is conceivable
  that it makes certain compromises where the speed of pointer identification
  is concerned. We do not have to make such compromises.
  
+ I think that pretty much all groups that used it had to tweak it quite a bit:
  Toba did, and so did gcj. I think some groups did so only for one or two
  architectures. I would not be surprised if Kaffe's gc is more portable
  by comparison. If we have to tweak Boehm's too much, it would defeat the
  purpose to a certain extent.

+ We understand Kaffe's gc already; we do not understand Boehm's yet.
  This has implications for development and debugging. In particular, we have
  full control over the state changes of objects between finalization and
  destruction. We can easily tweak it, just as we've been doing for
  class gc. I think that acquiring some proficiency with Boehm's would
  take some time. In addition, the next steps will be to implement weak,
  soft, phantom and guard references. Will this be possible in Boehm's
  collector? How much tweaking will be necessary? This seems
  easier to do when you fully understand and control the collector.

  I also remember that Toba actually had some problems with finalizers;
  I'll see if I can find out what those were and whether they were related
  to the collector.
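
The first two points can be illustrated together with a sketch of precise object walking. It assumes a hypothetical layout in which each object's reference slots are described by a bitmap; `Object`, `walkObjectSketch`, `lowestBit` and `countRef` are illustrative names, not Kaffe's walkObject. The loop visits exactly the slots marked as references, so no heuristic pointer identification is needed, and each step is a "find first bit" operation. On GCC and Clang, `__builtin_ctz` usually compiles to a single bsf/tzcnt instruction on x86; the fallback loop is the portable version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object layout: up to 32 word-sized slots plus a bitmap in
 * which bit i set means slot i holds a reference. In a real VM the bitmap
 * would be computed once per class, not stored in every object. */
typedef struct Object {
    uint32_t refMap;
    void *slots[32];
} Object;

typedef void (*MarkFn)(void *ref, void *ctx);

/* "Find first bit": index of the lowest set bit of a nonzero word. */
static int lowestBit(uint32_t x) {
    assert(x != 0);
#if defined(__GNUC__)
    return __builtin_ctz(x);     /* GCC/Clang builtin */
#else
    int i = 0;
    while ((x & 1u) == 0) {      /* portable fallback: shift until a 1 appears */
        x >>= 1;
        i++;
    }
    return i;
#endif
}

/* Visit exactly the slots the layout marks as references; no guessing
 * whether an arbitrary word "looks like" a heap pointer. */
static void walkObjectSketch(Object *obj, MarkFn mark, void *ctx) {
    uint32_t bits = obj->refMap;
    while (bits != 0) {
        int i = lowestBit(bits);
        if (obj->slots[i] != NULL)
            mark(obj->slots[i], ctx);
        bits &= bits - 1;        /* clear the lowest set bit */
    }
}

/* Example MarkFn: counts the non-null references visited. */
static void countRef(void *ref, void *ctx) {
    (void)ref;
    (*(int *)ctx)++;
}
```

A word that happens to hold a heap address but is not marked in the bitmap (an integer field, say) is never treated as a pointer, which is exactly the guess a conservative collector has to make.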

I can certainly be convinced otherwise; but these are some of the issues
at which we should look.

        - Godmar



This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 19:57:28 EDT