reference counting for utf8consts


From: Godmar Back (gback@cs.utah.edu)
Date: Sun Dec 20 1998 - 00:38:42 EST


>
> I'm working on intern'ing UTF-8 constants. So any two Utf8Const *
> pointers will actually be equal if they have the same contents.
>
> The question I have now is how to remove Utf8Const's from the
> hashtable when all of their references have been garbage collected
> away?
>
> Is it correct that I just need to write a special finalizer
> "finalizeUtf8Const" in gc-incremental.c that removes the string
> from the hashtable?
>

I would say so, but I would like to bring a different solution into
the discussion.

Relying on the gc to mark utf8s assumes that we will either stick to
a conservative collector, or gc_add_ref and gc_rm_ref references
when necessary.

To understand that, consider a statement such as

        Utf8Const* c = makeUtf8Const(name, len);
-- (a) -->
        f->field = c;
        c = 0;
-- (b) -->

At (a), the utf8 object is only pinned down because we scan the C stack,
find the reference to the object in c, and mark it.
If c is stored in, say, a Field.name field --- then it is this Field
that keeps the utf8 alive at (b). When the Field is walked, the utf8 is
marked. In cases where the utf8 is not reached by a walk --- entries in
the class entry pool are an example --- we must pin it down with
gc_add_ref, and gc_rm_ref it when the entry gets destroyed.

My proposal is to use reference counting for utf8consts instead.
"makeUtf8Const" either creates an entry with ref count 1, or it
increases the ref count if one exists. "releaseRefUtf8Const" will
decrement the count, freeing it if it reaches zero.
"addRefUtf8Const" will add a reference.
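
To make the proposal concrete, here is a minimal sketch of what such an
interned, reference-counted table could look like. The struct layout, the
bucket count, the hash function and the (s, len) signature are assumptions
made up for illustration; this is not Kaffe's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout; the real Utf8Const may look different. */
typedef struct Utf8Const {
    struct Utf8Const *next;   /* hash-bucket chain */
    unsigned refs;            /* reference count */
    size_t len;
    char data[1];             /* len bytes plus a terminating NUL */
} Utf8Const;

#define UTF8_BUCKETS 64       /* arbitrary size for the sketch */
static Utf8Const *utf8_table[UTF8_BUCKETS];

static unsigned utf8_hash(const char *s, size_t len)
{
    unsigned h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31u + (unsigned char)s[i];
    return h % UTF8_BUCKETS;
}

/* Return the interned constant: bump the count if it already exists,
 * otherwise create it with count 1. Returns NULL if out of memory. */
Utf8Const *makeUtf8Const(const char *s, size_t len)
{
    unsigned b = utf8_hash(s, len);
    for (Utf8Const *u = utf8_table[b]; u != NULL; u = u->next) {
        if (u->len == len && memcmp(u->data, s, len) == 0) {
            u->refs++;
            return u;
        }
    }
    Utf8Const *u = malloc(sizeof(*u) + len);
    if (u == NULL)
        return NULL;
    u->refs = 1;
    u->len = len;
    memcpy(u->data, s, len);
    u->data[len] = '\0';
    u->next = utf8_table[b];
    utf8_table[b] = u;
    return u;
}

void addRefUtf8Const(Utf8Const *u)
{
    assert(u->refs > 0);
    u->refs++;
}

/* Drop one reference; unlink from the table and free on the last one. */
void releaseRefUtf8Const(Utf8Const *u)
{
    assert(u->refs > 0);
    if (--u->refs > 0)
        return;
    Utf8Const **p = &utf8_table[utf8_hash(u->data, u->len)];
    while (*p != u)
        p = &(*p)->next;
    *p = u->next;
    free(u);
}
```

Because the entry is unlinked in releaseRefUtf8Const itself, no finalizer
is needed to clean up the hashtable in this variant.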

I propose to use Microsoft's COM reference counting conventions.
It's easy and one of the few technically sound things MS has produced.

It does require some rewriting. Let's look at an example:

This statement in function `classFromSig':

    return (loadClass(makeUtf8Const(start, end-start), loader, einfo));

would have to be rewritten as:

    uc = makeUtf8Const(start, end-start);
    clazz = loadClass(uc, loader, einfo);
    releaseRefUtf8Const(uc);
    return (clazz);

Several things to note here:

+ If loadClass does not invoke addRefUtf8Const because it does not need
  to hold onto the utf8 -- say because all it does is to perform
  equalsUtf8Const on it, then the following releaseRefUtf8Const will
  immediately release the constant and reclaim the memory. Hence, the
  garbage collector will never have to worry about marking this utf8const,
  reducing gc pressure.

+ Clearly, just like malloc and free, reference counting is subject to
  mistakes. If you release a reference you don't own, dangling pointers can
  result. In general, this is considered error prone --- one of the reasons
  why Modula 3 and Lisp and Java came into this world. This, however, is where
  MS's COM rules come in: they allow you to instantly check that you do not
  wrongly add or release references.

  For instance, in this example, the rule says that a callee that wants
  to keep a pointer must acquire its own reference. Therefore, there's no
  question as to whether loadClass should release the reference on its
  first argument. It should not. If loadClass needs to hold on to the
  pointer, it must add a reference.

  Similarly, if you call a function with an argument of type Utf8Const**,
  you will get back a reference you must release. Ditto for return values.
  If you drop a pointer, you must release the reference. All these
  properties can be checked *locally*, without knowing any kind of specific
  contract between caller and callee, as is often the case in libc functions
  (for instance, the caller of strdup must free the result, the caller
  of getpwnam must not etc.)
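
The three rules above can be sketched in a few lines. Everything here
except the names addRefUtf8Const and releaseRefUtf8Const is hypothetical:
the simplified Utf8Const, the Entry holder, newUtf8, and the utf8_live
counter exist only to illustrate the convention.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for a counted utf8; not the real type. */
typedef struct Utf8Const { unsigned refs; char *data; } Utf8Const;

static int utf8_live;   /* live objects, so leaks are easy to spot */

/* Return value: the caller receives a reference it must release. */
static Utf8Const *newUtf8(const char *s)
{
    Utf8Const *u = malloc(sizeof(*u));
    if (u == NULL)
        abort();
    u->data = malloc(strlen(s) + 1);
    if (u->data == NULL)
        abort();
    strcpy(u->data, s);
    u->refs = 1;
    utf8_live++;
    return u;
}

static void addRefUtf8Const(Utf8Const *u) { u->refs++; }

static void releaseRefUtf8Const(Utf8Const *u)
{
    assert(u->refs > 0);
    if (--u->refs == 0) {
        free(u->data);
        free(u);
        utf8_live--;
    }
}

typedef struct Entry { Utf8Const *name; } Entry;   /* hypothetical holder */

/* Borrowed argument: the callee only looks, so it touches no count. */
static int nameEquals(const Entry *e, const Utf8Const *name)
{
    return strcmp(e->name->data, name->data) == 0;
}

/* The callee keeps the pointer, so it adds its own reference.
 * Adding before releasing keeps setName(e, e->name) safe. */
static void setName(Entry *e, Utf8Const *name)
{
    addRefUtf8Const(name);
    if (e->name != NULL)
        releaseRefUtf8Const(e->name);   /* dropping a pointer releases it */
    e->name = name;
}

/* Out-parameter: the caller gets back a reference it must release. */
static void getName(const Entry *e, Utf8Const **out)
{
    addRefUtf8Const(e->name);
    *out = e->name;
}
```

Each function can be checked on its own: whoever stores a pointer adds a
reference, whoever drops one releases it, and nothing else changes a count.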

However, doing explicit memory management in this way will require us
to release all references manually. Consider again Field*. Its type
field stores the utf8 of the unresolved type name first. At some
point, this is overwritten with the resolved type. It is at this point
that the releaseRefUtf8Const must be inserted. The Field.name is also
a utf8const. This utf8 would have to be released in the finalizer of
Field. (*)
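
A rough sketch of where those two releases would go. The Field layout
(a union plus a resolved flag), the Class stand-in, and the names
resolveFieldType, finalizeField, newUtf8 and utf8_live are all
assumptions for illustration; the real Field is laid out differently.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Utf8Const { unsigned refs; } Utf8Const;   /* stand-in */
typedef struct Class { int dummy; } Class;               /* stand-in */

static int utf8_live;   /* live utf8 objects, to check for leaks */

static Utf8Const *newUtf8(void)
{
    Utf8Const *u = malloc(sizeof(*u));
    if (u == NULL)
        abort();
    u->refs = 1;
    utf8_live++;
    return u;
}

static void releaseRefUtf8Const(Utf8Const *u)
{
    assert(u->refs > 0);
    if (--u->refs == 0) {
        free(u);
        utf8_live--;
    }
}

/* Hypothetical Field: the type slot holds the unresolved type name
 * first and the resolved class later. */
typedef struct Field {
    Utf8Const *name;
    union { Utf8Const *sig; Class *clazz; } type;
    int resolved;
} Field;

/* Overwriting the type drops the pointer to the utf8, so this is
 * where its reference has to be released. */
void resolveFieldType(Field *f, Class *clazz)
{
    if (f->resolved)
        return;
    Utf8Const *sig = f->type.sig;
    f->type.clazz = clazz;
    f->resolved = 1;
    releaseRefUtf8Const(sig);
}

/* Finalizer: release whatever utf8 references the Field still owns. */
void finalizeField(Field *f)
{
    releaseRefUtf8Const(f->name);
    f->name = NULL;
    if (!f->resolved) {
        releaseRefUtf8Const(f->type.sig);
        f->type.sig = NULL;
    }
}
```

Note that the finalizer has to cover the case where the type was never
resolved, otherwise the type name would leak.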

As my implementation of class garbage collection has shown, we still
don't fully understand just where things are stored and how. However,
I really think finding this all out is a doable task. I believe being
able to explicitly manage utf8's is very much worthwhile and should
greatly improve both memory usage and overall performance.

        - Godmar

(*) About Fields: Alexandre pointed out that there's no way to find
how many entries in the Field block are actually used, because you
can't find the class from Field (unlike from Method).
I think the answer to this problem is to
    a) eliminate the Field allocation type.
    b) allocate Field as fixed memory.
    c) walk them in walkClass
    d) free them in the class's destructor.
ditto for methods* and if2table, interfaces, itable2dtable, gc_layout
and most likely for dtable also.

Here are possible complications:
+ JNI code may hold Field* pointers in the Field array as jfieldID.
  What do the JNI rules say about that?

+ Ditto for java.lang.reflect.Field or java.lang.reflect.Method
  We will have to find solutions for these cases --- one such solution
  could be to add a pointer to the class to which the field belongs.
  (btw, this might already be broken at this moment.)



This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 19:57:25 EDT