internalization


From: Godmar Back (gback@cs.utah.edu)
Date: Fri Dec 18 1998 - 18:16:45 EST


 Archie,

there is one fundamental reason to oppose your string internalization
project. I think this is the reason that Java's designers made
string internalization optional even though Java strings were defined
to be immutable.

The reason is that you pay an overhead at creation time: you must
look up the string, which involves grabbing a lock (*), walking a
hashtable, comparing the contents, etc. Plus, you'll have to remove
the entries from the table upon deallocation.
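To make those costs concrete, here is a minimal C sketch of an intern-on-create table, assuming a single chained hashtable guarded by one pthread mutex. The names istr, istr_create, istr_free, and NBUCKETS are hypothetical illustrations, not what Kaffe's string.c actually does. Every creation takes the lock, hashes, walks a chain, and compares contents; freeing takes the lock and walks the chain again.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned-string entry; not Kaffe's actual layout. */
typedef struct istr {
    struct istr *next; /* hash-chain link */
    size_t len;        /* length of chars in bytes */
    char *chars;       /* contents; allocated only on a table miss */
} istr;

#define NBUCKETS 256

static istr *buckets[NBUCKETS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* djb2-style hash over the contents, reduced to a bucket index. */
static size_t bucket_of(const char *s, size_t len) {
    size_t h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)s[i];
    return h % NBUCKETS;
}

/* Cost paid on every creation: lock, hash, walk the chain, compare contents. */
istr *istr_create(const char *s, size_t len) {
    size_t b = bucket_of(s, len);
    pthread_mutex_lock(&table_lock);
    for (istr *e = buckets[b]; e != NULL; e = e->next) {
        if (e->len == len && memcmp(e->chars, s, len) == 0) {
            pthread_mutex_unlock(&table_lock);
            return e; /* hit: the only saving is the skipped allocation */
        }
    }
    istr *e = malloc(sizeof *e);
    char *chars = malloc(len > 0 ? len : 1);
    if (e == NULL || chars == NULL) {
        free(e);
        free(chars);
        pthread_mutex_unlock(&table_lock);
        return NULL;
    }
    memcpy(chars, s, len);
    e->chars = chars;
    e->len = len;
    e->next = buckets[b];
    buckets[b] = e;
    pthread_mutex_unlock(&table_lock);
    return e;
}

/* Deallocation pays again: the entry must be unlinked under the lock.
   No reference counting here; call it only once the string is
   unreachable, as a garbage collector's finalizer would. */
void istr_free(istr *victim) {
    size_t b = bucket_of(victim->chars, victim->len);
    pthread_mutex_lock(&table_lock);
    for (istr **p = &buckets[b]; *p != NULL; p = &(*p)->next) {
        if (*p == victim) {
            *p = victim->next;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    free(victim->chars);
    free(victim);
}
```

A single global lock is the simplest design, but it also serializes all string creation across threads, which is exactly the overhead in question.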

You will get a payback for this overhead if and only if you either
intern that string or perform equals on it. Without traces, I cannot
tell you whether that will be a win for most string creations, but
there is reason to doubt it.
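The payback side can be sketched the same way. Once strings are known to be interned, equality reduces to a pointer comparison; otherwise equals needs a length check and a byte-by-byte comparison. The vmstr type and both functions below are hypothetical illustrations, not Kaffe code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string representation for illustration only. */
typedef struct {
    size_t len;
    const char *chars;
} vmstr;

/* General equals: identity short-circuits, else compare contents. */
bool vmstr_equals(const vmstr *a, const vmstr *b) {
    if (a == b)
        return true; /* same object, e.g. both interned */
    if (a->len != b->len)
        return false;
    return memcmp(a->chars, b->chars, a->len) == 0;
}

/* Only valid when both arguments are interned: identity decides equality. */
bool vmstr_interned_equals(const vmstr *a, const vmstr *b) {
    return a == b;
}
```

The saving is real only if equals (or explicit interning) actually happens often enough to amortize the lookup paid at every creation.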

The only thing that you would save at creation time is the actual
allocation of memory (the char[] contents array), and only if you've
already found an entry in the table of interned strings. Again, this
is probably not the common case.

From this perspective, you might want to rethink the idea.
Certainly, for applications that create a lot of strings with the
same content (say, compilers) and need to call equals fairly often,
interning them is a win. However, that is what the explicit
String.intern is for.

As for utf8, the same trade-off applies. What would be needed is a
heuristic for deciding which utf8 strings to intern. For instance,
intern all utf8 strings that start with "java", such as
"java/lang/Object".
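A prefix heuristic like that might look like this in C. The function name should_intern_utf8 and the choice to match raw bytes against "java" are assumptions for illustration; a loader could consult it before routing a utf8 string through the intern table.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical policy: intern only utf8 strings with a "java" prefix,
   on the guess that names like "java/lang/Object" recur across classes. */
static const char INTERN_PREFIX[] = "java";

bool should_intern_utf8(const char *utf8, size_t len) {
    size_t plen = sizeof INTERN_PREFIX - 1; /* exclude the terminating NUL */
    return len >= plen && memcmp(utf8, INTERN_PREFIX, plen) == 0;
}
```

Note that a bare "java" prefix also matches names such as "javax/swing/JFrame"; whether that is desirable is a policy choice.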

        - Godmar

(*) Do we even do that, or is string.c broken here?
String.intern is not synchronized...



This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 19:57:23 EDT