collapse Blogs I Read
collapse Table of Contents

Comparing Java and C# Generics - Jonathan Pryor's web log

« Problems with Traditional Object Oriented Ideas | Main | Yet Another Random Update... »

Comparing Java and C# Generics

Or, What's Wrong With Java Generics?

What Are Generics

Java 5.0 and C# 2.0 have both added Generics, which permit a multitude of things:

  1. Improved compiler-assisted checking of types.
  2. Removal of casts from source code (due to (1)).
  3. In C#, performance advantages (discussed later).

This allows you to replace the error-prone Java code:

List list = new ArrayList ();
list.add ("foo");
list.add (new Integer (42));  // added by "mistake"

for (Iterator i = list.iterator (); i.hasNext (); ) {
    String s = (String) i.next (); 
       // ClassCastException for Integer -> String
    // work on `s'
    System.out.println (s);
}

with the compiler-checked code:

// constructed generic type
List<String> list = new ArrayList<String> ();
list.add ("foo");
list.add (42); // error: cannot find symbol: method add(int)
for (String s : list)
    System.out.println (s);

The C# equivalent code is nigh identical:

IList<string> list = new List<string> ();
list.Add ("foo");
list.Add (42); // error CS1503: Cannot convert from `int' to `string'
foreach (string s in list)
    Console.WriteLine (s);

Terminology

A Generic Type is a type (classes and interfaces in Java and C#, as well as delegates and structs in C#) that accepts Generic Type Parameters. A Constructed Generic Type is a Generic Type with Generic Type Arguments, which are Types to actually use in place of the Generic Type Parameters within the context of the Generic Type.

For simple generic types, Java and C# have identical syntax for declaring and using Generic Types:

class GenericClass<TypeParameter1, TypeParameter2>
{
    public static void Demo ()
    {
        GenericClass<String, Object> c = 
            new GenericClass<String, Object> ();
    }
}

In the above, GenericClass is a Generic Type, TypeParameter1 and TypeParameter2 are Generic Type Parameters for GenericClass, and GenericClass<String, Object> is a Constructed Generic Type with String as a Generic Type Argument for the TypeParameter1 Generic Type Parameter, and Object as the Generic Type Argument for the TypeParameter2 Generic Type Parameter.

It is an error in C# to create a Generic Type without providing any Type Arguments. Java permits creating Generic Types without providing any Type Arguments; these are called raw types:

Map rawMap = new HashMap <String, String> ();

Java also permits you to leave out Generic Type Arguments from the right-hand-side. Both raw types and skipping Generic Type Arguments elicit a compiler warning:

Map<String, String> correct = new HashMap<String, String> ();
    // no warning, lhs matches rhs
Map<String, String> incorrect = new HashMap ();
    // lhs doesn't match rhs; generates the warning:
    //  Note: gen.java uses unchecked or unsafe operations.
    //  Note: Recompile with -Xlint:unchecked for details.

Compiling the above Java code with -Xlint:unchecked produces:

gen.java:9: warning: [unchecked] unchecked conversion
found   : java.util.HashMap
required: java.util.Map<java.lang.String,java.lang.String>
                Map<String, String> incorrect = new HashMap ();

Note that all "suspicious" code produces warnings, not errors, under Java. Only provably wrong code generate compiler errors (such as adding an Integer to a List<String>).

(Also note that "suspicious" code includes Java <= 1.4-style use of collections, i.e. all collections code that predates Java 5.0. This means that you get lots of warnings when migrating Java <= 1.4 code to Java 5.0 and specifying -Xlint:unchecked.)

Aside from the new use of `<', `>', and type names within constructed generic type names, the use of generic types is essentially identical to the use of non-generic types, though Java has some extra flexibility when declaring variables.

Java has one added wrinkle as well: static methods of generic classes cannot reference the type parameters of their enclosing generic class. C# does not have this limitation:

class GenericClass<T> {
    public static void UseGenericParameter (T t) {}
        // error: non-static class T cannot be 
        // referenced from a static context
}

class Usage {
    public static void UseStaticMethod () {
        // Valid C#, not valid Java
        GenericClass<int>.UseGenericParameter (42);
    }
}

Generic Methods

Java and C# both support generic methods, in which a (static or instance) method itself accepts generic type parameters, though they differ in where the generic type parameters are declared. Java places the generic type parameters before the method return type:

class NonGenericClass {
    public static <T> T max (T a, T b) {/*...*/}
}

while C# places them after the method name:

class NonGenericClass {
    static T Max<T> (T a, T b) {/*...*/}
}

Generic methods may exist on both generic- and non-generic classes and interfaces.

Constraints

What can you do with those Generic Type Parameters within the class or method body? Not much:

class GenericJavaClass<T> {
    T[] arrayMember  = null;
    T   singleMember = null;

    public static void Demo ()
    {
        T localVariable = 42; // error
        T localVariable2 = null;

        AcceptGenericTypeParameter (localVariable);
    }

    public static void AcceptGenericTypeParameter (T t)
    {
        System.out.println (t.toString ()); // ok
        System.out.println (t.intValue ()); 
            // error: cannot find symbol
    }
}

class GenericCSharpClass<T> {
    T[] arrayMember  = null;
    T   singleMember = default(T);

    public static void Demo ()
    {
        T localVariable  = 42; // error
        T localVariable2 = default(T);

        AcceptGenericTypeParameter (localVariable);
    }

    public static void AcceptGenericTypeParameter (T t)
    {
        System.out.println (t.ToString ());     // ok
        System.out.println (t.GetTypeCode ());  // error: cannot find symbol
    }
}

So how do we call non-Object methods on objects of a generic type parameter?

  1. Cast the variable to a type that has the method you want (and accept the potentially resulting cast-related exceptions).
  2. Place a constraint on the generic type parameter. A constraint is a compile-time assertion that the generic type argument will fulfill certain obligations. Such obligations include the base class of the generic type argument, any implemented interfaces of the generic type argument, and (in C#) whether the generic type argument's type has a default constructor, is a value type, or a reference type.

Java Type Constraints

Java type and method constraints are specified using a "mini expression language" within the `<' and `>' declaring the generic type parameters. For each type parameter that has constraints, the syntax is:

TypeParameter ListOfConstraints

Where ListOfConstraints is a `&'-separated list of one of the following constraints:

(`&' must be used instead of `,' because `,' separates each generic type parameter.)

The above constraints also apply to methods, and methods can use some additional constraints described below.

class GenericClass<T extends Number & Comparable<T>> {
    void print (T t) {
        System.out.println (t.intValue ()); // OK
    }
}

class Demo {
    static <U, T extends U>
    void copy (List<T> source, List<U> dest) {
        for (T t : source)
            dest.add (t);
    }

    static void main (String[] args) {
        new GenericClass<Integer>().print (42);
            // OK: Integer extends Number
        new GenericClass<Double>().print (3.14159);
            // OK: Double extends Number
        new GenericClass<String>().print ("string");
            // error: <T>print(T) in gen cannot be applied 
            //  to (java.lang.String)

        ArrayList<Integer> ints = new ArrayList<Integer> ();
        Collections.addAll (ints, 1, 2, 3);
        copy (ints, new ArrayList<Object> ());
            // OK; Integer inherits from Object
        copy (ints, new ArrayList<String> ());
            // error: <U,T>copy(java.util.List<T>,
            //  java.util.List<U>) in cv cannot be 
            //  applied to (java.util.ArrayList<java.lang.Integer>,
            //  java.util.ArrayList<java.lang.String>)
    }
}

C# Constraints

C# generic type parameter constraints are specified with the context-sensitive where keyword, which is placed after the class name or after the method's closing `)'. For each type parameter that has constraints, the syntax is:

where TypeParameter : ListOfConstraints

Where ListOfConstraints is a comma-separated list of one of the following constraints:

class GenericClass<T> : IComparable<GenericClass<T>>
    where T : IComparable<T>
{
    private GenericClass () {}

    void Print (T t)
    {
        Console.WriteLine (t.CompareTo (t));
            // OK; T must implement IComparable<T>
    }

    public int CompareTo (GenericClass<T> other)
    {
        return 0;
    }
}

class Demo {
    static void OnlyValueTypes<T> (T t) 
        where T : struct
    {
    }

    static void OnlyReferenceTypes<T> (T t) 
        where T : class
    {
    }

    static void Copy<T, U> (IEnumerable<T> source, 
            ICollection<U> dest)
        where T : U, IComparable<T>, new()
        where U : new()
    {
        foreach (T t in source)
            dest.Add (t);
    }

    static T CreateInstance<T> () where T : new()
    {
        return new T();
    }

    public static void Main (String[] args)
    {
        new GenericClass<int>.Print (42);
            // OK: Int32 implements IComparable<int>
        new GenericClass<double>.Print (3.14159);
            // OK: Double implements IComparable<double>
        new GenericClass<TimeZone>.Print (
            TimeZone.CurrentTimeZone);
            // error: TimeZone doesn't implement 
            //  IComparable<TimeZone>

        OnlyValueTypes (42);    // OK: int is a struct
        OnlyValueTypes ("42");
            // error: string is a reference type

        OnlyReferenceTypes (42);
            // error: int is a struct
        OnlyReferenceTypes ("42");  // OK

        CreateInstance<int> ();
            // OK; int has default constructor
        CreateInstance<GenericClass<int>> ();
            // error CS0310: The type `GenericClass<int>' 
            //  must have a public parameterless constructor
            //  in order to use it as parameter `T' in the 
            //  generic type or method 
            //  `Test.CreateInstance<T>()'

        // In theory, you could do `Copy' instead of 
        // `Copy<...>' below, but it depends on the 
        // type inferencing capabilities of your compiler.
        Copy<int,object> (new int[]{1, 2, 3}, 
            new List<object> ());
            // OK: implicit int -> object conversion exists.
        Copy<int,AppDomain> (new int[]{1, 2, 3}, 
            new List<AppDomain> ());
            // error CS0309: The type `int' must be 
            //  convertible to `System.AppDomain' in order 
            //  to use it as parameter `T' in the generic 
            //  type or method `Test.Copy<T,U>(
            //      System.Collections.Generic.IEnumerable<T>, 
            //      System.Collections.Generic.ICollection<U>)'
    }
}

Java Wildcards (Java Method Constraints)

Java has additional support for covarient- and contravariant generic types on method declarations.

By default, you cannot assign an instance of one constructed generic type to an instance of another generic type where the generic type arguments differ:

// Java, though s/ArrayList/List/ for C#
List<String> stringList = new ArrayList<String> ();
List<Object> objectList = stringList; // error

The reason for this is quite obvious with a little thought: if the above were permitted, you could violate the type system:

// Assume above...
stringList.add ("a string");
objectList.add (new Object ());
// and now `stringList' contains a non-String object!

This way leads madness and ClassCastExceptions. :-)

However, sometimes you want the flexibility of having different generic type arguments:

static void cat (Collection<Reader> sources) throws IOException {
    for (Reader r : sources) {
        int c;
        while ((c = r.read()) != -1)
            System.out.print ((char) c);
    }
}

Many types implement Reader, e.g. StringReader and FileReader, so we might want to do this:

Collection<StringReader> sources = 
    new Collection<StringReader> ();
Collections.addAll (sources, 
    new StringReader ("foo"), 
    new StringReader ("bar"));
cat (sources);
// error: cat(java.util.Collection<java.io.Reader>) 
//  in gen cannot be applied to 
//  (java.util.Collection<java.io.StringReader>)

There are two ways to make this work:

  1. use Collection<Reader> instead of Collection<StringReader>:
    Collection<Reader> sources = new Collection<Reader> ();
    Collections.addAll (sources, 
        new StringReader ("foo"), 
        new StringReader ("bar"));
    cat (sources);
  2. Use wildcards.

Unbounded Wildcards

If you don't care about the specific generic type arguments involved, you can use `?' as the type parameter. This is an unbounded wildcard, because the `?' can represent anything:

static void printAll (Collection<?> c) {
    for (Object o : c)
        System.out.println (o);
}

The primary utility of unbounded wildcards is to migrate pre-Java 5.0 collection uses to Java 5.0 collections (thus removing the probably thousands of warnings -Xlint:unchecked produces) in the easiest manner.

This obviously won't help for cat (above), but it's also possible to "bind" the wildcard, to create a bounded wildcard.

Bounded Wildcards

You create a bounded wildcard by binding an upper- or lower- bound to an unbounded wildcard. Upper bounds are specified via extends, while lower bounds are specified via super. Thus, to allow a Collection parameter that accepts Reader instances or any type that derives from Reader:

static void cat (Collection<? extends Reader> c) 
    throws IOException
{
    /* as before */
}

This permits the more desirable use:

Collection<StringReader> sources = 
    new Collection<StringReader> ();
Collections.addAll (sources, 
    new StringReader ("foo"), 
    new StringReader ("bar"));
cat (sources);

Bounded wildcards also allow you to reduce the number of generic parameters you might otherwise want/need a generic method; compare this Demo.copy to the previous Java Demo.copy implementation:

class Demo {
    static <T> void copy (List<? extends T> source, 
        List<? super T> dest)
    {
        for (T t : source)
            dest.add (t);
    }
}

C# Equivalents

C# has no direct support for bounded or unbounded wildcards, and thus doesn't permit declaring class- or method-level variables that make use of them. However, if you can make the class/method itself generic, you can create equivalent functionality.

A Java method taking an unbounded wildcard would be mapped to a generic C# method with one generic type parameter for each unbound variable within the Java method:

static void PrintAll<T> (IEnumerable<T> list) {
    foreach (T t in list) {
        Console.WriteLine (t);
    }
}

This permits working with any type of IEnumerable<T>, e.g. List<int> and List<string>.

A Java method taking an upper bounded wildcard can be mapped to a generic C# method with one generic type parameter for each bound variable, then using a derivation constraint on the type parameter:

static void Cat<T> (IList<T> sources)
    where T : Stream
{
    for (Stream s : sources) {
        int c;
        while ((c = r.ReadByte ()) != -1)
            Console.Write ((char) c);
    }
}

A Java method taking a lower bounded wildcard can be mapped to a generic C# method taking two generic type parameters for each bound variable (one is the actual type you care about, and the other is the super type), then using a derivation constraint between your type variables:

static void Copy<T,U> (IEnumerable<T> source, 
        ICollection<U> dest)
    where T : U
{
    foreach (T t in source)
        dest.Add (t);
}

Generics Implementation

How Java and C# implement generics has a significant impact on what generics code can do and what can be done at runtime.

Java Implementation

Java Generics were originally designed so that the .class file format wouldn't need to be changed. This would have meant that Generics-using code could run unchanged on JDK 1.4.0 and earlier JDK versions.

However, the .class file format had to change anyway (for example, generics permits you to overload methods based solely on return type), but they didn't revisit the design of Java Generics, so Java Generics remains a compile-time feature based on Type Erasure.

Sadly, you need to know what type erasure is in order to actually write much generics code.

With Type Erasure, the compiler transforms your code in the following manner:

  1. Generic types (classes and interfaces) retain the same name, so you cannot have a generic class Foo and a non-generic Foo<T> in the same package -- these are the same type. This is the raw type.
  2. All instances of generic types become their corresponding raw type. So a List<String> becomes a List. (Thus all "nested" uses of generic type parameters -- in which the generic type parameter is used as a generic type argument of another generic type -- are "erased".)
  3. All instances of generic type parameters in both class and method scope become instances of their closest matching type:
    • If the generic type parameter has an extends constraint, then instances of the generic type parameter become instances of the specified type.
    • Otherwise, java.lang.Object is used.
  4. Generic methods also retain the same name, and thus there cannot be any overloading of methods between those using generic type parameters (after the above translations have occurred) and methods not using generic type parameters (see below for example).
  5. Runtime casts are inserted by the compiler to ensure that the runtime types are what you think they are. This means that there is runtime casting that you cannot see (the compiler inserts the casts), and thus generics confer no performance benefit over non-generics code.

For example, the following generics class:

class GenericClass<T, U extends Number> {
    T tMember;
    U uMember;

    public T getFirst (List<T> list) {
        return list.get (0);
    }

    // in bytecode, this is overloading based on return type
    public U getFirst (List<U> list) {
        return list.get (0);
    }

    //
    // This would be an error -- doesn't use generic type parameters
    // and has same raw argument list as above two methods:
    // 
    //  public Object getFirst (List list) {
    //      return list.get (0);
    //  }
    //

    public void printAll (List<U> list) {
        for (U u : list) {
            System.out.println (u);
        }
    }
}

Is translated by the compiler into the equivalent Java type:

class GenericClass {
    Object tMember;
    Number uMember; // as `U extends Number'

    public Object getFirst (List list) {
        return list.get (0);
    }

    public Number getFirst (List list) {
        // note cast inserted by compiler
        return (Number) list.get (0);
    }

    public void printAll (List list) {
        for (Iterator i=list.iterator (); i.hasNext (); ) {
            // note cast inserted by compiler
            Number u = (Number) i.next ();
            System.out.println (u);
        }
    }
}

.NET Implementation

.NET adds a number of new instructions to its intermediate language to support generics. Consequently generics code cannot be directly used by languages that do not understand generics, though many generic .NET types also implement the older non-generic interfaces so that non-generic languages can still use generic types, if not directly.

The extension of IL to support generics permits type-specific code generation. Generic types and methods can be constructed over both reference (classes, delegates, interfaces) and value types (structs, enumerations).

Under .NET, there will be only one "instantiation" (JIT-time code generation) of a class which will be used for all reference types. (This can be done because (1) all reference types have the same representation as local variables/class fields, a pointer, and (2) generics code has a different calling convention in which additional arguments are implicitly passed to methods to permit runtime type operations.) Consequently, a List<string> and a List<object> will share JIT code.

No additional implicit casting is necessary for this code sharing, as the IL verifier will prevent violation of the type system. It is not Java Type Erasure.

Value types will always get a new JIT-time instantiation, as the sizes of value types will differ. Consequently, List<int> and List<short> will not share JIT code.

Currently, Mono will always generate new instantiations for generic types, for both value and reference types (i.e. JIT code is never shared). This may change in the future, and will have no impact on source code/IL. (It will impact runtime performance, as more memory will be used.)

However, there are still some translations performed by the compiler, These translations have been standardized for Common Language Subset use, though these specific changes are not required:

Runtime Environment

Generics implementations may have some additional runtime support.

Java Runtime Environment

Java generics are a completely compile-time construct. You cannot do anything with generic type parameters that rely in any way on runtime information. This includes:

In short, all of the following produce compiler errors in Java:

static <T> void genericMethod (T t) {
    T newInstance = new T (); // error: type creation
    T[] array = new T [0];    // error: array creation

    Class c = T.class;        // error: Class querying

    List<T> list = new ArrayList<T> ();
    if (list instanceof List<String>) {}
        // error: illegal generic type for instanceof
}
Array Usage

The above has some interesting implications on your code. For example, how would you create your own type-safe collection (i.e. how is ArrayList<T> implemented)?

By accepting the unchecked warning -- you cannot remove the warning. Fortunately you'll only see the warning when you compile your class, and users of your class won't see the unchecked warnings within your code. There are two ways to do it, the horribly unsafe way and the safe way.

The horribly unsafe way works for simple cases:

static <T> T[] unsafeCreateArray (T type, int size) {
    return (T[]) new Object [size];
}

This seems to work for typical generics code:

static <T> void seemsToWork (T t) {
    T[] array = unsafeCreateArray (t, 10);
    array [0] = t;
}

But it fails horribly if you ever need to use a non-generic type:

static void failsHorribly () {
    String[] array = unsafeCreateArray ((String) null, 10);
    // runtime error: ClassCastException
}

The above works if you can guarantee that the created array will never be cast to a non-Object array type, so it's useful in some limited contexts (e.g. implementing java.util.ArrayList), but that's the extent of it.

If you need to create the actual runtime array type, you need to use java.lang.reflect.Array.newInstance and java.lang.Class<T>:

static <T> T[] safeCreateArray (Class<T> c, int size) {
    return (T[])java.lang.reflect.Array.newInstance(c,size);
}

static void actuallyWorks () {
	String[] a1 = safeCreateArray(String.class, 10);
}

Note that this still generates a warning by the compiler, but no runtime exception will occur.

C# Runtime Environment

.NET provides extensive runtime support for generics code, permitting you to do everything that Java doesn't:

static void GenericMethod<T> (T t)
    where T : new()
{
    T newInstance = new T (); // OK - new() constraint.
    T[] array = new T [0];    // OK

    Type type = typeof(T);    // OK

    List<T> list = new List<T> ();
    if (list is List<String>) {} // OK
}

C# also has extensive support for querying generic information at runtime via System.Reflection, such as with System.Type.GetGenericArguments().

What C# doesn't support is non-default constructor declaration, non-interface or base-type method declaration, and static method declaration. Since operator overloading is based on static methods, this means that you cannot generically use arithmetic unless you introduce your own interface to perform arithmetic:

// This is what I'd like:
class Desirable // NOT C#
{
    public static T Add<T> (T a, T b)
        where T : .op_Addition(T,T)
    {
        return a + b;
    }
}

// And this is what we currently need to do:
interface IArithmeticOperations<T> {
    T Add (T a, T b);
    // ...
}

class Undesirable {
    public static T Add<T> (IArithmeticOperations<T> ops, 
        T a, T b)
    {
        return ops.Add (a, b);
    }
}

Summary

The generics capabilities in Java and .NET differ significantly. Syntax wise, Java and C# generics initially look quite similar, and share similar concepts such as constraints. The semantics of generics is where they differ most, with .NET permitting full runtime introspection of generic types and generic type parameters in ways that are obvious in their utility (instance creation, array creation, performance benefits for value types due to lack of boxing) and completely lacking in Java.

In short, all that Java generics permit is greater type safety with no new capabilities, with an implementation that permits blatant violation of the type system with nothing more than warnings:

List<String> stringList = new ArrayList<String> ();
List rawList            = stringList;
    // only triggers a warning
List<Object> objectList = rawList;
    // only triggers a warning
objectList.add (new Object ());
for (String s : stringList)
    System.out.println (s); 
        // runtime error: ClassCastException due to Object.

This leads to the recommendation that you remove all warnings from your code, but if you try to do anything non-trivial (apparently typesafe arrays is non-trivial), you get into scenarios where you cannot remove all warnings.

Contrast this with C#/.NET, where the above code isn't possible, as there are no raw types, and converting a List<string> to a List<object> would (1) require an explicit cast (as opposed to the complete lack of casts in the above Java code), and (2) generate an InvalidCastException at runtime from the explicit cast.

Furthermore, C#/.NET convey additional performance benefits due to the lack of required casts (as the verifier ensures everything is kosher) and support for value types (Java generics don't work with the builtin types like int), thus removing the overhead of boxing, and C# permits faster, more elegant, more understandable, and more maintainable code.

Posted on 31 Aug 2007 | Path: /development/ | Permalink
blog comments powered by Disqus