In the preceding chapters, we have studied how to define and document APIs, of increasing complexity. Defining, implementing and interacting with APIs are important skills for software engineers. In this chapter and the next, we discuss two more advanced techniques that are widely used in practical APIs: iterators and generics. The former are a design pattern for offering a standard API for iterating over different data structures (like arrays, linked lists or balanced trees). The latter are a programming language feature that allows defining APIs that are parameterised by argument types.
Iterators are a solution to an API design problem: how to introduce an abstract API between a collection implementation and its clients, that hides the data structure used to implement the collection, without increasing the time or space (memory) usage of clients that iterate over the elements of the collection?
We illustrate the problem by means of the example of two data structures for storing a collection of objects, and a client that iterates over both of them:
public class ArrayList {
public Object[] elements;
}
public class LinkedList {
public static class Node {
public Object value;
public Node next;
}
public Node firstNode;
}
public class ClientProgram {
public void printAll(ArrayList arrayList) {
for (int i = 0; i < arrayList.elements.length; i++)
System.out.println(arrayList.elements[i]);
}
public void printAll(LinkedList linkedList) {
for (LinkedList.Node node = linkedList.firstNode;
node != null;
node = node.next)
System.out.println(node.value);
}
public void printBoth(ArrayList arrayList, LinkedList linkedList) {
printAll(arrayList);
printAll(linkedList);
}
}
Notice the following:
- No abstraction or encapsulation is applied to the two data structures; clients have direct access to the representation. Of course, it is recommended to apply abstraction and encapsulation as much as possible. However, in this example we do not yet apply it because it is the point of this text to figure out how such an abstraction should be designed in the first place.
- The class of linked list nodes is defined as a static nested class within the class of linked lists. This is essentially equivalent to defining it in the usual way outside of another class,
except that in order to refer to it from outside of the enclosing class, you need to qualify its name by the name of the enclosing class. For example, in the client program the class is referred to as
LinkedList.Node
. More details about static and non-static nested classes will follow later in this text. - When passing an object as an argument,
System.out.println
callstoString()
on it and prints the resulting string to the screen. - The two invocations of
printAll
will be resolved to the two overloaded implementations based on the static type of the argument.
Notice also that the client program has to duplicate the logic of printing all elements for the two data structures. The question we consider in this text is: how can we introduce an API that generalizes the two data structures, and allows the client to write the logic for printing all elements only once and apply it unchanged to both data structures, without hurting performance?
Let's first consider two approaches that are not a solution:
- We could introduce a method into both data structures that returns the elements as an array; the client could call it and then iterate over the array to print all elements. However, this would increase the memory usage of the client: temporarily converting the linked list to an array would require an amount of extra memory linear in the number of elements in the collection.
- Alternatively, we could implement a method
getElement(int i)
in theLinkedList
class that returns thei
th element of the list.printAll
could then use this method to iterate over the list in the same way as it iterates over the array. The downside of this approach is that it increases the time complexity:getElement(i)
requires a number of steps linear in the length of the list, which would giveprintAll
quadratic time complexity.
To see the solution, carefully consider both printAll
methods and notice that they follow a similar pattern: they use some piece of data to track where they are in the data structure: respectively an index into the array and a pointer to the current node of the list.
Then, they perform a test on the piece of data to see whether they have reached the end of the data structure; they retrieve the current element pointed to by the piece of data; and they update the piece of data to point to the next element.
The solution, then, is to encapsulate this piece of data into an object, which we will call an iterator, and provide methods to allow clients to test whether the iterator has reached the end of the data structure, to retrieve the current element, and to mutate the iterator so that it points to the next element.
Such an iterator API can take a few different forms, depending on whether and how some of these functionalities are combined into a single method. Whereas most programming languages define some kind of standard iterator API, they do so in different ways. However, in the end all of these forms are essentially equivalent.
We show the styles used by some of the most popular programming languages (translated into Java syntax) below:
// Java
public interface Iterator {
boolean hasNext();
/**
* Mutates the iterator to point to the next element and returns the current
* element.
*/
Object next();
}
// C#
public interface Enumerator {
Object getCurrent();
/**
* Mutates the enumerator to point to the next element, or returns `false` if
* the end has been reached.
*/
boolean moveNext();
}
// C++
public interface Iterator {
Object getCurrent(); // syntax in C++: *iterator
void moveNext(); // syntax in C++: iterator++
// in C++, to tell whether you have reached the end of the data structure, you
// have to test equality with a special "one-past-the-end" iterator
boolean equals(Iterator other);
}
// Python
public interface Iterator {
/**
* Throws a StopIteration exception if the end of the data structure has been
* reached.
*/
Object next();
}
// JavaScript
public interface Iterator {
// Nested classes inside interfaces are implicitly public and static
class NextResult { public Object value; public boolean done; }
/**
* If result.done is true, result.value is not an element but an "iterator
* return value".
*/
NextResult next();
}
Note: we also omitted some optional additional functionalities provided by some of the APIs. For example, Java's iterator API optionally supports removing the current element, and C#'s API optionally supports resetting the iterator so that it again points to the first element of the collection. In each case, the extra method throws an exception if the iterator object does not support the operation.
In the remainder of this text, we use the Java-style Iterator
API.
We can implement iterators for ArrayList
and LinkedList
as follows:
public class ArrayListIterator implements Iterator {
public ArrayList arrayList;
public int index;
public ArrayListIterator(ArrayList arrayList) {
this.arrayList = arrayList;
}
public boolean hasNext() { return index < arrayList.elements.length; }
public Object next() { return arrayList.elements[index++]; }
}
public class LinkedListIterator implements Iterator {
public LinkedList.Node node;
public LinkedListIterator(LinkedList linkedList) {
node = linkedList.firstNode;
}
public boolean hasNext() { return node != null; }
public Object next() {
Object result = node.value;
node = node.next;
return result;
}
}
Again, for now we do not apply any encapsulation.
This way, the client can now reuse the logic for printing all elements:
public class ClientProgram {
public void printAll(Iterator iterator) {
while (iterator.hasNext())
System.out.println(iterator.next());
}
public void printBoth(ArrayList arrayList, LinkedList linkedList) {
printAll(new ArrayListIterator(arrayList));
printAll(new LinkedListIterator(linkedList));
}
}
We can perform a further step of generalization, and further simplify the client code, by introducing an interface to be implemented by any collection that supports iteration:
public interface Iterable {
/** Returns a new iterator that points to the start of the data structure. */
Iterator iterator();
}
We can easily update classes ArrayList
and LinkedList
to implement this interface. For example, we can update class
ArrayList
as follows:
public class ArrayList implements Iterable {
public Object[] elements;
public Iterator iterator() { return new ArrayListIterator(this); }
}
This allows us to simplify the client code as follows:
public class ClientProgram {
public void printAll(Iterable iterable) {
for (Iterator iterator = iterable.iterator(); iterator.hasNext(); )
System.out.println(iterator.next());
}
public void printBoth(Iterable collection1, Iterable collection2) {
printAll(collection1);
printAll(collection2);
}
}
Notice that the client no longer even needs to know that it is dealing with an array-based list and a linked list.
Classes ArrayList
and ArrayListIterator
are not really independent classes; rather, they together implement the Iterable
abstraction. To make this explicit, it makes sense
to instead define ArrayListIterator
as a nested class inside of class ArrayList
:
public class ArrayList implements Iterable {
private Object[] elements;
private static class IteratorImpl implements Iterator {
private ArrayList arrayList;
private int index;
private IteratorImpl(ArrayList arrayList) { this.arrayList = arrayList; }
public boolean hasNext() { return index < arrayList.elements.length; }
public Object next() { return arrayList.elements[index++]; }
}
public Iterator iterator() { return new IteratorImpl(this); }
// Constructors and mutators not shown
}
One major advantage of defining the iterator API implementation as a nested class is that since nested classes have access to the private members of their enclosing class, we can now encapsulate
the elements
field. Another major advantage is that the nested class itself can be made private; it is now hidden from other toplevel classes, even if they are in the same package.
We can simplify the implementation of class ArrayList
by turning class
IteratorImpl
into a nonstatic nested class, more commonly referred to as an
inner class. In the same way that a nonstatic method takes an implicit this
argument that points to an instance of the enclosing class, an inner class has
an implicit EnclosingClass.this
field that points to an instance of the
enclosing class. Furthermore, each constructor takes an implicit
EnclosingClass.this
argument and uses it to initialize the implicit field:
public class ArrayList implements Iterable {
private Object[] elements;
private class IteratorImpl implements Iterator {
private int index;
private IteratorImpl() {}
public boolean hasNext() { return index < ArrayList.this.elements.length; }
public Object next() { return ArrayList.this.elements[index++]; }
}
public Iterator iterator() { return new IteratorImpl(); }
// Constructors and mutators not shown
}
This new version of IteratorImpl
is entirely equivalent to the previous one;
it is in fact identical, except that the explicit field arrayList
has been
replaced by an implicit field ArrayList.this
and the explicit
constructor parameter arrayList
has been replaced by an implicit one.
In fact, inside an inner class, we can refer to members of the enclosing class using their simple names; in case of nonstatic members, they are implicitly prefixed by EnclosingClass.this
.
This means we can further simplify the implementation of class ArrayList
as follows:
public class ArrayList implements Iterable {
private Object[] elements;
private class IteratorImpl implements Iterator {
private int index;
public boolean hasNext() { return index < elements.length; }
public Object next() { return elements[index++]; }
}
public Iterator iterator() { return new IteratorImpl(); }
// Constructors and mutators not shown
}
We also left out the constructor, which is now generated implicitly.
In fact, since class IteratorImpl
is only referred to inside method iterator()
, it is cleaner to define it as a local class inside method iterator()
:
public class ArrayList implements Iterable {
private Object[] elements;
public Iterator iterator() {
class IteratorImpl implements Iterator {
private int index;
public boolean hasNext() { return index < elements.length; }
public Object next() { return elements[index++]; }
}
return new IteratorImpl();
}
// Constructors and mutators not shown
}
Note: local classes are not visible outside the enclosing method, so there would be no point in adding
the private
keyword, and in fact this is not allowed.
As we will see later, an additional advantage of local classes is that they can (under certain conditions) refer to the parameters and local variables of the enclosing method.
We can apply one more simplification: since in our example class IteratorImpl
is referred to in only one place, to create an instance of it, we can replace it by an anonymous class.
This relieves us from the need to invent a name for it:
public class ArrayList implements Iterable {
private Object[] elements;
public Iterator iterator() {
return new Iterator() {
private int index;
public boolean hasNext() { return index < elements.length; }
public Object next() { return elements[index++]; }
};
}
// Constructors and mutators not shown
}
As you can see, an anonymous class instance creation expression consists of:
- the keyword
new
- the name of a class or interface to be implemented by the anonymous class
- a constructor argument list
- a class body
The interfaces Iterator
and Iterable
defined above exist in the Java Platform API in packages java.util
and java.lang
, respectively. When using those,
client code like
for (Iterator iterator = iterable.iterator(); iterator.hasNext(); ) {
Object element = iterator.next();
System.out.println(element);
}
can be written more concisely using an enhanced for
loop, as follows:
for (Object element : iterable) {
System.out.println(element);
}
All collection classes in package java.util
, such as java.util.ArrayList
and java.util.HashSet
, implement interface java.lang.Iterable
and can therefore be iterated over using an enhanced for
loop.
Note: the documentation for java.util.Iterator
specifies that the next
method shall throw a NoSuchElementException
if there are no elements left. For simplicity, in this course we ignore this requirement.
The kind of iterators we have seen so far are referred to as external iterators: the client requests an iterator object and then interacts with the iterator object to retrieve the successive elements of the collection. Another common type of iterator API is known as internal iterators. This is the primary type of iterators in the Ruby programming language. In the case of internal iterators, the client passes a consumer object to the collection; the collection then repeatedly calls the consumer object's accept
method, passing each element of the collection in turn as an argument.
This leads to an alternative definition of Iterable
:
public interface Iterable {
void forEach(Consumer consumer);
}
where interface Consumer
is defined as follows:
public interface Consumer {
void accept(Object value);
}
We can trivially make class ArrayList
implement this interface as follows:
public class ArrayList implements Iterable {
public Object[] elements;
public void forEach(Consumer consumer) {
for (int i = 0; i < elements.length; i++)
consumer.accept(elements[i]);
}
}
We can adapt the client program to use this API as follows:
public class ClientProgram {
public void printAll(Iterable iterable) {
iterable.forEach(new Consumer() {
public void accept(Object value) {
System.out.println(value);
}
});
}
public void printBoth(Iterable collection1, Iterable collection2) {
printAll(collection1);
printAll(collection2);
}
}
Notice that the client program implements interface Consumer
with an anonymous class whose accept
method prints the element to the screen.
It is worth pointing out that external iteration and internal iteration are not mutually exclusive.
In fact, Java's Iterable
interface provides both an iterator()
method and a forEach()
method:
public interface Iterable {
Iterator iterator();
default void forEach(Consumer consumer) {
for (Iterator i = iterator(); i.hasNext(); )
consumer.accept(i.next());
}
}
We can write this client program even more concisely using a lambda expression:
public class ClientProgram {
public void printAll(Iterable iterable) {
iterable.forEach((Object value) -> { System.out.println(value); });
}
public void printBoth(Iterable collection1, Iterable collection2) {
printAll(collection1);
printAll(collection2);
}
}
This lambda expression is exactly equivalent to the anonymous class instance creation expression it replaces; it is a very concise way
to implement a functional interface, an interface with a single abstract method. Interface Consumer
is a functional interface.
The lambda expression in the example implements interface Consumer
because this is the type of object that method forEach
expects to receive as an argument;
this is known as target typing.
In general, a lambda expression consists of a parameter list, an arrow, and a method body. We can, however, leave out the parameter types; furthermore,
if the method body is of the form { E; }
or { return E; }
, where E
is an expression, we can instead write just E
, and if there is only a single parameter,
we can leave out the parentheses:
iterable.forEach(value -> System.out.println(value));
Local classes, anonymous classes, and lambda expressions can refer to parameters and local variables from the enclosing method. Their value is copied into an implicit field of the resulting object; this is known as capturing.
For example, suppose we want to print only the elements that satisfy some condition, given by a Predicate
object:
public class ClientProgram {
public void printAll(Predicate condition, Iterable iterable) {
iterable.forEach(value -> {
if (condition.test(value))
System.out.println(value);
});
}
public void printBoth(Predicate condition,
Iterable collection1, Iterable collection2) {
printAll(condition, collection1);
printAll(condition, collection2);
}
}
The Consumer
object that results from evaluating the lambda expression has an implicit field that holds a copy of parameter condition
.
Java allows only effectively final variables to be captured this way. A variable is effectively final if it is not mutated after initialization.