Twitter GitHub Facebook Instagram dirv.me

Daniel Irvine on building software

Java 8 Streams are not like C# enumerables

15 September 2014

Java 8 Streams can only be consumed once. Once you’ve iterated over your stream, it’s done, and no longer of use.

import java.util.stream.IntStream;

IntStream stream = IntStream.range(0, 100);
System.out.println(stream.filter(i -> i % 2 == 0).count()); // prints 50
System.out.println(stream.filter(i -> i % 2 != 0).count()); // throws IllegalStateException

Running the above code prints 50 and then throws an exception, because the second line attempts to consume a stream that the first line has already used up.

java.lang.IllegalStateException: stream has already been operated upon or closed
    at java.util.stream.AbstractPipeline.<init>(AbstractPipeline.java:203)
    at java.util.stream.IntPipeline.<init>(IntPipeline.java:91)
    at java.util.stream.IntPipeline$StatelessOp.<init>(IntPipeline.java:592)
    at java.util.stream.IntPipeline$9.<init>(IntPipeline.java:332)
    at java.util.stream.IntPipeline.filter(IntPipeline.java:331)
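A self-contained version of the same experiment might look like the sketch below (the class and method names are illustrative only). It runs the first query normally, then catches the IllegalStateException thrown when the already-consumed stream is used a second time:

```java
import java.util.stream.IntStream;

public class SingleUseStreamDemo {

    // Returns true if reusing a consumed IntStream throws IllegalStateException.
    static boolean secondUseFails() {
        IntStream stream = IntStream.range(0, 100);
        long evens = stream.filter(i -> i % 2 == 0).count(); // first use: fine
        System.out.println(evens);
        try {
            stream.filter(i -> i % 2 != 0).count();          // second use: throws
            return false;
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());
    }
}
```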

This is likely to catch out C# programmers making the jump to Java. Although Java 8’s streams look a lot like LINQ, they differ in several ways, and the biggest difference is that a stream is not reusable. This seems like a weakness, but it’s actually a blessing in disguise, as we’ll now see.

The equivalent code in C# looks like this:

using System;
using System.Linq;

var stream = Enumerable.Range(0, 100);
Console.WriteLine(stream.Where(i => i % 2 == 0).Count());
Console.WriteLine(stream.Where(i => i % 2 != 0).Count());

This gives the expected output:

50
50

In these examples, the source data is simply a range of 100 integers. But what if iterating over the source involves retrieving rows from a database table, or making a network call? In C#, each Count() call re-runs the whole query against the source, so the data is fetched twice and the operation is roughly twice as slow. Yet the code may still return the correct result, leaving you none the wiser that it is performing sub-optimally.
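To make that cost visible in Java, the sketch below stands in for an expensive source with a simple counter. expensiveSource() and READS are hypothetical names, not part of any library, and peek() is used here only to count element reads. Re-creating the stream for each query, which is effectively what the deferred C# query does, reads every element twice:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class RepeatedReadDemo {

    // Hypothetical stand-in for a database or network read: counts every element fetched.
    static final AtomicInteger READS = new AtomicInteger();

    static IntStream expensiveSource() {
        return IntStream.range(0, 100).peek(i -> READS.incrementAndGet());
    }

    // Runs two queries, each over a freshly created stream, and returns the total reads.
    static int readsForTwoQueries() {
        READS.set(0);
        long evens = expensiveSource().filter(i -> i % 2 == 0).count();
        long odds = expensiveSource().filter(i -> i % 2 != 0).count();
        System.out.println(evens + " " + odds);
        return READS.get();
    }

    public static void main(String[] args) {
        System.out.println(readsForTwoQueries()); // every element is read twice
    }
}
```

The filter() before count() matters in this sketch: since Java 9, count() may skip traversal entirely when it can compute the size directly from the source, in which case peek() would never run.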

To solve that, you can separate retrieving the data from the source from any subsequent queries over it. In C#, one way to do that is to call ToList():

var stream = Enumerable.Range(0, 100).ToList();
Console.WriteLine(stream.Where(i => i % 2 == 0).Count());
Console.WriteLine(stream.Where(i => i % 2 != 0).Count());

The stream variable now refers to a List<int>, not a deferred LINQ query. The call to ToList has ensured that the source is enumerated only once.

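In Java, exactly the same approach applies. IntStream has no collect overload that accepts a Collector, so the values need to be boxed first; each later query then opens a new stream over the list:

List<Integer> list = IntStream.range(0, 100).boxed().collect(Collectors.toList());
long evens = list.stream().filter(i -> i % 2 == 0).count(); // list.stream() can be called repeatedly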
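Putting that together, here is a self-contained sketch of the materialise-once approach in Java. The read counter is a hypothetical stand-in for an expensive source (again using peek() purely for counting), and the class name is illustrative. It shows the source being traversed exactly once, during collect(), while both queries run against the in-memory list:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class MaterialiseOnceDemo {

    // Hypothetical read counter standing in for an expensive source.
    static final AtomicInteger READS = new AtomicInteger();

    // Reads the source exactly once into a List, then queries the list twice.
    static long[] evensAndOdds() {
        READS.set(0);
        List<Integer> list = IntStream.range(0, 100)
                .peek(i -> READS.incrementAndGet())
                .boxed()                        // IntStream -> Stream<Integer>
                .collect(Collectors.toList());  // the source is traversed here, once
        long evens = list.stream().filter(i -> i % 2 == 0).count();
        long odds = list.stream().filter(i -> i % 2 != 0).count();
        return new long[] { evens, odds };
    }

    public static void main(String[] args) {
        long[] counts = evensAndOdds();
        System.out.println(counts[0] + " " + counts[1]);
        System.out.println(READS.get());
    }
}
```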

What if you actually want the C#-style behaviour? Easy: just generate your stream twice, for example from a method that returns a new stream each time it is called.

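import java.util.stream.IntStream;

// Returns a brand-new stream each time it is called
private IntStream stream() {
  return IntStream.range(0, 100);
}

System.out.println(stream().filter(i -> i % 2 == 0).count()); // 50
System.out.println(stream().filter(i -> i % 2 != 0).count()); // 50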
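The same stream factory can also be written as a java.util.function.Supplier<IntStream>, a common Java idiom for this situation. The sketch below is one way to do it; the field name source and the class name are illustrative. Each get() call builds a new stream, so the source is iterated once per query:

```java
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class StreamSupplierDemo {

    // A factory: every get() returns a brand-new stream over the same range.
    static final Supplier<IntStream> source = () -> IntStream.range(0, 100);

    static long evens() {
        return source.get().filter(i -> i % 2 == 0).count();
    }

    static long odds() {
        return source.get().filter(i -> i % 2 != 0).count();
    }

    public static void main(String[] args) {
        System.out.println(evens());
        System.out.println(odds());
    }
}
```

Passing a Supplier around makes the opt-in explicit: any code that receives it can see that it is expected to re-run the source.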

The benefit of the Java approach is that multiple iterations are opt-in, so you won’t fall victim to the C# “silent error” of querying the source repeatedly and slowing your code down without noticing. If you really do want to iterate over the source two or more times, you need to allow for that explicitly.

About the author

Daniel Irvine is a software craftsman at 8th Light, based in London. These days he prefers to code in Clojure and Ruby, despite having been a C++ and C# developer for the majority of his career.

For a longer bio please see danielirvine.com. To contact Daniel, send a tweet to @d_ir or use the comments section below.
