A stream is a sequence of values. The package java.util.stream defines types for streams of reference values (Stream) and some primitives (IntStream, LongStream, and DoubleStream). Streams are like iterators in that they yield their elements as required for processing, but unlike them in that they are not associated with any particular storage mechanism. A stream is either partially evaluated (some of its elements remain to be generated) or exhausted (its elements are all used up). A stream can have as its source an array, a collection, a generator function, or an IO channel; alternatively, it may be the result of an operation on another stream (see below). A partially evaluated stream may have infinitely many elements still to be generated, for example by a generator function.
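As a rough illustration of the different sources mentioned above, the following sketch builds streams from an array, a collection, and a generator function. The class and method names are invented for this example, and `Stream.iterate` is only one of several ways to supply a generator function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative only.
class StreamSourcesDemo {

    // Source: an array of primitives, giving an IntStream.
    static int sumOfArray(int[] values) {
        return Arrays.stream(values).sum();
    }

    // Source: a collection.
    static long countNonEmpty(List<String> strings) {
        return strings.stream().filter(s -> !s.isEmpty()).count();
    }

    // Source: a generator function. The stream is conceptually infinite;
    // limit(n) truncates it so that the terminal operation can finish.
    static List<Integer> firstPowersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2)
                     .limit(n)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfArray(new int[] {1, 2, 3}));
        System.out.println(countNonEmpty(Arrays.asList("a", "", "bc")));
        System.out.println(firstPowersOfTwo(5));
    }
}
```

Without the call to `limit`, the third pipeline would never terminate, since the generator can always supply another element.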
Stream types define intermediate operations (resulting in new streams), e.g. map, and terminal operations (resulting in non-stream values), e.g. forEach. Calls on intermediate operations are often chained together in the style of a fluent API, forming a pipeline (as previously described). Terminal operations, as the name implies, terminate a method chain. Terminal operations are also called eager because invoking them causes them to consume values from the pipeline immediately, whereas intermediate operations, also called lazy, only produce values on demand. For example, assuming that strings has been declared as a List<String>, this code:
IntStream ints = strings.stream().mapToInt(s -> s.length()).filter(i -> i%2 != 0);
sets up a pipeline which will first produce a stream of int values corresponding to the lengths of the elements of strings, then pass on the odd ones only. But none of this happens as a result of the declaration of ints. Processing only takes place when a statement like
ints.forEach(System.out::println);
uses an eager terminal operation to pull values down the pipeline.
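One way to observe this laziness is to record when each lambda actually runs. The sketch below reuses the pipeline from the text and adds a log list; the class name, the `run` method, and the log messages are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class; the names are illustrative only.
class LazyPipelineDemo {

    // Returns a log showing when the pipeline's lambdas were invoked
    // relative to the declaration of the pipeline.
    static List<String> run(List<String> strings) {
        List<String> log = new ArrayList<>();

        // Declaring the pipeline does not invoke either lambda.
        IntStream ints = strings.stream()
                .mapToInt(s -> { log.add("length of " + s); return s.length(); })
                .filter(i -> i % 2 != 0);
        log.add("pipeline declared");

        // The terminal operation pulls values down the pipeline.
        ints.forEach(i -> log.add("consumed " + i));
        return log;
    }

    public static void main(String[] args) {
        run(Arrays.asList("one", "four", "seven")).forEach(System.out::println);
    }
}
```

In the returned log, "pipeline declared" appears before any "length of" entry, which shows that the mapping lambda is invoked only once forEach starts consuming values.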
The following table shows a small sample of operations on Stream. These have been chosen for simplicity; for the same reason, bounded generic types in their signatures have been replaced by their bounds. (Intermediate and terminal stream operations are listed in greater detail here (tbs) and here (tbs).)
operation | interface used | λ signature | return type | return value
---|---|---|---|---
sample lazy/intermediate operations | | | |
filter | Predicate<T> | T ➞ boolean | Stream<T> | stream containing the input elements that satisfy the Predicate
map | Function<T,R> | T ➞ R | Stream<R> | stream of values, the result of applying the Function to each input element
sorted | Comparator<T> | (T, T) ➞ int | Stream<T> | stream containing the input elements, sorted by the Comparator
limit, skip | | | Stream<T> | stream including only (resp. skipping) the first n input elements
sample eager/terminal operations | | | |
reduce | BinaryOperator<T> | (T, T) ➞ T | Optional<T> | result of reduction of the input elements (if any) using the supplied BinaryOperator
findFirst | | | Optional<T> | first input element (if any)
forEach | Consumer<T> | T ➞ void | void | void, but applies the method of the supplied Consumer to every input element
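The operations sampled in the table can be combined as in the following sketch. The class, method names, and sample data are invented for illustration. Note that findFirst is called here without arguments, as in the released Java 8 API, with any filtering done by a preceding filter step, and that forEachOrdered is used in place of forEach so that the result order is guaranteed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class; the names are illustrative only.
class StreamOperationsDemo {

    // map + reduce: total length of all words, or empty if there are none.
    static Optional<Integer> totalLength(List<String> words) {
        return words.stream()
                    .map(String::length)
                    .reduce((a, b) -> a + b);
    }

    // filter + findFirst: first word with at least minLength characters.
    static Optional<String> firstLongWord(List<String> words, int minLength) {
        return words.stream()
                    .filter(w -> w.length() >= minLength)
                    .findFirst();
    }

    // sorted + skip + limit: one "page" of the alphabetically sorted words.
    static List<String> sortedPage(List<String> words, int skip, int limit) {
        List<String> page = new ArrayList<>();
        words.stream()
             .sorted(String::compareTo)
             .skip(skip)
             .limit(limit)
             .forEachOrdered(page::add); // ordered variant of forEach
        return page;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana", "kiwi");
        System.out.println(totalLength(words));
        System.out.println(firstLongWord(words, 5));
        System.out.println(sortedPage(words, 1, 2));
    }
}
```

Both reduce and findFirst return an Optional because the input stream may be empty, in which case there is no value to return.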
Streams may be ordered or unordered. A stream whose source is an array, a List, or an iterating generator function (as with Stream.iterate) is ordered; one whose source is a Set with no defined iteration order, such as a HashSet, is unordered. Order is preserved by most intermediate operations; exceptions are sorted, which imposes an ordering whether one was previously present or not, and unordered, which removes any ordering that was present on the receiver. (This operation is provided for situations where ordering is not significant for the terminal operation, but the developer wants to take advantage of the greater efficiency of some operations when executed in parallel on unordered streams than on ordered ones.) Most terminal operations respect ordering; for example toArray, called on an ordered stream, creates an array with element ordering corresponding to that of the stream. An exception is forEach; the order in which stream elements are processed by this operation is undefined.
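The ordering rules above can be sketched as follows. The class and method names are invented for this example, and distinct is used only as one operation that can be cheaper on an unordered parallel stream.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are illustrative only.
class StreamOrderingDemo {

    // toArray respects the ordering of a List-backed stream,
    // even when the pipeline is executed in parallel.
    static Integer[] doubledInOrder(List<Integer> input) {
        return input.parallelStream()
                    .map(x -> x * 2)
                    .toArray(Integer[]::new);
    }

    // A HashSet-backed stream is unordered; sorted imposes an ordering.
    static List<Integer> sortedFromSet(Set<Integer> input) {
        return input.stream()
                    .sorted()
                    .collect(Collectors.toList());
    }

    // unordered() drops the ordering constraint. The result of count()
    // does not depend on ordering, so nothing observable is lost here.
    static long countDistinctUnordered(List<Integer> input) {
        return input.parallelStream()
                    .unordered()
                    .distinct()
                    .count();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(doubledInOrder(Arrays.asList(1, 2, 3, 4, 5))));
        System.out.println(sortedFromSet(new HashSet<>(Arrays.asList(3, 1, 2))));
        System.out.println(countDistinctUnordered(Arrays.asList(1, 1, 2, 3, 3)));
    }
}
```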
Note: "input elements" in the table means elements of the input stream.
The Stream package says that elements can be added to a collection even though streams have been created on it, as long as the terminal operation hasn’t been started. I think this is too loose — no changes should be allowed at all between calling “stream()” and the completion of the terminal operation. It’s cleaner and easier to enforce and describe. The currently proposed rule would be as if one were allowed to call “iterator()” and then add elements to the container as long as the first “hasNext()” or “next()” call hadn’t happened. Why include this possibility at the risk of making it harder to enforce and describe the rule?
Interesting. I agree that the spec is inconsistent with the implementation of fail-fast iterators, which don’t allow any structural modification to their collection once they have been constructed. The motivation for that rule is to detect concurrent collection modification at any time, since any concurrent modification was considered an error for the non-threadsafe collections that use fail-fast iterators. Here the view taken is more relaxed: concurrent modification is in fact only dangerous if it takes place after processing has begun. I don’t see that this view is harder either to enforce or to describe.
Of course, none of this applies to concurrent collections; concurrent modification is expected both during iteration and stream processing.
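For readers following this discussion, the behaviour in question can be sketched as below, assuming an ArrayList source: an element added after stream() is called, but before the terminal operation starts, is still seen by the pipeline. The class and method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative only.
class LateBindingDemo {

    static String joinAfterLateAdd() {
        List<String> list = new ArrayList<>(Arrays.asList("one", "two"));
        Stream<String> stream = list.stream();

        // Modification after stream() but before the terminal operation.
        list.add("three");

        // The terminal operation starts here and sees all three elements.
        return stream.collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(joinAfterLateAdd());
    }
}
```

This prints "one two three". Modifying the list once the terminal operation is under way is a different matter and is not shown here.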