Capture of local variables is restricted to those that are effectively final. Lifting this restriction would present implementation difficulties, but it would also be undesirable; its presence prevents the introduction of a new class of multithreading bugs involving local variables. Local variables in Java have until now been immune to race conditions and visibility problems because they are accessible only to the thread executing the method in which they are declared. But a lambda can be passed from the thread that created it to a different thread, and that immunity would therefore be lost if the lambda, evaluated by the second thread, were given the ability to mutate local variables. Even the ability to read the value of mutable local variables from a different thread would introduce the necessity for synchronization or the use of volatile
in order to avoid reading stale data.
An alternative way to view this restriction is to consider the use cases that it discourages. Mutating local variables in idioms like this:
int sum = 0;
list.forEach(e -> { sum += e.size(); }); // illegal; local variable 'sum' is not effectively final
frustrates a principal purpose of introducing lambdas. The major advantage of passing a function to the forEach
method is that it allows strategies that distribute evaluation of the function for different arguments to different threads. The advantage of that is lost if these threads have to be synchronized to avoid reading stale values of the captured variable.
The restriction of capture to effectively immutable variables is intended to direct developers’ attention to more easily parallelizable, naturally thread-safe techniques. For example, in contrast to the accumulation idiom above, the statement
int sum = list.map(e -> e.size()).reduce(0, (a, b) -> a+b);
creates a pipeline in which the results of the evaluations of the map
method can much more easily be executed in parallel, and subsequently gathered together by the reduce
operation.
The restriction on local variables helps to direct developers using lambdas aways from idioms involving mutation; it does not prevent them. Mutable fields are always a potential source of concurrency problems if sharing is not properly managed; disallowing field capture by lambda expressions would reduce their usefulness without doing anything to solve this general problem.
An effectively final local variable is
one whose value is never changed.
foreach
takes a function and applies
it to every element (see this page).
TBS
is there a reason why this forEach can’t be executed in parallel:
var sum = new AtomicInteger();
list.forEach(e -> { sum.getAndAdd(e.size()); });
It’s true that using an atomic variable like this eliminates the possbility of race conditions. But even though this code is thread-safe, it’s not a style to encourage. The more parallelism there is in the execution of
e.size()
, the greater is the contention onsum
. So “the main purpose of introducing lambdas” is still frustrated.fair enough. however, I would like to know what the rationale was behind the decision of executing every forEach or map in parallel, instead of making parallel execution explicit and up to the developer.
the most pressing one for the Java platform is that they make it easier to distribute processing of collections over multiple threads
Assuming the forEach method executes serially, this code below:
var threadPool = new ThreadPoolExecutor(..);
pointList.forEach(p -> threadPool.run(() -> p.move(p.y, p.x)));
could then be implemented like this in order to reduce the boilerplate if the intention was to execute the same lambda in parallel
pointList.asParallel().forEach(p -> p.move(p.y, p.x));
I don’t really understand this limit. Why does C# or Scala permit this?
I don’t think they want to have multithreading bugs involving local variables…so where is the true motivation?
Implementation difficulties? conceptual prejudices?
If I have closures I want to have TRUE closure.
They may not want to have race conditions affecting local variables, but if they provide concurrent write access to them from different threads, that’s exactly what they will get unless the programmer ensures mutual exclusion. But if she does that, then she will lose the performance advantage of multithreading through lock contention.
The language designers considered this issue seriously — to the point of canvassing widely for use cases — and concluded that in almost all the situations in which people want to mutate local variables, there are better solutions using parallel-friendly idioms (ie variants of map-reduce).
Here’s a fuller explanation of the rationale for this decision.