Functional Programming

Learning goals

  • Understand what Functional Programming is and why it’s useful
  • Learn the common Functional Programming techniques in Java

Principles of Functional Programming

Functional Programming is a programming style where you think primarily in terms of functions, in the mathematical sense of the word.

A mathematical function is something that takes some inputs, and produces an output. Importantly that’s all it does – there are no side effects such as reading data from a web page, writing text to the console or modifying some shared data structure. In the absence of such side effects, the only thing that can influence the function’s behaviour is its inputs – so given a particular combination of inputs, we will always get back precisely the same outputs. A function that has these properties is called a pure function.

Functional programming is all about using pure functions as the building blocks of a program. Consider the following procedural (non-functional) piece of code:

public class Main {
    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        List<Integer> list = new ArrayList<>();

        for (int i : input) {
            if (i % 2 == 0) {
                list.add(i);
            }
        }

        Integer[] output = list.toArray(new Integer[0]);
    }
}

This code takes an input array of integers, and produces an output array containing all the even numbers.

We call this code “procedural” because it defines a procedure for the computer to follow, step by step, to produce the result. We are talking to the computer at a detail level, specifying precisely which numbers to move around. We are also creating data structures that are modified as the program executes (the list).

Consider the functional alternative, which is to perform all this using pure functions:

public class Main {
    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int[] output = Arrays.stream(input)
                             .filter(i -> i % 2 == 0)
                             .toArray();
    }
}

This version is shorter. But that is not its main selling point. The real benefit is that we’re talking at a much more abstract level about what we want the program to achieve, rather than what we want it to do step by step. You can read the second line of code fairly directly:

  • Take input
  • Apply a filter to it
    • The filter is “number is even”
  • Convert the result to an array

Deep inside the computer’s CPU it will still be doing pretty much the same operations as the previous version, but we don’t need to know that – it’s a level of detail we don’t care about and it might not even be true. In the more general case where we don’t know the concrete type of input (we might define it as List<int>, for instance), it may be a data structure that intrinsically supports parallel programming and is in fact doing the filtering on a multitude of different threads. We don’t care, because we’re talking at a much higher level about what we want to achieve (filtering, via the Streams.filter method), rather than how we want to achieve it (a foreach loop, or perhaps some cunning multithreaded processing).

The benefits of Functional Programming

The example above is representative of the benefits of functional programming. We can summarise these as follows:

  • Readability. By expressing our intentions at a higher level than in the procedural style, we can show our intent.
  • Conciseness. By the same token, we can implement the same functionality in fewer lines of code. Fewer lines means fewer bugs (certainly fewer characters you can mistype), so provided we retain readability conciseness is a good thing.
  • Intrinsic support for parallelism. Even if there’s no magic parallelism happening behind the scenes, functional programming is inherently well suited to multithreaded computing and carrying out different work strands in parallel. This is because its focus on pure functions means that we generally have immutable data structures – that is, data structures whose contents never change. As pure functions do not modify any state of the program, two functions never try to modify the same shared state concurrently, thus avoiding race conditions.

Immutablity

Data not changing doesn’t mean we can’t perform interesting computations. We can create new data structures which are different from the original ones, but we avoid modifying those that we have already created. In the examples above the input and output lists were never modified in either case, but the procedural code had a listvariable that was modified, and the functional style eliminated this.

The limitations of Functional Programming

Clearly, there are some spheres in which Functional Programming fits naturally and makes obvious sense. For instance, complex banking calculations will naturally fit the functional pattern because they are all about performing calculations on data. Even less obvious programs generally have elements which can be expressed more clearly and concisely in the functional style – a large proportion of for and foreach loops could be replaced with functional alternatives, as in the example above.

However, strict adherence to Functional Programming principles leads to challenges that make it harder to write many programs. For instance, if side effects are outlawed how can we read user input or write files to disk? Truly functional languages have workarounds for this that “hide” the side effect behaviour in a way that allows you to treat absolutely everything as an apparently pure function call.

Java provides support for functional concepts through the Streams API, but still lets you use the procedural style when that makes more sense. Hence you can use System.out.println and accept that it’s not very functional in style, but couple it with a neat series of chained function calls to calculate the value to write.

The golden rule is as always: look for the programming style that lets you write a given bit of code in the simplest and most expressive manner possible. That’s often, but certainly not always, a functional style.

Lambda Expressions

It is fairly common in Java to end up defining an interface which contains only a single abstract method, often when you want to pass some behaviour as a parameter – e.g. a transformation, a sorting method or a runnable task.

For example, a basic interface might look as follows:

public interface Action {
    void performAction();
}

This can then be passed around as a parameter to other functions:

public void readAllData(Action openDatabaseConnection) {
    // Other code...

    openDatabaseConnection.performAction();

    // Other code...
    }

You can then use a lambda expressions to create a code segment as an action, so that you can pass it to a function:

Action readFromAwsDatabase = () -> {
    // Code for connecting to AWS database
};
Action readFromLocalDatabase = () -> {
    // Code for connecting to local database
};
if (isUsingAws) {
    readAllData(readFromAwsDatabase);
} else {
    readAllData(readFromLocalDatabase);
}

These interfaces are called functional interfaces, or, more strictly, a Single Abstract Method (SAM) type, and are discussed in more detail below.

Info

static and default members are ignored when considering functional interfaces – they are often convenient utilities, rather than the core behaviour.

As an example, here’s the basic definition of Comparator from the Java 17 standard library:

public interface Comparator<T> {
     /*
      * Compares its two arguments for order [...]
      **/

    int compare(T o1, T o2);
} 

To sort a list of instances of type T, you create a class which implements Comparator<T> and pass it to the static Collections.sort() method. The most explicit way of doing this requires an explicit class definition and instantiation:

public class EventComparator implements Comparator<Event> {
    @Override
    public int compare(Event e1, Event e2) {
        return Long.compare(e1.timestamp, e2.timestamp);
    }
}

Collections.sort(events, new EventComparator());

Java already provides some neat syntax to simplify this process. We can hide the explicit class definition and inline using an anonymous class:

Collections.sort(events, new Comparator<Event>() {
    @Override
    public int compare(Event e1,Event e2) {
        return Long.compare(e1.timestamp, e2.timestamp);
    }
});

This is more concise than the first example, but still contains a lot of boilerplate – fundamentally the only part you are interested in is the implementation of compare.

Java 8 introduced lambda expressions, which provide a way of writing this as:

Collections.sort(events, (e1, e2) -> Long.compare(e1.timestamp, e2.timestamp));

The arguments in brackets (e1, e2) represent the parameters of the method, and the other side of -> is the body of the method.

Lambda syntax

The full lambda syntax includes a number of other subtleties:

// lambdas with no parameters require an empty set of parentheses:
Supplier<String> supplier = () -> "foo";

// lambdas with a single parameter do not require parentheses:
Function<String, Integer> measure = str -> str.length;

// lambdas with multiple statements require braces and an explicit return (if applicable):
Function<String, Event> csvToEvent = input -> {
    String[] parts = input.split(",")
    return new Event(parts[0], parts[1]);
};

It is also important to remember checked exceptions are considered to be part of a Java method signature. This means that any lambda which throws a checked exception is only valid if the SAM type includes the exception!

// A functional interface for reading a file
public interface FileReader {
    String read(File file) throws IOException;
}

FileReader fr1 = file -> {
    throw new IOException("oh no"); // Valid -- the exception matches the signature
};

FileReader fr2 = file -> {
    throw new RuntimeException("oh no"); // Valid -- unchecked exceptions are always fine
};

FileReader fr3 = file -> {
    throw new Exception("oh no"); // Invalid -- Unhandled exception: java.lang.Exception
};

This can make using checked exceptions and lambdas rather awkward!

Whenever a method takes a parameter which is a functional interface, you can use a lambda expression instead. Once you are comfortable with the idea, you can start to think of the method parameter being a lambda – the functional interface is simply a way of describing the signature of the lambda.

Info

You don’t need to change existing functional interfaces in your own code to be able to pass lambda expression into them, the compiler will sort it out for you.

Java also provides a number of standard functional interfaces in the java.util.function namespace. These cover most basic types of function, if you don’t need any more specific semantics. For example, the Predicate interface, which defines a boolean valued function with one argument, is useful for partitioning collections.

The Streams API

Lambda expressions really come into their own when used alongside the Streams API, which was also introduced in Java 8. Streams allow you to manipulate collections in a powerful functional style by describing how you want to transform the list, rather than iterating or modifying a list in-place.

As an example, this expression finds a list of the first 5 line managers of employees who started over a year ago:

List<Employee> employees =...

employees.stream()
    .filter(employee -> employee.getStartDate.isBefore(Instant.now().minus(Duration.of(1, ChronoUnit.YEARS))))
    .map(Employee::getLineManager)
    .distinct()
    .limit(5)
    .collect(Collectors.toList());

Compare this to the more verbose version written using for loops, if statements, and so on.

List<Employee> employees=...

List<LineManager> lineManagers = new ArrayList<>(5);
for (Employee employee:employees) {
    if (employee.getStartDate().isBefore(Instant.now().minus(Duration.of(1, ChronoUnit.YEARS)))){
        LineManager lineManager = employee.getLineManager();
        if (!lineManagers.contains(lineManager)){
            lineManagers.add(lineManager);
            if (lineManagers().size() == 5) {
                break;
            }
        }
    }
}

The version of the code above that uses the Streams API is more functional, performing the same calculation with pure functions. It’s also shorter, but that’s not its main selling point. The real benefit is that we’re talking at a much more abstract level about what we want the program to achieve, rather than what we want it to do step by step.

Functional programming is all about using pure functions as the building blocks of a program. The first solution retrieves the first 5 line managers of employees who started over a year ago using for loops and if statements. We call this code procedural because it defines a procedure for the computer to follow, step by step, to produce the result. We are talking to the computer at a detail level, specifying precisely which numbers to move around. We are also creating data structures that are modified as the program executes. The second solution in the Streams API section is more functional, performing the same calculation with pure functions.

To access the streams API, call stream() on any of the standard collections, or using some other interface that returns a stream, like a database query or Stream.of("foo").

The stream can be modified using a variety of intermediate methods, for example:

  • map – apply a function to every element of the stream
  • flatMap – apply a function to every element of the stream, where the function returns a stream that is flattened into the result
  • filter – only include elements that satisfy the given condition

Finally, it is passed in to a terminal operation, for example:

  • collect – build a new collection from the contents of the stream. Often using the built-in Collectors factory – Collectors.toList(), Collectors.toSet() etc.
  • count – count the number of elements in the stream
  • anyMatch – returns true if any elements match the condition (note that this will usually short-circuit – returning as soon as any element matches)

Streams are lazy and single-use – the elements of the stream will not be read until the terminal operation is performed, and once they have they may not be read again.

The laziness also means that it is okay to have a stream with an infinite number of elements – as long as each operation will only operate on a finite number.

The Streams API documentation introduction has a great rundown of the differences between the Streams API and the Java Collections API. If you don’t have time to read the whole Streams API documentation, then do at least skim the introduction.

Info

Support for parallelism is built into the Streams API, with functions like Collections.parallelStream to create a parallel stream from a collection. This parallelism is implemented using lightweight thread-like entities called ForkJoinTasks, which are beyond the scope of this course, but if you are interested then you can read about them here and here.

You should avoid the use of parallel streams unless you know what you are doing, as using streams naively may result in a performance drop due to the overhead involved in splitting up the tasks, or the saturation of the thread pool used to run ForkJoinTasks (saturation means “running out of free threads”).

Immutable data structures

Recall from the above discussion that immutable data structures – those whose contents are unchanging – are useful in multithreaded code. Locking, with its associated problems, is only required if you are going to modify some data. Merely reading unchanging data is fine – many pieces of code can look at the same location in memory at the same time, without problem.

Consider an example: You have a fixed list of 100 tasks, and you want to divide these between two threads for execution. Here are two solutions:

  • Solution A. Put the tasks onto a Queue. Start two threads, and have each repeatedly pull a task off the queue until the queue is empty.
  • Solution B. Put the tasks in an array. Start two threads, one of which is responsible for even numbered items in the array and one for odd numbered items. Each thread maintains a variable to track progress through the array, until both have reached the end.

Which is better? There are potentially some big advantages to A – if the tasks are of uneven complexity then you will evenly balance the effort between the threads, while with B one thread may end up taking a lot longer if it gets unlucky with its tasks. However, B is much simpler in terms of concurrency – the only shared state is immutable (the variables to track progress are local to each thread), so unlike A, no locking is required.

In this simple example the locking required to support A would be fairly straightforward, and in any case Java provides various concurrent queues classes, such as ConcurrentLinkedQueue<E>, for precisely this sort of scenario. But keeping data immutable where possible eliminates a whole tricky class of problems, and is therefore a trick well worth bearing in mind.

Immutable data structures in Java

Most immutable classes in Java follow a similar pattern:

  • All fields are marked final.
  • If instance fields contain mutable variables such as lists then these cannot be modified
    • None of the class methods modify them
    • Their references are not exposed so no other code can modify them – expose a copy instead
    • There are no setters
  • Mark your class as final to prevent mutable subclasses from being created.

Info

Any method which might ordinarily modify the class will instead need to return a new instance with updated values.

Other programming paradigms

A programming paradigm is a way of structuring and expressing the code that implements a software development solution. This module has been focused on the functional programming paradigm, and it should be clear how that differs from other ways of programming that you’re familiar with. There are many other programming paradigms, but the following are the most important at this stage of your learning.

  • Procedural
    • Code is structured as a set of procedures
    • This is a form of imperative programming, which means that the steps that the computer needs to follow are expressed in order
    • In procedural programming, you approach a problem by deciding how the necessary logical steps should be divided up
    • This used to be the dominant programming paradigm, before the widespread adoption of object-oriented programming
  • Object-oriented
    • Code is structured using classes and objects
    • As explored in the Object-Oriented Programming module, some of the strengths of this approach are:
      • Encapsulation of an object’s properties (data) and behaviour (methods) into a single entity
      • Abstraction, which means hiding details of an object’s implementation behind its interface
      • Inheritance of properties and methods from an ancestor class
      • Polymorphism, through which an object can be interacted with as if it were an instance of an ancestor class, while it behaves according to its actual class
    • In object-oriented programming, you approach a problem by dividing it up into entities (objects) that have particular properties and behaviour
    • This is a very popular programming paradigm for general-purpose software development, particularly as there are OO patterns for many common situations – e.g., MVC (Model-View-Controller) is an OO pattern for application development
  • Functional
    • As this module has explored, code is structured using functions
      • As much as possible, functions are pure and do not have side effects (input/output are exceptions to this)
      • Programs are made up of functions composed together so that the output of one is the input for another
      • This can make programs more readable
    • In functional programming, you approach a problem by breaking down the computations into stateless (pure) functions comprised of other functions
    • Functional programming is especially suited to complex manipulation of large amounts of data
  • Event driven
    • Code is structured as units that are executed in response to particular events
    • We will see some event driven programming in the Further JS: the DOM and Bundlers module, because the JavaScript code that runs in a web browser is executed in response to events (e.g., when a button is clicked or an API call responds)
    • In event driven programming, you approach a problem by identifying the events and defining the response to each one separately; the important concept to keep in mind is that the “flow” of the program logic is controlled by external events
    • This is particularly suited to game development and device drivers

It used to be the case that each programming language had a particular paradigm that needed to be followed. There are still some languages like that, but many modern languages support multiple paradigms – as this module and the Object-Oriented Programming module have shown.