Conversational Technologies

Introduction to Functional Programming in Python


Functional programming (often abbreviated FP) is the process of building software by composing pure functions, avoiding shared state, mutable data, and side-effects. We will discuss some of these concepts and their benefits.




Purity means a function has no side effects. The only impact a pure function can have on the outside world is through its return value, and the only way the outside world can affect it is through its parameters.


This function is pure:


def addition(x, y):
    return x + y


This function is impure:


import requests

def call_server():
    return requests.get("")


Pure functions are nice because they are easy to reason about: they are predictable, always do the same thing for the same input, and can be tested easily.


On any given day of the week, addition will return 4 for the input 2, 2. call_server, however, depends on things like your internet connection, and on what kind of doodle Google has that day.


Here’s another example for an impure function:


import datetime

def what_day_is_it():
    return datetime.date.today().strftime("%A")




Immutability means not changing the state of something after it has been created, much like defining constants in code.


It is somewhat like purity, but for data instead of functions (in a way, a data object is like a function; a list, for example, is a function from index to element, etc.).


In imperative programming, assignment and mutation are quite common:


x = 7
x = 9
y = []
y.append(x)



Both x and y mean different things depending on the line of code we look at. This gives rise to subtle bugs such as this one:


x = (x for x in [1, 2, 3])
sum_of_elements = sum(x)
max_of_elements = max(x)
# ValueError: max() arg is an empty sequence


The generator x is consumed in place by the summation, which leaves max with an empty iterable. This, however, would work:


x = (1, 2, 3)
sum_of_elements = sum(x)
max_of_elements = max(x)


In general (and especially when debugging), we would like to know for sure whether some function mutates its input or not. If we know our object is immutable, there is no need to check, so that is one less thing to worry about.


The opposite of immutability is mutation and state — values that change over time.




Functional programming is a fancy name, but perhaps not the best one. Although it revolves around function composition, functional programming is more about the composition than about the functions. A better name for it might be “composable programming”.


Here’s a basic implementation of compose.


def compose(f, g):
    def composition(x):
        return f(g(x))
    return composition

double_then_increment = compose(lambda x: x + 1, lambda x: x * 2)

double_then_increment(2)  # Will return 5.


Composition means combining simple(r) elements to create new ones. The elements are very trivial at first, but after some levels of composition one can get very rich behaviour from very simple components.


Note that the order of functions given to compose is the opposite of the order in which they run.


Let’s look at the following example.


This function takes a collection of texts and produces counts of words appearing in these texts. So for [“hello how are you”, “hello you”] we would get {“hello”: 2, “how”: 1, “are”: 1, “you”: 2}.
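A straightforward imperative sketch of such a function might look like this:

```python
def words_histogram(texts):
    # count how many times each word appears across all the texts
    counts = {}
    for text in texts:
        for word in text.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

words_histogram(["hello how are you", "hello you"])
# {"hello": 2, "how": 1, "are": 1, "you": 2}
```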


Simple enough. But on a closer look, one can identify several ideas in this implementation which are recurring in programming. Let’s try to find these one by one and factor them out.


The first idea we can recognize here is map, which means transforming a collection of elements using some function. map is handy when the result has the same number of elements and order as the input, but each element is different.


Here we are not really interested in the sentences, but just their sequences of words, so we can just transform all of them to their word sequence.


Once we have written this idea down, we can use it inside our function.

Note how we construct a new function with map, and only afterwards apply it. This pattern of taking the two arguments at different times, first the mapper, then the iterable, is called currying. We will use this pattern a lot, because we want to separate building the logic from actually running it.
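A minimal sketch of such a curried map, and its use inside our function (shadowing the builtin name deliberately, as curried FP libraries tend to do):

```python
def map(f):  # deliberately shadows the builtin, like curried FP libraries do
    def mapper(iterable):
        return (f(x) for x in iterable)
    return mapper

# build the logic first...
texts_to_word_sequences = map(lambda text: text.split())

def words_histogram(texts):
    counts = {}
    # ...and run it on the actual data later
    for words in texts_to_word_sequences(texts):
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    return counts
```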


After this change there is no big gain yet: we still have to go over each word collection anyway, so we have not gotten rid of any noise or nesting.


The second idea is that we do not really care where the words came from; we can mix all the words from all the sentences and get rid of the double loop.


For this we introduce concat, which flattens an iterable of iterables by one level. Using concat we get rid of the nested loop.
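A sketch of concat, combined with the curried map idea from above:

```python
def concat(iterables):
    # flatten one level: an iterable of iterables becomes a flat iterable
    return (x for iterable in iterables for x in iterable)

def map(f):  # curried, shadows the builtin on purpose
    def mapper(iterable):
        return (f(x) for x in iterable)
    return mapper

def words_histogram(texts):
    counts = {}
    # one flat loop instead of a nested one
    for word in concat(map(lambda text: text.split())(texts)):
        counts[word] = counts.get(word, 0) + 1
    return counts
```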


With one less level of nesting, things are much nicer to look at.


The next thing we might recognize here is that we are filtering by some logic. So we implement filter, which gets a predicate: a function that, given an element of the iterable, returns a boolean deciding whether or not we keep that element.


Specifically, we need a function that checks containment in a collection. Once we have that, we must take care to note that our filter is negative, so we use complement to create remove.

The implementation of remove may look unusual and nontrivial at first glance. It is a higher-order pipeline: it takes a function and returns a function, first turning the predicate into its complement, then embedding it inside a filter.
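Sketches of these pieces; the concrete set of uninteresting words below is just an illustrative stand-in:

```python
def filter(predicate):  # curried like our map, shadows the builtin
    def filterer(iterable):
        return (x for x in iterable if predicate(x))
    return filterer

def complement(predicate):
    def negated(x):
        return not predicate(x)
    return negated

def contains(collection):  # containment check as a curried predicate
    def predicate(x):
        return x in collection
    return predicate

def remove(predicate):
    # a higher order pipeline: negate the predicate, then embed it in a filter
    return filter(complement(predicate))

# hypothetical stop words, just for illustration
remove_uninteresting_words = remove(contains(frozenset(["the", "a"])))
list(remove_uninteresting_words(["the", "hello", "a", "you"]))  # ["hello", "you"]
```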


Alright, that’s nice, but this is getting a bit hard to read. Luckily we can use compose to flatten this structure, and naturally we will also factor it out. But if we just use compose, we get a strange reversed reading order. So instead, let’s use compose_left (the same as compose, but with reversed execution order).
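A sketch of a variadic compose_left:

```python
def compose_left(*functions):
    # like compose, but runs the functions in the order they were given
    def composed(x):
        for f in functions:
            x = f(x)
        return x
    return composed

double_then_increment = compose_left(lambda x: x * 2, lambda x: x + 1)
double_then_increment(2)  # 5; reading order now matches execution order
```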

What else can we identify here? Well, this grouping by some logic is definitely something you’ve seen many times before, so let’s factor that out as well.


groupby takes some way to transform an element into its “key”, then builds a dictionary from these keys to the elements that belong to them. For example, if we wanted to group by the first character, we would write groupby(lambda x: x[0])([“hello”, “there”, “hi”]), which gives us {“h”: [“hello”, “hi”], “t”: [“there”]}. In our case we just want to count, so our key function is simply identity.
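A sketch of a curried groupby:

```python
def groupby(key):
    def grouper(iterable):
        groups = {}
        for x in iterable:
            # append each element to the bucket of its key
            groups.setdefault(key(x), []).append(x)
        return groups
    return grouper

groupby(lambda x: x[0])(["hello", "there", "hi"])
# {"h": ["hello", "hi"], "t": ["there"]}
```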

Starting to look pretty neat and organized, and we are almost done. The last idea we will recognize is transforming a dictionary’s values using some function (len) and building a new dictionary. This pattern is called valmap, as it is like mapping, but over a dictionary’s values. Remember that we like immutability, so we construct a brand-new dictionary instead of mutating the given one.
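A sketch of valmap:

```python
def valmap(f):
    def mapper(d):
        # build a brand new dictionary rather than mutating the given one
        return {key: f(value) for key, value in d.items()}
    return mapper

valmap(len)({"hello": ["hello", "hello"], "you": ["you"]})  # {"hello": 2, "you": 1}
```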

And that’s it: we have fully deconstructed words_histogram into a bunch of reusable ideas which recur in programming: mapping, flattening, filtering, grouping, dictionary manipulation, etc. The final result is very appealing; it contains only the core idea of the algorithm, without any redundancy.
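Putting all the sketched pieces together (again with a made-up stop-word set), the deconstructed function might read:

```python
def map(f):  # curried, shadows the builtin on purpose
    def mapper(iterable):
        return (f(x) for x in iterable)
    return mapper

def concat(iterables):
    # flatten one level
    return (x for iterable in iterables for x in iterable)

def filter(predicate):  # curried, shadows the builtin on purpose
    def filterer(iterable):
        return (x for x in iterable if predicate(x))
    return filterer

def complement(predicate):
    def negated(x):
        return not predicate(x)
    return negated

def remove(predicate):
    return filter(complement(predicate))

def contains(collection):
    def predicate(x):
        return x in collection
    return predicate

def groupby(key):
    def grouper(iterable):
        groups = {}
        for x in iterable:
            groups.setdefault(key(x), []).append(x)
        return groups
    return grouper

def valmap(f):
    def mapper(d):
        return {key: f(value) for key, value in d.items()}
    return mapper

def compose_left(*functions):
    def composed(x):
        for f in functions:
            x = f(x)
        return x
    return composed

def identity(x):
    return x

words_histogram = compose_left(
    map(lambda text: text.split()),
    concat,
    remove(contains(frozenset(["the", "a"]))),  # hypothetical stop words
    groupby(identity),
    valmap(len),
)
```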

These functions we have extracted, and many others, are available in the gamla Python library (disclaimer: I am one of its authors).




When we write functional code we transition to talking about what, instead of how. This is nice because we can then do sophisticated things under the hood while keeping our descriptions concise. One such example is how gamla handles async code. Let’s consider the following code:

This function gets a sentence and makes some words start with an uppercase letter, if they are “important”.
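A sketch of such a function; is_word_important is stubbed locally here, standing in for a slow remote check, and its “importance” rule is made up:

```python
import asyncio

# A local stub standing in for a remote call; the length rule is made up.
async def is_word_important(word):
    await asyncio.sleep(0)  # pretend this is a network round trip
    return len(word) > 3

async def capitalize_important_words(sentence):
    result = []
    for word in sentence.split():
        # each request is awaited before moving on to the next word
        if await is_word_important(word):
            word = word.capitalize()
        result.append(word)
    return " ".join(result)

asyncio.run(capitalize_important_words("hi there you"))  # "hi There you"
```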


But this implementation is potentially suboptimal. Consider the case where is_word_important calls some remote server. We would then await each word’s request before advancing to the next one, which would be terribly slow. Instead we would like to do it in parallel:
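A parallel sketch using asyncio.gather, with the same local stand-in for is_word_important:

```python
import asyncio

# A local stub standing in for a remote call; the length rule is made up.
async def is_word_important(word):
    await asyncio.sleep(0)  # pretend this is a network round trip
    return len(word) > 3

async def capitalize_important_words(sentence):
    words = sentence.split()
    # fire off all the requests at once and await them together
    verdicts = await asyncio.gather(*(is_word_important(w) for w in words))
    return " ".join(
        word.capitalize() if important else word
        for word, important in zip(words, verdicts)
    )
```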

This is optimal, but the code is starting to look pretty messy. Using gamla we can ignore the fact that this whole thing is async and focus on the logic. The underlying implementation will take care of this for us, optimally, using the same async primitives.

Now isn’t that nice? The programmer can worry less about details and focus on the logic, while the underlying framework optimizes the work.




The art of finding patterns in code is both fun and useful. It will help you become a better developer and lead you to useful abstractions.


However, take heed of this warning: you can overdo it by identifying a wrong pattern, abusing an abstraction, or forcing your code into one even when it doesn’t fit. An objective sanity check is code length before and after applying an abstraction. If your code did not shrink, you may have overdone it. If it did: good job, you probably found a useful pattern!


This concludes this speedy introduction to the world of functional programming. Have a nice day.
