Wrapping Up
Overview

Teaching: 5 min
Exercises: 0 min

Questions
- What did we learn?
- What next?

Objectives

Summary
- Dask uses `dask.delayed` to build task graphs.
- These task graphs can then be used to compute the result in either a multi-threaded or a distributed way (see the first sketch after this list).
- Multi-threaded Python only benefits from the parallelism if it does considerable work outside the Python interpreter (e.g. in NumPy), because of the GIL.
- Distributed parallel Python works well for most Python code, but adds some overhead for message passing and for starting up and tearing down the Dask cluster.
- Pure Python code is slow, but Cython can compile computationally intensive parts into faster C code with little effort.
- Dask Array parallelizes NumPy arrays, either in a multi-threaded way for faster computation, or in a distributed way to reduce each compute node's memory requirements while still giving a reasonable speed-up over serial execution (see the second sketch after this list).
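As a refresher, here is a minimal sketch of the `dask.delayed` workflow; the `inc` and `add` functions and the input values are made up for illustration.

```python
import dask

@dask.delayed
def inc(x):
    return x + 1

@dask.delayed
def add(a, b):
    return a + b

# Calling delayed functions only records tasks in a graph; nothing has run yet.
total = add(inc(1), inc(2))

if __name__ == "__main__":
    # The same graph can be run by different schedulers:
    # threads share memory but are limited by the GIL for pure-Python work,
    print(total.compute(scheduler="threads"))    # 5
    # while processes sidestep the GIL at the cost of message passing.
    print(total.compute(scheduler="processes"))  # 5
```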
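And a minimal Dask Array sketch; the array shape and chunk sizes here are arbitrary illustrative choices.

```python
import dask.array as da

# A 10000x10000 array split into 1000x1000 chunks; each chunk is a NumPy
# array, and chunks can be computed in parallel without ever holding the
# whole array in one node's memory.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

# Array operations build a task graph, just like dask.delayed.
result = (x + x.T).mean(axis=0)

# .compute() executes the graph and returns a plain NumPy array.
print(result.compute().shape)  # (10000,)
```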
Further reading
There is more to learn about the topics we have covered. Check out these docs.
- Dask docs
- General best practices
- Dask examples
- Dask Delayed Docs
- Delayed best practices
- Dask Array Docs
- Best practices for Dask array
There are also some major features of Dask we didn't cover:
- Dask DataFrames, which provide a Pandas-like interface (see the first sketch below). See also Data Frames best practices.
- Dask Bag, which implements operations like map, filter, fold, and groupby on collections of generic Python objects, similar to PyToolz or a Pythonic version of PySpark (see the second sketch below).
- Dask Futures, which supports a real-time task framework similar to `dask.delayed`, but immediate rather than lazy (see the third sketch below).
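A minimal taste of Dask DataFrames, assuming Pandas and the `dask.dataframe` module are installed; the tiny table here is made up for illustration.

```python
import pandas as pd
import dask.dataframe as dd

# A tiny Pandas frame, split into two partitions as a Dask DataFrame.
pdf = pd.DataFrame({"name": ["a", "b", "a", "b"], "x": [1, 2, 3, 4]})
df = dd.from_pandas(pdf, npartitions=2)

# The familiar Pandas API builds a task graph; .compute() returns Pandas objects.
print(df.groupby("name")["x"].mean().compute())  # a: 2.0, b: 3.0
```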
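A minimal Dask Bag sketch; the input sequence and the chained operations are illustrative.

```python
import dask.bag as db

# A bag of generic Python objects, split across two partitions.
b = db.from_sequence(range(10), npartitions=2)

# map/filter-style operations build a graph lazily.
evens_squared = b.filter(lambda x: x % 2 == 0).map(lambda x: x ** 2)

if __name__ == "__main__":
    # Bags default to the multiprocessing scheduler, hence the __main__ guard.
    print(evens_squared.sum().compute())  # 120
```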
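And a minimal Futures sketch, assuming the `dask.distributed` package is installed; `square` is a made-up example function.

```python
from dask.distributed import Client

def square(x):
    return x ** 2

if __name__ == "__main__":
    client = Client()                   # start a local Dask cluster
    future = client.submit(square, 10)  # work starts immediately (eager, not lazy)
    print(future.result())              # block until the result arrives: 100
    client.close()
```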