Parallel processing overview

Distributed processing in Julia is provided mainly by the Distributed.jl package.

COBREXA.jl can use this existing system to almost transparently run large parallelizable analyses on multiple CPU cores and on multiple computers connected through a network. Ultimately, the approach scales to thousands of computing nodes in large HPC facilities.

Users may run the analyses in parallel to gain speed-ups. The usual workflow in COBREXA.jl is straightforward:

  1. Import the Distributed package and add worker processes, e.g., with addprocs.
  2. Pick an analysis function that can be parallelized (such as screen or flux_variability_analysis) and prepare it to work on the data.
  3. Pass the desired set of worker IDs to the function via the workers= argument; in the simplest form, e.g. screen(..., workers=workers()).
  4. Worker communication is managed automatically, and the results are computed "as usual", just appropriately faster. A minimal sketch of the whole workflow follows.
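
For instance, on a single machine the workflow may look roughly as follows. This is a hedged sketch that assumes a COBREXA 1.x-style API; the model file name and the choice of the GLPK solver are placeholders, and exact argument names may differ between package versions.

```julia
using Distributed

addprocs(4)                # step 1: start 4 local worker processes
@everywhere using COBREXA  # the package must be loaded on all workers
@everywhere using GLPK     # ...together with the solver

model = load_model("e_coli_core.json")  # placeholder model file

# steps 2-4: run the parallelizable analysis on all available workers
result = flux_variability_analysis(model, GLPK.Optimizer; workers = workers())
```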

Specific documentation is available about running parallel analysis locally and running distributed analysis in HPC clusters.

Functions that support parallelization

The functions that support parallel execution include the screening functions (most notably screen) and flux_variability_analysis.

Notably, the screening functions can be reused to run many other kinds of analyses which, in turn, inherit the parallelizability. This covers a wide range of use cases that can thus be parallelized very easily (a sketch follows the list):

  • single and multiple gene deletions (and other genetic modifications)
  • multiple reaction knockouts
  • envelope-like production profiles (e.g., enzyme-constrained growth profiles)
  • growth media explorations (such as explorations of metabolite depletion)
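
As an illustration of the gene-deletion use case, a single-gene-knockout growth screen may be put together roughly like this. Again a hedged sketch: the names knockout, genes, flux_balance_analysis, and solved_objective_value follow the COBREXA 1.x API and may differ in other versions.

```julia
# compute the optimal growth of every single-gene knockout, in parallel
knockout_growth = screen(model;
    variants = [[knockout(g)] for g in genes(model)],  # one variant per gene
    analysis = m ->
        solved_objective_value(flux_balance_analysis(m, GLPK.Optimizer)),
    workers = workers(),
)
```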

Mitigating parallel inefficiencies

Ideally, the speedup gained by parallel processing should be proportional to the amount of hardware added as workers. To get close to that ideal, it helps to be aware of the factors that reduce parallel efficiency, which can be summarized as follows:

  • Parallelization within single runs of the linear solver is typically not supported (and where it is, it may be inefficient for common problem sizes). Normally, we want to parallelize analyses that comprise multiple independent runs of the solver.
  • Some analysis functions, such as flux_variability_analysis, have serial parts that cannot be parallelized by default (for example, finding the optimal objective value before the many independent per-reaction optimizations). Pipelines can usually avoid this inefficiency by precomputing the serial parts without involving the worker cluster.
  • Frequent worker communication may vastly reduce the efficiency of parallel processing; this typically happens when the time required for an individual analysis step is smaller than the network round-trip time to the worker processes. Do not use parallelization for very small tasks.
  • Transferring large amounts of data among workers may hamper parallel efficiency too. To avoid this kind of inefficiency, use a single loaded model data object and apply any required small modifications directly on the workers, as sketched below.
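
The last two points can be illustrated with plain Distributed.jl primitives (screen is designed around the same pattern). The sketch below uses a stand-in data vector instead of a real model; the point is that the large object is created once per worker, and only the tiny task descriptions travel over the network.

```julia
using Distributed
addprocs(4)

# create the large data object once on every worker (in a COBREXA
# pipeline, this would be the loaded model)...
@everywhere const big_data = collect(1.0:1_000_000.0)

# ...define the analysis on the workers too, applying any small,
# task-specific modification locally...
@everywhere analyze(i) = sum(big_data) + i

# ...and send only the small task indices over the network
results = pmap(analyze, 1:100)
```
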
Cost of the distribution and parallelization overhead

Before allocating extra resources to distributed execution, always check that the tasks are properly parallelizable and sufficiently large to saturate the available computational resources, so that the invested energy is not wasted. Amdahl's and Gustafson's laws give a good overview of the sources and consequences of parallelization inefficiencies, and of the costs of the resulting overhead.
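
For a quick back-of-the-envelope check, Amdahl's law bounds the achievable speedup by the serial fraction of the workload:

```julia
# Amdahl's law: with a serial fraction s of the total work,
# the speedup achievable on n workers is 1 / (s + (1 - s) / n)
amdahl_speedup(s, n) = 1 / (s + (1 - s) / n)

amdahl_speedup(0.05, 16)    # ≈ 9.1× with 16 workers
amdahl_speedup(0.05, 1024)  # ≈ 19.6×, approaching the 1/s = 20× limit
```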