Gap filling

Gap-filling is used to find easiest additions to the models that would make them feasible and capable of growth.

Typically, an infeasible model ("with gaps") is used together with an universal model (which contains "everything"), and the algorithm attempts to find the minimal amount of reactions from the universal model that make the gappy model happy. In turn, the gap-filling optimization problem becomes a MILP.

Gap-filling is sometimes used to produce "viable" genome-scale reconstructions from partial ones, but without additional manual intervention the gap-filling results are almost never biologically valid. A good use of gap-filling is to find problems in a model that cause infeasibility as follows: First the modeller makes a set of (unrealistic) universal reactions that supply or remove metabolites, and after gap-filling, metabolites that had to be supplied or removed to make the model feasible mark possible problems, thus guiding the modeller towards correct solution.

We will use a partially crippled E. coli toy model and see the minimal amount of reactions that may save it.

using COBREXA

download_model(
    "http://bigg.ucsd.edu/static/models/e_coli_core.json",
    "e_coli_core.json",
    "7bedec10576cfe935b19218dc881f3fb14f890a1871448fc19a9b4ee15b448d8",
)

import JSONFBCModels, HiGHS
model = load_model("e_coli_core.json")

JSONFBCModels.JSONFBCModel(#= 95 reactions, 72 metabolites =#)

First, let's produce an infeasible model:

import AbstractFBCModels.CanonicalModel as CM
infeasible_model = convert(CM.Model, model)

for rxn in ["TALA", "PDH", "PGI", "PYK"]
    infeasible_model.reactions[rxn].lower_bound = 0.0
    infeasible_model.reactions[rxn].upper_bound = 0.0
end

After removing the above reactions, the model will fail to solve:

flux_balance_analysis(infeasible_model, optimizer = HiGHS.Optimizer) |> println

nothing

To avoid very subtle semantic issues, we are going to remove the ATP maintenance pseudo-reaction from the universal model:

universal_model = convert(CM.Model, model)
delete!(universal_model.reactions, "ATPM")

Dict{String, AbstractFBCModels.CanonicalModel.Reaction} with 94 entries:
  "ACALD"       => Reaction("Acetaldehyde dehydrogenase (acetylating)", -1000.0…
  "PTAr"        => Reaction("Phosphotransacetylase", -1000.0, 1000.0, Dict("coa…
  "ALCD2x"      => Reaction("Alcohol dehydrogenase (ethanol)", -1000.0, 1000.0,…
  "PDH"         => Reaction("Pyruvate dehydrogenase", 0.0, 1000.0, Dict("coa_c"…
  "PYK"         => Reaction("Pyruvate kinase", 0.0, 1000.0, Dict("adp_c"=>-1.0,…
  "CO2t"        => Reaction("CO2 transporter via diffusion", -1000.0, 1000.0, D…
  "EX_nh4_e"    => Reaction("Ammonia exchange", -1000.0, 1000.0, Dict("nh4_e"=>…
  "MALt2_2"     => Reaction("Malate transport via proton symport (2 H)", 0.0, 1…
  "CS"          => Reaction("Citrate synthase", 0.0, 1000.0, Dict("coa_c"=>1.0,…
  "PGM"         => Reaction("Phosphoglycerate mutase", -1000.0, 1000.0, Dict("3…
  "TKT1"        => Reaction("Transketolase", -1000.0, 1000.0, Dict("g3p_c"=>1.0…
  "EX_mal__L_e" => Reaction("L-Malate exchange", 0.0, 1000.0, Dict("mal__L_e"=>…
  "ACONTa"      => Reaction("Aconitase (half-reaction A, Citrate hydro-lyase)",…
  "EX_pi_e"     => Reaction("Phosphate exchange", -1000.0, 1000.0, Dict("pi_e"=…
  "GLNS"        => Reaction("Glutamine synthetase", 0.0, 1000.0, Dict("adp_c"=>…
  "ICL"         => Reaction("Isocitrate lyase", 0.0, 1000.0, Dict("succ_c"=>1.0…
  "EX_o2_e"     => Reaction("O2 exchange", -1000.0, 1000.0, Dict("o2_e"=>-1.0),…
  "FBA"         => Reaction("Fructose-bisphosphate aldolase", -1000.0, 1000.0, …
  "EX_gln__L_e" => Reaction("L-Glutamine exchange", 0.0, 1000.0, Dict("gln__L_e…
  ⋮             => ⋮

Making the model feasible with a minimal set of reactions

Which of the reactions we have to fill back to get the model working again? First, let's run gap_filling_analysis to get a solution for a system that implements the reaction patching:

x = gap_filling_analysis(
    infeasible_model,
    universal_model,
    0.05,
    optimizer = HiGHS.Optimizer,
)

ConstraintTrees.Tree{Float64} with 7 elements:
  :cost_filled           => 1.0
  :fill_flags            => ConstraintTrees.Tree{Float64}(#= 94 elements =#)
  :n_filled              => 1.0
  :stoichiometry         => ConstraintTrees.Tree{Float64}(#= 72 elements =#)
  :system                => ConstraintTrees.Tree{Float64}(#= 3 elements =#)
  :universal_flux_bounds => ConstraintTrees.Tree{Float64}(#= 94 elements =#)
  :universal_fluxes      => ConstraintTrees.Tree{Float64}(#= 94 elements =#)

The reactions that had to be re-added can be found from the fill_flags:

filled_reactions = [k for (k, v) in x.fill_flags if v != 0]

1-element Vector{Symbol}:
 :TALA

Prevent solver precision issues

With many solvers, the internal tolerance settings may allow a small flux to slip through an "unused" universal reaction reaction, making the binary decision on the presence of the given reaction extremely imprecise. It is adviseable to tighten the numeric tolerance of the solver to at least 1e-7.

If we want to try to generate another solution, we have to explicitly ask the system to avoid the previous solution. That is done via setting the argument known_fill. We can also set the max_cost to avoid finding too benevolent solutions:

x2 = gap_filling_analysis(
    infeasible_model,
    universal_model,
    0.05,
    max_cost = 2.0,
    known_fills = [x.fill_flags],
    optimizer = HiGHS.Optimizer,
)

other_filled_reactions = [k for (k, v) in x2.fill_flags if v != 0]

1-element Vector{Symbol}:
 :PGI

Why is the gap-filling algorithm adding seemingly unneeded reactions?

By default, COBREXA does not do any "cleaning" on the universal model; all reactions that are present in that model will be potentially utilized in the new model, and all of them will need to respect their original bounds in the universal model. That becomes an issue with reactions that are bounded to non-zero flux (such as the ATPM reaction in the E. coli "core" model) – since their flux is required to be non-zero in any feasible model solution, they will also need to be in the fill set, because otherwise their flux would be zero.

As the simplest solution, all realistic uses of gap-filling should carefully check the set of universal reactions, and ideally exclude all exchanges and pseudo-reactions.

Model debugging: which metabolite is missing?

Gap-filling is great for detecting various broken links and imbalances in metabolic models. We show how to find the metabolites are causing the imbalance for our "broken" E. coli model.

First, we construct a few completely unnatural reactions that create/remove the metabolites from/to nowhere:

magic_model = convert(CM.Model, model)
empty!(magic_model.genes)
empty!(magic_model.reactions)

for mid in keys(magic_model.metabolites)
    magic_model.reactions[mid] = CM.Reaction(
        lower_bound = -100.0,
        upper_bound = 100.0,
        stoichiometry = Dict(mid => 1.0),
    )
end

Gap-filling now points to the metabolites that need to be somehow taken care of by the modeller in order for the model to become feasible:

xm = gap_filling_analysis(infeasible_model, magic_model, 0.05, optimizer = HiGHS.Optimizer)

blocking_metabolites = [k for (k, v) in xm.fill_flags if v != 0]

1-element Vector{Symbol}:
 :e4p_c

We can also have a look at how much of a given metabolite was used to make the model feasible again:

xm.universal_fluxes[first(blocking_metabolites)]

0.6866985638297876

This page was generated using Literate.jl.