Gap filling

Gapfilling is used to find easiest additions to the models that would make them feasible and capable of growth.

Typically, an infeasible model ("with gaps") is used together with an universal model (which contains "everything"), and the algorithm attempts to find the minimal amount of reactions from the universal model that make the gappy model happy. In turn, the gapfilling optimization problem becomes a MILP.

Gapfilling is sometimes used to produce "viable" genome-scale reconstructions from partial ones, but without additional manual intervention the gapfilling results are almost never biologically valid. A good use of gapfilling is to find problems in a model that cause infeasibility as follows: First the modeller makes a set of (unrealistic) universal reactions that supply or remove metabolites, and after gapfilling, metabolites that had to be supplied or removed to make the model feasible mark possible problems, thus guiding the modeller towards correct solution.

We will use a partially crippled E. coli toy model and see the minimal amount of reactions that may save it.

using COBREXA

download_model(
    "http://bigg.ucsd.edu/static/models/e_coli_core.json",
    "e_coli_core.json",
    "7bedec10576cfe935b19218dc881f3fb14f890a1871448fc19a9b4ee15b448d8",
)

import JSONFBCModels, HiGHS
model = load_model("e_coli_core.json")

JSONFBCModels.JSONFBCModel(#= 95 reactions, 72 metabolites =#)

First, let's produce an infeasible model:

import AbstractFBCModels.CanonicalModel as CM
infeasible_model = convert(CM.Model, model)

for rxn in ["TALA", "PDH", "PGI", "PYK"]
    infeasible_model.reactions[rxn].lower_bound = 0.0
    infeasible_model.reactions[rxn].upper_bound = 0.0
end

After removing the above reactions, the model will fail to solve:

flux_balance_analysis(infeasible_model, optimizer = HiGHS.Optimizer) |> println

nothing

To avoid very subtle semantic issues, we are going to remove the ATP maintenance pseudoreaction from the universal model:

universal_model = convert(CM.Model, model)
delete!(universal_model.reactions, "ATPM")

Dict{String, AbstractFBCModels.CanonicalModel.Reaction} with 94 entries:
  "ACALD"       => Reaction("Acetaldehyde dehydrogenase (acetylating)", -1000.0…
  "PTAr"        => Reaction("Phosphotransacetylase", -1000.0, 1000.0, Dict("coa…
  "ALCD2x"      => Reaction("Alcohol dehydrogenase (ethanol)", -1000.0, 1000.0,…
  "PDH"         => Reaction("Pyruvate dehydrogenase", 0.0, 1000.0, Dict("coa_c"…
  "PYK"         => Reaction("Pyruvate kinase", 0.0, 1000.0, Dict("adp_c"=>-1.0,…
  "CO2t"        => Reaction("CO2 transporter via diffusion", -1000.0, 1000.0, D…
  "EX_nh4_e"    => Reaction("Ammonia exchange", -1000.0, 1000.0, Dict("nh4_e"=>…
  "MALt2_2"     => Reaction("Malate transport via proton symport (2 H)", 0.0, 1…
  "CS"          => Reaction("Citrate synthase", 0.0, 1000.0, Dict("coa_c"=>1.0,…
  "PGM"         => Reaction("Phosphoglycerate mutase", -1000.0, 1000.0, Dict("3…
  "TKT1"        => Reaction("Transketolase", -1000.0, 1000.0, Dict("g3p_c"=>1.0…
  "EX_mal__L_e" => Reaction("L-Malate exchange", 0.0, 1000.0, Dict("mal__L_e"=>…
  "ACONTa"      => Reaction("Aconitase (half-reaction A, Citrate hydro-lyase)",…
  "EX_pi_e"     => Reaction("Phosphate exchange", -1000.0, 1000.0, Dict("pi_e"=…
  "GLNS"        => Reaction("Glutamine synthetase", 0.0, 1000.0, Dict("adp_c"=>…
  "ICL"         => Reaction("Isocitrate lyase", 0.0, 1000.0, Dict("succ_c"=>1.0…
  "EX_o2_e"     => Reaction("O2 exchange", -1000.0, 1000.0, Dict("o2_e"=>-1.0),…
  "FBA"         => Reaction("Fructose-bisphosphate aldolase", -1000.0, 1000.0, …
  "EX_gln__L_e" => Reaction("L-Glutamine exchange", 0.0, 1000.0, Dict("gln__L_e…
  ⋮             => ⋮

Making the model feasible with a minimal set of reactions

Which of the reactions we have to fill back to get the model working again? First, let's run gap_filling_analysis to get a solution for a system that implements the reaction patching:

x = gap_filling_analysis(
    infeasible_model,
    universal_model,
    0.05,
    optimizer = HiGHS.Optimizer,
)

ConstraintTrees.Tree{Float64} with 6 elements:
  :fill_flags            => ConstraintTrees.Tree{Float64}(#= 94 elements =#)
  :n_filled              => 1.0
  :stoichiometry         => ConstraintTrees.Tree{Float64}(#= 72 elements =#)
  :system                => ConstraintTrees.Tree{Float64}(#= 3 elements =#)
  :universal_flux_bounds => ConstraintTrees.Tree{Float64}(#= 94 elements =#)
  :universal_fluxes      => ConstraintTrees.Tree{Float64}(#= 94 elements =#)

The reactions that had to be re-added can be found from the fill_flags:

filled_reactions = [k for (k, v) in x.fill_flags if v != 0]

1-element Vector{Symbol}:
 :TALA

If we want to try to generate another solution, we have to explicitly ask the system to avoid the previous solution. That is done via setting the argument known_fill. We can also set the max_cost to avoid finding too benevolent solutions:

x2 = gap_filling_analysis(
    infeasible_model,
    universal_model,
    0.05,
    max_cost = 2.0,
    known_fills = [x.fill_flags],
    optimizer = HiGHS.Optimizer,
)

other_filled_reactions = [k for (k, v) in x2.fill_flags if v != 0]

1-element Vector{Symbol}:
 :PGI

Model debugging: which metabolite is missing?

Gap-filling is great for detecting various broken links and imbalances in metabolic models. We show how to find the metabolites are causing the imbalance for our "broken" E. coli model.

First, we construct a few completely unnatural reactions that create/remove the metabolites from/to nowhere:

magic_model = convert(CM.Model, model)
empty!(magic_model.genes)
empty!(magic_model.reactions)

for mid in keys(magic_model.metabolites)
    magic_model.reactions[mid] = CM.Reaction(
        lower_bound = -100.0,
        upper_bound = 100.0,
        stoichiometry = Dict(mid => 1.0),
    )
end

Gapfilling now points to the metabolites that need to be somehow taken care of by the modeller in order for the model to become feasible:

xm = gap_filling_analysis(infeasible_model, magic_model, 0.05, optimizer = HiGHS.Optimizer)

blocking_metabolites = [k for (k, v) in xm.fill_flags if v != 0]

1-element Vector{Symbol}:
 :e4p_c

We can also have a look at how much of a given metabolite was used to make the model feasible again:

xm.universal_fluxes[first(blocking_metabolites)]

0.6866985638297876

This page was generated using Literate.jl.