Gap filling
Gapfilling is used to find easiest additions to the models that would make them feasible and capable of growth.
Typically, an infeasible model ("with gaps") is used together with an universal model (which contains "everything"), and the algorithm attempts to find the minimal amount of reactions from the universal model that make the gappy model happy. In turn, the gapfilling optimization problem becomes a MILP.
Gapfilling is sometimes used to produce "viable" genome-scale reconstructions from partial ones, but without additional manual intervention the gapfilling results are almost never biologically valid. A good use of gapfilling is to find problems in a model that cause infeasibility as follows: First the modeller makes a set of (unrealistic) universal reactions that supply or remove metabolites, and after gapfilling, metabolites that had to be supplied or removed to make the model feasible mark possible problems, thus guiding the modeller towards correct solution.
We will use a partially crippled E. coli toy model and see the minimal amount of reactions that may save it.
using COBREXA
download_model(
"http://bigg.ucsd.edu/static/models/e_coli_core.json",
"e_coli_core.json",
"7bedec10576cfe935b19218dc881f3fb14f890a1871448fc19a9b4ee15b448d8",
)
import JSONFBCModels, HiGHS
model = load_model("e_coli_core.json")
JSONFBCModels.JSONFBCModel(#= 95 reactions, 72 metabolites =#)
First, let's produce an infeasible model:
import AbstractFBCModels.CanonicalModel as CM
infeasible_model = convert(CM.Model, model)
for rxn in ["TALA", "PDH", "PGI", "PYK"]
infeasible_model.reactions[rxn].lower_bound = 0.0
infeasible_model.reactions[rxn].upper_bound = 0.0
end
After removing the above reactions, the model will fail to solve:
flux_balance_analysis(infeasible_model, optimizer = HiGHS.Optimizer) |> println
nothing
To avoid very subtle semantic issues, we are going to remove the ATP maintenance pseudoreaction from the universal model:
universal_model = convert(CM.Model, model)
delete!(universal_model.reactions, "ATPM")
Dict{String, AbstractFBCModels.CanonicalModel.Reaction} with 94 entries:
"ACALD" => Reaction("Acetaldehyde dehydrogenase (acetylating)", -1000.0…
"PTAr" => Reaction("Phosphotransacetylase", -1000.0, 1000.0, Dict("coa…
"ALCD2x" => Reaction("Alcohol dehydrogenase (ethanol)", -1000.0, 1000.0,…
"PDH" => Reaction("Pyruvate dehydrogenase", 0.0, 1000.0, Dict("coa_c"…
"PYK" => Reaction("Pyruvate kinase", 0.0, 1000.0, Dict("adp_c"=>-1.0,…
"CO2t" => Reaction("CO2 transporter via diffusion", -1000.0, 1000.0, D…
"EX_nh4_e" => Reaction("Ammonia exchange", -1000.0, 1000.0, Dict("nh4_e"=>…
"MALt2_2" => Reaction("Malate transport via proton symport (2 H)", 0.0, 1…
"CS" => Reaction("Citrate synthase", 0.0, 1000.0, Dict("coa_c"=>1.0,…
"PGM" => Reaction("Phosphoglycerate mutase", -1000.0, 1000.0, Dict("3…
"TKT1" => Reaction("Transketolase", -1000.0, 1000.0, Dict("g3p_c"=>1.0…
"EX_mal__L_e" => Reaction("L-Malate exchange", 0.0, 1000.0, Dict("mal__L_e"=>…
"ACONTa" => Reaction("Aconitase (half-reaction A, Citrate hydro-lyase)",…
"EX_pi_e" => Reaction("Phosphate exchange", -1000.0, 1000.0, Dict("pi_e"=…
"GLNS" => Reaction("Glutamine synthetase", 0.0, 1000.0, Dict("adp_c"=>…
"ICL" => Reaction("Isocitrate lyase", 0.0, 1000.0, Dict("succ_c"=>1.0…
"EX_o2_e" => Reaction("O2 exchange", -1000.0, 1000.0, Dict("o2_e"=>-1.0),…
"FBA" => Reaction("Fructose-bisphosphate aldolase", -1000.0, 1000.0, …
"EX_gln__L_e" => Reaction("L-Glutamine exchange", 0.0, 1000.0, Dict("gln__L_e…
⋮ => ⋮
Making the model feasible with a minimal set of reactions
Which of the reactions we have to fill back to get the model working again? First, let's run gap_filling_analysis
to get a solution for a system that implements the reaction patching:
x = gap_filling_analysis(
infeasible_model,
universal_model,
0.05,
optimizer = HiGHS.Optimizer,
)
ConstraintTrees.Tree{Float64} with 6 elements:
:fill_flags => ConstraintTrees.Tree{Float64}(#= 94 elements =#)
:n_filled => 1.0
:stoichiometry => ConstraintTrees.Tree{Float64}(#= 72 elements =#)
:system => ConstraintTrees.Tree{Float64}(#= 3 elements =#)
:universal_flux_bounds => ConstraintTrees.Tree{Float64}(#= 94 elements =#)
:universal_fluxes => ConstraintTrees.Tree{Float64}(#= 94 elements =#)
The reactions that had to be re-added can be found from the fill_flags
:
filled_reactions = [k for (k, v) in x.fill_flags if v != 0]
1-element Vector{Symbol}:
:TALA
If we want to try to generate another solution, we have to explicitly ask the system to avoid the previous solution. That is done via setting the argument known_fill
. We can also set the max_cost
to avoid finding too benevolent solutions:
x2 = gap_filling_analysis(
infeasible_model,
universal_model,
0.05,
max_cost = 2.0,
known_fills = [x.fill_flags],
optimizer = HiGHS.Optimizer,
)
other_filled_reactions = [k for (k, v) in x2.fill_flags if v != 0]
1-element Vector{Symbol}:
:PGI
By default, COBREXA does not do any "cleaning" on the universal model; all reactions that are present in that model will be potentially utilized in the new model, and all of them will need to respect their original bounds in the universal model. That becomes an issue with reactions that are bounded to non-zero flux (such as the ATPM
reaction in the E. coli "core" model) – since their flux is required to be non-zero in any feasible model solution, they will also need to be in the fill set, because otherwise their flux would be zero.
As the simplest solution, all realistic uses of gapfilling should carefully check the set of universal reactions, and ideally exclude all exchanges and pseudoreactions.
Model debugging: which metabolite is missing?
Gap-filling is great for detecting various broken links and imbalances in metabolic models. We show how to find the metabolites are causing the imbalance for our "broken" E. coli model.
First, we construct a few completely unnatural reactions that create/remove the metabolites from/to nowhere:
magic_model = convert(CM.Model, model)
empty!(magic_model.genes)
empty!(magic_model.reactions)
for mid in keys(magic_model.metabolites)
magic_model.reactions[mid] = CM.Reaction(
lower_bound = -100.0,
upper_bound = 100.0,
stoichiometry = Dict(mid => 1.0),
)
end
Gapfilling now points to the metabolites that need to be somehow taken care of by the modeller in order for the model to become feasible:
xm = gap_filling_analysis(infeasible_model, magic_model, 0.05, optimizer = HiGHS.Optimizer)
blocking_metabolites = [k for (k, v) in xm.fill_flags if v != 0]
1-element Vector{Symbol}:
:e4p_c
We can also have a look at how much of a given metabolite was used to make the model feasible again:
xm.universal_fluxes[first(blocking_metabolites)]
0.6866985638297876
This page was generated using Literate.jl.