Collects a number of biological data analytics tasks.
This library provides tools for the bio_db served data,
empowering downstream analyses of user's experimental data.
The library comes with one experimental dataset: data/silac/bt.csv which
is used in the example files in directory examples/ .
- bio_analytics_version(-Vers, -Date)
- Version and release date.
?- bio_analytics_version(V,D).
V = 0:7:0,
D = date(2024,10,17).
- bio_diffex(+Exp, -DEs, -NonDEs, +Opts)
- Select a sublist of significantly, differentially expressed genes in an experiment.
Exp is in the form of a mtx/1 matrix. Selected (DEs) and rejected (NonDEs) are returned in common format,
of either (default) Gene-Ev, where Ev is the expression value for Gene in Exp, or sub-matrices of Exp.
The format is controlled by option as_pairs
.
We use ev as short for expression value, to avoid confusion with exp which short for experiment.
Opts
- as_non(AsNon=pvalue)
- what to return as non differentially expressed: either everything in Gcnm (AsNon=all, default when EvLog is true), or only those with numeric pvalue (AsNon=pvalue, default when EvLog is false)
- as_pairs(AsPairs=true)
- whether to return pairs or matrices
- de_max(DexMax=false)
- puts a cap on the number of differentially expressed genes returned
- de_mtx(DexMtx)
- returns the selected matrix. if an atom, its taken to be a filename
- exp_pv_cnm(ExpPcnm=adj.pvalue)
- the experimental column (found in MsF) on which Pcut is applied as a filter
- exp_pv_cut(Pcut=0.05)
- p.value cut off for experimental input from MSstats
- exp_ev_log(EvLog=true)
- are the expression values log values
- exp_ev_cnm(EvCnm)
- expession value column name (log2FC if EvLog=true, expression otherwise)
- exp_ev_cut_let(EvLet = -1)
- expression values below (less or equal) which values are selected
- exp_ev_cut_get(EvGet)
- expression values above (greater or equal) which values are selected (defaults is 2 if EvLog is false, and 1 otherwise)
- exp_ev_include_inf(IncInf=false)
- include infinity values as diffexs ? (default is false if EvLog is false, and true otherwise)
- gene_id_cnm(Gcnm=Symbols)
- column name for the key value in the pair lists: DEs and NonDEs.
- mtx_dx_in(Mtx)
- can be used to return the Mtx used (from
mtx(Exp,Mtx)
call)
- which(Wch=dx(UpIs,DownIs))
- returns the up and down-regulated indices
?- lib(mtx),
absolute_file_name(pack('bio_analytics/data/silac/bt.csv'), CsvF),
mtx(CsvF, Mtx, convert(true)),
assert(mtx_data(Mtx)).
CsvF = '/usr/local/users/nicos/local/git/lib/swipl/pack/bio_analytics/data/silac/bt.csv',
Mtx = [row('Protein IDs', 'Symbols', log2FC, adj.pvalue), row('Q9P126', ...), row(..., ..., ..., ...)|...].
?- lib(debug_call),
debug(testo),
mtx_data(Mtx),
debug_call(testo, dims, mtx/Mtx),
bio_diffex(Mtx, DEPrs, NonDEPrs, []),
debug_call(testo, length, de/DEPrs),
debug_call(testo, length, nde/NonDEPrs).
% Dimensions for matrix, (mtx) nR: 1245, nC: 4.
% Length for list, de: 426.
% Length for list, nde: 818.
DEPrs = ['CLEC1B'- -4.80614893171469, 'LGALS1'- -2.065877096524, ... - ...|...],
NonDEPrs = ['CNN2'- -0.69348026828097, 'CXCR4'-0.73039667221395, ... - ...|...].
?-
mtx_data(Mtx),
bio_diffex(Mtx, DEPrs, NonDEPrs, exp_pv_cut(0.01)),
debug_call(testo, length, de/DEPrs),
debug_call(testo, length, nde/NonDEPrs).
% Length for list, de: 286.
% Length for list, nde: 958.
?-
mtx_data(Mtx),
Opts = [exp_ev_cut_let(inf),exp_ev_cut_get(-inf)],
bio_diffex(Mtx, DEPrs, NonDEPrs, Opts),
debug_call(testo, length, de/DEPrs),
debug_call(testo, length, nde/NonDEPrs).
% Length for list, de: 581.
% Length for list, nde: 663.
?- mtx_data(Mtx),
Opts = [exp_ev_cut_let(inf),exp_ev_cut_get(-inf),as_pairs(false)],
bio_diffex(Mtx, DEs, NonDEs, Opts),
debug_call(testo, length, de/DEPrs),
debug_call(testo, length, nde/NonDEPrs).
% Length for list, de: 582.
% Length for list, nde: 664.
Mtx = [row('Protein IDs', 'Symbols', log2FC, adj.pvalue), row(..., ..., ..., ...)|...],
Opts = [exp_ev_cut_let(inf), exp_ev_cut_get(-inf), as_pairs(false)],
DEs = [row('Protein IDs', 'Symbols', log2FC, adj.pvalue), row('Q9P126', 'CLEC1B', -4.80614893171469, 2.057e-37), row(..., ..., ..., ...)|...],
NonDEs = [row('Protein IDs', 'Symbols', log2FC, adj.pvalue), row('B4DUT8;Q6FHC3;Q6FHE4;Q99439;B4DDF4;Q53GK7;B4DN57;B4DHU5;K7ES69;H3BVI6;H3BQH0', 'CNN2', -0.69348026828097, 0.0814675), row(..., ..., ..., ...)|...].
- author
- - nicos angelopoulos
- version
- - 0.1 2019/5/2
- - 0.2 2020/9/3, ability to return sub-matrices (9/14):
diffex_mtx()
- - 0.3 2022/12/16, changed name from exp_diffex/4 to bio_diffex/4.
- bio_p_adjust(+Obj, +Adj, +Opts)
- Adj is the p.adjusted version of Obj.
Obj can be a list, an R vector, or Cid:Mtx term (see pl_vector/3).
In the later case RelPos (see options) is the relative position for the new column (see mtx_relative_position/4),
if none is given Adj is returned as a list.
The adjustment happens via p.adjust()
function in R.
Opts
- debug(Dbg=true)
- informational, progress messages
- hdr(Hdr=q.value)
- the column header if we adding Adj
- rel(RelPos=1)
- when Obj is a matrix and this is given, it is taken to be the relative position for a new
column of the new values to be added to the matrix (see mtx_relative_position/4).
Def is to append the column one after the pvalue column.
New matrix returned in Adj.
Opts are passed to pl_vector/3 (predicate only loaded at run time, so this is a
promised dependency to pack(b_real)
).
Examples
?- Mtx = [row(exp,pv),row(a,0.1),row(b,0.02),row(c,0.001)],
bio_p_adjust( pv:Mtx, Adj, cps(Cps) ).
- author
- - nicos angelopoulos
- version
- - 0.1 2023/09/02
- See also
- - pl_vector/3
- bio_symbols(+Vect, -Symbs, +Opts)
- Convert gene identifiers pointed by Vect to Symbols.
Vect can be a list or pairlist (K-V) of which the K element is taken to be the identifier.
Vect can also be an atomic representing the Nth column on column for Mtx.
Opts
- debug(Dbg=false)
- progress, informational messages
- mtx(Mtx)
- to be used if Vect is atomic
- org(Org=hs)
- organism the data come from, via bio_db_organism/2
- gid(Gid=symb)
- the type of the experimental gene ids for the organism. default depends on Org, but currently all map to symb
(caution, this used to be
org_exp_id()
, changed in v0.2
?- hgnc_homs_hgnc_ncbi( 19295, Ncbi ), bio_symbols( [Ncbi], Symbs, gid(ncbi) ).
Ncbi = 114783,
Symbs = ['LMTK3'].
- author
- - nicos angelopoulos
- version
- - 0:1 2022/12/21
- - 0:2 2024/06/05, changed option
org_exp_id()
to gid()
- See also
- - bio_db_organism/2
- bio_volcano_plot(+Mtx, +Opts)
- Create a volcano plot of differential expressed values (x) against p-values (y).
Uses ggplot2 and bio_analytics:bio_diffex/4.
The predicate creates an R data.frame and creates a ggplot2 object from which
typically output files are produced.
Opts
- clr_down(ClrDw=brandeisblue)
- bio_colour_hex/2 compatible colour for down regulated entries
- clr_inv(ClrInv=cadetgrey)
- colour for invariant entries
- clr_line(ClrLn=bazaar)
- colour for cut-off lines
- clr_up(ClrUp=cadmiumred)
- colour for up regulated entries
- dir(Dir)
- as understood by os_dir_stem_ext/2- see Odir, below too
- debug(Dbg=true)
- informational, progress messages
- ext(Ext=pdf)
- extension that defines the type of output (passed to os_dir_stem_ext/2)
- legend_cnm(LgCnm=regulation)
- the legend frame column name (also appears as legend title, if none is given)
- legend_down(LegUp=down)
- token for up-regulation
- legend_inv(LegUp=invariant)
- token for invariants
- legend_title(LgTitle)
- by default LgCnm is used.
- legend_up(LegUp=up)
- token for up-regulation
- lim_x_max(LimXmax)
- if LimXmin is given this defaults to -LimXmin. By default ggplot2 decides this.
- lim_x_min(LimXmin)
- if LimXmax is given this defaults to -LimXmax. By default ggplot2 decides this.
- lim_y_max(LimXmax)
- By default ggplot2 decides this.
- lim_y_min(LimY=0)
- this is not used unless LimYmax is given
- odir(Odir)
- as understood by os_dir_stem_ext/2- preferred to Dir above
- plot_file(File)
- returns the output file
- rvar_cv(Rvar=bvp_cv)
- R var for colors vector
- rvar_df(Rvar=bvp_df)
- R variable for the 3 column data frame
- rvar_gt(Rvar=bvp_gt)
- R variable holding the plot term
- rvar_rmv(RRmv=true)
- remove the three R variables from above
- stem(Stem=volc_plot)
- stem for output, set to
false
if no output is wanted
- theme(Theme=classic)
- theme_Theme should be a
ggplot()
theme
Opts are passed to bio_diffex/4. This predicate also pinches the defaults from there.
Examples, fixme: use iris dataset.
?- bio_volcano_plot([]).
- author
- - nicos angelopoulos
- version
- - 0.1 2022/12/15
- See also
- - bio_diffex/4
- - os_dir_stem_ext/2
- To be done
- - add args for influencing of the gg term and the output-to-file call
- exp_gene_family_string_graph(+Exp, +Family, -Graph, +Opts)
- exp_gene_family_string_graph(+Exp, +Family, ?DEPrs, ?NonDEPrs, -Graph, +Opts)
- Generate and possibly plot the STRING graph of a known gene family as affected in a biological Exp_eriment.
Each family gene is placed of the following states according to wether it was identified in Exp and
how it was modified (in relation to background conditions):
- Red = upregulated
- Green = downregulated
- Orange = either both directions or unknown direction
- Grey = not identified
Note that de-reguation trumps identification, that is, currently there is no distiction
between genes that seen to be both significantly de-regulated and identified versus
just those that are simply significantly de-regulated.
In the exp_gene_family_string_graph/6 version DEPrs and NonDEPrs can be provided, which optimises use in
loops over families (see exp_go_over_string_graphs/4).
Opts
- exp_ev_log(EvLog=true)
- are the expression values (ev) log values ? (also passed to bio_diffex/4)
- faint_factor(Fctr=1.2)
- faint factor for non-significant nodes
- include_non_present(IncP=true)
- false excludes non-indentified family genes
- include_non_significant(IncS=true)
- false excluded non-significant family genes
- node_colours(Clrs=clrs(red,green,yellow,khaki4,grey))
- colours for the different types of nodes
- org(Org=hs)
- organism in which Family is looked in (and experiment was performed)
- org_exp_id(ExpID)
- the type of the experimental gene ids for the organism. default depends on Org, but currently all map to symb
- plot(Plot=true)
- whether to plot the graph via wgraph_plot/2
- wgraph_plot_opts(WgOpts=[])
- options for wgraph_plot/2 (in preference to any defaults from Self)
- wgraph_plot_defs(WgDefs=[])
- options for wgraph_plot/2 (added to the end, after any defaults from Self)
These Options are passed to a number of other pack predicates: bio_diffex/4, bio_symbols/3, symbols_string_graph/3 and, selectively, wgraph_plot/2.
See [pack('bio_analytics/examples/bt.pl')].
?- lib(real), lib(mtx),
absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
Ppts = [ vjust= -1, node_size(3), mode="fruchtermanreingold", format(svg), stem(bt)],
Opts = [ exp_ev_cut_let(inf), exp_ev_cut_get(-inf),
include_non_present(false), include_non_significant(false),
minw(200), wgraph_plot_opts(Ppts) ],
exp_gene_family_string_graph( CsvF, autophagy, G, Opts ).
Produces file: bt.svg
[[../doc/images/bt.svg]]
- author
- - nicos angelopoulos
- version
- - 0.1 2019/4/15
- See also
- - bio_diffex/4, bio_symbols/3, symbols_string_graph/3
- - gene_family/3
- - wgraph_plot/2
- exp_go_over(+CsvF, -GoOver, +Opts)
- Perform gene ontology over-representation analysis.
For experimental data in CsvF select de-regulated genes and on
those perform over representation analysis in gene ontology.
Results in GoOver are either as a values list or a csv file, the name of which is
assumed to be the ground value of GoOVer.
Opts
- go(GoSec=BP)
- gene ontology section, in:
[BP,MF,CC]
- go_frame(GoFrame=bioc_ann_dbi)
- which go frame of GoTerms, Evidence and Gid to use.
Default, bioc_ann_dbi, uses r_lib(org.<Org>.eg.db) frames, alternative bio_db uses bio_db tables,
before the intro of the option this was the way, so use this for backward compatibility
- go_over_pv_cut(PvCut=0.05)
- p value filter for the results
- org(Org=hs)
- one of bio_db_organism/2 first argument values
- gid(Gid)
- the type of the experimental gene ids for the organism. The default depends on Org, most take symb, apart for pig which takes ensg
(Caution, this used to be
org_exp_id(OrgExpId)
.)
- gid_to(GidTo)
- returns the db type for gene ids used in underlying call, mostly ncbi, apart for mouse where it is mgim
- stem(Stem=false)
- stem for output csv file. When false, use basename of CsvF.
- to_file(ToF=false)
- when GoOver is unbound, this controls whether the output goes to a file or a values list
- universe(Univ=go_exp)
- Univ in :
[genome,go,go_exp,experiment]
- genome(Gen)
- all gene identifiers in relevant map predicates
- go(Go)
- all gene identifiers appearing in any ontology
- go_exp(GoExp)
- gene ontology genes that appear in experiment
- experiment(Exp)
- all identifiers in the experiment
- ontology(Onto)
- not implemented yet. like Go above, but only include this Ontology branch
Options are also passed to bio_diffex/4.
?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
exp_go_over( CsvF, GoOver, [] ).
CsvF = '.../swipl/pack/bio_analytics/data/silac/bt.csv',
GoOver = [row('GOBPID', 'Pvalue', adj.pvalue, ...), row('GO:0061387', 0.000187, ...)|...].
?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
exp_go_over( CsvF, GoOver, [stem(here),to_file(true)] ).
CsvF = '.../swipl/pack/bio_analytics/data/silac/bt.csv',
GoOver = here_gontBP_p0.05_univExp.csv.
?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
exp_go_over( CsvF, 'a_file.csv', [stem(here),to_file(true)] ).
CsvF = '.../swipl/pack/bio_analytics/data/silac/bt.csv',
?- lib(by_unix).
?- @wc(-l,'a_file.csv').
183 a_file.csv
?- @wc(-l,'here_gontBP_p0.05_univExp.csv').
183 here_gontBP_p0.05_univExp.csv
true.
?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
exp_go_over( CsvF, OverF, [to_file(true)] ).
CsvF = '.../swipl/pack/bio_analytics/data/silac/bt.csv',
OverF = go_over_gontBP_p0.05_univExp.csv.
?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
exp_go_over( CsvF, OverF, [to_file(true),stem(false)] ).
CsvF = '.../swipl/pack/bio_analytics/data/silac/bt.csv',
OverF = '.../swipl/pack/bio_analytics/data/silac/bt_gontBP_p0.05_univExp.csv'.
- author
- - nicos angelopoulos
- version
- - 0.1 2019/5/2
- - 0.2 2022/12/20,
Univ=go
and Org=gallus
- - 0.3 2023/6/5, option
go_frame(GoFrame)
- See also
- - go_over_universe/6
- exp_go_over_string_graphs(+Exp, ?GoOver, ?Dir, -Opts)
- Create string graphs for all the over-represented terms in GoOver.
When GoOver is a variable exp_go_over/3 is called to generate it.
Opts
- dir_postfix(Psfx=go_strings)
- postfix for outputs directory (when Dir is a variable)
- go_id_clm(GoIdClm=1)
- column id for over represented GO terms
- ov_max(MaxOvs=false)
- when a number, it is taken as the maximal integer of terms to plot graphs for
- stem_type(Sty=go_pair_ord)
- similar to go_string_graph/3, but different default, others:
go_name, go_id
.
Here the length of GO terms (Len) is added to form go_pair_ord(I,Len)
for forwarding
- viz_de_opts(VizOpts=[])
- options for restricting genes to visualise via bio_diffex/4.
Default does not restrict what genes are visualised.
diffex_only = VizOpts
restricts genes to significants only.
- wgraph_plot_opts(WgOpts=WgOpts)
- defaults to
[vjust = -1, node_size(3), format(svg)]
.
Options are passed to exp_gene_family_string_graph/4.
?- debug(real).
?- debug(exp_go_over_string_graphs).
?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), Exp ),
exp_go_over_string_graphs( Exp, Gov, Dir, [] ).
% Sending to R: pltv <- ggnet2(lp_adj,vjexp_go_over_mtxust = -1,size = 3,
... label = pl_v_1,color = pl_v_2,edge.size = pl_v_3,edge.color = "#BEAED4")
- author
- - nicos angelopoulos
- version
- - 0.1 2019/5/5
- - 0.2 2020/9/6, option
ov_max()
- See also
- - exp_go_over/3
- - exp_gene_family_string_graph/4
- exp_go_over_string_graphs_multi(+Opts)
- Run multiple exp_go_over_string_graphs/4 runs.
Can ran across a number of values for parameters and number of datasets.
A directory is created for each combination of input parameters.
Subdirectories are created for each input dataset.
Can be used with single parameter values as well, in order to add that
parameter to the output directory's name.
Opts
- data(Data=data)
- which datasets to use, can be a directory
- data_ext(DtExt=csv)
- extension selector for when Data is a directory
- dated(Dated=true)
- adds date (ate_two_digit_dotted/1) stamp on output directories
- dfx_opts(DfxOpts)
- invariant options for exp_go_over_string_graphs/4
- dfx_viz_opts(VizOpts)
- invariant options for
viz_de_opts()
of exp_go_over_string_graphs/4
- keep_mtx_sel(KpSel=true)
- record, to file, the mtx of values significant values selected
- keep_mtx_viz(KpViz=true)
- record, to file, the mtx of values added for vizualisation (non significants)
- mtx_opts(MtxOpts=[])
- options to pass when reading the input matrices
- multi(Keys, Vals)
- parameters and associated possible values to pass to exp_go_over_string_graphs/4 (Vals, can be a single value).
If multiple Keys are given, then they synchronise- always taking the
- viz_diffex_only_constraint(DoCon=true)
- enforce that when
de_max()
and exp_pv_cut()
are the same, use viz_de_opts(diffex_only)
- viz_pv_constraint(PvCon=true)
- enforce that PvCut for non-significants must be equal or larger than that for significants
multi()
options are processed in the order they are given, with later ones varied first in the runs.
?- Opts = [],
exp_go_over_string_graphs_multi( Opts ).
- author
- - nicos angelopoulos
- version
- - 0:1 2020/09/17
- To be done
- - shorten to exp_multi/1 ?
- exp_reac_over(+Etx, ?ReOver, +Opts)
- Perform reactome pathway over-representation analysis.
Etx is the matrix of experimental values (should pass as 1st arg in mtx/2).
Opts
- debug(Dbg=false)
- informational, progress messages
- org(Org=hs)
- organism id for experimental data; bio_db_organism/2 first argument values
- gid(Gid)
- the gene id db token identifier for the genes in Etx. Default depends on Org.
- gid_to(Gto)
- returns the gene id db token identifier for interrogating the reactome db
(currently returns ncbi)
- mtx_cutoff(Cnm=_487364,Val=_487370,Dir=false)
- filter the output matrix (see mtx_column_threshold/3)
- pways(Pways=univ)
- pathways to consider, the value only affects the corrected p.values as the longer the
list of pathways the stronger the correction. In order of tightness:
- dx(Dx)
- pathways that contain at least one Gid picked by bio_diffex/4
- exp(Exp)
- pathways that contain at least one Gid in Etx
- univ(PwUniv)
- pathways that contain at least one Gid in Univ (below option
universe()
)
- reac(Reac)
- all reactome pathways for Org
- universe(Univ=experiment)
- the genes universe, or background for genes in the statistical test (also:
reac(tome)
)
Examples
?- exp_reac_over([]).
?- exp_reac_over([mtx_cutoff(0.05,'adj.pvalue',<)]).
- author
- - nicos angelopoulos
- version
- - 0.1 2024/03/16
- See also
- - bio_diffex/4
- - mtx/2
- - bio_db_organism/2
- - org_gid_map/3
- To be done
- - add oddsRatios to make the output equivalent to output from exp_go_over/3.
- gene_family(+Family, -Symbols, +Opts)
- If Family is a known family alias it is expanded to a list of its constituent gene symbols.
Family can be a gene ontology term (atom of the form GO:XXXXXX).
If family is a list of numbers or atoms that map to numbers, then they are taken to be Entrez ids which
are converted to gene symbols in Symbols.
If family is a list of symbols, it is passed on to Symbols.
Listens to debug(gene_family)
.
Known families:
- autophagy/human
- from: http://autophagy.lu/clustering/index.html
Opts (v0.2)
- org(Org=hs)
- organism for the symbols
Opts are also passed to: go_symbols_reach/3.
Located family as the bio_analytics gene collection: autophagy
?- gene_family(autophagy, Auto, org(human)), length(Auto, Len).
Auto = ['AMBRA1', 'APOL1', 'ARNT', 'ARSA', 'ARSB', 'ATF4', 'ATF6', 'ATG10', 'ATG12'|...],
Len = 232.
?- debug(gene_family).
?- gene_family(375, Symbs, []), length(Symbs, Len).
% Located GO term as the family identifier.
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
Len = 26.
?- gene_family('GO:0000375', Symbs, []), length(Symbs, Len).
% Located GO term as the family identifier.
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
Len = 26.
?- gene_family([55626, 8542, 405], Auto, []).
Converted input from Entrezes to Symbols.
Auto = ['AMBRA1', 'APOL1', 'ARNT'].
?- gene_family(unknown, Auto, org(hs)).
ERROR: Unhandled exception: gene_family(cannot_find_input_family_in_the_known_ones(unknown,[autophagy]))
Family datasets are in pack(bio_analytics/data/families)
.
- author
- - nicos angelopoulos
- version
- - 0:1 2019/3/5, from old code
- - 0:2 2023/6/7, added options
- See also
- - go_symbols_reach/4
- To be done
- - error reporting via print_message/2
- go_org_symbol(+Org, +Go, -Symb)
- Get symbols of Go id for Organism.
Go can be either a GO: atom or an integer.
[debug] ?- go_org_symbol( mouse, 2, Symbs ), write(Symbs), nl, fail.
Akt3
Mef2a
Mgme1
Mpv17
Mrpl15
Mrpl17
Mrpl39
Msto1
Opa1
Slc25a33
Slc25a36
Tymp
false.
[debug] ?- go_org_symbol( hs, 'GO:0000002', Symbs ), write(Symbs), nl, fail.
AKT3
LONP1
MEF2A
MGME1
MPV17
MSTO1
OPA1
PIF1
SLC25A33
SLC25A36
SLC25A4
TYMP
false.
- author
- - nicos angelopoulos
- version
- - 0:1 2019/04/07
- - 0:2 2023/06/09, added pig, moved to
bio_org.pl
- go_org_symbols(+GoT, +Org, -Symbols)
- Gets the symbols belonging to a GO term.
?- go_org_symbols( 'GO:0000375', human, Symbs ), length( Symbs, Len ).
Symbs = ['SCAF11', 'SLU7', 'SRSF10', 'WDR83', 'DBR1', 'MPHOSPH10', 'PRPF4', 'PRPF6', 'SF3B1'|...],
Len = 25.
?- go_org_symbols( 'GO:0000375', mouse, Symbs ), length( Symbs, Len ).
Symbs = ['Rnu4atac', 'Rnu6atac', 'Srrm1', 'Sf3a2', 'Srsf10', 'Dbr1', 'Scaf11', 'Slu7', 'Srsf10'|...],
Len = 11.
?- go_org_symbols( 375, human, Symbs ), length( Symbs, Len ).
Symbs = ['SCAF11', 'SLU7', 'SRSF10', 'WDR83', 'DBR1', 'MPHOSPH10', 'PRPF4', 'PRPF6', 'SF3B1'|...],
Len = 25.
?- go_org_symbols( 375, chicken, Symbs ), length( Symbs, Len ).
Symbs = ['DBR1'],
Len = 1.
?- go_org_symbols( 375, pig, Symbs ), length( Symbs, Len ).
Symbs = [],
Len = 0.
- author
- - nicos angelopoulos
- version
- - 0.1 2019/4/7
- See also
- - go_org_symb/3, just a findall of
- - go_symbols_reach/3 for a version that travells the ontology
- go_over_plot(+GovF, +Opts)
- Plots a lollipop plot for the GO over represented (csv) file or matrix (mtx/2), GovF.
By default when GovF is a file, an .svg is generated at the same name but postfixed by top_TopN (see options for TopN value).
Opts
- debug(Dbg=false)
- informational, progress messages
- go(GoSec)
- gene ontology section, in:
[BP,MF,CC]
, used for the label. Default is looked for in the filename.
- pfx_len(PfxLen=50)
- how many initial characters of the GO term name should be included ?
- pos_go_term(GoTPos=8)
- argument/column position for the GO term name
- pos_p_val(PvalPos=2)
- argument/column position for the p value
- stem(Stem)
- top_n(TopN=30)
- how many of the top terms to include
?- go_over_plot( 'res_resist-24.06.05/genes_15kb_resistant_enh_bp_go_over.csv', [] ).
- author
- - nicos angelopoulos
- version
- - 0:1 2024/06/05
- See also
- - b_real: gg_lollipop/2
- To be done
- - allow prepending order such as 01- as titles have to be unique
- - allow adding the Count/Size on tick label names
- go_string_graph(+GoTerm, -Graph, +Opts)
- Plot the STRING graph of all the Symbols in a GO term.
The Graph can be saved via options to wgraph_plot/2.
Opts are passed to symbols_string_graph/3 and wgraph_plot/2.
Opts
- org(Org=hs)
- organism for the operation
- plot(Plot=false)
- whether to plot the graph
- save(Save=false)
- whether to save the graph (passed to wgraph_plot/2
- stem_type(Sty=go_name)
- constructs stem for the output filenames:
go_id
, go_pair
, go_pair_ord(I,Len)
.
?- go_string_graph( 'GO:0016601', G, true ).
?- go_string_graph( 'GO:0016601', G, plot(true) ).
?- go_string_graph( 'GO:0016601', G, [plot(true),save(true)] ).
G = ['ABI2', 'AIF1', 'CDH13', 'HACD3', 'NISCH', 'ALS2'-'RAC1':892, 'BRK1'-'CYFIP1':999, ... - ... : 904, ... : ...|...].
?- ls.
% Rac_protein_signal_transduction.csv Rac_protein_signal_transduction_graph.csv Rac_protein_signal_transduction_layout.csv
true.
?- go_string_graph( 'GO:0016601', G, [plot(true),save(true),stem_type(go_pair)] ).
- author
- - nicos angelopoulos
- version
- - 0.1 2016/4/12
- - 0.2 2020/9/3, expanded go_string_graph_stem/3
- To be done
- - make sure underlying options are compatible.
- go_symbols_reach(+GoT, -Symbols, +Opts)
- Gets the symbols belonging to a GO term. Descents to GO child relations,
which by default are includes (reverse of is_a) and consists_of (reverse of part_of)
to pick up Symbols recursively.
Opts
- org(Org=hs)
- should be recognised by 1st arg of bio_db_organism/2.
- go_frame(GO=bioc_ann_dbi)
- which implementation of GO to follow (v0.3 this changes default behaviour).
Some of the options below are not functional with
GO=bioc_ann_dbi
.
Alterantive: =\bio_db\= (which used to be the default, before the option was introduced).
- descent(Desc=true)
- whether to collect symbols from descendant GO terms
- as_child_includes(Inc=true)
- collect from edge_gont_include/2
- as_child_consists_of(Cns=true)
- collect from edge_gont_consists_of/2
- as_child_regulates(Reg=false)
- collect from edge_gont_regulates/2
- as_child_negatively_regulates(Reg=false)
- collect from edge_gont_negatively_regulates/2
- as_child_positively_regulates(Reg=false)
- collect from edge_gont_positively_regulates/2
- debug(Dbg=false)
- see options_append/3
Listens to debug(go_symbols_reach)
.
?- go_symbols_reach( 'GO:0000375', Symbs, [] ), length( Symbs, Len ).
Symbs = ['AAR2', 'ALYREF', 'AQR', 'ARC', 'BCAS2', 'BUD13', 'BUD31', 'CACTIN', 'CASC3'|...],
Len = 293.
?- go_symbols_reach( 'GO:0000375', Symbs, org(mouse) ), length( Symbs, Len ).
Symbs = ['4930595M18Rik', 'Aar2', 'Aqr', 'Bud13', 'Bud31', 'Casc3', 'Cdc40', 'Cdc5l', 'Cdk13'|...].
Len = 190.
?- go_symbols_reach( 375, Symbs, true ), length( Symbs, Len ).
Symbs = ['AAR2', 'ALYREF', 'AQR', 'ARC', 'BCAS2', 'BUD13', 'BUD31', 'CACTIN', 'CASC3'|...],
Len = 293.
- author
- - nicos angelopoulos
- version
- - 0.1 2015/7/26
- - 0.2 2019/4/7, added org, moved to new pack
- - 0.3 2023/6/7, added option
go(GoImpl)
, which changes default behaviour in comparison to past
- To be done
- - re-establish the bio_db interface- it is not far off, and compare to bioc_ann_dbi, bio_db likely to be faster
- symbols_string_graph(+Symbols, -Graph, +Opts)
- Create the string database Graph between Symbols.
Opts
- cohese(Coh=max)
- method for cohesing multiple edges between two nodes
[false,max,min,umax,umin]
the max
and min
versions are more efficient but do sort the edges,
whereas umax
and umin
take much longer but leave edges in found order
- include_orphans(Orph=true)
- set to false to exclude orphans from Graph
- org(Org=human)
- which organism do the gene symbols come from, via bio_db_organism/2
- minw(500)
- minimum weight (0 =< x =< 999) - not checked
- sort_pairs(Spairs=true)
- set to false to leave order of edges dependant on order of Symbols
- sort_graph(Sort=true)
- set to false for not sorting the results
?- Got = 'GO:0043552', gene_family( Got, Symbs, true ), length( Symbs, SymbsLen ),
symbols_string_graph(Symbs, Graph, true ), length( Graph, GraphLen ).
Got = 'GO:0043552',
Symbs = ['AGAP2', 'AMBRA1', 'ATG14', 'CCL19', 'CCL21', 'CCR7', 'CD19', 'CDC42', 'EPHA8'|...],
SymbsLen = 32,
Graph = ['AGAP2', 'CDC42', 'EPHA8', 'NOD2', 'P2RY12', 'TNFAIP8L3', 'AMBRA1'-'ATG14':995, ... - ... : 994, ... : ...|...],
GraphLen = 94.
Got = 'GO:0043552',
Symbs = ['AMBRA1', 'ATG14', 'CCL19', 'CCL21', 'CCR7'|...],
SymbsLen = 31,
Graph = ['TNFAIP8L3', 'AMBRA1'-'ATG14':981, 'AMBRA1'-'PIK3R4':974|...],
GraphLen = 235.
% the following is flawed as it uses Got from mouse and tries to build the graph in human STRING...
?- Got = 'GO:0043552', gene_family( Got, mouse, Symbs ), length( Symbs, SymbsLen ),
symbols_string_graph(Symbs, Graph, true ), length( Graph, GraphLen ).
Got = 'GO:0043552',
Symbs = Graph, Graph = ['Ambra1', 'Atg14', 'Ccl19', 'Cd19', 'Cdc42', 'Epha8', 'Fgf2', 'Fgfr3', 'Fgr'|...],
SymbsLen = GraphLen, GraphLen = 28.
% this the correct way for running the first query for mouse:
?- Got = 'GO:0043552', gene_family( Got, mouse, Symbs ), length( Symbs, SymbsLen ),
symbols_string_graph(Symbs, Graph, org(mouse) ), length( Graph, GraphLen ).
Got = 'GO:0043552',
Symbs = ['Ambra1', 'Atg14', 'Ccl19', 'Cd19', 'Cdc42', 'Epha8'|...],
SymbsLen = 28,
Graph = ['Ambra1', 'Cdc42', 'Lyn', 'Nod2', 'Pdgfra', 'Sh3glb1', 'Tnfaip8l3', 'Vav3', ... : ...|...],
GraphLen = 117.
- author
- - nicos angelopoulos
- version
- - 0.1 2016/1/18
- - 0.2 2019/4/8, added organism to incorporate mouse
- - 0.3 2020/9/5, option
cohese()
, and debugs
- To be done
- - implement
cohese()
values: min, umax and umin