API documentation

SetLibrary

The data structure for a microbe-set library.

class msea.SetLibrary(d_gmt=None, rank_means=None, rank_stds=None)

SetLibrary is a convenience class for a reference microbe-set library.

enrich(input_set, adjust=False, universe=1000)

Perform MSEA given an input_set.

Parameters:

input_set – a set of microbes as the input for MSEA analysis against this reference set
adjust – if True, adjust for the expected distributions of ranks
universe – number of microbes used as the universe size for Fisher’s exact test

Returns:

returns a pandas.DataFrame object for the MSEA result table

get_empirical_ranks(n=1000, universe=1200, fix_size=None)

Calculate the empirical ranks for each reference set.

Parameters:

n – number of permutations
universe – number of microbes used as the universe size for Fisher’s exact test
fix_size – if None, uses variable sizes when generating random sets; if int, uses fixed size random sets to evaluate the null distribution of the ranks

Returns:

returns nothing

classmethod load(gmt_file=None, rank_means_file=None, rank_stds_file=None)

Load a reference set into a SetLibrary instance from files.

Parameters:

gmt_file – a file or URL of a file for the reference set in GMT format
rank_means_file – a .npy file for the array of mean ranks
rank_stds_file – a .npy file for the array of standard deviations of ranks

Returns:

returns a SetLibrary object

save(dirname)

Save the SetLibrary instance in a directory, optionally with computed rank_means and rank_stds.

Parameters:	dirname – name of the directory in which the object will be stored
Returns:	returns nothing
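The methods above can be combined into a workflow along the lines of the following sketch. The method names and parameters come from the signatures documented here; the helper name `build_and_query` and the paths in the usage comment are hypothetical. It also assumes that `get_empirical_ranks` stores its results on the instance, since it is documented as returning nothing. `msea` is imported inside the function so that defining the helper does not require the package.

```python
def build_and_query(gmt_path, out_dir, microbes):
    """Load a library, estimate null ranks, save it, and run an enrichment."""
    # Imported lazily: calling this helper requires msea to be installed.
    from msea import SetLibrary

    # Load the reference microbe-set library from a GMT file or URL.
    library = SetLibrary.load(gmt_path)

    # Estimate the null distribution of ranks (documented defaults shown).
    library.get_empirical_ranks(n=1000, universe=1200, fix_size=None)

    # Save the library to a directory for later reuse.
    library.save(out_dir)

    # Run MSEA with rank adjustment; documented to return a pandas.DataFrame.
    return library.enrich(set(microbes), adjust=True, universe=1000)


# Hypothetical usage (paths and microbe names are placeholders):
# result = build_and_query("reference_sets.gmt", "my_library",
#                          ["Escherichia coli", "Bacteroides fragilis"])
```

Note that the documented defaults for `universe` differ between `enrich` (1000) and `get_empirical_ranks` (1200); passing the same value to both is probably what most analyses want.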

Utilities for performing microbe-set enrichment analysis.

msea.utils.enrich(microbes, d_gmt, rank_means=None, rank_stds=None, universe=1000)

Perform enrichment analysis for a set of microbes against a microbe-set library using Fisher’s exact test and z-score.

Parameters:

microbes – a set of microbes as the input for MSEA analysis against this reference set
d_gmt – a dictionary of microbe-sets representing the reference microbe-set library
rank_means – (optional) the array of mean ranks from null distribution
rank_stds – (optional) the array of standard deviations of ranks from null distribution
universe – number of microbes used as the universe size for Fisher’s exact test

Returns:

returns a pandas.DataFrame object for the MSEA result table
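To illustrate how `rank_means` and `rank_stds` could feed into a z-score, here is a minimal sketch of one plausible form: standardizing each term's observed rank against its null mean and standard deviation. The function name `rank_z_scores` is hypothetical, and the exact formula msea uses is an assumption, not something this page states.

```python
def rank_z_scores(observed_ranks, rank_means, rank_stds):
    """Standardize observed ranks against a null distribution of ranks."""
    z_scores = []
    for rank, mean, std in zip(observed_ranks, rank_means, rank_stds):
        # Guard against a degenerate null with zero spread.
        z_scores.append((rank - mean) / std if std > 0 else 0.0)
    return z_scores


# A term ranked 1st that usually ranks around 5th under the null
# gets a negative z-score (it ranks better than expected by chance).
z = rank_z_scores([1, 4, 6], [5.0, 4.0, 3.0], [2.0, 1.0, 1.5])
print(z)
```

Under this convention a negative z-score means the term ranks higher than random input sets would typically place it.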

msea.utils.fisher_test(s1, s2, universe)

Perform Fisher’s exact test for two sets.

Parameters:

s1 – a set of items
s2 – a set of items
universe – int, universe size

Returns:

returns the odds ratio and p-value
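For intuition about how two sets and a universe size map onto Fisher's exact test, the following standard-library sketch builds the usual 2x2 contingency table and computes a one-sided (over-representation) p-value from the hypergeometric distribution. The name `fisher_overlap_test` is hypothetical, and the choice of a one-sided test and of this table layout are assumptions; this page does not say which alternative `fisher_test` uses.

```python
from math import comb


def fisher_overlap_test(s1, s2, universe):
    """One-sided Fisher's exact test for the overlap of two sets."""
    a = len(s1 & s2)          # items in both sets
    b = len(s1) - a           # items only in s1
    c = len(s2) - a           # items only in s2
    d = universe - a - b - c  # items in neither set
    if d < 0:
        raise ValueError("universe is smaller than the union of the sets")

    # Sample odds ratio of the table [[a, b], [c, d]].
    if b * c > 0:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d > 0 else float("nan")

    # Hypergeometric upper tail: probability of an overlap of at least `a`
    # when len(s2) items are drawn from a universe holding len(s1) hits.
    n1, n2 = len(s1), len(s2)
    tail = sum(
        comb(n1, k) * comb(universe - n1, n2 - k)
        for k in range(a, min(n1, n2) + 1)
    )
    return odds_ratio, tail / comb(universe, n2)


odds, p = fisher_overlap_test({"a", "b", "c", "d"}, {"c", "d", "e"}, universe=10)
print(odds, p)
```

In this example the table is [[2, 2], [1, 5]], so the odds ratio is 5.0 and the tail probability is 40/120.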

msea.utils.get_empirical_ranks(d_gmt, n=1000, universe=1200, fix_size=None)

Generate random microbe sets to get empirical ranks for each term.

Parameters:

d_gmt – a dictionary of microbe-sets representing the reference microbe-set library
n – number of permutations
universe – number of microbes used as the universe size for Fisher’s exact test
fix_size – if None, uses variable sizes when generating random sets; if int, uses fixed size random sets to evaluate the null distribution of the ranks

Returns:

returns the means and standard deviations of the null ranks
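The idea of a null distribution of ranks can be sketched with the standard library alone. The example below repeatedly draws a random input set, ranks every term against it, and summarizes each term's ranks by mean and standard deviation. It is a simplified illustration, not msea's implementation: the name `empirical_rank_stats` is hypothetical, items are integer indices into the universe, input sets have a fixed size, and terms are scored by overlap size rather than by a Fisher p-value.

```python
import random
from statistics import mean, pstdev


def empirical_rank_stats(d_gmt, n=200, universe=50, set_size=5, seed=0):
    """Estimate the mean and std of each term's rank under random inputs."""
    rng = random.Random(seed)
    terms = list(d_gmt)
    ranks = {term: [] for term in terms}

    for _ in range(n):
        # Draw a random "input" set of item indices from the universe.
        random_set = set(rng.sample(range(universe), set_size))
        # Shuffle first so the stable sort breaks ties at random, then
        # rank terms by overlap size (largest overlap gets rank 1).
        order = terms[:]
        rng.shuffle(order)
        order.sort(key=lambda t: len(d_gmt[t] & random_set), reverse=True)
        for rank, term in enumerate(order, start=1):
            ranks[term].append(rank)

    means = [mean(ranks[t]) for t in terms]
    stds = [pstdev(ranks[t]) for t in terms]
    return means, stds


toy_library = {
    "big": set(range(0, 30)),
    "mid": set(range(10, 20)),
    "small": {40, 41},
}
null_means, null_stds = empirical_rank_stats(toy_library)
print(null_means, null_stds)
```

Larger sets overlap random inputs more often, so they tend to receive better ranks by chance alone; this is the bias that rank means and standard deviations are meant to capture.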

msea.utils.multipletests_fdr_bh(pvals, is_sorted=False)

FDR Benjamini-Hochberg correction for p-values adapted from statsmodels.

Parameters:

pvals – an array of nominal p-values
is_sorted – bool, whether the p-values are sorted

Returns:

returns an array of corrected p-values aka FDRs/q-values
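The Benjamini-Hochberg procedure itself is short enough to sketch in plain Python. The version below is a generic illustration with a hypothetical name (`fdr_bh`), not the statsmodels-derived code in msea: each p-value is scaled by m / rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone.

```python
def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for position in range(m - 1, -1, -1):
        i = order[position]
        candidate = pvals[i] * m / (position + 1)
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


qvals = fdr_bh([0.01, 0.04, 0.03, 0.005])
print(qvals)
```

For these four p-values the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02, in the original input order.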

msea.utils.read_gmt(file_or_url)

Read a GMT file into a dictionary of sets.

Parameters:	file_or_url – a GMT file or URL of a GMT file
Returns:	a dictionary of sets
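As a reminder of what a GMT file looks like, here is a minimal parsing sketch. It assumes the standard GMT layout (tab-separated: set name, description, then members); the name `parse_gmt_lines` is hypothetical, and whether `read_gmt` treats the description column the same way is not stated on this page.

```python
def parse_gmt_lines(lines):
    """Parse GMT-formatted lines into a dictionary of sets."""
    d_gmt = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        term = fields[0]
        # fields[1] is the description column; members start at index 2.
        d_gmt[term] = {item for item in fields[2:] if item}
    return d_gmt


example = [
    "gut_set\tsome description\tBacteroides\tPrevotella\n",
    "oral_set\t\tStreptococcus\tVeillonella\tPrevotella\n",
]
parsed = parse_gmt_lines(example)
print(parsed)
```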

msea.utils.write_gmt(d_gmt, filename)

Write a dictionary of sets to a GMT file.

Parameters:

d_gmt – a dictionary of sets
filename – filename for the GMT file

Returns:

returns nothing
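Writing a dictionary of sets in GMT layout can be sketched as follows. This is an illustration under assumptions rather than msea's code: the name `write_gmt_file` is hypothetical, the description column is left empty, and members are sorted only to make the output deterministic.

```python
import os
import tempfile


def write_gmt_file(d_gmt, filename):
    """Write a dictionary of sets as tab-separated GMT lines."""
    with open(filename, "w", encoding="utf-8") as handle:
        for term, members in d_gmt.items():
            # Empty description column; members sorted for stable output.
            handle.write("\t".join([term, ""] + sorted(members)) + "\n")


library = {"gut_set": {"Prevotella", "Bacteroides"}, "oral_set": {"Streptococcus"}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "library.gmt")
    write_gmt_file(library, path)
    with open(path, encoding="utf-8") as handle:
        written = handle.read()
print(written)
```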