API documentation

SetLibrary

The data structure for a microbe-set library.

class msea.SetLibrary(d_gmt=None, rank_means=None, rank_stds=None)

SetLibrary is a convenience class for a reference microbe-set library.

enrich(input_set, adjust=False, universe=1000)

Perform MSEA given an input_set.

Parameters:
  • input_set – a set of microbes as the input for MSEA analysis against this reference set
  • adjust – if True, adjust for the expected distributions of ranks
  • universe – number of microbes used as the universe size for Fisher’s exact test
Returns:

returns a pandas.DataFrame object for the MSEA result table

get_empirical_ranks(n=1000, universe=1200, fix_size=None)

Calculate the empirical ranks for each reference set.

Parameters:
  • n – number of permutations
  • universe – number of microbes used as the universe size for Fisher’s exact test
  • fix_size – if None, uses variable sizes when generating random sets; if int, uses fixed size random sets to evaluate the null distribution of the ranks
Returns:

returns nothing

classmethod load(gmt_file=None, rank_means_file=None, rank_stds_file=None)

Load a reference set into a SetLibrary instance from files.

Parameters:
  • gmt_file – a file or url of a file for the reference set in GMT format
  • rank_means_file – a .npy file for the array of mean ranks
  • rank_stds_file – a .npy file for the array of std ranks
Returns:

returns a SetLibrary object

save(dirname)

Save the SetLibrary instance in a directory, optionally with computed rank_means and rank_stds.

Parameters: dirname – name of the directory in which the object will be stored
Returns: returns nothing

Utility functions

Utils for performing microbe-set enrichment analysis.

msea.utils.enrich(microbes, d_gmt, rank_means=None, rank_stds=None, universe=1000)

Perform enrichment analysis for a set of microbes against a microbe-set library using Fisher’s exact test and z-score.

Parameters:
  • microbes – a set of microbes as the input for MSEA analysis against this reference set
  • d_gmt – a dictionary of microbe-sets representing the reference microbe-set library
  • rank_means – (optional) the array of mean ranks from null distribution
  • rank_stds – (optional) the array of standard deviations of ranks from null distribution
  • universe – number of microbes used as the universe size for Fisher’s exact test
Returns:

returns a pandas.DataFrame object for the MSEA result table
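The description above does not spell out how the z-score is derived from rank_means and rank_stds. One plausible reading, shown here purely as an assumption, is that each term's observed rank is standardised against its null mean and standard deviation. The function name rank_z_scores and the dictionary-based inputs are hypothetical and are not part of msea.

```python
def rank_z_scores(pvalues, rank_means, rank_stds):
    """Standardise observed ranks against a null distribution of ranks.

    Assumption: rank 1 is the smallest p-value, and all three inputs are
    dictionaries keyed by term name (the library itself uses arrays).
    """
    # Order terms from most to least significant; ties keep insertion order.
    ordered = sorted(pvalues, key=pvalues.get)
    observed = {term: rank for rank, term in enumerate(ordered, start=1)}
    # z < 0 means the term ranked better (closer to 1) than expected by chance.
    return {
        term: (observed[term] - rank_means[term]) / rank_stds[term]
        for term in ordered
    }


z = rank_z_scores(
    pvalues={"term_a": 0.01, "term_b": 0.20, "term_c": 0.05},
    rank_means={"term_a": 2.0, "term_b": 2.0, "term_c": 2.0},
    rank_stds={"term_a": 0.5, "term_b": 1.0, "term_c": 1.0},
)
```

Under this reading, a strongly negative z-score flags a term that ranks higher than random input sets would typically place it.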

msea.utils.fisher_test(s1, s2, universe)

Perform Fisher’s exact test for two sets.

Parameters:
  • s1 – a set of items
  • s2 – a set of items
  • universe – int, universe size
Returns:

returns the odds ratio and p-value
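To make the role of the universe argument concrete, the following is a minimal standard-library sketch of Fisher's exact test on two sets. It is not the msea implementation: the name fisher_test_sketch is hypothetical, and the choice of a one-sided test for over-representation is an assumption, since the alternative hypothesis is not stated above.

```python
from math import comb, inf


def fisher_test_sketch(s1, s2, universe):
    """Return (odds_ratio, p_value) for the overlap between two sets.

    Assumption: one-sided test for over-representation of the overlap.
    """
    a = len(s1 & s2)          # items in both sets
    b = len(s1) - a           # items only in s1
    c = len(s2) - a           # items only in s2
    d = universe - a - b - c  # items in neither set
    if d < 0:
        raise ValueError("universe is smaller than the union of the two sets")

    # Sample odds ratio of the 2x2 table [[a, b], [c, d]]; inf if b * c == 0.
    odds_ratio = (a * d) / (b * c) if b * c else inf

    # Hypergeometric upper tail: probability of an overlap of at least a.
    n1, n2 = len(s1), len(s2)
    tail = sum(
        comb(n2, k) * comb(universe - n2, n1 - k)
        for k in range(a, min(n1, n2) + 1)
    )
    return odds_ratio, tail / comb(universe, n1)


odds_ratio, p_value = fisher_test_sketch(
    {"Bacteroides", "Prevotella", "Roseburia"},
    {"Prevotella", "Roseburia", "Blautia"},
    universe=10,
)
```

The universe size only enters through d, the count of items in neither set, so a larger universe makes the same overlap look more surprising.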

msea.utils.get_empirical_ranks(d_gmt, n=1000, universe=1200, fix_size=None)

Generate random microbe sets to get empirical ranks for each term.

Parameters:
  • d_gmt – a dictionary of microbe-sets representing the reference microbe-set library
  • n – number of permutations
  • universe – number of microbes used as the universe size for Fisher’s exact test
  • fix_size – if None, uses variable sizes when generating random sets; if int, uses fixed size random sets to evaluate the null distribution of the ranks
Returns:

returns the means and standard deviations of the null ranks
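The permutation procedure described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the msea code: random sets are drawn from the microbes present in the library, variable sizes are drawn uniformly between 1 and the pool size, terms are ranked by a one-sided hypergeometric p-value, and ties are broken at random. The names overlap_pvalue and empirical_ranks_sketch and the seed argument are hypothetical.

```python
import random
from math import comb
from statistics import mean, pstdev


def overlap_pvalue(s1, s2, universe):
    """One-sided hypergeometric p-value for an overlap at least as large as observed."""
    n1, n2 = len(s1), len(s2)
    tail = sum(
        comb(n2, k) * comb(universe - n2, n1 - k)
        for k in range(len(s1 & s2), min(n1, n2) + 1)
    )
    return tail / comb(universe, n1)


def empirical_ranks_sketch(d_gmt, n=1000, universe=1200, fix_size=None, seed=0):
    """Return (means, stds) of each term's rank over n random input sets.

    Assumes universe is at least the number of distinct microbes in d_gmt.
    """
    rng = random.Random(seed)
    terms = list(d_gmt)
    pool = sorted(set().union(*d_gmt.values()))  # microbes to sample from
    ranks = {term: [] for term in terms}
    for _ in range(n):
        # Fixed-size or variable-size random input set.
        size = fix_size if fix_size is not None else rng.randint(1, len(pool))
        random_set = set(rng.sample(pool, size))
        pvals = {t: overlap_pvalue(random_set, d_gmt[t], universe) for t in terms}
        order = terms[:]
        rng.shuffle(order)         # random tie-breaking
        order.sort(key=pvals.get)  # stable sort keeps the shuffled order among ties
        for rank, term in enumerate(order, start=1):
            ranks[term].append(rank)
    # Results are aligned with the iteration order of d_gmt.
    return [mean(ranks[t]) for t in terms], [pstdev(ranks[t]) for t in terms]


library = {
    "term_a": {"m1", "m2", "m3"},
    "term_b": {"m3", "m4"},
    "term_c": {"m5", "m6", "m7", "m8"},
}
means, stds = empirical_ranks_sketch(library, n=200, universe=20, seed=1)
```

Terms with larger sets tend to rank well by chance, which is the bias the null means and standard deviations are meant to capture.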

msea.utils.multipletests_fdr_bh(pvals, is_sorted=False)

Benjamini-Hochberg FDR correction for p-values, adapted from statsmodels.

Parameters:
  • pvals – an array of nominal p-values
  • is_sorted – bool, whether the p-values are sorted
Returns:

returns an array of corrected p-values aka FDRs/q-values
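For readers unfamiliar with the procedure, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment. It is not the statsmodels-derived code in msea; the name fdr_bh_sketch is hypothetical, and the sketch always sorts internally rather than accepting an is_sorted flag.

```python
def fdr_bh_sketch(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


qvals = fdr_bh_sketch([0.01, 0.04, 0.03, 0.005])
```

For this input the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02, returned in the same order as the nominal p-values.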

msea.utils.read_gmt(file_or_url)

Read a GMT file into a dictionary of sets.

Parameters: file_or_url – a GMT file or URL of a GMT file
Returns: a dictionary of sets

msea.utils.write_gmt(d_gmt, filename)

Write a dictionary of sets to a GMT file.

Parameters:
  • d_gmt – a dictionary of sets
  • filename – filename for the GMT file
Returns:

returns nothing
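The two GMT helpers above can be pictured as a round trip between a file and a dictionary of sets. The sketch below assumes the common GMT layout of one term per line, tab-separated, with the term name first, a description second, and the members after that. Whether msea uses the description column in exactly this way is an assumption, URL input is omitted, and the names write_gmt_sketch and read_gmt_sketch are hypothetical.

```python
import os
import tempfile


def write_gmt_sketch(d_gmt, filename):
    """Write one tab-separated line per term: name, empty description, members."""
    with open(filename, "w", encoding="utf-8") as fh:
        for term, members in d_gmt.items():
            fh.write("\t".join([term, ""] + sorted(members)) + "\n")


def read_gmt_sketch(filename):
    """Read a GMT file into a dictionary mapping term name to a set of members."""
    d_gmt = {}
    with open(filename, encoding="utf-8") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # fields[1] is the description column and is ignored here.
            d_gmt[fields[0]] = {member for member in fields[2:] if member}
    return d_gmt


library = {
    "gut_commensals": {"Bacteroides", "Faecalibacterium"},
    "oral_taxa": {"Streptococcus"},
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "library.gmt")
    write_gmt_sketch(library, path)
    restored = read_gmt_sketch(path)
```

Because members are stored as sets, their order in the file carries no meaning; the sketch sorts them only to make the output deterministic.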