The Cytoscape Ecosystem & NDEx Network Cloud
The Ideker lab is involved in the development of several bioinformatic resources for network analysis that are widely used by the biological research community. The most visible of these is Cytoscape, a collaborative open-source software project. Cytoscape is one of the principal tools researchers use to analyze and visualize molecular interaction networks, with approximately 17,000 downloads per month, and it is used in almost all of the lab’s research activities. Now in its 16th year, the Cytoscape codebase is curated by four main institutions via funding from the P41 NIGMS National Resource for Network Biology and an NHGRI R01, both of which Dr. Ideker directs. In the past few years the platform has been substantially extended to include a cloud-based storage system for networks, much as Google Drive and Dropbox provide online storage and sharing for other types of documents. This network store, which we call the Network Data Exchange (NDEx), is funded through a separate U24 grant from NCI.
The core Cytoscape application has been extended frequently through a straightforward plug-in architecture. Over 330 plug-ins (Cytoscape ‘Apps’) are presently available, approximately half of which have been described in their own peer-reviewed publications. Approximately 160 of these Apps have been newly published or significantly updated in the past three-year review period, most of them by independent groups whose author lists do not include members of our team. Popular Apps extend Cytoscape in areas such as network query and download; network integration and filtering; attribute-directed network layout; Gene Ontology enrichment analysis; and detection of network motifs, functional modules, protein complexes, or domain interactions. Our plan for Cytoscape development for the remainder of 2018 is to release Cytoscape 3.7, which will involve major upgrades to the core architecture and seamless roundtrip connectivity to NDEx.
The NDEx Project provides an open-source framework where scientists and organizations can find, store, share and publish biological network knowledge. The project maintains a free Public Server and an informational website with technical documentation.
The Community Detection APplication and Service (CDAPS) framework performs multiscale community detection and functional enrichment for network analysis through a service-oriented architecture. These features are provided by integrating popular community detection algorithms and enrichment tools, made available via CyCommunityDetection, a Cytoscape application that acts as a client to a dedicated REST server. The server runs all the algorithms and tools remotely, and it can also be launched locally. Its source code and documentation are available on GitHub.
DrugCell is an interpretable deep learning model of human cancer cells trained on the responses of 1,235 tumor cell lines to 684 drugs. Analysis of DrugCell mechanisms leads directly to the design of synergistic drug combinations, which we validate systematically by combinatorial CRISPR, drug-drug screening in vitro, and patient-derived xenografts. DrugCell provides a blueprint for constructing interpretable models for predictive medicine. Source code is available via GitHub.
Related papers: Kuenzi B.M. et al. Predicting Drug Response and Synergy Using a Deep Learning Model of Human Cancer Cells. Cancer Cell, Volume 38, Issue 5. 2020
DCell is a visible neural network (VNN) embedded in the hierarchical structure of 2,526 subsystems comprising a eukaryotic cell. Trained on several million genotypes, DCell simulates cellular growth nearly as accurately as laboratory observations. DCell provides a foundation for decoding the genetics of disease, drug resistance and synthetic life. Source code is available via GitHub.
Related papers: Ma J, Yu MK, Fong S, Ono K, Sage E, Demchak B, Sharan R, Ideker T. Using deep learning to model the hierarchical structure and function of a cell. Nat Methods. 2018 Apr;15(4):290-298. doi: 10.1038/nmeth.4627. Epub 2018 Mar 5. PMID: 29505029; PMCID: PMC5882547.
In any ‘omics study, the scale of analysis can dramatically affect the outcome. For instance, when clustering single-cell transcriptomes, the analysis can be tuned to discover broad or specific cell types. Likewise, protein communities revealed from protein networks can vary widely in size depending on the method. HiDeF uses the concept of persistent homology, drawn from mathematical topology, to identify robust structures in data at all scales simultaneously. HiDeF is available via Python and Cytoscape.
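The notion of structures that persist across scales can be illustrated with a toy sketch. This is not HiDeF's implementation or API: the function names, the Jaccard threshold of 0.75, and the example partitions are all assumptions chosen for illustration. Given clusterings of the same items at successively finer resolutions, the sketch counts how many consecutive resolutions each cluster reappears in, treating two clusters as the same when their Jaccard similarity meets the threshold.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def persistent_clusters(partitions, threshold=0.75):
    """Score each distinct cluster by how many consecutive resolutions it survives.

    partitions: list of clusterings ordered from coarse to fine; each
                clustering is a list of frozensets.
    Returns a list of (cluster, run_length) pairs, one per distinct cluster.
    """
    seen = []      # clusters already tracked from an earlier resolution
    result = []
    for i, clusters in enumerate(partitions):
        for c in clusters:
            # Skip clusters that are a reappearance of one already tracked.
            if any(jaccard(c, s) >= threshold for s in seen):
                continue
            run = 1
            for later in partitions[i + 1:]:
                if any(jaccard(c, d) >= threshold for d in later):
                    run += 1
                else:
                    break
            seen.append(c)
            result.append((c, run))
    return result


# Hypothetical clusterings of six items at four resolutions (coarse to fine).
partitions = [
    [frozenset("abcdef")],
    [frozenset("abc"), frozenset("def")],
    [frozenset("abc"), frozenset("de"), frozenset("f")],
    [frozenset("ab"), frozenset("c"), frozenset("de"), frozenset("f")],
]

scores = dict(persistent_clusters(partitions))
robust = [c for c, run in scores.items() if run >= 2]
```

In this toy input the clusters that survive at least two consecutive resolutions are kept as "robust", while clusters that appear at only one resolution are treated as scale-specific.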
Related paper: Zheng, F et al. HiDeF: identifying persistent structures in multiscale ‘omics data. Genome Biol 22, 21 (2021). https://doi.org/10.1186/s13059-020-02228-4
We apply a recently developed technique, few-shot machine learning, to train a versatile neural network model in cell lines that can be tuned to new contexts using few additional samples. The few-shot learning framework provides a bridge from the many samples surveyed in high-throughput screens (n-of-many) to the distinctive contexts of individual patients (n-of-one).
Related paper: Ma, J. et al. Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients. Nat Cancer (2021).
NAGA is designed to use biological networks to analyze GWAS results. NAGA assigns each gene an association score based on the given GWAS result. To integrate prior biological knowledge, NAGA downloads a molecular network from the NDEx database and performs network propagation, which yields a new score for each gene. The high-scoring genes form a new subnetwork, which can be compared to a set of gold-standard genes to evaluate the enrichment for previously discovered biology.
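The network propagation step can be sketched as a random walk with restart over the network. This is a generic illustration, not NAGA's code: the function name, the `alpha` value, the iteration count, and the gene names are assumptions, and NAGA's actual propagation parameters are not stated here.

```python
def propagate(adjacency, scores, alpha=0.5, iterations=50):
    """Spread initial gene scores over an undirected network.

    adjacency: dict mapping each node to the set of its neighbors.
    scores:    dict mapping nodes to initial (e.g. GWAS-derived) scores.
    alpha:     fraction of each node's new score taken from its neighbors;
               the remaining (1 - alpha) restarts from the initial score.
    """
    nodes = list(adjacency)
    initial = {n: scores.get(n, 0.0) for n in nodes}
    current = dict(initial)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbor m shares its score equally among its own edges.
            incoming = sum(current[m] / len(adjacency[m]) for m in adjacency[n])
            updated[n] = alpha * incoming + (1 - alpha) * initial[n]
        current = updated
    return current


# Hypothetical three-gene path network: GENE_A - GENE_B - GENE_C.
network = {
    "GENE_A": {"GENE_B"},
    "GENE_B": {"GENE_A", "GENE_C"},
    "GENE_C": {"GENE_B"},
}
# Only GENE_A carries signal from the association study.
propagated = propagate(network, {"GENE_A": 1.0})
ranked = sorted(propagated, key=propagated.get, reverse=True)
```

After propagation, GENE_B and GENE_C receive nonzero scores purely from their proximity to GENE_A, which is how genes with weak direct association can still enter the high-scoring subnetwork.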
Related paper: Carlin D.E. et al. A Fast and Flexible Framework for Network-Assisted Genomic Association. iScience. 2019
DDOT is a toolkit for constructing, analyzing, and visualizing data-driven ontologies. DDOT consists of a Python package to assemble and analyze ontologies and HiView, a web application, to visualize them.
Related paper: Yu M.K. et al. DDOT: A Swiss Army Knife for Investigating Data-Driven Biological Ontologies. Cell Syst. 2019 Mar 27;8(3):267-273.e3. doi: 10.1016/j.cels.2019.02.003. Epub 2019 Mar 13. PMID: 30878356; PMCID: PMC7042149.
The Clique Extracted Ontologies algorithm (CliXO) infers an ontology in the form of a hierarchical, directed acyclic graph (DAG) from pairwise similarity data. It was originally developed for inferring gene ontologies from biological gene networks.
The Network Extracted Ontology (NeXO) is a gene ontology inferred directly from large-scale molecular networks. NeXO uses a principled computational approach which integrates evidence from hundreds of thousands of individual gene and protein interactions to construct a complete hierarchy of cellular components and processes.
Cell Circuit Search: Molecular interaction models provide us with a framework for integrating the large-scale data that we are now able to collect at multiple levels of biological information – genes, RNAs, proteins, and small molecules. Cell Circuit Search is a web-based interface for searching for genes that appear in our library of network models.
NetworkBLAST Software: NetworkBlast analyzes protein interaction networks in order to predict previously unknown relationships. It can compare multiple species’ protein interaction networks and infer interactions through homology. The program is best used in conjunction with Cytoscape to easily visualize the returned data.
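The idea of inferring interactions through homology can be sketched as a simple transfer of interactions between species via an ortholog map. This is only an illustration of that idea, not NetworkBLAST's algorithm or interface; the function name and the protein identifiers are hypothetical.

```python
def infer_interactions(source_edges, orthologs):
    """Predict target-species interactions from source-species interactions.

    source_edges: iterable of (protein, protein) pairs observed in the
                  source species.
    orthologs:    dict mapping a source protein to the set of its
                  orthologs in the target species.
    Returns a set of frozensets, each an unordered predicted pair.
    """
    predicted = set()
    for p, q in source_edges:
        for tp in orthologs.get(p, ()):
            for tq in orthologs.get(q, ()):
                if tp != tq:  # ignore self-interactions
                    predicted.add(frozenset((tp, tq)))
    return predicted


# Hypothetical source-species interactions and ortholog assignments.
source_edges = [("src_A", "src_B"), ("src_B", "src_C")]
orthologs = {
    "src_A": {"tgt_A"},
    "src_B": {"tgt_B1", "tgt_B2"},
    # src_C has no known ortholog, so its interaction cannot be transferred.
}

predicted = infer_interactions(source_edges, orthologs)
```

Here the single interaction src_A/src_B yields two predicted pairs in the target species because src_B has two orthologs, while the src_B/src_C interaction yields none.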
PathBLAST Website: Pathway alignment and query against protein interaction databases to identify conserved protein interaction networks between species. PathBLAST searches the protein-protein interaction network of the target organism to extract all protein interaction pathways that align with a pathway query.
VERA and SAM: VERA and SAM were developed to address the need for a better statistical test for identifying differentially expressed genes. VERA uses the method of maximum likelihood to estimate the parameters of a statistical model that describes the multiplicative and additive errors influencing an array experiment. SAM gives a value, lambda, for each gene on an array, which describes how likely it is that the gene is expressed differently between the two cell populations.
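An error model with both multiplicative and additive terms can be written along the following lines. The symbols are assumptions introduced for illustration and may differ from the notation used in the VERA documentation: x and y are the measured intensities of gene i in replicate j for the two cell populations, the mu terms are the true mean intensities, the epsilon terms are multiplicative errors that scale with the mean, and the delta terms are additive errors that do not.

```latex
x_{ij} = \mu_{x_i} + \mu_{x_i}\,\varepsilon_{x_{ij}} + \delta_{x_{ij}}, \qquad
y_{ij} = \mu_{y_i} + \mu_{y_i}\,\varepsilon_{y_{ij}} + \delta_{y_{ij}}
```

Under a model of this form, the additive term dominates for weakly expressed genes and the multiplicative term dominates for strongly expressed genes, which is why a single fold-change cutoff applied across all intensities can be misleading.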
Dapple: Dapple is a program for quantitating spots on a two-color DNA microarray image. Given a pair of images from a comparative hybridization, Dapple finds the individual spots on the image, evaluates their qualities, and quantifies their total fluorescent intensities. Dapple is designed to work with microarrays on glass.
enoLOGOS: enoLOGOS generates LOGOs of transcription factor DNA binding sites from various types of input matrices. It can use standard count matrices, probability matrices, or matrices of “energy” values (i.e., log-frequencies).
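For the count-matrix case, the letter heights in a sequence logo are conventionally derived from the information content of each position. The sketch below shows that classic calculation, assuming a uniform background and no small-sample correction. It is not enoLOGOS code, the function names and example counts are hypothetical, and enoLOGOS's other input types and weighting options are not covered.

```python
import math


def column_information(counts):
    """Information content in bits of one binding-site position.

    counts: dict mapping each base (A, C, G, T) to its observed count.
    A fully conserved DNA position scores 2 bits; a uniform one scores 0.
    """
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return math.log2(4) - entropy


def letter_heights(counts):
    """Height of each base in the logo: its frequency times the column information."""
    info = column_information(counts)
    total = sum(counts.values())
    return {base: info * n / total for base, n in counts.items()}


# Hypothetical count matrix for a three-position binding site.
count_matrix = [
    {"A": 10, "C": 0, "G": 0, "T": 0},   # fully conserved
    {"A": 5, "C": 5, "G": 0, "T": 0},    # two equally likely bases
    {"A": 5, "C": 5, "G": 5, "T": 5},    # no preference
]

bits = [column_information(col) for col in count_matrix]
heights = [letter_heights(col) for col in count_matrix]
```

Each column's stack height equals its information content, and within the stack every base is scaled by its frequency, so conserved positions stand tall and uninformative positions nearly vanish.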