BioAID Workflows
We share our workflows on myExperiment.org under the Creative Commons Attribution-Share Alike 3.0 License. In short, that means you can use and adapt the workflows as long as you credit us and also share your work. To use one of the workflows below, right-click its link and select ‘copy link location’, then load it into Taverna (File -> ‘Load from location’). If you have a question, the answer may be on our Frequently Asked Questions list.
- Disease discovery workflow
- [description]
This workflow finds diseases relevant to the query string via the following steps:
- A user query: a list of terms or boolean query - look at the Apache Lucene project for all details. E.g.:
EZH2 OR "Enhancer of Zeste" +(mutation chromatin) -clinical
- Retrieve documents: finds relevant documents (abstract+title) based on the query (edit maxHits to change the default maximum number of documents returned; the service is based on Apache Lucene)
- Discover proteins: extract proteins discovered in the set of relevant abstracts (with a ‘named entity recognizer’ trained on genomic terms using a Bayesian approach)
- Link proteins to disease contained in the OMIM disease database (with a service from Japan that interrogates OMIM)
Workflow by Marco Roos (AID = Adaptive Information Disclosure, University of Amsterdam). Text mining services by Sophia Katrenko and Edgar Meij (AID). OMIM service from the Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, director Hideaki Sugawara.
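The example query above combines several elements of the Lucene query syntax: OR between alternatives, double quotes around a phrase, + for a required group, and - for an excluded term. A minimal sketch of how such a query string could be assembled, assuming plain string composition; the helper names `quote` and `build_query` are hypothetical and are not part of the workflow or of Lucene:

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are read as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(any_of, required_group=(), excluded=()):
    """Compose a Lucene-style query string (hypothetical helper).

    any_of         -- alternatives joined with OR
    required_group -- terms placed in a required group: +(a b)
    excluded       -- terms that must not occur: -term
    """
    parts = [" OR ".join(quote(t) for t in any_of)]
    if required_group:
        parts.append("+(" + " ".join(quote(t) for t in required_group) + ")")
    parts.extend("-" + quote(t) for t in excluded)
    # Drop empty parts (for example when any_of is empty).
    return " ".join(p for p in parts if p)


query = build_query(
    any_of=["EZH2", "Enhancer of Zeste"],
    required_group=["mutation", "chromatin"],
    excluded=["clinical"],
)
print(query)  # EZH2 OR "Enhancer of Zeste" +(mutation chromatin) -clinical
```

The resulting string can be pasted into the workflow's query input; the sketch does not call any of the web services.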
- ['EZH2' sample output]
- Proteins to diseases workflow
- [description]
This workflow is based on BioAID_DiseaseDiscovery, with two changes: it expects only one protein name, and it adds protein synonyms. It finds diseases relevant to the query string via the following steps:
- A user query: a single protein name
- Add synonyms (service courtesy of Martijn Schuemie, Erasmus University Rotterdam)
- Retrieve documents: finds relevant documents (abstract+title) based on query
- Discover proteins: extract proteins discovered in the set of relevant abstracts
- Link proteins to disease contained in the OMIM disease database.
- Protein discovery workflow
- [description]
This workflow finds proteins relevant to the query string via the following steps:
- A user query: a list of terms or boolean query - look at the Apache Lucene project for all details. E.g.:
EZH2 OR "Enhancer of Zeste" +(mutation chromatin) -clinical
- Retrieve documents: finds relevant documents (abstract+title) based on the query (edit maxHits to change the default maximum number of documents returned)
- Discover proteins: extract proteins discovered in the set of relevant abstracts.
Services by Edgar Meij and Sophia Katrenko (AID)
- Swanson’s algorithm as a workflow to find unique protein links (rare event)
- [description]
This workflow implements Swanson’s principle with services from the AIDA toolbox. Comments:
- It may be useful to optimize the queries for the topics by experimenting with a DiscoverProteins subworkflow first. For example, ‘cancer’ surprisingly does not return any proteins, possibly because clinical papers dominate the retrieval results. The query
+cancer -(therapy clinic) +(protein^10.0 proteins^10.0 gene^9 genes^9)
performs much better. It contains the Lucene priority (boost) operator ^[priority], where priority=1 is the default.
- The nature of Swanson’s algorithm makes it much more likely that this workflow returns no results or false positives than that it returns true positives. True positives returned by this workflow are true with respect to the results of the information retrieval step and the information extraction step.
Known limits:
- Information retrieval: limited number of documents returned, uses indexes for searching, searches and returns abstracts only.
- Entity recognition: not guaranteed to recognize all instances of proteins.
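Swanson’s principle is often described as an ABC model: two topics A and C that are not linked directly may still be connected through intermediates B that occur in the literature of both. A minimal sketch of that idea, assuming the intermediates are the proteins discovered for each topic and that a candidate link is simply a protein found for both. The function name and the placeholder protein names are hypothetical and are not output of the workflow:

```python
def candidate_links(proteins_for_a, proteins_for_c):
    """Return proteins found for both topics, sorted for stable output."""
    return sorted(set(proteins_for_a) & set(proteins_for_c))


# Placeholder names only; these are not real results.
proteins_for_a = ["protein_1", "protein_2", "protein_3"]
proteins_for_c = ["protein_3", "protein_4"]

links = candidate_links(proteins_for_a, proteins_for_c)
print(links)  # ['protein_3']

# With no shared protein the result is empty, which the comments above
# describe as the more likely outcome for this kind of workflow.
print(candidate_links(["protein_1"], ["protein_4"]))  # []
```

Because the intersection depends entirely on what retrieval and entity recognition return for each topic, any link it reports is only as reliable as those two steps.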
- Semantic version of disease discovery workflow (preliminary result)
- [description]
This workflow runs a workflow that uses text mining to find diseases related to an enzyme, and adds the results to the AIDA RDF repository. The diseases, and a reference to this workflow, are added to a template ontology or proto-ontology that contains the classes and manually added instances (‘myModel’): a form of ontology enrichment. Notes:
- You can change the enzyme to any other enzyme or to a user input. Technically you can change it to any string, but nonsense results are likely when it is not a single enzyme. A boolean query is not expected.
- If you increase ‘maxHits’ in BioAID_DiseaseDiscovery, scaling issues may arise.
- Our demo repository is not a safe place to keep your data.
This is preliminary work. For the web services used, see BioAID_DiseaseDiscovery. RDF repository web services by Willem van Hage (AID/TNO).
- ['EZH2' sample output]
- Workflow that provides synonyms of proteins (service and data courteously provided by Martijn Schuemie)
- [description]
This workflow creates a query string from the query term using Martijn Schuemie’s synonym service. The service is limited to proteins, enzymes and genes. An input query that is a boolean string will be split and processed, but the boolean logic of the input query will be lost.
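How the workflow combines the synonyms into a query string is not stated here. A minimal sketch, assuming the synonyms are joined with OR and multi-word names are quoted as phrases; the function `synonyms_to_query` is hypothetical, the synonym service itself is not called, and the synonym list is passed in by hand:

```python
def synonyms_to_query(term, synonyms):
    """Join a term and its synonyms into an OR query (hypothetical helper)."""
    names = []
    for name in [term, *synonyms]:
        # Keep the first occurrence of each name and preserve the order.
        if name not in names:
            names.append(name)
    # Quote multi-word names so they are searched as a phrase.
    return " OR ".join(f'"{n}"' if " " in n else n for n in names)


# The synonym list is supplied manually for illustration.
query = synonyms_to_query("EZH2", ["Enhancer of Zeste", "EZH2"])
print(query)  # EZH2 OR "Enhancer of Zeste"
```

A flat OR list like this also illustrates why the boolean logic of an input query would be lost: every term ends up as an equal alternative.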