I am having difficulty finding a path to more information about Calpain-3 via the UMLS identifier C3000052. This query:
PREFIX umls: <http://umls.nlm.nih.gov/>
SELECT ?umls_concept ?umls_relation ?umls_concept2
WHERE {
?umls_concept ?umls_relation ?umls_concept2.
FILTER (?umls_concept = umls:C3000052)
}
returns no results, whereas filtering on umls:A8400986 does (I used this ID only as a test).
Because of this, and because the response time for broad searches on UMLS on the cloud server (e.g. with regex) is slow, I want to look at MeSH instead. The MeSH subset is also on our aida server. On the NLM MeSH website, the MeSH ID for human Calpain-3 appears to be C105884.
I tried this query on the cloud UMLS repository:
PREFIX umls: <http://umls.nlm.nih.gov/>
SELECT ?umls_concept ?umls_relation ?umls_concept2
WHERE {
?umls_concept ?umls_relation ?umls_concept2.
FILTER (?umls_concept = <http://umls.nlm.nih.gov/MSH/MSH_C105884> || ?umls_concept2 = <http://umls.nlm.nih.gov/MSH/MSH_C105884>)
}
It returned 36 triples with interesting information, but no name yet (which is what I am looking for). It does provide a RefSeq identifier and a link to a document. However, I did not get here by following a link from C3000052, of course; I found the MeSH ID on the NLM website.
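The second query keeps every triple in which the MeSH concept appears as either subject or object; finding a name then means looking for a label-like predicate among those triples. A minimal Python sketch of that logic over an in-memory list of (subject, predicate, object) strings might look like this. It assumes the triples have already been fetched; the label predicates checked (rdfs:label, skos:prefLabel, skos:altLabel) are an assumption about how a name could be encoded, not something the UMLS repository is known to use, and the example.org data in the demo is hypothetical.

```python
from typing import Iterable, List, Tuple

# (subject, predicate, object) as plain strings
Triple = Tuple[str, str, str]

# Predicates that often carry human-readable names in RDF data.
# Assumption: the UMLS/MeSH repository may use none of these.
LABEL_PREDICATES = {
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://www.w3.org/2004/02/skos/core#altLabel",
}


def triples_about(triples: Iterable[Triple], concept: str) -> List[Triple]:
    """Keep triples where the concept is the subject or the object,
    like FILTER (?umls_concept = X || ?umls_concept2 = X)."""
    return [t for t in triples if t[0] == concept or t[2] == concept]


def find_labels(triples: Iterable[Triple], concept: str) -> List[str]:
    """Return the objects of label predicates whose subject is the concept."""
    return [o for s, p, o in triples if s == concept and p in LABEL_PREDICATES]


if __name__ == "__main__":
    mesh = "http://umls.nlm.nih.gov/MSH/MSH_C105884"
    # Hypothetical toy data, not actual repository content.
    toy = [
        (mesh, "http://example.org/hasRefSeq", "http://example.org/refseq/EXAMPLE"),
        ("http://example.org/doc/1", "http://example.org/mentions", mesh),
        ("http://example.org/other", "http://example.org/p", "http://example.org/q"),
    ]
    about = triples_about(toy, mesh)
    print(len(about))                # 2
    print(find_labels(about, mesh))  # [] -> still no name
```

An empty result from `find_labels` is the situation described above: the concept has neighbours, but none of them is a name, so the name would have to be found further along a path.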
Suggestion
All in all, I have stumbled upon the well-known bioinformatics problem of identifiers, which leads me to a suggestion for CWA. Although I appreciate any attempt to establish unambiguous ’standard’ identifiers, experience tells us that most such attempts have simply added new types of identifier. Therefore, I suggest that we focus on reducing missing links (which seems to be my problem here). This will not be a perfect solution, but perhaps a good one.
I imagine we could facilitate a community approach based on these principles:
- any link to any of the many ’standard’ identifiers ties into a network of interlinked identifiers, so that we can reach the identifier we need;
- any link created by a user adds to the network, thereby reducing missing links (hence, the community builds the network);
- tools can be built on this network, for example:
  - a path traverser for when the required information lies a number of triples away along a path (e.g. I was looking for a human-readable label); gradually we could build in intelligence, such as SKOS label discovery;
  - a web service for identifier discovery based on the network, which would be really useful in workflows. Such a service would hide the RDF details from the queries (you could simply specify ‘umls’, ‘mesh’ or ‘refseq’ as the output type); I imagine the traversed path would be a second output, for validation.
- Would this be a point of connection with Okkam (Stefano Bocconi)? I think this is doable, useful for users, and enough for a computer scientist to produce papers on (imho, Okkam cannot produce something useful for both sides unless a community is enabled to do most of the work).
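The principles above (a shared network of identifier links that users extend, a path traverser, and an identifier-discovery service that returns both the target identifier and the traversed path) can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: the `IdentifierNetwork` class, the `scheme:local-id` string convention and the use of breadth-first search are assumptions for illustration, not an existing CWA or Okkam API. In practice the links would live in RDF; an in-memory graph just keeps the idea visible.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


class IdentifierNetwork:
    """Hypothetical community-built network of identifier links."""

    def __init__(self) -> None:
        # Undirected adjacency: identifier -> set of linked identifiers.
        self._links: Dict[str, Set[str]] = {}

    def add_link(self, a: str, b: str) -> None:
        """Record a user-contributed link in both directions."""
        self._links.setdefault(a, set()).add(b)
        self._links.setdefault(b, set()).add(a)

    def discover(self, start: str, target_scheme: str) -> Optional[Tuple[str, List[str]]]:
        """Breadth-first search from `start` to the nearest identifier whose
        scheme prefix (e.g. 'refseq:') matches `target_scheme`.
        Returns (identifier, traversed path), or None if unreachable."""
        prefix = target_scheme + ":"
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node.startswith(prefix):
                return node, path
            # Sorted so that results are deterministic when there are ties.
            for nxt in sorted(self._links.get(node, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None


if __name__ == "__main__":
    net = IdentifierNetwork()
    # Hypothetical user-contributed links; 'refseq:EXAMPLE' is a placeholder.
    net.add_link("umls:C3000052", "mesh:C105884")
    net.add_link("mesh:C105884", "refseq:EXAMPLE")
    print(net.discover("umls:C3000052", "refseq"))
    print(net.discover("umls:C3000052", "uniprot"))  # None: no such link yet
```

Here `discover` plays the role of the identifier-discovery service: the caller names only the output type, and the returned path can be inspected for validation. Breadth-first search finds the shortest chain of links first; weighting links by provenance or trust would be a natural extension.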