Adaptive Information Disclosure (AID)

Participating in the VL-e project


Train a model for NER

(Copied from Trac and JavaDocs on 06/02/2008)

JavaDocs:

train_model

public String train_model(String train_set,
                          String context_window,
                          String mode,
                          String model_file,
                          String inp)
Learns a model from the input data. Input: annotated text data in XML format, e.g.:
<doc>In contrast to the genes of the <protein>MAGE-A</protein>, <protein>MAGE-B</protein> and <protein>MAGE-C</protein> clusters, <protein>MAGED2</protein> is expressed ubiquitously.</doc>
Do not forget to include the root tag <doc> (<document> is also allowed as a root tag). All other tags are treated as class labels (such as protein in the example above). Returns the performance obtained by 10-fold cross-validation on the input data. The model is stored in the model_file provided by the user. Its log file is created in the same folder, named after model_file with “.log” appended. Note: a model’s log is always stored in the same folder as the model. If either file is removed, applying the model to a new data set (testing) with the TestModel service will fail.
Parameters:
train_set - annotated training set (in XML format)
context_window - context size (e.g., 2 means a context of positions -2 to +2 around the word in focus)
mode - learning mode (the current release supports automatic learning only, so mode must be set to “a”)
model_file - absolute path of the file in which the model will be stored, including its name (e.g., “/home/Models/mymodel.mod”)
inp - input type (the current release supports annotated text only, so inp must be set to “text”)
Returns:
the results of the 10-fold cross-validation
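The input conventions above (a <doc> or <document> root, with every other tag treated as a class label) can be checked on the client side before the service is called. The following is a minimal sketch that uses the JDK's built-in SAX parser. The class AnnotationLabels and its method are hypothetical and not part of AID. It also does not know which Medline structural tags the service additionally ignores.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical client-side check of the training-data conventions (not an AID API). */
class AnnotationLabels {

    /** Returns the class labels (all non-root tags); rejects roots other than doc/document. */
    static Set<String> classLabels(String xml) {
        Set<String> labels = new TreeSet<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // nesting level of the element being opened

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                if (depth == 0) {
                    // The root must be <doc> or <document>
                    if (!qName.equals("doc") && !qName.equals("document")) {
                        throw new SAXException("root tag must be <doc> or <document>, found <" + qName + ">");
                    }
                } else {
                    labels.add(qName); // any tag below the root counts as a class label
                }
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrap parser-configuration, I/O and SAX errors (including a bad root tag)
            throw new IllegalArgumentException("invalid training data: " + e.getMessage(), e);
        }
        return labels;
    }
}
```

On the MAGE example above, this sketch would yield the single label protein.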

Trac:

LearnModel service

Purpose: To train learning methods on the data provided by the user and to create a learning model for a given task. The task is named entity recognition (concept learning) based on the textual context.

This is a Learner, a service that uses a user’s annotations to build a model which can then be used to annotate
unseen data. It also reports how well this model performs on the training data set provided by the user,
based on 10-fold cross-validation.
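In 10-fold cross-validation, the training set is split into ten disjoint folds. Each fold serves once as held-out test data while a model is trained on the other nine, and the ten scores are then averaged. The sketch below shows only the splitting step, in plain Java. The class CrossValidationFolds and its seed parameter are hypothetical. The service itself delegates cross-validation to Weka, whose splitting (for example, stratification by class) may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: splits n instance indices into k disjoint folds. */
class CrossValidationFolds {

    static List<List<Integer>> folds(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed)); // randomise order before splitting
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal indices round-robin so that fold sizes differ by at most one
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }
}
```

Calling folds(n, 10, seed) gives the ten test sets. In round f, a model would be trained on the indices of all the other folds and evaluated on fold f.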

Input:
train_set (String) - annotated data in XML format. All tags apart from the root tag and the tags used for annotating structural parts of Medline are considered to be class labels.
context_window (String) - size of the context window (e.g., 2, 3, etc.). It is set to 3 by default.
mode (String) - mode is currently set to “a” (automatic). An interactive mode is intended to be added in future releases.
model_file (String) - path with the filename where the model file must be stored (e.g., “D://INSTALL/Data/train/corpus2TTT.mod”)
inp (String) - type of the input data. It is set to “text” in the current implementation
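The context_window parameter can be read as the number of tokens taken on each side of the word in focus. Below is a minimal sketch under that reading. It uses a hypothetical ContextWindow class and a hypothetical <PAD> filler for positions beyond the sentence boundary. How the service actually encodes these tokens as Weka features is not documented here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: the tokens within +/- size positions of a focus word. */
class ContextWindow {

    static final String PAD = "<PAD>"; // hypothetical filler outside the token sequence

    static List<String> around(String[] tokens, int focus, int size) {
        List<String> window = new ArrayList<>();
        for (int offset = -size; offset <= size; offset++) {
            int i = focus + offset;
            // Positions before the first or after the last token are padded
            window.add(i >= 0 && i < tokens.length ? tokens[i] : PAD);
        }
        return window;
    }
}
```

Under this reading, the default window of 3 describes each word by 2 × 3 + 1 = 7 tokens: three on each side plus the word itself.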

Output:

    Direct: String (performance info in XML).

    Additional:
    • The model file (which can be used for testing), stored under the filename provided by the user as the model_file parameter (see above).
    • The log file, which contains the following information:
      • date
      • training data
      • performance info
      • ARFF header needed to use a stored model. Do NOT delete it if you plan to use the model in the future.
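TestModel needs both the model and its log, which carries the ARFF header. A client may therefore want to confirm that both files are in place before testing. Below is a minimal sketch. It assumes the log is named exactly model_file plus “.log” in the same folder, as the JavaDoc above describes. ModelFiles and its methods are hypothetical helpers, not part of the AID services.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: locates and checks the files a stored model depends on. */
class ModelFiles {

    // Assumed naming: the log is model_file with ".log" appended, in the same folder
    static Path logPathFor(String modelFile) {
        return Paths.get(modelFile + ".log");
    }

    // True only if both the model and its log (holding the ARFF header) exist
    static boolean readyForTesting(String modelFile) {
        return Files.isRegularFile(Paths.get(modelFile))
                && Files.isRegularFile(logPathFor(modelFile));
    }
}
```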

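Part of a client example:

import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

String endpoint = "http://localhost:8084/axis/services/LearnModel";
Service service = new Service();
// createCall() and invoke() throw checked exceptions; handle or declare them
Call call = (Call) service.createCall();
call.setTargetEndpointAddress(endpoint);
call.setOperationName("train_model");
// Arguments, in order: train_set, context_window, mode, model_file, inp
// (train_file and con are String arrays prepared elsewhere in the client)
String ret = (String) call.invoke(new Object[] { train_file[0], con[0], "a", "D://INSTALL/Data/train/corpus2TTT.mod", "text" });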

Implementation: uses Weka and a SAX parser.
