(Copied from Trac and JavaDocs on 06/02/2008)
JavaDocs:
train_model
public String train_model(String train_set, String context_window, String mode, String model_file, String inp)
- Learns a model from the input data. Input: annotated text data in XML format, e.g.:
<doc>In contrast to the genes of the <protein>MAGE-A</protein>, <protein>MAGE-B</protein> and <protein>MAGE-C</protein> clusters, <protein>MAGED2</protein> is expressed ubiquitously. </doc>
Do not forget to include a root tag <doc> (<document> is also allowed as a root tag). All other tags are treated as class labels (such as protein in the example above).
Returns the performance measured by 10-fold cross-validation on the input data. The model is stored in the model_file provided by the user. Its log file is created in the same folder, named model_file with “.log” appended. Note: a model’s log is always stored in the same folder as the model itself. If the user removes either file, later applying the model to a new data set (testing) with the TestModel service will fail.
- Parameters:
train_set - annotated training set (in XML format)
context_window - context size (e.g., 2 means a context of -2 and +2 around the word in focus)
mode - learning mode (the current release supports automatic learning only, so mode has to be set to a)
model_file - an absolute path to the folder where the model will be stored, including its name (e.g. “/home/Models/mymodel.mod”)
inp - input type (the current release supports annotated text only, so inp has to be set to text)
- Returns:
- the results of the 10-fold cross-validation
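The tagging convention described above (a doc or document root, every other tag a class label) can be illustrated with a small SAX handler. This is only a sketch: AnnotationLabels and classLabels are hypothetical names, not part of the LearnModel service, and treating every doc/document element as a root (even a nested one) is an assumption.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not the service's own parser.
class AnnotationLabels {
    /** Returns the names of all tags other than doc/document, sorted alphabetically. */
    static Set<String> classLabels(String annotatedXml) {
        final Set<String> labels = new TreeSet<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Assumption: doc and document are only ever used as root tags.
                if (!qName.equals("doc") && !qName.equals("document")) {
                    labels.add(qName);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(annotatedXml)), handler);
        } catch (Exception e) {
            // Malformed XML or parser setup failure is reported as a bad argument.
            throw new IllegalArgumentException("Could not parse annotated XML", e);
        }
        return labels;
    }
}
```

Calling AnnotationLabels.classLabels on the example document above would yield a set containing only protein, which is a quick way to check a training file before sending it to the service.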
Trac:
LearnModel service
Purpose: To train learning methods on the data provided by the user & to create a learning model for a given task. The task is to carry out named entity recognition (concept learning) based on the text context.
This is a Learner service which provides the following functionality: it uses a user’s annotations to build a model
which can then be used to annotate unseen data. It also reports how well this model performs on the
training data set (provided by the user), based on 10-fold cross-validation.
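For readers unfamiliar with k-fold cross-validation, the partitioning idea can be sketched in plain Java. This is not how the service (or Weka) necessarily splits the data: KFold and folds are hypothetical names, and the simple round-robin assignment without shuffling or stratification is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of k-fold partitioning; not the service's code.
class KFold {
    /** Splits instance indices 0..n-1 into k held-out folds, round-robin. */
    static List<List<Integer>> folds(int n, int k) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            // Instance i is held out in fold i mod k and used for training in all others.
            folds.get(i % k).add(i);
        }
        return folds;
    }
}
```

With k = 10, each instance is tested exactly once and used for training in the other nine rounds. The reported performance is then an aggregate over the ten rounds.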
Input:
train_set (String) - annotated data in XML file. All tags apart from the root tag (<doc> or <document>) are treated as class labels
context_window (String) - size of a context window (e.g., 2, 3, etc). It is set to 3 by default
mode (String) - mode is currently set to “a” (automatic). An interactive mode is intended to be added in future releases
model_file (String) - path with the filename where the model file must be stored (e.g., “D://INSTALL/Data/train/corpus2TTT.mod”)
inp (String) - type of the input data. It is set to “text” in the current implementation
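To make the context_window parameter concrete, the sketch below collects the words within the given distance of a focus word, so a window of 2 covers positions -2 to +2. ContextWindow and context are hypothetical names. How the service actually tokenizes text and handles sentence boundaries is not documented here, so whitespace-split tokens and clipping at the array edges are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a symmetric context window; not the service's code.
class ContextWindow {
    /**
     * Returns the tokens within the window on either side of tokens[focus],
     * excluding the focus token itself. The window size is passed as a String
     * because the service declares context_window as a String.
     */
    static List<String> context(String[] tokens, int focus, String contextWindow) {
        int window = Integer.parseInt(contextWindow.trim());
        List<String> out = new ArrayList<>();
        int from = Math.max(0, focus - window);               // clip at the start
        int to = Math.min(tokens.length - 1, focus + window); // clip at the end
        for (int i = from; i <= to; i++) {
            if (i != focus) {
                out.add(tokens[i]);
            }
        }
        return out;
    }
}
```

For the tokens of "In contrast to the genes of" with the focus on "the" and a window of "2", this returns contrast, to, genes, of.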
Output:
- Direct: String (performance info stored in the XML file).
- The model file (which might be used for test purposes; it is stored under the filename provided by the user as the model_file parameter, see above)
- The log file, which contains the following information:
  - date
  - training data
  - performance info
  - ARFF header needed to use a stored model. Do NOT delete it if you plan to use the model in the future
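Because a stored model is only usable while its log file is present, a client may want to check for both files before testing. The JavaDocs above state that the log is named model_file with “.log” appended. The sketch below relies on that rule. ModelFiles, logPathFor and readyForTesting are hypothetical names, not part of either service.

```java
import java.io.File;

// Hypothetical client-side check; not part of the LearnModel or TestModel services.
class ModelFiles {
    /** Log file path as described in the JavaDocs: model_file with ".log" appended. */
    static String logPathFor(String modelFile) {
        return modelFile + ".log";
    }

    /** True only if both the model file and its log file exist as regular files. */
    static boolean readyForTesting(String modelFile) {
        return new File(modelFile).isFile()
                && new File(logPathFor(modelFile)).isFile();
    }
}
```

For example, ModelFiles.logPathFor("/home/Models/mymodel.mod") gives "/home/Models/mymodel.mod.log". A client could call readyForTesting before invoking the TestModel service and report a clear error if either file is missing.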
Additional:
Part of a Client example:
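import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

// Address of the deployed LearnModel service
String endpoint = "http://localhost:8084/axis/services/LearnModel";
Service service = new Service();
Call call = (Call) service.createCall();
call.setTargetEndpointAddress(endpoint);
call.setOperationName("train_model");
// train_file[0] (training set) and con[0] (context window) are defined elsewhere in the client
String ret = (String) call.invoke( new Object[] { train_file[0], con[0], "a", "D://INSTALL/Data/train/corpus2TTT.mod", "text" } );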
Realization: Uses Weka, SAX Parser