Were the training and test set split apart randomly?

April 26, 2017 (tags: apart randomly, Split, Test, TRAINING)

1 Answer

Yes, using a stratified method so that the class ratios would be the same in both sets.

• Do the data files distributed with the training set describe the test set instances in addition to the training set instances? Yes. The abstracts, protein-protein interactions, localization values, function values and aliases represent knowledge about all of the genes in yeast. The test set to be provided will consist solely of a list of gene identifiers. All of the information required to instantiate features for the test set instances is in the data files that were included with the training instances.

• Are the MEDLINE abstracts meant to be used as input data? Yes. In fact, it is probably necessary to use them to get competitive accuracies.

• Why do the abstracts often contain references to gene names followed by a “p”? For example, abstract 10022848 references “sec4p” and “sec15p”, but the file gene-abstracts.txt associates this abstract with the genes “sec4” and “sec15”. The “p” suffix is ofte
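
To illustrate what a stratified split means in the first answer above, here is a minimal sketch in Python. It is not the organizers' actual procedure: the function name `stratified_split`, the `test_fraction` and `seed` parameters, the gene identifiers, and the two class labels are all hypothetical and chosen only for the example. The idea is to shuffle each class separately and take the same fraction from each, so that class ratios match in both sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split identifiers into train/test lists with matching class ratios.

    `labels` maps an identifier to its class label (assumed input format).
    """
    rng = random.Random(seed)

    # Group identifiers by class label.
    by_class = defaultdict(list)
    for item_id, label in labels.items():
        by_class[label].append(item_id)

    train, test = [], []
    for label in sorted(by_class):
        members = sorted(by_class[label])  # fixed order before shuffling
        rng.shuffle(members)
        # Take the same fraction of every class for the test set.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data (made up): 25 genes in one class, 75 in the other.
labels = {
    f"gene{i}": ("nucleus" if i % 4 == 0 else "cytoplasm") for i in range(100)
}
train_ids, test_ids = stratified_split(labels, test_fraction=0.2, seed=42)

# Both sets keep the original 1:3 class ratio.
test_nucleus = sum(1 for g in test_ids if labels[g] == "nucleus")
train_nucleus = sum(1 for g in train_ids if labels[g] == "nucleus")
print(len(train_ids), len(test_ids), train_nucleus, test_nucleus)
```

With a purely random split, a rare class could end up over- or under-represented in the test set by chance; splitting per class avoids that, at the cost of small rounding differences when a class size is not divisible by the chosen fraction.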