I had hoped that the network would learn the matching rule “If the main connectives in the instance match the connectives in the rule pattern and the variables in the instance are in the same relations to each other as the corresponding variables in the rule pattern, then the instance fits the rule.” Unfortunately, training on one rule’s dataset and testing on the other yielded bad performance. Interestingly, training on the MT dataset and testing on the MP set yielded an inverted classification: the network classified fitting instances as non-fitting and vice versa. Finally, combining the two datasets and using a percentage split for training/testing yielded good results.
