I am developing a set of classifiers which are easily explainable and comprehensible (unlike the deep-learning models which are today's buzzwords). The model belongs to a family of algorithms called Learning Vector Quantization (LVQ).
My model, like any LVQ classifier, consists of one or more prototypes per condition (a prototype is a typical example describing that condition). These prototypes are learned by the classifier from the training dataset. When a new subject is presented, the classifier compares the dissimilarity of the new subject to all the prototypes, and the subject gets sorted into the same class as the prototype to which it is least dissimilar. Dissimilarity can be computed in several ways: Euclidean distance, Manhattan distance, angle-based dissimilarity, or probability-based measures. I developed a variant which uses angle-based dissimilarity, since with it even data points with missing values can be used for training the classifier. These dissimilarity measures (be it Euclidean, angle-based or any other) are not the absolute dissimilarity of each feature of a subject to each feature of a prototype; the dissimilarity is adaptive.
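To make the nearest-prototype idea concrete, here is a minimal Python sketch (not my actual implementation). It uses a cosine-style angle-based dissimilarity and, as one common way of handling missing values, simply drops the NaN features from the comparison; the names and toy numbers are made up for illustration.

```python
import numpy as np

def angle_dissimilarity(x, w):
    """Angle-based dissimilarity: 0 when x and w point in the same direction.

    NaNs in x mark missing features; those dimensions are simply dropped
    from the comparison (one common way to handle them, assumed here
    purely for illustration).
    """
    observed = ~np.isnan(x)
    xo, wo = x[observed], w[observed]
    cos = np.dot(xo, wo) / (np.linalg.norm(xo) * np.linalg.norm(wo))
    return 1.0 - cos  # ranges over [0, 2]; smaller means more similar

def classify(x, prototypes, labels):
    """Assign x the label of its least-dissimilar prototype."""
    d = [angle_dissimilarity(x, w) for w in prototypes]
    return labels[int(np.argmin(d))]

# Toy usage: one prototype per class; the new subject has a missing feature.
prototypes = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
labels = ["condition A", "condition B"]
subject = np.array([0.9, np.nan, 1.8])
print(classify(subject, prototypes, labels))  # -> condition A
```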
Adaptive dissimilarity assigns different importance to different features of a dataset. This prevents confusing and uncharacteristic features from contributing as much as the important ones to a decision. For example, let’s say you are trying to identify the Doctor at an intergalactic music festival.
You have collected the following features from all the creatures attending the festival: (1) height, (2) no. of hearts, (3) weapon, (4) mode of transport, (5) species of companions, (6) dress, (7) catchphrase. Now, if all the features had equal importance for identifying the Doctor, you could miss the Doctor. The catchphrase changes from ‘Fantastic’ to ‘Allons-y’ to ‘Geronimo’. The dress changes from a leather jacket to a long trench coat to ridiculously long scarves, checked blazers, or a striped shirt with a long trench coat. The height has also varied greatly. But features (2), (3) and (4) have remained constant and should therefore get higher importance, i.e. higher relevance, as the sketch below illustrates.
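Here is a tiny numerical sketch of why relevance matters, with made-up numbers: feature 0 is height in cm (varies wildly between regenerations) and feature 1 encodes whether the creature has two hearts (constant). With equal weights the raw height difference drowns out the reliable feature; with adaptive weights the classifier keys on the constant one. I use a relevance-weighted squared Euclidean distance here because it is the simplest standard adaptive measure (my own variant is angle-based, as noted above).

```python
import numpy as np

def weighted_sq_dist(x, w, lam):
    """Adaptive dissimilarity: squared Euclidean distance in which each
    feature i contributes with its own relevance lam[i] >= 0."""
    return float(np.sum(lam * (x - w) ** 2))

# Hypothetical encoding of two of the festival features:
#   feature 0: height in cm, feature 1: two hearts? (1 = yes, 0 = no)
doctor_prototype = np.array([170.0, 1.0])
human_prototype  = np.array([184.0, 0.0])
subject          = np.array([185.0, 1.0])  # a tall creature with two hearts

equal   = np.array([0.5, 0.5])      # every feature matters equally
learned = np.array([0.001, 0.999])  # relevance concentrated on hearts

for lam, name in [(equal, "equal"), (learned, "adaptive")]:
    d_doc = weighted_sq_dist(subject, doctor_prototype, lam)
    d_hum = weighted_sq_dist(subject, human_prototype, lam)
    winner = "Doctor" if d_doc < d_hum else "human"
    print(f"{name}: d(Doctor)={d_doc:.3f}, d(human)={d_hum:.3f} -> {winner}")
# Equal weights are dominated by the height difference and misclassify
# the subject; the adaptive weights key on the constant two-hearts feature.
```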
From the training set, my classifier learns the relevance of these features through a technique called optimisation, which yields the feature weights (relevance/importance) such that the classification error is least. In the same way it learns the prototypes of each condition.
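As a rough illustration of how such learning could work, here is a sketch in the spirit of relevance LVQ (RLVQ): each prototype is pulled toward correctly classified subjects and pushed away otherwise, and the relevance weights are nudged after every subject so that misleading features lose weight. This is a heuristic sketch under assumed learning rates, not my actual optimisation routine.

```python
import numpy as np

def train_rlvq(X, y, prototypes, proto_labels,
               lr_w=0.05, lr_lam=0.01, epochs=30):
    """Sketch of relevance learning in the spirit of RLVQ.

    Per subject: find the nearest prototype under the current relevance
    weights, pull it closer if its class is right (push it away if wrong),
    then nudge the relevances and renormalise them.
    """
    n_features = X.shape[1]
    lam = np.full(n_features, 1.0 / n_features)  # start with equal relevance
    W = prototypes.astype(float).copy()
    for _ in range(epochs):
        for x, label in zip(X, y):
            d = np.sum(lam * (x - W) ** 2, axis=1)  # adaptive distances
            k = int(np.argmin(d))                   # winning prototype
            sign = 1.0 if proto_labels[k] == label else -1.0
            diff = x - W[k]
            W[k] += sign * lr_w * diff              # move the prototype
            # Features on which a *correct* winner disagrees lose weight;
            # features on which a *wrong* winner disagrees gain weight.
            lam -= sign * lr_lam * diff ** 2
            lam = np.clip(lam, 0.0, None)
            lam /= lam.sum()                        # keep relevances normalised
    return W, lam
```

A proper version would minimise a classification-error cost (as in GLVQ/GRLVQ) by gradient descent; this loop only shows the moving parts, namely that the prototypes and the relevances are learned together from the same training data.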
The following animation shows the decision boundaries (class domains) determined by my classifier model:

[Video: /wp-content/uploads/2019/12/ezgif.com-gif-to-mp4.mp4]