Scikit-Learn: Subtle Questions About Implementing Machine Learning Methods | Dasha.AI

Submitted by Style Pass, 2021-06-18 17:30:12

Let's consider a few seemingly simple questions about machine learning algorithms and their implementation that, nevertheless, only a few people can answer correctly. You can try them yourself before reading the explanations. Note that the additional questions in this post are intentionally left unanswered. The material is aimed at the intermediate level: readers who are already familiar with machine learning (ML) and the scikit-learn library.

Why does SVM in sklearn give incorrect probabilities? For example, an object may be assigned to class 1 even though the predicted probability of that class is not the highest.

You can run the following experiment: take a training sample of two objects belonging to different classes (0 and 1), and use the same sample as the test sample (see Fig. 1). The objects are classified correctly, but their predicted probabilities of belonging to the first class are 0.65 and 0.35. First, these are very strange values; second, the object from class 0 has a high probability of belonging to class 1, and vice versa. Is there really a mistake in sklearn, a library that has been in active use for so many years?
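The experiment can be sketched in a few lines. This is a minimal reproduction attempt, not the original code behind Fig. 1: the single feature values in `X` are made up for illustration, and `random_state=0` is an assumption added only to make runs repeatable. The exact probabilities you see may differ from those quoted above, depending on the data and the library version.

```python
import numpy as np
from sklearn.svm import SVC

# Two training objects, one per class (feature values are hypothetical).
X = np.array([[0.0], [1.0]])
y = np.array([0, 1])

# probability=True is required for predict_proba to be available.
model = SVC(probability=True, random_state=0)
model.fit(X, y)

# Use the training sample as the test sample, as in the experiment.
pred = model.predict(X)
proba = model.predict_proba(X)

print("classes:      ", model.classes_)
print("predictions:  ", pred)
print("probabilities:")
print(proba)  # columns follow the order of model.classes_
```

Compare the column with the larger value in each row of `proba` against the corresponding entry of `pred`: if they disagree, you have reproduced the effect.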

Strictly speaking, this is indeed a bug that has not yet been fixed. It stems from how class membership probabilities are computed in SVM in the first place. Take a look at the sklearn.svm.SVC class:
