ML | Voting Classifier using Sklearn

A Voting Classifier is an ensemble meta-classifier that fits multiple base classifiers on the dataset and uses their average predicted probabilities (for soft voting) or majority vote (for hard voting) to predict the class labels. This can be useful for combining the strengths of different algorithms to achieve better overall performance.

Here's how to use the Voting Classifier in scikit-learn:

  • Import Necessary Libraries:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
  • Load the Dataset and Split it:
# For demonstration, let's use the iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
  • Create Individual Models:
log_clf = LogisticRegression(max_iter=1000)  # higher max_iter avoids convergence warnings on some datasets
dt_clf = DecisionTreeClassifier(random_state=42)  # fixed random_state for reproducibility
svm_clf = SVC(probability=True)  # probability=True is required for soft voting
  • Create and Train the Voting Classifier:

You can choose between hard voting (voting='hard') and soft voting (voting='soft'). The example below uses hard voting; a soft-voting variant is sketched right after it:

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('dt', dt_clf), ('svc', svm_clf)],
    voting='hard'
)
voting_clf.fit(X_train, y_train)
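
For the soft-voting variant, the same estimators can be reused; a minimal sketch under the setup above (SVC was created with probability=True so it can supply class probabilities):

soft_voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('dt', dt_clf), ('svc', svm_clf)],
    voting='soft'  # average predict_proba outputs instead of counting votes
)
soft_voting_clf.fit(X_train, y_train)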
  • Evaluate the Voting Classifier:
from sklearn.metrics import accuracy_score

y_pred = voting_clf.predict(X_test)
print("Voting classifier accuracy:", accuracy_score(y_test, y_pred))
  • Compare with Individual Classifiers:

This step is optional but useful for checking whether the Voting Classifier actually provides an advantage over the individual classifiers.

for clf in (log_clf, dt_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, "accuracy:", accuracy_score(y_test, y_pred))
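
A single train/test split can be noisy on a small dataset like iris, so as a rough sketch, cross-validation gives a more stable comparison (cross_val_score comes from sklearn.model_selection; default scoring for classifiers is accuracy):

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy for each classifier
for name, clf in [('lr', log_clf), ('dt', dt_clf), ('svc', svm_clf), ('voting', voting_clf)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(name, "mean CV accuracy:", scores.mean())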

Remember:

  • Hard Voting: Each classifier in the ensemble "votes" for a class, and the class that gets the most votes is chosen.
  • Soft Voting: Predicted probabilities for each class are averaged across all classifiers, and the class with the highest probability is chosen. This requires the classifiers to be capable of producing class probabilities (e.g., predict_proba method in scikit-learn).
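
To make the averaging concrete, here is a rough sketch of what soft voting computes, assuming the three base classifiers have already been fitted (e.g., in the comparison loop above) and that class labels are 0..n_classes-1 as in iris:

import numpy as np

# Average the class-probability estimates of the fitted base classifiers,
# then pick the class with the highest mean probability (equal weights).
probas = [clf.predict_proba(X_test) for clf in (log_clf, dt_clf, svm_clf)]
avg_proba = np.mean(probas, axis=0)
manual_soft_pred = np.argmax(avg_proba, axis=1)
print("Manual soft-voting accuracy:", accuracy_score(y_test, manual_soft_pred))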

In many scenarios, especially when the individual classifiers are only weakly or moderately accurate, the Voting Classifier can yield better results by leveraging the strengths of each individual model, particularly when their errors are not strongly correlated.
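
VotingClassifier also accepts a weights parameter, so stronger models can be given more influence in the vote; a minimal sketch (the weights shown are arbitrary, for illustration only):

# Weighted soft voting: each classifier's probabilities are scaled by its weight before averaging
weighted_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('dt', dt_clf), ('svc', svm_clf)],
    voting='soft',
    weights=[2, 1, 2]
)
weighted_clf.fit(X_train, y_train)
print("Weighted voting accuracy:", accuracy_score(y_test, weighted_clf.predict(X_test)))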

