hmm.classification module
class hmm.classification.Classifier(num_features, cat_features, clf=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None, oob_score=False, random_state=None, verbose=0, warm_start=False))
Bases: object
A simple classification pipeline wrapping the sklearn library. It:
- transforms (imputes, encodes/scales) the categorical and numerical features
- fits a classifier
- computes accuracy scores for the classifier
- Parameters
num_features – a list of DataFrame (df) column keys for the numerical features
cat_features – a list of DataFrame (df) column keys for the categorical features
clf – a classification (discriminative) model; defaults to a RandomForestClassifier with 100 estimators
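The three steps listed above (transform, fit, score) can be sketched in plain sklearn as follows. This is only an illustration of the general pattern, not the module's actual implementation: the column names, the toy data, and the choice of imputation strategies, scaler, and encoder are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38, 29, 60],
    "income": [30.0, 42.5, 38.0, np.nan, 75.0, 51.0, 33.0, 80.0],
    "city": ["A", "B", "A", "B", "B", "A", "B", "A"],
})
y = [0, 0, 0, 1, 1, 1, 0, 1]

num_features = ["age", "income"]  # keys of the numerical columns
cat_features = ["city"]           # keys of the categorical columns

# Numerical columns: impute missing values, then scale (assumed strategy).
num_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: impute missing values, then one-hot encode (assumed strategy).
cat_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Route each group of columns through its own transformer.
preprocess = ColumnTransformer([
    ("num", num_pipe, num_features),
    ("cat", cat_pipe, cat_features),
])

# Chain preprocessing and the classifier into a single estimator.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

pipeline.fit(df, y)
predictions = pipeline.predict(df)
train_accuracy = pipeline.score(df, y)  # accuracy on the training data
print(f"training accuracy: {train_accuracy:.2f}")
```

Keeping the preprocessing inside the pipeline means the imputers, scaler, and encoder are fitted only on whatever data is passed to `fit`, which matters once the pipeline is cross-validated.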
cross_val(X, y, cv=5, verbose=True)
Cross-validate the pipeline.
- Parameters
X – a dataset
y – the ground-truth labels
cv – number of folds in the cross validation
verbose – whether or not to print test accuracy
- Returns
the cross validation score object (sklearn)
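For orientation, k-fold cross-validation of a classifier in plain sklearn might look like the sketch below. It assumes the returned "score object" resembles the per-fold array produced by `cross_val_score`; the documentation above does not say which sklearn helper the method actually uses, and the synthetic dataset is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic dataset standing in for X and y (illustrative only).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)

# cv=5 splits the data into 5 folds; each fold serves once as the test set.
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

# Roughly what verbose=True might report (assumed behavior).
print(f"accuracy per fold: {scores}")
print(f"mean accuracy: {scores.mean():.3f}")
```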
fit(X, y)
Fit the pipeline on a labeled dataset.
- Parameters
X – the data
y – the ground-truth labels
- Returns
the fitted pipeline
hmm.classification.train_test_val_dev_split(X, y)
Split the dataset into four partitions: training (64%), testing (16%), validation (16%), and development (4%).
- Training is for fitting the model.
- Testing is for testing the fitted model and parameter tuning.
- Validation is for final testing after tuning parameters.
- Development is for examining individual rows and performing unit tests.
- Parameters
X – the dataset
y – the ground-truth labels
- Returns
four partitions of the dataset
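The 64/16/16/4 proportions are what two nested 80/20 splits produce: 0.8 × 0.8 = 64%, 0.8 × 0.2 = 16%, 0.2 × 0.8 = 16%, and 0.2 × 0.2 = 4%. The sketch below reproduces those sizes with sklearn's `train_test_split`. The helper name `four_way_split`, the `seed` argument, and the return structure are hypothetical; the real function's splitting strategy and return order are not documented above.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def four_way_split(X, y, seed=0):
    """Hypothetical helper: nested 80/20 splits giving 64/16/16/4."""
    # First split: 80% (train + test) versus 20% (validation + development).
    X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.2, random_state=seed)
    # Split the 80% block again: 64% training, 16% testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X_a, y_a, test_size=0.2, random_state=seed
    )
    # Split the 20% block again: 16% validation, 4% development.
    X_val, X_dev, y_val, y_dev = train_test_split(
        X_b, y_b, test_size=0.2, random_state=seed
    )
    return (X_train, y_train), (X_test, y_test), (X_val, y_val), (X_dev, y_dev)


# 100 rows make the percentages easy to read off as row counts.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

train, test, val, dev = four_way_split(X, y)
sizes = [len(part[0]) for part in (train, test, val, dev)]
print(sizes)  # [64, 16, 16, 4]
```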