hmm.labeling.models module
class hmm.labeling.models.Labeler(lfs=[], model=LabelModel())

   Bases: object

   Wrapper for the Snorkel label model. Supports:

   - addition and change of labeling functions
   - label aggregation
   - model fitting
   - model evaluation: scoring and bucket analysis
   - filtering NAs

   Parameters:
      - lfs – list of labeling functions (heuristic functions)
      - model – the model to use; by default, Snorkel's generative label model
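As a sketch of the labeling-function convention this wrapper assumes (Snorkel-style: each function inspects one example and votes a class label, or abstains with -1), here are two illustrative functions; the names and keyword rules are hypothetical, not part of the library:

```python
# Illustrative Snorkel-style labeling functions: each inspects one
# example and votes a class label (0/1 here) or -1 to abstain.
ABSTAIN, NEG, POS = -1, 0, 1

def lf_keyword_good(text):
    # Vote positive if an obviously positive keyword appears.
    return POS if "good" in text else ABSTAIN

def lf_keyword_bad(text):
    # Vote negative if an obviously negative keyword appears.
    return NEG if "bad" in text else ABSTAIN

# Applying the functions to a dataset yields an n x l candidate-label
# matrix L, one column per labeling function.
data = ["a good movie", "a bad movie", "a movie"]
L = [[lf(x) for lf in (lf_keyword_good, lf_keyword_bad)] for x in data]
# L == [[1, -1], [-1, 0], [-1, -1]]
```

A matrix of this shape is what the `L`, `L_train`, `L_dev`, and `L_valid` parameters below refer to.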
add_lfs(lfs)

   Add labeling functions to the model.

   Parameters:
      - lfs – list of labeling functions to add
filter_probs(X, L)

   Filter unlabeled rows (where all the labeling functions abstain) from the dataset.

   Parameters:
      - X – the dataset
      - L – an n x l matrix of candidate labels, where n is the size of the dataset and l is the number of labeling functions

   Returns:
      the dataset with any unlabeled tuples removed
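The filtering step can be sketched with plain numpy; this is a minimal stand-in, not the library's implementation, and assumes -1 marks an abstain as in Snorkel:

```python
import numpy as np

# Candidate-label matrix for 4 examples and 2 labeling functions;
# -1 means the labeling function abstained.
L = np.array([[1, -1],
              [-1, -1],   # fully unlabeled row: every function abstained
              [0, 1],
              [-1, 0]])
X = np.array([10, 20, 30, 40])  # toy stand-in dataset, one entry per row

# Keep only rows where at least one labeling function voted.
mask = (L != -1).any(axis=1)
X_filtered, L_filtered = X[mask], L[mask]
# X_filtered == [10, 30, 40]
```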
fit(L_train, Y_dev=None, fit_params={})

   Fit the generative label model on a set of candidate labels. No ground-truth labels are required for fitting, but they can be included to help match the automatically generated label distribution to the ground-truth label distribution. Fitting uses only the candidate label matrix L_train from the training set.

   Parameters:
      - L_train – an n x l matrix of candidate labels, where n is the size of the training dataset and l is the number of labeling functions
      - Y_dev – a held-out set of ground-truth labels
      - fit_params – optional parameters for fitting; see the Snorkel docs for all options

   Returns:
      the fitted label model
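To make the fit/predict interface concrete without requiring Snorkel, the toy model below follows the same calling convention but simply counts non-abstain votes per class; Snorkel's LabelModel instead learns the accuracy of each labeling function. The class name and its internals are illustrative only:

```python
import numpy as np

# Toy stand-in for a label model, exposing the same fit()/predict_proba()
# shape of interface. Snorkel's LabelModel learns per-LF accuracies;
# this sketch just turns vote counts into probabilities.
class MajorityModel:
    def fit(self, L_train, Y_dev=None, **fit_params):
        votes = L_train[L_train != -1]      # ignore abstains (-1)
        self.classes_ = np.unique(votes)
        return self

    def predict_proba(self, L):
        # Per-row fraction of non-abstain votes cast for each class.
        n, k = L.shape[0], len(self.classes_)
        counts_per_class = np.zeros((n, k))
        for j, c in enumerate(self.classes_):
            counts_per_class[:, j] = (L == c).sum(axis=1)
        total = (L != -1).sum(axis=1, keepdims=True)
        # Rows with no votes fall back to a uniform distribution.
        return np.divide(counts_per_class, total,
                         out=np.full_like(counts_per_class, 1.0 / k),
                         where=total > 0)

L_train = np.array([[1, 1, -1], [0, -1, 0], [1, 0, 1]])
model = MajorityModel().fit(L_train)
probs = model.predict_proba(L_train)
```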
get_confusion_matrix(L_dev, y_dev)

   Compute the confusion matrix for the final labels on a held-out development set.

   Parameters:
      - L_dev – an n x l matrix of candidate labels, where n is the size of the dev dataset and l is the number of labeling functions
      - y_dev – ground-truth labels for the dev set

   Returns:
      the confusion matrix as a pandas crosstab
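A pandas crosstab of this form looks like the following; the predicted and gold label values are made up for illustration:

```python
import pandas as pd

# Hypothetical predicted vs. ground-truth labels for a small dev set.
y_pred = pd.Series([1, 0, 1, 1, 0], name="predicted")
y_gold = pd.Series([1, 0, 0, 1, 1], name="gold")

# Confusion matrix as a crosstab: rows are gold labels,
# columns are predicted labels.
cm = pd.crosstab(y_gold, y_pred)
# cm.loc[1, 0] counts false negatives, cm.loc[0, 1] false positives.
```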
get_label_buckets(L_dev, y_dev)

   Fetch buckets of labels grouped by outcome (e.g. false positives, false negatives).

   Parameters:
      - L_dev – an n x l matrix of candidate labels, where n is the size of the dev dataset and l is the number of labeling functions
      - y_dev – ground-truth labels for the dev set

   Returns:
      a set of label buckets; see the Moral Machine example for some analyses with label buckets
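The bucket idea can be sketched in a few lines: group example indices by their (gold, predicted) pair, which is the behavior of Snorkel's `get_label_buckets` utility. The helper below is a minimal illustration, not the library's code:

```python
# Minimal sketch of label buckets: map each (gold, predicted) pair
# to the list of example indices that fall in it.
def label_buckets(y_gold, y_pred):
    buckets = {}
    for i, pair in enumerate(zip(y_gold, y_pred)):
        buckets.setdefault(pair, []).append(i)
    return buckets

y_gold = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
buckets = label_buckets(y_gold, y_pred)
# buckets[(0, 1)] holds false-positive indices,
# buckets[(1, 0)] holds false-negative indices.
```

Inspecting the examples in the error buckets (e.g. the false positives) is a common way to decide which labeling function to fix next.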
get_preds(L, threshold=0.5)

   Produce rounded labels from a set of candidate labels produced for some dataset.

   Parameters:
      - L – an n x l matrix of candidate labels, where n is the size of the dataset and l is the number of labeling functions
      - threshold – threshold for rounding posterior probabilities to discrete labels

   Returns:
      the rounded labels
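A plausible rounding rule for the binary case looks like this; the probabilities are made up, and the tie-handling at exactly 0.5 is an assumption, not documented behavior:

```python
import numpy as np

# Hypothetical posterior probabilities of the positive class.
probs = np.array([0.91, 0.48, 0.50, 0.73, 0.12])

# Round posteriors to discrete 0/1 labels at a configurable threshold
# (here, a probability equal to the threshold rounds up).
threshold = 0.5
preds = (probs >= threshold).astype(int)
# preds == [1, 0, 1, 1, 0]
```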
label(data, verbose=True)

   Aggregate candidate labels into a single label for each tuple in the dataframes in data.

   Parameters:
      - data – a set of dataframes, each containing a set of tuples to label
      - verbose – whether to periodically print labeling status

   Returns:
      a set of labels for each dataframe in data
-
static
score
(model, L_valid, y_val, verbose=True)[source]¶ Validate the label model on a held out test set.
- Parameters
model – a label aggregation model
L_valid – an n x l matrix of candidate labels, where n is the size of the held-out validation set and l is the number of labeling functions
y_val – ground-truth labels for the held-out validation set
verbose – whether or not to periodically print label status
- Returns
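Scoring reduces to comparing the model's rounded predictions against the held-out gold labels; accuracy, shown below with made-up values, is the simplest such metric (the real method delegates to the label model's own scorer):

```python
# Hypothetical held-out gold labels and model predictions.
y_val = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

# Accuracy: fraction of predictions that match the gold label.
accuracy = sum(p == y for p, y in zip(y_pred, y_val)) / len(y_val)
# accuracy == 0.75
```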