10 Underappreciated Python Packages for Machine Learning Practitioners
Here are 10 underappreciated Python packages covering neural architecture design, calibration, UI creation and dissemination.
By Vinay Uday Prabhu, Chief Scientist at UnifyID Inc.
TLDR: Resources curated:
👉🏻 GitHub repo with all the images, code and figures
👉🏻 Collage in PDF form with clickable links
👉🏻 Colab notebook
👉🏻 HTML document
👉🏻 Notebook in PDF format
Introduction
“The power of Open Source is the power of the people. The people rule.” – Philippe Kahn
Ever since my doctoral studies, which mostly entailed performing statistical analysis in R (and, admittedly, Octave/MATLAB), I have strongly embraced the emergence of Python as the lingua franca amongst machine learners / data scientists / *insert latest profession buzzword here*.
My daily workflow involves quickly reacting to the vagaries of messy real-world data, in all its naive-assumption-shattering glory. One major difference between graduate school and industry, to me, is the conquest of the inner ego that goads you to implement algorithms from scratch. Once past the whiteboarding/hypothesis-building phase, I quickly parse through the PyPI repository to check if any of the constituent modules have already been authored. This is typically followed by a
>> pip install *PACKAGE_NAME* ritual and voilà, I find myself standing on the shoulders of the open-source giants whose careful work I am now harnessing to scale the DIKW pyramid.
I authored this blog post to acknowledge, celebrate and, yes, publicize some amazing and underappreciated PyPI packages that I used this past year; ones that I strongly feel deserve more recognition and love from our community. This is also my humble ode to the open-source scholars’ sweat equity that oft gets buried inside the pip install command.
Caveat on subdomain bias: This particular post is focused on machine-learning pipelines entailing neural networks/deep learning. I plan to author similarly focused blog posts on specialized topics such as time-series analysis and human-kinematics analysis in the near future. ✌️
What follows below are basic introductions to the 10 PyPI packages spanning:
a) Neural network architecture specification and training: NSL-TF, Kymatio and LARQ
b) Post-training calibration and performance benchmarking: NetCal, PyEER and Baycomp
c) Pre real-world deployment stress-testing: PyOD, HyPPO and Gradio
d) Documentation / dissemination: jupyter_to_medium
0: pip install the above-mentioned packages :)
!pip install --quiet neural-structured-learning
!pip install --quiet larq larq-zoo
!pip install --quiet kymatio
!pip install --quiet netcal
!pip install --quiet baycomp
!pip install --quiet pyeer
!pip install --quiet pyod
!pip install --quiet hyppo
!pip install --quiet gradio
!pip install --quiet jupyter_to_medium
A) Neural network architecture specification and training: NSL-TF, Kymatio and LARQ
1: Neural Structured Learning (TensorFlow)
At the heart of most off-the-shelf classification algorithms in machine learning lies the i.i.d. fallacy. Simply put, the algorithm design rests on the assumption that the samples in the training set (as well as the test set) are independent and identically distributed. In reality, however, this rarely holds true, and there exist correlations between the samples that can be harnessed to achieve better accuracy and explainability as well. In a wide array of application scenarios (see Fig. 1), these correlations are captured by an underlying graph G(V, E) that can either be mined or statistically inferred. For example, if you are performing, say, sentiment detection on textual tweets, the underlying follower-following social graph provides vital cues that model the social context in which the tweet was authored. This social-neighborhood information can then be harnessed to perform network-aided classification, which can be crucial in guarding against text-only shortcomings such as sarcasm misdetection and hashtag hijacking.
My PhD thesis, titled “Network Aided Classification and Detection of Data”, literally explored the science and algorithmics of this graph-enhanced machine learning, and it was heartening to see TensorFlow release the Neural Structured Learning (NSL) framework with a series of well-crafted tutorials (YouTube playlist) and an easy-to-follow NSL example Colab notebook.
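Before the adversarial example, here is a minimal, hedged sketch (my own, not the official tutorial code) of what the graph-regularized flavour of NSL described above looks like: any base Keras model gets wrapped with nsl.keras.GraphRegularization, and the training batches are assumed to already carry neighbor features (packed beforehand with something like nsl.tools.pack_nbrs). The max_neighbors and multiplier values below are purely illustrative.
import tensorflow as tf
import neural_structured_learning as nsl
# Hedged sketch: graph-regularized training with NSL (illustrative hyper-parameters).
base_model = tf.keras.Sequential([
    tf.keras.Input((28, 28), name='feature'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
graph_reg_config = nsl.configs.make_graph_reg_config(max_neighbors=2, multiplier=0.1)
graph_model = nsl.keras.GraphRegularization(base_model, graph_reg_config)
graph_model.compile(optimizer='adam',
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])
# graph_model.fit(...) then expects each example to come bundled with its
# neighbors' features (e.g., produced offline by nsl.tools.pack_nbrs).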
In the example cell below, we train an NSL-enhanced neural network on the standard MNIST dataset in an adversarial setting:
import tensorflow as tf
import neural_structured_learning as nsl
import numpy as np
import matplotlib.pyplot as plt
# Prepare data.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Create a base model: sequential, functional, or subclass.
model = tf.keras.Sequential([
    tf.keras.Input((28, 28), name='feature'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
# Wrap the model with adversarial regularization.
adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(model, adv_config=adv_config)
# Compile, train, and evaluate.
adv_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
adv_model.fit({'feature': x_train, 'label': y_train}, batch_size=32, epochs=5)
adv_model.evaluate({'feature': x_test, 'label': y_test})

Epoch 1/5
1875/1875 [==============================] - 39s 2ms/step - loss: 0.5215 - sparse_categorical_crossentropy: 0.4292 - sparse_categorical_accuracy: 0.8781 - scaled_adversarial_loss: 0.0924
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1447 - sparse_categorical_crossentropy: 0.1171 - sparse_categorical_accuracy: 0.9663 - scaled_adversarial_loss: 0.0276
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0944 - sparse_categorical_crossentropy: 0.0758 - sparse_categorical_accuracy: 0.9770 - scaled_adversarial_loss: 0.0186
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0672 - sparse_categorical_crossentropy: 0.0536 - sparse_categorical_accuracy: 0.9840 - scaled_adversarial_loss: 0.0137
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0532 - sparse_categorical_crossentropy: 0.0421 - sparse_categorical_accuracy: 0.9876 - scaled_adversarial_loss: 0.0111
313/313 [==============================] - 1s 2ms/step - loss: 0.0940 - sparse_categorical_crossentropy: 0.0751 - sparse_categorical_accuracy: 0.9761 - scaled_adversarial_loss: 0.0189
[0.09399436414241791,
 0.07509651780128479,
 0.9761000275611877,
 0.018897896632552147]

Y_pred_test = adv_model.predict({'feature': x_test, 'label': y_test})
Y_pred_test.shape
(10000, 10)
2: Kymatio: Wavelet scattering in Python
Here’s one of the best (or worst?) kept secrets in ML: a lot of the easy datasets (read: the x-MNIST family / cats-vs-dogs / hot-dog classification) require NO backprop/SGD training histrionics.
The classes are separable enough, and the architecture-induced discriminative capacity is high enough, that careful initialization using Grassmannian codebooks or wavelet filters, followed by ‘last-layer’ hyperplane learning (using standard regression techniques), should suffice to obtain a high-accuracy classifier.
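As a quick, hedged illustration of that ‘fixed wavelet features + last-layer learning’ recipe (my own sketch, not from the Kymatio docs; the subset sizes are arbitrary and chosen only to keep the CPU runtime sane):
import numpy as np
import tensorflow as tf
from kymatio.numpy import Scattering2D
from sklearn.linear_model import LogisticRegression
# A fixed (untrained) 2D scattering transform acts as the feature extractor...
(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.mnist.load_data()
x_tr, x_te = x_tr / 255.0, x_te / 255.0
S = Scattering2D(J=3, shape=(28, 28))
feats_tr = S(x_tr[:1000]).reshape(1000, -1)  # flattened scattering coefficients
feats_te = S(x_te[:500]).reshape(500, -1)
# ...followed by a plain 'last-layer' linear classifier; no backprop through the features.
clf = LogisticRegression(max_iter=1000).fit(feats_tr, y_tr[:1000])
print('Last-layer-only test accuracy:', clf.score(feats_te, y_te[:500]))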
In this regard, Kymatio has played a Caesaresque role in the wavelet-filters world, uniting all the previously siloed projects such as ScatNet, scattering.m, PyScatWave, WaveletScattering.jl, and PyScatHarm into one easy-to-use, monolithic, portable framework that seamlessly works across six frontend–backend pairs: NumPy (CPU), scikit-learn (CPU), pure PyTorch (CPU and GPU), PyTorch+scikit-cuda (GPU), TensorFlow (CPU and GPU), and Keras (CPU and GPU).
In the example cell below, we use the in-built Scattering2D class to train another MNIST neural network that attains 92.84% accuracy in 15 epochs. This package is wonderfully well documented, with a plethora of interesting examples such as Classification of spoken digit recordings using 1D scattering transforms and 3D scattering quantum chemistry regression.
# 1: Imports
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Flatten, Dense
from kymatio.keras import Scattering2D
# Above, we import the Scattering2D class from the kymatio.keras package.
# 2: Model definition
inputs = Input(shape=(28, 28))
x = Scattering2D(J=3, L=8)(inputs)
x = Flatten()(x)
x_out = Dense(10, activation='softmax')(x)
model_kymatio = Model(inputs, x_out)
print(model_kymatio.summary())
# 3: Compile and train
model_kymatio.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
# We then train the model_kymatio using model_kymatio.fit on a subset of the MNIST data.
model_kymatio.fit(x_train[:10000], y_train[:10000], epochs=15,
                  batch_size=64, validation_split=0.2)
# Finally, we evaluate the model_kymatio on the held-out test data.
model_kymatio.evaluate(x_test, y_test)

Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 28, 28)] 0
_________________________________________________________________
scattering2d (Scattering2D) (None, 217, 3, 3) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 1953) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 19540
=================================================================
Total params: 19,540
Trainable params: 19,540
Non-trainable params: 0
_________________________________________________________________
313/313 [==============================] - 36s 114ms/step - loss: 0.6448 - accuracy: 0.9285
[0.6448228359222412, 0.9284999966621399]
3: LARQ
I met the LARQ developers last December during NeurIPS 2019 in Vancouver, where they unveiled their new open-source Python library for training Binarized Neural Networks (BNNs) alongside the poster of their paper titled “Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization”. While there seems to be a lot of interest in model compression for resource-constrained on-device deployment (here are 42 of ’em!), training fast-and-frugal Binarized Neural Networks from scratch seems to be an option that many discount at the outset.
The LARQ package should help change things on that front, given its ease of use, fast inference (convolutions reduce to cheap bitwise XNOR and bit-count operations once the weights are binarized), brilliant documentation, and plentiful architecture examples that one can hack away at via a full-fledged model zoo. This year, I have personally published work on style transfer and a 40 kB BiPedalNet model using LARQ, and it is always a breeze to work with this toolkit. Besides the Zoo, the package is also accompanied by a highly optimized Compute Engine that currently supports various mobile platforms, has been benchmarked on a Pixel 1 phone and a Raspberry Pi, and provides a collection of hand-optimized TensorFlow Lite custom operators for supported instruction sets, developed in inline assembly or in C++ using compiler intrinsics (a short conversion sketch appears after the code below).
In the example code cell below, we train a 13.19 KiB BNN that hits 98.31% on the MNIST dataset in 6 epochs, and also demonstrate how easy it is to pull one of the SOTA pre-trained QuickNet models from the Larq Zoo and run inference.
import larq as lq
# MODEL DEFINITION (All quantized layers except the first will use the same options)
kwargs = dict(input_quantizer="ste_sign",
              kernel_quantizer="ste_sign",
              kernel_constraint="weight_clip")
model_bnn = tf.keras.models.Sequential()
# In the first layer we only quantize the weights and not the input
model_bnn.add(lq.layers.QuantConv2D(32, (3, 3),
                                    kernel_quantizer="ste_sign",
                                    kernel_constraint="weight_clip",
                                    use_bias=False,
                                    input_shape=(28, 28, 1)))
model_bnn.add(tf.keras.layers.MaxPooling2D((2, 2)))
model_bnn.add(tf.keras.layers.BatchNormalization(scale=False))
model_bnn.add(lq.layers.QuantConv2D(64, (3, 3), use_bias=False, **kwargs))
model_bnn.add(tf.keras.layers.MaxPooling2D((2, 2)))
model_bnn.add(tf.keras.layers.BatchNormalization(scale=False))
model_bnn.add(lq.layers.QuantConv2D(64, (3, 3), use_bias=False, **kwargs))
model_bnn.add(tf.keras.layers.BatchNormalization(scale=False))
model_bnn.add(tf.keras.layers.Flatten())
model_bnn.add(lq.layers.QuantDense(64, use_bias=False, **kwargs))
model_bnn.add(tf.keras.layers.BatchNormalization(scale=False))
model_bnn.add(lq.layers.QuantDense(10, use_bias=False, **kwargs))
model_bnn.add(tf.keras.layers.BatchNormalization(scale=False))
model_bnn.add(tf.keras.layers.Activation("softmax"))
# MODEL SUMMARY AND TRAINING
print(lq.models.summary(model_bnn))
model_bnn.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
x_train_bnn = x_train.reshape((60000, 28, 28, 1))
x_test_bnn = x_test.reshape((10000, 28, 28, 1))
model_bnn.fit(x_train_bnn, y_train, batch_size=64, epochs=6)
test_loss, test_acc = model_bnn.evaluate(x_test_bnn, y_test)
print(f"Test accuracy {test_acc * 100:.2f} %")
+sequential_1 summary-----------------------------+
| Total params                      | 93.6 k      |
| Trainable params                  | 93.1 k      |
| Non-trainable params              | 468         |
| Model size                        | 13.19 KiB   |
| Model size (8-bit FP weights)     | 11.82 KiB   |
| Float-32 Equivalent               | 365.45 KiB  |
| Compression Ratio of Memory       | 0.04        |
| Number of MACs                    | 2.79 M      |
| Ratio of MACs that are binarized  | 0.9303      |
+-----------------------------------+-------------+
None
313/313 [==============================] - 1s 2ms/step - loss: 0.3632 - accuracy: 0.9831
Test accuracy 98.31 %

Y_pred_bnn = model_bnn.predict(x_test_bnn)
y_pred_bnn = np.argmax(Y_pred_bnn, axis=1)
(y_pred_bnn == y_test).mean()
0.9831

import tensorflow_datasets as tfds
import larq_zoo as lqz
from urllib.request import urlopen
from PIL import Image
#####################################
img_path = "https://raw.githubusercontent.com/larq/zoo/master/tests/fixtures/elephant.jpg"
with urlopen(img_path) as f:
    img = Image.open(f).resize((224, 224))
x = tf.keras.preprocessing.image.img_to_array(img)
x = lqz.preprocess_input(x)
x = np.expand_dims(x, axis=0)
model = lqz.sota.QuickNet(weights="imagenet")
preds = model.predict(x)
pred_dec = lqz.decode_predictions(preds, top=5)[0]
print(f'Top-5 predictions: {pred_dec}')
#####################################
pred_dec = lqz.decode_predictions(preds, top=5)[0]
plt.imshow(img)
plt.title(f'Top prediction:\n {pred_dec[0]}');

Top-5 predictions: [('n02504458', 'African_elephant', 0.7053231), ('n01871265', 'tusker', 0.2933379), ('n02504013', 'Indian_elephant', 0.001338586), ('n02408429', 'water_buffalo', 7.938418e-08), ('n01704323', 'triceratops', 7.2361296e-08)]
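Rounding off the LARQ tour: for actually shipping the BNN we trained above, the Compute Engine side comes with a Python converter. The snippet below is a hedged sketch; it assumes larq-compute-engine has been pip-installed separately and that convert_keras_model is the conversion entry point per my reading of the LCE docs.
# pip install larq-compute-engine   (not installed in the cells above)
import larq_compute_engine as lce
# Convert the trained Larq/Keras model into a TFLite flatbuffer that uses
# LCE's binary custom ops, and write it to disk for on-device deployment.
tflite_model = lce.convert_keras_model(model_bnn)
with open('model_bnn.tflite', 'wb') as f:
    f.write(tflite_model)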
B) Post-training calibration and performance benchmarking: NetCal, PyEER and Baycomp
In this section, we will look at packages that are useful in the post-training, pre-deployment scenario, where the practitioner's goals are to calibrate the outputs of a pre-trained model and to rigorously benchmark the performance of multiple models ripe for deployment.
1: NetCal
Oftentimes, I have seen ML practitioners buy into a false equivalence between the output softmax values and probabilities. They are anything but that! Their co-inhabitance of the (0, 1] space allows them to masquerade as probabilities, but the ‘raw’ softmax values are, put nicely, ‘uncalibrated’. Hence, post-training calibration is a rapidly growing body of work in deep learning, and the techniques proposed largely fall into 3 categories (see Fig. 2):
- Binning (e.g., Histogram Binning, Isotonic Regression, Bayesian Binning into Quantiles (BBQ), Ensemble of Near-Isotonic Regression (ENIR))
- Scaling (e.g., Logistic Calibration/Platt Scaling, Temperature Scaling, Beta Calibration); a from-scratch sketch of temperature scaling follows this list
- Hybrid scaling-binning (Python library: https://pypi.org/project/uncertainty-calibration)
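For intuition on what the scaling family actually does, here is a tiny from-scratch sketch of temperature scaling (my own illustration on synthetic logits, not NetCal's implementation): the logits are divided by a single scalar T, fitted on held-out data by minimizing the negative log-likelihood, before the softmax is applied.
import numpy as np
from scipy.optimize import minimize_scalar
def softmax_T(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
def fit_temperature(logits, labels):
    # Find the scalar T that minimizes the NLL of held-out labels under softmax(logits / T).
    def nll(T):
        p = softmax_T(logits, T)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method='bounded').x
# Toy example: deliberately over-confident logits get softened by a fitted T > 1.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=500)
logits = 5.0 * rng.normal(size=(500, 10))
logits[np.arange(500), labels] += 2.0  # some genuine signal toward the true class
print(f'Fitted temperature T = {fit_temperature(logits, labels):.2f}')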
With regards to all the above-stated binning and scaling techniques, implementations with extremely well-authored documentation are available in NetCal. The package also includes primitives for generating reliability diagrams and for estimating calibration-error metrics such as the Expected/Maximum/Average Calibration Errors (a short ECE sketch follows the reliability-diagram example below).
In the cell below, we use the softmax values obtained on the MNIST test set (from the NSL-trained model above) to demonstrate the TemperatureScaling calibration and ReliabilityDiagram generation routines.
# In case you also want to try the scaling-binning calibration:
# !pip3 install git+https://github.com/p-lambda/verified_calibration.git  # PyPI -> kaput
from netcal.scaling import TemperatureScaling
import matplotlib.pyplot as plt
### Initialize and transform
temperature = TemperatureScaling()
temperature.fit(Y_pred_test, y_test)
calibrated = temperature.transform(Y_pred_test)
### Visualization
fig, axes = plt.subplots(nrows=1, ncols=2,figsize=(10,4))
axes[0].matshow(Y_pred_test.T,aspect='auto', cmap='jet')
axes[0].set_title("Original Uncalibrated softmax")
axes[0].set_xlabel("Test image index (10k images)")
axes[0].set_ylabel("Class index")
# axes[0].set_xticks([])
axes[1].matshow(calibrated.T,aspect='auto', cmap='jet')
axes[1].set_title("T-scaled softmax")
axes[1].set_xlabel("Test image index (10k images)")
# axes[1].set_xticks([])
plt.tight_layout()
plt.show()
y_pred_nsl=np.argmax(Y_pred_test,axis=1)
ind_correct=np.where(y_pred_nsl==y_test)[0]
ind_wrong = np.where(y_pred_nsl != y_test)[0]

plt.figure(figsize=(10, 4))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    ind_i = ind_correct[i]
    plt.imshow(x_test[ind_i], cmap='gray_r')
    class_pred_i = np.argmax(Y_pred_test[ind_i, :])
    softmax_uncalib_i = str(np.round(Y_pred_test[ind_i, class_pred_i], 3))
    softmax_calib_i = str(np.round(calibrated[ind_i, class_pred_i], 3))
    plt.title(f'{class_pred_i} | {softmax_uncalib_i} | {softmax_calib_i}')
plt.tight_layout()
plt.suptitle('Correct predictions \n Class | Uncalibrated | Calibrated');
#############################################
plt.figure(figsize=(10, 4))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    ind_i = ind_wrong[i]
    plt.imshow(x_test[ind_i], cmap='gray_r')
    class_pred_i = np.argmax(Y_pred_test[ind_i, :])
    softmax_uncalib_i = str(np.round(Y_pred_test[ind_i, class_pred_i], 3))
    softmax_calib_i = str(np.round(calibrated[ind_i, class_pred_i], 3))
    plt.title(f'{class_pred_i} | {softmax_uncalib_i} | {softmax_calib_i}')
plt.tight_layout()
plt.suptitle('Wrong predictions \n Class | Uncalibrated | Calibrated');
from netcal.presentation import ReliabilityDiagram
n_bins = 10
diagram = ReliabilityDiagram(n_bins)
diagram.plot(Y_pred_test, y_test) # visualize miscalibration of uncalibrated
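The calibration-error metrics mentioned earlier are one import away. A minimal sketch, assuming the netcal.metrics API of constructing ECE(bins) and calling measure(confidences, labels), applied to the uncalibrated and T-scaled softmax matrices from above:
from netcal.metrics import ECE
# Expected calibration error (10 bins) before and after temperature scaling.
ece = ECE(10)
print('ECE (uncalibrated):', ece.measure(Y_pred_test, y_test))
print('ECE (T-scaled)    :', ece.measure(calibrated, y_test))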
2: Baycomp: So you think you have a better classifier?
One of the underrated conundrums that both ML practitioners and, in some ways, research-paper reviewers grapple with is rigorously ascertaining the predictive supremacy of one classifier model over the other(s). Model-Olympics platforms like Papers with Code further promulgate this model-ranking fallacy by erroneously centering the top-1 accuracy metric (see figure below) as the deciding measure.
So, given two classification models with similar engineering overheads to deploy, how do you choose one over the other? Typically, we have a standard benchmarking dataset (or a set of datasets) that serves as the testing ground for classifier wars. After obtaining the raw accuracy metrics over this dataset space, a statistically minded machine learner might be inclined to use tools from the frequentist null-hypothesis significance testing (NHST) framework to establish which classifier is ‘better’. However, as stated here, “Many scientific fields however realized the shortcomings of frequentist reasoning and in the most radical cases even banned its use in publications”.
Baycomp emerges in this context, providing a Bayesian framework for the comparison of classifiers. The library helps compute three probabilities:
- P_left: the probability that the first classifier has higher accuracy scores than the second.
- P_rope: the probability that the differences are within the region of practical equivalence (rope).
- P_right: the probability that the second classifier has higher scores.
The region of practical equivalence (rope) is specified by the machine learner, who is well versed in what can safely be assumed to be equivalent in the domain of deployment.
In the example cell below, we consider both a synthetic example entailing two closely competitive classifiers, as well as the two classifiers we just trained on the MNIST dataset using the NSL-TF and LARQ-BNN frameworks.
# Helper function to plot the accuracies
def bar_plt2(acc_1, acc_2, label_1='Legacy classifier', label_2='New classifier',
             X_LABELS=['default'], Category_x='Dataset'):
    # Set width of bar
    if X_LABELS == ['default']:
        X_LABELS = list(string.ascii_uppercase[0:len(acc_1)])
    barWidth = 0.25
    # Set position of bar on X axis
    r1 = np.arange(len(acc_1))
    r2 = [x + barWidth for x in r1]
    # Make the plot
    plt.bar(r1, acc_1, color='#7f6d5f', width=barWidth, edgecolor='white', label=label_1)
    plt.bar(r2, acc_2, color='#557f2d', width=barWidth, edgecolor='white', label=label_2)
    # Add xticks in the middle of the group bars
    plt.xlabel(Category_x, fontweight='bold')
    plt.xticks([r + barWidth for r in range(len(acc_1))], X_LABELS)
    plt.title('Accuracy comparison of the two classifiers')
    # Create legend & show graphic
    plt.legend()
    plt.show()
    return None

import string
from baycomp import *
# First, let us generate two synthetic classifier accuracy vectors across 10 hypothetical datasets.
# Accuracies obtained by a legacy classifier
classifier_legacy_acc=np.random.randint(80,85,size=(10))
mean_legacy=np.mean(classifier_legacy_acc)
# Accuracies obtained by a newproposed classifier
classifier_new_acc=np.random.randint(80,87,size=(10))
mean_new=np.mean(classifier_new_acc)
print(f'The mean accuracies of the two classifiers are: {mean_legacy} and {mean_new}')
bar_plt2(classifier_legacy_acc, classifier_new_acc)

The mean accuracies of the two classifiers are: 82.0 and 81.8
print('$p_{left}, p_{rope},p_{right}$ using the two_on_multiple function: ')
print(two_on_multiple(classifier_legacy_acc, classifier_new_acc, rope=1))
# With some additional arguments, the function can also plot the posterior distribution from
# which these probabilities came.
# Tests are packed into test classes.
# The above call is equivalent to
print('$p_{left}, p_{rope},p_{right}$ using the SignedRankTest.probs function: ')
print(SignedRankTest.probs(classifier_legacy_acc, classifier_new_acc, rope=1))
# and to get a plot, we call
print(SignedRankTest.plot(classifier_legacy_acc, classifier_new_acc, rope=1, names=("LegacySRT", "NewSRT")))
# To switch to another test, use another class:
SignTest.probs(classifier_legacy_acc, classifier_new_acc, rope=1)
# Finally, we can construct and query sampled posterior distributions.
posterior = SignedRankTest(classifier_legacy_acc, classifier_new_acc, rope=1)
print(posterior.probs())
posterior.plot(names=("legacyPost", "newPost"))

$p_{left}, p_{rope},p_{right}$ using the two_on_multiple function:
(0.28222, 0.4604, 0.25738)
$p_{left}, p_{rope},p_{right}$ using the SignedRankTest.probs function:
(0.28056, 0.46356, 0.25588)
######################################################
acc_bnn=np.zeros(10)
acc_nsl=np.zeros(10)
for c in range(10):
    mask_c = y_test == c
    acc_bnn[c] = (y_pred_bnn[mask_c] == c).mean()
    acc_nsl[c] = (y_pred_nsl[mask_c] == c).mean()
bar_plt2(acc_nsl,acc_bnn,label_1='NSL',label_2='BNN',X_LABELS=list(np.arange(10).astype(str)),Category_x='MNIST digit classes')
posterior = SignedRankTest(acc_nsl, acc_bnn, rope=0.005)
print(posterior.probs())
posterior.plot(names=("NSL", "BNN"))
(0.0, 0.2846, 0.7154)
Important caveat: There is a related but distinct conversation surrounding the very culture of predictive-accuracy veneration in ML. This predictivism v/s accommodation debate in science has been evolving since the days of John Herschel and William Whewell in the 1800s.
3: PyEER
Another way of comparing two classifiers, especially in the context of solving the binary authentication problem (not surveillance but authentication), is by plotting the comparative detection error tradeoff (DET) and receiver operating characteristic (ROC) graphs. PyEER is an absolute tour de force in this regard, as it serves as a one-stop shop for not just plotting the relevant graphs, but also auto-generating metrics reports and estimating EER-optimal thresholds. In the example cell below, we compare the Angle-Based Outlier Detector (ABOD) and the KNN inlier-outlier detector binary classifiers that will be introduced in the forthcoming section on pre-deployment out-of-distribution detection techniques.
from pyeer.eer_info import get_eer_stats
from pyeer.report import generate_eer_report, export_error_rates
from pyeer.plot import plot_eer_stats
# Gather up all the 'Genuine scores' and the 'impostor scores'
gscores_abod=y_test_proba_abod[y_test_ood==0,0]
iscores_abod=y_test_proba_abod[y_test_ood==1,0]
gscores_knn=y_test_proba_knn[y_test_ood==0,0]
iscores_knn=y_test_proba_knn[y_test_ood==1,0]
# Calculating stats for classifier A
stats_abod = get_eer_stats(gscores_abod, iscores_abod)
# Calculating stats for classifier B
stats_knn = get_eer_stats(gscores_knn, iscores_knn)
print(f'EER-KNN = {stats_knn.eer}, EER-ABOD = {stats_abod.eer}')
plot_eer_stats([stats_abod, stats_knn], ['ABOD', 'KNN'])
##############################
import matplotlib.image as mpimg
img1 = mpimg.imread('DET.png')
img2 = mpimg.imread('ROC.png')
plt.figure(figsize=(9,4))
plt.subplot(121)
plt.imshow(img1)
plt.subplot(122)
plt.imshow(img2)
plt.show()

EER-KNN = 0.0, EER-ABOD = 0.008333333333333333
C) Pre real-world deployment stress-testing: PyOD, HyPPO and Gradio
Vulnerability to out-of-distribution (OOD) samples resulting in confident mispredictions is currently one of the most serious roadblocks that haunt the transition of ML ideas from pretty little papers to real-world deployment, where the inputs have no guarantee of emanating from the proverbial training manifold. In a joint project with Matthew McAteer, I have created a landscape of susceptibility (see figure above) that should empower machine learners to cover the wide spectrum of specific vulnerability vectors with regards to their models.
While there is no silver bullet (and there perhaps never will be — see this & this), it would be hard to argue against incorporating OOD-model regularization and OOD-detection modules into your pipelines.
With regards to OOD detection, I felt that there were 3 recent efforts that have gone under-adopted by the ML community.
The first two, PyOD and HyPPO, are useful for pre-filtering inputs before performing inference, and the third, Gradio, is an amazing tool for human-in-the-loop white-hat stress testing, complementary to efforts such as Dynabench by FAIR.
1: PyOD
PyOD is arguably the most comprehensive and scalable outlier-detection Python toolkit out there, and it includes implementations of more than 30 detection algorithms!
It is somewhat rare for a student-maintained PyPI package to incorporate software-engineering best practices that ensure the implemented model classes are covered by unit testing, with cross-platform continuous integration, code-coverage and code-maintainability checks. This, combined with a clean unified API, detailed documentation and just-in-time (JIT) compiled execution, makes it an absolute breeze both to learn about the different techniques and to use them in practice. The effort invested by the authors towards careful parallelization has resulted in extremely fast and scalable outlier-detection code that is also seamlessly compatible with Python 2 and 3 across the major operating systems (Windows, Linux and macOS).
In the example cell below, we train and visualize the results of two inlier-outlier detector binary classifiers on a synthetic dataset: the Angle-Based Outlier Detector (ABOD) and the KNN outlier detector.
from pyod.models.abod import ABOD
from pyod.models.knn import KNN # kNN detector
from pyod.utils.data import generate_data
from pyod.utils.data import evaluate_print
from pyod.utils.example import visualize
# Generate sample data with pyod.utils.data.generate_data():
contamination = 0.4  # percentage of outliers
n_train = 200  # number of training points
n_test = 100  # number of testing points
X_train_ood, y_train_ood, X_test_ood, y_test_ood = generate_data(n_train=n_train, n_test=n_test, contamination=contamination)
##### 1: ABOD
clf_name_1 = 'ABOD'
clf_abod = ABOD(method="fast")  # initialize detector
clf_abod.fit(X_train_ood)
y_train_pred_abod = clf_abod.predict(X_train_ood)  # binary labels
y_test_pred_abod = clf_abod.predict(X_test_ood)  # binary labels
y_test_scores_abod = clf_abod.decision_function(X_test_ood)  # raw outlier scores
y_test_proba_abod = clf_abod.predict_proba(X_test_ood)  # outlier probability
evaluate_print("ABOD", y_test_ood, y_test_scores_abod)  # performance evaluation
####### 2: KNN
clf_knn = KNN()  # initialize detector
clf_knn.fit(X_train_ood)
y_train_pred_knn = clf_knn.predict(X_train_ood)  # binary labels
y_test_pred_knn = clf_knn.predict(X_test_ood)  # binary labels
y_test_scores_knn = clf_knn.decision_function(X_test_ood)  # raw outlier scores
y_test_proba_knn = clf_knn.predict_proba(X_test_ood)  # outlier probability
evaluate_print("KNN", y_test_ood, y_test_scores_knn)  # performance evaluation

ABOD ROC:0.9992, precision @ rank n:0.975
KNN ROC:1.0, precision @ rank n:1.0
Now, let’s visualize the results:
# ABOD Performance
visualize("ABOD", X_train_ood, y_train_ood, X_test_ood, y_test_ood, y_train_pred_abod,
y_test_pred_abod, show_figure=True, save_figure=False)
# KNN Performance;
visualize("KNN", X_train_ood, y_train_ood, X_test_ood, y_test_ood, y_train_pred_knn,
y_test_pred_knn, show_figure=True, save_figure=False)
2: HyPPO
It is somewhat bewildering to witness the collective amnesia on the part of the deep-learning community that keeps treating OOD susceptibility as a uniquely ‘deep neural networks’ shortcoming that somehow merits a deep-learning solution, whilst completely ignoring the cache of approaches and solutions already explored by the statistics community.
One could argue that OOD detection, by its very definition, falls under the ambit of the multivariate hypothesis-testing framework, and hence it is frustrating to see deep-learning OOD papers not even benchmark their shiny new deep approaches against what are possibly legacy hypothesis-testing algorithms. With this setting, we now introduce HyPPO.
HyPPO (HYPothesis Testing in PythOn, pronounced “Hippo”) is arguably the most comprehensive open-source software package for multivariate hypothesis testing, produced by the NeuroData community. In the figure below, we see the landscape of modules implemented in this package, spanning synthetic data generation (with 20 dependency structures!), independence tests, K-sample tests, as well as time-series tests.
In the example cell below, we see the K-Sample Distance Correlation (“Dcorr”) test being used to hypothesis-test between the in- and out-of-distribution data generated by the generate_data() module of PyOD above. In a deep-learning setting, we could deploy these tests both at the input-layer level and in the feature-embedding space to guesstimate whether the output softmax values are even worthy of being processed further down the inference pipeline.
from hyppo.ksample import KSample
samp_in_train= X_train_ood[y_train_ood==0]
samp_out_train= X_train_ood[y_train_ood==1]
samp_in_test= X_test_ood[y_test_ood==0]
samp_out_test= X_test_ood[y_test_ood==1]
stat_in_out, pvalue_in_out = KSample("Dcorr").test(samp_in_train, samp_out_test)
print(f'In-train v/s Out-test \n Energy test statistic: {stat_in_out}. Energy p-value: {pvalue_in_out}')
stat_out_in, pvalue_out_in = KSample("Dcorr").test(samp_in_test, samp_out_train)
print(f'In-test v/s Out-train \n Energy test statistic: {stat_out_in}. Energy p-value: {pvalue_out_in}')
stat_in_in, pvalue_in_in = KSample("Dcorr").test(samp_in_train, samp_in_test)
print(f'In-train v/s In-test \n Energy test statistic: {stat_in_in}. Energy p-value: {pvalue_in_in}')
stat_out_out, pvalue_out_out = KSample("Dcorr").test(samp_out_train, samp_out_test)
print(f'Out-train v/s Out-test \n Energy test statistic: {stat_out_out}. Energy p-value: {pvalue_out_out}')

In-train v/s Out-test
Energy test statistic: 0.8626341445137959. Energy p-value: 4.357148137679374e-32
In-test v/s Out-train
Energy test statistic: 0.7584832208162725. Energy p-value: 4.0495216242247524e-25
In-train v/s In-test
Energy test statistic: 0.005691336487203311. Energy p-value: 1.0
Out-train v/s Out-test
Energy test statistic: 0.006631965940452427. Energy p-value: 0.18021672902891694
3: Gradio
Having a nice GUI to interact with the model you have just trained has thus far required a fair amount of JavaScript-frontend gimmickry or the Heroku-Flask route, which can take focus away from the algorithmics.
Thanks to Gradio, one can quickly fire up a GUI with <10 lines of Python, with pre-built input modules that cover textual input, image inputs with an awesome Toast UI image editor, and a sketchpad to boot!
This past year, I have heavily used Gradio in my workflow, from investigating why Twitter’s saliency-cropping algorithm yields such racist results (see figure to the left) to why onions were triggering NSFW filters on Facebook (see tweet below).
The NSFW-Onion fiasco. Colab notebook: https://github.com/vinayprabhu/Crimes_of_Vision_Datasets/blob/master/Notebooks/Notebook_5b_Onion_Gradio_NSFW.ipynb
In the example cell below, we demonstrate two simple uses of Gradio: firing up a UI to stress-test, via a sketchpad input, the MNIST-classification BNN model we just trained above, and demonstrating the ease of using the InceptionV3 model to classify images. The Gradio team has also rapidly added explainability and embeddings-visualization tools, and implemented SOTA blind super-resolution and Real-Time High-Resolution Background Matting UIs as well!
import gradio as gr
import requests
# EXAMPLE 1: We use the LARQ-trained BNN to launch an interactive UI that facilitates a sketchpad input and prediction
def classify(image):
    print(image.shape)
    prediction = model_bnn.predict(image.reshape((1, 28, 28, 1))).tolist()[0]
    return {str(i): prediction[i] for i in range(10)}
sketchpad = gr.inputs.Sketchpad()
label = gr.outputs.Label(num_top_classes=3)
gr.Interface(fn=classify, inputs=sketchpad, outputs=label, capture_session=True).launch()
# EXAMPLE 2: Image classification with InceptionV3
inception_net = tf.keras.applications.InceptionV3()  # load the model
# Download human-readable labels for ImageNet.
response = requests.get("https://git.io/JJkYN")
labels = response.text.split("\n")
def classify_image(inp):
    print(inp.shape)
    inp = inp.reshape((1, 299, 299, 3))
    inp = tf.keras.applications.inception_v3.preprocess_input(inp)
    prediction = inception_net.predict(inp).flatten()
    return {labels[i]: float(prediction[i]) for i in range(1000)}
image = gr.inputs.Image(shape=(299, 299))
label = gr.outputs.Label(num_top_classes=3)
gr.Interface(fn=classify_image, inputs=image, outputs=label, capture_session=True).launch()
D) Documentation / dissemination:
1) Jupyter_to_medium:
Video tutorial of the Jupyter_to_medium package
Last but not least, I used the jupyter_to_medium PyPI package to author this very blog post from its source notebook! As many of you might have experienced, converting your Jupyter/Colab notebook into a readable blog post involves painful copy-pasting antics, code snapshotting and plugin gimmicks. All of this is a thing of the past after the release of this game-changing package.
The procedure is super-simple: pip install, finish the notebook, choose File → ‘Deploy as’, insert your integration token from Medium, and perform final edits/prettification if necessary.
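If you prefer staying inside Python rather than the notebook menu, the package also exposes a publish helper. The call below is a hedged sketch; the argument names reflect my reading of the README, so double-check them before relying on this, and the notebook filename is of course a placeholder.
import jupyter_to_medium as jtm
# Hedged sketch: publish a finished notebook to Medium as a draft.
jtm.publish('My_Blog_Post.ipynb',
            integration_token=None,   # falls back to a previously saved token when None
            title='10 Underappreciated Python Packages',
            publish_status='draft')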
On a concluding note, I would like to thank the incredible researchers and engineers who created these wonderful PyPI packages. In forthcoming blog post(s), I plan to cover packages pertaining to specific topics such as time-series analysis and dimensionality reduction. Here is a recap picture that summarizes the packages explored above.
I hope some of you will find this blog post useful in your ML adventures. Good luck, and I wish y’all a happy and productive 2021 🙌!
Feel free to leave feedback regarding the content/errata/broken links. You may connect with me via LinkedIn or Twitter as well 🙏
Bio: Vinay Uday Prabhu is Chief Scientist at UnifyID Inc.
Original. Reposted with permission.
Related:
 Data Science as a Product – Why Is It So Hard?
 Generating Beautiful Neural Network Visualizations
 Fast and Intuitive Statistical Modeling with Pomegranate