liupeng3425
127 Stars 25 Forks 17 Commits 0 Opened issues

#### Description

I collected some papers about interpretable CNN and reorganized them here.

# Papers

## Interpretable Policies for Reinforcement Learning by Genetic Programming

(PDF)

Authors:Daniel Hein, Steffen Udluft, Thomas A. Runkler

Subjects:

Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)

Cite as:

arXiv:1712.04170 [cs.AI]

(or arXiv:1712.04170v1 [cs.AI] for this version)

Abstract: The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straight-forward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data.

## Discovery Radiomics with CLEAR-DR: Interpretable Computer Aided Diagnosis of Diabetic Retinopathy

(PDF)

Authors:Devinder Kumar, Graham W. Taylor, Alexander Wong

Subjects:

Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Cite as:

arXiv:1710.10675 [cs.AI]

(or arXiv:1710.10675v1 [cs.AI] for this version)

## Interpretation of Neural Networks is Fragile

(PDF)

Authors:Amirata Ghorbani, Abubakar Abid, James Zou

Submitted for review at ICLR 2018

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1710.10547 [stat.ML]

(or arXiv:1710.10547v1 [stat.ML] for this version)

Abstract: In order for machine learning to be deployed and trusted in many applications, it is crucial to be able to reliably explain why the machine learning algorithm makes certain predictions. For example, if an algorithm classifies a given pathology image to be a malignant tumor, then the doctor may need to know which parts of the image led the algorithm to this classification. How to interpret black-box predictors is thus an important and active area of research. A fundamental question is: how much can we trust the interpretation itself? In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptively indistinguishable inputs with the same predicted label can be assigned very different interpretations. We systematically characterize the fragility of several widely-used feature-importance interpretation methods (saliency maps, relevance propagation, and DeepLIFT) on ImageNet and CIFAR-10. Our experiments show that even small random perturbation can change the feature importance and new systematic perturbations can lead to dramatically different interpretations without changing the label. We extend these results to show that interpretations based on exemplars (e.g. influence functions) are similarly fragile. Our analysis of the geometry of the Hessian matrix gives insight on why fragility could be a fundamental challenge to the current interpretation approaches.
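
The fragility described above can be probed on a toy model. The sketch below is only an illustration, not the authors' experimental setup: it assumes a small fixed-weight tanh network, plain gradient saliency, and a top-k overlap score as one possible way to compare two saliency maps. The names `saliency` and `top_k_overlap`, the network sizes, and the perturbation size are all hypothetical.

```python
import numpy as np

def saliency(W1, w2, x):
    """Absolute input gradient of the scalar score w2 . tanh(W1 @ x)."""
    h = np.tanh(W1 @ x)
    grad = W1.T @ (w2 * (1.0 - h ** 2))
    return np.abs(grad)

def top_k_overlap(s_a, s_b, k):
    """Fraction of the k most salient features shared by two saliency maps."""
    top_a = set(np.argsort(s_a)[-k:].tolist())
    top_b = set(np.argsort(s_b)[-k:].tolist())
    return len(top_a & top_b) / k

# Toy setup (assumed): random weights, one input, one small sign perturbation.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 32))
w2 = rng.normal(size=16)
x = rng.normal(size=32)
x_perturbed = x + 0.05 * rng.choice([-1.0, 1.0], size=32)

overlap = top_k_overlap(saliency(W1, w2, x), saliency(W1, w2, x_perturbed), k=8)
```

An overlap well below 1 for a perturbation that leaves the score's sign unchanged would be a small-scale analogue of the effect the abstract reports.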

## Artificial Intelligence as Structural Estimation: Economic Interpretations of Deep Blue, Bonanza, and AlphaGo

(PDF)

Authors:Mitsuru Igami

Subjects:

Econometrics (econ.EM); Artificial Intelligence (cs.AI); Learning (cs.LG)

Cite as:

arXiv:1710.10967 [econ.EM]

(or arXiv:1710.10967v2 [econ.EM] for this version)

Abstract: Artificial intelligence (AI) has achieved superhuman performance in a growing number of tasks, including the classical games of chess, shogi, and Go, but understanding and explaining AI remain challenging. This paper studies the machine-learning algorithms for developing the game AIs, and provides their structural interpretations. Specifically, chess-playing Deep Blue is a calibrated value function, whereas shogi-playing Bonanza represents an estimated value function via Rust's (1987) nested fixed-point method. AlphaGo's "supervised-learning policy network" is a deep neural network (DNN) version of Hotz and Miller's (1993) conditional choice probability estimates; its "reinforcement-learning value network" is equivalent to Hotz, Miller, Sanders, and Smith's (1994) simulation method for estimating the value function. Their performances suggest DNNs are a useful functional form when the state space is large and data are sparse. Explicitly incorporating strategic interactions and unobserved heterogeneity in the data-generating process would further improve AIs' explicability.

## Contextual Regression: An Accurate and Conveniently Interpretable Nonlinear Model for Mining Discovery from Scientific Data

(PDF)

Authors:Chengyu Liu, Wei Wang

18 pages of Main Article, 30 pages of Supplementary Material

Subjects:

Quantitative Methods (q-bio.QM); Learning (cs.LG); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)

Cite as:

arXiv:1710.10728 [q-bio.QM]

(or arXiv:1710.10728v1 [q-bio.QM] for this version)

Abstract: Machine learning algorithms such as linear regression, SVM and neural network have played an increasingly important role in the process of scientific discovery. However, none of them is both interpretable and accurate on nonlinear datasets. Here we present contextual regression, a method that joins these two desirable properties together using a hybrid architecture of neural network embedding and dot product layer. We demonstrate its high prediction accuracy and sensitivity through the task of predictive feature selection on a simulated dataset and the application of predicting open chromatin sites in the human genome. On the simulated data, our method achieved high fidelity recovery of feature contributions under random noise levels up to 200%. On the open chromatin dataset, the application of our method not only outperformed the state of the art method in terms of accuracy, but also unveiled two previously unfound open chromatin related histone marks. Our method can fill the blank of accurate and interpretable nonlinear modeling in scientific data mining tasks.
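
One way to read the "neural network embedding and dot product layer" is that a network maps each input to a vector of per-feature weights, and the prediction is the dot product of those weights with the input itself. The sketch below follows that reading; it is an assumption based on the abstract alone, and the function name, layer sizes, and random weights are hypothetical.

```python
import numpy as np

def contextual_predict(x, W1, b1, W2, b2):
    """Predict with input-dependent linear weights (assumed reading of the abstract)."""
    hidden = np.tanh(W1 @ x + b1)          # embedding of the input
    context_weights = W2 @ hidden + b2     # one weight per input feature
    contributions = context_weights * x    # per-feature contribution to the output
    return contributions.sum(), contributions

# Toy dimensions (assumed): 4 input features, 8 hidden units.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(4, 8)), np.zeros(4)
x = rng.normal(size=4)

prediction, contributions = contextual_predict(x, W1, b1, W2, b2)
```

Because the prediction is the sum of the contributions, each feature's share of a single prediction can be read off directly.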

## Building Data-driven Models with Microstructural Images: Generalization and Interpretability

(PDF)

Authors:Julia Ling, Maxwell Hutchinson, Erin Antono, Brian DeCost, Elizabeth A. Holm, Bryce Meredig

Subjects:

Artificial Intelligence (cs.AI); Materials Science (cond-mat.mtrl-sci)

Cite as:

arXiv:1711.00404 [cs.AI]

(or arXiv:1711.00404v1 [cs.AI] for this version)

Abstract: As data-driven methods rise in popularity in materials science applications, a key question is how these machine learning models can be used to understand microstructure. Given the importance of process-structure-property relations throughout materials science, it seems logical that models that can leverage microstructural data would be more capable of predicting property information. While there have been some recent attempts to use convolutional neural networks to understand microstructural images, these early studies have focused only on which featurizations yield the highest machine learning model accuracy for a single data set. This paper explores the use of convolutional neural networks for classifying microstructure with a more holistic set of objectives in mind: generalization between data sets, number of features required, and interpretability.

## Interpretable Feature Recommendation for Signal Analytics

(PDF)

Authors:Snehasis Banerjee, Tanushyam Chattopadhyay, Ayan Mukherjee

4 pages, Interpretable Data Mining Workshop, CIKM 2017

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1711.01870 [stat.ML]

(or arXiv:1711.01870v1 [stat.ML] for this version)

Abstract: This paper presents an automated approach for interpretable feature recommendation for solving signal data analytics problems. The method has been tested by performing experiments on datasets in the domain of prognostics where interpretation of features is considered very important. The proposed approach is based on Wide Learning architecture and provides means for interpretation of the recommended features. It is to be noted that such an interpretation is not available with feature learning approaches like Deep Learning (such as Convolutional Neural Network) or feature transformation approaches like Principal Component Analysis. Results show that the feature recommendation and interpretation techniques are quite effective for the problems at hand in terms of performance and drastic reduction in time to develop a solution. It is further shown by an example, how this human-in-loop interpretation system can be used as a prescriptive system.

## Semantic Structure and Interpretability of Word Embeddings

(PDF)

Authors:Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, Tolga Cukur

10 Pages, 7 Figures

Subjects:

Computation and Language (cs.CL)

Cite as:

arXiv:1711.00331 [cs.CL]

(or arXiv:1711.00331v2 [cs.CL] for this version)

Abstract: Dense word embeddings, which encode semantic meanings of words to low dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performances in many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases, this semantic structure is broadly and heterogeneously distributed across the embedding dimensions, which makes interpretation a big challenge. In this study, we propose a statistical method to uncover the latent semantic structure in the dense word embeddings. To perform our analysis we introduce a new dataset (SEMCAT) that contains more than 6500 words semantically grouped under 110 categories. We further propose a method to quantify the interpretability of the word embeddings; the proposed method is a practical alternative to the classical word intrusion test that requires human intervention.

## Interpretable and Pedagogical Examples

(PDF)

Authors:Smitha Milli, Pieter Abbeel, Igor Mordatch

Subjects:

Artificial Intelligence (cs.AI)

Cite as:

arXiv:1711.00694 [cs.AI]

(or arXiv:1711.00694v1 [cs.AI] for this version)

Abstract: Teachers intentionally pick the most informative examples to show their students. However, if the teacher and student are neural networks, the examples that the teacher network learns to give, although effective at teaching the student, are typically uninterpretable. We show that training the student and teacher iteratively, rather than jointly, can produce interpretable teaching strategies. We evaluate interpretability by (1) measuring the similarity of the teacher's emergent strategies to intuitive strategies in each domain and (2) conducting human experiments to evaluate how effective the teacher's strategies are at teaching humans. We show that the teacher network learns to select or generate interpretable, pedagogical examples to teach rule-based, probabilistic, boolean, and hierarchical concepts.

## Unsupervised patient representations from clinical notes with interpretable classification decisions

(PDF)

Authors:Madhumita Sushil, Simon Šuster, Kim Luyckx, Walter Daelemans

Accepted poster at NIPS 2017 Workshop on Machine Learning for Health

Subjects:

Computation and Language (cs.CL)

Cite as:

arXiv:1711.05198 [cs.CL]

(or arXiv:1711.05198v1 [cs.CL] for this version)

Abstract: We have two main contributions in this work: 1. We explore the usage of a stacked denoising autoencoder, and a paragraph vector model to learn task-independent dense patient representations directly from clinical notes. We evaluate these representations by using them as features in multiple supervised setups, and compare their performance with those of sparse representations. 2. To understand and interpret the representations, we explore the best encoded features within the patient representations obtained from the autoencoder model. Further, we calculate the significance of the input features of the trained classifiers when we use these pretrained representations as input.

## Interpreting Convolutional Neural Networks Through Compression

(PDF)

Authors:Reza Abbasi-Asl, Bin Yu

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Cite as:

arXiv:1711.02329 [stat.ML]

(or arXiv:1711.02329v1 [stat.ML] for this version)

Abstract: Convolutional neural networks (CNNs) achieve state-of-the-art performance in a wide variety of tasks in computer vision. However, interpreting CNNs still remains a challenge. This is mainly due to the large number of parameters in these networks. Here, we investigate the role of compression and particularly pruning filters in the interpretation of CNNs. We exploit our recently-proposed greedy structural compression scheme that prunes filters in a trained CNN. In our compression, the filter importance index is defined as the classification accuracy reduction (CAR) of the network after pruning that filter. The filters are then iteratively pruned based on the CAR index. We demonstrate the interpretability of CAR-compressed CNNs by showing that our algorithm prunes filters with visually redundant pattern selectivity. Specifically, we show the importance of shape-selective filters for object recognition, as opposed to color-selective filters. Out of top 20 CAR-pruned filters in AlexNet, 17 of them in the first layer and 14 of them in the second layer are color-selective filters. Finally, we introduce a variant of our CAR importance index that quantifies the importance of each image class to each CNN filter. We show that the most and the least important class labels present a meaningful interpretation of each filter that is consistent with the visualized pattern selectivity of that filter.
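
The CAR index can be sketched on a toy network. The example below is a rough illustration under stated assumptions: hidden units of a tiny ReLU network stand in for convolutional filters, the data and weights are random, and the unit with the smallest CAR is pruned first (the abstract only says pruning is "based on the CAR index"). All function names are hypothetical.

```python
import numpy as np

def accuracy(W1, W2, X, y, active):
    """Accuracy of a tiny ReLU network that uses only the active hidden units."""
    hidden = np.maximum(X @ W1, 0.0) * active
    return float(np.mean(np.argmax(hidden @ W2, axis=1) == y))

def car_index(W1, W2, X, y, active):
    """CAR of each active unit: accuracy drop caused by pruning that unit alone."""
    base = accuracy(W1, W2, X, y, active)
    car = np.full(active.shape, np.inf)  # already-pruned units are never selected
    for j in np.flatnonzero(active):
        pruned = active.copy()
        pruned[j] = 0.0
        car[j] = base - accuracy(W1, W2, X, y, pruned)
    return car

def greedy_prune(W1, W2, X, y, n_prune):
    """Iteratively prune the unit with the smallest CAR (assumed selection rule)."""
    active = np.ones(W1.shape[1])
    order = []
    for _ in range(n_prune):
        j = int(np.argmin(car_index(W1, W2, X, y, active)))
        active[j] = 0.0
        order.append(j)
    return order, active

# Toy data and weights (assumed), standing in for a trained CNN.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = rng.integers(0, 3, size=60)
W1 = rng.normal(size=(5, 6))
W2 = rng.normal(size=(6, 3))
order, active = greedy_prune(W1, W2, X, y, n_prune=3)
```

Recomputing the index after every pruning step is what makes the scheme greedy: a unit's CAR depends on which other units are still present.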

## Interpretable probabilistic embeddings: bridging the gap between topic models and neural networks

(PDF)

Authors:Anna Potapenko, Artem Popov, Konstantin Vorontsov

Appeared in AINL-2017

Subjects:

Computation and Language (cs.CL)

Cite as:

arXiv:1711.04154 [cs.CL]

(or arXiv:1711.04154v1 [cs.CL] for this version)

Abstract: We consider probabilistic topic models and more recent word embedding techniques from a perspective of learning hidden semantic representations. Inspired by a striking similarity of the two approaches, we merge them and learn probabilistic embeddings with online EM-algorithm on word co-occurrence data. The resulting embeddings perform on par with Skip-Gram Negative Sampling (SGNS) on word similarity tasks and benefit in the interpretability of the components. Next, we learn probabilistic document embeddings that outperform paragraph2vec on a document similarity task and require less memory and time for training. Finally, we employ multimodal Additive Regularization of Topic Models (ARTM) to obtain a high sparsity and learn embeddings for other modalities, such as timestamps and categories. We observe further improvement of word similarity performance and meaningful inter-modality similarities.

## Arrhythmia Classification from the Abductive Interpretation of Short Single-Lead ECG Records

(PDF)

Authors:Tomás Teijeiro, Constantino A. García, Daniel Castro, Paulo Félix

4 pages, 3 figures. Presented in the Computing in Cardiology 2017 conference

Subjects:

Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

MSC classes:

68T10

Cite as:

arXiv:1711.03892 [cs.AI]

(or arXiv:1711.03892v1 [cs.AI] for this version)

Abstract: In this work we propose a new method for the rhythm classification of short single-lead ECG records, using a set of high-level and clinically meaningful features provided by the abductive interpretation of the records. These features include morphological and rhythm-related features that are used to build two classifiers: one that evaluates the record globally, using aggregated values for each feature; and another one that evaluates the record as a sequence, using a Recurrent Neural Network fed with the individual features for each detected heartbeat. The two classifiers are finally combined using the stacking technique, providing an answer by means of four target classes: Normal sinus rhythm, Atrial fibrillation, Other anomaly, and Noisy. The approach has been validated against the 2017 Physionet/CinC Challenge dataset, obtaining a final score of 0.83 and ranking first in the competition.

## Interpretable R-CNN

(PDF)

Authors:Tianfu Wu, Xilai Li, Xi Song, Wei Sun, Liang Dong, Bo Li

13 pages

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1711.05226 [cs.CV]

(or arXiv:1711.05226v1 [cs.CV] for this version)

Abstract: This paper presents a method of learning qualitatively interpretable models in object detection using popular two-stage region-based ConvNet detection systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI (Region-of-Interest) prediction network. By interpretable models, we focus on weakly-supervised extractive rationale generation, that is learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection without using any supervision for part configurations. We utilize a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of RoIs. We propose an AOGParsing operator to substitute the RoIPooling operator widely used in R-CNN, so the proposed method is applicable to many state-of-the-art ConvNet based detection systems. The AOGParsing operator aims to harness both the explainable rigor of top-down hierarchical and compositional grammar models and the discriminative power of bottom-up deep neural networks through end-to-end training. In detection, a bounding box is interpreted by the best parse tree derived from the AOG on-the-fly, which is treated as the extractive rationale generated for interpreting detection. In learning, we propose a folding-unfolding method to train the AOG and ConvNet end-to-end. In experiments, we build on top of the R-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets with performance comparable to state-of-the-art methods.

## Interpreting Deep Visual Representations via Network Dissection

(PDF)

Authors:Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba

*B. Zhou and D. Bau contributed equally to this work. 15 pages, 27 figures

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

ACM classes:

I.2.10

Cite as:

arXiv:1711.05611 [cs.CV]

(or arXiv:1711.05611v1 [cs.CV] for this version)

Abstract: The success of recent deep convolutional neural networks (CNNs) depends on learning hidden representations that can summarize the important factors of variation behind the data. However, CNNs are often criticized as being black boxes that lack interpretability, since they have millions of unexplained model parameters. In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations. The proposed method quantifies the interpretability of CNN representations by evaluating the alignment between individual hidden units and a set of visual semantic concepts. By identifying the best alignments, units are given human interpretable labels across a range of objects, parts, scenes, textures, materials, and colors. The method reveals that deep representations are more transparent and interpretable than expected: we find that representations are significantly more interpretable than they would be under a random equivalently powerful basis. We apply the method to interpret and compare the latent representations of various network architectures trained to solve different supervised and self-supervised training tasks. We then examine factors affecting the network interpretability such as the number of the training iterations, regularizations, different initializations, and the network depth and width. Finally we show that the interpreted units can be used to provide explicit explanations of a prediction given by a CNN for an image. Our results highlight that interpretability is an important property of deep neural networks that provides new insights into their hierarchical structure.
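
The alignment between a hidden unit and a concept can be scored as the intersection over union (IoU) of the unit's thresholded activation map and the concept's segmentation mask. The sketch below is a simplification for a single image: the function name, the fixed threshold, and the 2x2 toy arrays are assumptions, whereas a real evaluation would aggregate over a whole annotated dataset.

```python
import numpy as np

def unit_concept_iou(activation, concept_mask, threshold):
    """IoU between a thresholded activation map and a binary concept mask."""
    unit_mask = activation > threshold
    intersection = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return float(intersection / union) if union > 0 else 0.0

# Toy 2x2 activation map and concept mask (assumed values).
activation = np.array([[0.9, 0.1],
                       [0.8, 0.2]])
concept_mask = np.array([[True, False],
                         [False, False]])

iou = unit_concept_iou(activation, concept_mask, threshold=0.5)  # 1 shared pixel / 2 in the union = 0.5
```

A unit would then be labeled with the concept that gives it the highest score, provided that score clears some minimum.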

## Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

(PDF)

Authors:Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, David Madigan

Published in the Annals of Applied Statistics by the Institute of Mathematical Statistics

Subjects:

Applications (stat.AP); Learning (cs.LG); Machine Learning (stat.ML)

Journal reference:

Annals of Applied Statistics 2015, Vol. 9, No. 3, 1350-1371

DOI:

10.1214/15-AOAS848

Report number:

IMS-AOAS-AOAS848

Cite as:

arXiv:1511.01644 [stat.AP]

(or arXiv:1511.01644v1 [stat.AP] for this version)

Abstract: We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS$_2$ score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS$_2$, but more accurate.
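
A decision list is evaluated top to bottom: the first rule whose condition holds determines the prediction, and a default applies when none matches. The sketch below shows only that evaluation step, not the Bayesian learning procedure. The rules, feature names, and risk values are made up for illustration and are not taken from the paper.

```python
def predict_risk(decision_list, default_risk, patient):
    """Return the risk attached to the first rule whose condition matches."""
    for condition, risk in decision_list:
        if condition(patient):
            return risk
    return default_risk

# Hypothetical rules and risk values, for illustration only.
rules = [
    (lambda p: p["hemiplegia"] and p["age"] > 60, 0.60),
    (lambda p: p["high_blood_pressure"], 0.30),
]

patient = {"hemiplegia": False, "age": 72, "high_blood_pressure": True}
risk = predict_risk(rules, default_risk=0.05, patient=patient)  # second rule fires
```

The order of the rules matters, which is why the model is a list and not an unordered rule set.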

## Interpretable Deep Neural Networks for Single-Trial EEG Classification

(PDF)

Authors:Irene Sturm, Sebastian Bach, Wojciech Samek, Klaus-Robert Müller

5 pages, 1 figure

Subjects:

Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Cite as:

arXiv:1604.08201 [cs.NE]

(or arXiv:1604.08201v1 [cs.NE] for this version)

Abstract: Background: In cognitive neuroscience the potential of Deep Neural Networks (DNNs) for solving complex classification tasks is yet to be fully exploited. The most limiting factor is that DNNs as notorious 'black boxes' do not provide insight into neurophysiological phenomena underlying a decision. Layer-wise Relevance Propagation (LRP) has been introduced as a novel method to explain individual network decisions. New Method: We propose the application of DNNs with LRP for the first time for EEG data analysis. Through LRP the single-trial DNN decisions are transformed into heatmaps indicating each data point's relevance for the outcome of the decision. Results: DNN achieves classification accuracies comparable to those of CSP-LDA. In subjects with low performance, subject-to-subject transfer of trained DNNs can improve the results. The single-trial LRP heatmaps reveal neurophysiologically plausible patterns, resembling CSP-derived scalp maps. Critically, while CSP patterns represent class-wise aggregated information, LRP heatmaps pinpoint neural patterns to single time points in single trials. Comparison with Existing Method(s): We compare the classification performance of DNNs to that of linear CSP-LDA on two data sets related to motor-imagery BCI. Conclusion: We have demonstrated that DNN is a powerful non-linear tool for EEG analysis. With LRP a new quality of high-resolution assessment of neural activity can be reached. LRP is a potential remedy for the lack of interpretability of DNNs that has limited their utility in neuroscientific applications. The extreme specificity of the LRP-derived heatmaps opens up new avenues for investigating neural activity underlying complex perception or decision-related processes.
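
To make the heatmap idea concrete, the sketch below propagates relevance backwards through a tiny two-layer ReLU network. It assumes the epsilon-stabilised LRP rule for dense layers; the abstract does not say which LRP variant the authors used, and the network, its random weights, and the function name `lrp_dense` are hypothetical.

```python
import numpy as np

def lrp_dense(a, W, b, R_out, eps=1e-6):
    """Redistribute relevance R_out from a dense layer's outputs to its inputs a."""
    z = a @ W + b                                # pre-activations, shape (n_out,)
    z = z + eps * np.where(z >= 0, 1.0, -1.0)    # stabiliser keeps the division safe
    s = R_out / z
    return a * (W @ s)                           # relevance per input, shape (n_in,)

# Toy network (assumed): 8 inputs, 5 ReLU hidden units, 2 output classes, zero biases.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)
x = rng.normal(size=8)

h = np.maximum(x @ W1 + b1, 0.0)
out = h @ W2 + b2

# Start from the score of the predicted class and walk back to the input.
k = int(np.argmax(out))
R_out = np.zeros(2)
R_out[k] = out[k]
R_hidden = lrp_dense(h, W2, b2, R_out)
R_input = lrp_dense(x, W1, b1, R_hidden)
```

With zero biases the relevance is approximately conserved, so `R_input` sums to roughly `out[k]`; for EEG, the input relevances would be arranged over channels and time points to form the heatmap.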

## InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

(PDF)

Authors:Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel

Subjects:

Learning (cs.LG); Machine Learning (stat.ML)

Cite as:

arXiv:1606.03657 [cs.LG]

(or arXiv:1606.03657v1 [cs.LG] for this version)

Abstract: This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
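
For reference, the lower bound mentioned in the abstract can be written as follows. The notation is mine, as I recall it from the paper, and is not given in the abstract: $c$ is the subset of latent codes, $z$ the remaining noise, $G$ the generator, $Q(c \mid x)$ an auxiliary distribution approximating the posterior over codes, $H(c)$ the entropy of the code prior, and $\lambda$ a weighting hyperparameter.

$$
I\big(c;\, G(z, c)\big) \;\ge\; \mathbb{E}_{c \sim P(c),\; x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c) \;=\; L_I(G, Q)
$$

$$
\min_{G, Q}\, \max_{D}\; V_{\text{InfoGAN}}(D, G, Q) \;=\; V(D, G) - \lambda\, L_I(G, Q)
$$

Here $V(D, G)$ is the usual GAN objective, so the extra term rewards generators whose outputs make the code $c$ recoverable.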

## The Mythos of Model Interpretability

(PDF)

Authors:Zachary C. Lipton

presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Cite as:

arXiv:1606.03490 [cs.LG]

(or arXiv:1606.03490v3 [cs.LG] for this version)

Abstract: Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet the task of interpretation appears underspecified. Papers provide diverse and sometimes non-overlapping motivations for interpretability, and offer myriad notions of what attributes render models interpretable. Despite this ambiguity, many papers proclaim interpretability axiomatically, absent further explanation. In this paper, we seek to refine the discourse on interpretability. First, we examine the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Then, we address model properties and techniques thought to confer interpretability, identifying transparency to humans and post-hoc explanations as competing notions. Throughout, we discuss the feasibility and desirability of different notions, and question the oft-made assertions that linear models are interpretable and that deep neural networks are not.

## Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models

(PDF)

Authors:Viktoriya Krakovna, Finale Doshi-Velez

presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

Cite as:

arXiv:1606.05320 [stat.ML]

(or arXiv:1606.05320v2 [stat.ML] for this version)

Abstract: As deep neural networks continue to revolutionize various application domains, there is increasing interest in making these powerful models more understandable and interpretable, and narrowing down the causes of good and bad predictions. We focus on recurrent neural networks (RNNs), state of the art models in speech recognition and translation. Our approach to increasing interpretability is by combining an RNN with a hidden Markov model (HMM), a simpler and more transparent model. We explore various combinations of RNNs and HMMs: an HMM trained on LSTM states; a hybrid model where an HMM is trained first, then a small LSTM is given HMM state distributions and trained to fill in gaps in the HMM's performance; and a jointly trained hybrid model. We find that the LSTM and HMM learn complementary information about the features in the text.

## Model-Agnostic Interpretability of Machine Learning

(PDF)

Authors:Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin

presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1606.05386 [stat.ML]

(or arXiv:1606.05386v1 [stat.ML] for this version)

Abstract: Understanding why machine learning models behave the way they do empowers both system designers and end-users in many ways: in model selection, feature engineering, in order to trust and act upon the predictions, and in more intuitive user interfaces. Thus, interpretability has become a vital concern in machine learning, and work in the area of interpretable models has found renewed interest. In some applications, such models are as accurate as non-interpretable ones, and thus are preferred for their transparency. Even when they are not accurate, they may still be preferred when interpretability is of paramount importance. However, restricting machine learning to interpretable models is often a severe limitation. In this paper we argue for explaining machine learning predictions using model-agnostic approaches. By treating the machine learning models as black-box functions, these approaches provide crucial flexibility in the choice of models, explanations, and representations, improving debugging, comparison, and interfaces for a variety of users and models. We also outline the main challenges for such methods, and review a recently-introduced model-agnostic explanation approach (LIME) that addresses these challenges.
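
The model-agnostic idea can be sketched as fitting a simple surrogate around one prediction while treating the model purely as a function. The example below is a simplified illustration in the spirit of LIME, not the actual LIME algorithm or its library API: it assumes Gaussian perturbations, an exponential proximity kernel, and a weighted least-squares linear fit, and every name and parameter in it is hypothetical.

```python
import numpy as np

def explain_locally(black_box, x, n_samples=500, scale=0.5, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate to black_box around the point x."""
    rng = np.random.default_rng(seed)
    Z = x + scale * rng.normal(size=(n_samples, x.size))   # perturbed neighbours of x
    y = black_box(Z)                                        # query the model only
    dist2 = np.sum((Z - x) ** 2, axis=1)
    weights = np.exp(-dist2 / kernel_width ** 2)            # closer samples count more
    A = np.hstack([Z - x, np.ones((n_samples, 1))])         # local features + intercept
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[:-1]                                        # per-feature local slopes

# Toy black box (assumed): nonlinear in both features.
black_box = lambda Z: np.tanh(Z[:, 0]) + Z[:, 1] ** 2
local_slopes = explain_locally(black_box, np.array([0.0, 1.0]))
```

The surrogate never inspects the model's internals, which is what allows the same explanation procedure to be reused across model families.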

## Learning Interpretable Musical Compositional Rules and Traces

(PDF)

Authors:Haizi Yu, Lav R. Varshney, Guy E. Garnett, Ranjitha Kumar

presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1606.05572 [stat.ML]

(or arXiv:1606.05572v1 [stat.ML] for this version)

Abstract: Throughout music history, theorists have identified and documented interpretable rules that capture the decisions of composers. This paper asks, "Can a machine behave like a music theorist?" It presents MUS-ROVER, a self-learning system for automatically discovering rules from symbolic music. MUS-ROVER performs feature learning via $n$-gram models to extract compositional rules --- statistical patterns over the resulting features. We evaluate MUS-ROVER on Bach's (SATB) chorales, demonstrating that it can recover known rules, as well as identify new, characteristic patterns for further study. We discuss how the extracted rules can be used in both machine and human composition.

## Building an Interpretable Recommender via Loss-Preserving Transformation

(PDF)

Authors:Amit Dhurandhar, Sechan Oh, Marek Petrik

Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1606.05819 [stat.ML]

(or arXiv:1606.05819v1 [stat.ML] for this version)

Abstract: We propose a method for building an interpretable recommender system for personalizing online content and promotions. Historical data available for the system consists of customer features, provided content (promotions), and user responses. Unlike in a standard multi-class classification setting, misclassification costs depend on both recommended actions and customers. Our method transforms such a data set to a new set which can be used with standard interpretable multi-class classification algorithms. The transformation has the desirable property that minimizing the standard misclassification penalty in this new space is equivalent to minimizing the custom cost function.

## Using Visual Analytics to Interpret Predictive Machine Learning Models

(PDF)

Authors:Josua Krause, Adam Perer, Enrico Bertini

Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1606.05685 [stat.ML]

(or arXiv:1606.05685v2 [stat.ML] for this version)

Abstract: It is commonly believed that increasing the interpretability of a machine learning model may decrease its predictive power. However, inspecting input-output relationships of those models using visual analytics, while treating them as black-box, can help to understand the reasoning behind outcomes without sacrificing predictive quality. We identify a space of possible solutions and provide two examples of where such techniques have been successfully used in practice.

## Interpretable Machine Learning Models for the Digital Clock Drawing Test

(PDF)

Authors:William Souillard-Mandar, Randall Davis, Cynthia Rudin, Rhoda Au, Dana Penney

Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1606.07163 [stat.ML]

(or arXiv:1606.07163v1 [stat.ML] for this version)

Abstract: The Clock Drawing Test (CDT) is a rapid, inexpensive, and popular neuropsychological screening tool for cognitive conditions. The Digital Clock Drawing Test (dCDT) uses novel software to analyze data from a digitizing ballpoint pen that reports its position with considerable spatial and temporal precision, making possible the analysis of both the drawing process and final product. We developed methodology to analyze pen stroke data from these drawings, and computed a large collection of features which were then analyzed with a variety of machine learning techniques. The resulting scoring systems were designed to be more accurate than the systems currently used by clinicians, but just as interpretable and easy to use. The systems also allow us to quantify the tradeoff between accuracy and interpretability. We created automated versions of the CDT scoring systems currently used by clinicians, allowing us to benchmark our models, which indicated that our machine learning models substantially outperformed the existing scoring systems.

## SnapToGrid: From Statistical to Interpretable Models for Biomedical Information Extraction

(PDF)

Authors:Marco A. Valenzuela-Escarcega, Gus Hahn-Powell, Dane Bell, Mihai Surdeanu

Subjects:

Computation and Language (cs.CL)

Cite as:

arXiv:1606.09604 [cs.CL]

(or arXiv:1606.09604v1 [cs.CL] for this version)

Abstract: We propose an approach for biomedical information extraction that marries the advantages of machine learning models, e.g., learning directly from data, with the benefits of rule-based approaches, e.g., interpretability. Our approach starts by training a feature-based statistical model, then converts this model to a rule-based variant by converting its features to rules, and "snapping to grid" the feature weights to discrete votes. In doing so, our proposal takes advantage of the large body of work in machine learning, but it produces an interpretable model, which can be directly edited by experts. We evaluate our approach on the BioNLP 2009 event extraction task. Our results show that there is a small performance penalty when converting the statistical model to rules, but the gain in interpretability compensates for that: with minimal effort, human experts improve this model to have similar performance to the statistical model that served as starting point.
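
Note (not from the paper): one way to picture "snapping to grid" is rounding each real-valued feature weight to an integer number of votes. The sketch below assumes exactly that; the feature names, weights, and `step` parameter are hypothetical, and the authors' actual conversion may differ.

```python
def snap_to_grid(weights, step=1.0):
    """Round each real-valued feature weight to the nearest multiple of `step`
    and express it as an integer number of votes."""
    return {feature: round(w / step) for feature, w in weights.items()}


def vote(rules, active_features):
    """Sum the integer votes of the rules whose feature fires on an input."""
    return sum(votes for feature, votes in rules.items() if feature in active_features)


# Hypothetical weights from a feature-based statistical model.
weights = {"trigger=phosphorylates": 2.3, "has_negation": -1.2, "distance>10": -0.2}
rules = snap_to_grid(weights)
score = vote(rules, {"trigger=phosphorylates", "has_negation"})
```

After snapping, each rule carries a small integer that an expert can read and edit directly. Here the near-zero weight of `distance>10` becomes 0 votes.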

## Meaningful Models: Utilizing Conceptual Structure to Improve Machine Learning Interpretability

(PDF)

Authors:Nick Condry

5 pages, 3 figures, presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

Cite as:

arXiv:1607.00279 [stat.ML]

(or arXiv:1607.00279v1 [stat.ML] for this version)

Abstract: The last decade has seen huge progress in the development of advanced machine learning models; however, those models are powerless unless human users can interpret them. Here we show how the mind's construction of concepts and meaning can be used to create more interpretable machine learning models. By proposing a novel method of classifying concepts, in terms of 'form' and 'function', we elucidate the nature of meaning and offer proposals to improve model understandability. As machine learning begins to permeate daily life, interpretable models may serve as a bridge between domain-expert authors and non-expert users.

## RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism

(PDF)

Authors:Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. Stewart, Jimeng Sun

Accepted at Neural Information Processing Systems (NIPS) 2016

Subjects:

Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Cite as:

arXiv:1608.05745 [cs.LG]

(or arXiv:1608.05745v4 [cs.LG] for this version)

Abstract: Accuracy and interpretability are two dominant features of successful predictive models. Typically, a choice must be made in favor of complex black box models such as recurrent neural networks (RNN) for accuracy versus less accurate but more interpretable traditional models such as logistic regression. This tradeoff poses challenges in medicine where both accuracy and interpretability are important. We addressed this challenge by developing the REverse Time AttentIoN model (RETAIN) for application to Electronic Health Records (EHR) data. RETAIN achieves high accuracy while remaining clinically interpretable and is based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits (e.g. key diagnoses). RETAIN mimics physician practice by attending the EHR data in a reverse time order so that recent clinical visits are likely to receive higher attention. RETAIN was tested on a large health system EHR dataset with 14 million visits completed by 263K patients over an 8 year period and demonstrated predictive accuracy and computational scalability comparable to state-of-the-art methods such as RNN, and ease of interpretability comparable to traditional models.

## Towards Transparent AI Systems: Interpreting Visual Question Answering Models

(PDF)

Authors:Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

Cite as:

arXiv:1608.08974 [cs.CV]

(or arXiv:1608.08974v2 [cs.CV] for this version)

Abstract: Deep neural networks have shown striking progress and obtained state-of-the-art results in many AI research fields in recent years. However, it is often unsatisfying to not know why they predict what they do. In this paper, we address the problem of interpreting Visual Question Answering (VQA) models. Specifically, we are interested in finding what part of the input (pixels in images or words in questions) the VQA model focuses on while answering the question. To tackle this problem, we use two visualization techniques -- guided backpropagation and occlusion -- to find important words in the question and important regions in the image. We then present qualitative and quantitative analyses of these importance maps. We found that even without explicit attention mechanisms, VQA models may sometimes be implicitly attending to relevant regions in the image, and often to appropriate words in the question.
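
Note (not from the paper): occlusion-based importance can be sketched on question words as follows. `toy_score` is a hypothetical stand-in for a VQA model's answer score, and the `<unk>` mask token is an assumption. The same loop would apply to image regions by masking patches instead of tokens.

```python
def occlusion_importance(score_fn, tokens, mask="<unk>"):
    """Return, for each token, the drop in score when that token is masked."""
    base = score_fn(tokens)
    importances = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask] + tokens[i + 1:]
        importances.append(base - score_fn(occluded))
    return importances


# Hypothetical scoring function that only reacts to two words.
def toy_score(tokens):
    return 6 * ("color" in tokens) + 3 * ("umbrella" in tokens) + 1


question = ["what", "color", "is", "the", "umbrella"]
importances = occlusion_importance(toy_score, question)
```

Words whose removal changes the score most get the largest importance. In this toy case only "color" and "umbrella" receive non-zero values.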

## Real Time Fine-Grained Categorization with Accuracy and Interpretability

(PDF)

Authors:Shaoli Huang, Dacheng Tao

arXiv admin note: text overlap with arXiv:1512.08086

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1610.00824 [cs.CV]

(or arXiv:1610.00824v1 [cs.CV] for this version)

Abstract: A well-designed fine-grained categorization system usually has three contradictory requirements: accuracy (the ability to identify objects among subordinate categories); interpretability (the ability to provide human-understandable explanation of recognition system behavior); and efficiency (the speed of the system). To handle the trade-off between accuracy and interpretability, we propose a novel "Deeper Part-Stacked CNN" architecture armed with interpretability by modeling subtle differences between object parts. The proposed architecture consists of a part localization network, a two-stream classification network that simultaneously encodes object-level and part-level cues, and a feature vectors fusion component. Specifically, the part localization network is implemented by exploring a new paradigm for key point localization that first samples a small number of representable pixels and then determines their labels via a convolutional layer followed by a softmax layer. We also use a cropping layer to extract part features and propose a scale mean-max layer for feature fusion learning. Experimentally, our proposed method outperforms state-of-the-art approaches in both the part localization task and the classification task on Caltech-UCSD Birds-200-2011. Moreover, by adopting a set of sharing strategies between the computation of multiple object parts, our single model is fairly efficient, running at 32 frames/sec.

## Interpreting Neural Networks to Improve Politeness Comprehension

(PDF)

Authors:Malika Aubakirova, Mohit Bansal

To appear at EMNLP 2016

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as:

arXiv:1610.02683 [cs.CL]

(or arXiv:1610.02683v1 [cs.CL] for this version)

Abstract: We present an interpretable neural network approach to predicting and understanding politeness in natural language requests. Our models are based on simple convolutional neural networks directly on raw text, avoiding any manual identification of complex sentiment or syntactic features, while performing better than such feature-based models from previous work. More importantly, we use the challenging task of politeness prediction as a testbed to next present a much-needed understanding of what these successful networks are actually learning. For this, we present several network visualizations based on activation clusters, first derivative saliency, and embedding space transformations, helping us automatically identify several subtle linguistic markers of politeness theories. Further, this analysis reveals multiple novel, high-scoring politeness strategies which, when added back as new features, reduce the accuracy gap between the original featurized system and the neural model, thus providing a clear quantitative interpretation of the success of these neural networks.

## Particle Swarm Optimization for Generating Interpretable Fuzzy Reinforcement Learning Policies

(PDF)

Authors:Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft

Subjects:

Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

Journal reference:

Engineering Applications of Artificial Intelligence, Volume 65C, October 2017, Pages 87-98

DOI:

10.1016/j.engappai.2017.07.005

Cite as:

arXiv:1610.05984 [cs.NE]

(or arXiv:1610.05984v5 [cs.NE] for this version)

Abstract: Fuzzy controllers are efficient and interpretable system controllers for continuous state and action spaces. To date, such controllers have been constructed manually or trained automatically either using expert-generated problem-specific cost functions or incorporating detailed knowledge about the optimal control strategy. Both requirements for automatic training processes are not found in most real-world reinforcement learning (RL) problems. In such applications, online learning is often prohibited for safety reasons because online learning requires exploration of the problem's dynamics during policy training. We introduce a fuzzy particle swarm reinforcement learning (FPSRL) approach that can construct fuzzy RL policies solely by training parameters on world models that simulate real system dynamics. These world models are created by employing an autonomous machine learning technique that uses previously generated transition samples of a real system. To the best of our knowledge, this approach is the first to relate self-organizing fuzzy controllers to model-based batch RL. Therefore, FPSRL is intended to solve problems in domains where online learning is prohibited, system dynamics are relatively easy to model from previously generated default policy transition samples, and it is expected that a relatively easily interpretable control policy exists. The efficiency of the proposed approach with problems from such domains is demonstrated using three standard RL benchmarks, i.e., mountain car, cart-pole balancing, and cart-pole swing-up. Our experimental results demonstrate high-performing, interpretable fuzzy policies.

## Embedding Projector: Interactive Visualization and Interpretation of Embeddings

(PDF)

Authors:Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda B. Viégas, Martin Wattenberg

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Human-Computer Interaction (cs.HC)

Cite as:

arXiv:1611.05469 [stat.ML]

(or arXiv:1611.05469v1 [stat.ML] for this version)

Abstract: Embeddings are ubiquitous in machine learning, appearing in recommender systems, NLP, and many other applications. Researchers and developers often need to explore the properties of a specific embedding, and one way to analyze embeddings is to visualize them. We present the Embedding Projector, a tool for interactive visualization and interpretation of embeddings.

## Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning

(PDF)

Authors:Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu

In the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1611.04246 [cs.CV]

(or arXiv:1611.04246v2 [cs.CV] for this version)

Abstract: This paper proposes a learning strategy that extracts object-part concepts from a pre-trained convolutional neural network (CNN), in an attempt to 1) explore explicit semantics hidden in CNN units and 2) gradually grow a semantically interpretable graphical model on the pre-trained CNN for hierarchical object understanding. Given part annotations on very few (e.g., 3-12) objects, our method mines certain latent patterns from the pre-trained CNN and associates them with different semantic parts. We use a four-layer And-Or graph to organize the mined latent patterns, so as to clarify their internal semantic hierarchy. Our method is guided by a small number of part annotations, and it achieves superior performance (about 13%-107% improvement) in part center prediction on the PASCAL VOC and ImageNet datasets.

## Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models

(PDF)

Authors:Viktoriya Krakovna, Finale Doshi-Velez

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems. arXiv admin note: substantial text overlap with arXiv:1606.05320

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1611.05934 [stat.ML]

(or arXiv:1611.05934v1 [stat.ML] for this version)

Abstract: As deep neural networks continue to revolutionize various application domains, there is increasing interest in making these powerful models more understandable and interpretable, and narrowing down the causes of good and bad predictions. We focus on recurrent neural networks, state of the art models in speech recognition and translation. Our approach to increasing interpretability is by combining a long short-term memory (LSTM) model with a hidden Markov model (HMM), a simpler and more transparent model. We add the HMM state probabilities to the output layer of the LSTM, and then train the HMM and LSTM either sequentially or jointly. The LSTM can make use of the information from the HMM, and fill in the gaps when the HMM is not performing well. A small hybrid model usually performs better than a standalone LSTM of the same size, especially on smaller data sets. We test the algorithms on text data and medical time series data, and find that the LSTM and HMM learn complementary information about the features in the text.

## GENESIM: genetic extraction of a single, interpretable model

(PDF)

Authors:Gilles Vandewiele, Olivier Janssens, Femke Ongenae, Filip De Turck, Sofie Van Hoecke

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1611.05722 [stat.ML]

(or arXiv:1611.05722v1 [stat.ML] for this version)

Abstract: Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in a low predictive performance. Ensemble techniques are able to achieve a higher accuracy. However, this comes at a cost of losing interpretability of the resulting model. This makes ensemble techniques impractical in applications where decision support, instead of decision making, is crucial. To bridge this gap, we present the GENESIM algorithm that transforms an ensemble of decision trees to a single decision tree with an enhanced predictive performance by using a genetic algorithm. We compared GENESIM to prevalent decision tree induction and ensemble techniques using twelve publicly available data sets. The results show that GENESIM achieves a better predictive performance on most of these data sets than decision tree induction techniques and a predictive performance in the same order of magnitude as the ensemble techniques. Moreover, the resulting model of GENESIM has a very low complexity, making it very interpretable, in contrast to ensemble techniques.

## Stratified Knowledge Bases as Interpretable Probabilistic Models (Extended Abstract)

(PDF)

Authors:Ondrej Kuzelka, Jesse Davis, Steven Schockaert

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Artificial Intelligence (cs.AI)

Cite as:

arXiv:1611.06174 [cs.AI]

(or arXiv:1611.06174v1 [cs.AI] for this version)

Abstract: In this paper, we advocate the use of stratified logical theories for representing probabilistic models. We argue that such encodings can be more interpretable than those obtained in existing frameworks such as Markov logic networks. Among others, this allows for the use of domain experts to improve learned models by directly removing, adding, or modifying logical formulas.

## Learning Interpretability for Visualizations using Adapted Cox Models through a User Experiment

(PDF)

Authors:Adrien Bibal, Benoit Frénay

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

Cite as:

arXiv:1611.06175 [stat.ML]

(or arXiv:1611.06175v1 [stat.ML] for this version)

Abstract: In order to be useful, visualizations need to be interpretable. This paper uses a user-based approach to combine and assess quality measures in order to better model user preferences. Results show that cluster separability measures are outperformed by a neighborhood conservation measure, even though the former are usually considered as intuitively representative of user motives. Moreover, combining measures, as opposed to using a single measure, further improves prediction performances.

## Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable

(PDF)

Authors:Hui Fen Tan, Giles Hooker, Martin T. Wells

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1611.07115 [stat.ML]

(or arXiv:1611.07115v1 [stat.ML] for this version)

Abstract: Ensembles of decision trees have good prediction accuracy but suffer from a lack of interpretability. We propose a new approach for interpreting tree ensembles by finding prototypes in tree space, utilizing the naturally-learned similarity measure from the tree ensemble. Demonstrating the method on random forests, we show that the method benefits from unique aspects of tree ensembles by leveraging tree structure to sequentially find prototypes. The method provides good prediction accuracy when found prototypes are used in nearest-prototype classifiers, while using fewer prototypes than competitor methods. We are investigating the sensitivity of the method to different prototype-finding procedures and demonstrating it on higher-dimensional data.

## Interpreting Finite Automata for Sequential Data

(PDF)

Authors:Christian Albert Hammerschmidt, Sicco Verwer, Qin Lin, Radu State

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

ACM classes:

I.2.6

Cite as:

arXiv:1611.07100 [stat.ML]

(or arXiv:1611.07100v2 [stat.ML] for this version)

Abstract: Automaton models are often seen as interpretable models. Interpretability itself is not well defined: it remains unclear what interpretability means without first explicitly specifying objectives or desired attributes. In this paper, we identify the key properties used to interpret automata and propose a modification of a state-merging approach to learn variants of finite state automata. We apply the approach to problems beyond typical grammar inference tasks. Additionally, we cover several use-cases for prediction, classification, and clustering on sequential data in both supervised and unsupervised scenarios to show how the identified key properties are applicable in a wide range of contexts.

## Inducing Interpretable Representations with Variational Autoencoders

(PDF)

Authors:N. Siddharth, Brooks Paige, Alban Desmaison, Jan-Willem Van de Meent, Frank Wood, Noah D. Goodman, Pushmeet Kohli, Philip H.S. Torr

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Cite as:

arXiv:1611.07492 [stat.ML]

(or arXiv:1611.07492v1 [stat.ML] for this version)

Abstract: We develop a framework for incorporating structured graphical models in the *encoders* of variational autoencoders (VAEs) that allows us to induce interpretable representations through approximate variational inference. This allows us to both perform reasoning (e.g. classification) under the structural constraints of a given graphical model, and use deep generative models to deal with messy, high-dimensional domains where it is often difficult to model all the variation. Learning in this framework is carried out end-to-end with a variational objective, applying to both unsupervised and semi-supervised schemes.

## Interpretation of Prediction Models Using the Input Gradient

(PDF)

Authors:Yotam Hechtlinger

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1611.07634 [stat.ML]

(or arXiv:1611.07634v1 [stat.ML] for this version)

Abstract: State of the art machine learning algorithms are highly optimized to provide the optimal prediction possible, naturally resulting in complex models. While these models often outperform simpler more interpretable models by orders of magnitude, in terms of understanding the way the model functions, we are often facing a "black box". In this paper we suggest a simple method to interpret the behavior of any predictive model, both for regression and classification. Given a particular model, the information required to interpret it can be obtained by studying the partial derivatives of the model with respect to the input. We exemplify this insight by interpreting convolutional and multi-layer neural networks in the field of natural language processing.
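
Note (not from the paper): the partial derivatives of a model with respect to its input can be estimated for any black-box predictor with finite differences. The sketch below assumes a hypothetical logistic-regression stand-in (`predict`) with made-up weights. The paper itself may compute gradients analytically.

```python
import math


def predict(x, w=(2.0, -1.0, 0.0), b=0.5):
    """Hypothetical stand-in model: logistic regression with fixed weights."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(f, x, eps=1e-5):
    """Central finite-difference estimate of df/dx_i for every input feature."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


x = [0.2, 0.4, 1.0]
grad = input_gradient(predict, x)
```

The sign of each entry says whether increasing that feature pushes the prediction up or down near `x`. The third feature has zero weight in this toy model, so its gradient is zero.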

## Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery

(PDF)

Authors:Scott Wisdom, Thomas Powers, James Pitton, Les Atlas

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1611.07252 [stat.ML]

(or arXiv:1611.07252v1 [stat.ML] for this version)

Abstract: Recurrent neural networks (RNNs) are powerful and effective for processing sequential data. However, RNNs are usually considered "black box" models whose internal structure and learned parameters are not interpretable. In this paper, we propose an interpretable RNN based on the sequential iterative soft-thresholding algorithm (SISTA) for solving the sequential sparse recovery problem, which models a sequence of correlated observations with a sequence of sparse latent vectors. The architecture of the resulting SISTA-RNN is implicitly defined by the computational structure of SISTA, which results in a novel stacked RNN architecture. Furthermore, the weights of the SISTA-RNN are perfectly interpretable as the parameters of a principled statistical model, which in this case include a sparsifying dictionary, iterative step size, and regularization parameters. In addition, on a particular sequential compressive sensing task, the SISTA-RNN trains faster and achieves better performance than conventional state-of-the-art black box RNNs, including long-short term memory (LSTM) RNNs.
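
Note (not from the paper): the building block behind SISTA is iterative soft-thresholding. The sketch below shows plain, non-sequential ISTA for a single observation, so it omits the coupling between time steps that the paper adds. The dictionary `D`, step size, and regularization value are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t; entries within [-t, t] become 0."""
    return [max(abs(a) - t, 0.0) * (1 if a > 0 else -1) for a in v]


def ista(D, x, lam, step, n_iter=200):
    """Plain ISTA for min_h 0.5*||x - D h||^2 + lam*||h||_1 (D is a list of rows)."""
    n_rows, n_cols = len(D), len(D[0])
    h = [0.0] * n_cols
    for _ in range(n_iter):
        residual = [x[i] - sum(D[i][j] * h[j] for j in range(n_cols))
                    for i in range(n_rows)]
        # Gradient step on the quadratic term, then shrinkage for the L1 term.
        grad_step = [h[j] + step * sum(D[i][j] * residual[i] for i in range(n_rows))
                     for j in range(n_cols)]
        h = soft_threshold(grad_step, step * lam)
    return h


# With an identity dictionary, ISTA reduces to one soft-thresholding of x.
h = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, step=1.0)
```

Unrolling the loop body into layers is what gives each weight a statistical meaning (dictionary, step size, regularization), which is the sense of "interpretable" in the abstract.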

## An unexpected unity among methods for interpreting model predictions

(PDF)

Authors:Scott Lundberg, Su-In Lee

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Artificial Intelligence (cs.AI)

Cite as:

arXiv:1611.07478 [cs.AI]

(or arXiv:1611.07478v3 [cs.AI] for this version)

Abstract: Understanding why a model made a certain prediction is crucial in many data science fields. Interpretable predictions engender appropriate trust and provide insight into how the model may be improved. However, with large modern datasets the best accuracy is often achieved by complex models even experts struggle to interpret, which creates a tension between accuracy and interpretability. Recently, several methods have been proposed for interpreting predictions from complex models by estimating the importance of input features. Here, we present how a model-agnostic additive representation of the importance of input features unifies current methods. This representation is optimal, in the sense that it is the only set of additive values that satisfies important properties. We show how we can leverage these properties to create novel visual explanations of model predictions. The thread of unity that this representation weaves through the literature indicates that there are common principles to be learned about the interpretation of model predictions that apply in many scenarios.
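
Note (not from the abstract): a classic example of additive feature-importance values that are uniquely determined by a set of properties is the Shapley value from cooperative game theory. The sketch below assumes that reading and computes exact Shapley values for a hypothetical three-feature value function. It is not the authors' estimation procedure.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings in which features can be added."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(present)
            present.add(f)
            totals[f] += value_fn(present) - before
    return {f: total / len(orderings) for f, total in totals.items()}


# Hypothetical model output as a function of which features are "present".
def toy_value(present):
    out = 0.0
    if "age" in present:
        out += 2.0
    if "income" in present:
        out += 1.0
    if "age" in present and "income" in present:
        out += 1.0  # interaction term, shared equally between the two features
    return out


phi = shapley_values(toy_value, ["age", "income", "zip"])
```

The values add up to the difference between the full-model output and the empty-input output, and the unused `zip` feature gets exactly zero. Enumerating all orderings is only feasible for a handful of features.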

## Input Switched Affine Networks: An RNN Architecture Designed for Interpretability

(PDF)

Authors:Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo

ICLR 2017 submission

Subjects:

Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Cite as:

arXiv:1611.09434 [cs.AI]

(or arXiv:1611.09434v2 [cs.AI] for this version)

Abstract: There exist many problem domains where the interpretability of neural network models is essential for deployment. Here we introduce a recurrent architecture composed of input-switched affine transformations - in other words an RNN without any explicit nonlinearities, but with input-dependent recurrent weights. This simple form allows the RNN to be analyzed via straightforward linear methods: we can exactly characterize the linear contribution of each input to the model predictions; we can use a change-of-basis to disentangle input, output, and computational hidden unit subspaces; we can fully reverse-engineer the architecture's solution to a simple task. Despite this ease of interpretation, the input switched affine network achieves reasonable performance on a text modeling task, and allows greater computational efficiency than networks with standard nonlinearities.
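
Note (not from the paper): because the update is affine, the final hidden state splits exactly into one additive term per input. The sketch below assumes the recurrence `h_t = W[x_t] h_{t-1} + b[x_t]` and uses hypothetical 2x2 parameters for a two-symbol alphabet.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]


def run_isan(params, inputs, h0):
    """Run h_t = W[x_t] h_{t-1} + b[x_t]; there is no nonlinearity anywhere."""
    h = h0
    for x in inputs:
        W, b = params[x]
        h = [a + c for a, c in zip(matvec(W, h), b)]
    return h


def contributions(params, inputs, h0):
    """Split the final state into one additive term per input (plus one for h0)."""
    terms = [h0] + [params[x][1] for x in inputs]
    for t, x in enumerate(inputs):
        W = params[x][0]
        # Every term created before step t is carried through W[x_t];
        # the bias introduced at step t itself (index t + 1) is not.
        terms = [matvec(W, term) if s <= t else term for s, term in enumerate(terms)]
    return terms


# Hypothetical parameters: one (W, b) pair per input symbol.
params = {
    "a": ([[1.0, 1.0], [0.0, 1.0]], [1.0, 0.0]),
    "b": ([[0.5, 0.0], [0.0, 2.0]], [0.0, 1.0]),
}
inputs, h0 = list("aba"), [0.0, 0.0]
final = run_isan(params, inputs, h0)
total = [sum(col) for col in zip(*contributions(params, inputs, h0))]
```

Here `total` equals `final`, which illustrates the exact linear attribution of the final state to individual inputs described in the abstract.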

## Large scale modeling of antimicrobial resistance with interpretable classifiers

(PDF)

Authors:Alexandre Drouin, Frédéric Raymond, Gaël Letarte St-Pierre, Mario Marchand, Jacques Corbeil, François Laviolette

Peer-reviewed and accepted for presentation at the Machine Learning for Health Workshop, NIPS 2016, Barcelona, Spain

Subjects:

Genomics (q-bio.GN); Learning (cs.LG); Machine Learning (stat.ML)

Cite as:

arXiv:1612.01030 [q-bio.GN]

(or arXiv:1612.01030v1 [q-bio.GN] for this version)

Abstract: Antimicrobial resistance is an important public health concern that has implications in the practice of medicine worldwide. Accurately predicting resistance phenotypes from genome sequences shows great promise in promoting better use of antimicrobial agents, by determining which antibiotics are likely to be effective in specific clinical cases. In healthcare, this would allow for the design of treatment plans tailored for specific individuals, likely resulting in better clinical outcomes for patients with bacterial infections. In this work, we present the recent work of Drouin et al. (2016) on using Set Covering Machines to learn highly interpretable models of antibiotic resistance and complement it by providing a large scale application of their method to the entire PATRIC database. We report prediction results for 36 new datasets and present the Kover AMR platform, a new web-based tool allowing the visualization and interpretation of the generated models.

## Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences

(PDF)

Authors:I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre

Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

DOI:

10.1016/j.knosys.2016.12.013

Cite as:

arXiv:1612.04868 [cs.CL]

(or arXiv:1612.04868v1 [cs.CL] for this version)

Abstract: User acceptance of artificial intelligence agents might depend on their ability to explain their reasoning, which requires adding an interpretability layer that facilitates users to understand their behavior. This paper focuses on adding an interpretable layer on top of Semantic Textual Similarity (STS), which measures the degree of semantic equivalence between two sentences. The interpretability layer is formalized as the alignment between pairs of segments across the two sentences, where the relation between the segments is labeled with a relation type and a similarity score. We present a publicly available dataset of sentence pairs annotated following the formalization. We then develop a system trained on this dataset which, given a sentence pair, explains what is similar and different, in the form of graded and typed segment alignments. When evaluated on the dataset, the system performs better than an informed baseline, showing that the dataset and task are well-defined and feasible. Most importantly, two user studies show how the system output can be used to automatically produce explanations in natural language. Users performed better when having access to the explanations, providing preliminary evidence that our dataset and method to automatically produce explanations is useful in real applications.

## Towards a New Interpretation of Separable Convolutions

(PDF)

Authors:Tapabrata Ghosh

Subjects:

Learning (cs.LG); Machine Learning (stat.ML)

Cite as:

arXiv:1701.04489 [cs.LG]

(or arXiv:1701.04489v1 [cs.LG] for this version)

Abstract: In recent times, the use of separable convolutions in deep convolutional neural network architectures has been explored. Several researchers, most notably (Chollet, 2016) and (Ghosh, 2017), have used separable convolutions in their deep architectures and have demonstrated state of the art or close to state of the art performance. However, the underlying mechanism of action of separable convolutions is still not fully understood. Although their mathematical definition is well understood as a depthwise convolution followed by a pointwise convolution, deeper interpretations such as the extreme Inception hypothesis (Chollet, 2016) have failed to provide a thorough explanation of their efficacy. In this paper, we propose a hybrid interpretation that we believe is a better model for explaining the efficacy of separable convolutions.
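
To make the "depthwise convolution followed by a pointwise convolution" definition concrete, here is a minimal sketch that compares weight counts for a standard and a separable convolution. It is not taken from the paper; the function names are hypothetical, bias terms are ignored, and a depth multiplier of 1 is assumed.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2D convolution (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv.

    Assumes a depth multiplier of 1 and no bias terms.
    """
    # Depthwise step: one k x k filter per input channel.
    depthwise = kernel_size * kernel_size * in_channels
    # Pointwise step: a 1x1 convolution that mixes channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise


if __name__ == "__main__":
    print(standard_conv_params(3, 64, 128))   # 73728
    print(separable_conv_params(3, 64, 128))  # 8768
```

For a 3x3 kernel mapping 64 to 128 channels, the separable form needs roughly 8x fewer weights, which is one reason the factorization is attractive regardless of which interpretation explains its accuracy.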

## Towards A Rigorous Science of Interpretable Machine Learning

(PDF)

Authors:Finale Doshi-Velez, Been Kim

Subjects:

Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

Cite as:

arXiv:1702.08608 [stat.ML]

(or arXiv:1702.08608v2 [stat.ML] for this version)

Abstract: As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is very little consensus on what interpretable machine learning is and how it should be measured. In this position paper, we first define interpretability and describe when interpretability is needed (and when it is not). Next, we suggest a taxonomy for rigorous evaluation and expose open questions towards a more rigorous science of interpretable machine learning.

## Streaming Weak Submodularity: Interpreting Neural Networks on the Fly

(PDF)

Authors:Ethan R. Elenberg, Alexandros G. Dimakis, Moran Feldman, Amin Karbasi

To appear in NIPS 2017

Subjects:

Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

Cite as:

arXiv:1703.02647 [stat.ML]

(or arXiv:1703.02647v3 [stat.ML] for this version)

Abstract: In many machine learning applications, it is important to explain the predictions of a black-box classifier. For example, why does a deep neural network assign an image to a particular class? We cast interpretability of black-box classifiers as a combinatorial maximization problem and propose an efficient streaming algorithm to solve it subject to cardinality constraints. By extending ideas from Badanidiyuru et al. [2014], we provide a constant factor approximation guarantee for our algorithm in the case of random stream order and a weakly submodular objective function. This is the first such theoretical guarantee for this general class of functions, and we also show that no such algorithm exists for a worst case stream order. Our algorithm obtains similar explanations of Inception V3 predictions 10 times faster than the state-of-the-art LIME framework of Ribeiro et al. [2016].

## Interpretable Structure-Evolving LSTM

(PDF)

Authors:Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing

To appear in CVPR 2017 as a spotlight paper

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

Cite as:

arXiv:1703.03055 [cs.CV]

(or arXiv:1703.03055v1 [cs.CV] for this version)

Abstract: This paper develops a general framework for learning interpretable data representation via Long Short-Term Memory (LSTM) recurrent neural networks over hierarchical graph structures. Instead of learning LSTM models over the pre-fixed structures, we propose to further learn the intermediate interpretable multi-level graph structures in a progressive and stochastic way from data during the LSTM network optimization. We thus call this model the structure-evolving LSTM. In particular, starting with an initial element-level graph representation where each node is a small data element, the structure-evolving LSTM gradually evolves the multi-level graph representations by stochastically merging the graph nodes with high compatibilities along the stacked LSTM layers. In each LSTM layer, we estimate the compatibility of two connected nodes from their corresponding LSTM gate outputs, which is used to generate a merging probability. The candidate graph structures are accordingly generated where the nodes are grouped into cliques with their merging probabilities. We then produce the new graph structure with a Metropolis-Hastings algorithm, which alleviates the risk of getting stuck in local optima by stochastic sampling with an acceptance probability. Once a graph structure is accepted, a higher-level graph is then constructed by taking the partitioned cliques as its nodes. During the evolving process, representation becomes more abstracted in higher-levels where redundant information is filtered out, allowing more efficient propagation of long-range data dependencies. We evaluate the effectiveness of structure-evolving LSTM in the application of semantic object parsing and demonstrate its advantage over state-of-the-art LSTM models on standard benchmarks.

## Improving Interpretability of Deep Neural Networks with Semantic Information

(PDF)

Authors:Yinpeng Dong, Hang Su, Jun Zhu, Bo Zhang

To appear in CVPR 2017

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1703.04096 [cs.CV]

(or arXiv:1703.04096v2 [cs.CV] for this version)

Abstract: Interpretability of deep neural networks (DNNs) is essential since it enables users to understand the overall strengths and weaknesses of the models, conveys an understanding of how the models will behave in the future, and how to diagnose and correct potential problems. However, it is challenging to reason about what a DNN actually does due to its opaque or black-box nature. To address this issue, we propose a novel technique to improve the interpretability of DNNs by leveraging the rich semantic information embedded in human descriptions. By concentrating on the video captioning task, we first extract a set of semantically meaningful topics from the human descriptions that cover a wide range of visual concepts, and integrate them into the model with an interpretive loss. We then propose a prediction difference maximization algorithm to interpret the learned features of each neuron. Experimental results demonstrate its effectiveness in video captioning using the interpretable features, which can also be transferred to video action recognition. By clearly understanding the learned features, users can easily revise false predictions via a human-in-the-loop procedure.

## InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations

(PDF)

Authors:Yunzhu Li, Jiaming Song, Stefano Ermon

14 pages, NIPS 2017

Subjects:

Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1703.08840 [cs.LG]

(or arXiv:1703.08840v2 [cs.LG] for this version)

Abstract: The goal of imitation learning is to mimic expert behavior without access to an explicit reward signal. Expert demonstrations provided by humans, however, often show significant variability due to latent factors that are typically not explicitly modeled. In this paper, we propose a new algorithm that can infer the latent structure of expert demonstrations in an unsupervised way. Our method, built on top of Generative Adversarial Imitation Learning, can not only imitate complex behaviors, but also learn interpretable and meaningful representations of complex behavioral data, including visual demonstrations. In the driving domain, we show that a model learned from human demonstrations is able to both accurately reproduce a variety of behaviors and accurately anticipate human actions using raw visual inputs. Compared with various baselines, our method can better capture the latent structure underlying expert demonstrations, often recovering semantically meaningful factors of variation in the data.

## Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention

(PDF)

Authors:Jinkyu Kim, John Canny

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Cite as:

arXiv:1703.10631 [cs.CV]

(or arXiv:1703.10631v1 [cs.CV] for this version)

Abstract: Deep neural perception and control networks are likely to be a key component of self-driving vehicles. These models need to be explainable - they should provide easy-to-interpret rationales for their behavior - so that passengers, insurance companies, law enforcement, developers etc., can understand what triggered a particular behavior. Here we explore the use of visual explanations. These explanations take the form of real-time highlighted regions of an image that causally influence the network's output (steering control). Our approach is two-stage. In the first stage, we use a visual attention model to train a convolution network end-to-end from images to steering angle. The attention model highlights image regions that potentially influence the network's output. Some of these are true influences, but some are spurious. We then apply a causal filtering step to determine which input regions actually influence the output. This produces more succinct visual explanations and more accurately exposes the network's behavior. We demonstrate the effectiveness of our model on three datasets totaling 16 hours of driving. We first show that training with attention does not degrade the performance of the end-to-end network. Then we show that the network causally cues on a variety of features that are used by humans while driving.

## Interpretable 3D Human Action Analysis with Temporal Convolutional Networks

(PDF)

Authors:Tae Soo Kim, Austin Reiter

8 pages, 5 figures, BNMW CVPR 2017 Submission

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

MSC classes:

68T45, 68T10 (Primary)

ACM classes:

I.2.10; I.5.4

Cite as:

arXiv:1704.04516 [cs.CV]

(or arXiv:1704.04516v1 [cs.CV] for this version)

Abstract: The discriminative power of modern deep learning models for 3D human action recognition is growing ever so potent. In conjunction with the recent resurgence of 3D human action representation with 3D skeletons, the quality and the pace of recent progress have been significant. However, the inner workings of state-of-the-art learning based methods in 3D human action recognition still remain mostly black-box. In this work, we propose to use a new class of models known as Temporal Convolutional Neural Networks (TCN) for 3D human action recognition. Compared to popular LSTM-based Recurrent Neural Network models, given interpretable input such as 3D skeletons, TCN provides us a way to explicitly learn readily interpretable spatio-temporal representations for 3D human action recognition. We provide our strategy in re-designing the TCN with interpretability in mind and how such characteristics of the model is leveraged to construct a powerful 3D activity recognition method. Through this work, we wish to take a step towards a spatio-temporal model that is easier to understand, explain and interpret. The resulting model, Res-TCN, achieves state-of-the-art results on the largest 3D human action recognition dataset, NTU-RGBD.

## An Interpretable Knowledge Transfer Model for Knowledge Base Completion

(PDF)

Authors:Qizhe Xie, Xuezhe Ma, Zihang Dai, Eduard Hovy

Accepted by ACL 2017. Minor update

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

Cite as:

arXiv:1704.05908 [cs.CL]

(or arXiv:1704.05908v2 [cs.CL] for this version)

Abstract: Knowledge bases are important resources for a variety of natural language processing tasks but suffer from incompleteness. We propose a novel embedding model, ITransF, to perform knowledge base completion. Equipped with a sparse attention mechanism, ITransF discovers hidden concepts of relations and transfers statistical strength through the sharing of concepts. Moreover, the learned associations between relations and concepts, which are represented by sparse attention vectors, can be interpreted easily. We evaluate ITransF on two benchmark datasets, WN18 and FB15k, for knowledge base completion and obtain improvements on both the mean rank and Hits@10 metrics over all baselines that do not use additional information.

## Network Dissection: Quantifying Interpretability of Deep Visual Representations

(PDF)

Authors:David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

First two authors contributed equally. Oral presentation at CVPR 2017

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

ACM classes:

I.2.10

Cite as:

arXiv:1704.05796 [cs.CV]

(or arXiv:1704.05796v1 [cs.CV] for this version)

Abstract: We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units, then we apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.
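
The abstract describes scoring the alignment between a hidden unit and a semantic concept without naming the measure. As a rough illustration only, the sketch below assumes intersection over union (IoU) between a thresholded activation map and a binary concept mask; the function names, the threshold value, and the toy data are all hypothetical.

```python
def threshold(activation, t):
    """Binarize a 2D activation map: 1 where the value exceeds t, else 0."""
    return [[1 if a > t else 0 for a in row] for row in activation]


def iou(unit_mask, concept_mask):
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for u_row, c_row in zip(unit_mask, concept_mask):
        for u, c in zip(u_row, c_row):
            if u and c:
                inter += 1
            if u or c:
                union += 1
    # Two empty masks have no overlap to speak of; report 0.0.
    return inter / union if union else 0.0


if __name__ == "__main__":
    activation = [[0.9, 0.2], [0.8, 0.1]]  # toy activation map of one unit
    concept = [[1, 1], [0, 0]]             # toy pixel mask for one concept
    print(iou(threshold(activation, 0.5), concept))
```

A unit would then be labeled with whichever concept yields the highest score, provided the score clears some cutoff chosen by the analyst.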

## Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML

(PDF)

Authors:Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy

Under Review of ICLR 2018

Subjects:

Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

Cite as:

arXiv:1705.07136 [cs.LG]

(or arXiv:1705.07136v3 [cs.LG] for this version)

Abstract: Reward augmented maximum likelihood (RAML), a simple and effective learning framework to directly optimize towards the reward function in structured prediction tasks, has led to a number of impressive empirical successes. RAML incorporates task-specific reward by performing maximum-likelihood updates on candidate outputs sampled according to an exponentiated payoff distribution, which gives higher probabilities to candidates that are close to the reference output. While RAML is notable for its simplicity, efficiency, and its impressive empirical successes, the theoretical properties of RAML, especially the behavior of the exponentiated payoff distribution, has not been examined thoroughly. In this work, we introduce softmax Q-distribution estimation, a novel theoretical interpretation of RAML, which reveals the relation between RAML and Bayesian decision theory. The softmax Q-distribution can be regarded as a smooth approximation of the Bayes decision boundary, and the Bayes decision rule is achieved by decoding with this Q-distribution. We further show that RAML is equivalent to approximately estimating the softmax Q-distribution, with the temperature $\tau$ controlling approximation error. We perform two experiments, one on synthetic data of multi-class classification and one on real data of image captioning, to demonstrate the relationship between RAML and the proposed softmax Q-distribution estimation method, verifying our theoretical analysis. Additional experiments on three structured prediction tasks with rewards defined on sequential (named entity recognition), tree-based (dependency parsing) and irregular (machine translation) structures show notable improvements over maximum likelihood baselines.
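
The exponentiated payoff distribution mentioned above can be sketched in a few lines: each candidate output gets probability proportional to exp(reward / τ). This is an illustrative sketch, not the authors' code; the function name and the example rewards (negative distances to the reference) are assumptions.

```python
import math


def exponentiated_payoff(rewards, tau):
    """Return q(y) proportional to exp(r(y) / tau) over a finite candidate set.

    rewards: list of reward values, one per candidate output.
    tau: temperature; larger values flatten the distribution.
    """
    # Subtract the maximum reward before exponentiating for numerical stability.
    m = max(rewards)
    weights = [math.exp((r - m) / tau) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    # Hypothetical rewards: 0 for the reference, lower for worse candidates.
    rewards = [0.0, -1.0, -2.0]
    print(exponentiated_payoff(rewards, tau=1.0))
    print(exponentiated_payoff(rewards, tau=100.0))
```

With a small τ the mass concentrates on candidates close to the reference; with a large τ the distribution approaches uniform, which matches the abstract's point that τ controls how closely the smooth distribution tracks the reward.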

## Logic Tensor Networks for Semantic Image Interpretation

(PDF)

Authors:Ivan Donadello, Luciano Serafini, Artur d'Avila Garcez

14 pages, 2 figures, IJCAI 2017

Subjects:

Artificial Intelligence (cs.AI)

Cite as:

arXiv:1705.08968 [cs.AI]

(or arXiv:1705.08968v1 [cs.AI] for this version)

Abstract: Semantic Image Interpretation (SII) is the task of extracting structured semantic descriptions from images. It is widely agreed that the combined use of visual data and background knowledge is of great importance for SII. Recently, Statistical Relational Learning (SRL) approaches have been developed for reasoning under uncertainty and learning in the presence of data and rich knowledge. Logic Tensor Networks (LTNs) are an SRL framework which integrates neural networks with first-order fuzzy logic to allow (i) efficient learning from noisy data in the presence of logical constraints, and (ii) reasoning with logical formulas describing general properties of the data. In this paper, we develop and apply LTNs to two of the main tasks of SII, namely, the classification of an image's bounding boxes and the detection of the relevant part-of relations between objects. To the best of our knowledge, this is the first successful application of SRL to such SII tasks. The proposed approach is evaluated on a standard image processing benchmark. Experiments show that the use of background knowledge in the form of logical constraints can improve the performance of purely data-driven approaches, including the state-of-the-art Fast Region-based Convolutional Neural Networks (Fast R-CNN). Moreover, we show that the use of logical background knowledge adds robustness to the learning system when errors are present in the labels of the training data.

## Patchnet: Interpretable Neural Networks for Image Classification

(PDF)

Authors:Adityanarayanan Radhakrishnan, Charles Durham, Ali Soylemezoglu, Caroline Uhler

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1705.08078 [cs.CV]

(or arXiv:1705.08078v1 [cs.CV] for this version)

Abstract: The ability to visually understand and interpret learned features from complex predictive models is crucial for their acceptance in sensitive areas such as health care. To move closer to this goal of truly interpretable complex models, we present PatchNet, a network that restricts global context for image classification tasks in order to easily provide visual representations of learned texture features on a predetermined local scale. We demonstrate how PatchNet provides visual heatmap representations of the learned features, and we mathematically analyze the behavior of the network during convergence. We also present a version of PatchNet that is particularly well suited for lowering false positive rates in image classification tasks. We apply PatchNet to the classification of textures from the Describable Textures Dataset and to the ISBI-ISIC 2016 melanoma classification challenge.

## A Unified Approach to Interpreting Model Predictions

(PDF)

Authors:Scott Lundberg, Su-In Lee

To appear in NIPS 2017

Subjects:

Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Cite as:

arXiv:1705.07874 [cs.AI]

(or arXiv:1705.07874v2 [cs.AI] for this version)

Abstract: Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
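
To illustrate what "assigns each feature an importance value for a particular prediction" means, the sketch below computes exact Shapley values by enumerating feature subsets. It is a brute-force illustration and not the SHAP library or the paper's approximation methods; the function name is hypothetical, and it assumes that an "absent" feature is replaced by a fixed baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline.

    Cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` taken from x, others from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


if __name__ == "__main__":
    # Toy model with one interaction term between features 0 and 2.
    model = lambda z: 2 * z[0] + 3 * z[1] + z[0] * z[2]
    print(shapley_values(model, [1, 1, 1], [0, 0, 0]))
```

In this toy model the interaction term is split evenly between the two features involved, and the values sum to the difference between the prediction at `x` and the prediction at the baseline.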

## Interpreting Blackbox Models via Model Extraction

(PDF)

Authors:Osbert Bastani, Carolyn Kim, Hamsa Bastani

Subjects:

Learning (cs.LG)

Cite as:

arXiv:1705.08504 [cs.LG]

(or arXiv:1705.08504v1 [cs.LG] for this version)

Abstract: Interpretability has become an important issue as machine learning is increasingly used to inform consequential decisions. We propose an approach for interpreting a blackbox model by extracting a decision tree that approximates the model. Our model extraction algorithm avoids overfitting by leveraging blackbox model access to actively sample new training points. We prove that as the number of samples goes to infinity, the decision tree learned using our algorithm converges to the exact greedy decision tree. In our evaluation, we use our algorithm to interpret random forests and neural nets trained on several datasets from the UCI Machine Learning Repository, as well as control policies learned for three classical reinforcement learning problems. We show that our algorithm improves over a baseline based on CART on every problem instance. Furthermore, we show how an interpretation generated by our approach can be used to understand and debug these models.
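
The core idea, querying the blackbox on freshly sampled points and fitting a tree to its answers, can be sketched with a one-split tree (a decision stump) over a single feature. This is a simplified illustration and not the authors' algorithm: it samples uniformly from [0, 1) instead of sampling actively, and `extract_stump` and `majority` are hypothetical names.

```python
import random


def majority(labels):
    """Majority label of a 0/1 list; ties and empty lists yield 0."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def extract_stump(blackbox, n_samples=200, seed=0):
    """Fit a one-split tree to blackbox labels on uniformly sampled inputs.

    Returns (stump, split threshold, fidelity on the sampled points).
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_samples)]
    ys = [blackbox(x) for x in xs]  # query the blackbox for labels

    best = None
    for t in sorted(xs):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label, right_label = majority(left), majority(right)
        agree = sum(y == left_label for y in left)
        agree += sum(y == right_label for y in right)
        if best is None or agree > best[0]:
            best = (agree, t, left_label, right_label)

    agree, t, left_label, right_label = best

    def stump(x):
        return left_label if x <= t else right_label

    return stump, t, agree / n_samples


if __name__ == "__main__":
    opaque = lambda x: 1 if x > 0.3 else 0  # stand-in for a complex model
    stump, t, fidelity = extract_stump(opaque)
    print(t, fidelity)
```

Fidelity here is the fraction of sampled points on which the stump agrees with the blackbox; a real extraction procedure would grow a deeper tree and measure fidelity on held-out samples.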

## Interpretable & Explorable Approximations of Black Box Models

(PDF)

Authors:Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec

Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning

Subjects:

Artificial Intelligence (cs.AI)

Cite as:

arXiv:1707.01154 [cs.AI]

(or arXiv:1707.01154v1 [cs.AI] for this version)

Abstract: We propose Black Box Explanations through Transparent Approximations (BETA), a novel model agnostic framework for explaining the behavior of any black-box classifier by simultaneously optimizing for fidelity to the original model and interpretability of the explanation. To this end, we develop a novel objective function which allows us to learn (with optimality guarantees), a small number of compact decision sets each of which explains the behavior of the black box model in unambiguous, well-defined regions of feature space. Furthermore, our framework also is capable of accepting user input when generating these approximations, thus allowing users to interactively explore how the black-box model behaves in different subspaces that are of interest to the user. To the best of our knowledge, this is the first approach which can produce global explanations of the behavior of any given black box model through joint optimization of unambiguity, fidelity, and interpretability, while also allowing users to explore model behavior based on their preferences. Experimental evaluation with real-world datasets and user studies demonstrates that our approach can generate highly compact, easy-to-understand, yet accurate approximations of various kinds of predictive models compared to state-of-the-art baselines.

## Interpretability via Model Extraction

(PDF)

Authors:Osbert Bastani, Carolyn Kim, Hamsa Bastani

Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)

Subjects:

Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)

Cite as:

arXiv:1706.09773 [cs.LG]

(or arXiv:1706.09773v2 [cs.LG] for this version)

Abstract: The ability to interpret machine learning models has become increasingly important now that machine learning is used to inform consequential decisions. We propose an approach called model extraction for interpreting complex, blackbox models. Our approach approximates the complex model using a much more interpretable model; as long as the approximation quality is good, then statistical properties of the complex model are reflected in the interpretable model. We show how model extraction can be used to understand and debug random forests and neural nets trained on several datasets from the UCI Machine Learning Repository, as well as control policies learned for several classical reinforcement learning problems.

## Methods for Interpreting and Understanding Deep Neural Networks

(PDF)

Authors:Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller

14 pages, 10 figures

Subjects:

Learning (cs.LG); Machine Learning (stat.ML)

DOI:

10.1016/j.dsp.2017.10.011

Cite as:

arXiv:1706.07979 [cs.LG]

(or arXiv:1706.07979v1 [cs.LG] for this version)

Abstract: This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. It introduces some recently proposed techniques of interpretation, along with theory, tricks and recommendations, to make most efficient use of these techniques on real data. It also discusses a number of practical applications.

## MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis Network

(PDF)

Authors:Zizhao Zhang, Yuanpu Xie, Fuyong Xing, Mason McGough, Lin Yang

CVPR2017 Oral

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1707.02485 [cs.CV]

(or arXiv:1707.02485v1 [cs.CV] for this version)

Abstract: The inability to interpret the model prediction in semantically and visually meaningful ways is a well-known shortcoming of most existing computer-aided diagnosis methods. In this paper, we propose MDNet to establish a direct multimodal mapping between medical images and diagnostic reports that can read images, generate diagnostic reports, retrieve images by symptom descriptions, and visualize attention, to provide justifications of the network diagnosis process. MDNet includes an image model and a language model. The image model is proposed to enhance multi-scale feature ensembles and utilization efficiency. The language model, integrated with our improved attention mechanism, aims to read and explore discriminative image feature descriptions from reports to learn a direct mapping from sentence words to image pixels. The overall network is trained end-to-end by using our developed optimization strategy. Based on a pathology bladder cancer images and its diagnostic reports (BCIDR) dataset, we conduct sufficient experiments to demonstrate that MDNet outperforms comparative baselines. The proposed image model obtains state-of-the-art performance on two CIFAR datasets as well.

## A Formal Framework to Characterize Interpretability of Procedures

(PDF)

Authors:Amit Dhurandhar, Vijay Iyengar, Ronny Luss, Karthikeyan Shanmugam

presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

Subjects:

Artificial Intelligence (cs.AI)

Cite as:

arXiv:1707.03886 [cs.AI]

(or arXiv:1707.03886v1 [cs.AI] for this version)

Abstract: We provide a novel notion of what it means to be interpretable, looking past the usual association with human understanding. Our key insight is that interpretability is not an absolute concept and so we define it relative to a target model, which may or may not be a human. We define a framework that allows for comparing interpretable procedures by linking it to important practical aspects such as accuracy and robustness. We characterize many of the current state-of-the-art interpretable methods in our framework portraying its general applicability.

## Interpreting Classifiers through Attribute Interactions in Datasets

(PDF)

Authors:Andreas Henelius, Kai Puolamäki, Antti Ukkonen

presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1707.07576 [stat.ML]

(or arXiv:1707.07576v1 [stat.ML] for this version)

Abstract: In this work we present the novel ASTRID method for investigating which attribute interactions classifiers exploit when making predictions. Attribute interactions in classification tasks mean that two or more attributes together provide stronger evidence for a particular class label. Knowledge of such interactions makes models more interpretable by revealing associations between attributes. This has applications, e.g., in pharmacovigilance to identify interactions between drugs or in bioinformatics to investigate associations between single nucleotide polymorphisms. We also show how the found attribute partitioning is related to a factorisation of the data generating distribution and empirically demonstrate the utility of the proposed method.

## Interpretable Active Learning

(PDF)

Authors:Richard L. Phillips, Kyu Hyun Chang, Sorelle A. Friedler

6 pages, 5 figures, presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1708.00049 [stat.ML]

(or arXiv:1708.00049v1 [stat.ML] for this version)

Abstract: Active learning has long been a topic of study in machine learning. However, as increasingly complex and opaque models have become standard practice, the process of active learning, too, has become more opaque. There has been little investigation into interpreting what specific trends and patterns an active learning strategy may be exploring. This work expands on the Local Interpretable Model-agnostic Explanations framework (LIME) to provide explanations for active learning recommendations. We demonstrate how LIME can be used to generate locally faithful explanations for an active learning strategy, and how these explanations can be used to understand how different models and datasets explore a problem space over time. In order to quantify the per-subgroup differences in how an active learning strategy queries spatial regions, we introduce a notion of uncertainty bias (based on disparate impact) to measure the discrepancy in the confidence for a model's predictions between one subgroup and another. Using the uncertainty bias measure, we show that our query explanations accurately reflect the subgroup focus of the active learning queries, allowing for an interpretable explanation of what is being learned as points with similar sources of uncertainty have their uncertainty bias resolved. We demonstrate that this technique can be applied to track uncertainty bias over user-defined clusters or automatically generated clusters based on the source of uncertainty.

## Using Program Induction to Interpret Transition System Dynamics

(PDF)

Authors:Svetlin Penkov, Subramanian Ramamoorthy

Presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia. arXiv admin note: substantial text overlap with arXiv:1705.08320

Subjects:

Artificial Intelligence (cs.AI)

Cite as:

arXiv:1708.00376 [cs.AI]

(or arXiv:1708.00376v1 [cs.AI] for this version)

Abstract: Explaining and reasoning about processes which underlie observed black-box phenomena enables the discovery of causal mechanisms, derivation of suitable abstract representations and the formulation of more robust predictions. We propose to learn high level functional programs in order to represent abstract models which capture the invariant structure in the observed data. We introduce the $\pi$-machine (program-induction machine) -- an architecture able to induce interpretable LISP-like programs from observed data traces. We propose an optimisation procedure for program learning based on backpropagation, gradient descent and A* search. We apply the proposed method to two problems: system identification of dynamical systems and explaining the behaviour of a DQN agent. Our results show that the $\pi$-machine can efficiently induce interpretable programs from individual data traces.

## Warp: a method for neural network interpretability applied to gene expression profiles

(PDF)

Authors:Trofimov Assya, Lemieux Sebastien, Perreault Claude

5 pages, 3 figures, NIPS2016, Machine Learning in Computational Biology workshop

Subjects:

Genomics (q-bio.GN); Artificial Intelligence (cs.AI)

Cite as:

arXiv:1708.04988 [q-bio.GN]

(or arXiv:1708.04988v1 [q-bio.GN] for this version)

Abstract: We show a proof of principle for warping, a method to interpret the inner workings of neural networks in the context of gene expression analysis. Warping is an efficient way to gain insight into the inner workings of neural nets and make them more interpretable. We demonstrate the ability of warping to recover meaningful information for a given class on a sample-specific individual basis. We found warping works well in both linearly and nonlinearly separable datasets. These encouraging results show that warping has the potential to be the answer to neural network interpretability in computational biology.

## DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation of Self-Reported Pain

(PDF)

Authors:Dianbo Liu, Fengjiao Peng, Andrew Shea, Ognjen (Oggi) Rudovic, Rosalind Picard

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

Cite as:

arXiv:1708.04670 [cs.CV]

(or arXiv:1708.04670v1 [cs.CV] for this version)

Abstract: Previous research on automatic pain estimation from facial expressions has focused primarily on "one-size-fits-all" metrics (such as PSPI). In this work, we focus on directly estimating each individual's self-reported visual-analog scale (VAS) pain metric, as this is considered the gold standard for pain measurement. The VAS pain score is highly subjective and context-dependent, and its range can vary significantly among different persons. To tackle these issues, we propose a novel two-stage personalized model, named DeepFaceLIFT, for automatic estimation of VAS. This model is based on (1) Neural Network and (2) Gaussian process regression models, and is used to personalize the estimation of self-reported pain via a set of hand-crafted personal features and multi-task learning. We show on the benchmark dataset for pain analysis (The UNBC-McMaster Shoulder Pain Expression Archive) that the proposed personalized model largely outperforms the traditional, unpersonalized models: the intra-class correlation improves from a baseline performance of 19% to a personalized performance of 35% while also providing confidence in the model's estimates -- in contrast to existing models for the target task. Additionally, DeepFaceLIFT automatically discovers the pain-relevant facial regions for each person, allowing for an easy interpretation of the pain-related facial cues.

## Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples

(PDF)

Authors:Yinpeng Dong, Hang Su, Jun Zhu, Fan Bao

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1708.05493 [cs.CV]

(or arXiv:1708.05493v1 [cs.CV] for this version)

Abstract: Deep neural networks (DNNs) have demonstrated impressive performance on a wide array of tasks, but they are usually considered opaque since internal structure and learned parameters are not interpretable. In this paper, we re-examine the internal representations of DNNs using adversarial images, which are generated by an ensemble-optimization algorithm. We find that: (1) the neurons in DNNs do not truly detect semantic objects/parts, but respond to objects/parts only as recurrent discriminative patches; (2) deep visual representations are not robust distributed codes of visual concepts because the representations of adversarial images are largely not consistent with those of real images, although they have similar visual appearance, both of which are different from previous findings. To further improve the interpretability of DNNs, we propose an adversarial training scheme with a consistent loss such that the neurons are endowed with human-interpretable concepts. The induced interpretable representations enable us to trace eventual outcomes back to influential neurons. Therefore, human users can know how the models make predictions, as well as when and why they make errors.

## More cat than cute? Interpretable Prediction of Adjective-Noun Pairs

(PDF)

Authors:Delia Fernandez, Alejandro Woodward, Victor Campos, Xavier Giro-i-Nieto, Brendan Jou, Shih-Fu Chang

Oral paper at ACM Multimedia 2017 Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes (MUSA2)

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

DOI:

10.1145/3132515.3132520

Cite as:

arXiv:1708.06039 [cs.CV]

(or arXiv:1708.06039v1 [cs.CV] for this version)

Abstract: The increasing availability of affect-rich multimedia resources has bolstered interest in understanding sentiment and emotions in and from visual content. Adjective-noun pairs (ANP) are a popular mid-level semantic construct for capturing affect via visually detectable concepts such as "cute dog" or "beautiful landscape". Current state-of-the-art methods approach ANP prediction by considering each of these compound concepts as individual tokens, ignoring the underlying relationships in ANPs. This work aims at disentangling the contributions of the 'adjectives' and 'nouns' in the visual prediction of ANPs. Two specialised classifiers, one trained for detecting adjectives and another for nouns, are fused to predict 553 different ANPs. The resulting ANP prediction model is more interpretable as it allows us to study contributions of the adjective and noun components. Source code and models are available at this https URL.

## Interpretable Categorization of Heterogeneous Time Series Data

(PDF)

Authors:Ritchie Lee, Mykel J. Kochenderfer, Ole J. Mengshoel, Joshua Silbermann

10 pages, 7 figures

Subjects:

Learning (cs.LG)

Cite as:

arXiv:1708.09121 [cs.LG]

(or arXiv:1708.09121v1 [cs.LG] for this version)

Abstract: The explanation of heterogeneous multivariate time series data is a central problem in many applications. The problem requires two major data mining challenges to be addressed simultaneously: learning models that are human-interpretable and mining heterogeneous multivariate time series data. The intersection of these two areas is not adequately explored in the existing literature. To address this gap, we propose grammar-based decision trees and an algorithm for learning them. A grammar-based decision tree extends decision trees with a grammar framework. Logical expressions, derived from a context-free grammar, are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. By choosing a grammar based on temporal logic, we show that grammar-based decision trees can be used for the interpretable classification of high-dimensional and heterogeneous time series data. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to analyze the classic Australian Sign Language dataset as well as categorize and explain near mid-air collisions to support the development of a prototype aircraft collision avoidance system.

## Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models

(PDF)

Authors:Wojciech Samek, Thomas Wiegand, Klaus-Robert Müller

8 pages, 2 figures

Subjects:

Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Cite as:

arXiv:1708.08296 [cs.AI]

(or arXiv:1708.08296v1 [cs.AI] for this version)

Abstract: With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classification, sentiment analysis, speech understanding or strategic game playing. However, because of their nested non-linear structure, these highly successful machine learning and artificial intelligence models are usually applied in a black box manner, i.e., no information is provided about what exactly makes them arrive at their predictions. Since this lack of transparency can be a major drawback, e.g., in medical applications, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables. These methods are evaluated on three classification tasks.
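
The first of the two approaches mentioned in the abstract, sensitivity of the prediction with respect to the input, can be illustrated on a toy model. The sketch below is not the paper's code: it assumes a logistic-regression classifier so that the gradient can be written analytically, and it scores each feature by its squared partial derivative, which is one common choice. The weights `w`, bias `b`, and input `x` are hypothetical values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    """Toy differentiable classifier: logistic regression on one input vector."""
    return sigmoid(np.dot(w, x) + b)

def sensitivity(x, w, b):
    """Squared partial derivative of the prediction w.r.t. each input feature."""
    p = predict(x, w, b)
    grad = p * (1.0 - p) * w  # d sigmoid(w.x + b) / dx
    return grad ** 2

# Hypothetical weights and input, for illustration only.
w = np.array([2.0, 0.0, -1.0])
b = 0.0
x = np.array([0.5, 1.0, 1.0])

scores = sensitivity(x, w, b)
print(scores)  # one relevance score per input feature
```

A feature whose weight is zero receives a score of zero here, because changing it cannot move the prediction. For a deep network the same quantity would normally be obtained with automatic differentiation rather than a hand-written gradient.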

## Interpreting Shared Deep Learning Models via Explicable Boundary Trees

(PDF)

Authors:Huijun Wu, Chen Wang, Jie Yin, Kai Lu, Liming Zhu

9 pages, 10 figures

Subjects:

Learning (cs.LG); Human-Computer Interaction (cs.HC)

Cite as:

arXiv:1709.03730 [cs.LG]

(or arXiv:1709.03730v1 [cs.LG] for this version)

Abstract: Despite outperforming the human in many tasks, deep neural network models are also criticized for the lack of transparency and interpretability in decision making. The opaqueness results in uncertainty and low confidence when deploying such a model in model sharing scenarios, when the model is developed by a third party. For a supervised machine learning model, sharing training process including training data provides an effective way to gain trust and to better understand model predictions. However, it is not always possible to share all training data due to privacy and policy constraints. In this paper, we propose a method to disclose a small set of training data that is just sufficient for users to get the insight of a complicated model. The method constructs a boundary tree using selected training data and the tree is able to approximate the complicated model with high fidelity. We show that traversing data points in the tree gives users significantly better understanding of the model and paves the way for trustworthy model sharing.

## Interpretable Graph-Based Semi-Supervised Learning via Flows

(PDF)

Authors:Raif M. Rustamov, James T. Klosowski

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1709.04764 [stat.ML]

(or arXiv:1709.04764v1 [stat.ML] for this version)

Abstract: In this paper, we consider the interpretability of the foundational Laplacian-based semi-supervised learning approaches on graphs. We introduce a novel flow-based learning framework that subsumes the foundational approaches and additionally provides a detailed, transparent, and easily understood expression of the learning process in terms of graph flows. As a result, one can visualize and interactively explore the precise subgraph along which the information from labeled nodes flows to an unlabeled node of interest. Surprisingly, the proposed framework avoids trading accuracy for interpretability, but in fact leads to improved prediction accuracy, which is supported both by theoretical considerations and empirical results. The flow-based framework guarantees the maximum principle by construction and can handle directed graphs in an out-of-the-box manner.
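
For context, the foundational Laplacian-based approach that the abstract refers to can be sketched in a few lines. This is the classical harmonic solution on an undirected graph, not the flow-based framework proposed in the paper. The function name `harmonic_labels` and the small path graph are assumptions made for illustration.

```python
import numpy as np

def harmonic_labels(W, labeled_idx, labeled_vals):
    """Solve the Laplacian (harmonic) system for the unlabeled nodes of an undirected graph."""
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    labeled_vals = np.asarray(labeled_vals, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian L = D - W
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]

    # Harmonic condition on unlabeled nodes: L_uu f_u + L_ul f_l = 0
    f_u = np.linalg.solve(L_uu, -L_ul @ labeled_vals)

    f = np.zeros(n)
    f[labeled_idx] = labeled_vals
    f[unlabeled_idx] = f_u
    return f

# Hypothetical path graph 0 - 1 - 2 - 3 with unit weights; the endpoints are labeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

f = harmonic_labels(W, labeled_idx=[0, 3], labeled_vals=[0.0, 1.0])
print(f)
```

On this graph the unlabeled values interpolate between the two labels and stay inside the range of the labeled values, which is the maximum principle mentioned in the abstract.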

## Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

(PDF)

Authors:Wei-Ning Hsu, Yu Zhang, James Glass

Accepted to NIPS 2017

Subjects:

Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

Cite as:

arXiv:1709.07902 [cs.LG]

(or arXiv:1709.07902v1 [cs.LG] for this version)

Abstract: We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors to different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.

## CTD: Fast, Accurate, and Interpretable Method for Static and Dynamic Tensor Decompositions

(PDF)

Authors:Jungwoo Lee, Dongjin Choi, Lee Sael

Subjects:

Numerical Analysis (cs.NA); Learning (cs.LG); Machine Learning (stat.ML)

Cite as:

arXiv:1710.03608 [cs.NA]

(or arXiv:1710.03608v1 [cs.NA] for this version)

Abstract: How can we find patterns and anomalies in a tensor, or multi-dimensional array, in an efficient and directly interpretable way? How can we do this in an online environment, where a new tensor arrives each time step? Finding patterns and anomalies in a tensor is a crucial problem with many applications, including building safety monitoring, patient health monitoring, cyber security, terrorist detection, and fake user detection in social networks. Standard PARAFAC and Tucker decomposition results are not directly interpretable. Although a few sampling-based methods have previously been proposed towards better interpretability, they need to be made faster, more memory efficient, and more accurate. In this paper, we propose CTD, a fast, accurate, and directly interpretable tensor decomposition method based on sampling. CTD-S, the static version of CTD, provably guarantees a high accuracy that is 17 ~ 83x more accurate than that of the state-of-the-art method. Also, CTD-S is made 5 ~ 86x faster, and 7 ~ 12x more memory-efficient than the state-of-the-art method by removing redundancy. CTD-D, the dynamic version of CTD, is the first interpretable dynamic tensor decomposition method ever proposed. Also, it is made 2 ~ 3x faster than already fast CTD-S by exploiting factors at previous time step and by reordering operations. With CTD, we demonstrate how the results can be effectively interpreted in the online distributed denial of service (DDoS) attack detection.

## Interpretable Convolutional Neural Networks

(PDF)

Authors:Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1710.00935 [cs.CV]

(or arXiv:1710.00935v3 [cs.CV] for this version)

Abstract: This paper proposes a method to modify traditional convolutional neural networks (CNNs) into interpretable CNNs, in order to clarify knowledge representations in high conv-layers of CNNs. In an interpretable CNN, each filter in a high conv-layer represents a certain object part. We do not need any annotations of object parts or textures to supervise the learning process. Instead, the interpretable CNN automatically assigns each filter in a high conv-layer with an object part during the learning process. Our method can be applied to different types of CNNs with different structures. The clear knowledge representation in an interpretable CNN can help people understand the logics inside a CNN, i.e., based on which patterns the CNN makes the decision. Experiments showed that filters in an interpretable CNN were more semantically meaningful than those in traditional CNNs.

## Interpretable Machine Learning for Privacy-Preserving Pervasive Systems

(PDF)

Authors:Benjamin Baron, Mirco Musolesi

Subjects:

Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)

Cite as:

arXiv:1710.08464 [stat.ML]

(or arXiv:1710.08464v3 [stat.ML] for this version)

Abstract: The presence of pervasive systems in our everyday lives and the interaction of users with connected devices such as smartphones or home appliances generate increasing amounts of traces that reflect users' behavior. A plethora of machine learning techniques enable service providers to process these traces to extract latent information about the users. While most of the existing projects have focused on the accuracy of these techniques, little work has been done on the interpretation of the inference and identification algorithms based on them. In this paper, we propose a machine learning interpretability framework for inference algorithms based on data collected through pervasive systems and we outline the open challenges in this research area. Our interpretability framework enables users to understand how the traces they generate could expose their privacy, while allowing for usable and personalized services at the same time.

## InterpNET: Neural Introspection for Interpretable Deep Learning

(PDF)

Authors:Shane Barratt

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1710.09511 [stat.ML]

(or arXiv:1710.09511v2 [stat.ML] for this version)

Abstract: Humans are able to explain their reasoning. On the contrary, deep neural networks are not. This paper attempts to bridge this gap by introducing a new way to design interpretable neural networks for classification, inspired by physiological evidence of the human visual system's inner-workings. This paper proposes a neural network design paradigm, termed InterpNET, which can be combined with any existing classification architecture to generate natural language explanations of the classifications. The success of the module relies on the assumption that the network's computation and reasoning is represented in its internal layer activations. While in principle InterpNET could be applied to any existing classification architecture, it is evaluated via an image classification and explanation task. Experiments on a CUB bird classification and explanation dataset show qualitatively and quantitatively that the model is able to generate high-quality explanations. While the current state-of-the-art METEOR score on this dataset is 29.2, InterpNET achieves a much higher METEOR score of 37.9.

## MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural Networks

(PDF)

Authors:Minmin Chen

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1711.06788 [stat.ML]

(or arXiv:1711.06788v1 [stat.ML] for this version)

Abstract: We introduce MinimalRNN, a new recurrent neural network architecture that achieves performance comparable to the popular gated RNNs with a simplified structure. It employs minimal updates within RNN, which not only leads to efficient learning and testing but more importantly better interpretability and trainability. We demonstrate that by endorsing the more restrictive update rule, MinimalRNN learns disentangled RNN states. We further examine the learning dynamics of different RNN structures using input-output Jacobians, and show that MinimalRNN is able to capture longer range dependencies than existing RNN architectures.
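
The abstract does not spell out the update rule, so the following is only a sketch of what a single-gate "minimal update" recurrent cell can look like. The specific form (a tanh input embedding `z`, one sigmoid update gate `u`, and a convex mix of the previous state and `z`) is an assumption for illustration, and the paper's exact formulation may differ. All parameter names and shapes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minimal_cell_step(x, h_prev, params):
    """One step of a single-gate recurrent cell (assumed form, not taken from the abstract)."""
    # Embed the input into the latent space.
    z = np.tanh(params["W_x"] @ x + params["b_z"])
    # Single update gate computed from the previous state and the embedded input.
    u = sigmoid(params["U_h"] @ h_prev + params["U_z"] @ z + params["b_u"])
    # Convex mix: each state unit either keeps its old value or moves toward z.
    return u * h_prev + (1.0 - u) * z

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
params = {
    "W_x": rng.normal(size=(d_h, d_in)),
    "b_z": np.zeros(d_h),
    "U_h": rng.normal(size=(d_h, d_h)),
    "U_z": rng.normal(size=(d_h, d_h)),
    "b_u": np.zeros(d_h),
}

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):  # a hypothetical sequence of 5 inputs
    h = minimal_cell_step(x, h, params)
print(h)
```

Because each step is an element-wise convex combination of the previous state and a tanh output, every state unit stays in [-1, 1] when the initial state does, which makes the role of the gate easy to inspect.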

## Beyond Sparsity: Tree Regularization of Deep Models for Interpretability

(PDF)

Authors:Mike Wu, Michael C. Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez

To appear in AAAI 2018. Contains 9-page main paper and appendix with supplementary material

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1711.06178 [stat.ML]

(or arXiv:1711.06178v1 [stat.ML] for this version)

Abstract: The lack of interpretability remains a key barrier to the adoption of deep models in many applications. In this work, we explicitly regularize deep models so human users might step through the process behind their predictions in little time. Specifically, we train deep time-series models so their class-probability predictions have high accuracy while being closely modeled by decision trees with few nodes. Using intuitive toy examples as well as medical tasks for treating sepsis and HIV, we demonstrate that this new tree regularization yields models that are easier for humans to simulate than simpler L1 or L2 penalties without sacrificing predictive power.

## The Promise and Peril of Human Evaluation for Model Interpretability

(PDF)

Authors:Bernease Herman

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Cite as:

arXiv:1711.07414 [cs.AI]

(or arXiv:1711.07414v1 [cs.AI] for this version)

Abstract: Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications. This alone presents a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations that threaten transparency. Finally, we propose two potential research directions to disambiguate cognitive function and explanation models, retaining control over the tradeoff between accuracy and interpretability.

## Unleashing the Potential of CNNs for Interpretable Few-Shot Learning

(PDF)

Authors:Boyang Deng, Qing Liu, Siyuan Qiao, Alan Yuille

Under review as a conference paper at ICLR 2018

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

Cite as:

arXiv:1711.08277 [cs.CV]

(or arXiv:1711.08277v1 [cs.CV] for this version)

Abstract: Convolutional neural networks (CNNs) have been generally acknowledged as one of the driving forces for the advancement of computer vision. Despite their promising performances on many tasks, CNNs still face major obstacles on the road to achieving ideal machine intelligence. One is the difficulty of interpreting them and understanding their inner workings, which is important for diagnosing their failures and correcting them. Another is that standard CNNs require large amounts of annotated data, which is sometimes very hard to obtain. Hence, it is desirable to enable them to learn from few examples. In this work, we address these two limitations of CNNs by developing novel and interpretable models for few-shot learning. Our models are based on the idea of encoding objects in terms of visual concepts, which are interpretable visual cues represented within CNNs. We first use qualitative visualizations and quantitative statistics, to uncover several key properties of feature encoding using visual concepts. Motivated by these properties, we present two intuitive models for the problem of few-shot learning. Experiments show that our models achieve competitive performances, while being much more flexible and interpretable than previous state-of-the-art few-shot learning methods. We conclude that visual concepts expose the natural capability of CNNs for few-shot learning.

## Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action Recognition

(PDF)

Authors:Jingxuan Hou, Tae Soo Kim, Austin Reiter

8 pages, 8 figures, CVPR18 submission

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1711.08502 [cs.CV]

(or arXiv:1711.08502v1 [cs.CV] for this version)

Abstract: Despite the growing discriminative capabilities of modern deep learning methods for recognition tasks, the inner workings of the state-of-the-art models still remain mostly black boxes. In this paper, we propose a systematic interpretation of model parameters and hidden representations of Residual Temporal Convolutional Networks (Res-TCN) for action recognition in time-series data. We also propose a Feature Map Decoder as part of the interpretation analysis, which outputs a representation of the model's hidden variables in the same domain as the input. Such analysis empowers us to expose the model's characteristic learning patterns in an interpretable way. For example, through the diagnosis analysis, we discovered that our model has learned to achieve view-point invariance by implicitly learning to perform rotational normalization of the input to a more discriminative view. Based on the findings from the model interpretation analysis, we propose a targeted refinement technique, which can generalize to various other recognition models. The proposed work introduces a three-stage paradigm for model learning: training, interpretable diagnosis and targeted refinement. We validate our approach on the skeleton-based 3D human action recognition benchmark NTU RGB+D. We show that the proposed workflow is an effective model learning strategy and the resulting Multi-stream Residual Temporal Convolutional Network (MS-Res-TCN) achieves the state-of-the-art performance on NTU RGB+D.

## SPINE: SParse Interpretable Neural Embeddings

(PDF)

Authors:Anant Subramanian, Danish Pruthi, Harsh Jhamtani, Taylor Berg-Kirkpatrick, Eduard Hovy

AAAI 2018

Subjects:

Computation and Language (cs.CL)

Cite as:

arXiv:1711.08792 [cs.CL]

(or arXiv:1711.08792v1 [cs.CL] for this version)

Abstract: Prediction without justification has limited utility. Much of the success of neural models can be attributed to their ability to learn rich, dense and expressive representations. While these representations capture the underlying complexity and latent trends in the data, they are far from being interpretable. We propose a novel variant of denoising k-sparse autoencoders that generates highly efficient and interpretable distributed word representations (word embeddings), beginning with existing word representations from state-of-the-art methods like GloVe and word2vec. Through large scale human evaluation, we report that our resulting word embeddings are much more interpretable than the original GloVe and word2vec embeddings. Moreover, our embeddings outperform existing popular word embeddings on a diverse suite of benchmark downstream tasks.
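
The sparsity idea behind k-sparse autoencoders can be shown with the generic hard top-k step: keep the k largest hidden activations per example and zero out the rest. This sketch is not SPINE's objective, since the abstract only says the method is a variant of denoising k-sparse autoencoders and the paper's mechanism may differ. The helper name `k_sparse` and the example activations are assumptions.

```python
import numpy as np

def k_sparse(h, k):
    """Keep the k largest activations in each row of h and zero out the rest."""
    h = np.asarray(h, dtype=float)
    # Column indices of the k largest entries per row (order within the k is arbitrary).
    top = np.argpartition(h, -k, axis=1)[:, -k:]
    mask = np.zeros(h.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return np.where(mask, h, 0.0)

# Hypothetical hidden activations for two inputs.
hidden = np.array([[0.1, 0.9, 0.3, 0.7],
                   [0.5, 0.2, 0.8, 0.4]])

sparse = k_sparse(hidden, k=2)
print(sparse)
```

With only a few active dimensions per word, each dimension can be inspected by listing the words that activate it most strongly, which is the sense in which sparse codes tend to be easier to interpret.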

## Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients

(PDF)

Authors:Andrew Slavin Ross, Finale Doshi-Velez

To appear in AAAI 2018

Subjects:

Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1711.09404 [cs.LG]

(or arXiv:1711.09404v1 [cs.LG] for this version)

Abstract: Deep neural networks have proven remarkably effective at solving many classification problems, but have been criticized recently for two major weaknesses: the reasons behind their predictions are uninterpretable, and the predictions themselves can often be fooled by small adversarial perturbations. These problems pose major obstacles for the adoption of neural networks in domains that require security or transparency. In this work, we evaluate the effectiveness of defenses that differentiably penalize the degree to which small changes in inputs can alter model predictions. Across multiple attacks, architectures, defenses, and datasets, we find that neural networks trained with this input gradient regularization exhibit robustness to transferred adversarial examples generated to fool all of the other models. We also find that adversarial examples generated to fool gradient-regularized models fool all other models equally well, and actually lead to more "legitimate," interpretable misclassifications as rated by people (which we confirm in a human subject experiment). Finally, we demonstrate that regularizing input gradients makes them more naturally interpretable as rationales for model predictions. We conclude by discussing this relationship between interpretability and robustness in deep neural networks.
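
The penalty described in the abstract, a differentiable term on how much small input changes can alter the prediction, can be made concrete on a toy model. The sketch below assumes a logistic-regression classifier, for which the gradient of the cross-entropy with respect to the input has a closed form, and adds its squared norm to the loss. The function name, the weight `lam`, and all numeric values are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_with_input_gradient_penalty(x, y, w, b, lam):
    """Cross-entropy of a logistic model plus lam * ||d(cross-entropy)/dx||^2."""
    p = sigmoid(np.dot(w, x) + b)
    cross_entropy = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    # Analytic gradient of the cross-entropy w.r.t. the input x.
    input_grad = (p - y) * w
    penalty = np.sum(input_grad ** 2)
    return cross_entropy + lam * penalty, cross_entropy, penalty

# Hypothetical model and example, for illustration only.
w = np.array([3.0, -4.0])
b = 0.0
x = np.array([0.0, 0.0])
y = 1.0

total, ce, penalty = loss_with_input_gradient_penalty(x, y, w, b, lam=0.1)
print(total, ce, penalty)
```

In actual training the total would be minimized with respect to the model parameters, usually with automatic differentiation. The same input gradient that is penalized here is the quantity the abstract describes as becoming more interpretable as a rationale.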

## Interpretable Convolutional Neural Networks for Effective Translation Initiation Site Prediction

(PDF)

Authors:Jasper Zuallaert, Mijung Kim, Yvan Saeys, Wesley De Neve

Presented at International Workshop on Deep Learning in Bioinformatics, Biomedicine, and Healthcare Informatics (DLB2H 2017) --- in conjunction with the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2017)

Subjects:

Genomics (q-bio.GN); Learning (cs.LG)

Cite as:

arXiv:1711.09558 [q-bio.GN]

(or arXiv:1711.09558v1 [q-bio.GN] for this version)

Abstract: Thanks to rapidly evolving sequencing techniques, the amount of genomic data at our disposal is growing increasingly large. Determining the gene structure is a fundamental requirement to effectively interpret gene function and regulation. An important part in that determination process is the identification of translation initiation sites. In this paper, we propose a novel approach for automatic prediction of translation initiation sites, leveraging convolutional neural networks that allow for automatic feature extraction. Our experimental results demonstrate that we are able to improve the state-of-the-art approaches with a decrease of 75.2% in false positive rate and with a decrease of 24.5% in error rate on chosen datasets. Furthermore, an in-depth analysis of the decision-making process used by our predictive model shows that our neural network implicitly learns biologically relevant features from scratch, without any prior knowledge about the problem at hand, such as the Kozak consensus sequence, the influence of stop and start codons in the sequence and the presence of donor splice site patterns. In summary, our findings yield a better understanding of the internal reasoning of a convolutional neural network when applying such a neural network to genomic data.

## Interpretable Facial Relational Network Using Relational Importance

(PDF)

Authors:Seong Tae Kim, Yong Man Ro

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as:

arXiv:1711.10688 [cs.CV]

(or arXiv:1711.10688v1 [cs.CV] for this version)

Abstract: Human face analysis is an important task in computer vision. According to cognitive-psychological studies, facial dynamics could provide crucial cues for face analysis. In particular, the motion of facial local regions in facial expression is related to the motion of other facial regions. In this paper, a novel deep learning approach which exploits the relations of facial local dynamics has been proposed to estimate facial traits from expression sequence. In order to exploit the relations of facial dynamics in local regions, the proposed network consists of a facial local dynamic feature encoding network and a facial relational network. The facial relational network is designed to be interpretable. Relational importance is automatically encoded and facial traits are estimated by combining relational features based on the relational importance. The relations of facial dynamics for facial trait estimation could be interpreted by using the relational importance. By comparative experiments, the effectiveness of the proposed method has been validated. Experimental results show that the proposed method outperforms the state-of-the-art methods in gender and age estimation.

## An interpretable latent variable model for attribute applicability in the Amazon catalogue

(PDF)

Authors:Tammo Rukat, Dustin Lange, Cédric Archambeau

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)

Cite as:

arXiv:1712.00126 [stat.ML]

(or arXiv:1712.00126v2 [stat.ML] for this version)

Abstract: Learning attribute applicability of products in the Amazon catalog (e.g., predicting that a shoe should have a value for size, but not for battery-type) at scale is a challenge. The need for an interpretable model is contingent on (1) the lack of ground truth training data, (2) the need to utilise prior information about the underlying latent space and (3) the ability to understand the quality of predictions on new, unseen data. To this end, we develop the MaxMachine, a probabilistic latent variable model that learns distributed binary representations, associated to sets of features that are likely to co-occur in the data. Layers of MaxMachines can be stacked such that higher layers encode more abstract information. Any set of variables can be clamped to encode prior information. We develop fast sampling-based posterior inference. Preliminary results show that the model improves over the baseline in 17 out of 19 product groups and provides qualitatively reasonable predictions.

## Where Classification Fails, Interpretation Rises

(PDF)

Authors:Chanh Nguyen, Georgi Georgiev, Yujie Ji, Ting Wang

6 pages, 6 figures

Subjects:

Learning (cs.LG); Machine Learning (stat.ML)

Cite as:

arXiv:1712.00558 [cs.LG]

(or arXiv:1712.00558v1 [cs.LG] for this version)

Abstract: An intriguing property of deep neural networks is their inherent vulnerability to adversarial inputs, which significantly hinders their application in security-critical domains. Most existing detection methods attempt to use carefully engineered patterns to distinguish adversarial inputs from their genuine counterparts, which, however, can often be circumvented by adaptive adversaries. In this work, we take a completely different route by leveraging the definition of adversarial inputs: while deceptive to deep neural networks, they are barely discernible to human vision. Building upon recent advances in interpretable models, we construct a new detection framework that contrasts an input's interpretation against its classification. We validate the efficacy of this framework through extensive experiments using benchmark datasets and attacks. We believe that this work opens a new direction for designing adversarial input detection methods.

## SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties

(PDF)

Authors:Garrett B. Goh, Nathan O. Hodas, Charles Siegel, Abhinav Vishnu

Subjects:

Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

Cite as:

arXiv:1712.02034 [stat.ML]

(or arXiv:1712.02034v1 [stat.ML] for this version)

Abstract: Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software packages. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2Vec, a deep RNN that automatically learns features from SMILES strings to predict chemical properties, without the need for additional explicit chemical information, or the "grammar" of how SMILES encode structural data. Using Bayesian optimization methods to tune the network architecture, we show that an optimized SMILES2Vec model can serve as a general-purpose neural network for learning a range of distinct chemical properties including toxicity, activity, solubility and solvation energy, while outperforming contemporary MLP networks that use engineered features. Furthermore, we demonstrate proof-of-concept of interpretability by developing an explanation mask that localizes on the most important characters used in making a prediction. When tested on the solubility dataset, this localization identifies specific parts of a chemical that are consistent with established first-principles knowledge of solubility with an accuracy of 88%, demonstrating that neural networks can learn technically accurate chemical concepts. The fact that SMILES2Vec validates established chemical facts, while providing state-of-the-art accuracy, makes it a potential tool for widespread adoption of interpretable deep learning by the chemistry community.
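
To make the notion of learning from raw SMILES characters more concrete, the sketch below encodes a SMILES string as a padded sequence of integer character indices and then picks the highest-scoring characters from a per-character importance vector. It is only an illustration under stated assumptions: the function names are hypothetical, the importance values are hand-written placeholders rather than model output, and the abstract does not describe how SMILES2Vec actually tokenizes its input or computes its explanation mask.

```python
def build_vocab(smiles_list):
    """Map every distinct character to an integer id; 0 is reserved for padding."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(smiles, vocab, max_len):
    """Turn a SMILES string into a fixed-length list of character ids."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))  # right-pad with zeros

def top_k_characters(smiles, importance, k):
    """Return (position, character) pairs for the k most important characters."""
    ranked = sorted(range(len(smiles)),
                    key=lambda i: importance[i], reverse=True)[:k]
    return [(i, smiles[i]) for i in sorted(ranked)]

# Ethanol and benzene as toy inputs.
vocab = build_vocab(["CCO", "c1ccccc1"])
encoded = encode("CCO", vocab, max_len=5)

# Placeholder importance scores, one per character (NOT model output).
highlighted = top_k_characters("CCO", [0.1, 0.2, 0.7], k=1)
```

Note that a purely character-level encoding like this one splits multi-character atom symbols such as `Cl` into two tokens, so a real pipeline might prefer an atom-aware tokenizer.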