Towards Few-shot Self-explaining Graph Neural Networks (2024)

1 State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China, Hefei, China
2 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Email: {jypeng28,lnyue,zaixi,percy}@mail.ustc.edu.cn; {qiliuql,kkzhang08}@ustc.edu.cn

Jingyu Peng1, Qi Liu1,2, Linan Yue1, Zaixi Zhang1, Kai Zhang1, Yunhao Sha1

Abstract

Recent advancements in Graph Neural Networks (GNNs) have spurred an upsurge of research dedicated to enhancing the explainability of GNNs, particularly in critical domains such as medicine. A promising approach is the self-explaining method, which outputs explanations along with predictions. However, existing self-explaining models require a large amount of training data, rendering them inapplicable in few-shot scenarios. To address this challenge, in this paper, we propose a Meta-learned Self-Explaining GNN (MSE-GNN), a novel framework that generates explanations to support predictions in few-shot settings. MSE-GNN adopts a two-stage self-explaining structure, consisting of an explainer and a predictor. Specifically, the explainer first imitates the attention mechanism of humans to select the explanation subgraph, whereby attention is naturally paid to regions containing important characteristics. Subsequently, the predictor mimics the decision-making process, which makes predictions based on the generated explanation. Moreover, with a novel meta-training process and a designed mechanism that exploits task information, MSE-GNN can achieve remarkable performance on new few-shot tasks. Extensive experimental results on four datasets demonstrate that MSE-GNN achieves superior performance on prediction tasks while generating higher-quality explanations than existing methods. The code is publicly available at https://github.com/jypeng28/MSE-GNN.

Keywords:

Explainability · Graph Neural Network · Meta Learning.

1 Introduction

Due to the widespread presence of graph data in diverse domains [49, 48], Graph Neural Networks (GNNs) [14, 36, 6] are attracting increasing attention from the research community. Leveraging the message passing paradigm, GNNs have exhibited remarkable efficacy across multiple scenarios, including molecule property prediction [35], social network analysis [2, 45], and recommender systems [4]. Despite these successes, a significant drawback of GNN models is their lack of explainability, which makes it difficult for humans to understand the basis of their predictions. This limitation undermines complete trust in GNN predictions, consequently restricting their application in high-stakes scenarios such as the medical [50] and finance [24] fields. Furthermore, the European Union has explicitly emphasized the necessity of explainability for trustworthy AI [28], and many studies on explainability have been conducted in other fields [41, 43]. Therefore, there is an immediate and pressing need for research into the explainability of GNNs.

[Figure 1: The "explainer-predictor" two-stage self-explaining paradigm.]

The field of GNN explainability has witnessed substantial scholarly attention [17, 21, 30, 16]. Generally, research on the explainability of GNN can be divided into two main categories: post-hoc explanations and self-explaining methods [40]. Among them, the post-hoc explanation strives to elucidate the predictions made by a trained GNN model. Typically, this is achieved by leveraging another explanatory model to select a subset of input as the explanation for GNN prediction. Despite their utility, these post-hoc explainers often fall short of revealing the actual reasoning process of the model [25] and require optimization for each input graph, which is time-consuming. Therefore, in this paper, we focus on self-explaining methods.

The self-explaining method refers to intrinsically explainable GNN models that offer predictions and explanations concurrently, with the prediction being rooted in the explanation. One prevalent type of self-explaining model adopts an "explainer-predictor" two-stage paradigm, as illustrated in Figure 1. This paradigm contains two stages: the explainer, which generates an explanation for each input graph, and the predictor, which makes predictions based on the generated explanation [37, 17].

Although self-explaining methods for GNNs are promising, they rely heavily on extensive training data, which restricts their applicability in situations with limited data. For instance, during new drug discovery, clinical trials are conducted to assess various drug attributes such as toxicity and side effects. Due to safety concerns, the number of participants in these trials is restricted, resulting in limited experimental data. In such few-shot scenarios, existing self-explaining models fail to achieve satisfactory performance, while existing few-shot learning methods lack explainability. Hence, there is a pressing need to design a self-explaining GNN for few-shot scenarios.

Drawing on the fundamental human intelligence traits of rapid learning and self-explainability [23, 29, 7], we develop Meta-learned Self-Explaining GNN (MSE-GNN) for few-shot scenarios:

  I. During classification tasks, humans initially concentrate on regions that contain crucial features and subsequently perform classification based on these features, adhering to a two-stage paradigm [23].

  II. When learning new concepts, humans tend to seek representative instances or prototypes and compare new instances with these prototypes to categorize them [29].

  III. Humans can learn meta-knowledge from a multitude of tasks, which enables them to achieve impressive performance on new tasks with limited data; this is called "learning to learn" [7].

By incorporating these attributes into MSE-GNN, we aim to address the explainability of GNNs in few-shot scenarios and thereby enhance performance on both explanation and prediction tasks.

Specifically, the MSE-GNN model follows the two-stage paradigm depicted in Figure 1, which naturally mimics the human two-stage recognition process mentioned in I. The explainer, composed of a GNN encoder and an MLP, predicts the probability of each node being selected as part of the explanation. Node representations encoded by another GNN encoder are then separated into explanation and non-explanation parts based on the explainer's prediction. Subsequently, the predictor mimics the decision-making process, making predictions based on the explanation with an MLP.
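The two-stage flow described above can be sketched minimally as follows. This is a toy illustration only: plain dot products stand in for the GNN encoders and MLPs, and all names and values are hypothetical.

```python
# Minimal sketch of the "explainer-predictor" two-stage pipeline.
# Dot products stand in for the GNN encoders and MLPs of MSE-GNN.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def explainer(node_embeddings, w_explain):
    # One importance score (soft mask value) per node.
    return [sigmoid(sum(e * w for e, w in zip(h, w_explain)))
            for h in node_embeddings]

def predictor(node_embeddings, mask, w_predict):
    # Mean-pool the mask-weighted node embeddings, then score the graph.
    d = len(node_embeddings[0])
    pooled = [sum(h[k] * m for h, m in zip(node_embeddings, mask)) / len(mask)
              for k in range(d)]
    return sigmoid(sum(p * w for p, w in zip(pooled, w_predict)))

# Toy graph with 3 nodes and 2-dim embeddings (hypothetical values).
H = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
mask = explainer(H, w_explain=[2.0, -2.0])
y_hat = predictor(H, mask, w_predict=[1.0, 1.0])
```

With these toy weights, the first node receives a high mask value and the last a low one, so the prediction is dominated by the "explanation" part of the graph.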

Furthermore, MSE-GNN incorporates a novel mechanism that exploits task information to help select explanations and make predictions. Prototypes, as stated in II, have been proven effective in generating representative representations for each category [31, 46]; therefore, MSE-GNN utilizes the concept of prototypes to generate task information. The training framework of optimization-based meta-learning imitates the "learning to learn" paradigm in III, where models acquire meta-knowledge by learning from a vast array of tasks. One of the most popular and effective such methods is MAML (Model-Agnostic Meta-Learning) [7]. Therefore, we design a new meta-training framework based on MAML to train MSE-GNN.

We conduct extensive experiments on one synthetic dataset [39] and three real-world graph classification datasets [15, 11], which show excellent performance on both prediction and explanation generation.

2 Problem Definition

In this section, we elaborate on the problem definition of our research. Following [20], we formulate the few-shot graph classification problem as N-way K-shot graph classification. Given the dataset $\mathcal{G}=\{(G_1,y_1),(G_2,y_2),\ldots,(G_n,y_n)\}$, each $G_i$ denotes a graph with a node set $V_i$ and an edge set $E_i$, and $n_i$ denotes the number of nodes in the graph. The graph structure is represented by an adjacency matrix $A_i\in\mathbb{R}^{n_i\times n_i}$, and the node attribute matrix is $X_i\in\mathbb{R}^{n_i\times d}$, where $d$ is the dimension of the node attributes.

Then, the dataset is split into a training set $\{G^{train}, y^{train}\}$ and a test set $\{G^{test}, y^{test}\}$ according to the labels $y$, with disjoint label spaces: $y^{train}\cap y^{test}=\varnothing$. During training, a task $\mathcal{T}$ is sampled each time; each task contains a support set $D_{sup}^{train}=\{(G_i^{train},y_i^{train})\}_{i=1}^{s}$ and a query set $D_{que}^{train}=\{(G_i^{train},y_i^{train})\}_{i=1}^{q}$, where $s$ and $q$ stand for the sizes of the support set and the query set respectively. It is noteworthy that the same class space is shared within a task.

In each task, our goal is to optimize the model on the support set $D_{sup}$ and make predictions on the query set $D_{que}$. If a support set contains $N$ classes with $K$ examples per class, we call the problem N-way K-shot. At test time, we first finetune the learned model on the support set $D_{sup}^{test}=\{(G_i^{test},y_i^{test})\}_{i=1}^{s}$ and then report the classification performance of the finetuned model on $D_{que}^{test}=\{(G_i^{test},y_i^{test})\}_{i=1}^{q}$. Overall, the goal of the few-shot graph classification problem is to develop a model that acquires meta-knowledge across $\{G^{train}, y^{train}\}$ and predicts labels for graphs in the test-stage query sets $D_{que}^{test}$.
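The episodic sampling described above can be illustrated concretely. The following is a minimal sketch with toy string "graphs"; the `sample_task` helper and the class split are illustrative, not the paper's code.

```python
# Sketch of N-way K-shot task sampling with disjoint train/test label spaces.
import random

def sample_task(dataset, n_way, k_shot, q_query, rng):
    # Group graphs by label, pick n_way classes, then split each class's
    # sampled graphs into support (first k_shot) and query (rest).
    by_label = {}
    for g, y in dataset:
        by_label.setdefault(y, []).append(g)
    classes = rng.sample(sorted(by_label), n_way)
    support, query = [], []
    for c in classes:
        graphs = rng.sample(by_label[c], k_shot + q_query)
        support += [(g, c) for g in graphs[:k_shot]]
        query += [(g, c) for g in graphs[k_shot:]]
    return support, query

rng = random.Random(0)
data = [(f"g{i}", i % 4) for i in range(40)]      # 4 classes, 10 graphs each
train_classes, test_classes = {0, 1}, {2, 3}      # disjoint label spaces
train_data = [(g, y) for g, y in data if y in train_classes]
support, query = sample_task(train_data, n_way=2, k_shot=3, q_query=2, rng=rng)
```

Each sampled episode thus plays the role of one task $\mathcal{T}$, with the query set held out from the support set within the same class space.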

In the explanation generation task, for each graph $G_i$, a node mask vector $m_i\in[0,1]^{n_i\times 1}$ defines the selected explanation subgraph: a higher value means that the corresponding node is more important for the prediction, and vice versa. Although selecting edges is also a viable way to form explanations, in this paper we focus on node selection for its lower computational complexity.

3 The Proposed MSE-GNN

3.1 Architecture of MSE-GNN

In Figure 2, we show the overall architecture of MSE-GNN, which contains three components: an explainer $g$ that outputs the selected explanation, a predictor $p$ that makes the final prediction, and a graph encoder $f$.

Before presenting the details of MSE-GNN, we first clarify several concepts. Existing works often combine self-explaining methods with the concept of rationale [37, 17]. A rationale in graph data is a subset of nodes or edges that forms a subgraph determining the prediction. Hence, we treat explanation and rationale as equivalent, as they share the same concept.

In MSE-GNN, the input graph is encoded by $f$, and each node $v$ is mapped to a node embedding $h_{(v)}\in\mathbb{R}^{d}$, where $d$ stands for the hidden dimension. The encoder can be any kind of GNN, e.g., GCN [14], GIN [38], or GraphSAGE [10]. The explainer outputs a mask vector $m$ for each graph as an explanation, which divides the graph into a rationale (explanation) $G_r$ and a non-rationale $G_n$. The predictor then makes predictions based on the embedding of the rationale subgraph. Meanwhile, augmented graphs that combine rationales and non-rationales from different graphs are fed into the predictor to ensure its robustness. We categorize the parameters into fast parameters and slow parameters according to when they are updated, as described in detail in Section 3.3.

3.1.1 Task Information.

MSE-GNN generates task information, composed of prototypes representing each class, for the explainer and the predictor to facilitate explanation generation and prediction within each task.

In each task, a support set is provided, which contains data from multiple classes. We aim to extract prototypes from these data that capture the characteristics of each class in the task, in order to support task-specific explanation selection and classification. Encoded by the encoder $f$, each graph is represented by a matrix containing the embedding of each node:

$H_i = [\ldots, h_{(v)}, \ldots]_{v \in V_i}^{T} = f(G_i) \in \mathbb{R}^{|V_i| \times d}.$  (1)

To obtain a representation $h_i$ for each graph, a readout function (e.g., mean pooling) is employed to aggregate the node embeddings. Leveraging the concept of prototype learning, we further fuse the graph representations of each class with another readout function, obtaining a prototype embedding for each class:

$TI_c = f_{readout}([\ldots, f_{readout}(H_i), \ldots]_{y_i = c}) \in \mathbb{R}^{d}.$  (2)

For an N-way K-shot classification problem, the task information is formed by concatenating the prototypes of the $N$ classes. It is worth noting that the task information for each input graph in both $D_{sup}$ and $D_{que}$ is composed solely of graphs in $D_{sup}$, to prevent label leakage.
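Equations 1-2 can be illustrated with mean pooling serving as both readout functions. The following is a minimal sketch on toy embeddings; the helper names are hypothetical.

```python
# Sketch of task-information construction (Eqs. 1-2) with mean readout at
# both levels: node embeddings -> graph embedding -> class prototype.

def mean_readout(vectors):
    # Element-wise mean over a list of equal-length vectors.
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]

def task_information(support):
    # support: list of (node-embedding matrix H_i, label y_i) pairs.
    graph_embs = {}
    for H, y in support:
        graph_embs.setdefault(y, []).append(mean_readout(H))
    # One prototype per class, concatenated in sorted class order.
    ti = []
    for c in sorted(graph_embs):
        ti += mean_readout(graph_embs[c])
    return ti

support = [([[1.0, 0.0], [3.0, 0.0]], 0),   # class-0 graph: graph emb [2.0, 0.0]
           ([[0.0, 2.0]], 1)]               # class-1 graph: graph emb [0.0, 2.0]
TI = task_information(support)              # -> [2.0, 0.0, 0.0, 2.0]
```

Only support-set graphs enter `task_information`, matching the label-leakage safeguard described above.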

[Figure 2: The overall architecture of MSE-GNN.]

3.1.2 Explainer.

The explainer is responsible for choosing the explanation subgraph of each input graph. Specifically, given an input graph $G_i$, the explainer first uses another GNN encoder to map each node $v \in V_i$ to a node embedding $h_{(v)}^{\prime}$ for selection. Then, an MLP transforms the node embeddings into a soft mask vector $m_i\in[0,1]^{n_i\times 1}$, with the task information $TI$ and the node embedding $h_{(v)}^{\prime}$ concatenated as input:

$m_i = \sigma(MLP([\ldots, [h_{(v)}^{\prime}, TI], \ldots]_{v \in V_i}^{T})),$  (3)

where $\sigma$ denotes the sigmoid function. Hence, we can decompose the input graph $G_i$ into a rationale subgraph and a non-rationale subgraph according to $m_i$:

$G_i^{r} = \{A_i, X_i \odot m_i\}, \qquad G_i^{n} = \{A_i, X_i \odot \overline{m_i}\},$  (4)

where $\overline{m_i} = \mathbf{1} - m_i$. Meanwhile, given the node embeddings $h_{(v)}$ from the encoder $f$, we can obtain the graph embeddings for $G_i^{r}$ and $G_i^{n}$:

$h_i^{r} = f_{readout}(H_i \odot m_i), \qquad h_i^{n} = f_{readout}(H_i \odot \overline{m_i}).$  (5)
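A minimal sketch of the rationale / non-rationale split in Equations 4-5, assuming mean pooling as the readout (toy values; the helper name is hypothetical):

```python
# The soft mask m weights the node embeddings for the rationale embedding,
# and its complement 1 - m for the non-rationale embedding (Eqs. 4-5).

def masked_readout(H, mask):
    # Mean readout of mask-weighted node embeddings.
    d = len(H[0])
    return [sum(h[k] * m for h, m in zip(H, mask)) / len(H) for k in range(d)]

H = [[1.0, 0.0], [0.0, 1.0]]       # node embeddings from encoder f (toy values)
m = [0.9, 0.1]                     # soft mask from the explainer
h_r = masked_readout(H, m)                     # rationale embedding
h_n = masked_readout(H, [1.0 - v for v in m])  # non-rationale embedding
```

The two embeddings are complementary: nodes the explainer deems important dominate `h_r`, while the remaining nodes dominate `h_n`.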

3.1.3 Predictor and Graph Augmentation.

The predictor takes a graph embedding $h$ as input and makes the final prediction $\hat{y} = p(h)$ with an MLP. Moreover, we enhance the robustness of the predictor through graph augmentation. Within an input graph, the rationale component is the crucial part that determines the category, while the non-rationale component is the noisy part. By combining rationales and non-rationales from different graphs in the same task, additional noisy data are generated, and each augmented graph is assigned the label of its rationale. This increases the amount of noisy data and thereby improves the robustness of the predictor. The combination is performed by adding the subgraph embeddings:

$h_{(i,j)} = h_i^{r} + h_j^{n}, \qquad y_{(i,j)} = y_i,$  (6)

where $h_i^{r}$ denotes the rationale embedding from $G_i$ and $h_j^{n}$ the non-rationale embedding from $G_j$.

Therefore, in addition to the task information $TI$, the predictor $p$ receives the embeddings of both the rationale subgraphs $h_i^{r}$ and the artificially augmented graphs $h_{(i,j)}$ for optimization; the outputs are denoted as $\hat{y_i}$ and $\hat{y_{(i,j)}}$ respectively.
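The embedding-level augmentation of Equation 6 can be sketched as follows (toy embeddings; the `augment` helper is illustrative):

```python
# Every rationale embedding is added to the non-rationale embeddings of the
# other graphs in the task; the augmented sample keeps the rationale's label.

def augment(rationales, non_rationales, labels):
    augmented = []
    for i, (h_r, y) in enumerate(zip(rationales, labels)):
        for j, h_n in enumerate(non_rationales):
            if i == j:
                continue
            h_aug = [a + b for a, b in zip(h_r, h_n)]
            augmented.append((h_aug, y))     # y_(i,j) = y_i
    return augmented

rationales = [[1.0, 0.0], [0.0, 1.0]]
non_rationales = [[0.1, 0.1], [0.2, 0.2]]
aug = augment(rationales, non_rationales, labels=[0, 1])
# Two graphs -> two cross combinations; each keeps its rationale's label.
```

Because the label follows the rationale, the predictor is trained to be invariant to the "noise" contributed by foreign non-rationales.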

Input: Distribution over meta-training tasks $p(\mathcal{T})$; local learning rate $\eta_1$; global learning rate $\eta_2$; local update times $T$.
Output: Meta-trained parameters for the encoder and the explainer, $\theta_f$ and $\theta_g$, and an initialization of the predictor parameters $\theta_p$.

1: Initialize $\theta = \{\theta_f, \theta_g, \theta_p\}$ randomly
2: while not converged do
3:   Sample task $\mathcal{T}$ with support graphs $D_{sup}^{train}$ and query graphs $D_{que}^{train}$
4:   Set fast adaptation parameters: $\theta_p^{\prime} = \theta_p$
5:   for $t = 0 \rightarrow T$ do
6:     Evaluate $\nabla_{\theta_p^{\prime}} \mathcal{L}_{sup}(\theta_f, \theta_g, \theta_p^{\prime})$ by calculating the loss via Equation 10
7:     Update $\theta_p^{\prime}$: $\theta_p^{\prime} \leftarrow \theta_p^{\prime} - \eta_1 \cdot \nabla_{\theta_p^{\prime}} \mathcal{L}_{sup}(\theta_f, \theta_g, \theta_p^{\prime})$
8:   end for
9:   Evaluate $\nabla_{\theta} \mathcal{L}_{que}(\theta_f, \theta_g, \theta_p^{\prime})$ by calculating the loss via Equation 10
10:  Update $\theta$: $\theta \leftarrow \theta - \eta_2 \cdot \nabla_{\theta} \mathcal{L}_{que}(\theta_f, \theta_g, \theta_p^{\prime})$
11: end while

3.2 Optimization Objective

The optimization objective of MSE-GNN is to achieve high prediction accuracy while generating precise explanations that reveal the underlying reasons behind the predictions. To this end, we design several loss functions and constraints. For simplicity, we consider a binary classification task without loss of generality.

With the prediction ŷ_i = p(h_i) for each rationale graph embedding h_i and the corresponding ground-truth label y_i, the loss function is defined as:

L_i^r = −[ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ].  (7)
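Equation (7) is the per-sample binary cross-entropy; below is a minimal NumPy sketch (written with the conventional leading minus sign so that the loss is non-negative and minimized):

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    """Per-sample binary cross-entropy (Eq. 7, with the leading minus sign)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # guard against log(0)
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

losses = bce_loss(np.array([1.0, 0.0]), np.array([0.9, 0.1]))  # both ≈ 0.105
```

A confident, correct prediction (e.g. ŷ = 0.9 for y = 1) incurs a small loss, while a confident wrong one is penalized heavily.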

For the artificially augmented graphs, we aim to pull together the predictions of instances from the same category while pushing apart those of different categories. To achieve this, we employ a supervised contrastive loss. For example, for a 2-way K-shot classification task we can obtain 4K² augmented graphs, where each rationale graph is combined with the other 2K − 1 non-rationales; the loss is then computed as:

L_i^a = −(1/(2K−1)) Σ_{j=1}^{2K} 1_{i≠j} · 1_{y_i=y_j} log [ exp(ŷ_i·ŷ_j / τ) / Σ_{k=1}^{2K} 1_{i≠k} exp(ŷ_i·ŷ_k / τ) ],  (8)

where τ is a scalar temperature hyperparameter.
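A minimal NumPy sketch of a supervised contrastive term in the spirit of Eq. (8) follows; the function name is ours, the normalization averages over positive pairs, and the temperature is placed inside the exponential, as is conventional:

```python
import numpy as np

def sup_con_loss(preds, labels, tau=0.5):
    """Supervised contrastive loss over a batch (cf. Eq. 8).

    preds:  (N, d) prediction/embedding vectors, one per augmented graph
    labels: (N,)   class labels
    """
    n = len(labels)
    sims = preds @ preds.T / tau          # pairwise similarity, temperature-scaled
    per_sample = []
    for i in range(n):
        others = np.arange(n) != i
        log_denom = np.log(np.exp(sims[i][others]).sum())
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if positives:                      # skip samples with no positive pair
            per_sample.append(-np.mean([sims[i, j] - log_denom for j in positives]))
    return float(np.mean(per_sample))

aligned = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
mixed   = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])
labels  = np.array([0, 0, 1, 1])
# Batches where same-class predictions agree yield a lower loss than mixed ones.
```

This matches the intent stated above: agreement within a class lowers the loss, agreement across classes raises it.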

In addition, to counter deviations in the size of rationales, we introduce a penalty on the number of rationale nodes via the following regularization term:

L^reg = | (1_N^⊤ · m_i) / n_i − γ |,  (9)

where γ is manually set to control the rationale size. Finally, the total loss function can be formulated as:

L = α_r · L^r + α_a · L^a + α_reg · L^reg,  (10)

where α_r, α_a, and α_reg are hyperparameters controlling the weight of each loss.
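To make the composition of Equations (9) and (10) concrete, here is a minimal sketch; the α weights and γ below are illustrative values of ours, not the paper's tuned settings:

```python
import numpy as np

def size_regularizer(mask, gamma=0.3):
    """Eq. 9: penalize node masks whose mean deviates from the target ratio gamma."""
    return abs(mask.mean() - gamma)

def total_loss(l_r, l_a, mask, alphas=(1.0, 0.5, 0.1), gamma=0.3):
    """Eq. 10: weighted sum of prediction, contrastive, and regularization terms."""
    a_r, a_a, a_reg = alphas
    return a_r * l_r + a_a * l_a + a_reg * size_regularizer(mask, gamma)

mask = np.array([0.9, 0.8, 0.1, 0.2])   # soft node-selection scores for one graph
loss = total_loss(0.4, 0.7, mask)       # 1.0*0.4 + 0.5*0.7 + 0.1*|0.5-0.3| = 0.77
```

The regularizer is zero exactly when the average mask value matches the target ratio γ, which is how the rationale size is kept near γ.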

3.3 Meta Training

Inspired by the concept of “learn to learn” [7], we propose a new meta-training framework based on MAML to acquire meta-knowledge from a variety of tasks. We denote by θ_f, θ_g, and θ_p the parameters of the encoder, the explanation selector, and the predictor, respectively. MSE-GNN is trained via two procedures. The first is the global update, which learns the encoder parameters θ_f, the explanation generator parameters θ_g, and the initialization of the predictor θ_p across different tasks. The second is the local update, which performs fast adaptation on new tasks and locally updates only the predictor parameters θ_p′ within each task. According to when they are updated, we categorize the parameters into fast parameters (θ_p) and slow parameters (θ_f and θ_g), as shown in Figure 2.

Table 1: Dataset statistics.

| | Synthetic | MNIST-sp | Molsider | Moltox21 |
|---|---|---|---|---|
| # Graphs | 10,000 | 70,000 | 1,427 | 7,831 |
| Avg # nodes | 74.5 | 75.0 | 33.6 | 18.6 |
| Avg # edges | 237.8 | 777.0 | 70.7 | 38.6 |
| # Train tasks / classes | 5 | 5 | 19 | 7 |
| # Validate tasks / classes | 2 | 2 | 3 | 2 |
| # Test tasks / classes | 3 | 3 | 5 | 3 |

The meta-training process is given in Algorithm 1. First, for each episode we sample a task composed of support data D_sup^train and query data D_que^train. Then adaptation is performed by updating θ_p for T steps on D_sup^train, where T is a hyperparameter controlling the number of local updates (lines 5-8). With the updated θ_p′, we use the loss on D_que^train to update θ_f, θ_g, and θ_p.

It is important to highlight that the explainer is trained across a variety of tasks and frozen when adapting to each task, which keeps the selected explanations stable across tasks and prevents over-fitting. Therefore, θ_f and θ_g are updated only in the global update and fixed in the local update. The predictor, by contrast, must learn the task-specific relationship between features and categories from the generated explanations; hence θ_p is optimized in the local update. The loss hyperparameters in line 6 and line 9 can be set differently according to the goals of local and global optimization.
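The local/global update scheme above can be sketched as a MAML-style double loop. As a toy illustration (the task data, names, and learning rates here are ours; we use the first-order MAML approximation and model only the fast predictor parameters with a linear least-squares predictor):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, X, y):
    """Squared error of a linear predictor and its gradient w.r.t. w."""
    err = X @ w - y
    return float(np.mean(err ** 2)), 2 * X.T @ err / len(y)

def meta_train(tasks, w, T=5, eta1=0.05, eta2=0.05, episodes=200):
    """First-order MAML-style loop: T local (fast) updates on the support
    set, then one global (slow) update from the query-set gradient."""
    for _ in range(episodes):
        Xs, ys, Xq, yq = tasks[rng.integers(len(tasks))]   # sample a task
        w_fast = w.copy()
        for _ in range(T):                                 # local update (lines 5-8)
            w_fast -= eta1 * loss_and_grad(w_fast, Xs, ys)[1]
        w = w - eta2 * loss_and_grad(w_fast, Xq, yq)[1]    # global update (line 10)
    return w

# Toy tasks: each has a slightly different ground-truth weight vector.
tasks = []
for _ in range(4):
    w_t = np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(2)
    Xs, Xq = rng.standard_normal((10, 2)), rng.standard_normal((10, 2))
    tasks.append((Xs, Xs @ w_t, Xq, Xq @ w_t))

w_meta = meta_train(tasks, np.zeros(2))
```

After meta-training, the learned initialization adapts to any of the related tasks in a handful of local steps, which is the behavior the fast/slow split is designed to produce.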

4 Experiments

4.1 Datasets and Experimental Setup

4.1.1 Dataset.

We conduct extensive experiments on four datasets to validate the performance of MSE-GNN: (i) Synthetic: Due to the lack of graph datasets with explanation ground-truth, following [39], we create a synthetic classification dataset containing 10 classes with 500 samples each. Each graph is composed of a rationale part and a non-rationale part, and the label of each graph is determined by the rationale part. Therefore, the ground-truth explanation subgraph is the rationale part of each graph. (ii) MNIST-sp [15]: MNIST-sp converts the MNIST images into 70,000 superpixel graphs. Each graph consists of 75 nodes and is assigned one of 10 class labels. The subgraphs representing the digits can be interpreted as ground-truth explanations. (iii) OGBG-Molsider and OGBG-Moltox21 [11]: These two molecule datasets come from the graph property prediction track of the Open Graph Benchmark (OGB); they contain 27 and 12 binary labels per graph, which we treat as 27 and 12 binary classification tasks, respectively. The dataset statistics are available in Table 1.

Table 2: Classification performance (Accuracy on Synthetic and MNIST-sp; AUC-ROC on OGBG-molsider and OGBG-moltox21). Best results in bold.

| Method | Synthetic (GIN) | Synthetic (GraphSAGE) | MNIST-sp (GIN) | MNIST-sp (GraphSAGE) | OGBG-molsider (GIN) | OGBG-molsider (GraphSAGE) | OGBG-moltox21 (GIN) | OGBG-moltox21 (GraphSAGE) |
|---|---|---|---|---|---|---|---|---|
| ProtoNet | 0.8284 ± 0.058 | 0.8327 ± 0.027 | 0.5736 ± 0.008 | 0.6575 ± 0.034 | 0.5540 ± 0.006 | 0.5468 ± 0.006 | 0.6614 ± 0.009 | 0.6495 ± 0.008 |
| MAML | 0.8259 ± 0.007 | 0.6409 ± 0.327 | 0.6283 ± 0.012 | 0.6722 ± 0.009 | 0.6219 ± 0.005 | 0.6538 ± 0.016 | 0.7217 ± 0.030 | 0.6965 ± 0.014 |
| ASMAML | 0.8911 ± 0.010 | 0.7849 ± 0.014 | 0.6526 ± 0.004 | 0.6699 ± 0.023 | 0.6288 ± 0.007 | **0.6818 ± 0.008** | 0.7432 ± 0.030 | 0.7181 ± 0.017 |
| GREA_Raw | 0.6970 ± 0.005 | 0.6970 ± 0.020 | 0.6405 ± 0.009 | 0.6667 ± 0.009 | 0.5210 ± 0.009 | 0.5180 ± 0.007 | 0.5654 ± 0.015 | 0.5479 ± 0.006 |
| CAL_Raw | 0.7248 ± 0.006 | 0.7488 ± 0.007 | 0.6498 ± 0.006 | 0.6670 ± 0.010 | 0.5978 ± 0.044 | 0.6230 ± 0.008 | 0.6161 ± 0.064 | 0.6814 ± 0.014 |
| GREA_Meta | 0.8728 ± 0.013 | 0.9180 ± 0.002 | 0.6537 ± 0.009 | 0.7430 ± 0.008 | 0.6542 ± 0.005 | 0.6303 ± 0.008 | 0.7650 ± 0.004 | 0.7582 ± 0.007 |
| CAL_Meta | 0.8451 ± 0.021 | 0.9096 ± 0.003 | **0.6888 ± 0.007** | **0.7445 ± 0.019** | 0.6580 ± 0.012 | 0.6553 ± 0.018 | 0.7442 ± 0.012 | 0.7652 ± 0.005 |
| MSE-GNN | **0.9103 ± 0.004** | **0.9200 ± 0.004** | 0.6515 ± 0.008 | 0.7309 ± 0.009 | **0.6673 ± 0.007** | 0.6587 ± 0.002 | **0.7735 ± 0.006** | **0.7728 ± 0.011** |

4.1.2 Experimental Setup.

To investigate whether generating explanations helps the classification task, we choose three few-shot learning methods: ProtoNet [29], MAML [7], and ASMAML [20]. To compare with existing self-explaining methods, we select two state-of-the-art self-explaining models, GREA [17] and CAL [30], as baselines for both classification performance and explanation quality. Moreover, for fairness, we adapt meta-training to GREA and CAL, enabling them to operate in few-shot scenarios; these variants are denoted GREA_Meta and CAL_Meta, respectively.

We use GIN and GraphSAGE as GNN backbones for all methods. The performance of all models is evaluated on D_que^test. For Synthetic and MNIST-sp, which have explanation ground-truth, we use Accuracy to evaluate classification performance and AUC-ROC to evaluate the quality of the selected explanations. For the two molecule datasets, due to the absence of explanation ground-truth, we only evaluate classification performance using the area under the ROC curve (AUC), following [17]. For meta-training, we use the Adam optimizer for both local and global updates and set the number of local updates T to 5. The local learning rate η₁ is set to 0.001 and the global learning rate η₂ is tuned over {1e-5, 1e-4, 1e-3}. γ in Equation 9 is tuned over {0.1, 0.2, 0.3, 0.4, 0.5}, and the number of GNN layers over {2, 3}. We select hyperparameters based on related works and grid search. All experiments are conducted on one Tesla V100 GPU.
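For reference, the explanation AUC-ROC used below reduces to the probability that a rationale node is ranked above a non-rationale node by the soft node mask; a dependency-free sketch (the function name is ours):

```python
def explanation_auc(scores, truth):
    """AUC-ROC of per-node importance scores vs. binary rationale labels.

    Equivalent to the probability that a random rationale node is scored
    higher than a random non-rationale node (ties count as 1/2).
    """
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc = explanation_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])  # 1.0: perfect ranking
```

An AUC of 0.5 corresponds to random node scoring, and values below 0.5 indicate an inverted explanation (non-rationale nodes ranked higher), which matters for interpreting Table 3.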

4.1.3 Performance on Synthetic Graphs and MNIST-sp.

To explore whether MSE-GNN can achieve high classification performance while generating high-quality explanations, we conduct 2-way 5-shot experiments on the Synthetic and MNIST-sp datasets, which contain ground-truth explanations for each graph. The experimental results are summarized in Table 2 and Table 3. We first compare the meta-trained self-explaining baselines (GREA_Meta, CAL_Meta) with their raw counterparts (GREA_Raw, CAL_Raw). Meta-training brings significant performance boosts on both classification and explanation, indicating that it effectively transfers the meta-knowledge learned across training tasks to new tasks.

Table 3: Explanation quality (AUC-ROC) on Synthetic and MNIST-sp. Best results in bold.

| Backbone | Method | Synthetic | MNIST-sp |
|---|---|---|---|
| GIN | GREA_Raw | 0.4934 ± 0.006 | 0.4789 ± 0.044 |
| GIN | CAL_Raw | 0.4741 ± 0.0250 | 0.4395 ± 0.039 |
| GIN | GREA_Meta | 0.6745 ± 0.0265 | 0.7855 ± 0.013 |
| GIN | CAL_Meta | 0.6201 ± 0.0550 | 0.1707 ± 0.0243 |
| GIN | MSE-GNN | **0.7000 ± 0.006** | **0.8222 ± 0.030** |
| GraphSAGE | GREA_Raw | 0.4929 ± 0.023 | 0.5496 ± 0.064 |
| GraphSAGE | CAL_Raw | 0.5080 ± 0.054 | 0.4906 ± 0.116 |
| GraphSAGE | GREA_Meta | 0.7099 ± 0.014 | 0.6513 ± 0.040 |
| GraphSAGE | CAL_Meta | 0.6858 ± 0.015 | 0.6613 ± 0.229 |
| GraphSAGE | MSE-GNN | **0.7189 ± 0.012** | **0.7077 ± 0.038** |

On Synthetic, MSE-GNN outperforms all baseline methods in both classification performance and explanation quality. Compared to the meta-trained self-explaining baselines, MSE-GNN performs better on both classification and explanation because it utilizes task information and effectively leverages the augmented graphs through the supervised contrastive loss. Moreover, the inherent denoising capability of self-explaining models contributes to MSE-GNN's superior classification performance over ProtoNet, MAML, and ASMAML.

Unexpectedly, CAL achieves the best classification performance on MNIST-sp, especially with GIN as the backbone, surpassing MSE-GNN by over 5%, while the quality of its explanations is significantly lower than that of GREA and MSE-GNN. The visualization in Figure 3, which reveals the internal reasoning process of the models, shows that CAL generates explanations opposite to our expectations: it infers the digit from the shape of the background. This is understandable, since the digit and background parts of an image are complementary, so the digit can also be inferred from the background. Therefore, despite the generated explanations being contrary to our expectations, CAL's performance demonstrates that utilizing background information for digit prediction is effective on MNIST-sp. The reason CAL generates such inverted explanations is that it lacks a constraint on the explanation size; it therefore favors subgraphs that contain more useful information and overlooks the size of the explanation subgraph. Further comparing the visualized explanations of MSE-GNN and GREA, we find that the explanations of MSE-GNN are more compact and focus more on the digit, in line with the results in Table 3.


4.1.4 Performance on OGBG.

MSE-GNN achieves comparable classification performance on these two molecule datasets, demonstrating the effectiveness of its structure. Furthermore, we can observe that the self-explaining models with meta-training outperform all meta-learning models except on OGBG-molsider using GraphSAGE. This is because the process of generating explanations can potentially improve the classification task by eliminating irrelevant noise.

4.1.5 Performance with Different Size of Support Set.

Intuitively, for a classification task, the size of the training set has a significant impact on the model's performance. Therefore, in the few-shot setting, we evaluate MSE-GNN and the other self-explaining models under different support set sizes. The experimental results are shown in Figure 5. First, comparing different methods, MSE-GNN consistently outperforms the other baselines across support set sizes, which further validates its performance on both classification and explanation. Second, comparing MSE-GNN across support set sizes, we observe that as the support set grows, both the classification accuracy and the quality of generated explanations improve, confirming the importance of training set size for model performance.


4.1.6 Ablation Study.

Table 4 shows the impact of the contrastive loss and the task information used in MSE-GNN on Synthetic and OGBG-molsider with GIN. Applying the Contrastive Loss (CL) improves both the classification accuracy and the quality of generated explanations, indicating that the contrastive loss enhances the model's performance on both prediction and explanation. Similarly, applying Task Information (TI) also improves performance, suggesting that incorporating task information provides additional context and guidance that strengthen the model. When CL and TI are used together, the model performs best on both datasets, indicating that the two components contribute synergistically to classification and explanation quality.

4.1.7 Sensitivity Analysis.

In MSE-GNN, the parameter γ is crucial in controlling the size of the selected explanation. To examine the model's sensitivity to γ, we conduct an analysis on the Synthetic and OGBG-Molsider datasets with GIN. As illustrated in Figure 4, MSE-GNN achieves the best classification performance when γ is set to 0.1 on both datasets, while the explanation quality on Synthetic peaks when γ equals 0.05. As γ deviates from these optima, the classification performance or the quality of generated explanations decreases. The impact of γ is less pronounced on OGBG-Molsider, indicating that the model is less sensitive to γ on that dataset.

Furthermore, T, the number of local update epochs, affects both the effectiveness and the efficiency of MSE-GNN. We compare the performance of MSE-GNN with different numbers of local update epochs on the Synthetic and OGBG-Molsider datasets. The results in Figure 6 indicate that MSE-GNN achieves the best classification and explanation performance on both datasets when T is set to 5. A T that is too small may cause underfitting on new tasks, while one that is too large may cause overfitting.

Table 4: Ablation study of Contrastive Loss (CL) and Task Information (TI) on Synthetic and OGBG-molsider with GIN. Best results in bold.

| CL | TI | Synthetic Classif. | Synthetic Explan. | OGBG-molsider Classif. |
|---|---|---|---|---|
| – | – | 0.8728 ± 0.013 | 0.6745 ± 0.027 | 0.6542 ± 0.005 |
| ✓ | – | 0.8809 ± 0.037 | 0.6860 ± 0.028 | 0.6623 ± 0.011 |
| – | ✓ | 0.8800 ± 0.011 | 0.6766 ± 0.014 | 0.6616 ± 0.001 |
| ✓ | ✓ | **0.9103 ± 0.004** | **0.7000 ± 0.006** | **0.6673 ± 0.007** |


5 Related Works

5.0.1 Few-shot learning and Meta Learning on Graph Classification

Few-shot learning aims to learn a model from only a few samples. A promising family of methods is meta learning, also known as “learning to learn”, which attempts to extract meta-knowledge from a variety of tasks. Meta-learning methods fall into two categories [44]: metric-based models [29, 3, 8, 22, 32] and optimization-based models [7, 9, 51, 20, 34]. The former compute the distance between query data and class prototypes [29]; the latter learn an effective parameter initialization that enables rapid adaptation [7]. [51] first applied the meta-learning framework to the node classification task, and [20] utilizes a step controller to improve the robustness and generalization of the meta-learner. Notwithstanding the remarkable accuracy these methods achieve on few-shot learning tasks, their lack of explainability hinders their applicability in domains such as medicine and finance.

5.0.2 Explainability in Graph Neural Network

As GNNs see wider application, their explainability becomes increasingly crucial. Explanations increase model transparency and enhance practitioners’ trust in GNN models by clarifying why a decision was made. Explainability of GNNs can be categorized into two classes [40, 42]: post-hoc explanations and self-explaining GNNs. Post-hoc methods attempt to explain trained GNNs with an additional explainer model [39, 33, 12, 18, 1, 19, 5, 13]. However, these post-hoc explainers often fail to unveil the true reasoning process of the model due to the non-convexity and complexity of the underlying GNNs [25]. Self-explaining GNNs, in contrast, are designed to be intrinsically interpretable [37, 30, 17, 50, 21, 1], outputting the prediction and the corresponding explanation simultaneously. DIR [37] extracts causal rationales that remain consistent across various distributions while eliminating unstable spurious patterns. GREA [17] introduces an augmentation operation called environment replacement that automatically creates virtual data examples to improve rationale identification. Another line of self-explaining models leverages prototype learning [50, 27, 1, 26, 47]; ProtGNN [50], for instance, explains a prediction by selecting the subgraphs most relevant to the learned graph patterns of each class. However, existing self-explaining GNNs overlook the scarcity of labeled graph data in many applications, which motivates building few-shot learning models with self-explainability.
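The two-stage self-explaining structure (an explainer selects a subgraph, a predictor classifies it) can be sketched in a few lines. This is an illustrative masking scheme, not the actual DIR, GREA, or MSE-GNN implementation: a hypothetical linear explainer scores each node and keeps the top-k nodes as the explanation subgraph, and a hypothetical predictor classifies from the pooled features of only those kept nodes:

```python
import numpy as np

def explainer(node_feats, w_exp, k):
    """Score nodes with a (hypothetical) linear scorer; keep top-k as explanation."""
    scores = node_feats @ w_exp                  # one score per node, shape (n_nodes,)
    keep = np.argsort(scores)[-k:]               # indices of the explanation subgraph
    mask = np.zeros(len(node_feats), dtype=bool)
    mask[keep] = True
    return mask

def predictor(node_feats, mask, w_pred):
    """Predict from mean-pooled features of the explanation subgraph only."""
    pooled = node_feats[mask].mean(axis=0)       # shape (n_features,)
    logits = pooled @ w_pred                     # shape (n_classes,)
    return logits.argmax()

# Toy graph: 6 nodes with 4-d features, random (illustrative) weights
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
mask = explainer(x, w_exp=rng.normal(size=4), k=3)
label = predictor(x, mask, w_pred=rng.normal(size=(4, 2)))
print(mask.sum(), label)
```

In real self-explaining GNNs the explainer and predictor are message-passing networks trained jointly, but the information flow is the same: the prediction depends only on the selected subgraph, so the mask itself is the explanation.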

6 Conclusion

In this paper, we proposed MSE-GNN to address the explainability of GNNs in few-shot scenarios. Specifically, MSE-GNN adopted an “explainer-predictor" two-stage self-explaining structure together with a meta-training framework, which improved performance in few-shot scenarios. MSE-GNN also introduced a mechanism that leverages task information to assist explanation generation and result prediction, and employed graph augmentation to enhance model robustness. Extensive experimental results demonstrated that MSE-GNN achieves strong performance in classification tasks while selecting high-quality explanations in few-shot scenarios.

6.0.1 Acknowledgements.

This research was partially supported by the Anhui Provincial Natural Science Foundation (No. 2308085QF229), the Fundamental Research Funds for the Central Universities (No. WK2150110034), and the Technology Innovation Community in Yangtze River Delta (No. 2023CSJZN0200).

References

  • [1]Azzolin, S., Longa, A., Barbiero, P., Lio, P., Passerini, A.: Global explainability of gnns via logic combination of learned concepts. In: The Eleventh International Conference on Learning Representations (2022)
  • [2]Bian, T., Xiao, X., Xu, T., Zhao, P., Huang, W., Rong, Y., Huang, J.: Rumor detection on social media with bi-directional graph convolutional networks. In: Proceedings of the AAAI conference on artificial intelligence. vol. 34, pp. 549–556 (2020)
  • [3]Chauhan, J., Nathani, D., Kaul, M.: Few-shot learning on graphs via super-classes based on graph spectral measures. In: International Conference on Learning Representations (2019)
  • [4]Chen, L., Wu, L., Hong, R., Zhang, K., Wang, M.: Revisiting graph based collaborative filtering: A linear residual graph convolutional network approach. In: Proceedings of the AAAI conference on artificial intelligence. vol. 34, pp. 27–34 (2020)
  • [5]Duval, A., Malliaros, F.D.: Graphsvx: Shapley value explanations for graph neural networks. In: Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part II 21. pp. 302–318. Springer (2021)
  • [6]Dwivedi, V.P., Joshi, C.K., Luu, A.T., Laurent, T., Bengio, Y., Bresson, X.: Benchmarking graph neural networks. Journal of Machine Learning Research 24, 1–48 (2023)
  • [7]Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference on machine learning. pp. 1126–1135. PMLR (2017)
  • [8]Gao, W., Wang, H., Liu, Q., Wang, F., Lin, X., Yue, L., Zhang, Z., Lv, R., Wang, S.: Leveraging transferable knowledge concept graph embedding for cold-start cognitive diagnosis. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 983–992 (2023)
  • [9]Guo, Z., Zhang, C., Yu, W., Herr, J., Wiest, O., Jiang, M., Chawla, N.V.: Few-shot graph learning for molecular property prediction. In: Proceedings of the Web Conference 2021. pp. 2559–2567 (2021)
  • [10]Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. Advances in neural information processing systems 30 (2017)
  • [11]Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., Leskovec, J.: Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems 33, 22118–22133 (2020)
  • [12]Huang, Q., Yamada, M., Tian, Y., Singh, D., Chang, Y.: Graphlime: Local interpretable model explanations for graph neural networks. IEEE Transactions on Knowledge and Data Engineering (2022)
  • [13]Kamal, A., Vincent, E., Plantevit, M., Robardet, C.: Improving the quality of rule-based gnn explanations. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 467–482. Springer (2022)
  • [14]Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (2016)
  • [15]Knyazev, B., Taylor, G.W., Amer, M.: Understanding attention and generalization in graph neural networks. Advances in neural information processing systems 32 (2019)
  • [16]Lin, W., Lan, H., Wang, H., Li, B.: Orphicx: A causality-inspired latent variable model for interpreting graph neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13729–13738 (2022)
  • [17]Liu, G., Zhao, T., Xu, J., Luo, T., Jiang, M.: Graph rationalization with environment-based augmentations. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. pp. 1069–1078 (2022)
  • [18]Lucic, A., Ter Hoeve, M.A., Tolomei, G., De Rijke, M., Silvestri, F.: Cf-gnnexplainer: Counterfactual explanations for graph neural networks. In: International Conference on Artificial Intelligence and Statistics. pp. 4499–4511. PMLR (2022)
  • [19]Luo, D., Cheng, W., Xu, D., Yu, W., Zong, B., Chen, H., Zhang, X.: Parameterized explainer for graph neural network. Advances in neural information processing systems 33, 19620–19631 (2020)
  • [20]Ma, N., Bu, J., Yang, J., Zhang, Z., Yao, C., Yu, Z., Zhou, S., Yan, X.: Adaptive-step graph meta-learner for few-shot graph classification. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. pp. 1055–1064 (2020)
  • [21]Müller, P., Faber, L., Martinkus, K., Wattenhofer, R.: Dt+ gnn: A fully explainable graph neural network using decision trees. arXiv preprint arXiv:2205.13234 (2022)
  • [22]Niu, G., Li, Y., Tang, C., Geng, R., Dai, J., Liu, Q., Wang, H., Sun, J., Huang, F., Si, L.: Relational learning with gated and attentive neighbor aggregator for few-shot knowledge graph completion. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 213–222 (2021)
  • [23]Posner, M.I., Petersen, S.E.: The attention system of the human brain. Annual review of neuroscience 13, 25–42 (1990)
  • [24]Pourhabibi, T., Ong, K.L., Kam, B.H., Boo, Y.L.: Fraud detection: A systematic literature review of graph-based anomaly detection approaches. Decision Support Systems 133, 113303 (2020)
  • [25]Rudin, C.: Please stop explaining black box models for high stakes decisions. stat 1050, 26 (2018)
  • [26]Seo, S., Kim, S., Park, C.: Interpretable prototype-based graph information bottleneck. Advances in Neural Information Processing Systems 36 (2024)
  • [27]Shin, Y.M., Kim, S.W., Yoon, E.B., Shin, W.Y.: Prototype-based explanations for graph neural networks (student abstract). In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, pp. 13047–13048 (2022)
  • [28]Smuha, N.A.: The eu approach to ethics guidelines for trustworthy artificial intelligence. Computer Law Review International 20, 97–106 (2019)
  • [29]Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. Advances in neural information processing systems 30 (2017)
  • [30]Sui, Y., Wang, X., Wu, J., Lin, M., He, X., Chua, T.S.: Causal attention for interpretable and generalizable graph classification. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. pp. 1696–1705 (2022)
  • [31]Vuorio, R., Sun, S.H., Hu, H., Lim, J.J.: Multimodal model-agnostic meta-learning via task-aware modulation. Advances in neural information processing systems 32 (2019)
  • [32]Wang, S., Huang, X., Chen, C., Wu, L., Li, J.: Reform: Error-aware few-shot knowledge graph completion. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management. pp. 1979–1988 (2021)
  • [33]Wang, X., Shen, H.W.: Gnninterpreter: A probabilistic generative model-level explanation for graph neural networks. In: The Eleventh International Conference on Learning Representations (2022)
  • [34]Wang, Y., Abuduweili, A., Yao, Q., Dou, D.: Property-aware relation networks for few-shot molecular property prediction. Advances in Neural Information Processing Systems 34, 17441–17454 (2021)
  • [35]Wieder, O., Kohlbacher, S., Kuenemann, M., Garon, A., Ducrot, P., Seidel, T., Langer, T.: A compact review of molecular property prediction with graph neural networks. Drug Discovery Today: Technologies 37, 1–12 (2020)
  • [36]Wu, L., Cui, P., Pei, J., Zhao, L., Guo, X.: Graph neural networks: foundation, frontiers and applications. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. pp. 4840–4841 (2022)
  • [37]Wu, Y., Wang, X., Zhang, A., He, X., Chua, T.S.: Discovering invariant rationales for graph neural networks. In: International Conference on Learning Representations (2021)
  • [38]Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? In: International Conference on Learning Representations (2018)
  • [39]Ying, Z., Bourgeois, D., You, J., Zitnik, M., Leskovec, J.: Gnnexplainer: Generating explanations for graph neural networks. Advances in neural information processing systems 32 (2019)
  • [40]Yuan, H., Yu, H., Gui, S., Ji, S.: Explainability in graph neural networks: A taxonomic survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)
  • [41]Yue, L., Liu, Q., Du, Y., An, Y., Wang, L., Chen, E.: Dare: disentanglement-augmented rationale extraction. Advances in Neural Information Processing Systems 35, 26603–26617 (2022)
  • [42]Yue, L., Liu, Q., Liu, Y., Gao, W., Yao, F., Li, W.: Cooperative classification and rationalization for graph generalization. In: Proceedings of the ACM Web Conference 2024 (2024)
  • [43]Yue, L., Liu, Q., Wang, L., An, Y., Du, Y., Huang, Z.: Interventional rationalization. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. pp. 11404–11418 (2023)
  • [44]Zhang, C., Ding, K., Li, J., Zhang, X., Ye, Y., Chawla, N.V., Liu, H.: Few-shot learning on graphs: A survey. In: The 31st International Joint Conference on Artificial Intelligence (IJCAI) (2022)
  • [45]Zhang, K., Liu, Q., Qian, H., Xiang, B., Cui, Q., Zhou, J., Chen, E.: Eatn: An efficient adaptive transfer network for aspect-level sentiment analysis. IEEE Transactions on Knowledge and Data Engineering 35(1), 377–389 (2021)
  • [46]Zhang, K., Zhang, H., Liu, Q., Zhao, H., Zhu, H., Chen, E.: Interactive attention transfer network for cross-domain sentiment classification. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33, pp. 5773–5780 (2019)
  • [47]Zhang, K., Zhang, K., Zhang, M., Zhao, H., Liu, Q., Wu, W., Chen, E.: Incorporating dynamic semantics into pre-trained language model for aspect-based sentiment analysis. arXiv preprint arXiv:2203.16369 (2022)
  • [48]Zhang, Z., Hu, Q., Yu, Y., Gao, W., Liu, Q.: Fedgt: Federated node classification with scalable graph transformer. arXiv preprint arXiv:2401.15203 (2024)
  • [49]Zhang, Z., Liu, Q., Hu, Q., Lee, C.K.: Hierarchical graph transformer with adaptive node sampling. Advances in Neural Information Processing Systems 35, 21171–21183 (2022)
  • [50]Zhang, Z., Liu, Q., Wang, H., Lu, C., Lee, C.: Protgnn: Towards self-explaining graph neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, pp. 9127–9135 (2022)
  • [51]Zhou, F., Cao, C., Zhang, K., Trajcevski, G., Zhong, T., Geng, J.: Meta-gnn: On few-shot node classification in graph meta-learning. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. pp. 2357–2360 (2019)