Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in 31st ACM International Conference on Information and Knowledge Management, 2022
Abstract: The masked autoencoder (MAE), an effective self-supervised learner for computer vision and natural language processing, has recently been applied to molecular representation learning. In this paper, we identify two issues, ignored by existing work, in applying MAE to pre-train Transformer-based models on molecular graphs. (1) Because only atoms are abstracted as tokens and then reconstructed, the chemical bonds are left undetermined in the decoded molecule, making molecules with different arrangements of the same atoms indistinguishable. (2) Although a high mask ratio, corresponding to a more challenging reconstruction task, has proven beneficial in the vision domain, it cannot be trivially adopted on molecular graphs, where information is far less redundant. To resolve these issues, we propose a novel framework, **M**olecular **G**raph **M**ask **A**uto**E**ncoder (MGMAE). As its first step, MGMAE transforms each molecular graph into a heterogeneous atom-bond graph to make full use of bond attributes, and designs a unidirectional position encoding for such graphs. We then propose a hybrid masking mechanism that exploits the complementary nature of atoms' attributive and spatial features. Meanwhile, we compensate for the mask embedding with a dynamic aggregation representation that exploits correlations between topologically adjacent tokens. As a result, MGMAE can simultaneously reconstruct the masked atoms, the masked bonds, and the relative distances among atoms, even at a high mask ratio. We compare MGMAE with state-of-the-art methods on various molecular benchmarks and demonstrate its competitiveness in both regression and classification tasks.
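The two core ideas are easiest to see in a small sketch. The Python snippet below is a minimal illustration, not the paper's implementation: it shows (1) flattening a molecule into a heterogeneous token list in which bonds are tokens alongside atoms, and (2) a hybrid mask that hides either a token's attribute or its spatial feature, so one of the two complementary views survives even at a high mask ratio. All names (`to_atom_bond_tokens`, `hybrid_mask`, `MASK_ID`) are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch (not the authors' code) of an atom-bond token graph
# plus a hybrid attribute/spatial mask, as described in the abstract.
import random

MASK_ID = -1  # placeholder id for a masked attribute

def to_atom_bond_tokens(atoms, bonds):
    """Flatten a molecule into one token list: atoms first, then bonds.

    atoms: list of atom type ids, e.g. [6, 8] for C, O
    bonds: list of (i, j, bond_type) triples over atom indices
    Each bond becomes its own token, so a decoder must reconstruct
    bond attributes as well, not just atoms.
    """
    tokens = [("atom", a) for a in atoms]
    tokens += [("bond", t) for (_, _, t) in bonds]
    # edges connect each bond token to its two endpoint atom tokens
    edges = []
    for k, (i, j, _) in enumerate(bonds):
        b = len(atoms) + k
        edges += [(i, b), (b, j)]
    return tokens, edges

def hybrid_mask(tokens, coords, ratio=0.7, seed=0):
    """Mask `ratio` of the tokens; for each masked token, hide either its
    attribute or its coordinate (chosen at random), so the surviving
    feature can still guide reconstruction at a high mask ratio."""
    rng = random.Random(seed)
    picked = rng.sample(range(len(tokens)), int(ratio * len(tokens)))
    masked_tokens, masked_coords = list(tokens), list(coords)
    for idx in picked:
        if rng.random() < 0.5:
            kind, _ = masked_tokens[idx]
            masked_tokens[idx] = (kind, MASK_ID)  # hide attribute
        else:
            masked_coords[idx] = None             # hide spatial feature
    return masked_tokens, masked_coords, picked

# toy C-C-O fragment: 3 atom tokens + 2 bond tokens, one coord per token
atoms = [6, 6, 8]
bonds = [(0, 1, 1), (1, 2, 1)]
tokens, edges = to_atom_bond_tokens(atoms, bonds)
coords = [(0.0, 0.0), (1.5, 0.0), (2.2, 1.1), (0.7, 0.0), (1.9, 0.5)]
print(hybrid_mask(tokens, coords, ratio=0.6))
```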
Published in 33rd ACM International Conference on Information and Knowledge Management, 2024
Abstract: With the increasing application of deep learning to scientific problems in biochemistry, molecular federated learning has become popular for its ability to offer distributed, privacy-preserving solutions. However, most existing molecular federated learning methods rely on joint training with public datasets, which are difficult to obtain in practice, and they fail to leverage multi-modal molecular representations effectively. To address these issues, we propose a novel framework, **Fed**erated **H**eterogeneous **C**ontrastive **D**istillation (**FedHCD**), which jointly trains global models from clients with heterogeneous data modalities, learning tasks, and molecular models. To aggregate data representations of different modalities in a data-free manner, we design a global multi-modal contrastive strategy that aligns client representations without any public dataset. Utilizing the intrinsic characteristics of molecular data in different modalities, we tackle the exacerbated local model drift and data non-IIDness caused by multi-modal clients. We further introduce multi-view contrastive knowledge transfer to extract features from atoms, substructures, and whole molecules, resolving the information-distillation failures caused by dimensional biases across data modalities. Evaluations on eight real-world molecular datasets and ablation experiments show that FedHCD outperforms other state-of-the-art FL methods, whether or not they use public datasets.
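To make the contrastive alignment concrete, here is a minimal sketch, not the FedHCD implementation, of how embeddings of the same molecules produced by two modality-specific clients (say, a graph encoder and a SMILES encoder) could be aligned with a symmetric InfoNCE-style loss in a shared projection space, with no public dataset involved. The function name `contrastive_align`, the projection dimensions, and the temperature are assumptions for illustration.

```python
# Minimal sketch (not the FedHCD code) of data-free multi-modal
# contrastive alignment between two clients' molecule embeddings.
import torch
import torch.nn.functional as F

def contrastive_align(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE between two batches of embeddings.

    z_a, z_b: (batch, dim) tensors from two modalities; row i of each
    tensor embeds the same molecule, so (i, i) pairs are positives and
    every other pairing in the batch serves as a negative.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(z_a.size(0))       # positives on the diagonal
    # symmetric loss: align a -> b and b -> a
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# toy usage: clients with mismatched dims project to a shared space first
proj_graph = torch.nn.Linear(64, 32)    # graph-modality client head
proj_smiles = torch.nn.Linear(128, 32)  # SMILES-modality client head
z_graph = proj_graph(torch.randn(8, 64))
z_smiles = proj_smiles(torch.randn(8, 128))
print(contrastive_align(z_graph, z_smiles).item())
```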
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different value in the `type` field. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.