Abstract
Graph-based molecular representation learning is essential for accurately
predicting molecular properties in drug discovery and materials science;
however, it faces significant challenges due to the intricate relationships
among molecules and the limited chemical knowledge utilized during training.
While contrastive learning is often employed to handle molecular relationships,
its reliance on binary metrics is insufficient for capturing the complexity of
these interactions. Multimodal fusion has gained attention for property
reasoning, but previous work has explored only a limited range of modalities,
and the optimal stages for fusing different modalities in molecular property
tasks remain underexplored. In this paper, we introduce MMFRL (Multimodal
Fusion with Relational Learning for Molecular Property Prediction), a novel
framework designed to overcome these limitations. Our method enhances embedding
initialization through multimodal pretraining using relational learning. We
also conduct a systematic investigation into the impact of modality fusion at
different stages (early, intermediate, and late), highlighting the advantages
and shortcomings of each. Extensive experiments on MoleculeNet benchmarks
demonstrate that MMFRL significantly outperforms existing methods. MMFRL also
enables task-specific optimizations, and its explainability provides valuable
chemical insights, underscoring its potential to enhance real-world drug
discovery applications.