AI RESEARCH ANALYSIS
Joint Graph Learning for Robust Causal Inference over Knowledge Graphs
Causal inference is critical for understanding cause-effect relationships in real-world domains. However, applying it over knowledge graphs (KGs) poses unique challenges due to two key issues: missing attributes caused by the Open-World Assumption, and interference effects arising from complex relational dependencies among entities. Existing methods often assume fully observed data or fail to model inter-unit dependencies, leading to biased or unreliable effect estimates. We introduce BALU, a joint graph learning framework that addresses both challenges in an end-to-end solution. BALU reformulates causal inference over KGs as two interconnected tasks: (1) attribute imputation as edge prediction between units (entities) and their attributes, and (2) treatment effect estimation as node prediction that accounts for interference through representation learning. BALU employs Graph Neural Networks (GNNs) to capture attribute similarity and relational structure, enabling both accurate imputation and interference-aware message passing. Experiments on four benchmark datasets show that BALU consistently outperforms state-of-the-art baselines, even when these are enhanced with strong imputation techniques, and performs robustly on incomplete and relationally complex KGs. These results indicate that BALU offers a principled and practical solution for robust causal inference in knowledge-driven domains, supporting data-driven decision-making under real-world conditions of incompleteness and relational complexity.
Authors: Hao Huang, Maria-Esther Vidal
Publication: WSDM '26: Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining (February 2026)
Executive Impact & Strategic Value
The paper introduces BALU, a graph learning framework for robust causal inference over Knowledge Graphs (KGs). It addresses two key challenges: attribute incompleteness (missing data) and relational interference (dependencies between entities). BALU unifies attribute imputation via edge prediction and treatment effect estimation via node prediction, using Graph Neural Networks (GNNs). Experiments on four benchmark datasets show that BALU outperforms state-of-the-art baselines, even when those baselines are combined with strong imputation techniques. The framework thus supports more accurate and reliable causal effect estimation in incomplete and relationally complex KGs, and with it data-driven decision-making.
Deep Analysis & Enterprise Applications
Problem Statement
Applying causal inference over Knowledge Graphs (KGs) is challenged by attribute incompleteness (missing data due to Open-World Assumption) and relational interference (complex dependencies between entities). Traditional methods assume fully observed data and unit independence, leading to biased estimates. This work aims to estimate Individual Treatment Effect (ITE) and Average Treatment Effect (ATE) for units in KGs while addressing these challenges. A motivating example highlights how missing data and interference can lead to underestimated treatment effects and biased populations. The proposed solution, BALU, jointly performs attribute imputation and causal effect estimation using graph learning.
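For orientation, the two estimands can be written in standard potential-outcomes notation. The summary does not spell these definitions out, so the block below is an assumption: it shows the textbook form, where Y_i(1) and Y_i(0) are unit i's outcomes with and without treatment. Under interference, Y_i may also depend on the treatments of i's neighbours, so the paper's exact definitions may differ.

```latex
\tau_i = Y_i(1) - Y_i(0), \qquad
\mathrm{ATE} = \frac{1}{N}\sum_{i=1}^{N} \tau_i = \frac{1}{N}\sum_{i=1}^{N} \bigl( Y_i(1) - Y_i(0) \bigr)
```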
Methodology
BALU (Bipartite Attribute and Link-based Unit learning) is a graph learning framework with three main components:
1. Graph Representation: Models units and their contextual attributes as a bipartite graph, with attribute values as edge labels. It also defines the observed attributes, the treatment (P_T), and the outcome (P_Y).
2. Data Imputation Component: An L-layer GNN learns node embeddings for units and attributes, as well as edge embeddings for attribute edges. It uses Unit-Attribute Message Passing and Relational Message Passing, followed by Edge Embedding Updating. A Feedforward Neural Network (FNN) then predicts the missing attribute values, which are integrated into the units' contextual representations.
3. Causal Estimation Component: Takes enriched contextual representations and estimates ITEs. Interference is modeled using a K-layer Graph Neural Network (GNN) that aggregates causal influence from neighbors. It involves node prediction tasks for treatment assignment and potential outcome estimation, optimized by a joint loss function including cross-entropy and Wasserstein-1 distance.
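A minimal numpy sketch of the joint objective, assuming a factual-outcome MSE term alongside the cross-entropy and Wasserstein-1 terms named above, with hypothetical weights `alpha` and `beta`. The Wasserstein-1 distance between treated and control representations is approximated here with entropy-regularised Sinkhorn iterations, a common substitute for the exact optimal-transport cost; the paper's own estimator and weighting may differ, and all function names are hypothetical.

```python
import numpy as np


def bce(p, t, eps=1e-7):
    """Binary cross-entropy for the treatment-assignment head."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))


def sinkhorn_w1(x, y, reg=0.1, iters=500):
    """Entropy-regularised (Sinkhorn) approximation of the Wasserstein-1 distance
    between two point clouds of representations, using Euclidean ground cost."""
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)   # (n, m) pairwise distances
    K = np.exp(-cost / (reg * cost.max()))                           # Gibbs kernel, scale-aware
    a = np.full(len(x), 1.0 / len(x))                                # uniform source weights
    b = np.full(len(y), 1.0 / len(y))                                # uniform target weights
    v = np.ones(len(y))
    for _ in range(iters):                                           # alternating marginal scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]                               # approximate coupling
    return float(np.sum(plan * cost))


def joint_loss(y_pred, y_obs, p_treat, t, reps, alpha=1.0, beta=1.0):
    """Hypothetical weighting: factual-outcome MSE + alpha * treatment cross-entropy
    + beta * W1 between treated and control representations."""
    outcome = float(np.mean((y_pred - y_obs) ** 2))
    balance = sinkhorn_w1(reps[t == 1], reps[t == 0])
    return outcome + alpha * bce(p_treat, t) + beta * balance


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
same = sinkhorn_w1(x, x)
shifted = sinkhorn_w1(x, x + np.array([3.0, 0.0]))   # exact W1 for this translation is 3
t = (rng.random(50) < 0.5).astype(float)
loss = joint_loss(rng.normal(size=50), rng.normal(size=50), rng.random(50), t, x)
```

Because the entropic plan is only an approximate coupling, `shifted` should land close to, and typically slightly above, the exact value of 3; lowering `reg` tightens the approximation at the cost of slower convergence.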
Experimental Results
BALU was evaluated on synthetic (Instagram, YouTube) and semi-synthetic (BlogCatalog, Flickr) datasets under data-complete (p_miss = 0.0) and data-incomplete (p_miss > 0.0) scenarios, where p_miss denotes the missing rate.
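The data-incomplete scenarios can be simulated by hiding observed attribute values at rate p_miss. The sketch below assumes values are removed independently and uniformly at random (MCAR), which the summary does not state explicitly; `mask_completely_at_random` is a hypothetical helper.

```python
import numpy as np


def mask_completely_at_random(X, p_miss, seed=0):
    """Hide each attribute value independently with probability p_miss (MCAR assumption).
    Returns a float copy of X with hidden entries set to NaN, plus the boolean mask."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < p_miss
    X_obs = X.astype(float)          # astype returns a copy, so X itself is untouched
    X_obs[mask] = np.nan
    return X_obs, mask


X = np.arange(12, dtype=float).reshape(4, 3)                  # 4 units x 3 attributes
X_complete, m_complete = mask_completely_at_random(X, 0.0)   # data-complete scenario
X_missing, m_missing = mask_completely_at_random(X, 0.3)     # a data-incomplete scenario
```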
Q1: Unit relatedness & causal inference (CI) performance: In data-complete scenarios, BALU consistently outperforms all baselines (T-/X-/R-Learner, CausalForest, GNN-HSIC, SAGE-HSIC, NetDeconf, SPNet), achieving RMSE and MAE roughly one order of magnitude lower on the synthetic datasets. This confirms that modeling unit similarity through relationships improves CI.
Q2: Imputation & CI enhancement: Imputation generally improves performance (5-20% reductions in RMSE, 10-40% in MAE). BALU and its variant without edge embeddings, BALU(-edge), consistently outperform all baselines, even when the baselines are paired with strong imputation techniques.
Q3: Relational signals & Imputation: Relational signals significantly boost imputation performance. Edge embeddings help, but BALU maintains strong performance without them, whereas removing relationships from the imputation step leads to noticeable drops on the semi-synthetic datasets. Statistical significance tests confirm BALU's superior performance in MAE across all datasets and scenarios, and in RMSE in most comparisons.
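For reference, RMSE, MAE, and the relative improvements quoted under Q2 can be computed as below. This assumes both metrics are taken over per-unit ITE estimates against ground-truth effects, which are available for synthetic and semi-synthetic data; the paper may define them differently (for instance, reporting MAE on the ATE). The function names are hypothetical.

```python
import numpy as np


def rmse(tau_hat, tau):
    """Root-mean-squared error between estimated and true per-unit effects."""
    return float(np.sqrt(np.mean((tau_hat - tau) ** 2)))


def mae(tau_hat, tau):
    """Mean absolute error between estimated and true per-unit effects."""
    return float(np.mean(np.abs(tau_hat - tau)))


def relative_gain(baseline_err, new_err):
    """Percentage reduction in error relative to a baseline (baseline_err > 0)."""
    return 100.0 * (baseline_err - new_err) / baseline_err


tau = np.array([1.0, 2.0, 3.0])       # ground-truth ITEs
tau_hat = np.array([1.5, 2.0, 2.0])   # estimated ITEs
print(mae(tau_hat, tau))              # 0.5
gain = relative_gain(0.50, 0.40)      # roughly 20.0, i.e. a 20% error reduction
```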
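| Feature | Traditional CI Methods | BALU Framework |
|---|---|---|
| Missing Data Handling | Typically none; attributes are assumed to be fully observed | Imputes missing attributes jointly with estimation, via edge prediction on the unit-attribute bipartite graph |
| Interference Modeling | Often absent; units are assumed to be independent | Aggregates causal influence from related units with a K-layer GNN |
| Data Completeness Assumption | Fully observed data | Not required; designed for incomplete KGs under the Open-World Assumption |
| Performance on KGs (with missing data) | Prone to biased or unreliable effect estimates | Outperforms baselines, even when they are paired with strong imputation techniques |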
Conclusion
BALU is a novel framework for robust causal inference over KGs that integrates data imputation with interference-aware causal estimation. It addresses attribute incompleteness and relational interference, outperforming state-of-the-art baselines. Future work includes handling missing relationships, other missingness patterns (missing at random, MAR; missing not at random, MNAR), tabular data settings, multi-type entities, multi-hop causal effects, and deeper exploitation of KG semantics.
Your BALU Implementation Roadmap
A typical phased approach to integrate robust causal inference into your existing knowledge graph infrastructure.
Data Preparation & Graph Construction
Construct bipartite graph from KG, initialize node and edge embeddings.
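A minimal sketch of the graph construction step, assuming the KG has been flattened into (unit, attribute, value) triples plus unit-to-unit relations. `BipartiteGraph`, `build_bipartite`, and the predicate names `"treatment"` and `"outcome"` (standing in for P_T and P_Y) are hypothetical. Because a KG under the Open-World Assumption simply lacks triples for missing values, the sketch marks them explicitly with `None` so that they become edge-prediction targets.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class BipartiteGraph:
    """Units on one side, contextual attributes on the other; observed values label the edges."""
    units: list = field(default_factory=list)
    attributes: list = field(default_factory=list)
    edges: list = field(default_factory=list)       # (unit_idx, attr_idx, value) for observed values
    missing: list = field(default_factory=list)     # (unit_idx, attr_idx) pairs to impute
    unit_links: list = field(default_factory=list)  # (unit_idx, unit_idx) KG relations between units
    treatment: dict = field(default_factory=dict)   # unit_idx -> treatment value
    outcome: dict = field(default_factory=dict)     # unit_idx -> outcome value


def build_bipartite(triples: Iterable[tuple], relations: Iterable[tuple],
                    treatment: str = "treatment", outcome: str = "outcome") -> BipartiteGraph:
    """triples: (unit, attribute, value), with value None when the attribute is missing.
    Treatment and outcome predicates are stored separately, off the bipartite graph."""
    g = BipartiteGraph()
    u_idx: dict = {}
    a_idx: dict = {}

    def index(table: dict, names: list, key: str) -> int:
        # Assign consecutive integer ids in order of first appearance.
        if key not in table:
            table[key] = len(names)
            names.append(key)
        return table[key]

    for unit, attr, value in triples:
        u = index(u_idx, g.units, unit)
        if attr == treatment:
            g.treatment[u] = value
        elif attr == outcome:
            g.outcome[u] = value
        else:
            a = index(a_idx, g.attributes, attr)
            if value is None:
                g.missing.append((u, a))
            else:
                g.edges.append((u, a, float(value)))
    for src, dst in relations:
        g.unit_links.append((index(u_idx, g.units, src), index(u_idx, g.units, dst)))
    return g


triples = [("alice", "age", 34), ("alice", "income", None), ("bob", "age", 29),
           ("bob", "income", 52.0), ("alice", "treatment", 1), ("bob", "outcome", 3.5)]
g = build_bipartite(triples, relations=[("alice", "bob")])
print(g.edges, g.missing)  # [(0, 0, 34.0), (1, 0, 29.0), (1, 1, 52.0)] [(0, 1)]
```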
Attribute Imputation
Train L-layer GNN for message passing, predict missing attributes using FNN.
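The attribute imputation step can be sketched in numpy as below, following the three operations named in the Methodology (Unit-Attribute Message Passing, Relational Message Passing, Edge Embedding Updating) plus an FNN edge-value head. Mean aggregation, the layer shapes, the random untrained weights, and all function names are assumptions for illustration, not the paper's architecture; in practice the head would be trained on observed edge values and then applied to the missing pairs.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def mean_scatter(values, index, size):
    """Average the rows of `values` into `size` buckets given by `index`."""
    out = np.zeros((size, values.shape[1]))
    np.add.at(out, index, values)
    return out / np.maximum(np.bincount(index, minlength=size), 1)[:, None]


def imputation_layer(h_u, h_a, e, edges, unit_links, W):
    """One hypothetical layer. h_u: (U, d) units, h_a: (A, d) attributes,
    e: (E, d) attribute-edge embeddings, edges: (E, 2) [unit, attribute],
    unit_links: (R, 2) [unit, unit], W: dict of weight matrices."""
    u, a = edges[:, 0], edges[:, 1]
    # Unit-attribute message passing: edge-conditioned messages in both directions.
    to_u = mean_scatter(relu(np.hstack([h_a[a], e]) @ W["ua"]), u, len(h_u))
    to_a = mean_scatter(relu(np.hstack([h_u[u], e]) @ W["au"]), a, len(h_a))
    # Relational message passing over KG links between units (treated as undirected).
    both = np.vstack([unit_links, unit_links[:, ::-1]])
    rel = mean_scatter(h_u[both[:, 0]], both[:, 1], len(h_u))
    new_u = relu(np.hstack([h_u, to_u, rel]) @ W["u"])
    new_a = relu(np.hstack([h_a, to_a]) @ W["a"])
    # Edge embedding updating from the refreshed endpoint embeddings.
    new_e = relu(np.hstack([new_u[u], new_a[a], e]) @ W["e"])
    return new_u, new_a, new_e


def predict_missing(h_u, h_a, pairs, W1, w2):
    """FNN head that scores (unit, attribute) pairs, i.e. edge-value prediction."""
    z = relu(np.hstack([h_u[pairs[:, 0]], h_a[pairs[:, 1]]]) @ W1)
    return z @ w2


rng = np.random.default_rng(0)
d, L = 8, 2
edges = np.array([[0, 0], [0, 1], [1, 0], [2, 2], [3, 1]])   # observed unit-attribute edges
unit_links = np.array([[0, 1], [1, 2]])                       # relations between units
shapes = {"ua": (2 * d, d), "au": (2 * d, d), "u": (3 * d, d), "a": (2 * d, d), "e": (3 * d, d)}
layers = [{k: rng.normal(scale=0.3, size=s) for k, s in shapes.items()} for _ in range(L)]

h_u, h_a = rng.normal(size=(4, d)), rng.normal(size=(3, d))
e = rng.normal(size=(len(edges), d))   # stand-in for an embedding of the observed values
for W in layers:                       # L message-passing layers
    h_u, h_a, e = imputation_layer(h_u, h_a, e, edges, unit_links, W)

missing = np.array([[2, 0], [3, 2]])   # pairs whose values were not observed
preds = predict_missing(h_u, h_a, missing, rng.normal(size=(2 * d, d)), rng.normal(size=d))
```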
Contextual Representation & Interference Modeling
Form enriched representations, model interference via K-layer GNN.
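The interference modeling step can be sketched as a K-layer mean-aggregation GNN in which each layer combines a unit's representation with the average of its neighbours', so K layers let influence reach units up to K hops away. Mean aggregation, concatenation, ReLU, and the name `interference_gnn` are assumptions for illustration; if neighbours' treatments should drive the interference signal, one option is to append each unit's treatment to its representation before aggregating.

```python
import numpy as np


def interference_gnn(z, adj, weights):
    """Hypothetical K-layer aggregation. z: (N, d) enriched contextual representations;
    adj: (N, N) symmetric 0/1 matrix of unit relations; weights: K matrices of shape (2d, d)."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)   # avoid division by zero for isolated units
    h = z
    for W in weights:                                        # one iteration per layer, K in total
        neigh = (adj @ h) / deg                              # mean of neighbours' representations
        h = np.maximum(np.hstack([h, neigh]) @ W, 0.0)       # combine self and neighbours, then ReLU
    return h


rng = np.random.default_rng(0)
N, d, K = 3, 4, 1
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # path graph 0 - 1 - 2
weights = [rng.normal(size=(2 * d, d)) for _ in range(K)]
z = rng.normal(size=(N, d))
h = interference_gnn(z, adj, weights)

# With K = 1, unit 0 only sees its direct neighbour (unit 1), so perturbing unit 2 leaves h[0] unchanged.
z_perturbed = z.copy()
z_perturbed[2] += 10.0
h_perturbed = interference_gnn(z_perturbed, adj, weights)
```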
Causal Effect Estimation
Predict treatment assignments and potential outcomes using FNNs, calculate ITE and ATE.
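Once interference-aware representations are available, ITE and ATE follow from the potential-outcome heads. The sketch assumes small two-layer FNN heads with random, untrained weights, one outcome head per treatment arm plus a treatment (propensity) head, and takes the ATE as the mean of the predicted ITEs; `fnn` and `estimate_effects` are hypothetical names.

```python
import numpy as np


def fnn(h, W1, w2):
    """Two-layer feedforward head: ReLU hidden layer, linear output."""
    return np.maximum(h @ W1, 0.0) @ w2


def estimate_effects(h, heads):
    """Hypothetical heads on interference-aware representations h (N, d): a treatment
    head giving propensities, and one outcome head per treatment arm."""
    propensity = 1.0 / (1.0 + np.exp(-fnn(h, *heads["t"])))   # sigmoid of the treatment logit
    y0 = fnn(h, *heads["y0"])          # predicted potential outcome without treatment
    y1 = fnn(h, *heads["y1"])          # predicted potential outcome under treatment
    ite = y1 - y0                      # per-unit individual treatment effect
    return propensity, ite, float(ite.mean())   # ATE as the mean ITE


rng = np.random.default_rng(0)
N, d = 6, 4
h = rng.normal(size=(N, d))
heads = {k: (rng.normal(size=(d, d)), rng.normal(size=d)) for k in ("t", "y0", "y1")}
propensity, ite, ate = estimate_effects(h, heads)
```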
Evaluation & Refinement
Assess performance using RMSE and MAE, refine model based on empirical results.