Enterprise AI Analysis: LegisSearch: navigating legislation with graphs and large language models

This analysis explores LegisSearch, an innovative system that combines Knowledge Graph technology, state-of-the-art embedding models, and Large Language Models (LLMs) to enhance the navigation and retrieval of complex legislative information. Specifically implemented for Italian legislation, LegisSearch demonstrates superior performance compared to traditional search methods, providing more precise and context-aware results across diverse thematic areas.

Executive Impact

LegisSearch improves legal information retrieval by combining a legislative knowledge graph with LLM-based query expansion. In the reported evaluation it achieves higher recall and ranking quality than the BM25 and TF-IDF baselines, which matters for legal professionals who cannot afford to miss relevant provisions.

0.71 Recall (R@50)
2.63 DCG (DCG@50)
0.62 Average Precision (AP@20)

Deep Analysis & Enterprise Applications

Introduction

A country's legislation comprises large amounts of complex documents: laws composed of articles connected through references, a widely used mechanism for recalling, in various forms, previous relevant legislation. Retrieving documents from such large, unstructured collections is a long-standing challenge for information retrieval, recommendation, and search systems, whose task is to suggest relevant items to users based on an input query (Van Meteren et al. 2000; Thorat et al. 2015). Traditional applications of information retrieval systems have been developed for datasets of news (Raza and Ding 2022), websites, or books (Mathew et al. 2016), with relatively few applications in the legislative area (Bellandi et al. 2022; Wehnert et al. 2024). This is especially true of more modern retrieval systems based on text embeddings from large language models (LLMs) (Deng 2022; Kanwal et al. 2021) and/or graph technologies, which have demonstrated their power in enhancing retrieval results (Zhang et al. 2018). Text embeddings are used to calculate distances between the input query and each point in the embedding space, where each point represents a textual document, and the closest points are offered to the user as recommendations. Knowledge graphs supply additional context for the retrieval task, helping the system detect whether a document is relevant to the input query.
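The distance-based ranking described above can be sketched in a few lines. This is a minimal illustration, assuming cosine similarity as the closeness measure and using invented three-dimensional vectors; the identifiers law_A, law_B, and law_C and the function names are hypothetical, and real embeddings would come from an embedding model with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, invented for illustration only.
doc_vecs = {
    "law_A": [0.9, 0.1, 0.0],
    "law_B": [0.1, 0.9, 0.0],
    "law_C": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

top = rank_documents(query_vec, doc_vecs, k=2)
for doc_id, score in top:
    print(doc_id, round(score, 3))
```

With these toy vectors the query points along the first axis, so law_A and law_C rank ahead of law_B.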

Graph of Italian Legislation

Recently, a comprehensive and high-quality knowledge graph of Italian legislation was presented in Colombo et al. (2025) and shared in a Zenodo repository. The graph is built on top of an internationally adopted standard for representing legal documents, the Akoma Ntoso XML standard (Barabucci et al. 2009), which was adopted by the Italian parliament (Palmirani 2021). It applies the Property Graph data model to the legislative domain, allowing efficient navigation through queries written in the recently standardized Graph Query Language (International Organization for Standardization 2024). Using this resource as a foundational use case, we propose LegisSearch, a search system that leverages the graph's semantics and structure to enhance information retrieval for laws. In this system, we combine state-of-the-art universal text embeddings with graphs and Large Language Models, which can play a critical role in expanding the user query (Wang et al. 2023b), especially in a highly specialized domain such as the legal one, whose documents can be significantly more complex than news, articles, or books (Matsyupa et al. 2022).

Graph-enhanced embeddings

The synergy of text embeddings with an underlying Knowledge Graph is a powerful approach that combines structured knowledge with semantic flexibility (Wang et al. 2018; Syed et al. 2022). In our context, the graph can be leveraged to examine the relationships connecting an article or a law, complementing the often vague or implicit meaning of raw text that cites other laws (or articles) and may therefore be relevant to a target subject. In this work, we propose to create graph node embeddings by leveraging graph query tools to build context-aware representations of articles and laws. Since we use textual embeddings, we adopt a natural language template technique, as introduced in Liu et al. (2024), which uses natural language labels to separate distinct fields before creating the embeddings. In particular, we create embeddings for law nodes by querying the Property Graph to derive granular topics for each article that composes the law and by navigating its legal foundation laws and articles to extract additional context.
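To make the template idea concrete, here is a minimal sketch of how a law node's text could be assembled before it is passed to an embedding model. Everything specific in it is an assumption made for illustration: the in-memory dictionaries stand in for the Property Graph (a real system would issue graph queries instead), the edge types HAS_ARTICLE and HAS_LEGAL_BASIS, the field labels, and the toy law titles are invented, and the exact template used by LegisSearch is not given in the text above.

```python
# Hypothetical toy property graph: node id -> properties, plus typed edges.
nodes = {
    "law:1": {"label": "Law", "title": "Law on foreign investment screening"},
    "art:1.1": {"label": "Article", "topic": "notification duties"},
    "art:1.2": {"label": "Article", "topic": "special government powers"},
    "law:0": {"label": "Law", "title": "Framework law on national security"},
}
edges = [
    ("law:1", "HAS_ARTICLE", "art:1.1"),
    ("law:1", "HAS_ARTICLE", "art:1.2"),
    ("law:1", "HAS_LEGAL_BASIS", "law:0"),
]


def neighbors(node_id, edge_type):
    """Ids of nodes reachable from node_id over one edge of the given type."""
    return [dst for src, etype, dst in edges
            if src == node_id and etype == edge_type]


def law_template(law_id):
    """Build a labeled, natural-language description of a law node."""
    law = nodes[law_id]
    topics = [nodes[a]["topic"] for a in neighbors(law_id, "HAS_ARTICLE")]
    bases = [nodes[b]["title"] for b in neighbors(law_id, "HAS_LEGAL_BASIS")]

    # Each field gets its own natural-language label so the embedding
    # model can tell the fields apart.
    lines = [f"Law title: {law['title']}"]
    if topics:
        lines.append("Article topics: " + "; ".join(topics))
    if bases:
        lines.append("Legal foundations: " + "; ".join(bases))
    return "\n".join(lines)


text = law_template("law:1")
print(text)
```

The resulting string, rather than the raw law text alone, would then be embedded, so that article topics and legal foundations contribute to the law's position in the vector space.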

LLM-based user input expansion

Our system takes as input a textual query from a user looking for relevant legislation in a certain thematic area. One of the most promising applications of Large Language Models in information retrieval systems is data augmentation, typically of users' inputs, with additional, strongly related knowledge that is useful for searching over the vector space (Liu et al. 2024). This is especially beneficial in specialized domains whose terminology is highly related to the input but rare or unfamiliar to general users. To this end, we adopt an LLM-based understanding and expansion approach, inspired by the recent work of Wang et al. (2023b), that enriches the textual user input with additional, derived content to make the search more effective. We designed a two-step LLM intervention: first, we derive the main topics from the text entered by the user; then, a second LLM expands the list of topics by adding highly related ones.
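The two-step intervention can be sketched as a small pipeline that takes the two LLM steps as plain callables. This is an illustrative skeleton only: the function names are hypothetical, the two stub functions return hard-coded lists in place of real LLM calls, and the prompts and models used by LegisSearch are not specified in the text above.

```python
def expand_query(user_text, extract_topics, expand_topics):
    """Run topic extraction, then topic expansion, and merge the results.

    extract_topics: callable taking the user text and returning a list of topics
    expand_topics:  callable taking a list of topics and returning related topics
    """
    topics = extract_topics(user_text)   # step 1: main topics of the input
    related = expand_topics(topics)      # step 2: highly related topics

    # Merge both lists, dropping case-insensitive duplicates but keeping order.
    seen = set()
    merged = []
    for topic in topics + related:
        key = topic.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(topic.strip())
    return merged


# Stubs standing in for the two LLM calls (hypothetical outputs).
def fake_extract(text):
    return ["golden power", "foreign investment"]


def fake_expand(topics):
    return ["Foreign investment", "strategic assets"]


expanded = expand_query(
    "Which rules let the state block a company takeover?",
    fake_extract,
    fake_expand,
)
print(expanded)
```

Passing the LLM steps in as callables keeps the merging logic testable without network access; the expanded topic list would then be embedded and used for the vector search.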

Retrieval performances

To evaluate the performance of LegisSearch, we compare it against two widely used baseline models in information retrieval: BM25 and TF-IDF. While we are aware of other domain-specific document retrieval approaches, we consider them hard to adapt to our setting. For instance, ASKE (Bellandi et al. 2022) has been applied to information retrieval over Italian court decisions, but (i) its primary aim is multi-label classification, and (ii) it relies on a first step of Elasticsearch (Elasticsearch 2025) repository queries, which would have to be manually customized according to the user's thematic area of interest. The results show that LegisSearch outperforms BM25 and TF-IDF across most evaluation metrics, making it the most effective of the three methods for searching legislative acts. Our system achieves the highest recall (R@5, R@20, and R@50) and discounted cumulative gain (DCG@5, DCG@20, and DCG@50), indicating a superior ability to retrieve relevant items and rank them effectively. TF-IDF performs slightly better in average precision at the shortest threshold, with the highest AP@5, whereas LegisSearch is more consistent at higher thresholds. BM25 lags behind both methods, with lower precision, recall, and ranking quality scores. LegisSearch is therefore best suited to scenarios requiring high recall and ranking effectiveness, which was the goal of our architecture.
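For readers unfamiliar with the metrics named above, the following sketch computes R@k, DCG@k, and AP@k for a single query. It rests on assumptions not stated in the text: relevance is treated as binary, DCG uses the common 1/log2(rank + 1) discount, and AP@k is normalized by min(k, number of relevant documents), which is only one of several conventions in use. The ranked list and relevant set are invented.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def dcg_at_k(ranked, relevant, k):
    """Discounted cumulative gain with binary relevance."""
    return sum(1.0 / math.log2(rank + 1)
               for rank, doc in enumerate(ranked[:k], start=1)
               if doc in relevant)


def average_precision_at_k(ranked, relevant, k):
    """Mean of precision@rank over the ranks where a relevant document appears.

    Normalized by min(k, len(relevant)); other conventions exist.
    """
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / min(k, len(relevant))


# Toy example: a five-item ranking and four relevant documents.
ranked = ["a", "x", "b", "y", "c"]
relevant = {"a", "b", "c", "d"}

recall = recall_at_k(ranked, relevant, 5)
dcg = dcg_at_k(ranked, relevant, 5)
ap = average_precision_at_k(ranked, relevant, 5)
print(recall, round(dcg, 3), round(ap, 3))
```

In this toy case three of the four relevant documents are retrieved, so recall is 0.75; DCG and AP additionally reward the fact that the hits sit at ranks 1, 3, and 5 rather than lower in the list.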

0.71 LegisSearch R@50 vs. Baselines

Enterprise Process Flow

User Input & LLM Expansion
Universal Embedding Model
Graph Filtering & Query
Cosine Similarity & Rec Score
Search Results

Feature Comparison: Traditional IR vs. LegisSearch

Contextual Understanding
  • Traditional IR: keyword-based limitations
  • LegisSearch: graph semantics for deep context
Handling Hidden References
  • Traditional IR: ineffective; lacks support for flexible exploration
  • LegisSearch: navigates multiple citations; reveals hidden relevant rules
Scalability & Complexity
  • Traditional IR: struggles with growing body of laws
  • LegisSearch: manages network complexity effectively; Property Graph data structure
Query Refinement
  • Traditional IR: requires fine-tuning input queries
  • LegisSearch: LLM-based query expansion; context-aware vector representations

Case Study: Golden Power Legislation Navigation

Understanding and monitoring the implications of Italy's “golden power” regulations is a practical problem. These rules govern state intervention in corporate transactions critical to national interests, often dispersed across various legislative texts and evolving through amendments. A traditional keyword-based search might fail to capture the relationships between legal documents. With LegisSearch, the analyst can explore an interconnected graph of legislative documents to uncover relevant provisions, amendments, and cross-referenced laws, thereby improving the search process and the user's productivity.

Outcome: LegisSearch enabled an expert legal professional to retrieve nearly all relevant laws related to Golden Power, with graph clusters resembling the expert's own groupings of those laws.

Key Benefit: Improved Search Accuracy for complex, interconnected legal domains like Golden Power.

Your AI Implementation Roadmap

A structured approach to integrating LegisSearch and similar AI capabilities into your enterprise workflows.

Phase 1: Discovery & Strategy

Initial consultation to understand your specific legal document challenges, data landscape, and strategic objectives. Define scope, KPIs, and success criteria for AI integration.

Phase 2: Data Preparation & Graph Construction

Assist with data extraction, cleaning, and transformation of legislative documents into the Property Graph data model. Implement LLM-guided topic extraction and embedding generation.

Phase 3: System Customization & Integration

Tailor LegisSearch components, including query expansion, graph-enhanced embeddings, and recommendation scoring, to align with your organization's specific needs and existing systems.

Phase 4: Testing, Validation & Training

Conduct rigorous testing with real-world datasets and legal experts to validate performance. Provide comprehensive training for your team to maximize adoption and utilization.

Phase 5: Continuous Improvement & Scaling

Ongoing monitoring, performance optimization, and updates to the AI models and graph structure. Explore expansion to broader legislative scopes (e.g., European, regional laws).

Ready to Transform Your Legal Research?

Unlock the full potential of your legislative data. Schedule a free 30-minute consultation with our AI experts to explore how LegisSearch can be tailored for your enterprise.
