- Jun 2023
-
www.sciencedirect.com
-
Learning heterogeneous graph embedding for Chinese legal document similarity
The paper proposes L-HetGRL, an unsupervised approach to Legal Document Similarity Measurement (LDSM) that builds a legal heterogeneous graph and incorporates legal domain-specific knowledge. The authors report that it outperforms the methods it is compared against.
-
As China digitizes more of its legal documents, there is growing interest in using information technology to extract valuable information from them efficiently. Legal Document Similarity Measurement (LDSM), which identifies similar legal documents, plays a vital role in legal assistant systems. Early approaches relied on text content or statistical measures, while more recent ones use neural networks and pre-trained language models such as BERT. The neural approaches, however, require labeled data, which is expensive and difficult to obtain for legal documents. To address this, the authors propose L-HetGRL, an unsupervised approach that uses a legal heterogeneous graph constructed from encyclopedia knowledge. L-HetGRL integrates heterogeneous content, document structure, and legal domain-specific knowledge. In the authors' extensive experiments, L-HetGRL outperforms the unsupervised and even the supervised methods it is compared against, a promising result for legal document analysis.
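
To make the idea of a heterogeneous graph more concrete, the following is a minimal Python sketch, not the authors' method. It assumes a toy graph with hypothetical node types (`doc`, `term`, `article`) and hypothetical relations (`mentions`, `cites`), none of which come from the paper. In place of the learned graph embeddings that L-HetGRL produces, it uses a much simpler stand-in: each document is represented by counts of its typed neighbors, and documents are ranked by cosine similarity without any labels.

```python
from math import sqrt

# Hypothetical typed edges: (source node, relation, target node).
# All node names and relation names are illustrative assumptions,
# not taken from the paper.
EDGES = [
    ("doc:A", "mentions", "term:theft"),
    ("doc:A", "cites", "article:1"),
    ("doc:B", "mentions", "term:theft"),
    ("doc:B", "mentions", "term:fraud"),
    ("doc:B", "cites", "article:1"),
    ("doc:C", "mentions", "term:contract"),
    ("doc:C", "cites", "article:2"),
]


def neighbor_vectors(edges):
    """Map each source node to a sparse count vector over (relation, neighbor)."""
    vectors = {}
    for src, relation, dst in edges:
        vec = vectors.setdefault(src, {})
        key = (relation, dst)
        vec[key] = vec.get(key, 0) + 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(query, vectors):
    """Return (document, score) pairs for all other documents, best first."""
    scores = [
        (doc, cosine(vectors[query], vec))
        for doc, vec in vectors.items()
        if doc != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    vectors = neighbor_vectors(EDGES)
    for doc, score in rank_similar("doc:A", vectors):
        print(f"{doc}: {score:.3f}")
```

Here `doc:A` and `doc:B` share a term and a cited article, so `doc:B` ranks above `doc:C`, which shares nothing with `doc:A`. A graph representation learning method would replace the raw neighbor counts with embeddings learned from the graph structure, but the final ranking step by vector similarity would look much the same.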
-