  1. Jun 2023
    1. Learning heterogeneous graph embedding for Chinese legal document similarity

      The paper proposes L-HetGRL, an unsupervised approach that builds a legal heterogeneous graph and incorporates legal domain-specific knowledge to improve Legal Document Similarity Measurement (LDSM). The authors report that it outperforms other methods.

    2. China's increasing digitization of legal documents has led to a focus on using information technology to extract valuable information efficiently. Legal Document Similarity Measurement (LDSM) plays a vital role in legal assistant systems by identifying similar legal documents. Early approaches relied on text content or statistical measures, while more recent ones use neural networks and pre-trained language models such as BERT. However, these recent approaches require labeled data, which is expensive and difficult to obtain for legal documents. To address this, the authors propose an unsupervised approach called L-HetGRL, which uses a legal heterogeneous graph constructed from encyclopedia knowledge. L-HetGRL integrates heterogeneous content, document structure, and legal domain-specific knowledge. According to the authors, extensive experiments show that L-HetGRL outperforms other unsupervised methods and even supervised ones, a promising result for legal document analysis.

  2. Oct 2021
  3. Feb 2017
    1. By most contemporary standards the document is an object (physical or electronic) on which information is recorded. It would thus have two dimensions, the medium and the content. But this dual presentation is insufficient: it obscures the social function that lends the documentary function to both medium and contents. A good illustration of this ambiguity can be found in the legal framework for information technology of Quebec. Quebec law is interesting in this respect because it tries to define a document beyond the medium it uses by paying attention to information. We can read in Article 3 of the 2001 law this definition: "Information inscribed on a medium constitutes a document. The information is delimited and structured, according to the medium used, by tangible or logical features and is intelligible in the form of words, sounds or images." On the face of it, this passage defines a document only in terms of its medium and of its contents. These contents, moreover, are viewed as independent of the medium. But the appearance is deceptive. On the one hand, it is precisely because the document has a function (that of transmission of evidence) that we need a law to define it. We must, indeed, be sure that the object we are talking about will perform this function in the new digital environment. On the other hand, it is indeed because the content can pass from one medium to another that Quebec has tried to define in law the link between one and the other to ensure that the documentary function is preserved.

      A very interesting contemporary legal view of what a document is.