LeNER-Br: a Dataset for Named Entity Recognition in Brazilian Legal Text

This page hosts the dataset and source code described in the paper below, which was produced through a collaboration between two units of the University of Brasília: NEXT (Núcleo de P&D para Excelência e Transformação do Setor Público) and CiC (Departamento de Ciência da Computação).

      @InProceedings{luz_etal_propor2018,
          author = {Pedro H. {Luz de Araujo} and Te\'{o}filo E. {de Campos} and
                    Renato R. R. {de Oliveira} and Matheus Stauffer and
                    Samuel Couto and Paulo Bermejo},
          title = {LeNER-Br: a Dataset for Named Entity Recognition in Brazilian Legal Text},
          booktitle = {International Conference on the Computational Processing of Portuguese
                       ({PROPOR})},
          publisher = {Springer},
          series = {Lecture Notes in Computer Science ({LNCS})},
          pages = {313--323},
          year = {2018},
          month = {September 24-26},
          address = {Canela, RS, Brazil},
          doi = {10.1007/978-3-319-99722-3_32},
          url = {https://cic.unb.br/~teodecampos/LeNER-Br/},
      }

We also provide the LSTM-CRF model described in the paper, which achieved an average F1 score of 92.53% on the test set.
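
For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score can be computed for NER output. It is not the evaluation code from the paper: this page does not state whether the reported figure is computed per token or per entity, so the sketch assumes entity-level scoring over IOB2 tags. The function names `extract_spans` and `f1_score` and the labels `PER` and `LOC` are hypothetical and used only for illustration.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a sequence of IOB2 tags (assumed tag scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        # An open entity ends unless this tag continues it with a matching I- tag.
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        # A B- tag always opens an entity; a stray I- tag is treated leniently
        # as opening one too.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return spans


def f1_score(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a predicted span counts only if type and boundaries match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only; they are not taken from the dataset description.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
print(round(f1_score(gold, pred), 4))
```

In this toy example the prediction finds one of the two gold entities and makes no wrong predictions, so precision is 1.0 and recall is 0.5.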

The sections below describe the requirements, the dataset, and the model files.

We kindly request that users cite our paper in any publication resulting from the use of our source code, our dataset, or our pre-trained models.

Download

Follow this link to download all files linked from this page (93.6 MB). Links to individual files are available below.

Requirements

  1. Python 3
  2. pip
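
A quick way to confirm both requirements from within Python is sketched below. This is only a convenience check and not part of the distributed code; the function name `check_requirements` is hypothetical. It reports whether the running interpreter is Python 3 or later and whether the `pip` module can be found by that interpreter.

```python
import importlib.util
import sys


def check_requirements() -> dict:
    """Report whether the interpreter is Python 3+ and whether pip is importable."""
    return {
        "python3": sys.version_info.major >= 3,
        "pip": importlib.util.find_spec("pip") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```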

LeNER-Br Dataset

The directory structure is as follows:

Model

The model code is adapted from this repository and implements a NER model in TensorFlow (LSTM + CRF + character embeddings). Every modified code file is marked as such at the beginning of the file. The section below summarizes how to use the model. For more in-depth explanations of how to use the model and change its configuration, refer to the README of the original implementation.

Evaluation