Cross-lingual Annotation Projection in Legal Texts

Abstract

We study annotation projection in text classification problems where source documents are published in multiple languages and may not be exact translations of one another. In particular, we focus on the detection of unfair clauses in privacy policies and terms of service. We present the first English-German parallel asymmetric corpus for this task. We study and compare several language-agnostic sentence-level projection methods. Our results indicate that a combination of word embeddings and dynamic time warping performs best.
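To make the idea of combining sentence embeddings with dynamic time warping more concrete, here is a minimal sketch. It is not the implementation from the paper or its repository. It assumes each sentence has already been mapped to a vector in a shared cross-lingual space (the embedding step is left out), and the function names `cosine_distance`, `dtw_align`, and `project_labels` are hypothetical, chosen only for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def dtw_align(src_vecs, tgt_vecs):
    """Align two sentence sequences with classic dynamic time warping.

    Returns a monotonic list of (source_index, target_index) pairs.
    Assumption: both sequences are non-empty and the vectors live in
    the same embedding space.
    """
    n, m = len(src_vecs), len(tgt_vecs)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i source sentences
    # with the first j target sentences.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(src_vecs[i - 1], tgt_vecs[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path


def project_labels(src_labels, path, n_tgt):
    """Copy binary sentence labels from source to aligned target sentences."""
    tgt_labels = [0] * n_tgt
    for i, j in path:
        if src_labels[i]:
            tgt_labels[j] = 1
    return tgt_labels
```

Because the warping path may pair one source sentence with several target sentences (or the reverse), this kind of alignment can tolerate documents that are not sentence-by-sentence translations, which is the asymmetric setting described above. A real pipeline would replace the toy vectors with embeddings from a multilingual model and would likely need a strategy for sentences that have no counterpart in the other language.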

 

Publication available at

https://doi.org/10.18653/v1/2020.coling-main.79

 

Additional material

Poster for COLING 2020

Code and corpus repository

 

Cite as

Galassi, A., Drazewski, K., Lippi, M., & Torroni, P. (2020). Cross-lingual Annotation Projection in Legal Texts. In Proceedings of the 28th International Conference on Computational Linguistics (pp. 915-926). DOI: 10.18653/v1/2020.coling-main.79