Biannual journal of Romance linguistics and literatures

Écho des études romanes 2022, 18(1):49-65 | DOI: 10.32725/eer.2022.004

Expressions référentielles et chaînes de référence en français : le projet Democrat et son exploration des rapports entre linguistique textuelle et linguistique de corpus (in French)

Frédéric LANDRAGIN
CNRS, Laboratoire Lattice

Mots clés: expressions référentielles, coréférence, chaînes de référence, corpus annoté

Referring expressions and coreference chains in French: the Democrat project and its exploration of links between textual linguistics and corpus linguistics

We present the Democrat project (“Description, modelling and automatic detection of coreference chains in French”) and its four objectives, namely to provide: (i) an integrated, discursive, diachronic and inter-genre description of coreference chains; (ii) a corpus of written French texts with annotated coreference chains; (iii) several tools for visualizing and exploring coreference chains; (iv) two NLP systems able to process raw French text and to extract referring expressions and coreference chains, systems which have also brought innovations to the field of deep learning. We present the main results of Democrat and describe the steps that made them possible, in particular the construction of the corpus, which was manually annotated by forty members of the project.

Keywords: referring expressions, coreference, coreference chains, annotated corpus

Accepted: October 14, 2022; Published: November 14, 2022

LANDRAGIN, F. (2022). Expressions référentielles et chaînes de référence en français : le projet Democrat et son exploration des rapports entre linguistique textuelle et linguistique de corpus. Écho des études romanes, 18(1), 49-65. doi: 10.32725/eer.2022.004


This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.