
Parsing of Natural Languages, Summer Semester 2018

To process a sentence of a natural language by machine, the sentence must be represented in a suitable form inside the computer. This lecture deals with the representation of natural-language sentences as so-called hybrid trees. It shows why a hybrid tree is a suitable data structure and with which formal models a sentence can be automatically transformed into a hybrid tree. The lecture also covers how various grammar formalisms can be extracted from a representative set of hybrid trees.
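Roughly speaking, a hybrid tree pairs an (unordered) tree with a separate linear order on its leaves, which makes discontinuous constituents representable. The following sketch is our own illustrative simplification, not taken from the lecture materials; class and node names are invented:

```python
# Illustrative sketch (not from the lecture): a hybrid tree couples a
# tree structure with a separate linear order on its terminal nodes.

class HybridTree:
    def __init__(self):
        self.label = {}        # node id -> label
        self.children = {}     # node id -> list of child ids
        self.order = []        # linear (sentence) order of the terminal node ids

    def add_node(self, node, label, children=()):
        self.label[node] = label
        self.children[node] = list(children)

    def yield_of(self, node):
        """Terminal nodes below `node`, listed in sentence order."""
        if not self.children[node]:
            return [node]
        below = {t for c in self.children[node] for t in self.yield_of(c)}
        return [t for t in self.order if t in below]

# German "hat schnell gearbeitet": the verbal constituent
# {hat, gearbeitet} wraps around "schnell", so it is discontinuous.
ht = HybridTree()
ht.add_node("t1", "hat")
ht.add_node("t2", "schnell")
ht.add_node("t3", "gearbeitet")
ht.add_node("VP", "VP", ["t1", "t3"])
ht.order = ["t1", "t2", "t3"]

print(ht.yield_of("VP"))  # -> ['t1', 't3'], a gap at position of 't2'
```

Because the tree and the linear order are stored independently, the yield of a node need not be a contiguous substring; formalisms such as LCFRS (see the literature below) generate exactly such structures.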

Dates

  • Mondays, 2nd block (09:20 – 10:50), APB/E006: lecture
  • Thursdays, 2nd block (09:20 – 10:50), APB/E007: lecture
  • Wednesdays, 4th block (13:00 – 14:30), APB/E006: exercise session

The last lecture takes place on Monday, 02.07.2018. All exercise sessions from 04.07.2018 onwards are cancelled; however, on 11.07.2018 there will be an opportunity to ask questions about the lecture and exercise material.

Dates for oral module examinations will presumably be offered at the end of July and at the end of September; details will be announced in the lecture.

We would also like to draw attention to the Machine Translation Marathon, a kind of summer school/workshop taking place in Prague at the beginning of September. It teaches the foundations and current approaches of machine translation and deepens them in practical lab courses. No prior knowledge is required. Participation is free of charge, but advance registration is necessary.

Material

Accessible only from within the TU network; download via VPN if necessary.

Exercise sheets
Corpora

Further materials will be made available as the lecture progresses. You can get a preliminary overview from the previous iteration of the lecture.

Literature

  1. Abeillé, A., Schabes, Y., and Joshi, A.K. 1990. Using lexicalized TAGs for machine translation. Proc. 13th CoLing, University of Helsinki, Finland, 1–6.
  2. Arnold, A. and Dauchet, M. 1976. Bi-transductions de forêts. Proc. 3rd Int. Coll. Automata, Languages and Programming, Edinburgh University Press, 74–86.
  3. Büchse, M., Nederhof, M.-J., and Vogler, H. 2011. Tree parsing with synchronous tree-adjoining grammars. Proc. 12th Int. Conf. on Parsing Technologies (IWPT 2011), Association for Computational Linguistics, 14–25.
  4. Büchse, M., Geisler, D., Stüber, T., and Vogler, H. 2010. N-best Parsing Revisited. Proceedings of the 2010 Workshop on Applications of Tree Automata in Natural Language Processing, Association for Computational Linguistics, 46–54. [url]
  5. Birkhoff, G. and Lipson, J.D. 1970. Heterogeneous Algebras. Journal of Combinatorial Theory 8(1), 115–133. [doi]
  6. Black, E., Abney, S.P., Flickinger, D., et al. 1991. A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars. Proceedings of the Workshop on Speech and Natural Language, Association for Computational Linguistics, 306–311. [doi] [url]
  7. Boullier, P. 2000. Range concatenation grammars. Proc. of 6th Int. Workshop on Parsing Technologies (IWPT 2000). [url]
  8. Brainerd, W.S. 1969. Tree generating regular systems. Inform. and Control 14, 217–231. [doi]
  9. Brants, S., Dipper, S., Eisenberg, P., et al. 2004. TIGER: Linguistic Interpretation of a German Corpus. Res. Lang. Comput. 2(4). [doi]
  10. Buchholz, S. and Marsi, E. 2006. CoNLL-X Shared Task on Multilingual Dependency Parsing. Proceedings of the Tenth Conference on Computational Natural Language Learning, Association for Computational Linguistics, 149–164. [url]
  11. Chiang, D. 2007. Hierarchical phrase-based translation. Computational Linguistics 33(2), 201–228.
  12. Deransart, P. and Maluszynski, J. 1985. Relating logic programs and attribute grammars. J. Logic Programming 2, 119–155.
  13. Drewes, F., Gebhardt, K., and Vogler, H. 2016. EM-Training for Weighted Aligned Hypergraph Bimorphisms. Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata, Association for Computational Linguistics, 60–69. [url]
  14. Engelfriet, J. and Schmidt, E.M. 1978. IO and OI. II. J. Comput. System Sci. 16, 1, 67–99.
  15. Fischer, M.J. 1968. Grammars with macro-like productions. PhD thesis, Harvard University.
  16. Forst, M., Bertomeu, N., Crysmann, B., Fouvry, F., Hansen-Schirra, S., and Kordoni, V. 2004. Towards a dependency-based gold standard for German parsers. Proceedings of the 5th Workshop on Linguistically Interpreted Corpora.
  17. Gebhardt, K., Nederhof, M.-J., and Vogler, H. 2017. Hybrid grammars for parsing of discontinuous phrase structures and non-projective dependency structures. Computational Linguistics. [doi]
  18. Gebhardt, K. 2018. Generic refinement of expressive grammar formalisms with an application to discontinuous constituent parsing. Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018). [url]
  19. Giegerich, R. 1988. Composition and evaluation of attribute coupled grammars. Acta Inform. 25, 355–423.
  20. Goguen, J.A., Thatcher, J.W., Wagner, E.G., and Wright, J.B. 1977. Initial algebra semantics and continuous algebras. J. ACM 24, 68–95. [doi]
  21. Graehl, J., Knight, K., and May, J. 2008. Training tree transducers. Computational Linguistics 34, 3, 391–427.
  22. Huang, L. and Chiang, D. 2005. Better K-best Parsing. Proceedings of the Ninth International Workshop on Parsing Technology, Association for Computational Linguistics, 53–64. [url]
  23. Joshi, A.K. and Schabes, Y. 1997. Tree-adjoining grammars. In: G. Rozenberg and A. Salomaa, eds., Handbook of Formal Languages. Springer-Verlag, 69–123.
  24. Kübler, S., McDonald, R., and Nivre, J. 2009. Dependency parsing. Morgan and Claypool Publishers. [doi]
  25. Kallmeyer, L. 2010. Parsing beyond context-free grammars. Springer. [doi]
  26. Kallmeyer, L. and Maier, W. 2010. Data-driven parsing with probabilistic linear context-free rewriting systems. 23rd International Conference on Computational Linguistics, Beijing, China, 537–545. [url]
  27. Kallmeyer, L. and Maier, W. 2013. Data-driven parsing using probabilistic linear context-free rewriting systems. Computational Linguistics 39(1), 87–119. [doi]
  28. Knuth, D.E. 1968. Semantics of context-free languages. Math. Systems Theory 2, 127–145.
  29. Koller, A. and Kuhlmann, M. 2011. A Generalized View on Parsing and Translation. Proceedings IWPT 2011. [url]
  30. Kuhlmann, M. and Satta, G. 2009. Treebank Grammar Techniques for Non-projective Dependency Parsing. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 478–486. [url]
  31. Kuhlmann, M., Gómez-Rodríguez, C., and Satta, G. 2011. Dynamic Programming Algorithms for Transition-based Dependency Parsers. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, 673–682. [url]
  32. Lewis, P.M. and Stearns, R.E. 1968. Syntax-directed transduction. J. ACM 15, 3, 465–488.
  33. Maier, W. and Søgaard, A. 2008. Treebanks and mild context-sensitivity. Proc. of Formal Grammar 2008, 61–76. [url]
  34. Maletti, A., Graehl, J., Hopkins, M., and Knight, K. 2009. The Power of Extended Top-down Tree Transducers. SIAM J. Comput. 39, 2, 410–430.
  35. Marcus, M.P., Santorini, B., and Marcinkiewicz, M.A. 1994. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19, 2, 313–330. [url]
  36. Matsuzaki, T., Miyao, Y., and Tsujii, J. 2005. Probabilistic CFG with Latent Annotations. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 75–82. [doi] [url]
  37. Nederhof, M.-J. 2003. Weighted deductive parsing and Knuth’s algorithm. Computational Linguistics 29(1), 135–143.
  38. Nederhof, M.-J. and Vogler, H. 2014. Hybrid Grammars for Discontinuous Parsing. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin City University and Association for Computational Linguistics, 1370–1381. [url]
  39. Nivre, J. 2008. Algorithms for Deterministic Incremental Dependency Parsing. Computational Linguistics 34(4), 513–553. [doi]
  40. Nivre, J. 2009. Non-Projective Dependency Parsing in Expected Linear Time. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Association for Computational Linguistics, 351–359. [url]
  41. Petrov, S., Barrett, L., Thibaux, R., and Klein, D. 2006. Learning Accurate, Compact, and Interpretable Tree Annotation. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 433–440. [url]
  42. Rounds, W.C. 1970. Mappings and grammars on trees. Math. Systems Theory 4, 3, 257–287.
  43. Schabes, Y. 1990. Mathematical and computational aspects of lexicalized grammars. PhD thesis, University of Pennsylvania.
  44. Seki, H., Matsumura, T., Fujii, M., and Kasami, T. 1991. On multiple context-free grammars. Theoretical Computer Science 88, 191–229.
  45. Shieber, S.M. and Schabes, Y. 1990. Synchronous tree-adjoining grammars. Proc. 13th CoLing, ACL, 253–258.
  46. Skut, W., Krenn, B., Brants, T., and Uszkoreit, H. 1997. An annotation scheme for free word order languages. Fifth Conference on Applied Natural Language Processing, 88–95. [doi]
  47. Marneffe, M.-C. de and Manning, C.D. 2008. Stanford typed dependencies manual. Stanford University. [url]
  48. Bar-Hillel, Y., Perles, M., and Shamir, E. 1961. On formal properties of simple phrase structure grammars. Z. Phonetik. Sprach. Komm. 14, 143–172.
  49. Vijay-Shanker, K., Weir, D.J., and Joshi, A.K. 1987. Characterizing structural descriptions produced by various grammatical formalisms. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 104–111. [doi]

Contact

  • Prof. Dr.-Ing. habil. Dr. h.c./Univ. Szeged
    Heiko Vogler
    Phone: +49 (0) 351 463-38232
  • Dr.-Ing. Kilian Gebhardt
    Phone: +49 (0) 351 463-38237
Last updated: 29.06.2018, 08:50