Seminar: Text Modelling


Text modelling is the basis for many applications in the fields of natural language processing and information retrieval. This seminar provides an introduction to a handful of basic methods for text modelling.

This seminar will be held in English.

For more information, please register for the Learnweb course once it is available.


  1. Research a given topic with an academic publication as a starting point
    • Depending on the number of participants, alone or in teams
    • Literature search for related work or further developments
  2. Present two talks, each around 30 minutes (plus discussion)
    • First talk on basics and first publication
    • Second talk on a further development based on one or two papers (from the literature search)
  3. Compile an individual written report
    • Around 8 pages in the IJCAI format (double column), excluding references
    • Description of the concepts from the talks
    • Including the results of the literature search
  4. Attend all presentations and participate in the discussions


  1. Eigenvalue-based representation:
    Drikvandi & Lawal (2020): Sparse Principal Component Analysis for Natural Language Processing

  2. Probabilistic modelling:
    Blei et al. (2003): Latent Dirichlet Allocation

  3. Word embeddings:
    Mikolov et al. (2013): Distributed Representations of Words and Phrases and Their Compositionality

  4. Language model:
    Devlin et al. (2019): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

The corresponding source articles can be found by searching for the titles and authors mentioned above. You may need to be connected to the WWU network to access them.