System for Processing Formation Patterns and Restrictions (PPR)

PPR (version 19.2) is a sample implementation in SetlX of the Pattern-and-Restriction Theory of word formation (PR) (Nolda 2012 a, 2018 a). It currently provides selected word-formation patterns and a very limited lexicon for spoken and written German systems. PPR’s primary use is as a testbed in which grammar writers can check the soundness of their theoretical and empirical hypotheses. PPR is by no means a production-scale system.

DWDSmor

DWDSmor is a toolbox for creating and applying a set of finite-state automata for morphological analysis and generation in written German. The automata are compiled from an SMOR-style grammar in SFST format and a lexicon which is derived at build time from XML sources of the online dictionary “Digitales Wörterbuch der deutschen Sprache” (DWDS). The compiled automata can be called from two supplied Python scripts for analysing tokenised corpus data or for generating inflectional paradigms.

EXMARaLDA’s Dulko tools

The Dulko tools of the EXMARaLDA Partitur-Editor provide transformation scenarios (actually, XSLT 2.0 stylesheets) for the annotation of data in learner corpora and beyond. They support tokenisation, part-of-speech tagging, lemmatisation, sentence-span computation, editing of target hypotheses, detection of differences between target hypotheses and the learner text, error analysis, and metadata management (Hirschmann and Nolda 2019, Nolda 2019 b).

Prior to release version 1.7 of the EXMARaLDA Partitur-Editor, the Dulko toolset was developed separately from EXMARaLDA mainline under the name of “EXMARaLDA (Dulko)” for the Dulko learner-corpus project at the University of Szeged.

For this work, I was awarded the Innovation Prize 2018 in the engineering category from the University of Szeged.

makeDulko

makeDulko (version 1.1) is a build system for generating ANNIS data from EXMARaLDA sources annotated with the EXMARaLDA (Dulko) tools.

XGrep

The Python 3 script xgrep.py (version 2.12) searches XML files for patterns specified in terms of XPath 1.0 expressions. Its options mimic the behaviour of GNU grep.
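
The kind of search that xgrep.py performs can be illustrated with a minimal sketch. This is not the script's actual code: it relies on the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath rather than full XPath 1.0, and the function name xml_grep as well as the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_grep(xml_text, path):
    """Return the serialised elements that match an ElementTree path.

    Hypothetical helper for illustration; ElementTree understands only
    a subset of XPath, not the full XPath 1.0 language.
    """
    root = ET.fromstring(xml_text)
    return [
        ET.tostring(elem, encoding="unicode").strip()
        for elem in root.findall(path)
    ]


# Invented sample document: three tokens with part-of-speech attributes.
sample = (
    "<text>"
    "<w pos='NN'>Haus</w>"
    "<w pos='VV'>gehen</w>"
    "<w pos='NN'>Baum</w>"
    "</text>"
)

# Print every <w> element whose pos attribute equals "NN".
for match in xml_grep(sample, ".//w[@pos='NN']"):
    print(match)
```

A tool that needs full XPath 1.0 would have to use a library with a complete XPath engine instead of ElementTree.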

XDiff

The Python 3 script xdiff.py (version 2.4) compares XML files for structural or textual differences; differences in attribute order or whitespace formatting are ignored. Its output mimics the unified format of GNU diff.
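
The idea of ignoring attribute order and whitespace formatting can be sketched as follows. This is an illustration under simplifying assumptions, not the code of xdiff.py: the helper names normalized_lines and xml_diff are hypothetical, each element is reduced to one line with sorted attributes and stripped text, and text following a child element (tail text) is ignored for brevity.

```python
import difflib
import xml.etree.ElementTree as ET


def normalized_lines(xml_text):
    """Render one line per element, with sorted attributes and stripped text."""
    lines = []

    def walk(elem, depth):
        # Sorting the attributes makes their original order irrelevant.
        attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
        # Stripping the text removes indentation and line breaks.
        text = (elem.text or "").strip()
        lines.append(f"{'  ' * depth}<{elem.tag}{attrs}> {text}".rstrip())
        for child in elem:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return lines


def xml_diff(old_xml, new_xml):
    """Return a unified diff of the normalised forms of two XML documents."""
    return list(
        difflib.unified_diff(
            normalized_lines(old_xml),
            normalized_lines(new_xml),
            fromfile="old.xml",
            tofile="new.xml",
            lineterm="",
        )
    )


# Invented sample documents.
a = '<doc><p id="1" lang="de">Haus</p></doc>'
b = '<doc>\n  <p lang="de" id="1">Haus</p>\n</doc>'  # same content, other layout
c = '<doc><p id="1" lang="de">Baum</p></doc>'       # different text

print(xml_diff(a, b))  # no differences are reported
print("\n".join(xml_diff(a, c)))
```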

PSGML-Utils

PSGML-Utils (version 2.1) is a set of extensions for Emacs’ PSGML mode. They provide additional editing functions, functions for running validation and transformation scenarios, as well as an XML mode derived from PSGML’s SGML mode.

TEI2X

TEI2X (version 2.16) provides XSLT 1.0 stylesheets for generating LaTeX, DOCX, and HTML files from legacy TEI P4 source files, written in a customised version of P4 with some P5 additions. The stylesheets are geared towards ‘born-digital’ documents, in particular technical documents in linguistics and other scientific fields.

TEIP4to5

The XSLT 1.0 stylesheet teip4to5.xsl (version 1.4) converts legacy TEI P4 documents (such as the sample TEI files in TEI2X) to TEI P5 documents.

Overlays

The overlays package (version 2.12) for LaTeX makes it possible to write presentations with incremental slides. It does not presuppose any specific document class. Rather, it is a lightweight alternative to full-fledged presentation classes like beamer.

Tagpair

The tagpair package (version 1.1) for LaTeX provides environments and commands for pairing lines, bottom lines, and tagged lines, intended in particular for word-by-word glosses, translations, and bibliographic attributions, respectively.

Hang

The hang package (version 2.1) for LaTeX provides environments for hanging paragraphs and list items. In addition, it defines environments for labeled paragraphs and list items.

Lingua Franca

The Lingua Franca OpenType and Web Open fonts (version 1.20) are a modified version of the Heuristica font family, which in turn is based on the Utopia Type 1 fonts, designed by Robert Slimbach for Adobe and licensed to the TeX Users Group (TUG) for free modification and redistribution. The Lingua Franca fonts are particularly useful for documents in linguistics. The regular typeface includes all characters of the Unicode IPA extensions as well as many spacing or combining diacritics. In addition, the typefaces support various typographic features such as ligatures and proportional figures; a stylistic set provides longer slashes that match the parentheses in height and depth.

Goal Column

The Goal Column macro bundle (version 1.0) for jEdit is inspired by Emacs’ set-goal-column function.

MAgenda

The Python 3 script magenda.py (version 2.0) creates an agenda of task-list items in GitHub Flavored Markdown files.
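
How task-list items can be collected from a Markdown file is sketched below. This is an assumption-laden illustration, not the logic of magenda.py: the regular expression covers only single-line items with the list markers -, *, + or a number followed by . or ), it does not skip fenced code blocks, and the names TASK_RE and collect_tasks are hypothetical.

```python
import re

# A list marker, then "[ ]" (open) or "[x]"/"[X]" (done), then the task text.
TASK_RE = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+\[([ xX])\]\s+(.*)$")


def collect_tasks(markdown_text):
    """Return (line number, done flag, task text) for each task-list item."""
    tasks = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = TASK_RE.match(line)
        if match:
            done = match.group(1) != " "
            tasks.append((number, done, match.group(2).strip()))
    return tasks


# Invented sample file content.
notes = """# Plan
- [ ] write report
- [x] send mail
* plain item
1. [ ] review draft
"""

# Print the open tasks only, as a simple agenda.
for number, done, text in collect_tasks(notes):
    if not done:
        print(f"line {number}: {text}")
```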

Latin Square

The Bash script latin-square (version 1.1) prints lines from a file arranged in a Latin square. It is intended for distributing experimental items over groups of subjects in Latin-square form.
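
For readers unfamiliar with the design, a Latin square can be built by cyclic rotation, as in the following sketch. It is written in Python rather than Bash and does not reproduce the script's actual algorithm or output format, which are not described here; the function name latin_square and the sample conditions are hypothetical.

```python
def latin_square(items):
    """Build a cyclic Latin square: row i is the item list rotated by i."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Invented example: three experimental conditions for three subject groups.
conditions = ["A", "B", "C"]
square = latin_square(conditions)

# Each row is one group's order; every condition occurs once per row and column.
for group, order in enumerate(square, start=1):
    print(f"group {group}: {' '.join(order)}")
```

In such a square, every condition appears exactly once in each position across the groups, which is the property that makes the design useful for counterbalancing.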