The Programming Historian 2: Getting Started with Topic Modeling and MALLET

In this lesson you will first learn what topic modeling is and why you might want to employ it in your research. You will then learn how to install and work with the MALLET natural language processing toolkit to do so. Installing MALLET involves modifying an environment variable (essentially, setting up a shortcut so that your computer always knows where to find the MALLET program) and working with the command line (i.e., typing in commands manually rather than clicking on icons or menus). We will run the topic modeller on some example files and look at the kinds of output that MALLET produces. This will give us a good idea of how it can be used on a corpus of texts to identify topics found in the documents without reading them individually.

Material Type: Diagram/Illustration

Authors: Scott Weingart, Ian Milligan, and Shawn Graham

The Programming Historian 2: Output Keywords in Context in HTML File

This lesson builds on Keywords in Context (Using N-grams), where n-grams were extracted from a text. Here, you will learn how to output all of the n-grams of a given keyword in a document downloaded from the Internet, and display them clearly in your browser window.

Material Type: Diagram/Illustration

Authors: William J. Turkel and Adam Crymble

The Programming Historian 2: Keywords in Context (Using n-grams)

As in Output Data as HTML File, this lesson takes the frequency pairs collected in Counting Frequencies and outputs them in HTML. This time the focus is on keywords in context (KWIC), which are built from n-grams extracted from the original document content, in this case a trial transcript from the Old Bailey Online. You can use your program to select a keyword, and the computer will output all instances of that keyword along with the words to the left and right of it, making it easy to see at a glance how the keyword is used. Once the KWICs have been created, they are wrapped in HTML and sent to the browser, where they can be viewed. This reinforces what was learned in Output Data as HTML File, opting for a slightly different output. At the end of this lesson, you will be able to extract all possible n-grams from the text. In the next lesson, you will learn how to output all of the n-grams of a given keyword in a document downloaded from the Internet and display them clearly in your browser window.

Material Type: Diagram/Illustration

Authors: William J. Turkel and Adam Crymble

Schritt-für-Schritt zu eigenen Regulären Ausdrücken: Ein Einführungskurs

Three regex tutorials on one homepage? It makes you scratch your head: "What is the point?" OK, two of the tutorials are really the same: a German version and its English translation. Both were written for use with TB!, the mail client. But since the question of a tutorial without a focus on mail came up, I sat down and reworked the original TB version. It now has somewhat fewer mail-related examples and carries no TB! baggage. This version is also maintained, at least to some extent.

Material Type: Reading

Author: Gerd Ewald

Grammix

The Grammix VM is a virtual machine (VM) that can be started with VirtualBox. It contains a complete grammar development system (the TRALE system) and example grammars that correspond to the respective chapters of the book Einführung in die Head-Driven Phrase Structure Grammar. It also contains the Babel system and grammars for Chinese, Maltese, and German, which share a common core and use Minimal Recursion Semantics as their semantic representation.

Material Type: Interactive

Author: Sektion Computerlinguistik der Deutschen Gesellschaft für Sprachwissenschaft

SWI-Prolog documentation (with Tutorials)

SWI-Prolog documentation

Material Type: Lecture Notes, Unit of Study

Einführung in die Korpuslinguistik: Praktische Grundlagen und Werkzeuge

Introduction to corpus linguistics: corpus types, compilation, annotation, query systems. Web as corpus: what are the opportunities and risks of using the Internet as a linguistic corpus? Revised: DeReKo/COSMAS II: the German Reference Corpus DeReKo of the Institut für Deutsche Sprache (IDS) is one of the most important corpora of the German language; introduction to working with COSMAS II. Further corpora: short introductions to other important German-language corpora. Your own corpus: help and tips for building your own corpus, now with a module on automatic part-of-speech (POS) tagging. Corpus Workbench: introduction to the IMS Open Corpus Workbench and CQPweb for managing existing and self-built annotated corpora. Applications: examples of working with corpora. Statistics: statistics for corpus analysis. Visualization: introduction to the possibilities for visualizing language data. Appendix: information on corpus-linguistic software, short introductions to basic Unix commands and regular expressions, as well as references and a glossary.

Material Type: Unit of Study

Author: Noah Bubenhofer

Vorlesung: Einführung in die Computerlinguistik

The course gives an overview of the goals and methods of computational linguistics. Various applications and potential applications are presented, and the course shows where the challenges and problems in language processing lie and from which angles they can be approached. Machine translation requires components that analyze the structure of words (a morphological component) and components that analyze the structure of sentences (a syntactic component), and the meaning of a sentence must be determined in order to translate it adequately. The concept of finite automata is explained, and the course shows how such automata can be used for morphological analysis. It also shows how syntactic regularities can be formalized and how the corresponding grammars can be processed. Meaning representations can either be built up in parallel with the construction of the syntactic structures or be computed in a component that follows the syntactic analysis.

Material Type: Unit of Study

Authors: Institut für Deutsche und Niederländische Philologie; Deutsche Grammatik, Prof. Dr. Stefan Müller

Vorlesung/Hauptseminar: Computationelle Semantik

The course introduces computational semantics. It shows how linguistic expressions can be assigned a meaning that is derived compositionally from the meanings of their parts. The following topics are discussed: first-order logic; lambda calculus; scope ambiguities and underspecified representations; propositional inference; first-order inference and unification; Discourse Representation Theory; presupposition.

Material Type: Unit of Study

Author: Stefan Müller

Corpus Linguistics: Method, theory and practice

Corpus Linguistics: Method, theory and practice is a new textbook introducing corpus linguistics, published by Cambridge University Press, and written by Tony McEnery and Andrew Hardie.

Material Type: Reading

Authors: Tony McEnery and Andrew Hardie

Head-Driven Phrase Structure Grammar (HPSG) für das Deutsche

In this course a model of the German language is developed. The lecture gives an introduction to the essential concepts: representation of valence information and semantic information, grammar rules, and lexical rules.

Material Type: Lesson Plan, Unit of Study

Authors: Institut für Deutsche und Niederländische Philologie; Deutsche Grammatik, Prof. Dr. Stefan Müller

Natural Language Processing and Rule-based Information Extraction with UIMA 3rd UIMA@GSCL Workshop

This is the material from the UIMA tutorial held in conjunction with the 3rd UIMA@GSCL workshop.

Material Type: Reading

Authors: Averbis, TU Darmstadt, Uni Würzburg

TeLeMaCo – The Linguistics Teaching Resources Hub – CLARIN

TeLeMaCo – The Linguistics Teaching Resources Hub – CLARIN

Material Type: Activity/Lab, Unit of Study

Author: Angewandte Sprachwissenschaft sowie Übersetzen und Dolmetschen

Methods Commons

Computation has produced new and exciting ways of studying texts. Many of these methods do not require the use of expensive programs or detailed programming knowledge, but only the know-how to combine freely accessible resources to perform various tasks. This site describes common or interesting sequences of actions, or recipes. They are organized according to the objective of the recipe. Recipes fall into the three major categories of location and identification of ideas, themes or specific terms; analysis of textual devices or themes; or the construction of new entities or corpora. The Methods Commons community benefits from shared experience and learning how others make use of recipes. You can share your experience by adding your own recipes to the collection. More information about recipe and exercise structure and authoring is available on the Recipe Structure page. We also have a Glossary that we hope you will add to.

Material Type: Unit of Study

Authors: Stéfan Sinclair and Geoffrey Rockwell

Digitize Me, Visualize Me, Search Me

Digitize Me, Visualize Me, Search Me takes as its starting point the so-called "computational turn" to data-intensive scholarship in the humanities. It endeavours to show that such data-focused transformations in research can be seen as part of a major alteration in the status and nature of knowledge. According to the philosopher Jean-François Lyotard, this alteration has been taking place since at least the 1950s, and it involves nothing less than a shift away from a concern with questions of what is right and just, and toward a concern with legitimating power by optimizing the social system's performance in instrumental, functional terms. This shift has significant consequences for our idea of knowledge.

Material Type: Lecture, Reading, Textbook

Author: Gary Hall

Digitale Textedition mit TEI

The tutorial Digitale Textedition mit TEI consists of a series of chapters that build on one another and introduce the encoding and editing of texts according to the Guidelines of the Text Encoding Initiative (TEI). The tutorial is designed for use in teaching but can also be used for self-study. Each chapter covers a particular aspect of the topic and consists of three elements: first, a set of slides for an introductory presentation covering the most important concepts and elements of TEI; second, one or more worksheets for practicing what has been learned; and third, the various materials needed to work on the exercises, such as digital facsimiles, XML files, and more.

Material Type: Activity/Lab, Full Course, Lecture Notes, Primary Source

Author: Christof Schöch

Doing Digital Humanities - A DARIAH Bibliography

This is the collaborative group library of the DARIAH-DE project. It contains several collections of bibliographic items relevant to specific aspects of the Digital Humanities. Please feel free to contact us with any questions, comments or suggestions! Items in the collection are tagged using a closed vocabulary of activities (what research activity is being treated) and objects (to what research objects is it being applied), following the TaDiRAH taxonomy.

Material Type: Lecture Notes, Primary Source, Reading

Paläographie Online

Here you can learn to read old scripts and trace how our writing has developed since antiquity.

Material Type: Lecture Notes, Unit of Study

MACHINE TRANSLATION: An Introductory Guide

Here you can access PostScript/PDF and HTML versions of D.J. Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys and Louisa Sadler, Machine Translation: an Introductory Guide, Blackwells-NCC, London, 1994, ISBN: 1855542-17x.

Material Type: Reading

Author: Doug Arnold et al.

DARIAH-DE-Videotutorials: Collection Registry

The Collection Registry is a simple web application that brings together information about research data collections relevant to research in the humanities. The term "collection" refers to a wide variety of entities such as books, charters, texts, files, images, or statues. A collection description in the Collection Registry contains general information such as the location and access points of the collection. The tutorials show how to get started and how to work with the Collection Registry.

Material Type: Lecture

Author: Philip Dürholt on behalf of DARIAH-DE

DARIAH

All resources in DARIAH