Search Results (26)

Selected filters:
  • Center for History and New Media
The Programming Historian 2: From HTML to List of Words (part 1)
Conditions of Use:
No Strings Attached

In this two-part lesson, we will build on what you’ve learned about Working with Webpages, learning how to remove the HTML markup from the webpage of Benjamin Bowsey’s 1780 criminal trial transcript. We will achieve this by using a variety of string operators, string methods and close reading skills. We introduce looping and branching so that programs can repeat tasks and test for certain conditions, making it possible to separate the content from the HTML tags. Finally, we convert content from a long string to a list of words that can later be sorted, indexed, and counted.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
William J. Turkel and Adam Crymble
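
The approach described in the entry above (loop over the page, branch on whether you are inside a tag, then split the remaining text into words) might look roughly like the following. This is a minimal sketch, not the lesson's own code: the function name strip_tags and the sample string are hypothetical, and it assumes that every "<" opens a tag and every ">" closes one.

```python
def strip_tags(page_contents):
    """Return page_contents with everything between '<' and '>' removed."""
    inside_tag = False
    text = ""
    for char in page_contents:          # looping: visit every character
        if char == "<":                 # branching: a tag starts here
            inside_tag = True
        elif char == ">":               # the tag ends here
            inside_tag = False
        elif not inside_tag:            # keep only characters outside tags
            text += char
    return text


# Hypothetical sample markup, not the actual trial transcript.
sample = "<html><body><p>Hello <b>old</b> world</p></body></html>"

# Convert the long string into a list of words.
words = strip_tags(sample).split()
print(words)
```

A character-by-character scan like this is easy to follow but fragile on real pages (for example, a literal "<" inside a script or comment), which is one reason a parser library is often preferred for production work.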
The Programming Historian 2: Keywords in Context (Using n-grams)
Conditions of Use:
No Strings Attached

As in Output Data as HTML File, this lesson takes the frequency pairs collected in Counting Frequencies and outputs them in HTML. This time the focus is on keywords in context (KWIC), which are built by creating n-grams from the original document content, in this case a trial transcript from the Old Bailey Online. You can use your program to select a keyword and the computer will output all instances of that keyword, along with the words to the left and right of it, making it easy to see at a glance how the keyword is used.

Once the KWICs have been created, they are then wrapped in HTML and sent to the browser where they can be viewed. This reinforces what was learned in Output Data as HTML File, opting for a slightly different output.

At the end of this lesson, you will be able to extract all possible n-grams from the text. In the next lesson, you will learn how to output all of the n-grams of a given keyword in a document downloaded from the Internet, and display them clearly in your browser window.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
William J. Turkel and Adam Crymble
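
To make the n-gram idea in the entry above concrete, here is a small sketch. The function names get_ngrams and keyword_in_context, the window parameter, and the sample sentence are all illustrative assumptions rather than the lesson's own code. It builds every n-gram from a word list, then keeps the ones whose middle word is the keyword.

```python
def get_ngrams(words, n):
    """Return every run of n consecutive words as a list of lists."""
    return [words[i:i + n] for i in range(len(words) - n + 1)]


def keyword_in_context(words, keyword, window=2):
    """Return n-grams that have `keyword` in the centre, with `window`
    words of context on each side."""
    n = 2 * window + 1
    return [gram for gram in get_ngrams(words, n) if gram[window] == keyword]


# Hypothetical sample text standing in for a trial transcript.
words = "it was the best of times it was the worst of times".split()

for gram in keyword_in_context(words, "the"):
    print(" ".join(gram))
```

Because the keyword must sit in the centre of a full n-gram, occurrences within `window` words of the start or end of the text are skipped in this sketch.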
The Programming Historian 2: Creating New Vector Layers in QGIS 2.0
Conditions of Use:
No Strings Attached

In this lesson you will learn how to create vector layers based on scanned historical maps. In Intro to Google Maps and Google Earth you used vector layers and created attributes in Google Earth. We will be doing the same thing in this lesson, albeit at a more advanced level, using QGIS software.

Subject:
Arts and Humanities
History
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Niche Canada
Roy Rosenzweig
The Programming Historian 2: Creating an Omeka.net Exhibit
Conditions of Use:
No Strings Attached

In the lesson Up and Running with Omeka.net, you added items to your Omeka.net site and grouped them into collections. Now you are ready for the next step: taking your users on a guided tour through the items you have collected.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Miriam Posner
The Programming Historian 2: Working with Text Files
Conditions of Use:
No Strings Attached

In this lesson you will learn how to manipulate text files using Python. This includes opening, closing, reading from, and writing to .txt files.

The next few lessons will involve downloading a web page from the Internet and reorganizing the contents into useful chunks of information. You will be doing most of your work using Python code written and executed in Komodo Edit.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
William J. Turkel and Adam Crymble
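
The file operations named in the entry above (opening, writing, reading, closing) can be sketched in a few lines. The file name helloworld.txt and the use of a temporary directory are assumptions made for this example only; the with statement closes each file automatically.

```python
import os
import tempfile

# Hypothetical location: a fresh temporary directory, so nothing is overwritten.
path = os.path.join(tempfile.mkdtemp(), "helloworld.txt")

# Open for writing; the file is closed when the with block ends.
with open(path, "w", encoding="utf-8") as f:
    f.write("hello world")

# Open the same file for reading and load its whole contents into a string.
with open(path, "r", encoding="utf-8") as f:
    message = f.read()

print(message)
```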
The Programming Historian 2: Creating and Viewing HTML Files with Python
Conditions of Use:
No Strings Attached

This lesson uses Python to create and view an HTML file. If you write programs that output HTML, you can use any browser to look at your results. This is especially convenient if your program is automatically creating hyperlinks or graphic entities like charts and diagrams.

Here you will learn how to create HTML files with Python scripts, and how to use Python to automatically open an HTML file in Firefox.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
William J. Turkel and Adam Crymble
The Programming Historian 2: Output Keywords in Context in HTML File
Conditions of Use:
No Strings Attached

This lesson builds on Keywords in Context (Using N-grams), where n-grams were extracted from a text. Here, you will learn how to output all of the n-grams of a given keyword in a document downloaded from the Internet, and display them clearly in your browser window.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
William J. Turkel and Adam Crymble
The Programming Historian 2: Code Reuse and Modularity
Conditions of Use:
No Strings Attached

Computer programs can become long, unwieldy and confusing without special mechanisms for managing complexity. This lesson will show you how to reuse parts of your code by writing functions, and how to break your programs into modules, in order to keep everything concise and easier to debug. Being able to remove a single dysfunctional module can save time and effort.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
William J. Turkel and Adam Crymble
The Programming Historian 2: Transliterating non-ASCII characters with Python
Conditions of Use:
No Strings Attached

This lesson shows how to use Python to automatically transliterate a list of words from a language with a non-Latin alphabet to a standardized format using American Standard Code for Information Interchange (ASCII) characters. It builds on readers’ understanding of Python from the lessons “Viewing HTML Files,” “Working with Web Pages,” “From HTML to List of Words (part 1)” and “Intro to Beautiful Soup.” At the end of the lesson, we will use the transliteration dictionary to convert the names from a database of the Russian organization Memorial from Cyrillic into Latin characters. Although the example uses Cyrillic characters, the technique can be reproduced with other alphabets using Unicode.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Seth Bernstein
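
A dictionary-based transliteration of the kind the entry above describes could be sketched as follows. The mapping here is a deliberately tiny, hypothetical subset chosen for one sample word; it is not the lesson's transliteration dictionary and does not follow any particular romanization standard.

```python
# Hypothetical partial mapping: just enough Cyrillic letters for the sample.
CYRILLIC_TO_LATIN = {
    "М": "M",
    "о": "o",
    "с": "s",
    "к": "k",
    "в": "v",
    "а": "a",
}


def transliterate(text, table=CYRILLIC_TO_LATIN):
    """Replace each character found in `table`; leave all others unchanged."""
    return "".join(table.get(char, char) for char in text)


print(transliterate("Москва"))
```

Characters missing from the table pass through untouched, so gaps in the mapping show up in the output as leftover non-ASCII characters rather than raising errors.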
The Programming Historian 2: Georeferencing in QGIS 2.0
Conditions of Use:
No Strings Attached

In this lesson, you will learn how to georeference historical maps so that they may be added to a GIS as a raster layer. Georeferencing is required for anyone who wants to accurately digitize data found on a paper map, and since historians work mostly in the realm of paper, georeferencing is one of our most commonly used tools. The technique uses a series of control points to give a two-dimensional object like a paper map the real world coordinates it needs to align with the three-dimensional features of the earth in GIS software (in Intro to Google Maps and Google Earth we saw an ‘overlay’ which is a Google Earth shortcut version of georeferencing).

Subject:
Arts and Humanities
History
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Niche Canada
Roy Rosenzweig
The Programming Historian 2: Cleaning Data with OpenRefine
Conditions of Use:
No Strings Attached

Don’t take your data at face value. That is the key message of this tutorial, which focuses on how scholars can diagnose and act upon the accuracy of data. In this lesson, you will learn the principles and practice of data cleaning, as well as how OpenRefine can be used to perform four essential tasks that will help you to clean your data:
1. Remove duplicate records
2. Separate multiple values contained in the same field
3. Analyse the distribution of values throughout a data set
4. Group together different representations of the same reality

These steps are illustrated with the help of a series of exercises based on a collection of metadata from the Powerhouse museum, demonstrating how (semi-)automated methods can help you correct the errors in your data.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Ruben Verborgh and Max De Wilde
Seth van Hooland
The Programming Historian 2: Intro to Beautiful Soup
Conditions of Use:
No Strings Attached

This tutorial assumes basic knowledge of HTML, CSS, and the Document Object Model. It also assumes some knowledge of Python. For a more basic introduction to Python, see Working with Text Files.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Jeri Wieringa
The Programming Historian 2: Viewing HTML Files
Conditions of Use:
No Strings Attached

When you are working with online sources, much of the time you will be using files that have been marked up with HTML (Hyper Text Markup Language). Your browser already knows how to interpret HTML, which is handy for human readers. Most browsers also let you see the HTML source code for any page that you visit. The two images below show a typical web page (from the Old Bailey Online) and the HTML source used to generate that page, which you can see with the Tools -> Web Developer -> Page Source command in Firefox.

Subject:
Arts and Humanities
History
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Adam Crymble
William J. Turkel
The Programming Historian 2: Applied Archival Downloading with Wget
Conditions of Use:
No Strings Attached

Now that you have learned how Wget can be used to mirror or download specific files from websites like ActiveHistory.ca via the command line, it’s time to expand your web-scraping skills through a few more lessons that focus on other uses for Wget’s recursive retrieval function. The following tutorial provides three examples of how Wget can be used to download large collections of documents from archival websites with assistance from the Python programming language. It will teach you how to parse and generate a list of URLs using a simple Python script, and will also introduce you to a few of Wget’s other useful features. Similar functions to the ones demonstrated in this lesson can be achieved using curl, an open-source tool capable of performing automated downloads from the command line. For this lesson, however, we will focus on Wget and building your Python skills.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Kellen Kurschinski
The Programming Historian 2: Python Introduction and Installation
Conditions of Use:
No Strings Attached

Downloading a single record from a website is easy, but downloading many records at a time – an increasingly frequent need for a historian – is much more efficient using a programming language such as Python. In this lesson, we will write a program that will download a series of records from the Old Bailey Online using custom search criteria, and save them to a directory on our computer. This process involves interpreting and manipulating URL Query Strings. In this case, the tutorial will seek to download sources that contain references to people of African descent that were published in the Old Bailey Proceedings between 1700 and 1750.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Adam Crymble
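
The description above mentions interpreting and manipulating URL query strings. As a rough sketch of what that involves, the snippet below builds a query string from a dictionary and then parses it back using Python's standard urllib.parse module. The base URL and the parameter names (fromYear, toYear, start) are placeholders invented for this example, not the Old Bailey Online's actual search interface.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint and parameter names, for illustration only.
base_url = "https://example.org/search"
params = {"fromYear": 1700, "toYear": 1750, "start": 0}

# Manipulating: build the query string from key/value pairs.
url = base_url + "?" + urlencode(params)
print(url)

# Interpreting: split the URL and read the pairs back out.
# parse_qs returns each value as a list of strings.
parsed = parse_qs(urlsplit(url).query)
print(parsed["fromYear"])
```

Changing a single value such as start and rebuilding the URL is the usual way to step through pages of search results in a loop.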
The Programming Historian 2: Understanding Regular Expressions
Conditions of Use:
No Strings Attached

In this exercise we will use advanced find-and-replace capabilities in a word processing application in order to make use of structure in a brief historical document that is essentially a table in the form of prose. Without using a general programming language, we will gain exposure to some aspects of computational thinking, especially pattern matching, that can be immediately helpful to working historians (and others) using word processors, and can form the basis for subsequent learning with more general programming environments.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Homework/Assignment
Provider:
Center for History and New Media
Author:
Doug Knox
The Programming Historian 2: Working With Web Pages
Conditions of Use:
No Strings Attached

This lesson introduces Uniform Resource Locators (URLs) and explains how to use Python to download and save the contents of a web page to your local hard drive.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
William J. Turkel and Adam Crymble
The Programming Historian 2: Getting Started with Topic Modeling and MALLET
Conditions of Use:
No Strings Attached

In this lesson you will first learn what topic modeling is and why you might want to employ it in your research. You will then learn how to install and work with the MALLET natural language processing toolkit to do so. MALLET involves modifying an environment variable (essentially, setting up a short-cut so that your computer always knows where to find the MALLET program) and working with the command line (i.e., by typing in commands manually, rather than clicking on icons or menus). We will run the topic modeller on some example files, and look at the kinds of outputs that MALLET produces. This will give us a good idea of how it can be used on a corpus of texts to identify topics found in the documents without reading them individually.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Scott Weingart and Ian Milligan
Shawn Graham
The Programming Historian 2: Introduction to Google Maps and Google Earth
Conditions of Use:
No Strings Attached

Google Maps and Google Earth provide an easy way to start creating digital maps. With a Google Account you can create and edit personal maps by clicking on My Places. In the new Google Maps interface, click on the gear menu [icon] at the upper right of the menu bar, and select My Places. The new (as of summer 2013) interface provides a new way of creating custom maps: Google Maps Engine Lite allows users to import and add data onto the map to visualize trends.

Subject:
Arts and Humanities
History
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
Niche Canada
Roy Rosenzweig
The Programming Historian 2: Normalizing Data
Conditions of Use:
No Strings Attached

The list that we created in From HTML to a List of Words (part 2) needs some normalizing before it can be used further. We are going to do this by applying additional string methods, as well as by using regular expressions. Once normalized, we will be able to more easily analyze our data.

Subject:
Computer Science
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Author:
William J. Turkel and Adam Crymble
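
The normalizing step described in the entry above combines a string method with a regular expression. One possible sketch, assuming that "normalizing" here means lowercasing each word and stripping non-word characters such as punctuation (the function name normalize and the sample list are hypothetical):

```python
import re


def normalize(words):
    """Lowercase each word, strip non-word characters, drop empty results."""
    cleaned = []
    for word in words:
        word = word.lower()                 # string method
        word = re.sub(r"\W+", "", word)     # regular expression
        if word:                            # skip tokens that were only punctuation
            cleaned.append(word)
    return cleaned


# Hypothetical word list of the kind produced by splitting page text.
print(normalize(["The", "Trial,", "of", "--", "1780."]))
```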