Week 5 – Spatial Analysis

How does digital mapping relate and compare to other digital humanities issues and methods? This lab will build on what you’ve already learned about your college collections, humanities data cleaning, and text analysis by asking you to extract geographic data from a college newspaper and collectively contribute to a map of your findings using open source tools. 

Goals

  • Learn how to extract geographic data from plain text
  • Practice cleaning data to meet the needs of spatial analysis
  • Develop practical knowledge of when and how to use data distribution and story maps
  • Contribute to a small-scale collaborative mapping demonstration project
  • Reflect on the pros and cons of mapping for digital humanities scholarship

Specs

  • Report of approximately 500 words
  • Report addresses the questions in the “Reflection” section below 
  • Report applies ideas from this week’s readings to the questions
  • Author makes their points in clear and concise ways
  • Includes links to or screenshots of contributions to group maps
  • Upload PDF to Canvas

Lab Instructions

There are a number of open source and proprietary mapping tools out there, and which one you use will depend on your goals, your access, your financial resources, and a number of other factors. The aim of this lab is to give you a sense of the steps involved in any digital mapping process by using a suite of easily accessible web-based tools. Follow the steps below as best you can, but focus on the big picture. You will be graded on your reflections on the issues raised by this methodology rather than on your technical expertise with any single step below.

Overview

  1. Obtain a digitized copy of a recent issue of your college newspaper
  2. Extract a geographic dataset of places mentioned in the newspaper issue using https://recogito.pelagios.org 
  3. Clean and organize your data to optimize for mapping
  4. Add your data to a collective point map of places mentioned by each college newspaper using the Leaflet Maps with Google Sheets template
  5. Begin to explore the meaning of your data by adding a chapter about one place mentioned in your issue to a story map using the Leaflet Storymaps with Google Sheets template

Output Maps

These two maps will update with our collective information as you contribute your data to them.


Sources

  • Get the text of a recent college newspaper as your source
  • Go to your college’s newspaper collection and choose one of the most recent student newspaper issues you can find [because there are several students from the same colleges, there may be overlap, which is fine]
  • Download the OCR’d text, if available, or download the PDF.
  • Extract the text from the PDF. The easiest way is to Select All in the PDF, copy, and paste into a new document in a text editor (if you don’t have one, I recommend the free Atom editor).
  • Save as a plain text file and make sure to set the encoding to UTF-8 in the save dialogue or Recogito won’t accept it.
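
If your editor does not offer an encoding option, the re-save step can be sketched in Python. This is a minimal illustration, not part of the lab's required workflow: the function names are hypothetical, and the Windows-1252 fallback is an assumption about how the text was originally saved.

```python
from pathlib import Path


def decode_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to Windows-1252 (an assumed fallback)."""
    try:
        # utf-8-sig also strips a leading byte-order mark if one is present
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


def save_as_utf8(source: str, target: str) -> None:
    """Read `source` in whatever encoding it has and rewrite it to `target` as UTF-8."""
    text = decode_text(Path(source).read_bytes())
    Path(target).write_text(text, encoding="utf-8")


# In-memory check: a curly apostrophe as a Windows editor might have saved it
sample = "college’s newspaper".encode("cp1252")
print(decode_text(sample))
```

To convert a real file you would call something like save_as_utf8("issue.txt", "issue-utf8.txt"), where both file names are placeholders.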

Process

Named Entity Recognition and Geocoding

Our goal is to identify all the places mentioned in your issue (or at least on the first page, if the issue is very long) and add geographic coordinates to them (geocode them). You could manually read for these, add them to a spreadsheet and manually find latitude and longitude coordinates for each, but we are going to speed the process by having an algorithm find named entities and compare them to place names dictionaries (called gazetteers) using a tool called Recogito.
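
To make the idea of a gazetteer lookup concrete, here is a toy sketch. It uses plain string matching, not a statistical recognizer like the Stanford CoreNLP engine Recogito offers, and the three-entry gazetteer with approximate, hand-entered coordinates is invented purely for illustration.

```python
# Toy gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate and entered by hand for illustration only.
GAZETTEER = {
    "Philadelphia": (39.95, -75.17),
    "New York": (40.71, -74.01),
    "Washington": (38.91, -77.04),
}


def geocode_mentions(text, gazetteer):
    """Return (name, lat, lng, offset) for every gazetteer name found in text."""
    hits = []
    for name, (lat, lng) in gazetteer.items():
        start = text.find(name)
        while start != -1:
            hits.append((name, lat, lng, start))
            start = text.find(name, start + len(name))
    # Sort by position so the hits follow reading order
    return sorted(hits, key=lambda hit: hit[3])


sample = "Students traveled from Philadelphia to Washington, then back to Philadelphia."
for name, lat, lng, offset in geocode_mentions(sample, GAZETTEER):
    print(offset, name, lat, lng)
```

Note that "Washington" here could just as well be a state or a person's surname. A naive lookup cannot tell the difference, which is why each match needs to be confirmed by hand.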

  • Sign up for a free account at recogito.pelagios.org.
  • Read through the Quick 10 Minute tutorial paying particular attention to Step 3: Identify and map places
  • Upload the .txt version of your college newspaper you saved above
  • Click the document once, then choose Named Entity Recognition from the Options drop-down menu in the top right corner
  • In the NER dialogue, choose Stanford CoreNLP en to use the English language recognition engine
  • Uncheck all available Authority Files, and then check only the GeoNames gazetteer to search for modern place names in English
  • Click Start NER
  • Once it finishes, double click the document name to open the annotation viewer
  • You should see identified Persons in blue and Places in green
  • Click on each green highlighted word to confirm if the locations are correct and change them if not
  • On hitting OK, if you are prompted to Re-Apply, choose Yes & merge existing annotations to update all references to that place
  • Once you have corrected the locations, switch to the Map View in the top toolbar to view and verify the results
  • You should have 5-10 confirmed locations for this demo, so feel free to stop if you get many more

Data Cleaning

If you were just doing a project for yourself, the Recogito map might be all the exploratory data analysis you need, but we are going to try to combine your individual data into a collaborative project. For that, we need to export our data, and Recogito offers many export formats. Our goal is a clean list of each unique place mentioned, with its latitude and longitude coordinates.

  • Choose Download Options from the top menu
  • Note that you could get GeoJSON or KML files if you are using advanced GIS software, but we are going to download annotations as CSV to get a list of all persons and places identified
  • Practice your OpenRefine skills (or just use Excel) to make the following changes
    • Filter to select only Places
    • Filter to remove any blank lat/long rows
    • Filter for only unique entries (instructions for Excel)
  • Save your CSV
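
If you would rather script these three filters than use OpenRefine or Excel, a minimal Python sketch follows. The column names (TYPE, QUOTE_TRANSCRIPTION, LAT, LNG) are assumptions about the Recogito CSV export, so check them against the header row of your own file. The sample rows are invented.

```python
import csv
import io


def clean_places(rows, type_col="TYPE", name_col="QUOTE_TRANSCRIPTION",
                 lat_col="LAT", lng_col="LNG"):
    """Keep unique PLACE rows that have both coordinates (column names are assumed)."""
    seen = set()
    cleaned = []
    for row in rows:
        # Filter 1: only Places
        if (row.get(type_col) or "").strip().upper() != "PLACE":
            continue
        # Filter 2: drop rows with a blank latitude or longitude
        lat = (row.get(lat_col) or "").strip()
        lng = (row.get(lng_col) or "").strip()
        if not lat or not lng:
            continue
        # Filter 3: keep only the first occurrence of each place
        name = (row.get(name_col) or "").strip()
        key = (name.casefold(), lat, lng)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "lat": lat, "lng": lng})
    return cleaned


# Invented sample standing in for a downloaded export
SAMPLE = """QUOTE_TRANSCRIPTION,TYPE,LAT,LNG
Philadelphia,PLACE,39.95,-75.17
Jane Doe,PERSON,,
Philadelphia,PLACE,39.95,-75.17
Springfield,PLACE,,
Washington,PLACE,38.91,-77.04
"""

for place in clean_places(csv.DictReader(io.StringIO(SAMPLE))):
    print(place["name"], place["lat"], place["lng"])
```

For a real export, replace io.StringIO(SAMPLE) with a file opened using encoding="utf-8" and newline="".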

Presentation

Point map: compare data trends

I have set up a template map to compare the places mentioned in our newspapers in order to perform some data distribution analysis. You add data to this map by contributing to the Google Sheet below and adding attribute values so that your points share your college’s color and contain an informational pop-up.

DATA ENTRY GOOGLE SPREADSHEET

LIVE INTERACTIVE MAP (click to open full screen)

Instructions: Tutorial with much more information 

  • Import data into the map by going to the POINTS tab in the DATA ENTRY sheet linked above
  • Copy the list of Places from your cleaned Recogito export into the Names column
  • Copy the corresponding Lat and Long columns into their equivalents
  • Click on the LIVE INTERACTIVE MAP link above to refresh and see your data points!
  • Now you’ll add additional meaning through symbology to make patterns more easily recognizable.
  • Set the following values on the first record, and copy/paste them to fill down all your rows
    • Group = Location (so you can toggle off the two layers)
    • Marker Color = [the value for the college whose newspaper you used] (so you can see at a glance which points are from which college)
    • Description = Your newspaper title and date of issue (so point clicks will show the source of the information)
  • What patterns do you see? Any unexpected clusters or outliers?
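
The fill-down step above can also be sketched in Python, producing tab-separated text that pastes into a spreadsheet. The column headers, the marker color "blue", and the newspaper title are placeholders. Match them to the actual POINTS tab and to your own college and issue.

```python
import csv
import io

# Assumed column headers; check them against the POINTS tab before pasting
COLUMNS = ["Name", "Latitude", "Longitude", "Group", "Marker Color", "Description"]


def build_point_rows(places, marker_color, description, group="Location"):
    """Attach the same Group, Marker Color, and Description to every place."""
    rows = []
    for name, lat, lng in places:
        rows.append({
            "Name": name,
            "Latitude": lat,
            "Longitude": lng,
            "Group": group,
            "Marker Color": marker_color,
            "Description": description,
        })
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented places and a hypothetical newspaper title and date
places = [("Philadelphia", "39.95", "-75.17"), ("Washington", "38.91", "-77.04")]
print(to_tsv(build_point_rows(places, "blue", "The Campus News, 1 April 2021")))
```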

Story map: make arguments and make meaning

I have also set up a template story map to dig into some of the meaning behind these places’ inclusion in the campus newspapers. As above, you add data to this map by contributing to a separate Google Sheet, linked below.

DATA ENTRY GOOGLE SPREADSHEET FOR STORYMAP

LIVE INTERACTIVE STORYMAP 

Instructions: Tutorial with much more information 

  • Choose one location mentioned in your issue that is most interesting to you
  • Import data into the Leaflet Storymaps with Google Sheets map by adding a new row with the following information
    • Required
      • Chapter: The title of your story
      • Description: A brief (<500 characters) summary of the reason the place was mentioned in the newspaper
      • Location: Human readable place name
      • Lat/Long: Computer mappable coordinates
    • Optional
      • Media/Media Credit/Media Credit Link: Visual content to enhance your chapter
      • Play around with zoom levels and other options to consider what makes the most sense for your story
  • What types of projects would this style of map lend itself to?
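
Before pasting a chapter into the shared sheet, you could check it against the required fields listed above. The sketch below is a hypothetical helper, not part of the template: the field names (with Lat/Long split into Latitude and Longitude) are assumptions, and the sample chapter is invented.

```python
# Assumed field names; the actual sheet headers may differ
REQUIRED_FIELDS = ("Chapter", "Description", "Location", "Latitude", "Longitude")
MAX_DESCRIPTION_CHARS = 500  # the lab asks for fewer than 500 characters


def check_chapter(row):
    """Return a list of problems found in one story map row (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(row.get(field, "")).strip():
            problems.append(f"missing {field}")
    if len(str(row.get("Description", ""))) >= MAX_DESCRIPTION_CHARS:
        problems.append("Description must be under 500 characters")
    # Latitude runs from -90 to 90, longitude from -180 to 180
    for field, limit in (("Latitude", 90.0), ("Longitude", 180.0)):
        value = str(row.get(field, "")).strip()
        if not value:
            continue  # already reported as missing
        try:
            number = float(value)
        except ValueError:
            problems.append(f"{field} is not a number")
            continue
        if abs(number) > limit:
            problems.append(f"{field} is out of range")
    return problems


# Invented example chapter
chapter = {
    "Chapter": "A trip to the capital",
    "Description": "Students attended a march in Washington.",
    "Location": "Washington",
    "Latitude": "38.91",
    "Longitude": "-77.04",
}
print(check_chapter(chapter))
```

An out-of-range latitude is often a sign that the latitude and longitude values were swapped when copying.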

CONGRATULATIONS!

Reflections

  • Describe your experience working with extracting and visualizing geographic data. How does it build on, contrast with or compare to working with other types of data you’ve explored so far?
  • Describe your experiences working with the specific tools used here. Were you able to complete all the exercises? What problems did you encounter?
  • What benefits can you see of using mapping for exploratory data analysis (as in the Recogito map) and/or explanatory data analysis (as in the Google Sheets template maps)?
  • What kinds of research questions or projects could you imagine using these tools for, especially regarding social justice collections? Give at least one example of a project idea.

Submission Details

  • Submit the lab report as a PDF to Canvas by the end of the day (local time) on Saturday, July 17
  • You can write the report in Google Docs, Word, Pages, or another application. Just be sure to save as a PDF