Week 3 – Working with Data

Now that we’ve learned a bit more about how humanities data is created, collected, and structured, we’re going to explore some tools for making sense of large datasets and for “tidying” them up. We’ll try out two tools that are handy for data-driven DH projects: WTFcsv and OpenRefine. The latter requires installing software on your computer. Please see the documentation for installing OpenRefine in this tutorial, and if you run into trouble, message me on Slack, visit me during office hours, or contact one of the course TAs.

In this lab we will:

  • Gain familiarity with a couple of tools for making sense of and manipulating data
  • Follow a workshop video produced by Haverford College Libraries to learn these tools with sample data
  • Examine a CSV (a plain-text file of comma-separated values) export of the Omeka item metadata from last week’s lab and identify one point of interest in the collection metadata (see the short example after this list if the format is unfamiliar)
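
A quick aside on file formats: a CSV is just plain text, with one record per line and commas separating the fields. If you’re curious to peek under the hood, the short and entirely optional Python sketch below prints the column names and the first few rows of a CSV file. The filename is a made-up placeholder; any of the CSVs you download for this lab would work.

    import csv

    # "items.csv" is a placeholder -- point this at any CSV from the lab
    with open("items.csv", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)          # the first line holds the column names
        print("Columns:", header)
        for i, row in enumerate(reader):
            if i == 5:                 # peek at just the first five data rows
                break
            print(row)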

Specs

  • Approximately 750 words
  • Report applies ideas from this week’s readings to the questions
  • Author offers meaningful research questions for the datasets 
  • Author makes their points in clear and concise ways
  • The work contains no more than 3 grammatical, spelling, or other “mechanical” errors.
  • The work contains no more than 2 minor factual inaccuracies and no major factual inaccuracies.
  • Upload PDF to Canvas

Lab Instructions

  1. Download this CSV of song data from the Free Music Archive
  2. Watch and follow this video of a workshop run by Haverford College Libraries in the summer of 2020, referring to the tutorial, which includes step-by-step instructions as well as links to sample data for the exercises. A good place to start is around the 8:30 mark (everything before that is just idle chit-chat and waiting for the workshop to start); the OpenRefine portion ends around the 1:03:00 mark. You may continue past the OpenRefine section if you like, since the video introduces another tool called Palladio that is also useful for DH projects, but that part is not required!
  3. Follow the “Exercises” section in the OpenRefine tutorial to get more familiar with the tool.
  4. Download this CSV of item metadata from last week’s Omeka exercise.
  5. Create a new project in OpenRefine, import the Omeka export data, and use the tools and strategies you practiced in the exercises to manipulate the data (an optional sketch after these instructions shows what a couple of those tidying steps look like under the hood).
  6. Finally, go to https://databasic.io/en/wtfcsv/ to try out a tool called WTFcsv. In the box that says “use a sample” or “upload a file,” choose the latter and upload the Omeka collection CSV. What (if any) conclusions can you draw from the result? 
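
Optional, and not part of the assignment: if you’d like to see what OpenRefine-style “tidying” looks like behind the graphical interface, here is a minimal Python sketch of two common clean-up steps, trimming stray whitespace and normalizing capitalization. The filenames and the “Title” column are hypothetical stand-ins, not the actual columns of the Omeka export.

    import csv

    # Hypothetical filenames and column name, for illustration only
    with open("omeka_items.csv", newline="", encoding="utf-8") as src, \
         open("omeka_items_tidy.csv", "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Remove leading/trailing whitespace from every cell,
            # much like OpenRefine's "Trim leading and trailing whitespace"
            row = {key: value.strip() if isinstance(value, str) else value
                   for key, value in row.items()}
            # Normalize capitalization in a (hypothetical) "Title" column,
            # much like OpenRefine's "To titlecase" transformation
            if row.get("Title"):
                row["Title"] = row["Title"].title()
            writer.writerow(row)

In OpenRefine itself, you’ll find both of these steps under a column’s Edit cells → Common transforms menu.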

Report (about 750 words)

Please address these prompts in your report:

  • Describe your experience working with OpenRefine. Were you able to install it successfully, create a new project, import the data, and complete all of the exercises? What problems did you encounter?
  • Describe at least one notable observation about the Omeka export data after working with it in WTFcsv or OpenRefine. If nothing stands out, imagine one data-driven question you might ask of one of the featured collections from Week 2.
  • What sorts of possibilities do large datasets and tools for manipulating them create? What kinds of research could you imagine doing with humanities data?
  • What kinds of research questions or projects could you imagine using the collections you engaged with last week and the data refining tools you worked with this week? Give at least one example of a project idea.

Submission Details

  • Submit the lab report as a PDF to Canvas by the end of the day (local time) on Saturday, July 3
  • You can write the report in Google Docs, Word, Pages, or another application. Just be sure to save it as a PDF before uploading