Introduction to Python workshop for DSRI 2025
Python is a powerful and versatile programming language used widely for data analysis, web development, machine learning, and more. In this workshop, we’ll explore the basics of Python programming by analyzing metadata about novels: titles, authors, publication years, languages, genres, and more. We’ll work with both numerical and textual data, learning how to clean, organize, and analyze information using Python. We’ll then shift our focus toward textual analysis, exploring patterns and word frequencies across individual novels. Finally, we’ll discuss how to visualize and communicate our findings in accessible and compelling ways.
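To give a flavor of the word-frequency work mentioned above, here is a minimal sketch using only Python's standard library. It is not taken from the workshop notebook: the function name `word_frequencies` and the sample sentence are assumptions chosen for illustration, and the tokenization is deliberately simple.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=3):
    """Return the top_n most common words in text as (word, count) pairs.

    Hypothetical helper for illustration only; not part of the workshop notebook.
    """
    # Lowercase the text so "It" and "it" are counted as the same word,
    # then pull out runs of letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter tallies how many times each word appears.
    return Counter(words).most_common(top_n)


# A short sample sentence standing in for the full text of a novel.
sample = "It was the best of times, it was the worst of times"
top_words = word_frequencies(sample)
print(top_words)
```

With a full novel you would likely also want to drop very common words such as "the" and "of" before counting, since they tend to crowd out more interesting patterns.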
Top 500 'Greatest' Novels (1021-2015)
We'll be exploring a dataset focused on the 500 novels most widely held by OCLC libraries, compiled and curated by Anna Preus and Aashna Sheth as part of the Responsible Datasets in Context project. It features titles, bibliographic metadata, and author demographic information, as well as OCLC holdings, Goodreads data, and Wikipedia and Project Gutenberg URLs. We encourage you to read their data essay explaining the methodology, decisions, biases, and highlights.
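As a rough preview of what cleaning and filtering this kind of metadata can look like, the sketch below works with three hand-written records. These records and the field names (`title`, `author`, `pub_year`) are assumptions made up for illustration; they are not rows or column names copied from the actual dataset, and the helper functions are hypothetical.

```python
# Hand-written example records; NOT rows from the real dataset.
# The field names here are assumptions and may differ from the actual columns.
novels = [
    {"title": "Don Quixote", "author": "Miguel de Cervantes", "pub_year": "1605"},
    {"title": "Pride and Prejudice", "author": "Jane Austen", "pub_year": "1813"},
    {"title": "Moby-Dick", "author": "Herman Melville", "pub_year": " 1851 "},
]


def clean_year(raw):
    """Convert a year stored as text (possibly with stray spaces) to an integer."""
    return int(raw.strip())


def novels_before(records, year):
    """Return the titles of records published before the given year."""
    return [r["title"] for r in records if clean_year(r["pub_year"]) < year]


early_titles = novels_before(novels, 1850)
print(early_titles)
```

Storing years as text with extra whitespace is a common quirk in spreadsheet exports, which is why the sketch converts each value before comparing it.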
Jupyter Notebook: Introduction to Python
We're going to be using Google Colab to run Python during this workshop. To open the workshop files in Colab, navigate to https://colab.research.google.com/ and select "GitHub" in the "Open Notebook" menu. You will be prompted to paste in a link: use the URL for this repository, https://github.com/tri-cods/python, and click the search button. Then select the first notebook that appears, introduction_to_python.ipynb, to open the exercise files.
If you get stuck and want a hint, use this link to view a copy of the notebook with completed exercises: Introduction to Python Jupyter notebook with answers
- Read about the Python design principles known as the "Zen of Python"
- W3Schools Python Tutorial
- Melanie Walsh, Introduction to Cultural Analytics & Python
- Al Sweigart, Automate the Boring Stuff with Python
Author: Alyssa Pivirotto. Contributing author: Alice McGrath. Reviewer: Sean Keenan.
Tri-Co Digital Scholarship Research Institute by Tri-Co Digital Scholarship Group is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. When sharing this material or derivative works, preserve this paragraph, changing only the title of the derivative work, or provide comparable attribution.
