Data audit, visualization, and harm analysis
How is the data collected? Describe the collection methods, tools, and processes.
What data is collected? List the specific variables, fields, and types of information gathered.
How is the data analyzed? Explain the analytical methods and techniques used.
How is the data used? Describe how the analysis informs decisions, policies, or actions.
Where does the training data come from?
What's included or excluded?
Who labeled the data? Consider how labeling decisions shape the algorithm.
Document your technical analysis of the dataset. Include:
Describe the steps you took to clean and prepare the data:
# Example: show key data cleaning steps
# You can include screenshots of your code or embed it here
import pandas as pd

df = pd.read_csv("data.csv")  # replace "data.csv" with the path to your dataset
# ... your analysis code ...
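One way to make cleaning decisions auditable is to record how many rows each step removes. The following is a minimal sketch, not a required method. The helper `apply_step`, the column names `id` and `income`, and the toy values are all hypothetical and stand in for your own dataset.

```python
import pandas as pd


def apply_step(df, label, step, log):
    """Run one cleaning step and record how many rows it removed."""
    before = len(df)
    cleaned = step(df)
    log.append({
        "step": label,
        "rows_before": before,
        "rows_after": len(cleaned),
        "rows_dropped": before - len(cleaned),
    })
    return cleaned


# Hypothetical toy data standing in for a real dataset.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 5],
    "income": [52000, 48000, 48000, None, 61000, -1],
})

log = []
df = apply_step(raw, "drop exact duplicate rows",
                lambda d: d.drop_duplicates(), log)
df = apply_step(df, "drop rows with missing income",
                lambda d: d.dropna(subset=["income"]), log)
df = apply_step(df, "drop rows with negative income",
                lambda d: d[d["income"] >= 0], log)

# One row per cleaning step, ready to paste into the write-up.
cleaning_log = pd.DataFrame(log)
print(cleaning_log)
```

Each dropped row represents a person or record that no longer counts in the analysis, so the log gives you concrete numbers to reflect on when you discuss your processing decisions.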
| Variable | N | Mean/Mode | Std Dev | Min | Max | Missing (%) |
|---|---|---|---|---|---|---|
| Variable 1 | -- | -- | -- | -- | -- | -- |
| Variable 2 | -- | -- | -- | -- | -- | -- |
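The summary table above can be generated from the data rather than filled in by hand. The following is a sketch under stated assumptions: the function name `summarize` and the toy columns `age` and `group` are hypothetical, numeric columns report the mean, and all other columns report the mode.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Build one summary row per column, matching the table headers above."""
    rows = []
    for name in df.columns:
        col = df[name]
        present = col.dropna()
        # Treat booleans as categorical so they report a mode, not a mean.
        numeric = is_numeric_dtype(col) and not is_bool_dtype(col)
        if numeric:
            center = present.mean()
        else:
            modes = present.mode()
            center = modes.iloc[0] if not modes.empty else None
        rows.append({
            "Variable": name,
            "N": len(present),
            "Mean/Mode": center,
            "Std Dev": present.std() if numeric else None,
            "Min": present.min() if numeric else None,
            "Max": present.max() if numeric else None,
            "Missing (%)": round(100 * col.isna().mean(), 1),
        })
    return pd.DataFrame(rows).set_index("Variable")


# Hypothetical toy data; replace with your own DataFrame.
toy = pd.DataFrame({
    "age": [20, 30, None, 40],
    "group": ["a", "b", "b", None],
})
summary = summarize(toy)
print(summary)
```

Reporting the missing percentage next to each statistic keeps visible how much of each variable the other numbers actually describe.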
Reflect critically on the decisions you made during data processing:
Choose one: Either critically examine 3 existing visualizations OR create 2 new visualizations
Figure 1: [Description of visualization]
Analyze what narrative or message this visualization communicates.
Critically examine what is obscured, minimized, or excluded from this visualization.
Figure 2: [Description of visualization]
Analyze what narrative or message this visualization communicates.
Critically examine what is obscured, minimized, or excluded from this visualization.
Figure 3: [Description of visualization]
Analyze what narrative or message this visualization communicates.
Critically examine what is obscured, minimized, or excluded from this visualization.
If this data were fed into an algorithm, who would be harmed?
Describe a specific way this data could cause harm if used algorithmically.
Connection to readings: Link to course concepts (e.g., data feminism principles, power structures).
Describe another potential harm.
Connection to readings: Link to course concepts.
Describe a third potential harm.
Connection to readings: Link to course concepts.
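To ground the harms described above in the data itself, it can help to check how each group is represented and how often it receives the favorable label. The following is a minimal sketch. The helper `group_profile`, the column names `group` and `approved`, and the toy values are hypothetical. The numbers describe the dataset only and do not by themselves establish harm.

```python
import pandas as pd


def group_profile(df, group_col, outcome_col):
    """Per-group row count, share of the dataset, and mean of a 0/1 outcome."""
    # groupby excludes rows whose group value is missing (dropna=True by default).
    grouped = df.groupby(group_col)[outcome_col]
    return pd.DataFrame({
        "n": grouped.size(),
        "share_of_rows": grouped.size() / len(df),
        "positive_rate": grouped.mean(),
    })


# Hypothetical toy data: group B is under-represented and approved less often.
toy = pd.DataFrame({
    "group": ["A"] * 6 + ["B"] * 2,
    "approved": [1, 1, 1, 1, 0, 0, 1, 0],
})
profile = group_profile(toy, "group", "approved")
print(profile)
```

An algorithm trained on data like this could learn both the lower approval rate for the smaller group and a less reliable estimate of it, which is the kind of mechanism the harm descriptions should spell out.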
How does the system make decisions? Describe the algorithmic logic and decision rules.
What is it being optimized for? Identify the objective function or goal the algorithm pursues.
Figure: Decision-making flowchart showing how the algorithm processes inputs and generates outputs
Analyze how this system creates compounded harm across multiple dimensions of identity and power.
Describe a specific harm and how it compounds across intersecting identities (e.g., race + gender, class + disability).
Connection to readings: Link to course concepts about intersectionality and data justice.
Describe another intersectional harm.
Connection to readings: Link to course concepts.
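Intersectional harms can be invisible when each attribute is examined separately. The following is a small sketch of that effect using made-up data. The column names `race`, `gender`, and `approved` and every value are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two records for each race-by-gender combination.
toy = pd.DataFrame({
    "race":     ["X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "gender":   ["M", "M", "F", "F", "M", "M", "F", "F"],
    "approved": [1,   1,   1,   0,   1,   0,   0,   0],
})

# Single-attribute views: each shows a gap, but no group sits at zero.
by_race = toy.groupby("race")["approved"].mean()
by_gender = toy.groupby("gender")["approved"].mean()

# Intersectional view: approval rate for every race-by-gender cell.
by_both = toy.groupby(["race", "gender"])["approved"].agg(["size", "mean"])

print(by_race)
print(by_gender)
print(by_both)
```

In this toy data the single-attribute rates never fall below 0.25, yet the cell for race Y and gender F has an approval rate of 0. Also report the `size` column, because intersectional cells are often small and their rates correspondingly uncertain.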