Last winter, Rahul Shukla and seven of his peers came together to create Northwestern’s Open Data Initiative, affectionately known as NODI. While working on a class project to find data on Northwestern, the Weinberg fourth-year grew frustrated as he combed through inaccessible PDFs in the campus data ecosystem. So, he took on the task of creating a centralized portal for all of Northwestern’s data.
By creating a portal that is both accessible and user-friendly, Shukla and his team say they hope to create more engagement with data across the Northwestern community.
Finding and sorting through data is time-consuming, Shukla says. Most university data at Northwestern is stored as PDFs rather than Excel sheets, making it harder to conduct analyses.
The portal aggregates data sets such as Northwestern’s operating budget for fiscal year 2018, total enrollment by ethnicity, gender and campus for fall 2019 and tuition and fees by program (from 1998 to 2020).
Currently, these data sets are available to certain offices within the University. Amit Prachand, assistant vice president for information and analytics within Northwestern’s Office of Administration and Planning, calls these offices the “data stewards.” They include administrative offices like Human Resources, Student Affairs and the Office of the Registrar, but NODI would make these public data accessible to students as well.
This project was partially inspired by Shukla’s friendship with Nik Marda, a fourth-year at Stanford University. Marda is one of the co-founders of Stanford’s Open Data Project and supports creating a campus culture of data engagement.
Marda and his co-founder, Stanford fourth-year Arjun Ramani, realized the need for an open data portal after working for The Stanford Daily. Ramani, who has a keen interest in data journalism, echoed an idea from a Stanford professor that “the plural of anecdote is data.”
“The tool of the journalist is oftentimes the interview,” Ramani says. “It would obviously be enhanced a lot by having many, many different anecdotes all pulled together in the data set.”
Ramani realized there were two likely reasons why such a centralized data portal didn’t already exist: The University may not have invested enough in the idea, or it may have intentionally decided not to publicize some data. In either case, Ramani and Marda set to work. By September 2019, their site was fully functional.
The Stanford duo’s open data project was directly inspired by their journalistic endeavors. According to Shukla, the Northwestern team had similar motivations, hoping to provide resources for on-campus publications’ use.
“In an age of misinformation, data is everything. You can ground arguments in data, and data effectively becomes the ground truth for a lot of essays or articles or policies,” Shukla says. “There's another point to consider, which is that data may not often tell the entirety of the story.”
Both groups are advocating for better open data policy on their campuses and nationally, and pursuing partnerships with student organizations and other stakeholders.
Data governance entails looking at the potential for harm, weighing risks versus benefits and verifying the reliability of data. One college-specific problem arises with the use of self-reported data, which is often collected from campus surveys. Since it is harder to verify the reliability of this data, the teams need to make additional risk assessments, according to Ramani.
“I do think it's important to have [usable] information ... but I also think it's critical that there are questions that precede the use of the data,” Prachand says. “What is it that the data might help answer? My hope is that this [project] may spark additional interest in asking the question and finding out what data can be available more publicly or in a way or format that is more digestible.”
Some frameworks for student privacy protection already exist. For example, Prachand’s office will not publish demographic data if fewer than five to seven students hold a particular identity and the specificity of the data could put them at risk of identification.
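A threshold rule like this can be sketched in a few lines of Python. The example below is only an illustration, not the method Prachand’s office actually uses: the function name `suppress_small_cells`, the `identity` field and the sample records are all hypothetical, and the default cutoff of five is an assumption taken from the low end of the “five to seven” range.

```python
from collections import Counter


def suppress_small_cells(records, key, min_cell_size=5):
    """Count records per group and hide any count below the threshold.

    `min_cell_size` is an assumed cutoff; the article cites a range of
    five to seven students. Suppressed groups are reported as None
    rather than as a small number that could identify individuals.
    """
    counts = Counter(record[key] for record in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


# Hypothetical sample: six students in group "A", three in group "B".
records = [{"identity": "A"}] * 6 + [{"identity": "B"}] * 3
print(suppress_small_cells(records, "identity"))
# → {'A': 6, 'B': None}
```

Raising `min_cell_size` to seven would suppress both groups in this sample, which shows how the choice of cutoff trades detail for privacy.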
For Marda, the mission of the project goes beyond the data of any one student, reflecting a larger need to safeguard and empower those who are more vulnerable.
“Good data governance isn't just about protecting the privacy of individuals, which is often how it's framed,” Marda says. “It’s not just about harm reduction but also about how we can make sure that open data serves the interests of communities that are often not served by existing systems and structures.”
In that vein, the open data projects aim to engage student activists, allowing them to organize petitions or letters addressed to the universities around data sets available through the portal. They would also include a website listing data sets that have been requested by community members that the University has failed to provide.
“It’s also about making sure that the concerns of communities at the greatest risk of harm are considered throughout the entire data life cycle, from when data is collected to how it’s cleaned to where it’s published and then how it’s used afterwards,” says Marda.
Dealing with open data can raise ethical concerns, especially in a university context. Questions surrounding the integrity of the data, what data is available, who has access to it and how this can be controlled need to be addressed before any data can be published for public use.
In an effort to collaborate, some students on Northwestern’s and Stanford’s data teams are working on a handbook for other students to start their own campus data portals, planning to meet once a week in spring quarter to develop the guide. It will include a data governance section that details how to select data and the considerations that need to be taken into account. While much of the content is borrowed from existing data governance literature, according to Marda and Ramani, the principles had to be distilled to be useful for creating data portals at universities.
Having an institution-wide approach to cleaning up and defining data puts a standard structure in place to catch and rectify errors, Prachand says. He warns, “If we each have our own version, there’s no truth.”