MOD Sandbox Workshop: Planning, Regulatory, and Privacy Perspectives on Data from MOD Partnerships

June 27, 2018 (updated July 2, 2018) | Events, MOD Sandbox, Transit

The MOD Sandbox Innovation & Knowledge Accelerator workshop concluded on March 13, 2018, with a panel session on issues and potential solutions around data sharing in Mobility on Demand (MOD) partnerships.

Moderator Prashanth Gururaja of the Shared-Use Mobility Center (SUMC) framed the discussion around the data-sharing challenges encountered in a number of the MOD Sandbox pilot projects. A transit agency's data needs stem from four basic functions: planning, operations, accounting, and auditing. The level of detail required varies with the function. For example, coarser data are often sufficient for planning, whereas accounting and auditing benefit from finer-grained data. Prashanth went on to note that providers' concerns about data sharing often relate to trade secrets, competitiveness, rider privacy, and public records disclosure. These seemingly opposed sets of concerns have created tension around data sharing, but a variety of possible pathways to resolve it are starting to be explored.

With the topic laid out, Bob Sheehan of US DOT spoke about the importance of creating a standard, accessible data platform and a set of standard metrics for evaluating MOD projects, citing examples from university research.

Adam Cohen, from the University of California, Berkeley, described the independent evaluation that UC Berkeley is conducting for the MOD Sandbox program. The process starts with a logic model in which the evaluators outline the desired outcomes, identify each project's guiding principles (including its hypothesis), establish performance metrics, identify and collect data, and define the methodological approach. Data privacy is a priority throughout: data are stored on a secure server under institutional review board (IRB) oversight, and are then aggregated and sanitized before being used in reports shared with US DOT. Breaking the evaluation down into these explicit steps is necessary if project partners are to measure project metrics reliably and, ultimately, improve project performance.
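The panel did not describe how UC Berkeley aggregates or sanitizes its data. As a rough illustration only, the sketch below shows one common way trip-level data could be coarsened before sharing: dropping rider identifiers, binning exact pickup hours into broad time periods, counting trips per origin-destination zone pair, and suppressing any cell with too few trips to protect individual riders. The `Trip` record, the zone labels, the time-of-day bins, and the default threshold of 5 are all hypothetical assumptions, not part of any MOD Sandbox specification.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class Trip(NamedTuple):
    """Hypothetical trip record; real MOD data schemas will differ."""
    rider_id: str      # personal identifier; never copied into the output
    origin_zone: str   # e.g. a planning zone or census tract (assumed)
    dest_zone: str
    hour: int          # pickup hour, 0-23, local time


def time_period(hour: int) -> str:
    """Coarsen an exact hour into a broad time-of-day bin (bins are illustrative)."""
    if 6 <= hour < 10:
        return "am_peak"
    if 15 <= hour < 19:
        return "pm_peak"
    return "off_peak"


def aggregate_trips(
    trips: Iterable[Trip], min_count: int = 5
) -> Dict[Tuple[str, str, str], int]:
    """Count trips per (origin zone, destination zone, time period).

    Rider IDs are dropped, and any cell with fewer than min_count trips
    is suppressed so that rare trips cannot single out individual riders.
    """
    counts = Counter(
        (t.origin_zone, t.dest_zone, time_period(t.hour)) for t in trips
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}
```

For example, six 8 a.m. trips from zone A to zone B would be reported as a single count of 6 in the `am_peak` cell, while a lone late-night trip from A to C would be suppressed entirely. Raising the threshold or widening the bins trades detail for privacy, which mirrors the planning-versus-auditing tradeoff the panel discussed.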

Marla Westervelt, from Los Angeles Metro, described LA Metro's project with Via, which was created to provide first- and last-mile connections between transit stops and riders' final destinations. Her presentation elaborated on data collection and the need for both qualitative and quantitative analysis of new mobility services. Marla stressed that transit agencies need access to data to fully understand rider behavior and to work with shared mobility providers, and that challenges around data sharing should be treated as problems to solve rather than as impenetrable barriers.

LA Metro plans to use the data to understand how the pilot could be scaled up to a full program, to understand customer behavior, to audit service, and to support research. Research in particular brings data needs that are hard to specify in advance, since the agency may not yet know every question it will want to answer. Having access to the data itself allows greater flexibility and can help program staff answer questions as they arise over the course of a pilot. Identifying your questions and project goals is critical to having a constructive data conversation.

Marla also outlined data-sharing options LA Metro is considering, including third-party aggregators and a data warehouse with pre-selected queries, and called for a set of public data standards. The first two options could streamline data sharing, but they lack the flexibility an agency needs early in a program, when results are still emerging and the right questions are not yet clear. LA Metro's most powerful approach to analyzing and learning from its project is possessing the actual data, with legal protections against public disclosure.

Meg Young, from the University of Washington (UW), emphasized the importance of data protection measures similar to those HIPAA requires for health information, given the sensitivity of the origin and destination data now being collected. Location data can be used to identify individual users, and Meg and other presenters stressed repeatedly that these data must be protected.

Todd Plesko, from Dallas Area Rapid Transit (DART), discussed why transit agencies need access to MOD partnership data. As app-based services become more widely available, so do highly detailed data, including trip origins and destinations by date and time of day. DART needs data to validate that contracted trips actually occurred and for service planning. DART's position is that it has a right to data on the transportation network company (TNC) trips it subsidizes. TNCs, on the other hand, are reluctant to share personal data because of privacy and competitive concerns, despite DART's assurances that privacy protocols are in place. DART took the proactive step of encouraging the Texas legislature to protect transit riders' privacy by exempting data collected through its GoPass app from Freedom of Information disclosure. The legislation, which took effect in 2015, protects personally identifiable and transactional information such as names, email addresses, and location information.

Todd acknowledged that transit agencies often lack the capacity to process large datasets, so the data they collect frequently sit unused. While agencies are good at collecting data, these capacity constraints limit how much they can learn from it; access to data scientists could help change this.

Todd concluded his presentation with a map of microtransit destinations shown at a detailed level. The map later sparked a debate over whether it is appropriate to show data at that level of detail, given that it could be used to identify who is actually using the service.

Matt Daus of the International Association of Transportation Regulators (IATR), a former member of the NYC Taxi & Limousine Commission (TLC), spoke about the ways governments can gain access to private companies' data, with examples including agreements between government entities and companies, regulations, and Freedom of Information Laws/Acts (FOIL/FOIA).

He explained that while freedom of information laws are designed to make government transparent by requiring agencies to release information on request from members of the public, they typically include exemptions for data whose release would be an unwarranted invasion of personal privacy, that are needed for law enforcement, or that would reveal trade secrets. These exemptions vary from state to state and at the federal level.

Matt then highlighted several examples of cities and states using regulations to require data from TNCs for specific, legitimate government purposes, along with corresponding exemptions. Examples from Boston, Chicago, NYC, and California showed specific but varied reporting requirements covering trip information, spatial aggregation levels, wait times, driver information, vehicle information, and other data needed for safety and enforcement purposes.

Matt stated that protecting passenger data is an important issue in Sandbox projects. Key legal questions include whether cities and states can enter into agreements that contradict FOIA laws, whether federal grant programs or funding authorizations should require data protections, and whether FOIA laws apply to third parties that hold the data. In the end, Matt speculated that a third-party lockbox may be where this ends up, and he recommended that state and federal legislatures amend FOIA laws to address the issues discussed.

In the ensuing discussion, the panelists noted that the Sandbox program is a great opportunity to evaluate the assortment of data-sharing approaches being considered or implemented, from coarse aggregation to detailed data sharing to third-party repositories, by how effectively each delivers project performance information to the transit agency. This evaluation could inform a national standard or best practice for MOD data-sharing agreements.