Abstract
Most data governance frameworks are designed to protect the individuals from whom data originates. However, the impacts of digital practices extend to a broader population and are embedded in significant power asymmetries within and across nations. Further, inequities in digital societies impact everyone, not just those directly involved. Addressing these challenges requires an approach that moves beyond individual data control and is grounded in the values of equity and a just distribution of benefits and risks from data use. Solidarity-based data governance (in short: data solidarity) suggests prioritising data uses over data type and proposes that data uses that generate public value should be actively facilitated, those that generate significant risks and harms should be prohibited or strictly regulated, and those that generate private benefits with little or no public value should be ‘taxed’ so that profits generated by corporate data users are reinvested in the public domain. In the context of global health data governance, the public value generated by data use is crucial. This contribution clarifies the meaning, importance, and potential of public value within data solidarity and outlines methods for its operationalisation through the PLUTO tool, specifically designed to assess the public value of data uses.
Introduction
Researchers and others are responding with criticism to the increased use of artificial intelligence (AI) in more and more areas of public and personal life (see e.g. Berryhill et al., 2019; Farah, 2023). Sharon and Gellert (2023) criticise the ability of big tech companies to access and use public data to develop proprietary products and services without providing fair compensation to the public sector. In health and healthcare, for instance, large datasets held by government actors (e.g. NHS in the United Kingdom) have been shared with private companies to build large models to increase the speed and accuracy of disease diagnosis (see Powles & Hodson, 2017). This has led to criticism that the data-sharing agreement was concluded with insufficient transparency, lacked safeguards to ensure that private companies do not use such data to improve other, unrelated products, and gave no consideration to whether the project would create public value (see definition below; see also Kickbusch et al., 2021; Shilton et al., 2021). Researchers also criticise aspects that extend beyond data-sharing. Academics and civil society actors have noted that the development of large language models, often trained using large amounts of scraped data from the internet, primarily benefits US and UK English speakers and, more generally, persons inhabiting the Global North (see Nicholas & Bhatia, 2023). Underlying both problems are questions related to data governance. These questions include: Whose data is shared with whom? For which purpose and under which condition is this data being shared? To whom do benefits accrue, and who bears the costs?
As these questions gain importance, this article critically examines the underlying data governance regimes enabling and regulating data practices. We understand data governance in the broad sense of the word as the laws, policies, and informal rules that shape how data are created, used, and protected. We begin by exploring the core logic of existing regimes and highlighting their shortcomings, particularly in generating equitable outcomes. To address these limitations, we introduce the concept of data solidarity, a framework proposed by Prainsack et al. (2022). Data solidarity revolves around public value as its guiding principle. Discussing how public value can be conceptualised and operationalised, we present PLUTO (Public VaLUe Assessment TOol), a tool designed to measure the public value of specific instances of data use. Overall, this article provides a critical analysis of how current data governance regimes perpetuate injustices and inequalities while offering data solidarity as a practical and equitable alternative. By outlining the development process of PLUTO, this paper aims to invite critical scrutiny, encourage collaborative refinement, and enable others to adapt or replicate the tool for diverse contexts and data governance challenges. Our focus, here, lies on data governance’s evaluative dimension: assessing the public value of specific data uses by weighing benefits and risks, and ensuring equitable outcomes. This means considering not only individual rights but broader societal impacts, asking: Does this specific data use generate public value? If so, how do we support it; if not, how do we mitigate harms and redistribute benefits?
Rather than giving an ultimate and definitive answer of what public value is and how different aspects of public value should be weighed, the PLUTO tool is meant to represent a starting point for a wide-ranging discussion about these questions, and give organisations (companies, public bodies, etc.) a prototype that they can adjust and adapt to meet their own specific needs in terms of the questions they focus on and the weights that are given to different responses.
Background: Data governance regimes and their shortcomings
As ever more aspects of people’s lives, environments, and bodies become datafied, how we govern data has become a pressing concern. Categorising data based on data types (personal v. non-personal, human v. environmental, etc.) has been the fundamental logic underpinning data governance in both the United States (US) and the European Union (EU) (see Fazlioglu, 2019 for discussion). This model derives from the belief that data type directly correlates with the potential risks posed by its usage. For instance, personal or sensitive information, such as religious beliefs or political affiliations, is deemed to carry particularly high risks if exploited by malicious actors (see e.g. Beldad et al., 2011).
Fazlioglu (2019) points to two primary challenges that the existing categorisation-based data governance model encounters:
- Emergence of New Data Types: The proliferation of digital data collection has led to the generation of novel data types that do not fit neatly within predefined categories. Consequently, the current framework struggles to address data with evolving attributes (see also Einav & Levin, 2014).
- Triangulation and Context Sensitivity: The convergence of data sources and advanced analytics techniques often transforms seemingly innocuous data into sensitive information capable of identifying and potentially harming individuals. The boundary between sensitive and non-sensitive data has become increasingly tenuous (see e.g. Office of the President, 2015).
To address these shortcomings, scholars advocate for a paradigm shift towards context-centric data governance (see also Gratton, 2013; Nissenbaum, 2010; Winter & Davidson, 2019). This approach acknowledges that each data point, regardless of its category, can potentially contribute to identifying an individual when combined with other publicly available information (Fazlioglu, 2019). Fazlioglu suggests focusing on assessing the likelihood of data linkage, the potential disclosure of sensitive information to unauthorised parties, and the context of data usage when determining necessary legal safeguards. Beyond that, some of us have previously argued that a data-types-based approach is blind to the risks that emerge from the linkage of different (types of) datasets that, cumulatively, determine the risks to people. Finally, the data-types-based approach cannot take into account whether the use of such data generates public value (Prainsack et al., 2022).
As discussed above, both the EU and the US data governance regimes focus on the distinction of data by type, but there is a key difference between these regimes. A few exceptions aside, the EU’s General Data Protection Regulation (GDPR) applies universally throughout all sectors, meaning that whether or not a point of data is viewed as personal data does not depend on the context in which it is being used (Solove & Schwartz, 2011). Meanwhile, the US follows a strategy where sectoral laws regulate the use of personal data in ‘specific industries or specific contexts’ (Schwartz & Solove, 2014, p. 881; see also Flaherty, 1992). Thus, while the US approach at least acknowledges the problem of precisely delineating the sensitivity of information, it still does not focus on data use and the risks and benefits associated with specific instances of data use.
The concept of data solidarity emerges as a response to the challenges posed by conventional categories and approaches to data governance. Drawing inspiration from context-centric principles, data solidarity emphasises evaluating data use rather than its category (Gratton, 2013). A context-centric perspective categorises data uses as positive, negative, or neutral based on their impact on data subjects (Gratton, 2013). Data solidarity goes beyond Gratton’s model to propose that the evaluation of data use should play a more important role in data governance. European policymakers have started acknowledging the significance of contextual analysis in data governance (Article 29 Data Protection Working Party, 2014). They recognise the need to consider the context and purpose of data processing to effectively address sensitivity and emerging technological trends (Article 29 Data Protection Working Party, 2014).
Specifically, the proposed data solidarity perspective emphasises evaluating data uses based on their potential and likelihood to generate public value, which we operationalise as the ‘net result’ of weighing risks and benefits to people and communities (we will say more about this in the next section). We now turn to explaining data solidarity in detail.
How data solidarity foregrounds data use over data type
The way in which data solidarity focuses on data use instead of data type becomes most apparent when we examine the three fundamental pillars that form the core of data solidarity (see ). Pillar I states that data uses that create significant public value should be actively facilitated. Pillar II mandates that harms must be proactively prevented wherever possible – such as by prohibiting data uses that are unlikely to yield benefits for people while entailing significant risks. Where prevention is not feasible or not successful, harms must be effectively mitigated. Pillar III argues that where corporate data use has no discernible public value but is conducive to profit generation, the profiteer should be required to share the benefits (including financial profits) with the public.
In the following sections, we will explore each of these pillars and the concept of public value in more detail, with a particular focus on Pillars I and III. In addition, we will highlight the central role that public value plays within these pillars.
Pillar I: Facilitating data use that creates significant public value
The first pillar of data solidarity is to facilitate the use of data that generates substantial public benefits while maintaining an acceptable level of risk. Identifying the criteria for such an assessment is not a straightforward task. In the broader literature, public value is an inherently ‘fuzzy concept’ (Rutgers, 2015, p. 40) that has multiple meanings and uses. One discussion on public value focuses on specific values deemed essential for a thriving public sphere and society, such as justice, democracy, and integrity. Another employs public value as a broad, generic concept which refers to the whole pantheon or range of possible public values. The literature gives limited guidance on how the concept could be applied to specific data uses. Moreover, one cannot simply assume that data uses likely to generate significant benefits possess a high degree of public value. Even seemingly clear-cut cases of data use promising to benefit public(s) can carry significant risks.
Consider Google’s Sidewalk Labs project, which aimed to build a technologically advanced and sustainable smart city in Toronto, Canada (Wylie, 2018). Sidewalk Toronto sought to address urban challenges such as traffic congestion, affordable housing, and climate change through data and technology, positioning itself as a model for future cities. Central to the project was the collection of data via sensors and the exchange of personal information for tailored services. While its goals of improving quality of life and fostering urban innovation appeared promising, significant risks emerged. Key concerns included insufficient details about data usage and privacy protections, raising fears of ‘surveillance capitalism’ (Zuboff, 2019), and unclear governance regarding data ownership and decision-making. These issues highlight the challenges of defining and measuring public value, which we address in the next section.
Pillar II: Harm prevention and mitigation
Data solidarity requires strict legal measures to prevent data usage that poses unacceptable risks. It also acknowledges that the employment of data, even for the benefit of people and communities, carries the potential to cause harm to individuals and groups. Data solidarity does not disregard this fact; rather, it aims to ensure that individuals who have been harmed receive the necessary support. As harm may occur without any legal violations or without the possibility to identify the exact act or omission that caused the harm, the present legal recourse is deemed inadequate (Prainsack et al., 2022). To strengthen collective responsibility for consequences arising from data use, it is crucial to enhance tools not just for minimising risk, but also for mitigating harm. Some of us have previously described in detail how this might be done (McMahon et al., 2020).
Pillar III: Sharing corporate profits with publics
The third pillar of data solidarity seeks to ensure that some of the profits generated by corporate data users come back into the public domain, particularly in cases where data use does not generate significant public value. Such reinvestment could be realised through benefit sharing efforts under legally binding agreements that make corporations accountable to the public and transfer control to the affected community. Taxation of data and data usage presents another means of addressing issues in the digital economy. Currently, services obtained through sharing one’s personal data, rather than paying money, benefit from a de facto tax exemption (see e.g. Marian, 2022). This exemption causes concerns about increasing social and economic inequities, particularly when digital businesses reap significant profits without contributing adequately to public revenue.
Proposed solutions involve implementing binding regulations for a corporate minimum tax to combat tax evasion, specifically by digital platform enterprises (see OECD, 2021). Such a global corporate minimum tax would guarantee that gains derived from digital personal data receive sufficient taxation, even if tax amounts are not based on data usage intensity. In light of the current debate surrounding data usage, it has been suggested that data taxes could be implemented, either on the data itself or on its use by corporations that do not pose unacceptable risks but fail to significantly contribute to public value. The resulting revenue could then be allocated towards various initiatives, such as compensating citizens for potential risks associated with their data and addressing digital inequalities to promote equity.
Conceptualising and measuring public value (in pillars I and III)
A crucial tenet that cuts across all three pillars of data solidarity is the concept of public value, an aspect that data solidarity sees as intricately linked to the use of data, rather than to the inherent characteristics of data types themselves. The significance of public value within the data solidarity framework is evident throughout, but the effective implementation of Pillar I and Pillar III hinges on a comprehensive conceptual grasp and practical interpretation of public value. In contrast, for Pillar II, a profound understanding of harm emerges as the central concept to guide its implementation. For Pillar I we cannot know which data use to facilitate unless we can somehow evaluate its public value, while for Pillar III we cannot know which data use to ‘tax’.
In brief, public value in the context of data solidarity emphasises the fair distribution of benefits and risks, prioritising collective benefit while minimising harm. This approach diverges from traditional frameworks by embedding collective control, accountability, and equity into digital practices, and it requires that data use aligns with broadly accepted societal goals, such as sustainability and social equity. Public value within data solidarity can be characterised along the following four dimensions:
- The value of data is contingent on how it is intended to be used, as highlighted by Wilson et al. (2020) and Bozeman (2007), which is why data solidarity focuses on the public value of specific data uses rather than the value of different types of data.
- Public value results from the weighing of risks and benefits that are expected from specific instances of data use. Public value is attained when it can be reasonably assumed that the use of data will result in clear benefits for the public, and when there is a minimal risk of significant and undue harm to any person or group (Prainsack et al., 2022).
- In the context of the data solidarity framework, the creation of public value requires attention to the power imbalances inherent in the digital data economy. Consequently, in alignment with the core principle of data solidarity, public value is higher when instances of data use mitigate these imbalances. For example, larger corporate entities hold a greater responsibility for public value due to their significant power in the public sphere. The focus on fairness and equity in digital practices leads to increased public value when benefits are expected to reach underprivileged groups, given their lower starting point and potential for substantial impact (Prainsack et al., 2022).
- Given the lasting and global impact of digital technologies on people’s lives across multiple generations, it is essential to have a practical and adaptive notion of public value that is not based solely on the current assessment of risks and benefits (Bozeman, 2019; Huijbregts et al., 2022). Instead, it should be flexible enough to take into account foreseeable future benefits, especially in the context of the long-term impact of digital technologies and the urgent need for sustainability in the face of global challenges such as climate change.
In summary, our operationalisation of public value looks at whether or not specific instances of data use address power imbalances, maximise benefits and minimise risks, distribute benefits and risks fairly and equitably, and are environmentally sustainable. PLUTO is designed to support this measurement of public value. The aim of the tool is to assign each instance of data use to one of four categories (A-D; see below). When risks are low and benefits are high (Type A), data use should be supported and facilitated. When risks are low and benefits are also low (Type B), financial gains must be redirected to the community. When risks are high and benefits are also high (Type C), the risks must be reduced. When risks are high and benefits are low (Type D), the use of data must be strictly prohibited and sanctioned. Public value is highest in the Type A quadrant of data use and towards both edges.
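The four-way categorisation can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not PLUTO's actual implementation: the 0-1 score scale, the 0.5 threshold separating ‘low’ from ‘high’, and the names `classify_data_use` and `RECOMMENDED_ACTION` are hypothetical and introduced here only for demonstration.

```python
from typing import Literal

DataUseType = Literal["A", "B", "C", "D"]

# Governance response per type, paraphrased from the text.
RECOMMENDED_ACTION: dict[str, str] = {
    "A": "support and facilitate the data use",
    "B": "redirect financial gains to the community",
    "C": "reduce the risks",
    "D": "prohibit and sanction the data use",
}


def classify_data_use(benefit: float, risk: float, threshold: float = 0.5) -> DataUseType:
    """Assign a data use to one of the four types (A-D).

    Assumption: benefit and risk are scores in [0, 1], and a single
    hypothetical threshold separates 'low' from 'high' on both axes.
    """
    if not (0.0 <= benefit <= 1.0 and 0.0 <= risk <= 1.0):
        raise ValueError("benefit and risk must lie in [0, 1]")
    high_benefit = benefit >= threshold
    high_risk = risk >= threshold
    if high_benefit and not high_risk:
        return "A"  # low risk, high benefit
    if not high_benefit and not high_risk:
        return "B"  # low risk, low benefit
    if high_benefit and high_risk:
        return "C"  # high risk, high benefit
    return "D"      # high risk, low benefit
```

For example, `classify_data_use(benefit=0.9, risk=0.1)` yields Type A, and `RECOMMENDED_ACTION["A"]` gives the corresponding response. A hard threshold is the simplest possible choice; any organisation adapting the idea would need to decide where, and how sharply, to draw the boundaries.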
PLUTO was designed to operationalise the principles of data solidarity, focusing on public value generated by data use. By assessing and categorising data utilisation based on its public value implications, PLUTO serves as a practical tool for implementing data solidarity. Beyond this framework, PLUTO’s balanced evaluation of benefits and risks, attention to power dynamics, and emphasis on sustainability make it valuable for other data governance approaches. For example, it can help rights-based organisations assess impacts on fundamental rights or aid accountability-focused strategies in identifying areas for transparency and oversight. Thus, PLUTO is a versatile tool for promoting ethical and responsible data practices across various frameworks.
We developed a questionnaire consisting of 25 questions in four categories. This tool aims to balance benefits and risks in determining the public value of data use, positioning the result in one of the four quadrants (Types A-D) described above. Below, we cover the process of engagement and feedback relied on for devising the questionnaire (and the accompanying tool), before discussing the individual sections and questions in more detail.
PLUTO: Process
Our effort to evaluate the public value of data use began with the creation of a set of questions in early 2023. Initially, we focused on three key dimensions: risks, benefits, and power dynamics within the data landscape. To seed the development of these questions, we drew inspiration from several sources. We examined existing ethical frameworks for data governance, looking for common themes and points of emphasis. We also analysed real-world case studies of both positive and negative impacts of data use, seeking to identify recurring patterns and potential pitfalls. Furthermore, we engaged in preliminary discussions with colleagues and experts in related fields to gather diverse perspectives and insights. It was imperative for us to understand who the data-using entity was. For instance, we recognised that a small non-profit organisation might face different challenges and opportunities in maximising public value compared to a multinational corporation. We also sought to acknowledge and reward those entities that actively pursued risk mitigation, harm prevention, and rigorous impact assessments for their products and models. This led us to consider questions related to institutional safeguards and responsible data practices. Conversely, a lack of such responsible practices would be penalised. The initial question set was deliberately broad, aiming to capture a wide range of relevant factors before refining and focusing the questions through subsequent testing and feedback.
The development of the tool benefited from the contributions of fellow academics, policy makers, and in particular from our collaboration with the Visualisation and Data Analysis Research Group at the Faculty of Computer Science (University of Vienna). Working together, we transformed these initial questions into an online tool for assessing public value.
In the spirit of transparency and collaboration, we released a first version of the questionnaire in the spring of 2023. We shared this version with 25 experts who had made substantial contributions to the field of data ethics or data governance and were interested in the concept of data solidarity. We introduced the tool to them and provided thorough step-by-step presentations at two separate events: a summer school and a winter school. During these sessions, we engaged with graduate students who were also interested in the topic.
Our engagement with this group of experts and graduate students revolved around the following questions:
- Was the purpose of the tool clear?
- How did you experience the structure of the tool?
- Which questions stood out and why?
- Were any questions missing, and if so, which ones?
One recurring theme in the feedback that we received was the need for clarity regarding whether questions were intended for individuals or organisations. Suggestions for improvement included incorporating a question about the data users’ identity or providing illustrative scenarios at the outset of the tool. Another common concern focused on transparency in the final result calculation, especially concerning the weighting of questions. Respondents suggested either introducing a separate methodology page or providing more information at the beginning or end of the tool.
The terminology and concepts within the tool, such as dual use, risk versus harm, and benefits, required clearer explanations. Respondents also highlighted challenges related to accessing information and recommended guidance on suitable evidence sources for each question. They also raised concerns about the trustworthiness of information provided by companies about their data practices and questioned whether a simple ‘tick-box’ approach was sufficient.
In summary, the feedback we received highlighted the importance of clarity, transparency, and flexibility in the tool’s design. Respondents emphasised the need to consider diverse perspectives, including ethical expertise and environmental impact, while also recognising the difficulty of ensuring accurate information in a rapidly evolving data landscape.
On the basis of this feedback, we produced a second iteration of the tool. We also decided that question weighting might need to change over time, so we developed a more dynamic and flexible approach to adjusting PLUTO. Rather than aiming at the creation of a perfect tool, we decided to produce a prototype that communities or organisations seeking to build the tool into specific workflows could adjust and amend for their own purposes. Our collaborators helped us build a system that allows us to fine-tune the way questions are weighted and to determine how each question affects the assessment of benefits and risks. We also added ‘info’ boxes to provide supplementary information for users who might have difficulty understanding the questions. When piloting the tool, we considered whether it should be broadly applicable or tailored to specific domains. The adjustable question weighting in the backend not only facilitates the use of PLUTO but also serves as a blueprint for customising the tool to specific domain needs. Thus, our tool is both adaptable and configurable.
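One way to picture such adjustable weighting is a configuration that maps each question to a weight on the benefit axis and a weight on the risk axis. The Python sketch below is an illustration under stated assumptions rather than a description of the PLUTO backend: the question identifiers, the weight values, the 0-1 answer scale, and the names `QuestionWeight`, `DEFAULT_WEIGHTS`, and `score` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionWeight:
    """How strongly one question moves the benefit and risk totals."""
    benefit: float = 0.0
    risk: float = 0.0


# Hypothetical configuration: identifiers and values are illustrative only.
DEFAULT_WEIGHTS: dict[str, QuestionWeight] = {
    "benefits_reach_lmic": QuestionWeight(benefit=2.0),
    "environmental_benefit": QuestionWeight(benefit=1.0),
    "risks_fall_on_vulnerable_groups": QuestionWeight(risk=2.0),
    # A safeguard can raise the benefit total and lower the risk total.
    "binding_risk_assessment": QuestionWeight(benefit=0.5, risk=-1.0),
}


def score(answers: dict[str, float],
          weights: dict[str, QuestionWeight]) -> tuple[float, float]:
    """Return (benefit, risk) as weighted sums over the answered questions.

    Assumption: each answer is a number in [0, 1]. An answer to a question
    missing from `weights` raises KeyError. Risk is floored at zero so that
    safeguards cannot push it below 'no risk'.
    """
    benefit = 0.0
    risk = 0.0
    for question_id, answer in answers.items():
        weight = weights[question_id]
        benefit += weight.benefit * answer
        risk += weight.risk * answer
    return benefit, max(risk, 0.0)
```

Because the weights live in a plain mapping, a domain-specific version could override individual entries, for example `{**DEFAULT_WEIGHTS, "environmental_benefit": QuestionWeight(benefit=3.0)}`, without touching the scoring function.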
After the pilot phase, we conducted a second feedback round with experts in data ethics, including professionals from philosophy, political science, science and technology studies (STS), sociology, law, and medicine. Based on their input, we improved the tool’s clarity and usability. Technical feedback focused on refining question wording, adding contextual information boxes, and enabling PDF downloads of results – all easily addressed through our Content Management System (CMS). We also manually tested PLUTO with real-life cases to ensure its quantitative scores aligned with qualitative assessments of data use benefits and risks. By iterating question weightings, we ensured the tool accurately reflected varying levels of benefits and risks. Further reviews from experts enhanced the questionnaire’s relevance, clarity, and comprehensiveness.
The content of PLUTO
The PLUTO tool comprises 25 questions organised into four categories. The four categories are:
- Information about the Data User: This section investigates the identity, motivations, and intentions of the data user to ensure that users who tend to use data in ways that create public value (e.g. a climate NGO) are treated differently from actors that do not (e.g. a large advertising company).
- Benefits of Data Use: Within this category, we explore the positive impacts that data use can have and for whom these benefits materialise within and across societies.
- Risks of Data Use: PLUTO takes a detailed look at potential challenges and pitfalls associated with data usage, and for whom these risks are highest within and across societies.
- Institutional Safeguards: We also examine the measures data users put in place to minimise risks, maximise benefits, and monitor and react to potential emergent harms.
For an in-depth exploration of each category and access to the complete set of questions, please refer to the full questionnaire in the appendix on the PLUTO website. Below we outline our reasoning behind the inclusion of each category along with a number of example questions.
Information about the data user
In this category, the fundamental inquiry revolves around the identity and intentions of data users. The guiding logic underscores the varying capacities and resources of entities engaging in data use, recognising that small NGOs, with their limited resources, may have distinct objectives compared to large corporations. For example, small NGOs may primarily analyse data to improve police accountability (see e.g. Grillo, 2021; Invisible Institute, n.d.) whereas large corporations often harness data to enhance operational efficiency. Given the discrepancy in resources, it is unrealistic to expect identical outcomes in terms of maximising the benefits of data use, particularly concerning underrepresented groups.
The questions in this category are designed to elucidate key aspects of data users, including their nature (e.g. individual researchers, private companies, small NGOs, etc.), their intended data use (e.g. basic research, product development, etc.), their financial structure (reliance on numerous small backers, one significant donor, etc.), and their track record in acting transparently. These inquiries collectively contribute to a nuanced understanding of data users and the potential impact of their data-driven endeavours, especially considering resource disparities.
Benefits of the data use
Our conception of public value relates to the weighing of benefits and risks. What constitutes a benefit depends on two primary considerations. Firstly, we place importance on identifying the beneficiaries. Our objective is to acknowledge and reward data users with a proven track record of ensuring that the benefits of data use extend beyond high-income countries. This aims to address the current bias, where many data applications disproportionately favour wealthy nations. Specifically, our focus is on groups from low- and middle-income countries and individuals protected under anti-discrimination laws. We are still determining which specific anti-discrimination laws and jurisdictions apply, and how conflicts between different requirements ought to be resolved.
The second aspect in assessing benefits pertains to sustainability. Given the global impact of climate change on ecosystems and lives, the developers of this tool consider data uses that contribute positively to environmental conservation and the well-being of future generations as beneficial and therefore increasing public value. This underscores the broader societal implications of data use beyond immediate gains. Our assessment questions are carefully designed to delve into the track records of data users. By examining past actions and impacts, we aim to move beyond subjective self-reporting, ensuring a more objective evaluation. Moreover, we have adjusted the weighting of criteria to ensure fairness, particularly for new organisations that may not have a substantial track record. This approach prevents undue penalties for smaller or younger entities. In summary, our assessment framework for public value in data use seeks to address disparities in data benefits. We focus on equitable distribution to marginalised and underserved populations and environmental sustainability. Through rigorous examination and judicious weighting, our approach seeks a nuanced and fair evaluation of data users’ contributions to the broader public good.
Risks of the data use
In assessing the risks associated with data use, we focus on who carries how much and which type of risk. One crucial aspect is evaluating whether the risks disproportionately affect individuals or groups who are already vulnerable, marginalised, underserved, or susceptible to multiple risks. If such disparities exist, it is considered a significant drawback and is penalised in the assessment process. Ensuring equity and fairness in the distribution of benefits and risks is a key priority. Environmental considerations are integrated into the risk assessment. The potential negative impacts on the environment, as well as consequences for future generations, are considered. Sustainable and responsible data use is emphasised to mitigate these risks. Open communication of the risks associated with data use is a central element. Data users are expected to be transparent about the potential risks, allowing stakeholders and the public to make informed decisions and take appropriate precautions. The risk of ‘dual use’ was included in the assessment and later removed, for two reasons. First, most people filling in the tool are not familiar with the concept of dual use, which seriously limits the usefulness of including this question. Second, even for those users who are familiar with the concept, the substantive criteria that need to be met for an instance of data use to be considered dual use vary between jurisdictions. Instead of asking about dual use in these terms, we included a question asking about steps being taken ‘to limit the likelihood that the data use creates negative, unintended consequences’. This wording materially captures dual use.
By incorporating these questions into the public value assessment, we examine not only whether data use is beneficial to people and communities (rather than merely creating financial profits) but also whether it is responsible and equitable, and what kind of environmental and social impact it is likely to have. Balancing the risks and benefits of data usage is essential for making informed decisions and promoting ethical data practices.
Institutional safeguards
Institutional safeguards are crucial for mitigating the risks of data use. In PLUTO, we assess whether data users have comprehensive risk assessments in place to address issues such as data security, privacy, and misuse. PLUTO rewards risk assessments with binding outcomes, because these ensure that recommendations are taken seriously. We also evaluate stakeholder diversity in the process to incorporate multiple perspectives. Additionally, we check for mechanisms to monitor emergent harms and the ability to respond swiftly to minimise their impact. In summary, institutional safeguards ensure responsible data use by focusing on risk assessments, binding outcomes, stakeholder involvement, proactive monitoring, and responsiveness to new risks.
Following the development and refinement of PLUTO, we are now testing its value for different types of users through a series of workshops and pilot implementations. These initiatives involve collaborating with diverse stakeholders across various sectors to assess the tool’s usability in real-world scenarios, identify any remaining ambiguities or practical challenges, and gather feedback on its efficacy. Our initial focus is on the health data sphere, recognising both the significant potential benefits and risks associated with health data use and the existing support for data solidarity principles in this area (e.g. DTH Lab, 2023). However, we are also actively exploring PLUTO’s application in other domains, including the legal field and the development sector. This broadened scope reflects the flexible and adaptable nature of the tool, designed to accommodate diverse values and priorities. The positive feedback received during the tool’s development, along with the versatility afforded by its modular design and adjustable scoring system, gives us confidence that PLUTO can be successfully implemented across various sectors. These diverse engagements will inform further refinements to PLUTO and contribute to the development of context-specific versions tailored to the unique needs and priorities of different sectors.
Limitations of PLUTO and data solidarity
While Data Solidarity and the PLUTO tool offer valuable insights and opportunities for enhanced data governance, they also have limitations. These limitations stem from the complexity of the data landscape, the evolving nature of the concept of public value, and the normative choices made in their development.
One of the primary limitations of the PLUTO tool is the reductionism inherent in any tool that transforms complex qualitative assessments into quantified inputs. The tool may not fully capture the intricate nuances and multifaceted aspects of real-world data uses, which are often influenced by factors beyond the four categories prioritised in PLUTO. We acknowledge that a tool focusing on four categories cannot encapsulate the full complexity of data use. At the same time, we believe this focused approach provides a valuable starting point for assessing public value. For instance, an argument could be made that a large company using data for targeted advertising creates public value if personalised ads lead people to consume more sustainable products. PLUTO’s categories may not adequately account for such complex and context-specific scenarios. This highlights the need for ongoing refinement and adaptation of the tool.
There is a great diversity of data uses in contemporary societies, and the range of applications and their implications is constantly expanding. The effects of data use are often influenced by factors that extend well beyond the categories employed by PLUTO. As a result, the tool may struggle to comprehensively evaluate data uses outside the predefined categories. While PLUTO does not encompass every possible data use context, its current categories represent a much-needed foundational framework. Rather than attempting to capture every single use case immediately, we have prioritised creating a flexible tool that can adapt to these changes, and we anticipate that future iterations will address a broader range of applications. This is an area where further development and user feedback will be crucial. Moreover, the tool’s nature as a prototype that can be adjusted and amended to suit changing data practices, or the particular concerns and needs of specific organisations or communities (e.g. an organisation working on the green transition could place specific emphasis on climate-related and other environmental impacts of data use), mitigates this concern.
To put the concept of data solidarity into practice, it is essential to attach numerical or relative values to different data uses. While there are inherent challenges in quantifying something as complex as public value, this is a common challenge within the field of data governance and one that PLUTO, in its current form, does not fully resolve. However, we believe that the structured approach offered by PLUTO provides a more consistent basis for evaluating public value than relying solely on qualitative judgements. Furthermore, answers to the questions in the tool can be influenced by subjective judgements, which may result in disputes over the assigned scores. We acknowledge this subjectivity and are exploring ways to enhance objectivity and transparency in future versions.
Public value is a dynamic and evolving concept, which means its interpretation may vary across different contexts, stakeholders, and time periods. PLUTO’s normative choices in defining public value may not align with every perspective, and disagreements over the very meaning of public value persist. The evolving nature of this concept introduces a level of uncertainty into the tool’s assessments. Public value, as a concept, is not static; it evolves in response to societal changes, technological advancements, and shifting values. As a result, PLUTO will need to adapt its criteria and definitions over time to remain relevant. Also for this reason we decided to keep PLUTO at a general level (i.e. the questions asked are applicable to different types of uses in different contexts). This widely applicable version of PLUTO can hence serve as a basis for context-specific versions of the tool that reflect contemporary or particular understandings of public value.
The question of how PLUTO can be institutionalised and whether its use can be made mandatory raises further complexities. Implementing PLUTO as a mandatory framework may require changes in existing laws and regulations, and this process could be met with resistance or difficulties in enforcement. The potential legal implications and the need for institutional support are important aspects to consider when contemplating the widespread adoption of PLUTO. We believe that public value assessments with a tool based on the PLUTO concept have a place in various fields of data use (e.g. public bodies could decide to prioritise data access requests by applicants whose suggested data use is likely to generate a lot of public value; regulators could decide not to grant licences to data users who propose data use with negative public value, etc.). While we advocate for regulators and policy makers to pay more attention to the public value of data use, we continue to improve the PLUTO prototype. The data gathered from real-world applications will inform further refinements, ensuring the tool’s effectiveness and reliability before widespread implementation. PLUTO’s flexible design allows adaptation to diverse contexts and priorities, while its focus on transparency and stakeholder engagement fosters a more inclusive decision-making process. These features, we believe, make PLUTO a valuable tool for navigating the complex data landscape.
To operationalise data solidarity, it is essential to define and distribute responsibilities clearly. Policy interventions, such as implementing data taxes or establishing Harm Mitigation Bodies, require action from policymakers at national and supranational levels. Conversely, the practical implementation and daily operation of harm mitigation mechanisms is the responsibility of the data-using organisations themselves. Collaborative partnerships between public bodies (e.g. Findata), research institutions, and nonprofit organisations are often the most effective way to facilitate beneficial data uses. The development and application of assessment tools like PLUTO, which are crucial for evaluating public value, can be spearheaded by academic or civil society groups, potentially with public funding support. This distribution of labour aligns responsibilities with the relevant expertise and capacities of diverse actors, fostering a collaborative approach to realising data solidarity.
PLUTO is envisioned as an evolving tool. We recognise its current limitations but believe that releasing it in its current form, and actively soliciting feedback, will enable faster development and refinement compared to waiting for a ‘perfect’ version. Domain-specific versions of the tool may have different weightings and adapt to unique contexts and interpretations of public value. This iterative approach to development allows us to respond to user needs and evolving understandings of public value. The tool’s creators anticipate ongoing refinement and expect the broader research community to contribute to its development, thereby addressing some of the challenges identified. We are actively seeking input and collaboration to enhance the tool’s functionality and address the identified limitations. In response to these challenges, several measures have been taken to enhance the transparency and adaptability of PLUTO. On the publicly available website where the tool is hosted, an appendix is provided for each new iteration of the tool, offering detailed explanations of the weightings assigned to various factors; this promotes transparency and allows the weightings to be contested. Additionally, the content management system (CMS) feature enables authorised people to adjust weights, questions, answers, and additional information easily, even if they have only limited computer or programming knowledge. It is also important to treat PLUTO as an aid in making a structured assessment of, and supporting a structured debate about, what public value is, and not as a tick-box exercise yielding an infallible outcome. Moreover, the use of PLUTO, and the results that it yields, should be seen in the context of the values that underpin the data solidarity approach.
In summary, while data solidarity and the PLUTO tool hold promise for advancing data governance, they are not without limitations. Their reductionist nature, challenges in quantifying data value, evolving interpretations of public value, and potential legal and institutional hurdles all require careful consideration. As data governance continues to evolve, addressing these limitations is essential to develop a more comprehensive and adaptable framework for evaluating the public value of data uses. To see how PLUTO could apply in a real-world scenario, we return to the example of Google’s Sidewalk Labs project mentioned earlier. This ‘smart city’ project, intended to use comprehensive data collection to improve urban infrastructure and services, raised significant concerns about privacy, data governance, and public accountability despite its potential for public value. By using PLUTO’s structured approach – which assesses data user information, benefit distribution, risk evaluation, and necessary safeguards – stakeholders could systematically evaluate the advantages and limitations of such data practices. PLUTO, for instance, would allow stakeholders to clarify how benefits from data collection might be equitably distributed across the population and highlight high-risk practices, such as continuous geolocation tracking, that would benefit from mitigative measures. We aim to expand on this with comprehensive case studies in future work.
Conclusion and outlook
In this paper, we have highlighted the inadequacy of existing data governance regimes, which predominantly rely on categorising data by type. These conventional frameworks have become increasingly obsolete in the era of digital data proliferation and the potential for data linkage. Such developments have rendered the traditional boundaries used to segregate different data types brittle, prompting the need to adopt a more context-driven approach.
Our proposed solution, data solidarity, represents a significant paradigm shift. It emphasises the significance of data uses over data types in determining the appropriate policy responses. In essence, our framework advocates that data uses generating high public value, characterised by high benefits and low risks, should be actively facilitated. Conversely, those data uses involving high risks and high benefits should have their risks mitigated to maximise public value. Data uses with high risks and low benefits must be prohibited, while those with low risks and low benefits should be subject to taxation through financial levies or benefit-sharing agreements.
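The four-way mapping from benefits and risks to policy responses described above can be sketched in code. This is only an illustration under stated assumptions: the names `PolicyResponse` and `policy_response` are hypothetical, and treating benefit and risk as binary high/low inputs is a simplification introduced here. It does not reproduce PLUTO's questions, weightings, or scoring.

```python
from enum import Enum


class PolicyResponse(Enum):
    """Hypothetical labels for the four policy responses named in the text."""
    FACILITATE = "actively facilitate"
    MITIGATE = "mitigate risks to maximise public value"
    PROHIBIT = "prohibit"
    TAX = "tax via financial levies or benefit-sharing agreements"


def policy_response(high_benefit: bool, high_risk: bool) -> PolicyResponse:
    """Map a (benefit, risk) classification to a policy response.

    Assumption: benefit and risk have already been classified as high or
    low by some prior assessment; how that is done is out of scope here.
    """
    if high_benefit and not high_risk:
        return PolicyResponse.FACILITATE   # high benefits, low risks
    if high_benefit and high_risk:
        return PolicyResponse.MITIGATE     # high benefits, high risks
    if high_risk:
        return PolicyResponse.PROHIBIT     # low benefits, high risks
    return PolicyResponse.TAX              # low benefits, low risks


if __name__ == "__main__":
    for benefit in (True, False):
        for risk in (True, False):
            response = policy_response(benefit, risk)
            print(f"high_benefit={benefit}, high_risk={risk}: {response.value}")
```

In practice, the distinction between high and low is a matter of degree and of contestable judgement, so a binary classification like this one would at most be the final step after a structured assessment.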
At the core of our approach lies the notion of public value, which is instrumental in guiding our actions. Public value considerations, in our view, encompass both environmental sustainability and the well-being of future generations. It is essential to recognise that the concept of public value transcends short-term gains and encompasses the long-term welfare of society and the planet. Realising data solidarity requires being able to assess the public value of data use. For this reason, PLUTO, a tool that can be employed by individuals and organisations to evaluate the public value of specific instances of data use, was developed. We are keen on providing insight into the process of developing this tool, with the aim of enabling others to either replicate our approach or learn from our experiences and challenges. We understand that the concept of data solidarity and the PLUTO tool may have limitations. Collaborating with diverse stakeholders can help us identify and address these limitations effectively, ensuring that our approach remains robust and adaptable to changing circumstances.
Looking ahead, we have outlined three key plans for the future. First, we are committed to refining the PLUTO prototype to make it as user-friendly and transparent as possible. This entails fine-tuning the weightings and response mechanisms to ensure that the tool can be easily understood and applied by a broad range of users. Second, it is key to consider versions of PLUTO that are tailored to the needs of specific communities, institutions, or use contexts. We aim to identify specific domains in which specialised versions of PLUTO can be most effectively employed. PLUTO will first be tested and refined in the health domain, for two primary reasons. Firstly, the data solidarity approach has found support within the global health community (see e.g. DTH Lab, 2023). Secondly, there are concerns about the inequitable use of health data and the high potential benefits and risks associated with its use. PLUTO aims to address these concerns, collaborating with experts and stakeholders to ensure responsible and ethical deployment in the realm of healthcare. Relatedly, we are looking to explore opportunities to develop PLUTO certificates (similar to Secure Sockets Layer certificates) that indicate the public value of a number of standard data uses. Additionally, we will explore the idea of assigning custodial responsibilities for PLUTO in these sectors and support organisations in its development. We are also committed to maintaining control over the basic PLUTO tool while ensuring that the tool is not misappropriated. Lastly, it is important to study the legal changes necessary to institutionalise data solidarity. We acknowledge the need for legal changes to enhance the impact of data solidarity on current data governance approaches. This includes advocating for legal reforms and normative frameworks that align with the principles of data solidarity and support its implementation in various contexts.
In conclusion, our paper underscores the significance of data solidarity in the evolving landscape of data governance. It emphasises a shift from data types to data uses, with a keen focus on public value and sustainability. The development of the PLUTO tool represents a critical step toward practical implementation, and we are committed to refining and expanding its utility. As we move forward, we invite collaboration, feedback, and the collective effort of diverse communities to advance the cause of data solidarity and its vital role in shaping a more equitable and sustainable data-driven future.
Transparency
While one of the authors is currently employed in industry, the development of the PLUTO tool and the insights presented in this paper were strictly within the academic context.
Disclosure statement
No potential conflict of interest was reported by the author(s).