AI will Transform Cities. Can Cities Transform AI?

Jinhua Zhao, J. Phillip Thompson, Ross Gittell, Michael Leong, and Kevin F. Hsu

2. How do cities articulate “good” (and “bad”) use cases of AI?

This section focuses on how cities determine where and when it is appropriate to adopt AI in the public sector. In an environment filled with rapid innovation and vendor-driven enthusiasm, cities must be able to “cut through the noise” and avoid pursuing solutions in search of problems. They need a clear and disciplined way to distinguish between use cases that advance public goals, and those that introduce unnecessary risk or complexity. Clear problem definitions and expectations also benefit the private sector, which will develop many of these applications but does not face the same legal, political, and reputational risks as governments when systems fail or cause harm.

To support more deliberate decision-making, we propose a structured, four-step approach that helps governments identify genuine needs, match technology to purpose, assess risks, and ensure alignment with broader civic priorities. These steps are:

Four-step approach for governments to identify AI use cases

  • Step I: Identify the real problems facing public agencies that AI could help solve

  • Step II: Determine the appropriate AI function and human-AI persona to solve the problem

  • Step III: Evaluate the risks associated with the city deploying an AI tool

  • Step IV: Determine if the AI tool advances the city’s broader vision of “success”

These steps are intentionally structured to begin with whether there is a core municipal problem that AI can meaningfully address (Step I). Next, they assess whether a technically feasible solution exists (Step II), whether that solution presents an acceptable level of risk (Step III), and finally whether it aligns with the city’s broader “north star” for AI (Step IV).

This sequencing matters because it distinguishes between empirical questions and normative judgments. The first two steps focus primarily on facts: the existence of a real problem and the technical feasibility of a solution. The latter steps introduce value-based considerations, including risk tolerance, public accountability, and alignment with civic priorities. By separating these layers, cities can make decisions that are both analytically grounded and democratically accountable.
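The sequencing described above can be sketched as a simple gated check. This is a minimal illustration, not part of the framework itself: the `UseCase` record, its field names, and the `first_failed_step` helper are hypothetical, and each boolean stands in for a judgment the city would reach through its own process.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class UseCase:
    """Hypothetical record of a city's answers for one proposed AI use case."""
    name: str
    addresses_real_problem: bool    # Step I (empirical question)
    feasible_solution_exists: bool  # Step II (empirical question)
    risk_is_acceptable: bool        # Step III (normative judgment)
    advances_city_vision: bool      # Step IV (normative judgment)


# The four steps, in the order the section gives them.
STEPS: List[Tuple[str, Callable[[UseCase], bool]]] = [
    ("Step I", lambda u: u.addresses_real_problem),
    ("Step II", lambda u: u.feasible_solution_exists),
    ("Step III", lambda u: u.risk_is_acceptable),
    ("Step IV", lambda u: u.advances_city_vision),
]


def first_failed_step(use_case: UseCase) -> Optional[str]:
    """Return the label of the first step the use case fails, or None if it passes all four."""
    for label, check in STEPS:
        if not check(use_case):
            return label
    return None
```

Evaluating the steps in order means a proposal that fails an empirical question (Steps I and II) is set aside before any value-based debate (Steps III and IV) begins.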

Step I: Identify the real problems facing public agencies that AI could help solve

Figure 3 presents a typology of information and data-intensive tasks common to public agencies. These are functions that depend on collecting, processing, and acting on large volumes of structured and unstructured data, such as reviewing applications, allocating resources, forecasting demand, routing services, monitoring compliance, and generating reports. Many of these processes are repetitive and constrained by staff capacity, yet require pattern recognition, judgment, and coordination across datasets. Because AI systems are designed to analyze patterns, generate predictions, classify information, and synthesize complex inputs, such tasks are natural candidates for AI augmentation.

The agency can be conceptualized as comprising a front-end (public-facing interactions) and a back-end (employees, internal workflows, and assets). Data from both sides feed into two broad classes of decision-making: long-term planning and short-term operations. Planning often involves forecasting and resource allocation, while operations require timely, case-level decisions. Outputs from these decisions generate new data that flow back through the agency, creating an iterative feedback loop. AI can potentially improve performance at multiple points in this cycle.

[Figure 3: Typology of information and data-intensive tasks common to public agencies]

While this framework is general, its underlying processes can be applied across a wide range of public agencies. For instance, a sanitation department could use AI to support real-time route navigation, extract insights from waste-sorting data, optimize collection schedules, and assist in training new drivers. Similarly, a public transit agency could deploy AI to recommend real-time service interventions based on crowding conditions, plan routes using observed demand patterns, and improve reliability through predictive maintenance informed by vehicle condition monitoring. In the education sector, AI could be used to analyze student responses to instruction, adapt curricula over time in response to evolving contexts, and provide personalized, interactive feedback to support student learning.

 

The table below summarizes some of these examples:

[Table: Summary of example AI use cases across public agencies]

Step II: Determine the appropriate AI function and human-AI persona to solve the problem

You may have noticed that in Step I, each task had a specific verb that defined the kind of AI that could help with it: sense, predict, assist, control, or train. By now, it should be evident that AI is not a single tool, but a broad set of techniques that perform different kinds of tasks. The next framework, in Figure 4, allows governments and agencies to identify the right kind of AI implementation for the problem they have articulated, based on two factors: the AI function and the human-AI persona.

 

Down the rows, AI function refers to the core technical purpose that the underlying AI technology is used for. At the simplest level, some systems are designed to describe data or recognize patterns, such as summarizing text, classifying images, or detecting anomalies. Others are built to predict future outcomes, such as forecasting demand, estimating risk, or predicting the likelihood of an event. More advanced systems may be used to guide or control actions, such as optimizing traffic signals, allocating resources in real time, or automating routine workflows. Finally, generative AI systems produce new content, including text, images, code, or synthetic data, based on learned patterns.

Across the columns, human-AI persona refers to how the AI technology is used in relation to human oversight. In a Sensor role, AI compiles, filters, or summarizes information, while humans retain full responsibility for interpretation and decision-making. In an Assistant role, AI provides recommendations or draft outputs that may influence human reasoning, but humans generally maintain the full capacity to make decisions. In contrast, in an Executor persona, AI systems make decisions autonomously as subordinates or peers of humans. In the Expert persona, the AI system is thought to be more knowledgeable than an average human user in a specific domain; the AI tool is used as a consultant or instructor by a human user charged with the task.

[Figure 4: AI functions (rows) by human-AI personas (columns)]

Table 2 shows which of these functions and personas are best suited to the use cases identified in Step I, using trash collection as the example. In theory, there could be multiple appropriate combinations for each use case, depending on the exact problem definition. For example, an AI application that monitors asset conditions (such as trucks and trash facilities) would primarily perform a descriptive function in a Sensor persona, compiling and reporting information to human operators. If that same system analyzes historical performance data to anticipate breakdowns and recommend maintenance schedules, it shifts to a predictive function in an Assistant persona. If maintenance decisions are automated within certain predefined rules, the application could move further into a control function in an Executor persona.

[Table 2: AI functions and human-AI personas suited to the trash collection use cases]
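The two factors can be written down as a small data structure, which makes the asset-monitoring progression described above explicit. This is a sketch under stated assumptions: the enum and class names are hypothetical, and the three entries simply restate the example from the text rather than reproduce Table 2.

```python
from dataclasses import dataclass
from enum import Enum


class AIFunction(Enum):
    """The four AI functions described in this section."""
    DESCRIBE = "describe"
    PREDICT = "predict"
    CONTROL = "control"
    GENERATE = "generate"


class Persona(Enum):
    """The four human-AI personas described in this section."""
    SENSOR = "sensor"
    ASSISTANT = "assistant"
    EXECUTOR = "executor"
    EXPERT = "expert"


@dataclass(frozen=True)
class Implementation:
    """One function/persona combination for a use case (hypothetical type)."""
    use_case: str
    function: AIFunction
    persona: Persona

    def ai_decides(self) -> bool:
        """True when the AI itself makes the decision (the Executor persona)."""
        return self.persona is Persona.EXECUTOR


# The asset-monitoring example, restated as three alternative implementations.
ASSET_MONITORING_VARIANTS = [
    Implementation("report truck and facility conditions",
                   AIFunction.DESCRIBE, Persona.SENSOR),
    Implementation("anticipate breakdowns and recommend maintenance",
                   AIFunction.PREDICT, Persona.ASSISTANT),
    Implementation("schedule maintenance within predefined rules",
                   AIFunction.CONTROL, Persona.EXECUTOR),
]
```

Recording the combination alongside the use case keeps the level of autonomy visible at procurement time, rather than leaving it implicit in a vendor's product description.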

These distinctions matter because governments must be clear about the specific problem they are trying to solve before selecting or procuring an AI system. The appropriate combination depends not only on technical capability, but also on acceptable levels of autonomy and oversight. In the next section, we introduce risk as a structured way to help governments determine which combinations are appropriate for different public-sector contexts.

Step III: Evaluate the risks associated with the city deploying an AI tool

As we mentioned in the previous step, multiple combinations of AI technologies and human-AI personas could be used to solve a single problem. By evaluating the risks associated with deployment, governments can move from identifying what is a possible application of AI to determining what is a responsible application in a public-sector context.

In this framework, we define seven categories of risk that arise across three broad areas: how the problem is defined, the likelihood of mistakes, and the consequences of those mistakes. Together, these dimensions help cities calibrate oversight and safeguards to the specific stakes of each use case.

Defining the Problem

 

1. The scope of implementation. Is AI being used to perform a narrowly defined task with a limited and well-specified set of outcomes, or to support open-ended functions that require human-like synthesis, judgment, or use of general knowledge? AI applications with broader implementation scopes, particularly those involving unconstrained generation, carry higher risks.

2. The technical AI function. Is AI being used in the more classical sense to describe and predict patterns seen in data, or is it being used to generate completely new content, recommendations, or actions that do not directly correspond to observed inputs? Generative uses can be more powerful, but introduce greater risk and uncertainty due to less traceability and specifiability.

3. The human-AI persona. Is AI being used to provide information to humans who retain full control over decision-making, or is AI making decisions or even telling humans what decisions to make? Human-AI personas that automate decision-making without human accountability carry greater risks. In some cases, adding a ‘human in the loop’ to monitor AI decisions can reduce this risk.

 

Risks of Mistakes

4. Accuracy of AI results. Are the outputs from the AI application consistent with the ground truth, and/or with the expected performance of a human performing the same task? For discriminative AI, this is usually reported as a percentage accuracy score, but it is more difficult to assess for generative AI. AI applications that perform less accurately than human counterparts on comparable tasks run higher risks of degrading overall system performance.

 

5. Potential for biased results. Does the AI application perform systematically better under some conditions than others, such as across demographic groups, geographic settings, or environmental contexts? Even when an AI system matches or exceeds human performance in aggregate, differences in the distribution and pattern of errors can produce biased outcomes. Such bias often arises unintentionally from disparities in the data used to train AI models. AI applications that exhibit larger performance disparities pose higher risks of inequitable impacts.
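The difference between aggregate accuracy (item 4) and the distribution of errors (item 5) can be illustrated with a short calculation. This is a minimal sketch: the function names are hypothetical, the group labels and evaluation records are made up for illustration, and a real audit would use larger samples and measures suited to the task.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute accuracy per group from (group label, output was correct) pairs."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, is_correct in records:
        total[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group: Dict[str, float]) -> float:
    """Difference between the best-served and worst-served group."""
    return max(per_group.values()) - min(per_group.values())


# Made-up evaluation records: 19 of 20 correct in one district, 14 of 20 in another.
records = (
    [("district_a", True)] * 19 + [("district_a", False)] * 1
    + [("district_b", True)] * 14 + [("district_b", False)] * 6
)

per_group = accuracy_by_group(records)
overall = sum(ok for _, ok in records) / len(records)
gap = accuracy_gap(per_group)
```

In this made-up data the overall accuracy looks reasonable, yet one district is served noticeably worse than the other, which is the pattern item 5 asks cities to look for.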

Consequences of Mistakes

 

6. Severity of mistakes. If an AI system makes an error, what is the potential magnitude of harm? Are the consequences minor and easily tolerated, or could the outcome negatively affect one’s access to life opportunities? AI applications in the public sector warrant heightened scrutiny in this respect compared to those in the private sector, due to the higher stakes of decisions that individuals cannot easily avoid. AI applications with higher severity of mistakes pose greater risks.

 

7. Reversibility of mistakes. If an AI system produces an incorrect outcome, how easily can the error be detected and corrected? Is the mistake an isolated outcome, or does it propagate through many downstream decisions and systems? AI applications with less reversible and more path-dependent outcomes pose higher risks, even if errors are rare or modest in severity.

Figure 5 summarizes this seven-point risk assessment framework, which characterizes the different sources of risk that governments may encounter when deploying AI systems.

[Figure 5: Seven-point risk assessment framework]

While each of the seven dimensions introduces potential risk, not all risks are inherently disqualifying. Research from other high-stakes sectors, such as aviation and banking, suggests that systems rarely fail because of a single weakness. Rather, major failures occur when multiple vulnerabilities align and allow risk to compound. For this reason, risk should be assessed cumulatively rather than in isolation.

In this framework, a proposed AI system may be deemed too risky if it presents high risk across several dimensions simultaneously. By contrast, systems with one or two elevated risk factors may be appropriate if sufficient mitigation measures and oversight capacity are in place. For example, an AI system used to optimize internal vehicle scheduling may tolerate higher technical uncertainty if its outputs are reversible and do not affect individual rights. In contrast, an AI system that determines eligibility for public benefits requires far stricter safeguards, given the combination of automated authority and potentially severe consequences of error. Table 3 outlines possible risk mitigation measures:

[Table 3: Possible risk mitigation measures]
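The cumulative assessment described above can be sketched as a count over the seven dimensions. This is an illustration under explicit assumptions: the three rating levels, the dimension keys, the `triage` function, and above all the threshold of three high-risk dimensions are hypothetical, since the framework says "several" without fixing a number.

```python
from enum import IntEnum
from typing import Dict


class Level(IntEnum):
    """Hypothetical three-level rating for a single risk dimension."""
    LOW = 0
    ELEVATED = 1
    HIGH = 2


# Short keys for the seven risk dimensions listed in this step.
DIMENSIONS = (
    "scope", "function", "persona",  # defining the problem
    "accuracy", "bias",              # risks of mistakes
    "severity", "reversibility",     # consequences of mistakes
)

# ASSUMPTION: "several" is read here as three or more; a city would set its own limit.
HIGH_RISK_LIMIT = 3


def triage(ratings: Dict[str, Level]) -> str:
    """Assess risk cumulatively rather than one dimension at a time."""
    missing = set(DIMENSIONS) - set(ratings)
    if missing:
        raise ValueError(f"unrated dimensions: {sorted(missing)}")
    high = sum(1 for d in DIMENSIONS if ratings[d] == Level.HIGH)
    raised = sum(1 for d in DIMENSIONS if ratings[d] >= Level.ELEVATED)
    if high >= HIGH_RISK_LIMIT:
        return "too risky as proposed"
    if raised > 0:
        return "proceed only with mitigation and oversight"
    return "proceed with routine review"


# Illustrative ratings for internal vehicle scheduling: some technical
# uncertainty, but reversible outputs and no effect on individual rights.
baseline = {d: Level.LOW for d in DIMENSIONS}
vehicle_scheduling = {**baseline, "accuracy": Level.ELEVATED}
decision = triage(vehicle_scheduling)
```

Requiring a rating for every dimension before returning a result reflects the point that risks compound: an unrated dimension is treated as an error rather than silently assumed to be low.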

Step IV: Determine if the AI tool advances the city’s broader vision of “success”

Now that we have enumerated and evaluated the risks associated with an AI application, the final step is to determine whether its deployment advances the broader economic and social objectives the city seeks to achieve, by returning to the city’s original definition of “success” for AI. This is where risk assessment is translated into normative judgment, which may look different for different cities depending on the goals set in the rational interests of their own constituents. Continuing the example developed in the previous section, these are the final questions a government would ask:

1. Does it improve efficiency and productivity? This prospect motivates many potential AI applications, but it needs to be reassessed once the entire infrastructure behind the AI application is accounted for, including the software, hardware, data pipelines, and oversight processes needed to adequately mitigate the risks of the technology. In some cases, if the additional complexity introduced by these requirements offsets the anticipated efficiency gains, then AI may not be the correct tool.

2. Does it protect privacy and preserve human agency? While many AI applications rely on expanded data collection and automated decision-making, these systems must be evaluated in light of their implications for individual privacy, autonomy, and the ability of humans to understand, contest, and override automated outcomes. If constituents are likely to take issue with how AI uses their data or how AI makes decisions, then AI may not be the correct tool.

 

3. Does it advance community interests and democratic governance? Beyond individual outcomes, does the AI application contribute overall to broader societal effects such as improving human welfare, narrowing the wealth gap, and enhancing public accountability? If AI applications alter power dynamics in ways that reduce transparency, public participation, or the ability of residents and local governments to ensure that decision-making reflects local values (rather than the priorities or assumptions of external developers), then AI may not be the correct tool.

If any of these three questions is not satisfied, then even though the city has identified a real municipal problem (Step I), a technically feasible AI solution (Step II), and a solution of acceptable risk (Step III), AI is still not the right solution for the problem, because it does not align with the city’s ‘north star’ for AI. This final evaluative layer distinguishes technological viability from public legitimacy, ensuring that AI deployments meaningfully advance civic values, democratic accountability, and long-term societal benefit.

Applying this evaluative test over four stages may be demanding, particularly for high-stakes use cases that affect rights, livelihoods, or core public services. In such cases, however, the burden of justification should be high, and governments should be prepared to delay or redesign deployments until these criteria are clearly met. For lower-stakes or more routine applications, a lighter application of these tests may be appropriate, provided risks remain low, contained, and reversible.

 

Initiative for Responsible AI

responsibleai.mit.edu

© 2026 by the Massachusetts Institute of Technology
