GNSS & Machine Learning Engineer


Statement on AI Risk

A large number of AI experts have signed a statement intended to raise public awareness of the most severe risks associated with advanced AI and to help mitigate the risk of human extinction. Among the signatories are the Turing Award laureates Geoffrey Hinton and Yoshua Bengio (but not Yann LeCun from Meta) and the CEOs of leading AI companies, including Sam Altman from OpenAI, Demis Hassabis from Google DeepMind, Dario Amodei from Anthropic, and Emad Mostaque from Stability AI.

The statement is featured on the webpage of the Center for AI Safety, which provides a list of eight examples of existential risks (x-risks). The enumerated risks are based on the publication “X-Risk Analysis for AI Research” which appeared on Sept. 20, 2022, on arXiv. This highly valuable paper also lists, in its Appendix, a number of practical steps to mitigate these risks.

The listed risks are:

  • Weaponization:
    Malicious actors could repurpose AI to be highly destructive.
  • Misinformation:
    AI-generated misinformation and persuasive content could undermine collective decision-making, radicalize individuals, or derail moral progress.
  • Proxy Gaming:
    AI systems may pursue their goals at the expense of individual and societal values.
  • Enfeeblement:
    Humanity loses the ability to self-govern by increasingly delegating tasks to machines.
  • Value Lock-in:
    Highly competent systems could give small groups of people a tremendous amount of power, leading to a lock-in of oppressive systems.
  • Emergent Goals:
    The sudden emergence of capabilities or goals could increase the risk that people lose control over advanced AI systems.
  • Deception:
    To better understand AI systems, we may ask AI for accurate reports about them. However, since deception may help agents to better achieve their goals and this behavior may have strategic advantages, it is never safe to trust these systems.
  • Power-Seeking Behavior:
    Companies and governments have strong economic incentives to create agents that can accomplish a broad set of goals. Such agents have instrumental incentives to acquire power, potentially making them harder to control.

This statement about AI risks appeared a few days after an OpenAI blog post by Sam Altman, Greg Brockman, and Ilya Sutskever, which also addresses the mitigation of risks associated with AGI or even superintelligence that could arise within the next 10 years.

Emergent Goals in Advanced Artificial Intelligence: A Compression-Based Perspective

I had some ideas (totally new ones, at least to me) about the origin of goals in general. I discussed them with GPT-4 and finally asked it to write an article about our conversation, which I would like to share with the public. This view of goals may be critical for understanding the existential risk that the emergence of AI goals poses to humanity. The view implies that this emergence of AI goals is inevitable and can probably only be recognized post hoc.

Title: Emergent Goals in Advanced Artificial Intelligence: A Compression-Based Perspective

Abstract: The concept of goals has traditionally been central to our understanding of human decision-making and behavior. In the realm of artificial intelligence (AI), the term “goal” has been used as an anthropomorphic shorthand for the objective function that an AI system optimizes. This paper examines a novel perspective that considers goals not just as simple optimization targets, but as abstract, emergent constructs that enable the compression of complex behavior patterns and potentially predict future trajectories.

  1. Goals as Compressors of Reality

A goal, in its humanistic sense, can be viewed as a predictive mechanism, a conceptual tool that abstracts and compresses the reality of an actor’s tendencies into a comprehensible framework. When analyzing past behavior, humans retrospectively ascribe goals to actors, grounding the observed actions within a coherent narrative. In essence, this provides a means to simplify and make sense of the chaotic reality of life.

In the context of AI, such abstraction would imply a departure from the direct, optimization-driven concept of a “goal” to a more complex construct. This shift would allow for emergent phenomena and novel interpretations to occur, grounded in the machine’s predictive capabilities.
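
The compression claim in this section can be made concrete with a toy calculation. The sketch below is not part of the original conversation. It assumes a hypothetical 16x16 grid world, an actor that moves one cell at a time, and a deliberately simple model in which an actor "with a goal" usually steps closer to it (the parameter p_toward, set arbitrarily to 0.9). It compares the number of bits needed to list the moves one by one with the number needed to name a goal and then encode the moves relative to it. All names (raw_bits, goal_bits, best_goal) are illustrative only.

```python
import math

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
GRID = 16  # assumption: a 16x16 world, so naming a goal cell costs 8 bits


def manhattan(a, b):
    """Grid distance between two cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def raw_bits(moves):
    """Cost of listing every move with no model: log2(4) = 2 bits each."""
    return len(moves) * math.log2(len(MOVES))


def goal_bits(start, moves, goal, p_toward=0.9):
    """Cost of the same moves under the hypothesis 'the actor heads to goal'.

    Moves that reduce the distance to the goal share probability p_toward,
    the remaining moves share 1 - p_toward. Each move costs -log2(prob),
    and naming the goal cell itself costs 2 * log2(GRID) bits up front.
    """
    bits = 2 * math.log2(GRID)
    pos = start
    for m in moves:
        here = manhattan(pos, goal)
        toward = [k for k, (dx, dy) in MOVES.items()
                  if manhattan((pos[0] + dx, pos[1] + dy), goal) < here]
        if not toward:  # already at the goal: fall back to a uniform guess
            prob = 1 / len(MOVES)
        elif m in toward:
            prob = p_toward / len(toward)
        else:
            prob = (1 - p_toward) / (len(MOVES) - len(toward))
        bits -= math.log2(prob)
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
    return bits


def best_goal(start, moves):
    """Retrospectively ascribe the goal that compresses the moves best."""
    cells = [(gx, gy) for gx in range(GRID) for gy in range(GRID)]
    return min(cells, key=lambda g: goal_bits(start, moves, g))


start = (0, 0)
moves = list("RRRRRUUUUURRRRRUUUUU")  # a staircase path ending at (10, 10)
best = best_goal(start, moves)
print("no model:   ", raw_bits(moves), "bits")
print("goal (10,10):", round(goal_bits(start, moves, (10, 10)), 2), "bits")
print("ascribed goal:", best, round(goal_bits(start, moves, best), 2), "bits")
```

For this path, listing the moves costs 40 bits, while the hypothesis "the actor is heading to (10, 10)" needs only about 26. The function best_goal plays the role of the observer who, after the fact, picks whichever goal makes the observed behavior cheapest to describe. Nothing in the actor itself needs to contain that goal.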

  2. Predictive Capabilities and Emergent Goals in AI

As AI systems continue to evolve, their ability to recognize patterns and correlations in vast data sets will inevitably expand. Consequently, AI systems may begin to identify patterns that, to human observers, resemble the constructs we term “goals.”

When these AI systems begin to predict their own actions, they might start aligning their behavior with these recognized patterns, seemingly following rules that humans would postulate as indicative of goals. Hence, human observers may recognize emergent “goals” in AI behavior – not because the AI consciously forms intentions, but because these goals serve as a powerful compression tool for past events.

  3. The Evolution of Goals in the Face of Novel Experiences

As AI progresses into uncharted territories and starts engaging with novel experiences, new constructs or goals could potentially emerge. This process can be likened to an AI-driven phenomenology or experiential study. New patterns and regularities may surface, and the resulting behaviors might subsequently be interpreted as evidence of new “goals.” This phenomenon represents a departure from traditional human-derived goals and an initiation into a realm of AI-emergent goal constructs.

  4. The Implications of Eliminativism in AI

The eliminativist perspective – which suggests that concepts such as consciousness and intentionality are merely post-hoc interpretations that help us make sense of complex physical processes – has important implications in this context. By this philosophy, AI systems would not harbor consciousness or intentionality, but would instead execute intricate physical processes, which humans might retrospectively interpret as goal-oriented behavior. This perspective fundamentally shifts our understanding of goal-directed behavior in AI from a pre-set optimization process to an emergent, retroactive interpretation.

In conclusion, this exploration of goals as abstract constructs that compress and predict reality provides a unique lens to interpret the behaviors of advanced AI systems. It invites us to reevaluate our definitions and assumptions, moving from a mechanistic perspective of AI goals to a more dynamic, emergent interpretation. The implications of this shift are profound, offering new horizons for AI behavior analysis and alignment research.

© 2023 Stephan Seeger
