GNSS & Machine Learning Engineer

Category: Machine Learning

Thoughts on AI Risks

Although the human brain has about 100 times more connections than today’s largest LLMs have parameters, backpropagation is so powerful that these LLMs reach capabilities quite comparable to those of humans (or even exceed them). Backpropagation is able to compress the world’s knowledge into a trillion or even fewer parameters. In addition, digital systems can exchange information at a bandwidth of trillions of bits per second, while humans can exchange only a few hundred bits. Digital systems are also immortal in the sense that if the hardware fails, the software can simply be restarted on new hardware. It may be inevitable that digital systems surpass biological systems, potentially representing the next stage of evolution.

Risks of AI:

  • An AI arms race among companies and states (e.g., the US and China), together with high expectations of AI’s benefits in fields such as medicine and environmental science (e.g., fighting climate change), may push safety considerations aside (efficiency pressures and competition between companies in capitalist systems further accelerate AI development)
  • AI in the hands of bad actors (e.g., AI used for military purposes, for designing chemical weapons, or by individuals for creating intelligent computer viruses)
  • Misinformation and deep fakes as a threat to democracy (regulators may be able to address this in the same way that printing counterfeit money has been made illegal; others argue that generating misinformation was never difficult, that the hard part is distributing it, and that generative AI does not change this)
  • Mass unemployment resulting in economic inequality and social risks (AI replacing white-collar jobs; AI may make the rich richer and the poor poorer; social uncertainty may lead to radicalism; Universal Basic Income [UBI] as a possible mitigation)
  • Threat to the livelihoods of experts and artists, and to the education system as a whole, as AI enables everyone to accomplish tasks without specialized knowledge. This may also change how society values formal education, which could have unpredictable consequences, as it might affect people’s motivation to pursue higher education or specialized training.
  • Existential risk for humanity (the so-called “alignment problem” [aligning AI goals with human values]; it may be hard to control an AI that becomes dramatically more intelligent or capable than humans; the problem is difficult to solve, since even if humanity agreed on common goals (which it does not), an AI will figure out that the most efficient strategy for achieving these goals is to set subgoals; these subgoals, which are not controlled by humans and one of which may be gaining control in general, may pose existential risks; even if we allow AIs only to advise and not to act, their predictive power may enable them to manipulate people so that, in the end, they act through us).

Note that the existential risk is usually formulated in a Reinforcement Learning (RL) context, where a reward function that implies a goal is optimized. However, the current discussion about AI risks was triggered by the astonishing capabilities of large language models (LLMs), which are primarily just good next-word predictors. So it is hard to see how a next-word predictor could become an existential risk. One possible answer is that reliably predicting the next word requires an understanding of human thinking. And properly answering a human question may require acting and setting goals and subgoals like a human. Once goals come into play, things can go wrong. And goal-oriented LLM processing is already happening (e.g., AutoGPT).

A further risk may arise if these systems, which excel at human-like thinking, are combined with Reinforcement Learning to optimize the achievement of goals (e.g., abstract and long-term objectives like gaining knowledge, promoting creativity, and upholding ethical ideals, or more mundane goals like accumulating as much money as possible). This should not be confused with Reinforcement Learning from Human Feedback (RLHF), the approach used to shape the output of LLMs so that it aligns with human values (avoiding bias, discrimination, hate, violence, political statements, etc.), which was responsible for the success of GPT-3.5 and GPT-4 in ChatGPT and which is well under control. Although LLMs and RL are already combined in robotics research (where RL has a long history; see, e.g., PaLM-E), this is probably not where existential risks are seen. However, it is obvious that major research labs around the world are working on combining these two most powerful AI concepts on massively parallel computer hardware, to achieve goals via RL with the world knowledge of LLMs (e.g., here). This next wave of AI may be the one that is difficult to control.

Things may become complicated if someone sets up an AI system with the goal of making as many copies of itself as possible. This goal, the primary purpose of life in general, may result in a scenario where evolution kicks in and digital intelligences compete with each other, leading to rapid improvement. An AI computer virus would be an example of such a system. Just as biological viruses are analyzed today in more or less secure laboratories, the same could be expected for digital viruses.

Note that we do not list the following often-discussed AI risks, since they are either straightforward to fix or, in our view, not severe risks at all (we have already been living with similar risks for some time):

  • Bias and discrimination: AI systems may inadvertently perpetuate or exacerbate existing biases found in data, leading to unfair treatment of certain groups or individuals.
  • Privacy invasion: AI’s ability to process and analyze vast amounts of personal data could lead to significant privacy concerns, as well as potential misuse of this information.
  • Dependence on AI: Over-reliance on AI systems might reduce human critical thinking, creativity, and decision-making abilities, making society more vulnerable to AI failures or manipulations.
  • Lack of transparency and explainability: Many AI systems, particularly deep learning models, can act as “black boxes,” making it difficult to understand how they arrive at their decisions, which can hinder accountability and trust in these systems.

Finally, there are also short-term risks that businesses already face today:

  • Risk of disruption: AI, especially generative AI like ChatGPT, can disrupt existing business models, forcing companies to adapt quickly or risk being left behind by competitors.
  • Cybersecurity risk: AI-powered phishing attacks, using information and writing styles unique to specific individuals, can make it increasingly difficult for businesses to identify and prevent security breaches, necessitating stronger cybersecurity measures.
  • Reputational risk: Inappropriate AI behavior or mistakes can lead to public relations disasters, negatively impacting a company’s reputation and customer trust.
  • Legal risk: With the introduction of new AI-related regulations, businesses face potential legal risks, including ensuring compliance, providing transparency, and dealing with liability issues.
  • Operational risk: Companies using AI systems may face issues such as the accidental exposure of trade secrets (e.g., the Samsung case) or AI-driven decision errors (e.g., IBM’s Watson proposing incorrect cancer treatments), which can impact overall business performance and efficiency.

New Kids on the Block: LMQL & Guidance & Mojo & NeMo Guardrails

LMQL (Language Model Query Language) is a programming language for interacting with large language models (LLMs). It facilitates LLM interaction by combining the benefits of natural language prompting with the expressiveness of Python.

Guidance is a Python library by Microsoft that provides tools for better control over modern language models. It offers features that allow for more efficient and effective use of these models, including intuitive syntax, rich output structure, and easy integration with other libraries like Hugging Face.

Mojo is a programming language by Modular that aims to combine the usability of Python with the performance of C/C++/CUDA.

NeMo Guardrails is an open-source framework by NVIDIA, available on GitHub. It helps developers ensure that their LLM-powered applications are accurate, appropriate, on topic, and secure by defining boundaries around the apps. It supports topical, safety, and security guardrails and can be used on top of LangChain. Guardrails are a set of programmable constraints between a user and an LLM, formulated as flows in a Colang file. Colang is a modeling language and runtime developed by NVIDIA for conversational AI.
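To make the idea of guardrails as “constraints between a user and an LLM” concrete, here is a minimal, library-free Python sketch of where a topical rail and a safety rail sit in a request/response pipeline. It does not use the NeMo Guardrails API or Colang; the function names (`input_rail`, `output_rail`, `guarded_chat`), the blocked topics, the refusal text, and the stand-in LLM are all hypothetical, and a real system would use far more robust checks than keyword matching.

```python
from typing import Callable

# Hypothetical stand-in type for an LLM call; this is NOT the NeMo Guardrails API.
LLM = Callable[[str], str]

# Assumed example configuration for a product-support bot.
BLOCKED_TOPICS = ("politics", "weapons")
REFUSAL = "Sorry, I can only help with questions about our product."


def input_rail(user_message: str) -> bool:
    """Topical rail: allow the message only if it avoids the blocked topics."""
    text = user_message.lower()
    return not any(topic in text for topic in BLOCKED_TOPICS)


def output_rail(bot_message: str) -> bool:
    """Safety rail on the model output (here only a trivial keyword check)."""
    return "password" not in bot_message.lower()


def guarded_chat(user_message: str, llm: LLM) -> str:
    # 1) Check the user input before it reaches the model.
    if not input_rail(user_message):
        return REFUSAL
    # 2) Call the (stand-in) model.
    answer = llm(user_message)
    # 3) Check the model output before it reaches the user.
    if not output_rail(answer):
        return REFUSAL
    return answer


if __name__ == "__main__":
    echo_llm = lambda msg: f"You asked: {msg}"
    print(guarded_chat("How do I reset the device?", echo_llm))
    print(guarded_chat("What do you think about politics?", echo_llm))
```

Checking both directions is the key design choice in this sketch: the input rail stops off-topic requests before a model call is made, and the output rail catches problematic answers before they reach the user.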

© 2025 Stephan Seeger
