GNSS & Machine Learning Engineer

Author: admin (Page 3 of 7)

Baker Lab open-sourced RF Diffusion

On March 30, 2023, the Baker Lab announced that RF Diffusion (a powerful guided diffusion model for protein design) is now free and open source. The source code is available on ColabFold (as a Google Colab) and on GitHub.

Proteins made via RF Diffusion have the potential to prevent infections, combat cancer, reverse autoimmune disorders, and serve as key components in advanced materials.

More information can be found in the papers [1] and [2].

Open Letter by Future of Life Institute to Pause Giant AI Experiments

The Future of Life Institute initiated an open letter in which they call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4 [notice that OpenAI already trains GPT-5 for some time]. They state that powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable.

The gained time should be used to develop safety protocols by AI experts to make the systems more accurate, safe, interpretable, transparent, robust, aligned, trustworthy, and loyal. In addition, they ask for the development of robust AI governance systems by policymakers and AI developers. They also demand well-resourced institutions for coping with the dramatic economic and political disruptions (especially to democracy) that AI will cause.

Notice that the letter is not against further AI development but just to slow down and give society a chance to adapt.

The letter was signed by several influential people, e.g. Elon Musk (CEO of SpaceX, Tesla & Twitter), Emad Mostaque (CEO of Stability AI), Yuval Noah Harari (Author), Max Tegmark (president of Future of Life Institute), Yoshua Bengio (Mila, Turing Prize winner), Stuart Russell (Berkeley).

However, it should be noticed that even more influential people in the AI scene have not (yet) signed this letter, none from OpenAI, Google/Deep Mind, or Meta.

This is not the first time the Future of Live Institute has taken action on AI development. In 2015, they presented an open letter signed by over 1000 robotics and AI researchers urging the United Nations to impose a ban on the development of weaponized AI.

The Future of Life Institute is a non-profit organization that aims to mitigate existential risks facing humanity, including those posed by AI.

Yann LeCun answered on Twitter with a nice fictitious anecdote to the request:
The year is 1440 and the Catholic Church has called for a 6 months moratorium on the use of the printing press and the movable type. Imagine what could happen if commoners get access to books! They could read the Bible for themselves and society would be destroyed.

OpenAI releases ChatGPT plugins

OpenAI announced on Mar 23, 2023, the availability of plugins within ChatGPT. Access is currently limited to ChatGPT Plus subscribers that joined a waitlist and have been selected by OpenAI.

Plugins can be automatically called by ChatGPT’s underlying LLM (Large Language Model, currently GPT-3.5 or GPT-4) in order to answer the questions of the user.

In order to make this work, plugins have to be registered in the ChatGPT user interface with a manifest file
(ai-plugin.json) that is hosted in the developer’s domain at
yourdomain.com/.well-known/ai-plugin.json. The file contains in a prescribed format
– metadata about the plugin (name, logo)
– details about the authentication mechanism
– an OpenAI specification for the endpoints of the API
– and a general description for the LLM of what the plugin can do.
The web app API needs to define an endpoint “/.well-known/ai-plugin.json” to access the content of this file.

In addition to the manifest file, an openapi.yaml file, that defines the OpenAI specification, has to be generated that is referenced in the “api” section of the manifest file via the “url” field. This file contains a detailed description of the API endpoints. The web app API needs to define an endpoint “/openapi.yaml” to access the content of this file.

When the user has activated a registered plugin and starts a conversation, the plugin’s description is injected into the message to ChatGPT, but invisible to the user. In this way, the LLM may choose an API call from the plugin if this seems relevant to the user’s question. The LLM will then incorporate the API result into the response to the user. More details can be found in OpenAI’s documentation.

Among the already available plugins, a few stand out. With the Wolfram plugin, all kinds of computational problems can be solved. And with the Zapier plugin, more than 5000 apps can be accessed. OpenAI itself introduced a web browser (that uses the Bing search API) and a code interpreter plugin (that runs in a sandbox without an internet connection). In addition, they open-sourced the code for a knowledge base retrieval plugin, that has to be self-hosted by a developer.

Interestingly enough, OpenAI notices that plugins will likely have wide-ranging societal implications and that language models with access to tools will likely have much greater economic impacts than those without. They expect the current wave of AI technologies to have a big effect on the pace of job transformation, displacement, and creation. OpenAI discusses the impact potential of large language models at the labor market in a recent publication.

Just a day after the OpenAI announcement of ChatGPT plugins, the open-source community already integrated these plugins also into LangChain. This is done just by referring to the plugin manifest file ai-plugin.json (see Twitter), e.g.:

tool = AIPluginTool.from_plugin_url( "https://www.clarna.com/.wellknown/ai-plugin.json")

All the other exciting news of the week is well summarized by Matt Wolfe (Google Bard, NVIDIA GTC, Adobe Firefly, Image Generation in Bing via DALL-E2, Microsoft Loop, AI in Canva, GitHub Copilot X, AI in Ubisoft, Metahuman by Unreal Engine).

Google announces PaLM API release

On the same day as OpenAI released GPT-4 (March 14, 2023), Google also announced the availability of the PaLM API for developers on Google Cloud [video]. They said that they are now providing access to foundation models on Google Cloud’s Vertex AI platform, initially for generating text and images, and over time also for audio and video. In addition, with the Generative AI App Builder, they introduced the possibility of quickly building AI-powered chat interfaces and digital assistants.

Finally, Google also made for a limited set of trusted test users generative AI features available within Google Workspace (Gmail and Google Docs).

OpenAI releases GPT-4

OpenAI released GPT-4 within ChatGPT on March 14, 2023, described in detail in a 98-pages paper (summarized on youtube).

  • Available to ChatGPT-Plus subscribers (currently with a cap that is changing over time, e.g. 100 messages every 4 hours, or 25 messages every 3 hours).
  • Still based on training data that cuts off Sept 2021.
  • It still does not learn from its experience.
  • Still no internet access.
  • The training was already finalized in Aug 2022.
  • Fine-tuned via RLHF (Reinforcement Learning with Human Feedback).
  • API waitlist is open (so no API access yet for everyone)
  • API prices (for comparison: GPT-3.5-turbo $0.002 per 1k tokens):
    • gpt-4: 8K context window (about 13 pages of text) will cost $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens.
    • gpt-4-32k: 32K context window (about 52 pages of text) will cost $0.06 per 1K prompt tokens and $0.12 per 1K completion tokens.
  • The number of parameters and size of the training data set have both not been published. So competitors are not encouraged to replicate these performance ingredients but are referred to a freely available benchmark (OpenAI Evals) that measures the real performance.
  • GPT-4 ranks in the 10% best of the bar exam and 0.5% best of biology olympiad.
  • GPT-4 can handle contexts of over 25,000 words.
  • GPT-4 can access images as inputs and can generate captions, classifications, and analyses. However, this image-to-text functionality is not yet publicly available.
  • Microsoft Bing was already using an early version of GPT-4 in the last few weeks.

An excellent overview by Greg Brockman, President and co-founder of OpenAI, can be found on youtube.

Microsoft released Visual ChatGPT on March 08, 2023, in a paper and with source code on GitHub and Hugging Face. Although this does not seem to be GPT-4-based, it demonstrates similar image capabilities via a combination of pre-existing technologies (generate/modify [text-to-image], and describe [image-to-text]).

Two days after the GPT-4 release, Microsoft announced on March 16, 2023, the integration of GPT-4 into their Office products as a feature they called Copilot. Copilot is not yet available for general use, but Microsoft plans to roll it out gradually to selected customers in the coming months.

OpenAI releases ChatGPT and Whisper APIs

On March 01, 2023, OpenAI announced the releases of APIs for ChatGPT (published on Nov 30, 2022) and the automatic speech recognition (ASR) engine Whisper for speech-to-text (STT) transcription (and translation) that was open-sourced in Sept 2022.

The ChatGPT model family is called gpt-3.5-turbo and costs just $0.002 per 1k tokens, which is 10 times cheaper than the existing GPT-3.5 models. Instead of consuming unstructured text as traditionally done by GPT, the ChatGPT models consume a sequence of messages with metadata following a new format called Chat Markup Language (ChatML). The number of tokens (tokens in prompt + tokens in response as available via response[‘usage’][‘total_tokens’]) is restricted to 4096. Notice that there is no possibility to fine-tune gpt-3.5-turbo models.

For Whisper the large-v2 model is now available through an API for a price of $0.006 per minute. The API contains endpoints for transcriptions (transcribes in source language) and translations (transcribes into English).

In addition, the possibility of dedicated instances for professional users was announced that can make economical sense beyond ~450M tokens per day.

A significant change that was made in the Terms of Service and Usage Polices is that data submitted to the API is no longer used for service improvements (e.g. model training) unless an organization opts in. Before it was necessary to opt-out.

Google demonstrates that logical Qubits actually reduce Quantum Error Rates

In an announcement from Feb 22, 2023, and in a corresponding Nature paper, Google demonstrates for the first time that logical qubits can actually reduce the error rates in a quantum computer.

Physical qubits have a 1-to-1 relation between a qubit in a quantum algorithm and its physical realization in a quantum system. The problem with physical qubits is that due to thermal noise, they can decohere so they no longer build such a quantum system with a superposition of the bit states 0 and 1. How often this decoherence happens is formalized by the quantum error rate. This error rate influences a quantum algorithm in two ways. First, the more qubits are involved in a quantum algorithm, the higher the probability of an error. Second, the longer a qubit is used in a quantum algorithm and the more gates act on it, i.e. the deeper the algorithm is, also the higher the probability of an error.

It is surprising that it is possible to correct (via quantum error correction algorithms) physical qubit errors without actually measuring the qubits (which would always destroy them). Such error correction codes are at least already known since 1996. The information of a physical qubit that is distributed over a bunch of physical qubits in a way so that certain quantum errors are automatically corrected, builds a logical qubit. However, the physical qubits involved in the logical qubit are also subjected to the quantum error rate. Thus there is an obvious trade-off between involving more physical qubits for a longer time, which could increase the error rate, and having a mechanism to reduce the error rate. Which effect prevails depends on the used error correction code as well as on the error rate of the used physical qubits. Google has now demonstrated for the first time that in their system there is actually an advantage of using a so-called surface code logical qubit.

OpenAI announced ChatGPT Plus

OpenAI announced on Feb 01, 2023, a new subscription plan, ChatGPT Plus, that will be available for $20/month. Benefits are

  • General access to ChatGPT, even during peak times
  • Faster response times
  • Priority access to new features and improvements

ChatGPT Plus will be available first just to customers in the United States.

Since Feb 10, 2023, it is also available in Germany.

On Feb 06, 2023, Google’s CEO Sundar Pichai announced with Bard a competitor to ChatGPT in a message to Google employees. Bard is based on LaMDA (Language Model for Dialog Applications) whose first version was unveiled on May 18, 2021.

Only 2 days later, on Feb 08, 2023, Microsoft’s CEO Satya Nadella demonstrated in a presentation the integration of ChatGPT into Microsoft Bing (codename Sydney).

Microsoft’s VALL-E can synthesize your voice from 3 sec of audio

Microsoft has introduced a new language modeling approach for text-to-speech synthesis (TTS) called VALL-E. The approach uses discrete codes derived from an off-the-shelf neural audio codec model, and is trained using 60K hours of English speech, which is hundreds of times larger than existing systems, and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt (project page, paper).

An unofficial Pytorch implementation for VALL-E is available on GitHub.

« Older posts Newer posts »

© 2025 Stephan Seeger

Theme by Anders NorenUp ↑