GNSS & Machine Learning Engineer


AI race is heating up: Announcements by Google/DeepMind, Meta, Microsoft/OpenAI, Amazon/Anthropic

After weeks of “less exciting” news in the AI space since the release of Llama 2 by Meta on July 18, 2023, there was a flurry of announcements in the last few days by the major players in the AI space.

Here are some links to the news of the last weeks:

Meta released Llama 2 free for Commercial Use

Meta open-sourced Llama 2 together with Microsoft. In contrast to Llama 1, it is free not just for research but also for commercial use.

  • Free for commercial use for businesses with fewer than 700 million monthly active users
  • Models with 70B, 13B, and 7B parameters
  • Llama-2-70B model is currently the strongest open-source LLM (Huggingface leaderboard), comparable to GPT-3.5-0301, noticeably stronger than Falcon, MPT, and Vicuna
  • Not yet at GPT-3.5 level, mainly because of its weak coding abilities
  • RLHF fine-tuned
  • Source code on GitHub, weights available on Azure, AWS, and HuggingFace
  • Llama 2 paper
  • 4K token context window
  • Trained on 2 trillion tokens with training costs of about $20M
  • Knowledge cut-off Dec 2022
  • Testing on https://www.llama2.ai

Just 4 days after this announcement, on July 22, 2023, Stability AI released FreeWilly1 and FreeWilly2, fine-tuned models based on LLaMA-65B and Llama-2-70B. These models took over the top spots on the Hugging Face leaderboard. However, neither model has a commercial license; both are intended for research only.

GPT-4 in the top 1% of human thinkers in creativity test

In a recent study by the University of Montana, GPT-4 demonstrated remarkable performance in the Torrance Tests of Creative Thinking (TTCT, a standard test for measuring creativity), matching the top 1% of human thinkers. The model excelled in fluency and originality. These findings imply that the creative abilities of GPT-4 could potentially surpass those of humans.

For a recent benchmark on advanced reasoning capabilities of large language models take a look at the ARB (Advanced Reasoning Benchmark).

OpenAI gives all ChatGPT Plus users access to Code Interpreter

The ChatGPT code interpreter allows users to run code and upload individual data files (in .csv, .xlsx, or .json format) for analysis. Multiple files can be uploaded sequentially or within one zip file. To upload a file, click the ‘+’ symbol just to the left of the ‘Send a message’ box or, even simpler, use drag and drop.

The code interpreter functionality is accessible to ChatGPT Plus users and can be enabled in the settings under ‘Beta features’. Once enabled, this functionality will then appear in the configuration settings of any new chat under the ‘GPT-4’ section, where it also needs to be activated.

Given a prompt, the code interpreter will generate Python code that is then automatically executed in a sandboxed Python environment. If something goes wrong, for instance, if the generated source code requires the installation of a Python package or if the source code is simply incorrect, the code interpreter automatically attempts to fix these errors and tries again. This feature makes working with the code interpreter much more efficient. Before, it was necessary to paste ChatGPT’s proposal into a Jupyter notebook and run it from there. If errors occurred, these had to be fixed either independently or by manually pasting the error text back into ChatGPT so that it could provide a solution. This manual iterative procedure has now been automated with the code interpreter.
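This generate-execute-fix loop can be sketched in a few lines. Note that `ask_llm` below is a hypothetical stand-in for the actual model call, and the "fix" is hard-coded for illustration; the real service runs the code in a proper sandbox:

```python
# Minimal sketch of the generate-execute-fix loop the code interpreter
# automates. `ask_llm` is a hypothetical stand-in for the real model call.
import traceback

def ask_llm(prompt: str) -> str:
    # Hypothetical: the real system would query the LLM here. We fake a
    # buggy first answer and a corrected one once an error is reported.
    if "Error" in prompt:
        return "result = sum(range(10))"   # corrected code
    return "result = sum(rang(10))"        # typo: raises NameError

def run_with_retries(task: str, max_attempts: int = 3):
    prompt = task
    for _ in range(max_attempts):
        code = ask_llm(prompt)
        scope = {}
        try:
            exec(code, scope)              # sandboxed on OpenAI's servers
            return scope.get("result")
        except Exception:
            # Feed the error back to the model, as the interpreter does.
            prompt = f"{task}\nError:\n{traceback.format_exc()}"
    raise RuntimeError("could not produce working code")

print(run_with_retries("Sum the numbers 0..9"))  # → 45
```

The first attempt fails with a NameError, the traceback is appended to the prompt, and the second attempt succeeds, which is exactly the manual copy-paste cycle described above, now automated.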

Note that the code interpreter executes the source code on OpenAI’s servers, not in the local environment. This leads to restrictions on the size of the uploaded data, as well as a stringent time limit of 120 s for code execution. Given this, it becomes clear what developers really want: integration of this feature into their local development environment, such as VSCode, or into a cloud service, such as AWS, GCP, or Azure, without any restrictions on data size or execution time. This leans more towards the direction of projects like AutoGPT or GPT Engineer. It’s likely only a matter of days, weeks, or months before such functionality becomes widely available. It’s also probable that complete access to your code repository will be enabled, first through a vector database solution and later perhaps by including the entire repository within prompts, whose feasible sizes are currently increasing dramatically (as exemplified by LongNet; since this requires retraining of the LLM, such solutions cannot be expected before GPT-4.5 or GPT-5).

For testing, try e.g. the following prompts:

  • What is the current time?
  • Plot the graphs of sin(x) and cos(x) in a single graph
  • Make a QR-code of my contact information: Stephan Seeger; Homepage: domain-seeger.de

or after uploading a data set (e.g. from Kaggle)

  • Explain the dataset.
  • Show 4 different ways of displaying the data visually.
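For the sin/cos prompt, the interpreter would typically generate and run matplotlib code. A stripped-down, plotting-free sketch of the computation behind such a plot (illustrative only, not what the interpreter actually emits) might look like this:

```python
import math

# Sample sin(x) and cos(x) on [0, 2*pi]; the real interpreter would pass
# these arrays to matplotlib to draw both curves in a single figure.
xs = [i * 2 * math.pi / 100 for i in range(101)]
sin_ys = [math.sin(x) for x in xs]
cos_ys = [math.cos(x) for x in xs]

print(f"sin range: [{min(sin_ys):.3f}, {max(sin_ys):.3f}]")
print(f"cos range: [{min(cos_ys):.3f}, {max(cos_ys):.3f}]")
```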

Before, such functionality was only available via the Notable plugin or via the open-source implementation GPT-Code-UI on GitHub.

Microsoft scales Transformer sequence length to 1 billion tokens

LongNet, a new Transformer variant introduced in recent research by Microsoft, has successfully scaled sequence lengths to over 1 billion tokens without compromising shorter sequence performance. Its key innovation, dilated attention, allows an exponential expansion of the attentive field with growing distance. The model exhibits linear computational complexity and logarithmic token dependency, while also demonstrating strong performance on long-sequence modeling and general language tasks.
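A toy sketch of the dilated attention pattern, based on my reading of the paper rather than the official implementation: the sequence is split into segments of length w, and within each segment only every r-th position participates, so the attended span grows while the per-segment cost shrinks.

```python
# Toy illustration of LongNet-style dilated attention (a sketch of the
# idea, not the official implementation): segments of length w, and
# within each segment only every r-th position takes part.
def dilated_indices(seq_len: int, segment_len: int, dilation: int):
    """Return, per segment, the positions that attend to each other."""
    pattern = []
    for start in range(0, seq_len, segment_len):
        segment = range(start, min(start + segment_len, seq_len))
        pattern.append([i for i in segment if (i - start) % dilation == 0])
    return pattern

# With w=8 and r=2, each segment keeps 4 of its 8 positions:
print(dilated_indices(16, 8, 2))
# → [[0, 2, 4, 6], [8, 10, 12, 14]]
```

LongNet mixes several (w, r) pairs so that nearby tokens get dense attention and distant tokens increasingly sparse attention, which is what yields the exponentially growing attentive field at linear cost.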

OpenAI API updates

On June 13, 2023, OpenAI announced a number of updates to their API:

  • new function calling capability in the Chat Completions API
  • new 16k-context version of gpt-3.5-turbo at twice the price of the standard 4k version ($0.003 per 1K input tokens and $0.004 per 1K output tokens)
  • 75% cost reduction on the embeddings model ($0.0001 per 1K tokens)
  • 25% cost reduction on input tokens for gpt-3.5-turbo
    ($0.0015 per 1K input tokens and $0.002 per 1K output tokens)
  • stable model names (gpt-3.5-turbo, gpt-4, and gpt-4-32k) will automatically be upgraded to the new models (gpt-3.5-turbo-0613,
    gpt-4-0613, and gpt-4-32k-0613) on June 27
  • deprecation of gpt-3.5-turbo-0301 and gpt-4-0314 models after Sept 13

All models come with the same data privacy and security guarantees introduced on March 1, i.e. requests and API data will not be used for training.
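As a quick sanity check, the listed prices translate into a simple cost estimate (model names and per-1K-token prices taken from the list above):

```python
# Cost estimate from the prices listed above (USD per 1K tokens).
PRICES = {
    "gpt-3.5-turbo":     {"input": 0.0015, "output": 0.002},
    "gpt-3.5-turbo-16k": {"input": 0.003,  "output": 0.004},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens / 1000 * p["input"]
            + output_tokens / 1000 * p["output"])

# 10K input tokens plus 2K output tokens on the standard model:
print(f"${cost_usd('gpt-3.5-turbo', 10_000, 2_000):.4f}")  # → $0.0190
```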

The new function calling capability in gpt-3.5-turbo-0613 and
gpt-4-0613, exposed via the new API parameters functions and function_call in the /v1/chat/completions endpoint, enables e.g. the following use cases:

  • Chatbots that answer questions by calling external tools (like ChatGPT Plugins)
  • Conversion of natural language into API calls or database queries
  • Extraction of structured data from text

Examples beyond the API documentation can be found in the OpenAI cookbook.
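A minimal sketch of the flow: the function schema below follows the format in the API documentation, but the model response is mocked here (the `get_weather` function and city are made up) so the example runs without an API key:

```python
import json

# A JSON-schema description of a function the model is allowed to call,
# as passed in the `functions` parameter of /v1/chat/completions.
functions = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather API

# Mocked assistant message of the shape the API returns when the model
# decides to call the function instead of answering directly:
message = {
    "role": "assistant",
    "content": None,
    "function_call": {"name": "get_weather",
                      "arguments": '{"city": "Berlin"}'},
}

if message.get("function_call"):
    call = message["function_call"]
    args = json.loads(call["arguments"])          # arguments arrive as JSON text
    result = {"get_weather": get_weather}[call["name"]](**args)
    print(result)  # → Sunny in Berlin
```

In a real application, `result` would be sent back to the model in a follow-up message with role "function" so it can phrase the final answer.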

Statement on AI Risk

A vast number of AI experts have signed a statement to raise public awareness regarding the most severe risks associated with advanced AI, aiming to mitigate the risk of human extinction. Among the signatories are Turing Award laureates Geoffrey Hinton and Yoshua Bengio (but not Yann LeCun from Meta), and the CEOs of leading AI companies like Sam Altman from OpenAI, Demis Hassabis from Google DeepMind, Dario Amodei from Anthropic, and Emad Mostaque from Stability AI.

The statement is featured on the webpage of the Center for AI Safety, which provides a list of eight examples of existential risks (x-risks). The enumerated risks are based on the publication “X-Risk Analysis for AI Research”, which appeared on Sept. 20, 2022, on arXiv. This highly valuable paper also lists a number of practical steps for mitigating risks in its appendix.

The listed risks are:

  • Weaponization:
    Malicious actors could repurpose AI to be highly destructive.
  • Misinformation:
    AI-generated misinformation and persuasive content could undermine collective decision-making, radicalize individuals, or derail moral progress.
  • Proxy Gaming:
    AI systems may pursue their goals at the expense of individual and societal values.
  • Enfeeblement:
    Humanity loses the ability to self-govern by increasingly delegating tasks to machines.
  • Value Lock-in:
    Highly competent systems could give small groups of people a tremendous amount of power, leading to a lock-in of oppressive systems.
  • Emergent Goals:
    The sudden emergence of capabilities or goals could increase the risk that people lose control over advanced AI systems.
  • Deception:
    To better understand AI systems, we may ask AI for accurate reports about them. However, since deception may help agents to better achieve their goals and this behavior may have strategic advantages, it is never safe to trust these systems.
  • Power-Seeking Behavior:
    Companies and governments have strong economic incentives to create agents that can accomplish a broad set of goals. Such agents have instrumental incentives to acquire power, potentially making them harder to control.

This statement about AI risks appeared a few days after an OpenAI blog post by Sam Altman, Greg Brockman, and Ilya Sutskever, which also addresses the mitigation of risks associated with AGI or even superintelligence that could arise within the next 10 years.

OpenAI launches ChatGPT app for iOS

OpenAI has officially launched the ChatGPT app for iOS users in the US. The app comes with a range of notable features:

  • Free of Charge: The ChatGPT app can be downloaded and used free of cost.
  • Sync Across Devices: Users can maintain their chat history consistently across multiple devices.
  • Voice Input via Whisper: The app includes integration with Whisper, OpenAI’s open-source speech-recognition system, allowing users to input via voice commands.
  • Exclusive Benefits for ChatGPT Plus Subscribers: Those who subscribe to ChatGPT Plus can utilize GPT-4’s enhanced capabilities. They also receive early access to new features and benefit from faster response times.
  • Initial US Rollout: The app is initially launching in the US, with a plan to expand its availability to other countries in the upcoming weeks.
  • Android Version Coming Soon: OpenAI has confirmed that Android users can expect to see the ChatGPT app on their devices in the near future. Further updates are expected soon.

New Kids on the Block: LMQL & Guidance & Mojo & NeMo Guardrails

LMQL (Language Model Query Language) is a programming language for large language model (LM) interaction. It facilitates LLM interaction by combining the benefits of natural language prompting with the expressiveness of Python.

Guidance is a Python library by Microsoft that provides tools to enhance control over modern language models. It offers features that allow for more efficient and effective use of these models, including intuitive syntax, rich output structure, and easy integration with other libraries like HuggingFace.

Mojo combines the usability of Python with the performance of C/C++/CUDA.

NeMo Guardrails is an open-source framework by NVIDIA, available on GitHub. It helps developers ensure that their LLM-powered applications are accurate, appropriate, on topic, and secure by defining boundaries around them. It supports topical, safety, and security guardrails and can be used on top of LangChain. Guardrails are a set of programmable constraints between a user and an LLM, formulated as flows in a Colang file. Colang is a modeling language and runtime developed by NVIDIA for conversational AI.
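A minimal Colang flow, in the style shown in NVIDIA's documentation (an illustrative sketch; the canonical-form names and utterances are made up):

```
define user express greeting
  "hello"
  "hi there"

define bot express greeting
  "Hey! How can I help you today?"

define flow greeting
  user express greeting
  bot express greeting
```

The runtime matches incoming user messages against the example utterances, and the flow then constrains which bot response follows.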

Google revealed PaLM 2

At Google I/O on May 10, 2023, Google revealed PaLM 2 (API, paper), its latest AI language model, which powers 25 Google products, including Search, Gmail, Docs, Assistant, Translate, and Photos.

  • PaLM 2 has 4 models that differ in size: Gecko, Otter, Bison, and Unicorn. Gecko is so lightweight that it can work on mobile devices.
  • PaLM 2 can be finetuned on domain-specific knowledge (Sec-PaLM with security knowledge, Med-PaLM 2 with medical knowledge)
  • Bard now works with PaLM 2; with extensions, Bard can call tools like Sheets, Colab for coding, Lenses, Maps, Adobe Firefly to create images, etc.; Bard is multimodal and can understand images
  • PaLM 2 is also powering Duet AI for Google Cloud, a generative AI collaborator designed to help users learn, build, and operate faster
  • PaLM 2 is available in 180+ countries and regions; however, not yet in e.g. Canada or the EU
  • The next model, Gemini, is already in training. 
  • Google also announced the availability of MusicLM, a text-to-music generative model. 

OpenAI reacted to this announcement on May 12 by announcing that Browsing & Plugins are rolled out over the subsequent week for all Plus users. As of May 17, I can confirm that both features are now operational for me.





© 2023 Stephan Seeger
