Tag: NLP (Page 1 of 5)

Google announced Gemini and AlphaCode-2

December 7, 2023 / admin / 0 Comments

On Dec 6, 2023, Google launched its new multimodal model Gemini that will work across Google products like search, ads, and the chatbot Bard. Gemini was trained jointly across text, image, audio, and video and has a 32K context window.

Gemini 1.0 comes in three different sizes:

Gemini Ultra: largest and most capable model to be released at the beginning of 2024
Gemini Pro: best model that is available immediately within Bard in 170 countries (but not yet in the EU and UK)
Gemini Nano: most efficient model for mobile devices with the same availability as Pro; Nano-1 (1. 8B parameters), Nano-2 (3.25B parameters)

Achievements and capabilities:

Gemini can recognize images, and speak in real-time (unfortunately, the demo was fake)
As the first AI model, it outperforms humans on the exam benchmark MMLU.
It has advanced math and coding capabilities
Gemini Ultra beats GPT-4 in 30 out of 32 commonly used benchmarks:
Gemini Pro beats GPT-3.5 in 6 out of 8 benchmarks:

Some more sources: Google release note, Gemini report, Google Developer blog, YouTube: Matt Wolfe, AI Explained.

Interestingly, Gemini was trained on a large fleet of TPUv4 accelerators across multiple data centers. At such scales, machine failures due to cosmic rays are commonplace and have to be handled (Gemini report, page 4).

When paired with search and tool-use techniques, Gemini forms the basis for advanced reasoning systems like AlphaCode 2, which excels in competitive programming challenges against human competitors. AlphaCode 2, solely based on Gemini Pro and not yet on Gemini Ultra, shows a substantial improvement over its predecessor by solving 43% of problems on Codeforces, a 1.7x increase. In this way, AlphaCode 2 performs better than 87% of human competitors. However, due to its intensive machine requirements to generate, filter, and score up to a million solutions, AlphaCode 2 is currently not feasible for customer use, although Google is working on this.

GitHub Universe 2023 Announcements

November 11, 2023 / admin / 0 Comments

There were two major announcements about GitHub Copilot at the GitHub Universe 2023 conference on Nov 08, 2023.

GitHub Copilot Enterprise account:

Copilot personalized for your organization
Contains everything in Copilot Business
Chat personalized to your codebase
Documentation search and summaries
Pull request summaries
Code review skills
Fine-tuned models
available in Feb 2024 for 39$ per user/month

Copilot Workspace: [1]

Copilot workspace automatically proposes a solution based on its deep understanding of the code base.
Builds a step-by-step plan to implement the changes; if something isn’t quite right, the spec and plan are fully editable.
With the approval of the plan, Copilot automates the implementation of changes across the repository.
Copilot not only synthesizes code but also builds, tests, and validates the success of these changes.
This workspace is designed for collaboration. You can edit any of the suggested changes and if you accidentally introduce an error along the way, Copilot will automatically catch it, repair it, and rerun the code.
Easy to create a pull request with generated summary of the work to merge and deploy fast.
Available in 2024.

OpenAI DevDay Announcements

November 7, 2023 / admin / 1 Comment

OpenAI rolled out on its DevDay an array of transformative updates and features [blog post, keynote recording]. Here’s a succinct rundown:

Recap: ChatGPT release Nov 30, 2022 with GPT-3.5. GPT-4 release in March 2023. Voice input/output, vision input with GPT-4V, text-to-image with DALL-E 3, ChatGPT Enterprise with enterprise security, higher speed access, and longer context windows. 2M developers, 92% of Fortune 500 companies building products on top of GPT, 100M weekly active users.
New GPT-4 Turbo: OpenAI’s most advanced AI model, 128K context window, knowledge up to April 2023. Reduced pricing: $0.01/1K input tokens (3x cheaper), $0.03/1K output tokens (2x cheaper). Improved function calling (multiple functions in single message, always return valid functions with JSON mode, improved accuracy on returning right function parameters). More deterministic model output via reproducible outputs beta. Access via gpt-4-1106-preview, stable release pending.
GPT-3.5 Turbo Update: Enhanced gpt-3.5-turbo-1106 model with 16K default context. Lower pricing: $0.001/1K input, $0.002/1K output. Fine-tuning available, reduced token prices for fine-tuned usage (input token prices 75% cheaper to $0.003/1K, output token prices 62% cheaper to $0.006/1K). Improved function calling, reproducible outputs feature.
Assistants API: Beta release for creating AI agents in applications. Supports natural language processing, coding, planning, and more. Enables persistent Threads, includes Code Interpreter, Retrieval, Function Calling tools. Playground integration for no-code testing.
Multimodal Capabilities: GPT-4 Turbo supports visual inputs in Chat Completions API via gpt-4-vision-preview. Integration with DALL·E 3 for image generation via Image generation API. Text-to-speech (TTS) model with six voices introduced.
Customizable GPTs in ChatGPT: New feature called GPTs allowing integration of instructions, data, and capabilities. Enables calling developer-defined actions, control over user experience, streamlined plugin to action conversion. Documentation provided for developers.

AI race is heating up: Announcements by Google/DeepMind, Meta, Microsoft/OpenAI, Amazon/Anthropic

September 22, 2023 / admin / 0 Comments

After weeks of “less exciting” news in the AI space since the release of Llama 2 by Meta on July 18, 2023, there were a bunch of announcements in the last few days by major players in the AI space:

Google/DeepMind: Bard extensions and multimodal LLM Gemini
OpenAI: DALL-E3 and GPT-Vision in ChatGPT, Gobi
Microsoft: Windows Copilot with DALL-E3 access
Amazon: Generative AI in Alexa, $4B investment in Anthropic
Meta: Meta AI, Ray-Ban, Emu, AI studio

Here are some links to the news of the last weeks:

Sep 28, 2023, Amazon: Securely customize CodeWhisperer
Sep 27, 2023, Meta: Meta AI assistant, Ray-Ban smart glasses, Emu, AI studio
Sep 25, 2023, ChatGPT can now see, hear, and speak
Sep 25, 2023, Amazon invests $4B in Anthropic, Claude in Bedrock
Sep 25, 2023, Spotify clones voices and translates them
Sep 21, 2023, Announcing Microsoft Copilot
Sep 20, 2023, Amazon brings generative AI to Alexa
Sep 20, 2023, OpenAI Announces DALL·E 3 in Research Preview
Sep 20, 2023, GitHub Copilot Chat beta now available for all individuals
Sep 19, 2023, Google Bard September update: App extensions
Sep 19, 2023, OpenAI’s multimodal LLM GPT-Vision to beat Google Gemini
Sep 16, 2023, DeepMind: LLMs can optimize their own prompts
Sep 15, 2023, Google nears release of AI software Gemini
Sep 15, 2023, Google Gemini: What We Know So Far
Sep 13, 2023, Stable Audio by Stability AI for music & sound generation
Sep 07, 2023, Anthropic introduces Claude Pro
Sep 06, 2023, Falcon 180B
Aug 31, 2023, Baidu launches Ernie chatbot
Aug 29, 2023, Duet AI for Google Workspace now generally available
Aug 28, 2023, Meta plans to take on GPT-4 with a rumored Llama 3
Aug 28, 2023, Introducing ChatGPT Enterprise
Aug 27, 2023, Google Gemini Smashes GPT-4 By 5X
Aug 24, 2023, Introducing Code Llama
Aug 22, 2023, GPT-3.5 Turbo fine-tuning and API updates
Aug 22, 2023, ElevenLabs releases Eleven Multilingual v2
Aug 21, 2023, MidJourney Adds Inpainting Feature
Aug 16, 2023, Adobe Express with AI Firefly app is released worldwide
Aug 10, 2023, ChatGPT expands its ‘custom instructions’ feature
Aug 08, 2023, Announcing StableCode — Stability AI
Aug 05, 2023, Tim Cook says Apple is building AI into ‘every product’
Aug 03, 2023, Every single Amazon team is working on generative AI
Aug 02, 2023, AudioCraft by Meta
Jul 31, 2023, ChatGPT for Android in all countries

Meta released Llama 2 free for Commercial Use

July 19, 2023 / admin / 0 Comments

Meta open-sourced Llama 2 together with Microsoft, this time in contrast to Llama 1 free not just for research but also for commercial use.

Free for commercial use for businesses with less than 700 Mio monthly active users
Models with 70B, 13B, and 7B parameters
Llama-2-70B model is currently the strongest open-source LLM (Huggingface leaderboard), comparable to GPT-3.5-0301, noticeably stronger than Falcon, MPT, and Vicuna
Not yet at GPT-3.5 level, mainly because of its weak coding abilities
RLHF fine-tuned
Source code on GitHub, weights available on Azure, AWS, and HuggingFace
Llama 2 paper
4K token context window
Trained on 2 trillion tokens with training costs of about $20M
Knowledge cut-off Dec 2022
Testing on https://www.llama2.ai

Just 4 days after this announcement, on July 22, 2023, StabilityAI released FreeWilly1 and FreeWilly2 which are fine-tuned models based on LLaMA65B and Llama-2-70B. These models took over the leadership on Hugging Face (Huggingface leaderboard). However, both models have no commercial license and are just intended for research.

GPT-4 in the top 1% of human thinkers in creativity test

July 9, 2023 / admin / 0 Comments

In a recent study by the University of Montana, GPT-4 demonstrated remarkable performance in the Torrance Tests of Creative Thinking (TTCT, a standard test for measuring creativity), matching the top 1% of human thinkers. The model excelled in fluency and originality. These findings imply that the creative abilities of GPT-4 could potentially surpass those of humans.

For a recent benchmark on advanced reasoning capabilities of large language models take a look at the ARB (Advanced Reasoning Benchmark).

OpenAI gives all ChatGPT Plus users access to Code Interpreter

July 8, 2023 / admin / 0 Comments

The ChatGPT code interpreter allows users to run code and upload individual data files (in .csv, .xlsx, .json format) for analysis. Multiple files can be uploaded sequentially or within one zip-file. To upload a file, click on the ‘+’ symbol located just to the left of the ‘Send a message’ box or even simpler via drag and drop.

The code interpreter functionality is accessible to ChatGPT Plus users and can be enabled in the settings under ‘Beta features’. Once enabled, this functionality will then appear in the configuration settings of any new chat under the ‘GPT-4’ section, where it also needs to be activated.

Given a prompt, the code interpreter will generate Python code that is then automatically executed in a sandboxed Python environment. If something goes wrong, for instance, if the generated source code requires the installation of a Python package or if the source code is simply incorrect, the code interpreter automatically attempts to fix these errors and tries again. This feature makes working with the code interpreter much more efficient. Before, it was necessary to paste ChatGPT’s proposal into a Jupyter notebook and run it from there. If errors occurred, these had to be fixed either independently or by manually pasting the error text back into ChatGPT so that it could provide a solution. This manual iterative procedure has now been automated with the code interpreter.

Note that the code interpreter executes the source code on OpenAI’s servers, not in the local environment. This leads to restrictions on the size of the uploaded data, as well as a very stringent time limit of 120s for the execution of the code. Given this, it becomes clear what developers truly desire. They seek the integration of this feature into their local development environment, such as VSCode, or within a cloud service, such as AWS, GCP, or Azure, without any restrictions on data size or execution times. This then leans more towards the direction of projects like AutoGPT or GPT Engineer. It’s likely only a matter of days, weeks, or months before such functionality becomes widely available. It’s also probable that complete access to your code repository will be enabled, first through a vector database solution and after some time maybe by including the entire repository within prompts, which are currently increasing dramatically in size (as exemplified in LongNet; since this requires retraining of the LLM such solutions cannot be expected to become available before GPT-4.5 or GPT-5).

For testing, try e.g. the following prompts:

What is the current time?
Plot the graphs of sin(x) and cos(x) in a single graph
Make a QR-code of my contact information: Stephan Seeger; Homepage: domain-seeger.de

or after uploading a data set (e.g. from Kaggle)

Explain the dataset.
Show 4 different ways of displaying the data visually.

Before, such functionality was only available via the Notable plugin or via the open-source implementation GPT-Code-UI on GitHub.

Microsoft scales Transformer sequence length to 1 billion tokens

July 8, 2023 / admin / 0 Comments

LongNet, a new Transformer variant introduced in recent research by Microsoft, has successfully scaled sequence lengths to over 1 billion tokens without compromising shorter sequence performance. Its key innovation, dilated attention, allows an exponential expansion of the attentive field with growing distance. The model exhibits linear computational complexity and logarithmic token dependency, while also demonstrating strong performance on long-sequence modeling and general language tasks.

OpenAI API updates

June 14, 2023 / admin / 0 Comments

On June 13, 2023, OpenAI announced a number of updates to their API:

new function calling capability in the Chat Completions API
new 16k context version of gpt-3.5-turbo with 2 times the price as the standard 4k version ($0.003 per 1K input tokens and $0.004 per 1K output)
75% cost reduction on the embeddings model ($0.0001 per 1K tokens)
25% cost reduction on input tokens for gpt-3.5-turbo
($0.0015 per 1K input tokens and $0.002 per 1K output tokens)
stable model names (gpt-3.5-turbo, gpt-4, and gpt-4-32k) will automatically be upgraded to the new models (gpt-3.5-turbo-0613,
gpt-4-0613, and gpt-4-32k-0613) on June 27
deprecation of gpt-3.5-turbo-0301 and gpt-4-0314 models after Sept 13

All models come with the same data privacy and security guarantees introduced on March 1, i.e. requests and API data will not be used for training.

The new function calling capability in gpt-3.5-turbo-0613, and
gpt-4-0613, which is achieved by the new API parameters, functions and function_call, in the /v1/chat/completions endpoint allows e.g. the following use cases:

Chatbots that answer questions by calling external tools (like ChatGPT Plugins)
Convert natural language into API calls or database queries
Extract structured data from text.

Examples beyond the API documentation can be found in the OpenAI cookbook.

Statement on AI Risk

May 31, 2023 / admin / 0 Comments

A vast number of AI experts have signed a statement to raise public awareness regarding the most severe risks associated with advanced AI, aiming to mitigate the risk of human extinction. Among the signatories are Turing Award laureates Geoffrey Hinton and Yoshua Bengio (but not Yann LeCun from Meta), and the CEOs of leading AI companies like Sam Altman from OpenAI, Demis Hassabis from Google DeepMind, Dario Amodei from Anthropic, and Emad Mostaque from Stability AI.

The statement is featured on the webpage of the Center for AI Safety, which provides a list of eight examples of existential risks (x-risks). The enumerated risks are based on the publication “X-Risk Analysis for AI Research” which appeared on Sept. 20, 2022, on arXiv. This highly valuable paper also lists in its Appendix a bunch of practical steps to mitigate risks.

The listed risks are:

Weaponization:
Malicious actors could repurpose AI to be highly destructive.
Misinformation:
AI-generated misinformation and persuasive content could undermine collective decision-making, radicalize individuals, or derail moral progress.
Proxy Gaming:
AI systems may pursue their goals at the expense of individual and societal values.
Enfeeblement:
Humanity loses the ability to self-govern by increasingly delegating tasks to machines.
Value Lock-in:
Highly competent systems could give small groups of people a tremendous amount of power, leading to a lock-in of oppressive systems.
Emergent Goals:
The sudden emergence of capabilities or goals could increase the risk that people lose control over advanced AI systems.
Deception:
To better understand AI systems, we may ask AI for accurate reports about them. However, since deception may help agents to better achieve their goals and this behavior may have strategic advantages, it is never safe to trust these systems.
Power-Seeking Behavior:
Companies and governments have strong economic incentives to create agents that can accomplish a broad set of goals. Such agents have instrumental incentives to acquire power, potentially making them harder to control.

This statement about AI risks appeared a few days after an OpenAI blog post by Sam Altman, Greg Brockman, and Ilya Sutskever, which also addresses the mitigation of risks associated with AGI or even superintelligence that could arise within the next 10 years.