- Recap: ChatGPT released Nov 30, 2022 with GPT-3.5; GPT-4 released in March 2023. Voice input/output, vision input with GPT-4V, text-to-image with DALL-E 3, ChatGPT Enterprise with enterprise security, higher-speed access, and longer context windows. 2M developers, 92% of Fortune 500 companies building products on top of GPT, 100M weekly active users.
- New GPT-4 Turbo: OpenAI’s most advanced AI model, 128K context window, knowledge up to April 2023. Reduced pricing: $0.01/1K input tokens (3x cheaper), $0.03/1K output tokens (2x cheaper). Improved function calling (multiple functions in a single message, always-valid JSON via JSON mode, improved accuracy in returning the right function parameters). More deterministic model output via the reproducible-outputs beta. Access via gpt-4-1106-preview; stable release pending.
- GPT-3.5 Turbo Update: Enhanced gpt-3.5-turbo-1106 model with 16K default context. Lower pricing: $0.001/1K input, $0.002/1K output. Fine-tuning available, with reduced token prices for fine-tuned usage (input tokens 75% cheaper at $0.003/1K, output tokens 62% cheaper at $0.006/1K). Improved function calling and the reproducible-outputs feature.
- Assistants API: Beta release for creating AI agents in applications. Supports natural language processing, coding, planning, and more. Enables persistent Threads, includes Code Interpreter, Retrieval, Function Calling tools. Playground integration for no-code testing.
- Multimodal Capabilities: GPT-4 Turbo supports visual inputs in Chat Completions API via
gpt-4-vision-preview. Integration with DALL·E 3 for image generation via Image generation API. Text-to-speech (TTS) model with six voices introduced.
- Customizable GPTs in ChatGPT: New feature called GPTs allowing integration of instructions, data, and capabilities. Enables calling developer-defined actions, control over user experience, streamlined plugin to action conversion. Documentation provided for developers.
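The token prices above make per-request cost estimates straightforward. A minimal sketch (prices hardcoded from the announcement above; model names as exposed by the API):

```python
# Per-1K-token prices (USD) as announced at DevDay, Nov 2023.
PRICES = {
    "gpt-4-1106-preview": {"input": 0.01, "output": 0.03},
    "gpt-3.5-turbo-1106": {"input": 0.001, "output": 0.002},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single chat completion request."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# Example: a 100K-token prompt with a 1K-token answer on GPT-4 Turbo.
print(round(request_cost("gpt-4-1106-preview", 100_000, 1_000), 2))  # → 1.03
```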
- Nov 22, 2023, Inflection-2 asserted to be second best LLM (for Pi.ai)
- Nov 21, 2023, ChatGPT with voice available to all free users
- Nov 21, 2023, StabilityAI open-sources Stable Video Diffusion
- Nov 21, 2023, Anthropic announces Claude 2.1 with 200k context [tests]
- Nov 17, 2023, Sam Altman fired from OpenAI, Greg Brockman quits
- Nov 17, 2023, Google delays release of Gemini
- Nov 16, 2023, Recap of announcements at Microsoft Ignite 2023 (keynote)
- Nov 16, 2023, Microsoft unveils its first AI chip, Maia 100
- Nov 16, 2023, Meta: Emu Video and Emu Edit
- Nov 15, 2023, OpenAI puts pause on new ChatGPT Plus signups
- Nov 14, 2023, OpenAI started building GPT-5
- Nov 13, 2023, 01.AI released LLMs YI-34B and YI-6B with 200K context
- Nov 09, 2023, AI Pin from Humane officially launched
- Nov 09, 2023, Amazon Generative AI Model: Olympus (size 2x GPT-4)
- Nov 09, 2023, Samsung Generative AI Model: Gauss
- Nov 08, 2023, GitHub Universe 2023 (Copilot Workspace in 2024)
- Nov 06, 2023, OpenAI DevDay: Create and share AI agents 
- Nov 06, 2023, xAI’s Grok will be integrated into Tesla vehicles
- Nov 06, 2023, Chinese startup 01.AI released open-source LLM Yi-34B
- Nov 04, 2023, xAI reveals Grok chatbot with real-time data from X
- Nov 03, 2023, RoboGen: Robot Learning via Generative Simulation
- Nov 03, 2023, Adept releases multimodal Fuyu-8b model
- Nov 03, 2023, Phind model fine-tuned from CodeLlama on GPT-4 level (?)
- Nov 02, 2023, UK’s AI safety summit at Bletchley Park 
- Nov 02, 2023, Runway makes huge updates to Gen-2
- Nov 02, 2023, LLM Yarn-Mistral-7b-128k
- Nov 02, 2023, Stability AI announced Stable 3D
- Oct 31, 2023, DeepMind shares progress on next-gen AlphaFold
- Oct 30, 2023, Biden’s Executive Order on Safe, Secure, Trustworthy AI
- Oct 30, 2023, GPT-3.5-Turbo in ChatGPT is a 20B model (?)
- Oct 29, 2023, Vision on input and output within ChatGPT, PDF Chat
- Oct 28, 2023, Video-to-text LLM: Pegasus-1 by Twelve Labs
- Oct 26, 2023, ChatGPT knowledge cutoff Sept 2023 (?)
- Oct 21, 2023, OpenAgents: open platform for language agents
- Oct 21, 2023, IBM announced powerful AI chip
- Oct 20, 2023, Meta announced Habitat 3.0
- Oct 20, 2023, NVIDIA announced Eureka: super-human robot dexterity
- Oct 20, 2023, Amazon trialing humanoid robots in US warehouses
- Oct 18, 2023, FlashDecoding: up to 8X faster inference on long sequences
- Oct 18, 2023, Meta: reconstructing visual and speech from MEG
- Oct 17, 2023, Llemma: An Open Language Model for Mathematics
- Oct 17, 2023, PyTorch conference: torch.compile decorator for numpy
- Oct 16, 2023, MemGPT, a method for extending LLM context windows
- Oct 13, 2023, Tectonic Copilot shift announced for GitHub Universe Nov 8
- Oct 13, 2023, Connor Leahy’s speech in Cambridge on AI Existential Risk
- Oct 12, 2023, State of AI Report
- Oct 11, 2023, Meta introduces Universal Simulator (UniSim)
- Oct 10, 2023, Adobe Firefly update
- Oct 03, 2023, GAIA-1: 9B world model to simulate driving scenes
- Oct 03, 2023, RT-X: largest open-source robot dataset ever compiled
- Oct 02, 2023, StreamingLLM: 22 times faster, up to 4 million tokens 
After weeks of “less exciting” news in the AI space since the release of Llama 2 by Meta on July 18, 2023, there were a bunch of announcements in the last few days by major players in the AI space:
- Google/DeepMind: Bard extensions and multimodal LLM Gemini
- OpenAI: DALL-E3 and GPT-Vision in ChatGPT, Gobi
- Microsoft: Windows Copilot with DALL-E3 access
- Amazon: Generative AI in Alexa, $4B investment in Anthropic
- Meta: Meta AI, Ray-Ban, Emu, AI studio
Here are some links to the news of the last weeks:
- Sep 28, 2023, Amazon: Securely customize CodeWhisperer
- Sep 27, 2023, Meta: Meta AI assistant, Ray-Ban smart glasses, Emu, AI studio
- Sep 25, 2023, ChatGPT can now see, hear, and speak
- Sep 25, 2023, Amazon invests $4B in Anthropic, Claude in Bedrock
- Sep 25, 2023, Spotify clones voices and translates them
- Sep 21, 2023, Announcing Microsoft Copilot
- Sep 20, 2023, Amazon brings generative AI to Alexa
- Sep 20, 2023, OpenAI Announces DALL·E 3 in Research Preview
- Sep 20, 2023, GitHub Copilot Chat beta now available for all individuals
- Sep 19, 2023, Google Bard September update: App extensions
- Sep 19, 2023, OpenAI’s multimodal LLM GPT-Vision to beat Google Gemini
- Sep 16, 2023, DeepMind: LLMs can optimize their own prompts
- Sep 15, 2023, Google nears release of AI software Gemini
- Sep 15, 2023, Google Gemini: What We Know So Far
- Sep 13, 2023, Stable Audio by Stability AI for music & sound generation
- Sep 07, 2023, Anthropic introduces Claude Pro
- Sep 06, 2023, Falcon 180B
- Aug 31, 2023, Baidu launches Ernie chatbot
- Aug 29, 2023, Duet AI for Google Workspace now generally available
- Aug 28, 2023, Meta plans to take on GPT-4 with a rumored Llama 3
- Aug 28, 2023, Introducing ChatGPT Enterprise
- Aug 27, 2023, Google Gemini Smashes GPT-4 By 5X
- Aug 24, 2023, Introducing Code Llama
- Aug 22, 2023, GPT-3.5 Turbo fine-tuning and API updates
- Aug 22, 2023, ElevenLabs releases Eleven Multilingual v2
- Aug 21, 2023, MidJourney Adds Inpainting Feature
- Aug 16, 2023, Adobe Express with AI Firefly app is released worldwide
- Aug 10, 2023, ChatGPT expands its ‘custom instructions’ feature
- Aug 08, 2023, Announcing StableCode — Stability AI
- Aug 05, 2023, Tim Cook says Apple is building AI into ‘every product’
- Aug 03, 2023, Every single Amazon team is working on generative AI
- Aug 02, 2023, AudioCraft by Meta
- Jul 31, 2023, ChatGPT for Android in all countries
In a recent study by the University of Montana, GPT-4 demonstrated remarkable performance in the Torrance Tests of Creative Thinking (TTCT, a standard test for measuring creativity), matching the top 1% of human thinkers. The model excelled in fluency and originality. These findings imply that the creative abilities of GPT-4 could potentially surpass those of humans.
For a recent benchmark on advanced reasoning capabilities of large language models take a look at the ARB (Advanced Reasoning Benchmark).
The ChatGPT code interpreter allows users to run code and upload individual data files (in .csv, .xlsx, or .json format) for analysis. Multiple files can be uploaded sequentially or within one zip file. To upload a file, click the ‘+’ symbol just to the left of the ‘Send a message’ box or, even simpler, use drag and drop.
The code interpreter functionality is accessible to ChatGPT Plus users and can be enabled in the settings under ‘Beta features’. Once enabled, this functionality will then appear in the configuration settings of any new chat under the ‘GPT-4’ section, where it also needs to be activated.
Given a prompt, the code interpreter will generate Python code that is then automatically executed in a sandboxed Python environment. If something goes wrong, for instance, if the generated source code requires the installation of a Python package or if the source code is simply incorrect, the code interpreter automatically attempts to fix these errors and tries again. This feature makes working with the code interpreter much more efficient. Before, it was necessary to paste ChatGPT’s proposal into a Jupyter notebook and run it from there. If errors occurred, these had to be fixed either independently or by manually pasting the error text back into ChatGPT so that it could provide a solution. This manual iterative procedure has now been automated with the code interpreter.
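The generate-execute-retry loop described above can be sketched as follows. Everything here is a stand-in: `generate_code` plays the role of the LLM call, and plain `exec` plays the role of OpenAI's sandbox.

```python
import traceback
from typing import Optional

def generate_code(prompt: str, error: Optional[str] = None) -> str:
    """Stand-in for the LLM: returns Python source for the prompt.
    On a retry, the previous error message is part of the context."""
    if error is None:
        return "result = 1 / 0"          # first attempt: deliberately buggy code
    return "result = sum(range(10))"     # 'fixed' code after seeing the traceback

def run_with_retries(prompt: str, max_attempts: int = 3):
    error = None
    for _ in range(max_attempts):
        code = generate_code(prompt, error)
        scope: dict = {}
        try:
            exec(code, scope)            # sandboxed execution in the real system
            return scope.get("result")
        except Exception:
            error = traceback.format_exc()  # feed the error back to the LLM
    raise RuntimeError("giving up after repeated failures")

print(run_with_retries("sum the numbers 0..9"))  # → 45
```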
Note that the code interpreter executes the source code on OpenAI’s servers, not in the local environment. This leads to restrictions on the size of the uploaded data, as well as a very stringent time limit of 120s for the execution of the code. Given this, it becomes clear what developers truly desire. They seek the integration of this feature into their local development environment, such as VSCode, or within a cloud service, such as AWS, GCP, or Azure, without any restrictions on data size or execution times. This then leans more towards the direction of projects like AutoGPT or GPT Engineer. It’s likely only a matter of days, weeks, or months before such functionality becomes widely available. It’s also probable that complete access to your code repository will be enabled, first through a vector database solution and after some time maybe by including the entire repository within prompts, which are currently increasing dramatically in size (as exemplified in LongNet; since this requires retraining of the LLM such solutions cannot be expected to become available before GPT-4.5 or GPT-5).
For testing, try e.g. the following prompts:
- What is the current time?
- Plot the graphs of sin(x) and cos(x) in a single graph
- Make a QR-code of my contact information: Stephan Seeger; Homepage: domain-seeger.de
or after uploading a data set (e.g. from Kaggle)
- Explain the dataset.
- Show 4 different ways of displaying the data visually.
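For the sin/cos prompt above, the interpreter would typically generate matplotlib code along these lines (a sketch of plausible output, not the interpreter's actual response):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; the sandbox renders to an image file
import matplotlib.pyplot as plt
import numpy as np

# Sample x over one full period and plot both curves in a single figure.
x = np.linspace(0, 2 * np.pi, 200)
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")
plt.xlabel("x")
plt.legend()
plt.savefig("sin_cos.png")
```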
On June 13, 2023, OpenAI announced a number of updates to their API:
- new function calling capability in the Chat Completions API
- new 16K context version of gpt-3.5-turbo with twice the price of the standard 4K version ($0.003 per 1K input tokens and $0.004 per 1K output tokens)
- 75% cost reduction on the embeddings model ($0.0001 per 1K tokens)
- 25% cost reduction on input tokens for gpt-3.5-turbo ($0.0015 per 1K input tokens and $0.002 per 1K output tokens)
- stable model names (e.g. gpt-4-32k) will automatically be upgraded to the new models (e.g. gpt-4-32k-0613) on June 27
- deprecation of the gpt-4-0314 models after Sept 13
All models come with the same data privacy and security guarantees introduced on March 1, i.e. requests and API data will not be used for training.
The new function calling capability in gpt-4-0613, enabled by the new API parameters functions and function_call in the /v1/chat/completions endpoint, allows e.g. the following use cases:
- Chatbots that answer questions by calling external tools (like ChatGPT Plugins)
- Convert natural language into API calls or database queries
- Extract structured data from text.
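The mechanics can be sketched as follows. The schema shape follows OpenAI's documented `functions` parameter; the model response is mocked here rather than fetched from the API, since the model only returns a function name plus JSON arguments and the application does the actual call:

```python
import json

# Function schema passed to the Chat Completions API via the `functions` parameter.
functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City, e.g. Hamburg"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}]

# Mocked assistant message, shaped like the API's `function_call` response field.
message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_current_weather",
        "arguments": '{"location": "Hamburg", "unit": "celsius"}',
    },
}

def get_current_weather(location: str, unit: str = "celsius") -> str:
    return f"12 degrees {unit} in {location}"  # dummy implementation

# The application parses the arguments and dispatches the call itself:
call = message["function_call"]
args = json.loads(call["arguments"])
result = {"get_current_weather": get_current_weather}[call["name"]](**args)
print(result)  # → 12 degrees celsius in Hamburg
```

In the real flow, `result` would be appended to the conversation as a `function`-role message so the model can phrase the final answer.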
OpenAI has officially launched the ChatGPT app for iOS users in the US. The app comes with a range of notable features:
- Free of Charge: The ChatGPT app can be downloaded and used free of cost.
- Sync Across Devices: Users can maintain their chat history consistently across multiple devices.
- Voice Input via Whisper: The app includes integration with Whisper, OpenAI’s open-source speech-recognition system, allowing users to input via voice commands.
- Exclusive Benefits for ChatGPT Plus Subscribers: Those who subscribe to ChatGPT Plus can utilize GPT-4’s enhanced capabilities. They also receive early access to new features and benefit from faster response times.
- Initial US Rollout: The app is initially launching in the US, with a plan to expand its availability to other countries in the upcoming weeks.
- Android Version Coming Soon: OpenAI has confirmed that Android users can expect to see the ChatGPT app on their devices in the near future. Further updates are expected soon.
1st-level generative AI as applications that are directly based on X-to-Y models (foundation models that build a kind of operating system for downstream tasks) where X and Y can be text/code, image, segmented image, thermal image, speech/sound/music/song, avatar, depth, 3D, video, 4D (3D video, NeRF), IMU (Inertial Measurement Unit), amino acid sequences (AAS), 3D-protein structure, sentiment, emotions, gestures, etc., e.g.
- X = text, Y = text: LLM-based chatbots like ChatGPT (from OpenAI based on LLMs GPT-3.5 [4K context] or GPT-4 [8K/32K context]), Bing Chat (GPT-4), Bard (from Google, based on PaLM 2), Claude (from Anthropic [100K context]), Llama2 (from Meta), Falcon 180B (from Technology Innovation Institute), Alpaca, Vicuna, OpenAssistant, HuggingChat (all based on LLaMA [GitHub] from Meta), OpenChatKit (based on EleutherAI’s GPT-NeoX-20B), CarperAI, Guanaco, My AI (from Snapchat), Tingwu (from Alibaba based on Tongyi Qianwen), (other LLMs: MPT-7B and MPT-30B from Mosaic [65K context, commercially usable], Orca, Open-LLama-13b), or coding assistants (like GitHub Copilot / OpenAI Codex, AlphaCode from DeepMind, CodeWhisperer from Amazon, Ghostwriter from Replit, CodiumAI, Tabnine, Cursor, Cody (from Sourcegraph), StarCoder from Big Code Project led by Hugging Face, CodeT5+ from Salesforce, Gorilla, StableCode from Stability.AI, Code Llama from Meta), or writing assistants (like Jasper, Copy.AI), etc.
- X = text, Y = image: Dall-E (from OpenAI), Midjourney, Stable Diffusion (from Stability.AI), Adobe Firefly, DeepFloyd-IF (from Deep Floyd, [GitHub, HuggingFace]), Imagen and Parti (from Google), Perfusion (from NVIDIA)
- X = text, Y = 360° image: Skybox AI (from Blockade Labs)
- X = text, Y = 3D avatar: Tafi
- X = text, Y = avatar lip sync: Ex-Human, D-ID, Synthesia, Colossyan, Hour One, Movio, YEPIC-AI, Elai.io
- X = speech + face video, Y = synched audio-visual: Lalamu
- X = text, Y = video: Gen-2 (from Runway Research), Imagen-Video (from Google), Make-A-Video (from Meta), or from NVIDIA
- X = text, Y = video game: Muse & Sentis (from Unity)
- X = image, Y = text: GPT-4 (from OpenAI), LLaVA
- X = image, Y = segmented image: Segment Anything Model (SAM by Meta)
- X = speech, Y = text: STT (speech-to-text engines) like Whisper (from OpenAI), MMS [GitHub] (from Meta), Conformer-2 (from AssemblyAI)
- X = text, Y = speech: TTS (text-to-speech engines) like VALL-E (from Microsoft), Voicebox (from Meta), SoundStorm (from Google), ElevenLabs, Bark, Coqui
- X = text, Y = music: MusicLM (from Google), RIFFUSION, AudioCraft (MusicGen, AudioGen, EnCodec from Meta), Stable Audio (from Stability.ai)
- X = text, Y = song: Voicemod
- X = text, Y = 3D: DreamFusion (from Google)
- X = text, Y = 4D: MAV3D (from Meta)
- X = image, Y = 3D: CSM
- X = image, Y = audio: ImageBind (from Meta, on GitHub)
- X = audio, Y = image: ImageBind (from Meta)
- X = music, Y = image: MusicToImage
- X = text, Y = image & audio: ImageBind (from Meta)
- X = audio & image, Y = image: ImageBind (from Meta)
- X = IMU, Y = video: ImageBind (from Meta)
- X = AAS, Y = 3D-protein: AlphaFold (from Google), RoseTTAFold (from Baker Lab), ESMFold (from Meta)
- X = 3D-protein, Y = AAS: ProteinMPNN (from Baker Lab)
- X = 3D structure, Y = AAS: RFdiffusion (from Baker Lab)
and 2nd-level generative AI that builds a kind of middleware and makes it possible to implement agents by simplifying the combination of LLM-based 1st-level generative AI with other tools via actions (like web search, semantic search [based on embeddings and vector databases like Pinecone, Chroma, Milvus, Faiss], source code generation [REPL], calls to math tools like Wolfram Alpha, etc.), and by using special prompting techniques (like templates, Chain-of-Thought [CoT], Self-Consistency, Self-Ask, Tree of Thoughts, ReAct [Reason + Act], Graph of Thoughts) within action chains, e.g.
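As an illustration of the ReAct pattern mentioned above, here is a minimal agent loop with a scripted stand-in for the LLM. Everything in it, including the tool, is hypothetical; real frameworks add prompt templates, tool descriptions, and error handling:

```python
import re

def calculator(expression: str) -> str:
    """A 'tool' the agent can invoke. Toy sandbox; not safe for untrusted input."""
    return str(eval(expression, {"__builtins__": {}}))

# Scripted LLM replies: a reasoning step with an action, then the final answer.
scripted_replies = iter([
    "Thought: I should compute this.\nAction: calculator[17 * 23]",
    "Thought: I have the result.\nFinal Answer: 391",
])

def llm(prompt: str) -> str:
    return next(scripted_replies)  # stand-in for a real LLM call

def react_agent(question: str, tools: dict, max_steps: int = 5) -> str:
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(prompt)                       # Reason
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[1].strip()
        m = re.search(r"Action: (\w+)\[(.+)\]", reply)
        if m:
            observation = tools[m.group(1)](m.group(2))   # Act
            prompt += f"\n{reply}\nObservation: {observation}"

    raise RuntimeError("no answer within step budget")

answer = react_agent("What is 17 * 23?", {"calculator": calculator})
print(answer)  # → 391
```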
We currently (April/May/June 2023) see a 3rd level of generative AI that implements agents which can solve complex tasks through the interaction of different LLMs in complex chains, e.g.
- Llama Lab (llama_agi, auto_llama)
- Camel, Camel-AutoGPT
- JARVIS (from Microsoft)
- Generative Agents
- ACT-1 (from Adept)
- GPT Engineer
However, older publications like Cicero may also fall into this category of complex applications. Typically, these agent implementations are (currently) not built on top of the 2nd-level generative AI frameworks. But this is going to change.
Other, simpler applications that just allow semantic search over private documents with a locally hosted LLM and local embedding generation, such as PrivateGPT (based on LangChain and Llama, with functionality similar to OpenAI’s ChatGPT-Retrieval plugin), may also be of interest in this context. Also worth noting are applications that concentrate on the code-generation ability of LLMs, like GPT-Code-UI and OpenInterpreter (both open-source implementations of OpenAI’s ChatGPT Code Interpreter/Advanced Data Analysis, similar to Bard’s implicit code execution; an alternative to Code Interpreter is the Noteable plugin), or smol-ai developer (which generates complete source code from a markup description).
There is a nice overview of LLM Powered Autonomous Agents on GitHub.
The next level may then be governed by embodied LLMs and agents (like PaLM-E with E for Embodied).
The Future of Life Institute initiated an open letter in which they call on all AI labs to immediately pause, for at least 6 months, the training of AI systems more powerful than GPT-4 [notice that OpenAI has already been training GPT-5 for some time]. They state that powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable.
The gained time should be used to develop safety protocols by AI experts to make the systems more accurate, safe, interpretable, transparent, robust, aligned, trustworthy, and loyal. In addition, they ask for the development of robust AI governance systems by policymakers and AI developers. They also demand well-resourced institutions for coping with the dramatic economic and political disruptions (especially to democracy) that AI will cause.
Notice that the letter is not against further AI development but just to slow down and give society a chance to adapt.
The letter was signed by several influential people, e.g. Elon Musk (CEO of SpaceX, Tesla & Twitter), Emad Mostaque (CEO of Stability AI), Yuval Noah Harari (Author), Max Tegmark (president of Future of Life Institute), Yoshua Bengio (Mila, Turing Prize winner), Stuart Russell (Berkeley).
However, it should be noted that even more influential people in the AI scene have not (yet) signed this letter: none from OpenAI, Google/DeepMind, or Meta.
This is not the first time the Future of Life Institute has taken action on AI development. In 2015, they presented an open letter signed by over 1000 robotics and AI researchers urging the United Nations to impose a ban on the development of weaponized AI.
The Future of Life Institute is a non-profit organization that aims to mitigate existential risks facing humanity, including those posed by AI.
Yann LeCun answered on Twitter with a nice fictitious anecdote to the request:
The year is 1440 and the Catholic Church has called for a 6 months moratorium on the use of the printing press and the movable type. Imagine what could happen if commoners get access to books! They could read the Bible for themselves and society would be destroyed.
Plugins can be automatically called by ChatGPT’s underlying LLM (Large Language Model, currently GPT-3.5 or GPT-4) in order to answer the questions of the user.
In order to make this work, plugins have to be registered in the ChatGPT user interface with a manifest file (ai-plugin.json) that is hosted in the developer’s domain at yourdomain.com/.well-known/ai-plugin.json. The file contains, in a prescribed format,
– metadata about the plugin (name, logo)
– details about the authentication mechanism
– an OpenAPI specification for the endpoints of the API
– and a general description for the LLM of what the plugin can do.
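A minimal manifest looks roughly like this (field names per OpenAI's plugin documentation; all concrete values are placeholders):

```json
{
  "schema_version": "v1",
  "name_for_human": "TODO Plugin",
  "name_for_model": "todo",
  "description_for_human": "Manage your TODO list.",
  "description_for_model": "Plugin for managing a TODO list. Use it to add, remove, and view TODOs.",
  "auth": { "type": "none" },
  "api": {
    "type": "openapi",
    "url": "https://yourdomain.com/openapi.yaml"
  },
  "logo_url": "https://yourdomain.com/logo.png",
  "contact_email": "support@yourdomain.com",
  "legal_info_url": "https://yourdomain.com/legal"
}
```

The description_for_model field is what the LLM actually reads when deciding whether to call the plugin, so it should be written for the model, not for humans.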
The web app API needs to define an endpoint “/.well-known/ai-plugin.json” to access the content of this file.
In addition to the manifest file, an openapi.yaml file that defines the OpenAPI specification has to be generated; it is referenced in the “api” section of the manifest file via the “url” field. This file contains a detailed description of the API endpoints. The web app API needs to define an endpoint “/openapi.yaml” to serve the content of this file.
When the user has activated a registered plugin and starts a conversation, the plugin’s description is injected into the message to ChatGPT, but invisible to the user. In this way, the LLM may choose an API call from the plugin if this seems relevant to the user’s question. The LLM will then incorporate the API result into the response to the user. More details can be found in OpenAI’s documentation.
Among the already available plugins, a few stand out. With the Wolfram plugin, all kinds of computational problems can be solved. And with the Zapier plugin, more than 5000 apps can be accessed. OpenAI itself introduced a web browser (that uses the Bing search API) and a code interpreter plugin (that runs in a sandbox without an internet connection). In addition, they open-sourced the code for a knowledge base retrieval plugin, that has to be self-hosted by a developer.
Interestingly enough, OpenAI notes that plugins will likely have wide-ranging societal implications and that language models with access to tools will likely have much greater economic impacts than those without. They expect the current wave of AI technologies to have a big effect on the pace of job transformation, displacement, and creation. OpenAI discusses the potential impact of large language models on the labor market in a recent publication.
Just a day after the OpenAI announcement of ChatGPT plugins, the open-source community had already integrated these plugins into LangChain as well. This is done simply by referring to the plugin manifest file ai-plugin.json (see Twitter), e.g.:

```python
from langchain.tools import AIPluginTool

tool = AIPluginTool.from_plugin_url("https://www.klarna.com/.well-known/ai-plugin.json")
```
All the other exciting news of the week is well summarized by Matt Wolfe (Google Bard, NVIDIA GTC, Adobe Firefly, Image Generation in Bing via DALL-E2, Microsoft Loop, AI in Canva, GitHub Copilot X, AI in Ubisoft, Metahuman by Unreal Engine).