- Nov 22, 2023, Inflection-2 asserted to be second best LLM (for Pi.ai)
- Nov 21, 2023, ChatGPT with voice available to all free users
- Nov 21, 2023, StabilityAI open-sources Stable Video Diffusion
- Nov 21, 2023, Anthropic announces Claude 2.1 with 200k context [tests]
- Nov 17, 2023, Sam Altman fired from OpenAI, Greg Brockman quits
- Nov 17, 2023, Google delays release of Gemini
- Nov 16, 2023, Recap of announcements at Microsoft Ignite 2023 (keynote)
- Nov 16, 2023, Microsoft unveils its first AI chip, Maia 100
- Nov 16, 2023, Meta: Emu Video and Emu Edit
- Nov 15, 2023, OpenAI puts pause on new ChatGPT Plus signups
- Nov 14, 2023, OpenAI started building GPT-5
- Nov 13, 2023, 01.AI released LLMs YI-34B and YI-6B with 200K context
- Nov 09, 2023, AI Pin from Humane officially launched
- Nov 09, 2023, Amazon Generative AI Model: Olympus (size 2x GPT-4)
- Nov 09, 2023, Samsung Generative AI Model: Gauss
- Nov 08, 2023, GitHub Universe 2023 (Copilot Workspace in 2024)
- Nov 06, 2023, OpenAI DevDay: Create and share AI agents 
- Nov 06, 2023, xAI’s Grok will be integrated into Tesla vehicles
- Nov 06, 2023, Chinese startup 01.AI released open-source LLM Yi-34B
- Nov 04, 2023, xAI reveals Grok chatbot with real-time data from X
- Nov 03, 2023, RoboGen: Robot Learning via Generative Simulation
- Nov 03, 2023, Adept releases multimodal Fuyu-8b model
- Nov 03, 2023, Phind model fine-tuned from CodeLlama at GPT-4 level?
- Nov 02, 2023, UK’s AI safety summit at Bletchley Park 
- Nov 02, 2023, Runway makes huge updates to Gen-2
- Nov 02, 2023, LLM Yarn-Mistral-7b-128k
- Nov 02, 2023, Stability AI announced Stable 3D
- Oct 31, 2023, DeepMind shares progress on next-gen AlphaFold
- Oct 30, 2023, Biden’s Executive Order on Safe, Secure, Trustworthy AI
- Oct 30, 2023, GPT-3.5-Turbo in ChatGPT is a 20B model (?)
- Oct 29, 2023, Vision on input and output within ChatGPT, PDF Chat
- Oct 28, 2023, Video-to-text LLM: Pegasus-1 by Twelve Labs
- Oct 26, 2023, ChatGPT knowledge cutoff Sept 2023 (?)
- Oct 21, 2023, OpenAgents: open platform for language agents
- Oct 21, 2023, IBM announced powerful AI chip
- Oct 20, 2023, Meta announced Habitat 3.0
- Oct 20, 2023, NVIDIA announced Eureka: super-human robot dexterity
- Oct 20, 2023, Amazon trialing humanoid robots in US warehouses
- Oct 18, 2023, FlashDecoding: up to 8X faster inference on long sequences
- Oct 18, 2023, Meta: reconstructing visual and speech from MEG
- Oct 17, 2023, Llemma: An Open Language Model for Mathematics
- Oct 17, 2023, PyTorch conference: torch.compile decorator for numpy
- Oct 16, 2023, MemGPT, a method for extending LLM context windows
- Oct 13, 2023, Tectonic Copilot shift announced for GitHub Universe Nov 8
- Oct 13, 2023, Connor Leahy’s speech in Cambridge on AI Existential Risk
- Oct 12, 2023, State of AI Report
- Oct 11, 2023, Meta introduces Universal Simulator (UniSim)
- Oct 10, 2023, Adobe Firefly update
- Oct 03, 2023, GAIA1: 9B world model to simulate driving scenes
- Oct 03, 2023, RT-X: largest open-source robot dataset ever compiled
- Oct 02, 2023, StreamingLLM: 22 times faster, up to 4 million tokens 
1st-level generative AI as applications that are directly based on X-to-Y models (foundation models that build a kind of operating system for downstream tasks) where X and Y can be text/code, image, segmented image, thermal image, speech/sound/music/song, avatar, depth, 3D, video, 4D (3D video, NeRF), IMU (Inertial Measurement Unit), amino acid sequences (AAS), 3D-protein structure, sentiment, emotions, gestures, etc., e.g.
- X = text, Y = text: LLM-based chatbots like ChatGPT (from OpenAI based on LLMs GPT-3.5 [4K context] or GPT-4 [8K/32K context]), Bing Chat (GPT-4), Bard (from Google, based on PaLM 2), Claude (from Anthropic [100K context]), Llama2 (from Meta), Falcon 180B (from Technology Innovation Institute), Alpaca, Vicuna, OpenAssistant, HuggingChat (all based on LLaMA [GitHub] from Meta), OpenChatKit (based on EleutherAI’s GPT-NeoX-20B), CarperAI, Guanaco, My AI (from Snapchat), Tingwu (from Alibaba based on Tongyi Qianwen), (other LLMs: MPT-7B and MPT-30B from Mosaic [65K context, commercially usable], Orca, Open-LLama-13b), or coding assistants (like GitHub Copilot / OpenAI Codex, AlphaCode from DeepMind, CodeWhisperer from Amazon, Ghostwriter from Replit, CodiumAI, Tabnine, Cursor, Cody (from Sourcegraph), StarCoder from Big Code Project led by Hugging Face, CodeT5+ from Salesforce, Gorilla, StableCode from Stability.AI, Code Llama from Meta), or writing assistants (like Jasper, Copy.AI), etc.
- X = text, Y = image: Dall-E (from OpenAI), Midjourney, Stable Diffusion (from Stability.AI), Adobe Firefly, DeepFloyd-IF (from Deep Floyd, [GitHub, HuggingFace]), Imagen and Parti (from Google), Perfusion (from NVIDIA)
- X = text, Y = 360° image: Skybox AI (from Blockade Labs)
- X = text, Y = 3D avatar: Tafi
- X = text, Y = avatar lip sync: Ex-Human, D-ID, Synthesia, Colossyan, Hour One, Movio, YEPIC-AI, Elai.io
- X = speech + face video, Y = synched audio-visual: Lalamu
- X = text, Y = video: Gen-2 (from Runway Research), Imagen-Video (from Google), Make-A-Video (from Meta), or from NVIDIA
- X = text, Y = video game: Muse & Sentis (from Unity)
- X = image, Y = text: GPT-4 (from OpenAI), LLaVA
- X = image, Y = segmented image: Segment Anything Model (SAM by Meta)
- X = speech, Y = text: STT (speech-to-text engines) like Whisper (from OpenAI), MMS [GitHub] (from Meta), Conformer-2 (from AssemblyAI)
- X = text, Y = speech: TTS (text-to-speech engines) like VALL-E (from Microsoft), Voicebox (from Meta), SoundStorm (from Google), ElevenLabs, Bark, Coqui
- X = text, Y = music: MusicLM (from Google), RIFFUSION, AudioCraft (MusicGen, AudioGen, EnCodec from Meta), Stable Audio (from Stability.ai)
- X = text, Y = song: Voicemod
- X = text, Y = 3D: DreamFusion (from Google)
- X = text, Y = 4D: MAV3D (from Meta)
- X = image, Y = 3D: CSM
- X = image, Y = audio: ImageBind (from Meta, on GitHub)
- X = audio, Y = image: ImageBind (from Meta)
- X = music, Y = image: MusicToImage
- X = text, Y = image & audio: ImageBind (from Meta)
- X = audio & image, Y = image: ImageBind (from Meta)
- X = IMU, Y = video: ImageBind (from Meta)
- X = AAS, Y = 3D-protein: AlphaFold (from Google), RoseTTAFold (from Baker Lab), ESMFold (from Meta)
- X = 3D-protein, Y = AAS: ProteinMPNN (from Baker Lab)
- X = 3D structure, Y = AAS: RFdiffusion (from Baker Lab)
and 2nd-level generative AI that builds some kind of middleware and allows agents to be implemented by simplifying the combination of LLM-based 1st-level generative AI with other tools via actions (like web search, semantic search [based on embeddings and vector databases like Pinecone, Chroma, Milvus, Faiss], source code generation [REPL], calls to math tools like Wolfram Alpha, etc.) and by using special prompting techniques (like templates, Chain-of-Thought [CoT], Self-Consistency, Self-Ask, Tree of Thoughts, ReAct [Reason + Act], Graph of Thoughts) within action chains, e.g.
we currently (April/May/June 2023) see a 3rd-level of generative AI that implements agents that can solve complex tasks by the interaction of different LLMs in complex chains, e.g.
- Llama Lab (llama_agi, auto_llama)
- Camel, Camel-AutoGPT
- JARVIS (from Microsoft)
- Generative Agents
- ACT-1 (from Adept)
- GPT Engineer
However, older publications like Cicero may also fall into this category of complex applications. Typically, these agent implementations are (currently) not built on top of the 2nd-level generative AI frameworks. But this is going to change.
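To make the idea of an agent that combines an LLM with tools via actions more concrete, here is a minimal sketch of a ReAct-style loop (Thought, Action, Observation). Everything in it is an assumption made for illustration: `scripted_model` is a hard-coded stand-in for a real LLM call, `calculator` is a toy tool, and the `Action: tool[input]` text format is just one common convention, not the API of any particular framework.

```python
import operator
import re

# Toy tool: evaluates "<number> <op> <number>" (hypothetical helper, not a real API).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def calculator(expression: str) -> str:
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))

def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: first asks for a tool call, then answers."""
    if "Observation:" not in transcript:
        return "Thought: I need to multiply.\nAction: calculator[12 * 7]"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have the result.\nFinal Answer: {observation}"

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")

def run_agent(question, model, tools, max_steps=5):
    """Alternate model replies and tool observations until a final answer appears."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            raise ValueError("model produced neither an action nor a final answer")
        tool_name, tool_input = match.groups()
        # Feed the tool result back so the next model call can use it.
        transcript += f"\nObservation: {tools[tool_name](tool_input)}"
    raise RuntimeError("no final answer within max_steps")

answer = run_agent("What is 12 * 7?", scripted_model, {"calculator": calculator})
print(answer)
```

In a real setup the scripted function would be replaced by a call to an actual LLM, and the tool dictionary would hold things like web search or a code interpreter; the loop structure stays the same.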
Other, simpler applications that just allow semantic search over private documents with a locally hosted LLM and embedding generation may also be of interest in this context, for example PrivateGPT, which is based on LangChain and Llama (functionality similar to OpenAI’s ChatGPT-Retrieval plugin). Also worth noting are applications that concentrate on the code generation ability of LLMs, like GPT-Code-UI and OpenInterpreter, both open-source implementations of OpenAI’s ChatGPT Code Interpreter/Advanced Data Analysis (similar to Bard’s implicit code execution; an alternative to Code Interpreter is the plugin Noteable), or smol-ai developer (which generates the complete source code from a markup description).
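The core of semantic search over private documents can be sketched in a few lines: turn the query and every document into a vector, then rank documents by cosine similarity. The sketch below is deliberately simplified and rests on a loud assumption: `embed` is a bag-of-words counter standing in for a learned embedding model, and a plain Python list stands in for a vector database. Tools such as PrivateGPT use real embedding models and stores instead.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (a stand-in for a learned dense vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:top_k]

docs = [
    "RFdiffusion designs new protein backbones",
    "Whisper converts speech to text",
    "Stable Diffusion generates images from text",
]
print(search("speech to text engine", docs))
```

The retrieved passages would then typically be pasted into the prompt of the locally hosted LLM, which answers the question based on them.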
There is a nice overview of LLM Powered Autonomous Agents on GitHub.
The next level may then be governed by embodied LLMs and agents (like PaLM-E with E for Embodied).
On March 30, 2023, the Baker Lab announced that RFdiffusion (a powerful guided diffusion model for protein design) is now free and open source. The source code is available on ColabFold (as a Google Colab) and on GitHub.
Proteins made via RFdiffusion have the potential to prevent infections, combat cancer, reverse autoimmune disorders, and serve as key components in advanced materials.
While ProteinMPNN takes a protein backbone (N-CA-C-O atoms, CA = C-Alpha) and finds an amino acid sequence that would fold to that backbone structure, RFdiffusion [Twitter] instead generates the protein backbone itself from just a few geometrical and functional constraints like “create a molecule that binds X”.
The authors used a guided diffusion model for generating new proteins in the same way as Dall-E produces high-quality images that have never existed before by a diffusion technique.
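For readers unfamiliar with the diffusion technique, the following sketch shows only the generic forward (noising) step of a DDPM-style diffusion model, applied to a made-up list of 3D backbone coordinates. It is an illustration under stated assumptions, not RFdiffusion's actual scheme: the linear noise schedule, its parameters, and the function names `alpha_bars` and `add_noise` are all chosen here for the example. The generative model is trained to reverse this corruption step by step.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fractions for a linear noise schedule (assumed values)."""
    bars, product = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        product *= 1.0 - beta
        bars.append(product)
    return bars

def add_noise(coords, alpha_bar, rng):
    """x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * gaussian noise."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [
        tuple(signal * c + noise * rng.gauss(0.0, 1.0) for c in point)
        for point in coords
    ]

# Hypothetical C-alpha positions; real backbones come from structure files.
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
schedule = alpha_bars(1000)
rng = random.Random(0)
slightly_noised = add_noise(backbone, schedule[10], rng)   # still close to the input
almost_pure_noise = add_noise(backbone, schedule[-1], rng)  # structure is destroyed
```

Generation runs this process in reverse: starting from pure noise, a trained network removes a little noise at each step, optionally guided by constraints.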
See also this presentation by David Baker.
If I interpret this announcement correctly, it means that drug design is now basically solved (or starts to get interesting, depending on the viewpoint).
This technique can be expected to significantly increase the number of potential drugs for combating diseases. However, animal tests and human studies can be expected to become the bottlenecks for these new possibilities. Techniques like organ chips from companies like Emulate may be a way out of this dilemma (before, one day, entire cell, tissue, or whole-body computational simulations become possible).
The software tool ProteinMPNN (Message Passing Neural Network) from Baker Lab can predict, from a given 3D protein structure, possible amino acid sequences that would fold into that structure, effectively reversing what AlphaFold from DeepMind or ESMFold from Meta do. The approach thus makes it possible to design proteins. With a DNA/RNA printer such as the BioXp from TelesisBio or the Syntax system from DNA Script, it is possible to directly output the desired protein or a virus that generates the protein in a cell when injected into the body.
ESMFold (ESM = Evolutionary Scale Modeling) [paper] uses a large language model to accelerate folding (i.e. predicting the 3D structure of a protein from the DNA sequence [that encodes the amino acid sequence]) by up to 60 times (compared to state-of-the-art techniques like AlphaFold). This improvement has the potential to accelerate work in medicine, green chemistry, environmental applications, and renewable energy.
In addition, Meta AI made a new database of 600 million metagenomic protein structures (proteins which are found in microbes in the soil, deep in the ocean, and even in our guts and on our skin) available to the scientific community via the ESM Metagenomic Atlas.
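The remark above that a DNA sequence encodes an amino acid sequence can be illustrated with a short sketch of translation using the standard genetic code. The function name `translate` and the example sequence are made up for this illustration, and the sketch assumes a coding strand read in frame from its first base; real pipelines also handle reading frames, introns, and alternative codon tables.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate a coding DNA sequence into one-letter amino acid codes.

    Reads codons in frame from position 0 and stops at the first stop codon
    or when fewer than three bases remain.
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCAAGTAA"))  # Met-Ala-Lys followed by a stop codon
```

Structure predictors such as ESMFold or AlphaFold take the resulting amino acid sequence as input, while ProteinMPNN goes the other way and proposes such sequences for a given structure.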