Google presents two complementary techniques to significantly improve language models without massive extra compute:
UL2R (UL2 Repair): an additional stage of continued pre-training with the UL2 (Unifying Language Learning) objective (paper), which trains the model on a mixture of denoising tasks in which it has to recover missing sub-sequences of a given input. Applying it to PaLM yields the new language model U-PaLM.
Flan (Finetuned Language Net): instruction fine-tuning on a large collection of NLP datasets phrased as instructions. Applying it to PaLM yields the language model Flan-PaLM.
Combining the two approaches on PaLM yields Flan-U-PaLM.
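The core idea behind the UL2 denoising objective — mask out sub-sequences of the input and train the model to reconstruct them — can be sketched in a few lines. This is a simplified illustration in the style of T5/UL2 span corruption (the function name, sentinel format, and span-selection logic are my assumptions, not Google's implementation):

```python
# Minimal sketch of a span-corruption denoising example (illustrative only).
import random

def make_denoising_example(tokens, span_len=3, num_spans=2, seed=0):
    """Mask `num_spans` sub-sequences of length `span_len`.

    Returns (corrupted_input, target): each masked span in the input is
    replaced by a sentinel token <extra_id_i>, and the target lists each
    sentinel followed by the tokens the model must recover.
    """
    rng = random.Random(seed)
    tokens = list(tokens)
    # Pick candidate span start positions (sorted; overlaps are skipped below).
    starts = sorted(rng.sample(range(0, len(tokens) - span_len), num_spans))
    corrupted, target = [], []
    pos = 0
    for i, s in enumerate(starts):
        if s < pos:  # skip spans that would overlap a previous one
            continue
        sentinel = f"<extra_id_{i}>"
        corrupted += tokens[pos:s] + [sentinel]
        target += [sentinel] + tokens[s:s + span_len]
        pos = s + span_len
    corrupted += tokens[pos:]
    return corrupted, target

inp, tgt = make_denoising_example("the quick brown fox jumps over the lazy dog".split())
```

The model is trained to predict `tgt` given `inp`; UL2 mixes several such denoisers with different span lengths and corruption rates.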
Stability AI released Stable Diffusion 2.0, which includes:
– New text-to-image diffusion models (improved quality, 512×512 and 768×768 image sizes by default)
– A super-resolution upscaler (4x upscaling, enabling images of 2048×2048 and beyond)
– A depth-to-image diffusion model
– An updated inpainting diffusion model
Meta AI presents CICERO, an AI agent that can negotiate and cooperate with people. It is the first AI system to achieve human-level performance in the popular strategy game Diplomacy: CICERO ranked in the top 10% of participants on webDiplomacy.net.
Yannic Kilcher gives a great discussion of the accompanying Science paper. A second paper is freely available on arXiv. The source code is accessible on GitHub.
Meanwhile, DeepMind also published an AI agent that plays Diplomacy.
Meta AI released Galactica (Galactica.ai), a large language model trained on scientific papers. Given a short text prompt about a topic, it can generate a literature review, wiki article, or lecture notes, complete with references, formulas, etc. Even the paper about Galactica was written with the help of Galactica.
Just a day later, the Galactica.ai demo page was taken down, but the source code remains available on GitHub. Yannic Kilcher made a nice paper review of Galactica in which he also explains why the demo page was taken down.
IBM announced the availability of its new 433-qubit quantum processor Osprey, the largest gate-based quantum processor to date, keeping IBM in line with its long-term roadmap (see below). With 433 qubits, Osprey is more than three times larger than IBM’s previous flagship, the 127-qubit Eagle processor.
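The "more than three times larger" claim is easy to verify, and it understates the jump in raw state-space size, since an n-qubit register spans a 2^n-dimensional state space (a standard quantum-computing fact, not from IBM's announcement):

```python
# Back-of-the-envelope comparison of Osprey (433 qubits) vs Eagle (127 qubits).
osprey, eagle = 433, 127

ratio = osprey / eagle        # qubit-count ratio: about 3.41
extra = osprey - eagle        # 306 additional qubits

# The state space of an n-qubit register has dimension 2**n, so the growth
# in representational capacity is exponential, not linear:
state_space_ratio = 2**osprey // 2**eagle   # = 2**306
```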
ESMFold (ESM = Evolutionary Scale Modeling) [paper] uses a large language model to accelerate protein folding (i.e., predicting the 3D structure of a protein from its amino acid sequence) by up to 60 times compared to state-of-the-art techniques like AlphaFold. This improvement has the potential to accelerate work in medicine, green chemistry, environmental applications, and renewable energy.
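The "language model" framing is literal: ESM-style models treat a protein as a sentence whose tokens are individual amino-acid residues. A minimal sketch of that input representation (the vocabulary layout and special tokens here are simplified assumptions, not the actual ESM vocabulary):

```python
# Minimal sketch: a protein language model sees one token per residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
VOCAB = {"<cls>": 0, "<eos>": 1, "<mask>": 2}
VOCAB.update({aa: i + 3 for i, aa in enumerate(AMINO_ACIDS)})

def tokenize(sequence):
    """Map an amino-acid sequence to token ids: <cls> residues <eos>."""
    return [VOCAB["<cls>"]] + [VOCAB[aa] for aa in sequence] + [VOCAB["<eos>"]]

# A fragment of the insulin B-chain as an example input
ids = tokenize("FVNQHLCGSHLV")
```

Because such a model predicts structure directly from this sequence representation, it can skip the expensive multiple-sequence-alignment search that AlphaFold performs — which is where the large speedup comes from.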
In addition, Meta AI made a new database of 600 million metagenomic protein structures (proteins found in microbes in the soil, deep in the ocean, and even in our guts and on our skin) available to the scientific community via the ESM Metagenomic Atlas.
ESMFold and related models like ESM-2 are published, together with an API, on GitHub and Hugging Face.