GNSS & Machine Learning Engineer

Tag: TTI

Google announces PaLM API release

On the same day as OpenAI released GPT-4 (March 14, 2023), Google also announced the availability of the PaLM API for developers on Google Cloud [video]. They said that they are now providing access to foundation models on Google Cloud’s Vertex AI platform, initially for generating text and images, and over time also for audio and video. In addition, with the Generative AI App Builder, they introduced the possibility of quickly building AI-powered chat interfaces and digital assistants.

Finally, Google also made for a limited set of trusted test users generative AI features available within Google Workspace (Gmail and Google Docs).

RIFFUSION: Stable Diffusion for Real-Time Music Generation

By using the stable diffusion model v1.5 without any modifications, just fine-tuned on images of spectrograms paired with text, the software RIFFUSION (RIFF + diffusion) generates incredibly interesting music from text input. By interpolating in latent space it is possible to transition from one text prompt to the next. You can try out the model here.

The authors provide source code on GitHub for an interactive web app and an inference server. A model checkpoint is available on Hugging Face.

There is a nice video about RIFFUSION by Alan Thompson on youtube.

Even more shocking than using diffusion on spectrograms and getting great results may be a paper by Google Research published on Dec 15, 2022. They use text as an image and train their model with contrastive loss alone, thus calling their model CLIP-Pixels Only (CLIPPO). It’s a joint model for processing images and text with a single ViT (Vison Transformer) approach and astonishing performance.

© 2023 Stephan Seeger

Theme by Anders NorenUp ↑