In 2023, we saw the rise of generative text-based AI models, including GPT-4, Llama, Bard, PaLM, and a host of other models built on top of these. This year, we may witness the emergence of more advanced generative AI models focused on images and videos. While DALL·E, Midjourney, Runway ML, and Stability AI have been at the forefront of visual generation models, Apple and Google are also entering the race.
Some Recent Generative Visual Models-
OpenAI released Sora, a text-to-video model that can generate videos up to a minute long.
Apple released MGIE, a model that edits photos based on text instructions. It can crop, resize, flip, and add filters to images, all through text prompts. Try a demo here.
Stable Cascade is one of the new models from Stability AI, which claims it is more powerful and faster than its previous model, Stable Diffusion. Here is one of the images I generated with this model.
Generate your own images with prompts here.
Imagen 2 is a text-to-image model developed by Google. Its API is available only through a Google Cloud account, and you have to access it through the Vertex AI suite.
Links Around The Web
Books I Read This Week-
Cursed Bunny by Bora Chung
It's a collection of short stories, and I finished the book in one day. The stories span different genres, including horror, science fiction, and magic realism. Some of them felt like modern Aesop's fables, but overall, I enjoyed it a lot and highly recommend it. Cursed Bunny is a must-read!
Big Billion Startup: The Untold Flipkart Story by Mihir Dalal
I picked up this book with much anticipation but was very disappointed by the end. I was expecting to learn more about how Flipkart navigated its highs and lows, but all I found were the names and backgrounds of the engineers and managers who worked there! I felt the book lacked depth and serious research.
That’s it, folks. Thanks for reading!