December 4, 2023 1:05 PM


Nous Research, a private applied research group known for publishing open-source work in the large language model (LLM) domain, has dropped a lightweight vision-language model called Nous Hermes 2 Vision.
Available via Hugging Face, the open-source model builds on the company’s previous OpenHermes-2.5-Mistral-7B model. It brings vision capabilities, including the ability to prompt with images and extract text information from visual content.
However, soon after launch, the model was found to hallucinate more than expected, prompting the company to rename the project Hermes 2 Vision Alpha. A more stable release, offering the same capabilities with fewer glitches, is expected to follow.
Nous Hermes 2 Vision Alpha
Named after Hermes, the Greek messenger of the gods, the Nous vision model is designed to be a system that navigates “the complex intricacies of human discourse with celestial finesse.” It taps the image data provided by a user and combines that visual information with its learnings to provide detailed answers in natural language.
For instance, it could analyze a user’s image and detail different aspects of what it contains. The co-founder of Nous, who goes by Teknium on X, shared a test screenshot where the LLM was able to analyze a photo of a burger and figure out if it would be unhealthy to eat and explain why.


Nous Hermes 2 Vision at work
While ChatGPT, powered by GPT-4V, also lets users prompt with images, the open-source offering from Nous differentiates itself with two key enhancements.
First, unlike traditional approaches that rely on substantial 3B vision encoders, Nous Hermes 2 Vision harnesses SigLIP-400M. This not only streamlines the model’s architecture, making it more lightweight than its counterparts, but also helps boost performance on vision-language tasks.
Secondly, it has been trained on a custom dataset enriched with function calling, which allows users to prompt the model with function calls.
“This distinctive addition transforms Nous-Hermes-2-Vision into a Vision-Language Action Model. Developers now have a versatile tool at their disposal, primed for crafting a myriad of ingenious automations,” the company wrote on the Hugging Face page of the model.
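The function-calling workflow the company describes can be illustrated with a short sketch. The exact prompt template is defined on the model’s Hugging Face card; the `<fn_call>` tag, the ChatML-style markers, and the `extract_text` schema below are illustrative assumptions, not the official format.

```python
import json

# Hypothetical sketch: the tag names and message structure here are
# illustrative assumptions, not the official Nous Hermes 2 Vision format.
def build_fn_call_prompt(system_msg: str, user_msg: str, fn_schema: dict) -> str:
    """Assemble a ChatML-style prompt that exposes a function schema to the model."""
    schema_json = json.dumps(fn_schema)
    return (
        f"<|im_start|>system\n{system_msg}\n"
        f"<fn_call>{schema_json}</fn_call><|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Example: ask the model to extract text it sees in an attached image
# and return it through a structured function call (hypothetical schema).
schema = {
    "name": "extract_text",
    "description": "Return any text visible in the image",
    "parameters": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}
prompt = build_fn_call_prompt(
    "You are a helpful vision assistant.",
    "What does the sign in this image say?",
    schema,
)
print(prompt)
```

The model’s completion could then be parsed as JSON and routed to application code, which is what the company means by a “Vision-Language Action Model”: the output is a machine-readable action rather than free text.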
Other datasets used for training the model were LVIS-INSTRUCT4V, ShareGPT4V and conversations from OpenHermes-2.5.
Despite its differentiators, issues remain at this stage
While the Nous vision-language model is available for research and development, early usage has shown that it is far from perfect.

