LLM - ChatGPT and Gemini to "see" images and then take action

Guilherme Oliveira · Feb 11, 2025

Good morning!

Is it possible to connect the ChatGPT or Gemini API with Google Drive to:

Retrieve images from a specific folder.
Analyze their content using AI.
Determine their proper sequence based on relevance.
Place them in a Google Docs file in the correct order using AI.

The key point here (I think): the action shouldn't be a pre-made action but a prompted action.

P.S.: Every file will be slightly different, so I am not able to create one template.
E.g.: The LLM reads the images based on my prompt and, after deciding the order, moves and saves the images in Google Docs according to the organization described in my prompt.

ArshilAhmad · Feb 12, 2025

Hi @Guilherme Oliveira,

Currently, it's not possible to analyze images using OpenAI or Gemini action steps in Pabbly Connect. Please try using the "Google Cloud Vision: Detect Text in Images" action step to extract text from the images and then pass it to OpenAI or Gemini action step. Let us know the results after you've tried this.
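The workaround above (extract text first, then let the LLM decide) can be sketched roughly as follows. This is only an illustration in Python, outside Pabbly Connect: `build_ordering_prompt`, `parse_ordering`, `order_images`, and the `ask_llm` callable are hypothetical names, and the OCR text is assumed to have been extracted already (for example by the Cloud Vision step). The model call is passed in as a plain function so that no particular OpenAI or Gemini client API is assumed.

```python
from typing import Callable, Dict, List


def build_ordering_prompt(instructions: str, ocr_text: Dict[str, str]) -> str:
    """Combine the user's instructions with the text extracted from each image."""
    lines = [instructions.strip(), "", "Extracted text per image:"]
    for name in sorted(ocr_text):
        # Collapse whitespace so each image takes a single line in the prompt.
        snippet = " ".join(ocr_text[name].split())
        lines.append(f"- {name}: {snippet}")
    lines.append("")
    lines.append("Reply with the file names only, one per line, in the desired order.")
    return "\n".join(lines)


def parse_ordering(reply: str, known_names: List[str]) -> List[str]:
    """Turn the model's reply into an ordered list of known file names."""
    ordered: List[str] = []
    for raw in reply.splitlines():
        line = raw.strip()
        # Tolerate list markers such as "1. " or "- " by matching on the line ending.
        matches = [n for n in known_names if line.endswith(n)]
        if matches:
            name = max(matches, key=len)  # prefer the longest matching name
            if name not in ordered:
                ordered.append(name)
    # Images the model did not mention are appended in their original order.
    ordered.extend(n for n in known_names if n not in ordered)
    return ordered


def order_images(
    instructions: str,
    ocr_text: Dict[str, str],
    ask_llm: Callable[[str], str],  # hypothetical wrapper around an OpenAI/Gemini call
) -> List[str]:
    """Ask the model for an order and return the file names in that order."""
    prompt = build_ordering_prompt(instructions, ocr_text)
    return parse_ordering(ask_llm(prompt), list(ocr_text))
```

In a real workflow, `ask_llm` would send the prompt to whichever model is used and return its text reply. Because the model only sees the extracted text, images without readable text cannot be ordered meaningfully this way.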

Fagun Shah · Feb 12, 2025


This is not possible with Pabbly-type automation software.

It may be possible with some Google Drive plugins or extensions, or with Google Apps Script.
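As a rough idea of what a scripted route could look like, here is a minimal sketch in Python (rather than Apps Script) that only builds the request payload for placing already-ordered images into a document. It assumes the Google Docs API `documents.batchUpdate` request format, with `insertInlineImage` and `insertText` requests appended via `endOfSegmentLocation`; those field names reflect my understanding of that API and should be checked against the current reference. `build_image_requests` is a hypothetical helper, and the image URLs are assumed to be reachable by Google's servers.

```python
from typing import Dict, List


def build_image_requests(image_urls: List[str]) -> List[Dict]:
    """Build Docs API batchUpdate requests that append images in the given order."""
    requests: List[Dict] = []
    for url in image_urls:
        # Append the image at the end of the document body (assumed field names).
        requests.append({
            "insertInlineImage": {
                "uri": url,
                "endOfSegmentLocation": {"segmentId": ""},
            }
        })
        # Follow each image with a newline so the next one starts a new paragraph.
        requests.append({
            "insertText": {
                "text": "\n",
                "endOfSegmentLocation": {"segmentId": ""},
            }
        })
    return requests
```

The resulting list would be sent as the `requests` field of a `batchUpdate` call for the target document. Authentication, listing the Drive folder, and making the image URLs accessible are left out here.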
