LLM - ChatGPT and Gemini to "see" images and then take action

Good morning!

Is it possible to connect the ChatGPT or Gemini API with Google Drive to:

  1. Retrieve images from a specific folder.
  2. Analyze their content using AI.
  3. Determine their proper sequence based on relevance.
  4. Place them in a Google Docs file in the correct order using AI.

The key point here (I think): the action shouldn't be a pre-made action but a prompted one.

PS: Every file will be slightly different, so I am not able to create one template.
E.g.: the LLM reads the images based on my prompt and, after deciding on the order, moves and saves the images into Google Docs following the organization described in my prompt.
 

ArshilAhmad

Well-known member
Staff member
Hi @Guilherme Oliveira,

Currently, it's not possible to analyze images using the OpenAI or Gemini action steps in Pabbly Connect. Please try using the "Google Cloud Vision: Detect Text in Images" action step to extract the text from the images, and then pass that text to an OpenAI or Gemini action step. Let us know the results after you've tried this.
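
To make the data flow of that workaround concrete, here is a minimal Python sketch, not a Pabbly Connect feature. The names `order_images`, `extract_text`, and `rank_with_llm` are hypothetical: the two callables stand in for the Cloud Vision text-detection step and the OpenAI/Gemini step, so nothing here calls a real API.

```python
from typing import Callable, Dict, List


def order_images(
    image_names: List[str],
    extract_text: Callable[[str], str],
    rank_with_llm: Callable[[Dict[str, str]], List[str]],
) -> List[str]:
    """Return image names in the order proposed by the LLM step."""
    # Step 1: OCR each image (stand-in for the Cloud Vision step).
    texts = {name: extract_text(name) for name in image_names}

    # Step 2: ask the LLM step for an ordering based on the extracted text.
    proposed = rank_with_llm(texts)

    # Step 3: never trust the model output blindly. Keep only known names,
    # drop repeats, and append anything the model left out.
    seen = set()
    ordered = []
    for name in proposed:
        if name in texts and name not in seen:
            seen.add(name)
            ordered.append(name)
    ordered += [name for name in image_names if name not in seen]
    return ordered


if __name__ == "__main__":
    # Fake OCR results and a fake "LLM" that sorts by the extracted text.
    fake_ocr = {"b.png": "Step 2: bake", "a.png": "Step 1: mix"}
    print(
        order_images(
            list(fake_ocr),
            extract_text=fake_ocr.__getitem__,
            rank_with_llm=lambda texts: sorted(texts, key=texts.get),
        )
    )
```

The validation in step 3 is the part worth keeping in any real workflow: a prompted ordering can omit or repeat files, so the sketch falls back to the original order for anything the model did not return.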



 

Fagun Shah

Well-known member
This is not possible with Pabbly-type automation software.

It may be possible with some Google Drive plugins or extensions, or with Apps Script.
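
For the last step (placing images in a Google Doc in a chosen order), one possible route outside Pabbly is the Google Docs API `documents.batchUpdate` method with `insertInlineImage` requests. The sketch below only builds the request list and sends nothing. It assumes each image is reachable through a URI that Google can fetch, which is not true of private Drive files by default; `build_insert_requests` and the example URLs are hypothetical.

```python
from typing import Dict, List


def build_insert_requests(ordered_uris: List[str]) -> List[Dict]:
    """Build one insertInlineImage request per image, in display order."""
    requests = []
    for uri in ordered_uris:
        requests.append(
            {
                "insertInlineImage": {
                    "uri": uri,
                    # Append at the end of the document body, so requests
                    # applied in sequence keep the images in list order.
                    "endOfSegmentLocation": {"segmentId": ""},
                }
            }
        )
    return requests


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    body = {
        "requests": build_insert_requests(
            ["https://example.com/a.png", "https://example.com/b.png"]
        )
    }
    print(len(body["requests"]))
```

The resulting `body` is what would be passed to `documents.batchUpdate` by an authenticated client; an Apps Script solution would do the equivalent with its own document service.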