Gemini toolbox: everything you can do today with Google’s AI apps, tools and API

Last updated: March 25, 2026
  • Gemini’s toolbox combines stable tools like Canvas, Deep Research and Guided Learning with experimental Labs features.
  • The Gemini API unlocks multimodal and function-calling workflows across Google Workspace and custom automations.
  • Guided Learning, Canvas and agents make Gemini both a personal tutor and a work assistant for documents, slides and email.
  • Using Labs, Gemini Enterprise and Workspace integration lets teams test powerful AI safely on top of their own data.

The “Gemini toolbox” is no longer just a catchy phrase; it’s the practical set of apps, tools, agents and APIs that Google is quietly weaving into everything from casual learning to enterprise workflows. Instead of a single monolithic assistant, Gemini now behaves more like a toolbox where each feature is a specific instrument: research engine, tutor, code helper, meeting scheduler, slide builder and much more.

If you understand how these pieces fit together – Canvas, Guided Learning, Labs, agents, Gemini Enterprise and the Gemini API – you can turn Gemini into a real workhorse instead of a novelty chatbot. Below you’ll find a detailed tour of this toolbox: what lives in the stable “Tools” area, what’s being tested in “Labs”, how Gemini behaves as a tutor with images and videos, and how developers can wire the API into Google Workspace for serious automation.

What exactly is in the Gemini toolbox today?

Gemini is best understood as a family of AI models (Gemini 1.0, Gemini 1.5, Gemini 3 and so on) delivered through different front-ends: web, mobile apps, Workspace integration and a developer API. The “toolbox” idea comes from the way Google now groups concrete capabilities inside the Gemini interface, especially on the web.

On the web, the main picker inside Gemini is split into two major zones: “Tools” for stable, production-ready functionality and “Labs” for experiments still in flux. Think of “Tools” as the trusted screwdriver you grab every day, while “Labs” is the tray where you keep prototypes that might change shape next week.

On mobile, Gemini apps are adding many of these same tools – Guided Learning, Canvas-like experiences, image-rich help – but they are rolling out gradually. If you don’t see a specific feature in the app yet, Google explicitly recommends trying again later or jumping to gemini.google.com to see the latest version on the web.

Under the hood, all of these surfaces run on the same family of Gemini models that the Gemini API exposes to developers, with multimodal input and function calling, so you can generate content, analyze images or orchestrate workflows through code. That API is the backbone for many of the Workspace automations we’ll cover later.

Tools vs Labs: how Gemini organizes its features

As Gemini has accumulated more buttons and modes, Google has introduced a clearer separation between mature features and experimental ones through two sections: “Tools” and “Labs”. This change is already visible in the web interface and is rolling out gradually as a server-side update, so not every account sees the same layout at the same time.

The “Tools” section is where Google parks capabilities it considers stable and predictable for everyday use. Reports from sources like Android Police and 9to5Google show that this area includes items such as Deep Research, image generation, video creation via Veo, Canvas, Guided Learning and Deep Think, sometimes tied to specific subscription tiers like Google AI Pro or Google AI Ultra.

“Labs”, on the other hand, is the explicit playground: a dedicated area inside the Gemini picker that groups features marked as experimental. You’ll typically see icons with a little lab flask and labels like Gemini Agent, Dynamic View (also called Visual layout) and Personal Intelligence. The expectation when you click anything under Labs is simple: behavior may change, disappear or move with little warning.

From a product-design standpoint, this separation matters for trust. When an AI app grows quickly, the risk is not just “too many features” but “no idea which features I can rely on”. By putting day-to-day tools in one zone and experiments in another, Gemini is signalling risk in a way similar to “normal” vs “sport” mode in a car.

The stable Gemini tools: Deep Research, Canvas, Guided Learning and more

The core Gemini toolbox for most users lives under “Tools”, where you’ll find the experiences that Google wants you to build habits around. Although the exact lineup varies by account and subscription level, a few elements are already central.

Deep Research transforms Gemini into a structured research assistant rather than a generic chat model. When you ask a question that requires digging through multiple sources, Deep Research follows a more explicit multi-step process, surfacing a consistent methodology so users know what to expect every time they invoke it.

Content creation tools for images and video – including integrations powered by Veo – also sit in the Tools drawer. Users who rely on Gemini for visual content need these capabilities to be findable and reasonably stable, not hidden behind shifting experimental flags.

Canvas is another pillar: a workspace mode where you can start a document or coding project directly from a prompt, then iteratively refine it with Gemini. Under the request bar, you can select “Canvas” and type your prompt to generate a starting point for content or code, then keep editing in an interactive, side-by-side layout.

Guided Learning and Deep Think round out the more cognitively focused tools, especially for users who want structured help with complex topics. Guided Learning can behave like a tutor, walking you through ideas step by step, while Deep Think encourages slower, more deliberate reasoning on challenging questions.

Gemini as a personal tutor: Guided Learning, images and videos

One of the most user-friendly aspects of the Gemini toolbox is its ability to act as a private teacher, blending guided sequences with visual explanations. Rather than dumping a wall of text, Gemini can incorporate images, sketches and even videos into its responses to make concepts easier to grasp.

In practical terms, you can ask Gemini to explain a topic and explicitly request a diagram, a visual breakdown or an illustrative image. The response can embed those images directly in the explanation, helping you visualize, say, a math concept, a workflow or a scientific process.

Video-based learning is also supported, although the details vary by region and rollout phase. For some topics, Gemini can surface or reference videos that complement its textual answer, creating a more multimodal learning path where you read, watch and interact with questions in the same flow.

This teaching mode is being introduced gradually in the mobile Gemini apps, so you might not see all options straight away. When that happens, the fallback is to use the web experience, where Gemini’s feature set often appears earlier during staged rollouts.

Gemini Enterprise and Workspace: AI agents for teams

Beyond personal use, the Gemini toolbox extends into the workplace through Gemini Enterprise and Google Workspace integrations. Here, the focus shifts from one-off prompts to persistent agents, workflows and collaboration at scale.

Gemini Enterprise is described by Google as an advanced agent platform that brings the best of Google’s AI to every employee and workflow. In practice, it lets teams discover, create, share and run AI agents in a secure environment backed by their own company data, reducing development bottlenecks and enabling use cases like sales analysis, process automation and internal knowledge search.

Google Workspace itself acts as a collaboration platform supercharged by Gemini, with AI woven into apps like Gmail, Docs and Meet. Instead of switching out to a separate AI tool, users can summon Gemini within their everyday productivity apps to draft content, summarize information or generate ideas in context.

In some setups, you can even chat with Gemini directly over your enterprise data stored across Google Workspace, Microsoft 365 and other connected systems. That turns Gemini into a corporate knowledge layer that can answer questions based on emails, documents and files, subject to the permissions and security settings configured by IT.

The Gemini API: backbone of the developer toolbox

Underneath the user-facing Gemini apps lies the Gemini API, which exposes the same core models for developers to embed in their own applications. This API is where multimodality, function calling and custom workflows come together for serious automation, particularly with Google Workspace and Apps Script.

Gemini models are Google’s most powerful AI systems, and the API provides various model variants – such as text-focused and vision-oriented versions – each with specific capabilities and limits. You can explore them visually in Google AI Studio, a hosted interface for trying prompts, tweaking model settings and even tuning custom models without writing code.

To start using the API, you request an API key via Google AI Studio or another supported console, then test it with a simple REST call. For example, you can export your key into an environment variable like GOOGLE_API_KEY and invoke the endpoint that lists available models; if everything is configured correctly, the JSON response includes model names such as models/gemini-1.0-pro.
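As a rough sketch of that first check, the snippet below builds the URL for the model-listing call and pulls model names out of a parsed response. The v1beta base URL and the response shape (a models array whose entries have a name field) reflect the public REST API at the time of writing and should be treated as assumptions; buildListModelsUrl, extractModelNames and listModels are hypothetical helper names.

```javascript
// Assumed base URL for the Gemini REST API (v1beta at the time of writing).
const BASE_URL = 'https://generativelanguage.googleapis.com/v1beta';

// Build the URL that lists available models; the key travels as a query parameter.
function buildListModelsUrl(apiKey) {
  return `${BASE_URL}/models?key=${encodeURIComponent(apiKey)}`;
}

// Pull the model names out of a parsed list response (assumed shape: { models: [{ name }] }).
function extractModelNames(listResponse) {
  return (listResponse.models || []).map((model) => model.name);
}

// Usage in Node 18+, where fetch is built in. Defined here but not executed.
async function listModels() {
  const response = await fetch(buildListModelsUrl(process.env.GOOGLE_API_KEY));
  return extractModelNames(await response.json());
}
```

If the key is missing or invalid, the API is expected to answer with an error object instead of a models array, in which case extractModelNames simply returns an empty list.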

From there, generating content is a matter of POSTing a JSON payload to the appropriate endpoint, such as the generateContent method for a chosen model. A minimal request includes a contents field with text parts, while optional generationConfig and safetySettings let you control parameters like temperature and safety filters.
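A minimal sketch of such a request body might look like the following. The field names (contents, parts, generationConfig, safetySettings) come from the description above; the specific safety category and threshold are illustrative choices, and buildGenerateContentPayload is a hypothetical helper name.

```javascript
// Build a generateContent request body from a prompt and a temperature.
function buildGenerateContentPayload(prompt, temperature = 0) {
  return {
    // Required: the conversation so far, here a single text part.
    contents: [{ parts: [{ text: prompt }] }],
    // Optional: sampling controls such as temperature.
    generationConfig: { temperature },
    // Optional: one example safety setting (category and threshold are illustrative).
    safetySettings: [
      { category: 'HARM_CATEGORY_HARASSMENT', threshold: 'BLOCK_MEDIUM_AND_ABOVE' },
    ],
  };
}

// The serialized string is what you would send as the POST body.
const body = JSON.stringify(buildGenerateContentPayload('Say hello in one sentence.', 0.2));
```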

Calling the Gemini API from Apps Script

One of the most powerful patterns in the Gemini toolbox is combining the API with Google Apps Script to automate workflows inside Workspace. This approach lets you orchestrate Gemini alongside services like Drive, Calendar, Gmail, Sheets and Slides without building a full backend.

The standard setup begins with an Apps Script project (for example, created via script.new) where you store your Gemini API key as a script property. In code, you retrieve that value and construct an endpoint URL for a specific model, often gemini-1.0-pro-latest:generateContent with your API key passed as a query parameter.
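The setup described here could be sketched roughly as follows. The property name GOOGLE_API_KEY, the v1beta base URL and the helper names buildGenerateContentUrl and getGeminiUrl are assumptions for illustration; only PropertiesService is an Apps Script built-in, and it is referenced inside a function that runs only in Apps Script.

```javascript
// Assumed base URL for model endpoints (v1beta at the time of writing).
const API_BASE = 'https://generativelanguage.googleapis.com/v1beta/models';

// Build the generateContent endpoint for a model, with the key as a query parameter.
function buildGenerateContentUrl(model, apiKey) {
  return `${API_BASE}/${model}:generateContent?key=${encodeURIComponent(apiKey)}`;
}

// Apps Script only: read the key stored as a script property.
// 'GOOGLE_API_KEY' is an assumed property name; use whatever name you saved it under.
function getGeminiUrl() {
  const apiKey = PropertiesService.getScriptProperties().getProperty('GOOGLE_API_KEY');
  return buildGenerateContentUrl('gemini-1.0-pro-latest', apiKey);
}
```

Keeping the key in script properties rather than in the source means it does not travel with the code when the script is copied or shared.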

A helper function such as callGemini(prompt, temperature) typically builds a JSON payload, sends it via UrlFetchApp.fetch and parses the response to extract the generated text. This wrapper simplifies repeated use of the API from different utilities in your script.
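One possible shape for such a wrapper is sketched below. It extends the callGemini(prompt, temperature) signature with two extra parameters that are assumptions of this sketch: url (the full endpoint including the key) and fetchFn (which defaults to UrlFetchApp.fetch in Apps Script and can be swapped for a stub when testing). The response path candidates[0].content.parts is the usual shape of a generateContent reply, but treat it as an assumption and handle missing fields.

```javascript
// Extract the first text part from a parsed generateContent response.
function extractText(data) {
  const candidate = (data.candidates || [])[0];
  const parts = (candidate && candidate.content && candidate.content.parts) || [];
  const textPart = parts.find((part) => typeof part.text === 'string');
  if (!textPart) throw new Error('No text found in Gemini response');
  return textPart.text;
}

// url: full generateContent endpoint, including ?key=...
// fetchFn: defaults to UrlFetchApp.fetch (Apps Script); injectable for testing.
function callGemini(prompt, temperature = 0, url, fetchFn = (u, o) => UrlFetchApp.fetch(u, o)) {
  const payload = {
    contents: [{ parts: [{ text: prompt }] }],
    generationConfig: { temperature },
  };
  const options = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(payload),
  };
  const response = fetchFn(url, options);
  // UrlFetchApp responses expose the body through getContentText().
  return extractText(JSON.parse(response.getContentText()));
}
```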

Testing is straightforward: you can create a testGemini() function that defines a prompt, calls your helper and logs both the input and output to the execution logs. Once that works, you know your Apps Script environment and Gemini API key are wired up correctly for more advanced scenarios.

Using the Gemini Vision endpoint for images

The Gemini toolbox goes beyond text thanks to multimodal support, especially the ability to process images through a vision-enabled endpoint. In Apps Script, this is usually a separate endpoint such as gemini-1.0-pro-vision-latest:generateContent, again parameterized by your API key.

A typical helper like callGeminiProVision(prompt, image, temperature) will convert an image blob into base64, embed it as inlineData with the appropriate MIME type and send it together with a textual prompt. The model then returns text that reflects its understanding of both the image and the prompt.
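The payload side of that helper might be sketched as follows. The inlineData part with mimeType and data fields matches the description above; buildVisionPayload and blobToInlineArgs are hypothetical names, and the second function relies on the Apps Script Utilities and Blob services, so it only runs inside Apps Script.

```javascript
// Build a multimodal request: one text part plus one inline image part.
function buildVisionPayload(prompt, base64Data, mimeType, temperature = 0) {
  return {
    contents: [{
      parts: [
        { text: prompt },
        { inlineData: { mimeType, data: base64Data } },
      ],
    }],
    generationConfig: { temperature },
  };
}

// Apps Script only: turn a Blob into the two values the payload needs.
function blobToInlineArgs(blob) {
  return {
    base64Data: Utilities.base64Encode(blob.getBytes()),
    mimeType: blob.getContentType(),
  };
}
```

Separating the payload builder from the blob conversion keeps the part that talks to Apps Script services small, which makes the rest easy to check outside the script editor.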

To verify the setup, you might write a small testGeminiVision() that downloads a sample image from a public URL, passes it to your helper and logs a fun fact or analysis produced by Gemini Vision. This kind of test demonstrates that multimodal input is working correctly in your environment.

Once the vision flow is stable, you can reuse it inside higher-level automations, such as analysing charts from Google Sheets or images stored in Drive. That’s where multimodality starts to feel like a genuinely useful part of the toolbox rather than a demo trick.

Function calling: giving Gemini access to tools

Another key element of the Gemini toolbox is function calling, which lets the model decide when to invoke your own tools or APIs. Instead of just generating text, Gemini can return structured functionCall objects that describe which function to use and with what arguments.

In Apps Script, you can set up a helper such as callGeminiWithTools(prompt, tools, temperature) that sends a tools specification along with the user prompt. This specification follows a FunctionDeclaration schema, where you describe the function’s name, purpose and JSON parameters.
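To make the schema concrete, here is a minimal sketch of a tools specification with one function declaration and the request body that carries it. The tool name, its timezone parameter and the helper buildToolsPayload are hypothetical; the uppercase type names follow the API’s schema enum as documented at the time of writing, which is worth verifying against the current reference.

```javascript
// A tools specification with a single FunctionDeclaration (illustrative content).
const tools = [{
  functionDeclarations: [{
    name: 'datetime',
    description: 'Returns the current date and time as a formatted string.',
    parameters: {
      type: 'OBJECT',
      properties: {
        // Hypothetical parameter, included only to show the schema shape.
        timezone: { type: 'STRING', description: 'IANA time zone, e.g. Europe/Madrid.' },
      },
    },
  }],
}];

// Attach the tools to an otherwise ordinary generateContent request.
function buildToolsPayload(prompt, toolSpec, temperature = 0) {
  return {
    contents: [{ role: 'user', parts: [{ text: prompt }] }],
    tools: toolSpec,
    generationConfig: { temperature },
  };
}
```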

When Gemini decides a tool should be used, its response includes a function call object that you can parse in your script and route to the actual implementation. You might, for instance, define a stub tool named “datetime” that returns the current date and time, and watch how Gemini requests that function to solve questions related to calendar calculations.
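A small sketch of that parsing and routing step is shown below. It assumes the function call arrives as a functionCall object (with name and args) inside the first candidate’s parts; extractFunctionCall and runTool are hypothetical helper names, and the datetime stub simply returns an ISO timestamp.

```javascript
// Return { name, args } if the first candidate asks for a tool, otherwise null.
function extractFunctionCall(data) {
  const candidate = (data.candidates || [])[0];
  const parts = (candidate && candidate.content && candidate.content.parts) || [];
  const callPart = parts.find((part) => part.functionCall);
  if (!callPart) return null;
  return { name: callPart.functionCall.name, args: callPart.functionCall.args || {} };
}

// Stub tool: returns the current date and time as an ISO 8601 string.
function datetime() {
  return new Date().toISOString();
}

// Route a parsed call to its implementation; unknown names fail loudly.
function runTool(call) {
  if (call.name === 'datetime') return datetime();
  throw new Error(`Unknown tool: ${call.name}`);
}
```

Returning null for plain-text replies lets the calling code fall back to showing the model’s text when no tool was requested.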

Function calling is especially powerful because it can operate across multiple turns, not just single-shot requests. That means you can design more complex, conversational agents that decide when to call tools, interpret the results and continue the dialogue.
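In a multi-turn exchange, the conversation history sent back to the model typically grows by two entries per tool use: the model’s own function call and your tool’s result. The sketch below shows one way to build that history. The functionResponse part and the role used for the tool result are assumptions based on the public API documentation and should be checked against the current reference; appendToolResult is a hypothetical helper.

```javascript
// Append the model's function call and the tool's result to the conversation.
// Returns a new array; the original history is left untouched.
function appendToolResult(contents, functionCall, result) {
  return contents.concat([
    // Echo the model's request so the next turn has full context.
    { role: 'model', parts: [{ functionCall }] },
    // Report the tool output (the role value here is an assumption).
    {
      role: 'user',
      parts: [{ functionResponse: { name: functionCall.name, response: { result } } }],
    },
  ]);
}
```

Sending the extended history in the next generateContent request gives the model what it needs to either answer in text or ask for another tool.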

Demo integrations: Gemini + Google Workspace as a practical toolbox

Once you combine text generation, vision input and function calling, the Gemini toolbox becomes a practical engine for Workspace automations. Google’s codelab material outlines several concrete examples that illustrate what’s possible.

At a high level, incoming user queries are passed to Gemini with a set of available tools representing different workflows: meeting scheduling, email drafting from charts, and slide deck creation. Based on the query, Gemini chooses the right function and returns a function call with structured arguments such as times, filenames or topics.

In your Apps Script, you then interpret the function call inside an if…else chain, invoking the appropriate workflow – for example, setupMeeting(), draftEmail() or createDeck(). This combination of model reasoning and explicit script logic is what turns Gemini from a chat window into a toolbox for real work.
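The if…else routing could look roughly like this. The workflow names come from the text; the argument keys (time, recipient, filename, sheet_name, topic) are assumed to match whatever your tool declarations define, and passing the workflows in as a handlers object is a choice made here so the routing can be exercised without Workspace services.

```javascript
// call: { name, args } parsed from the model's function call.
// handlers: { setupMeeting, draftEmail, createDeck } supplied by your script.
function dispatchWorkflow(call, handlers) {
  const args = call.args || {};
  if (call.name === 'setupMeeting') {
    return handlers.setupMeeting(args.time, args.recipient, args.filename);
  } else if (call.name === 'draftEmail') {
    return handlers.draftEmail(args.sheet_name, args.recipient);
  } else if (call.name === 'createDeck') {
    return handlers.createDeck(args.topic);
  }
  // Anything else is a tool the script does not know how to run.
  throw new Error(`No workflow registered for "${call.name}"`);
}
```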

Automating meetings: summarizing Drive files into Calendar events

One demo shows how Gemini can help set up a Calendar meeting that automatically includes a summary of a text file hosted in Google Drive. The user might type something like: “Set up a meeting at 10AM tomorrow with Helen to discuss the news in the Gemini-blog.txt file.”

Behind the scenes, a Workspace tool named “setupMeeting” is declared in the tools spec, with parameters for time, recipient and filename. When Gemini interprets the query, it chooses this tool and returns a function call with those arguments filled in.

The corresponding setupMeeting() function then finds the specified file in Drive, reads its content and passes it to Gemini via callGemini() with instructions to produce a short JSON object containing a title and a brief summary. The response may come back wrapped in formatting fences that you strip before parsing as JSON.
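Stripping the fences before parsing can be done with a few string operations, as in this sketch. The field names title and summary are assumed to match what the prompt asks for; stripFences and parseMeetingJson are hypothetical helpers.

```javascript
// Three backticks, built programmatically so the snippet stays easy to embed.
const FENCE = '`'.repeat(3);

// Remove a leading fence line (with optional language tag) and a trailing fence.
function stripFences(text) {
  let cleaned = text.trim();
  if (cleaned.startsWith(FENCE)) {
    const firstNewline = cleaned.indexOf('\n');
    cleaned = firstNewline === -1 ? '' : cleaned.slice(firstNewline + 1);
  }
  if (cleaned.endsWith(FENCE)) {
    cleaned = cleaned.slice(0, -FENCE.length);
  }
  return cleaned.trim();
}

// Parse the model's reply and check it has the two expected string fields.
function parseMeetingJson(text) {
  const parsed = JSON.parse(stripFences(text));
  if (typeof parsed.title !== 'string' || typeof parsed.summary !== 'string') {
    throw new Error('Expected an object with string "title" and "summary" fields');
  }
  return parsed;
}
```

Validating the parsed object right away means a malformed reply fails at this step rather than later, when the Calendar event is being created.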

Using the extracted title and summary, the script creates a Calendar event using CalendarApp, sets the description to the summary and attaches the source file via the advanced Calendar service. The result is a scheduled meeting with context baked in, all triggered by a single, natural-language request.
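The event-creation step might be sketched like this. It covers only the basic CalendarApp call (createEvent with a description and a guest) and leaves out the file attachment, which needs the advanced Calendar service. The one-hour duration, the helper names and passing the calendar in as a parameter are assumptions of this sketch; in Apps Script you would pass CalendarApp.getDefaultCalendar().

```javascript
// Compute a one-hour window starting at the given Date (duration is an assumption).
function oneHourWindow(start) {
  return { start, end: new Date(start.getTime() + 60 * 60 * 1000) };
}

// calendar: in Apps Script, pass CalendarApp.getDefaultCalendar().
function createSummaryEvent(calendar, title, summary, start, guestEmail) {
  const { start: startTime, end: endTime } = oneHourWindow(start);
  return calendar.createEvent(title, startTime, endTime, {
    description: summary, // the Gemini-generated summary becomes the event body
    guests: guestEmail,   // comma-separated list of email addresses
  });
}
```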

Drafting emails from Sheets charts with Gemini Vision

Another workflow in the Gemini toolbox involves analysing a chart in Google Sheets and drafting a Gmail message based on it. Imagine you keep a spreadsheet of college expenses and want an email that summarizes what the chart shows for a colleague named Mary.

The user query might say: “Draft an email for Mary with insights from the chart in the CollegeExpenses sheet.” A tool called “draftEmail” is defined to accept a sheet_name and recipient, and Gemini chooses that tool when it sees this type of request.

The draftEmail() function locates the requested spreadsheet in Drive, opens the relevant sheet, retrieves its first chart and saves that chart as a file (for instance, ExpenseChart.png). It then builds a prompt instructing Gemini to use only information in the chart, avoid historical comparisons and keep the message concise.

By calling callGeminiProVision(prompt, expenseChart), the script sends both the prompt and the chart image to Gemini Vision, which returns a tailored email body. Finally, the script creates a Gmail draft addressed to the recipient’s email, sets a subject like “College expenses” and attaches the chart image.
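Two pieces of this workflow are easy to isolate: the prompt with its constraints, and the draft creation. The sketch below shows both. The prompt wording is illustrative, buildChartEmailPrompt and createChartDraft are hypothetical helpers, and the Gmail service is passed in as a parameter so the function can be exercised with a stub; in Apps Script you would pass GmailApp and a chart blob such as the result of getAs('image/png') on the sheet’s first chart.

```javascript
// Prompt with the constraints described above (wording is illustrative).
function buildChartEmailPrompt(recipientName) {
  return [
    `Write a short email to ${recipientName} summarizing the attached chart.`,
    'Use only information visible in the chart.',
    'Do not make historical comparisons.',
    'Keep the message concise.',
  ].join('\n');
}

// gmail: in Apps Script, pass GmailApp.
// chartBlob: the chart image, e.g. sheet.getCharts()[0].getAs('image/png').
function createChartDraft(gmail, recipientEmail, body, chartBlob) {
  return gmail.createDraft(recipientEmail, 'College expenses', body, {
    attachments: [chartBlob],
  });
}
```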

This pattern effectively turns Gemini into an analyst that can read a chart, extract the key story and phrase it in natural language on your behalf. You still review and adjust the draft, but most of the heavy lifting is done automatically.

Building slide decks automatically with Gemini and Google Slides

The third major demo workflow in this toolbox automatically builds a skeletal Google Slides presentation on a user-specified topic. For example, you might ask: “Help me put together a deck about water conservation.”

A tool called “createDeck” is declared with a single parameter, topic, and Gemini is instructed to return structured JSON describing a series of slides. The prompt tells Gemini how many slides to create (based on a constant like NUM_SLIDES), requests short titles and bullet points, and explicitly asks for a valid JSON object so the script can parse it safely.
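A prompt of that kind, plus a check on what comes back, could be sketched as follows. The JSON shape (a slides array with title and bullets per slide), the value of NUM_SLIDES and the helper names are assumptions made for illustration; the actual schema is whatever your prompt requests.

```javascript
// Number of slides to request (illustrative value).
const NUM_SLIDES = 3;

// Ask for a fixed number of slides and a strictly JSON reply.
function buildDeckPrompt(topic, numSlides = NUM_SLIDES) {
  return [
    `Create an outline for a ${numSlides}-slide presentation about "${topic}".`,
    'Return only a valid JSON object of the form',
    '{"slides": [{"title": "...", "bullets": ["...", "..."]}]}.',
    'Keep titles and bullet points short.',
  ].join(' ');
}

// Check the parsed reply before handing it to the code that builds slides.
function validateDeck(deck, numSlides = NUM_SLIDES) {
  if (!deck || !Array.isArray(deck.slides) || deck.slides.length !== numSlides) {
    throw new Error(`Expected exactly ${numSlides} slides`);
  }
  deck.slides.forEach((slide, i) => {
    if (typeof slide.title !== 'string' || !Array.isArray(slide.bullets)) {
      throw new Error(`Slide ${i + 1} needs a title and a bullets array`);
    }
  });
  return deck;
}
```

Models do not always honor a requested slide count or schema, so a validation step like this gives the script a clear point at which to retry or report an error.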

After calling callGemini() with that prompt, the script removes any formatting fences, parses the JSON and then uses SlidesApp to generate a new presentation. The first slide is treated as the title page, and subsequent slides follow a TITLE_AND_BODY layout where the script populates the title and bullet text.

Within a few seconds, you get a basic deck with structured talking points per slide, ready for you to customize visually. While the output is intentionally minimal, this workflow shows how Gemini can jumpstart content structure so you can focus on design and nuance.

Expanding the toolbox: chatbots, RAG and multi-turn tools

The examples above are only a starting point; the broader Gemini toolbox can be expanded in many directions once you’re comfortable with the API and function calling. Google explicitly suggests several avenues for exploration.

One popular use case is building chatbots for Google Chat using the Gemini API. Here, the same patterns apply: you expose tools, let Gemini decide when to call them and connect the responses back into a conversational interface inside Chat, all governed by the Chat API and associated codelabs.

Another major direction is retrieval-augmented generation (RAG) on top of private content in Drive or Keep. Instead of summarizing a single text file, you can combine the Gemini API with a vector database and, optionally, an orchestration framework like LangChain to fetch relevant snippets from PDFs, images and notes before asking Gemini to generate a response grounded in those documents.
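The retrieval step at the heart of RAG can be illustrated without any external service. In the sketch below, a plain array stands in for the vector database and the embeddings are assumed to be precomputed numeric vectors; in a real setup they would come from an embedding model, and all helper names here are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Rank stored snippets against the query vector and keep the best k.
function topSnippets(queryVector, store, k = 2) {
  return store
    .map((entry) => ({ text: entry.text, score: cosineSimilarity(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Put the retrieved snippets in front of the question so the answer stays grounded.
function buildGroundedPrompt(question, snippets) {
  const context = snippets.map((s, i) => `[${i + 1}] ${s.text}`).join('\n');
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

The resulting prompt is then sent to Gemini like any other text request; the quality of the answer depends largely on how well the retrieved snippets match the question.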

Multi-turn function calling also unlocks more sophisticated agents that can iteratively decide which tools to use and in what sequence. Rather than a single decision, an agent can call a function, examine the result, then call another function or ask a clarifying question, all within one ongoing thread.

Finally, there’s no requirement to stay inside Workspace; once you master the Gemini API patterns, you can hook the model into external APIs across the wider web. That’s how Gemini transitions from a contained corporate assistant to a general-purpose orchestrator of digital work.

Put together, these pieces – stable Tools, experimental Labs, tutoring features, enterprise agents and the developer API – form a genuinely rich Gemini toolbox that can adapt to both casual learners and power users. If you treat Gemini less like a single app and more like a growing set of instruments you can compose, you’ll be in a strong position to take advantage of whatever Google adds next without having to rethink your entire workflow each time.
