Nemo
Guide

5 LLM Providers, Your Choice: How to Pick the Best AI Model for Nemo

A detailed comparison of every LLM provider Nemo supports, with cost breakdowns, skill-specific recommendations, and a guide to smart routing that cuts your AI costs in half.

By the Nemo Team | 13 min read

Table of Contents

  1. Why Model Choice Matters
  2. Anthropic (Claude)
  3. OpenAI (GPT-4)
  4. Ollama (Fully Local)
  5. OpenRouter
  6. Custom Endpoints
  7. Smart Routing Explained
  8. Cost Comparison Table
  9. Which Provider for Which Skill
  10. How to Switch Providers
  11. Token Usage Tracking
  12. Our Recommendation
  13. Frequently Asked Questions

Why Model Choice Matters

Most AI applications lock you into a single model provider. ChatGPT uses OpenAI. Copilot uses OpenAI. Gemini uses Google. You get whatever model the company chose, at whatever price they set, with whatever privacy policy they enforce. If the model is too expensive, too slow, or sends your data to servers you do not trust, your only option is to switch to an entirely different product.

Nemo takes a fundamentally different approach. It is a model-agnostic AI agent that supports five LLM providers out of the box. You choose which model powers your agent. You can change it at any time. You can even use different providers for different tasks simultaneously. This is not a theoretical feature — it is a practical necessity because different models genuinely excel at different things.

Claude is exceptional at nuanced writing and safety-conscious reasoning. GPT-4 has the broadest general capabilities and strong vision support. Ollama models run entirely on your hardware for complete privacy. OpenRouter gives you access to over 100 models, including free tiers for experimentation. And custom endpoints let enterprises use their own self-hosted models behind corporate firewalls.

The model you choose affects three things: quality (how well the agent performs tasks), cost (how much you pay per task), and privacy (where your data goes). This guide helps you make an informed decision across all three dimensions.

Anthropic (Claude)

Anthropic's Claude is our recommended model for users who prioritize quality. Claude is known for nuanced, thoughtful responses that follow instructions precisely. It excels at understanding complex multi-step tasks, producing well-structured output, and behaving predictably with safety-related instructions.

Strengths

Precise instruction following, nuanced and well-structured writing, reliable multi-step reasoning, and predictable behavior around safety-related instructions.

Setup

To use Claude with Nemo, you need an Anthropic API key. Sign up at console.anthropic.com, create an API key, and enter it in Nemo's Settings under LLM Provider. The key is stored in Nemo's encrypted vault — never in plain text, never sent anywhere except Anthropic's API endpoint.

Best for

Email triage and composition, document writing, complex multi-step tasks that require precise instruction following, any task where output quality matters more than cost.

OpenAI (GPT-4)

OpenAI's GPT-4 family is the most widely used LLM ecosystem. GPT-4 and its variants (GPT-4 Turbo, GPT-4o) offer broad capabilities across virtually every task category. If you already have OpenAI API credits, using GPT-4 with Nemo is a natural choice.

Strengths

Broad capabilities across virtually every task category, strong vision support for screenshot analysis, frequent model updates, and the largest ecosystem of tooling and documentation.

Setup

Sign up at platform.openai.com, generate an API key, and enter it in Nemo's Settings. OpenAI requires a paid account with API credits. There is no permanent free tier for API access, though new accounts typically receive a small starter credit.

Best for

General-purpose tasks, desktop automation with screenshot analysis (vision), users already in the OpenAI ecosystem, tasks that benefit from the latest model updates.

Ollama (Fully Local)

Ollama is an open-source tool that lets you run large language models entirely on your local hardware. No API keys, no cloud services, no usage fees, no data leaving your machine. It is the only provider option that makes Nemo truly cost-free and completely private.

How it works

Ollama downloads and manages open-source models on your computer. It provides a local API endpoint (typically http://localhost:11434) that is compatible with the same interface Nemo uses for cloud providers. From Nemo's perspective, Ollama is just another LLM provider — the fact that it is running on your local GPU or CPU is transparent.
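
From Nemo's side, talking to Ollama looks like any other OpenAI-style chat call. Here is a minimal sketch in Python (the /v1/chat/completions path is an assumption based on Ollama's OpenAI compatibility layer; the model name and prompt are placeholders):

```python
import json
import urllib.request

def build_ollama_request(prompt, model="llama3", base_url="http://localhost:11434"):
    """Build an OpenAI-style chat completion request for a local Ollama server."""
    url = f"{base_url}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single response instead of a token stream
    }
    return url, payload

def send(url, payload):
    """POST the request. Only works when Ollama is actually running locally."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

url, payload = build_ollama_request("Summarize this document in three bullets.")
```

Because the request shape is identical to a cloud provider's, swapping the base URL is all it takes to move between local and hosted models.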

Recommended models

Llama 3 8B is the best starting point for general tasks on a 16GB machine. Llama 3 70B approaches cloud-model quality if you have the RAM (32GB or more). Mistral and Phi are lighter alternatives for constrained hardware, and CodeLlama is a strong option for local code assistance.

Performance on consumer hardware

Running LLMs locally is more practical than most people expect. On a modern laptop with 16GB RAM, Llama 3 8B generates about 20–30 tokens per second on CPU. With a discrete GPU (even a mid-range NVIDIA RTX 3060), speeds jump to 40–80 tokens per second. For comparison, cloud APIs typically deliver 30–60 tokens per second. The experience is comparable for most tasks.

The main limitation is the initial model download (4–40GB depending on the model) and the first-load time (10–30 seconds to load the model into memory). Once loaded, the model stays in memory and subsequent queries are fast. Ollama handles model management, caching, and memory optimization automatically.

The privacy advantage

When you use Ollama, your data never leaves your computer. Not for inference, not for safety screening (Sentinel already runs locally), not for anything. Your emails, documents, form data, and browsing activity stay on your hardware. No API provider sees your data. No cloud service stores your queries. This is the strongest privacy guarantee any AI system can offer.

Setup

Download Ollama from ollama.com, install it, and run ollama pull llama3 to download your first model. In Nemo's Settings, select Ollama as your provider and enter the local URL (http://localhost:11434). No API key needed.

Best for

Privacy-focused users, offline operation, zero-cost AI, users with decent hardware (16GB+ RAM), document summarization, desktop automation, any task where data sensitivity is paramount.

OpenRouter

OpenRouter is an API aggregator that provides access to over 100 models from multiple providers through a single, unified API. It uses the OpenAI-compatible API format, making it seamless to use with Nemo. OpenRouter is the best option for users who want to experiment with different models or access free-tier models without signing up for multiple providers.

How it works

OpenRouter acts as a proxy between Nemo and various model providers. You send your request to OpenRouter, they route it to the appropriate model provider, and return the response. The API format is identical to OpenAI's, so Nemo's existing integration works without modification. OpenRouter requires two custom headers (HTTP-Referer and X-Title) for attribution, which Nemo sends automatically.
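
The two attribution headers can be sketched as follows (the referer and title values here are placeholders, not what Nemo actually sends):

```python
def openrouter_headers(api_key, referer="https://example.com/nemo", title="Nemo"):
    """Headers for an OpenRouter chat completion call.

    HTTP-Referer and X-Title are OpenRouter's attribution headers;
    Authorization carries the OpenRouter API key as a bearer token.
    """
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "HTTP-Referer": referer,
        "X-Title": title,
    }

# "sk-or-..." is a placeholder, not a real key.
headers = openrouter_headers("sk-or-...")
```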

Free tier models

OpenRouter hosts several models with free tiers, including variants of Llama, Mistral, and other open-source models. These free tiers have rate limits (typically 10–20 requests per minute) but no per-token costs. Nemo can discover available free models automatically using the OpenRouter API, so you always know what is available without checking the website.
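
Free-model discovery amounts to filtering the model list on its pricing fields. A sketch assuming the response shape of OpenRouter's GET /api/v1/models endpoint (field names are an assumption and may differ):

```python
def free_model_ids(models_response):
    """Return ids of models whose prompt and completion prices are both zero."""
    free = []
    for m in models_response.get("data", []):
        pricing = m.get("pricing", {})
        # Prices arrive as strings; a missing price is treated as paid.
        if (float(pricing.get("prompt", "1")) == 0.0
                and float(pricing.get("completion", "1")) == 0.0):
            free.append(m["id"])
    return free

# Hand-built sample mimicking the assumed response shape:
sample = {
    "data": [
        {"id": "meta-llama/llama-3-8b-instruct:free",
         "pricing": {"prompt": "0", "completion": "0"}},
        {"id": "anthropic/claude-3.5-sonnet",
         "pricing": {"prompt": "0.000003", "completion": "0.000015"}},
    ]
}
```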

Best for

Experimentation with multiple models, budget-conscious users who want to find the cheapest capable model, accessing newer or niche models that are not available directly through major providers, fallback provider when your primary provider is down.

Custom Endpoints

For enterprises and advanced users who host their own models, Nemo supports custom API endpoints. Any server that exposes an OpenAI-compatible chat completions API can be used as a provider. This includes self-hosted deployments of vLLM, text-generation-inference, LocalAI, and LiteLLM.

Use cases

Enterprises that must keep model traffic behind corporate firewalls, teams with strict data-governance requirements, and advanced users running open-source models on their own GPU servers via vLLM, text-generation-inference, LocalAI, or LiteLLM.

Setup

In Nemo's Settings, select Custom as your provider. Enter the base URL of your OpenAI-compatible endpoint (e.g., http://your-server:8000/v1). Enter any required API key or leave blank if your endpoint does not require authentication. Nemo will test the connection and confirm that the endpoint responds correctly.

Smart Routing Explained

Using a top-tier model like Claude or GPT-4 for every task is like driving a Ferrari to the grocery store. It works, but it is unnecessarily expensive. Many tasks that Nemo performs — reading a file listing, categorizing a simple email, extracting text from a document — do not require the reasoning capabilities of a frontier model. A smaller, cheaper model handles them just as well.

Smart routing is Nemo's automatic model selection system. When you configure it (in Settings under LLM Provider > Routing), you specify a primary model and a secondary model. Nemo then analyzes the complexity of each task and routes it to the appropriate model:

  - Simple tasks (reading a file listing, categorizing a straightforward email, extracting text from a document) go to the cheaper secondary model.
  - Complex tasks (multi-step form filling, nuanced email composition) go to the full-capability primary model.

The complexity classification uses a lightweight analysis of the task description, the number of tools available, and the expected number of tool calls. It does not require a separate LLM call — the classification itself is rule-based and adds no latency.
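
A toy version of such a rule-based classifier might look like this (the keywords and thresholds are invented for illustration, not Nemo's actual rules):

```python
# Hypothetical signal words suggesting a task needs the stronger model.
COMPLEX_KEYWORDS = {"form", "compose", "plan", "multi-step", "reason"}

def classify_task(description, num_tools, expected_tool_calls):
    """Rule-based complexity check: returns "primary" or "secondary".

    Uses only the task description, tool count, and expected tool calls,
    so no extra LLM call (and no added latency) is needed.
    """
    if expected_tool_calls > 3 or num_tools > 5:
        return "primary"
    words = (w.strip(".,!?") for w in description.lower().split())
    if any(w in COMPLEX_KEYWORDS for w in words):
        return "primary"
    return "secondary"
```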

Smart routing also supports a judge model for the Sentinel safety layer. Instead of using your expensive primary model to run safety checks, you can assign a cheaper model specifically for Sentinel's screening decisions. Since safety screening is a simpler classification task (safe vs. unsafe, PII vs. no PII), a smaller model handles it effectively. This can reduce the total cost of Sentinel's overhead to near zero.

Typical savings

In our testing, smart routing reduces LLM costs by 40–60% compared to using a single frontier model for everything. For a user spending $10/month on Claude, enabling smart routing with Ollama as the secondary model can bring costs down to $4–6/month with no noticeable quality difference in task completion.
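
The arithmetic behind that estimate is simple enough to sketch (the function and its parameters are illustrative, not part of Nemo):

```python
def blended_monthly_cost(primary_cost, simple_share, secondary_cost_ratio):
    """Estimate monthly spend when a share of tasks moves to a cheaper model.

    primary_cost: what you'd pay using the primary model for everything.
    simple_share: fraction of tasks routed to the secondary model.
    secondary_cost_ratio: secondary model's cost relative to the primary
    (0.0 for a free local Ollama model).
    """
    return primary_cost * ((1 - simple_share) + simple_share * secondary_cost_ratio)
```

With $10/month on the primary model and 60% of tasks routed to a free Ollama model, the blend comes out to $4/month, matching the low end of the range above.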

Cost Comparison Table

Here is what each provider costs for typical Nemo usage patterns, based on February 2026 pricing:

| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) | Est. Light Use/mo | Est. Heavy Use/mo |
|---|---|---|---|---|---|
| Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 | $3–5 | $10–15 |
| Anthropic | Claude 3.5 Haiku | $0.80 | $4.00 | $1–2 | $3–5 |
| OpenAI | GPT-4o | $2.50 | $10.00 | $2–4 | $8–12 |
| OpenAI | GPT-4o mini | $0.15 | $0.60 | <$1 | $1–2 |
| Ollama | Llama 3 / Mistral / Phi | Free | Free | $0 | $0 |
| OpenRouter | Free tier models | Free | Free | $0 | $0 (rate limited) |
| OpenRouter | Paid models | Varies | Varies | $1–5 | $5–15 |
| Custom | Self-hosted | Hardware cost | Hardware cost | Varies | Varies |

Light use: 5–10 tasks per day. Heavy use: 30+ tasks per day. Estimates based on average token usage per task across Nemo's skill categories.
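
To sanity-check these estimates yourself, per-task cost is just tokens times price. A hypothetical helper (not Nemo code):

```python
def task_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one task, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# An email triage task at Claude 3.5 Sonnet prices ($3 in / $15 out per 1M):
cost = task_cost(3000, 800, 3.00, 15.00)
```

That works out to about two cents per task, so 5–10 tasks a day lands in the $3–5/month range shown in the table.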

Which Provider for Which Skill

Not all skills have the same requirements. Here are our tested recommendations for matching providers to Nemo's most popular skills:

Email triage — Claude (recommended)

Email triage requires understanding context, tone, urgency, and relationships between senders. Claude excels at this nuanced classification. It correctly identifies passive-aggressive emails, distinguishes between "FYI" and "action required" messages, and understands organizational hierarchy cues. GPT-4 performs nearly as well. Ollama models (Llama 3 8B) handle basic urgent/not-urgent classification but struggle with subtle priority distinctions.

Email composition — Claude (recommended)

Composing emails that sound natural and match the appropriate tone is Claude's strength. It produces replies that sound like a human wrote them, matching formality level to the original thread. GPT-4 is a close second. Local models tend to produce slightly stilted or overly formal emails, though Llama 3 70B is competitive with cloud models.

Document summarization — GPT-4 or Ollama

Summarization is a task where local models shine. Llama 3 8B produces excellent summaries of most documents, and since the task does not require external API access, it can run fully offline. GPT-4 produces slightly more polished summaries with better structural organization. For most users, Ollama is the best choice here because summarized documents often contain sensitive content that benefits from local processing.

Form filling — Claude (recommended)

Form filling is one of Nemo's most complex skills, involving multi-step reasoning, field matching, profile data lookup, and derived value computation (extracting birth month from a date of birth, for example). Claude handles these reasoning chains most reliably. GPT-4 is a good alternative. Smaller local models struggle with the multi-step reasoning required for complex forms but work well for simple forms with obvious field mappings.
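
The derived-value step mentioned above (birth month from a date of birth) reduces to simple date parsing. A toy sketch, not Nemo's implementation:

```python
from datetime import datetime

def derive_birth_month(date_of_birth, fmt="%Y-%m-%d"):
    """Derive a "birth month" form field (1-12) from a stored date of birth.

    Real forms need locale-aware parsing; the format string here is assumed.
    """
    return datetime.strptime(date_of_birth, fmt).month
```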

Desktop automation — GPT-4 (recommended)

Desktop automation benefits from GPT-4o's vision capabilities when screenshot analysis is involved. The agent can look at what is on screen, identify UI elements, and plan its actions accordingly. Claude handles text-based desktop automation well, and current Claude models also accept image input, but GPT-4o remains our recommendation for screenshot-heavy workflows. Ollama models work for simple, scripted desktop tasks where the agent does not need to interpret visual content.

Coding and development tasks — Claude or GPT-4

Both Claude and GPT-4 excel at code generation, debugging, and development workflow automation. For local code assistance, CodeLlama via Ollama is a strong option. The choice between Claude and GPT-4 for coding is largely a matter of personal preference — both produce high-quality code with good explanations.

How to Switch Providers

Switching your LLM provider in Nemo takes about 30 seconds:

  1. Open Nemo and navigate to Settings (gear icon in the sidebar).
  2. Find the LLM Provider section.
  3. Select your desired provider from the dropdown (Anthropic, OpenAI, Ollama, OpenRouter, or Custom).
  4. Enter your API key (not needed for Ollama) or endpoint URL (for Custom).
  5. Select the specific model you want to use from the model dropdown.
  6. Click Test Connection to verify the provider responds correctly.
  7. Click Save. All subsequent tasks will use the new provider.

Your API keys are stored in Nemo's encrypted vault, which uses AES-256 encryption. Keys are never stored in plain text, never logged, and never sent anywhere except the specific provider's API endpoint during requests. You can view, update, or delete stored keys from the vault at any time.

Switching providers does not affect your task history, skill configurations, or any other settings. The agent's behavior is the same regardless of provider — only the underlying model intelligence changes.

Token Usage Tracking

Nemo tracks every token consumed across all your tasks, providing real-time visibility into your AI costs. This is particularly important for pay-per-token providers like Anthropic and OpenAI, where costs can accumulate without clear visibility.

Real-time header pill

The Nemo header bar displays a small pill showing your current session's token usage and estimated cost. Updated after every LLM call, it shows both input and output tokens consumed along with the calculated cost based on your provider's pricing. You can click it for a detailed breakdown by task.

Task-level tracking

Every completed task records its total token usage and cost. This data appears in the History view, where you can see a Tokens column alongside each task's name, status, and timestamp. You can sort by token usage to identify which tasks are the most expensive and optimize your workflow accordingly.

Session-level tracking

Nemo maintains a running session total that accumulates across all tasks until you reset it. This gives you a clear picture of your daily or weekly AI spending. You can reset the counter at any time from the header pill menu.

Provider-specific normalization

Different providers report usage differently. Anthropic reports input_tokens and output_tokens. OpenAI reports prompt_tokens and completion_tokens. Ollama does not return usage data at all, so Nemo estimates it from response length. OpenRouter forwards the underlying provider's usage data. Nemo normalizes all of these into a consistent format, so your tracking stays accurate regardless of which provider you are using.
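
A normalization layer like the one described might be sketched as follows (Anthropic's and OpenAI's field names are their documented ones; the chars/4 estimate for Ollama is a common rough heuristic, and the function itself is illustrative):

```python
def normalize_usage(provider, raw_usage=None, response_text=""):
    """Normalize provider usage reports to {"input_tokens", "output_tokens"}."""
    raw_usage = raw_usage or {}
    if provider == "anthropic":
        return {"input_tokens": raw_usage["input_tokens"],
                "output_tokens": raw_usage["output_tokens"]}
    if provider in ("openai", "openrouter"):
        # OpenAI-style field names; OpenRouter forwards these unchanged.
        return {"input_tokens": raw_usage["prompt_tokens"],
                "output_tokens": raw_usage["completion_tokens"]}
    # Ollama returns no usage data: estimate roughly 4 characters per token.
    return {"input_tokens": 0,
            "output_tokens": max(1, len(response_text) // 4)}
```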

Our Recommendation

After extensive testing across all of Nemo's skill categories, here is our recommendation for three types of users:

For maximum quality: Anthropic Claude

If you want the best possible task completion quality and are willing to pay $5–15/month in API costs, Claude 3.5 Sonnet is our top recommendation. It handles Nemo's full skill catalog with the highest reliability, produces the most natural language output, and follows complex multi-step instructions most faithfully. Enable smart routing with Claude 3.5 Haiku as the secondary model to reduce costs by 40% without noticeable quality loss.

For maximum privacy: Ollama

If your data sensitivity requirements are high — medical documents, financial records, legal contracts, personal correspondence — Ollama with Llama 3 8B is the clear choice. Your data never leaves your hardware. There are zero API costs. Performance is good on 16GB+ machines. You sacrifice some quality on the most complex tasks compared to frontier cloud models, but for the majority of everyday tasks, the difference is minimal.

For budget-conscious users: OpenRouter free tier + Ollama

If you want to use Nemo at zero cost, combine OpenRouter's free-tier models (for tasks that need internet access like email) with Ollama (for local tasks like document summarization and desktop automation). Smart routing can manage this automatically. You get capable AI automation for $0/month, though you will hit rate limits during heavy use of OpenRouter's free tier.

The best model is the one that matches your priorities. Nemo gives you the freedom to choose — and the intelligence to help you choose wisely through smart routing.

Your AI. Your model. Your choice.

5 LLM providers. Smart routing. Free with Ollama. Download Nemo and pick your model.

Download Nemo Free for Windows

Windows 10+ · macOS coming soon · No credit card required

Frequently Asked Questions

Which LLM provider is best for Nemo?
The best LLM provider for Nemo depends on your priorities. Anthropic Claude is the best overall choice for quality, nuanced reasoning, and safety-aware responses, especially for email and writing tasks. OpenAI GPT-4 offers the broadest capabilities including vision. Ollama is the best for privacy since models run entirely on your hardware with zero cloud dependency. OpenRouter gives you access to 100+ models and often has free tiers for experimentation. For most users, we recommend starting with Claude for quality or Ollama for privacy.
Can I use Nemo completely free with Ollama?
Yes. Ollama is a completely free, open-source local LLM runner. You can download and run models like Llama 3, Mistral, and Phi on your own hardware at zero cost. Combined with Nemo's free core features, this gives you a fully functional AI agent with no API costs, no subscriptions, and no cloud dependency. The only cost is your electricity. Performance depends on your hardware: a machine with 16GB RAM can run 7B parameter models comfortably, while 32GB or more is recommended for larger models.
How much does it cost to run Nemo with Claude?
Running Nemo with Anthropic Claude costs approximately $3–15 per month for typical personal use. Claude's pricing is based on tokens: roughly $3 per million input tokens and $15 per million output tokens for Claude 3.5 Sonnet. A typical email triage task uses about 2,000–5,000 tokens. A document summary uses 3,000–10,000 tokens. Light use (5–10 tasks per day) costs around $3–5 per month. Heavy use (30+ tasks per day) can reach $10–15 per month. Nemo's smart routing feature can reduce costs by automatically using cheaper models for simple tasks.
What is smart routing?
Smart routing is Nemo's automatic model selection feature. Instead of always using your most powerful (and expensive) model for every task, Nemo analyzes the complexity of each task and routes it to the most cost-effective model capable of handling it. Simple tasks like reading a file listing or categorizing an email might be routed to a smaller, cheaper model. Complex tasks like multi-step form filling or nuanced email composition are routed to the full-capability model. Smart routing can reduce LLM costs by 40–60% without noticeable quality degradation for most workflows.
Can I use multiple providers at once?
Yes. Nemo supports configuring multiple LLM providers simultaneously. You can set Claude as your primary model for complex tasks, Ollama as your local fallback for offline use, and a cheaper OpenRouter model for simple operations. Smart routing can automatically distribute tasks across providers based on complexity and cost. You can also manually override the provider for any specific skill in Nemo's settings. API keys for all providers are stored securely in Nemo's encrypted vault.