Prospecting Agent — Autonomous B2B Lead Generation

The Problem Space

Freelance Prospecting Doesn't Scale

Finding clients as a solo freelancer means spending hours on job boards, reading irrelevant postings, and sending generic cold emails that get ignored. This agent automates the funnel end to end, from sourcing to drafted outreach, with a human approving every email before it is sent.

Time Sink

Manually scanning Upwork, LinkedIn, Indeed and remote boards eats 2-3 hours a day that should go to billable work. The agent runs unattended on a schedule, three times a day.

Signal vs. Noise

90% of job posts are irrelevant to your niche. The agent uses Gemini AI to score each lead 1-10 against your portfolio and discards anything below 4.

Cold ≠ Generic

Generic outreach converts at <1%. The agent extracts pain points from each job post and drafts emails with portfolio proof tailored to their needs.

System Design

5-Stage Autonomous Pipeline

Each run is orchestrated by GitHub Actions (cron 3x/day). Data flows from search APIs through AI qualification to human-approved outreach.

1. Sourcing: 27 queries across Upwork, LinkedIn, WeWorkRemotely and Indeed via the Serper.dev Google Search API (~138 leads/run).

2. Enrichment: Jina Reader converts each URL to clean markdown (~4,000 chars; 95% success rate), and company websites are scraped for email addresses.

3. AI Qualification: Gemini 2.5 Flash scores fit 1-10, extracts pain points, suggests outreach angles and identifies contacts (~1-2s per lead).

4. Outreach: Jinja2 templates draft personalized emails from the extracted pain points and portfolio proof, then route them to Telegram (5 outreach angles).

5. Dashboard: Streamlit real-time analytics with a KPI funnel, leads explorer, email queue with approve/reject, and source performance (4 dashboard pages).
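Stage 1 pairs with the A/B keyword rotation listed under Design Decisions. Here is a minimal sketch of how a run might build its Serper.dev queries. The keyword pools and `site:` filters are hypothetical, and the project's real 27 queries are not reproduced. The run counter that GitHub Actions exposes as `GITHUB_RUN_NUMBER` picks the pool by parity, and each query becomes a request body for Serper's `https://google.serper.dev/search` endpoint.

```python
import os

# Hypothetical keyword pools: the project's real pools are not published.
POOL_A = ["python financial modeling", "monte carlo simulation"]
POOL_B = ["data pipeline automation", "streamlit dashboard"]

# The four Stage 1 sources as Google `site:` filters (assumed patterns).
SITES = {
    "upwork": "site:upwork.com/jobs",
    "linkedin": "site:linkedin.com/jobs",
    "weworkremotely": "site:weworkremotely.com",
    "indeed": "site:indeed.com",
}

SERPER_URL = "https://google.serper.dev/search"


def pick_pool(run_number: int) -> list[str]:
    """Pool A on even-numbered runs, pool B on odd-numbered runs."""
    return POOL_A if run_number % 2 == 0 else POOL_B


def build_queries(run_number: int) -> list[dict]:
    """One Serper request body per (keyword, site) pair for this run."""
    return [
        {"source": source, "payload": {"q": f'"{kw}" {site}'}}
        for kw in pick_pool(run_number)
        for source, site in SITES.items()
    ]


if __name__ == "__main__":
    # GitHub Actions sets GITHUB_RUN_NUMBER for every workflow run.
    run_number = int(os.environ.get("GITHUB_RUN_NUMBER", "0"))
    for query in build_queries(run_number):
        # Each payload would be POSTed to SERPER_URL with an X-API-KEY header.
        print(query["source"], query["payload"]["q"])
```

With two keywords per pool this yields 8 queries per run. Larger pools would produce a bigger plan such as the 27 queries described above.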

Data Model

PostgreSQL on Supabase

Four tables track every lead from discovery to email delivery, with a full audit trail for compliance and analytics.

raw_leads: source, URL (unique), raw JSON, timestamp. Acts as the pipeline's dedup gate.

qualified_leads: fit_score, pain_point, contact info, company website, populated from Gemini structured output.

email_queue: subject, body, status (pending/approved/sent/rejected), Telegram message ID.

hitl_audit_log: every approve/edit/reject action with operator notes and timestamp.
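The dedup gate on raw_leads can be expressed as a unique constraint plus an insert that ignores conflicts. This minimal sketch uses in-memory SQLite as a stand-in for Supabase Postgres, and the column types are assumptions. The `ON CONFLICT (url) DO NOTHING` clause is also valid PostgreSQL.

```python
import json
import sqlite3
from datetime import datetime, timezone

# SQLite stands in for Supabase Postgres; column types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_leads (
    id         INTEGER PRIMARY KEY,
    source     TEXT NOT NULL,
    url        TEXT NOT NULL UNIQUE,  -- the dedup gate
    raw_json   TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""


def insert_raw_lead(conn: sqlite3.Connection, source: str, url: str, raw: dict) -> bool:
    """Insert a lead; return False when its URL has been seen before."""
    # The upsert clause needs SQLite 3.24+ (same syntax as PostgreSQL).
    cur = conn.execute(
        "INSERT INTO raw_leads (source, url, raw_json, created_at) "
        "VALUES (?, ?, ?, ?) ON CONFLICT (url) DO NOTHING",
        (source, url, json.dumps(raw), datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    # rowcount is 1 for a real insert, 0 when the conflict clause skipped it.
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    print(insert_raw_lead(conn, "upwork", "https://example.com/job/1", {"title": "A"}))  # True
    print(insert_raw_lead(conn, "upwork", "https://example.com/job/1", {"title": "A"}))  # False
```

Pushing the check into the database keeps dedup atomic, so overlapping runs cannot both insert the same URL.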

Repository Layout

prospecting-agent/
├── .github/workflows/
│   ├── vertical1_scraper.yml   # cron: 3x/day
│   └── vertical2_scraper.yml
├── services/
│   ├── vertical1_tech/src/
│   │   ├── main.py             # orchestrator
│   │   ├── qualifier.py        # Gemini scoring
│   │   ├── email_drafter.py    # Jinja2 templates
│   │   └── scrapers/
│   ├── vertical2_cerrieta/src/
│   │   ├── main.py
│   │   └── scrapers/
│   │       ├── serper_search.py  # Instagram
│   │       └── gmaps_scraper.py  # Places API
│   └── hitl_gateway/src/
│       ├── main.py             # FastAPI
│       ├── telegram_bot.py     # inline keyboards
│       └── email_sender.py     # Brevo SMTP
├── shared/prompts/ + utils/
├── dashboard/
│   ├── app.py + pages/ (4 views)
│   └── utils/
└── supabase/migrations/

AI Qualification Engine

Gemini 2.5 Flash-Lite reads the full job description (enriched via Jina Reader) and returns structured JSON with a fit score, pain point analysis, and suggested outreach angle.

Fit Score 1-10

Granular scoring against your portfolio enables prioritization. A score-9 "financial modeling in Python" lead gets attention before a score-5 generic "data analyst" role.
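Prioritization then reduces to a filter followed by a sort. This minimal sketch assumes each qualified lead carries an integer fit_score. The cutoff of 4 comes from The Problem Space, and the `ScoredLead` type is hypothetical.

```python
from dataclasses import dataclass

MIN_FIT_SCORE = 4  # leads scoring below this are discarded


@dataclass
class ScoredLead:  # hypothetical shape of a qualified lead
    url: str
    title: str
    fit_score: int  # 1-10 from the qualifier


def prioritize(leads: list[ScoredLead], min_score: int = MIN_FIT_SCORE) -> list[ScoredLead]:
    """Drop low-fit leads, then order the rest from best to worst fit."""
    kept = [lead for lead in leads if lead.fit_score >= min_score]
    # sorted() stays stable with reverse=True, so ties keep discovery order.
    return sorted(kept, key=lambda lead: lead.fit_score, reverse=True)


leads = [
    ScoredLead("https://example.com/a", "Data analyst (generic)", 5),
    ScoredLead("https://example.com/b", "Financial modeling in Python", 9),
    ScoredLead("https://example.com/c", "Logo design", 2),
]
print([lead.title for lead in prioritize(leads)])
# ['Financial modeling in Python', 'Data analyst (generic)']
```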

Pain Point Extraction

The LLM identifies the client's core pain point from the job description and matches it to a specific project in your portfolio as proof of capability.

5 Outreach Angles

Each lead is assigned the most relevant of five angles: ROI-focused, Time-saving, Technical architecture, Risk-reduction, or Revenue-uplift.
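The sketch below shows how the suggested angle and the extracted fields might feed an email draft. The project uses Jinja2; this version substitutes the standard library's `string.Template` so it runs without dependencies. The opener sentences are invented for illustration.

```python
from string import Template

# The five angles named above; the opener sentences are invented examples.
ANGLE_OPENERS = {
    "ROI-focused": "I think this project could pay for itself quickly.",
    "Time-saving": "This could take hours of manual work off your team each week.",
    "Technical architecture": "I would structure this so it stays easy to extend.",
    "Risk-reduction": "I can help you quantify, and then shrink, the downside.",
    "Revenue-uplift": "I see a clear path to new revenue here.",
}

# string.Template is a dependency-free stand-in for the project's Jinja2 templates.
EMAIL_TEMPLATE = Template(
    "Hi $name,\n\n"
    "$opener\n\n"
    "From your post, the core need is: $pain_point\n"
    "Relevant proof from my portfolio: $portfolio_proof\n\n"
    "Open to a 15-minute call this week?"
)


def draft_email(lead: dict) -> str:
    """Render a draft from one qualifier result dict."""
    angle = lead["suggested_angle"]
    if angle not in ANGLE_OPENERS:
        raise ValueError(f"unknown outreach angle: {angle!r}")
    return EMAIL_TEMPLATE.substitute(
        name=lead.get("contact_name") or "there",  # fall back when no name was found
        opener=ANGLE_OPENERS[angle],
        pain_point=lead["pain_point"],
        portfolio_proof=lead["portfolio_proof"],
    )
```

Substituted values are not re-parsed, so a pain point containing "$50M" renders literally.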

Contact Extraction

Gemini identifies hiring manager names from the full job description. Company website scraping discovers real email addresses via mailto: links and regex patterns.
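The mailto:/regex step might look like the sketch below. It assumes the company page's HTML has already been fetched. The email pattern is deliberately simple rather than RFC 5322-complete, and the image-suffix filter is an assumed heuristic for asset names such as `logo@2x.png`.

```python
import re

# mailto: links, e.g. <a href="mailto:jobs@example.com?subject=Hi">
MAILTO_RE = re.compile(r'mailto:([^"\'?>\s]+)', re.IGNORECASE)
# Bare addresses in page text: a simple pattern, not RFC 5322-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed filter for retina image names that look like addresses.
JUNK_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(html: str) -> list[str]:
    """Unique, lowercased addresses: mailto: hits first, then first-seen order."""
    found = MAILTO_RE.findall(html) + EMAIL_RE.findall(html)
    seen: dict[str, None] = {}  # dict keeps insertion order for dedup
    for addr in found:
        addr = addr.strip().lower()
        if addr.endswith(JUNK_SUFFIXES):
            continue
        seen.setdefault(addr, None)
    return list(seen)


page = (
    '<a href="mailto:Jobs@Meridian.example?subject=Hi">Email us</a> '
    "or write to info@meridian.example. "
    '<img src="logo@2x.png">'
)
print(extract_emails(page))  # ['jobs@meridian.example', 'info@meridian.example']
```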

Example Gemini output (qualification_result.json)

{
  "fit_score": 9,
  "reasoning": "Client needs Monte Carlo simulation for portfolio risk. Direct match with NVIDIA project (GARCH + VaR + DCF). High budget signals.",
  "pain_point": "Quantify downside risk for a $50M equity portfolio before Q3 board presentation.",
  "portfolio_proof": "NVIDIA project: GARCH(1,1) + 10K Monte Carlo paths + VaR 95% + FCFF/DCF valuation.",
  "suggested_angle": "Risk-reduction",
  "contact_name": "Sarah Chen",
  "company_website": "meridiancp.com",
  "budget_estimate": "$5,000-$15,000"
}
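Even with structured output enabled, the model's JSON is worth validating before it reaches qualified_leads. This minimal sketch assumes the field names in the example above. The range and angle checks are illustrative, not the project's confirmed code.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional

ANGLES = {"ROI-focused", "Time-saving", "Technical architecture",
          "Risk-reduction", "Revenue-uplift"}


@dataclass
class Qualification:
    fit_score: int
    reasoning: str
    pain_point: str
    portfolio_proof: str
    suggested_angle: str
    contact_name: Optional[str] = None
    company_website: Optional[str] = None
    budget_estimate: Optional[str] = None


def parse_qualification(raw: str) -> Qualification:
    """Parse the model's JSON reply and reject values the pipeline can't use."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    known = {f.name for f in fields(Qualification)}
    try:
        # Unknown keys are dropped; a missing required key raises TypeError.
        q = Qualification(**{k: v for k, v in data.items() if k in known})
    except TypeError as exc:
        raise ValueError(f"incomplete qualification: {exc}") from exc
    if not isinstance(q.fit_score, int) or not 1 <= q.fit_score <= 10:
        raise ValueError(f"fit_score out of range: {q.fit_score!r}")
    if q.suggested_angle not in ANGLES:
        raise ValueError(f"unknown suggested_angle: {q.suggested_angle!r}")
    return q
```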

Quality Control

Human-in-the-Loop via Telegram

Cold outreach demands human judgment. Every email draft is sent to Telegram with inline buttons for instant approval, editing, or rejection, so the review happens without leaving the chat.

Approve

One tap sends the email via Brevo SMTP; the status is set to sent and the delivery timestamp is recorded.

Edit

Send natural-language instructions in Telegram. Gemini re-drafts the email following your direction, then posts the new draft back to Telegram for approval.

Reject

Lead archived with operator note. Full audit trail in hitl_audit_log for analytics and pattern detection.
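The approve/edit/reject flow can be modeled as an inline keyboard plus a small state machine. In this sketch, `reply_markup` follows the Telegram Bot API's InlineKeyboardMarkup shape, where callback_data is limited to 64 bytes. The `action:id` callback format, the in-memory stores, and the transition table are hypothetical. In a real gateway the approve branch would also trigger the Brevo send before the row is marked sent.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical transition table for email_queue.status. "edit" keeps the row
# pending while Gemini re-drafts; "mark_sent" records a successful send.
TRANSITIONS = {
    ("pending", "approve"): "approved",
    ("pending", "edit"): "pending",
    ("pending", "reject"): "rejected",
    ("approved", "mark_sent"): "sent",
}


def approval_keyboard(email_id: int) -> dict:
    """reply_markup for Telegram's sendMessage: one row of three inline buttons."""
    return {
        "inline_keyboard": [[
            {"text": "Approve", "callback_data": f"approve:{email_id}"},
            {"text": "Edit", "callback_data": f"edit:{email_id}"},
            {"text": "Reject", "callback_data": f"reject:{email_id}"},
        ]]
    }


@dataclass
class Draft:
    id: int
    status: str = "pending"


@dataclass
class AuditEntry:  # mirrors a hitl_audit_log row
    email_id: int
    action: str
    note: str
    at: str


def handle_action(data: str, drafts: dict, audit: list, note: str = "") -> Draft:
    """Apply an 'action:id' callback, enforce the transition table, log it."""
    action, _, raw_id = data.partition(":")
    draft = drafts[int(raw_id)]
    new_status = TRANSITIONS.get((draft.status, action))
    if new_status is None:
        raise ValueError(f"cannot {action!r} a draft in status {draft.status!r}")
    draft.status = new_status
    audit.append(AuditEntry(draft.id, action, note, datetime.now(timezone.utc).isoformat()))
    return draft
```

Rejecting illegal transitions matters on Telegram, where a stale button can still be tapped after a draft has already been handled.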


Live Analytics

Real-Time Pipeline Dashboard

Four pages of live analytics powered by Streamlit — KPI funnel, leads explorer, email queue with HITL actions, and source performance breakdown.

Page 1: Overview & KPIs
Page 2: Leads Explorer
Page 3: Email Queue
Page 4: Source Analytics
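A KPI funnel like the one on Page 1 comes down to stage counts and step-to-step conversion. This minimal sketch uses hypothetical stage names and made-up counts, not the project's results. Rendering the rows in Streamlit is left out.

```python
def funnel(counts: dict, stages: list) -> list:
    """(stage, count, % of the previous stage) for each step of the funnel."""
    rows = []
    previous = None
    for stage in stages:
        n = counts.get(stage, 0)
        if previous is None:
            pct = 100.0  # the top of the funnel is the baseline
        elif previous == 0:
            pct = 0.0  # avoid dividing by zero after an empty stage
        else:
            pct = 100.0 * n / previous
        rows.append((stage, n, round(pct, 1)))
        previous = n
    return rows


# Made-up counts for illustration only.
STAGES = ["raw", "qualified", "drafted", "approved", "sent"]
counts = {"raw": 200, "qualified": 50, "drafted": 50, "approved": 12, "sent": 12}
for stage, n, pct in funnel(counts, STAGES):
    print(f"{stage:<10}{n:>5}{pct:>7.1f}%")
```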

Live dashboard: prospeccionagente.streamlit.app

Infrastructure

The $0.50/month Stack

Every component runs on free-tier infrastructure. The only cost is Gemini API usage at ~$0.50/month for thousands of lead qualifications.

Component      Service          Purpose                  Monthly cost
Orchestration  GitHub Actions   Cron jobs (3x/day)       Free
Search API     Serper.dev       Google Search queries    Free
Enrichment     Jina Reader      URL to markdown          Free
LLM            Gemini 2.5 Flash Qualification + scoring  ~$0.50
Database       Supabase         PostgreSQL + REST API    Free
HITL Gateway   Cloud Run        Telegram webhook + API   Free
Email          Brevo SMTP       Transactional delivery   Free
Dashboard      Streamlit Cloud  Real-time analytics      Free
Bot            Telegram API     Approval notifications   Free
Total                                                    ~$0.50

Results

Performance Metrics

~138 leads per run, across 4 job boards

95% enrichment rate (Jina Reader success)

~30s pipeline duration for a full end-to-end run

40% contact-name rate (names found in job descriptions)

Design Decisions

Every architectural choice was driven by two constraints: zero cost and zero maintenance.

Serper.dev over direct scraping: Google has already indexed the job boards, so there is no IP blocking, no Selenium, and no maintenance overhead.

Gemini over GPT-4: a free tier for prototyping, and Flash-Lite is fast and cheap for structured extraction at ~$0.50/month.

fit_score 1-10 over binary: granular scoring enables prioritization, so a score-9 financial-modeling lead gets attention before a score-5 generic job.

Telegram HITL over auto-send: cold outreach needs human judgment, and Telegram is instant and mobile-friendly with zero switching cost.

A/B keyword pools: keywords rotate each run (pool A on even-numbered runs, pool B on odd-numbered runs) to maximize coverage within free-tier API limits.

Async everywhere: asyncio + httpx + aiosmtplib for maximum throughput within free-tier rate limits.

Best-effort enrichment: Jina/scraping failures never block the pipeline; graceful degradation to snippet data keeps throughput stable.
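The last two decisions combine naturally: bounded concurrency with asyncio, plus a per-lead fallback to the search snippet. This minimal sketch assumes an injected async `fetch` callable; in production that would likely be a thin wrapper around `httpx.AsyncClient.get`. The concurrency limit of 5 is an assumption. The `https://r.jina.ai/` prefix is Jina Reader's URL-to-markdown endpoint, and the 4,000-character cap mirrors Stage 2.

```python
import asyncio
from typing import Awaitable, Callable

JINA_PREFIX = "https://r.jina.ai/"  # Jina Reader: prefix a URL to get markdown
MAX_CHARS = 4_000                   # mirrors the ~4,000-char budget in Stage 2
CONCURRENCY = 5                     # assumed cap; tune to the API's rate limits

Fetch = Callable[[str], Awaitable[str]]


async def enrich_one(lead: dict, fetch: Fetch, sem: asyncio.Semaphore) -> dict:
    """Best effort: any fetch failure falls back to the search snippet."""
    async with sem:
        try:
            text = await fetch(JINA_PREFIX + lead["url"])
        except Exception:
            return {**lead, "content": lead.get("snippet", ""), "enriched": False}
    return {**lead, "content": text[:MAX_CHARS], "enriched": True}


async def enrich_all(leads: list, fetch: Fetch) -> list:
    """Enrich every lead concurrently, at most CONCURRENCY fetches at a time."""
    sem = asyncio.Semaphore(CONCURRENCY)
    return await asyncio.gather(*(enrich_one(lead, fetch, sem) for lead in leads))


# In production, fetch might wrap a shared httpx.AsyncClient, e.g.:
#     async def fetch(url):
#         resp = await client.get(url, timeout=20)
#         resp.raise_for_status()
#         return resp.text


async def fake_fetch(url: str) -> str:  # stands in for the network in this demo
    if "broken" in url:
        raise ConnectionError("simulated outage")
    return "# Job post\n" + "lorem " * 1_000


if __name__ == "__main__":
    demo = [{"url": "https://example.com/job/1", "snippet": "s1"},
            {"url": "https://example.com/broken", "snippet": "s2"}]
    for row in asyncio.run(enrich_all(demo, fake_fetch)):
        print(row["url"], row["enriched"], len(row["content"]))
```

Because each failure is caught per lead, one dead URL costs only that lead's enrichment. `asyncio.gather` still returns results in input order.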