How we programmed our latest hire
Writing an executive assistant from scratch
I’ve been thinking about what makes AI useful versus what makes it feel like a toy.
Most AI tools are stateless. You open a window, ask something, get an answer, close it. Nothing persists. Nothing happens on its own. The AI waits for you to show up, which means it only helps when you remember to use it. ChatGPT, Claude, Gemini: all brilliant in a conversation. All useless the moment you close the tab.
We wanted something different. An AI that shows up every morning, reads the inbox, checks the calendar, tracks open deals, monitors engineering PRs, and follows up on things you’d otherwise drop. Not a chatbot. An executive assistant.
Here’s how we built it.
The model layer: LLMs as reasoning engines
The core of the system is a large language model. We use ChatGPT as the primary LLM, accessed through the API rather than the chat interface. It handles reading, summarizing, drafting, and deciding what needs attention. For some tasks we’ve experimented with other models as fallbacks or for comparison, but ChatGPT has been the easiest to steer toward the right personality.
But we’re not using LLMs as chatbots. We’re using them as reasoning engines that get called by code, receive structured context, and return structured actions. The difference between “AI tool” and “AI agent” is mostly this: who initiates the interaction.
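A minimal sketch of that pattern, with a hypothetical action schema (the actual schema, prompt, and field names are whatever you define for your agent):

```python
import json
from dataclasses import dataclass

@dataclass
class Action:
    kind: str    # e.g. "reply", "archive", "flag_for_human"
    target: str  # message or task identifier
    note: str    # the model's one-line rationale

def build_context(inbox: list[dict], calendar: list[dict]) -> str:
    """Serialize structured state into the context the LLM receives."""
    return json.dumps({"inbox": inbox, "calendar": calendar}, indent=2)

def parse_action(model_output: str) -> Action:
    """The model is instructed to answer in JSON; code, not a human,
    executes the result."""
    data = json.loads(model_output)
    return Action(kind=data["kind"], target=data["target"],
                  note=data.get("note", ""))
```

Code initiates the call, code executes the result; the model only reasons in the middle.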
Start with identity
The first decision, and the one most people skip, is giving the assistant a real identity.
Ours has a name, a persona, and its own email address. When it sends an email, it comes from a real mailbox with a proper signature, threaded replies, and HTML formatting. Recipients don’t get a wall of monospace text from `noreply@`. They get a professional email that looks like it came from someone on the team.
This matters more than it sounds. When the assistant has an identity, it changes how you think about it. You’re not prompting a tool. You’re delegating to someone. And it changes how recipients interact with it too.
Practically, this means:
A dedicated account. A real email address in Google Workspace.
OAuth credentials. So it can read and send email on that account.
An HTML signature. Stored as a file the assistant loads at send time.
Proper reply threading. Setting `In-Reply-To` and `References` headers, passing the `threadId` back to the Gmail API so replies land in the right conversation.
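The threading piece can be sketched in a few lines. This is a hedged example, not our exact code: it builds an RFC 822 reply with the standard headers, then wraps it in the shape Gmail's `messages.send` expects (base64url-encoded raw bytes plus the `threadId`):

```python
import base64
from email.mime.text import MIMEText

def build_reply(body_html: str, signature_html: str, subject: str,
                to_addr: str, from_addr: str,
                orig_message_id: str, orig_references: str) -> MIMEText:
    """Build an HTML reply that mail clients will thread correctly."""
    msg = MIMEText(body_html + signature_html, "html")
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    # In-Reply-To points at the message being answered;
    # References carries the whole chain.
    msg["In-Reply-To"] = orig_message_id
    msg["References"] = f"{orig_references} {orig_message_id}".strip()
    return msg

def to_gmail_send_body(msg: MIMEText, thread_id: str) -> dict:
    """Gmail's messages.send takes base64url-encoded raw RFC 822 bytes,
    plus the threadId so the reply lands in the right conversation."""
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()
    return {"raw": raw, "threadId": thread_id}
```

Get these headers wrong and every reply starts a new conversation in the recipient's inbox, which is the fastest way to look like a bot.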
Write an operating manual, not a prompt
The assistant’s behavior is defined in a file called `AGENTS.md`. Think of it less as a system prompt and more as an employee handbook.
It covers the assistant’s role, tone, how to prioritize competing demands, when to act autonomously and when to check first, and how to handle specific situations: a cold email from an investor, a bug report from a customer, a PR that’s been open for a week.
A key insight here: a good prompt tells an AI what to do today. A good operating manual tells it how to think, so it can handle situations you didn’t anticipate. This is what separates a useful AI agent from a glorified macro.
The manual evolves. Every time the assistant handles something poorly, or you find yourself explaining the same preference twice, that’s a signal to update it. I treat it like onboarding documentation I’d write for a new hire.
Organize work into lanes
An executive assistant can’t just have a to-do list. The work is too varied.
We split responsibilities into five lanes: current customers, marketing and go-to-market, fundraising, engineering, and operations. Each lane has its own directory in the repository with a tracking file. The assistant maintains state per lane: open tasks, recent decisions, things to follow up on.
This prevents everything from collapsing into one undifferentiated pile where everything feels equally urgent. It also makes routing easy. An email from a potential customer goes to the customers lane. A GitHub notification goes to engineering. The assistant knows where things belong.
This lane structure is also what makes the system composable. You can add a new lane (legal, HR, investor relations) without touching anything else.
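Routing can start as cheap rules, with the LLM classifying whatever the rules can't place. The rules below are illustrative, not our real ones:

```python
LANES = ("customers", "gtm", "fundraising", "engineering", "operations")

def route(sender: str, subject: str) -> str:
    """First-pass lane routing. Anything these rules can't place
    confidently would be handed to the LLM to classify."""
    if sender.endswith("@github.com"):
        return "engineering"
    s = subject.lower()
    if any(w in s for w in ("term sheet", "invest", "round", "deck")):
        return "fundraising"
    if any(w in s for w in ("bug", "outage", "support", "renewal")):
        return "customers"
    if any(w in s for w in ("campaign", "launch", "webinar")):
        return "gtm"
    return "operations"  # default lane for everything else
```

Adding a lane is then one entry in the tuple and one directory in the repository.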
Routines and reactive tasks
The assistant does two kinds of work: scheduled and reactive.
Scheduled routines run via cron. The most important one is the morning briefing. Every day, the assistant reads the inbox, checks the calendar, scans each lane’s tracking file, and sends a priorities email. It surfaces what’s time-sensitive, what’s been waiting too long, and what on the calendar needs preparation. Other routines run weekly: a review of open tasks per lane, a check for stale GitHub issues, a follow-up sweep for emails that didn’t get a response.
Reactive tasks happen in response to events. A new email arrives, a PR is opened, a message comes in on Telegram. The assistant processes these and either handles them directly or flags them for attention.
Routines give structure. Reactive tasks give responsiveness. Together they cover most of what an executive assistant actually does. No n8n, no Zapier, no low-code automation platform. Just Python, cron, and an LLM API.
Email: the full stack
Email is the hardest integration to get right. Reading is easy. Everything else takes work.
We built a Gmail integration that handles the full lifecycle:
Reading. Checking the inbox, filtering by sender, fetching full message bodies.
Sending. Plain text for simple cases, HTML with signature for external emails.
Replying. Proper threading with the original quoted below: reply, then signature, then original, exactly like a human would format it.
Archiving. Removing the INBOX label once something is handled.
For real-time responsiveness, we replaced polling with Gmail’s push notification system. Gmail publishes a notification to a Google Cloud Pub/Sub topic whenever the inbox changes. The assistant pulls from that subscription, gets the `historyId`, calls the Gmail History API to find out what actually changed, then fetches the new messages. The result: it sees new email within minutes, without running a continuous polling loop (which risks hitting rate limits or getting blocked) and without waiting hours between scheduled checks.
The infrastructure for this is straightforward. A Pub/Sub topic and subscription in GCP, IAM grants so Gmail can publish and the assistant’s account can pull, and a state file that persists the last known `historyId` across restarts.
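The data shapes involved are small. A sketch of the two parsing steps, following the payload formats in Google's docs (the Pub/Sub pull and the History API call themselves are omitted):

```python
import base64
import json

def decode_notification(pubsub_message: dict) -> tuple[str, int]:
    """Gmail's push notification arrives as a Pub/Sub message whose `data`
    field is base64-encoded JSON: {"emailAddress": ..., "historyId": ...}."""
    payload = json.loads(base64.b64decode(pubsub_message["data"]))
    return payload["emailAddress"], int(payload["historyId"])

def added_message_ids(history_response: dict) -> list[str]:
    """Walk a users.history.list response and collect the ids of
    messages added since our last known historyId."""
    ids = []
    for record in history_response.get("history", []):
        for added in record.get("messagesAdded", []):
            ids.append(added["message"]["id"])
    return ids
```

The notification itself carries no message content, only the `historyId`; that's why the follow-up History API call is needed at all.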
GitHub instead of spreadsheets
Most engineering teams already live in GitHub. Work is tracked in issues. Code changes go through pull requests. Discussions happen in comments.
Rather than maintaining a separate tracker, we connected the assistant directly to GitHub. It monitors open PRs, flags ones waiting for review, checks issue status, and can open or comment on issues as needed.
The practical benefit: zero duplication. There’s no spreadsheet to keep in sync with reality. The assistant reads the ground truth directly. For a founder or CTO, this means the morning briefing can include “three PRs are waiting for your review” and “this issue has been open for two weeks,” without anyone having to compile that information manually.
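The "open for two weeks" check is a one-function filter over what the GitHub API already returns. A sketch, assuming PR dicts shaped like the `GET /repos/{owner}/{repo}/pulls` response:

```python
from datetime import datetime, timedelta, timezone

def stale_prs(open_prs: list[dict], days: int = 7) -> list[dict]:
    """Return the open PRs untouched for `days` or more, judged by
    the `updated_at` timestamp GitHub includes on every PR."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    stale = []
    for pr in open_prs:
        updated = datetime.fromisoformat(pr["updated_at"].replace("Z", "+00:00"))
        if updated < cutoff:
            stale.append(pr)
    return stale
```

The same pattern works for issues: the ground truth is already in the API response, so there's nothing to sync.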
Calendar and time awareness
The assistant connects to Google Calendar and reads the day’s schedule as part of its morning routine. This sounds simple but it changes what the briefing can do.
With calendar access, the assistant can flag back-to-back meetings that leave no time for focus work, notice that a meeting has no agenda, and cross-reference email with calendar. If someone emailed about a topic and there’s a call with them today, it surfaces that connection.
It also gives the assistant temporal grounding. It knows what day it is, what’s coming up this week, and when deadlines are approaching.
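The email-to-calendar cross-reference is a set intersection. A sketch, assuming event dicts in Google Calendar's shape (`summary`, `attendees` with `email` fields) and a hypothetical email dict format:

```python
def crossref(todays_emails: list[dict], todays_events: list[dict]) -> list[tuple[str, str]]:
    """Surface (email subject, meeting title) pairs where a sender
    also appears as an attendee of a meeting today."""
    connections = []
    for event in todays_events:
        attendees = {a["email"] for a in event.get("attendees", [])}
        for mail in todays_emails:
            if mail["from"] in attendees:
                connections.append((mail["subject"], event.get("summary", "")))
    return connections
```

Each hit becomes a line in the briefing: "Jo emailed about renewal terms, and you have a call with Jo at 2pm."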
Other channels
Telegram gives a lightweight async channel for quick back-and-forth. Rather than opening a full coding environment to interact with the assistant, you can message it from your phone and get a response.
Text-to-speech lets the assistant deliver voice briefings. Instead of reading the morning priorities email, you can listen to it. Useful when moving between meetings.
The principle behind both: meet the human where they are. The assistant should be reachable in the way that’s most natural for the moment, just as a real-world assistant can sometimes make a call or leave a voice note.
Running on real infrastructure
An executive assistant that only runs on your laptop isn’t very useful. We run ours on a dedicated machine that’s always on.
A few things this required:
Credential management across machines. OAuth tokens are tied to a Google account, not a machine. We store tokens and have a script that copies them to the remote machine via `scp`. A single setup script creates the virtual environment and installs all dependencies. One command to get a new machine running.
Cron for routines. Scheduled tasks run via system cron. Simple, transparent, easy to debug.
State persistence. The assistant tracks its position in the email stream (via `historyId`), open tasks per lane, and other state in files in the repository. These persist across restarts and are visible to anyone who looks.
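State persistence in this setup is deliberately boring: plain JSON files checked into or alongside the repository. A sketch, with a hypothetical state shape:

```python
import json
from pathlib import Path

def load_state(path: Path) -> dict:
    """Read persisted state; fall back to a sane default on first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"history_id": None}

def save_state(path: Path, state: dict) -> None:
    """Write state as indented JSON so it survives restarts and is
    readable by anyone who opens the file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2))
```

Because the state is a text file in the repo, debugging is `cat`, and recovery after a crash is automatic on the next cron run.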
The approach is deliberately simple. No Kubernetes, no managed services, no LangChain, no agent framework. A Python virtual environment, the LLM SDK, a few API credentials, and a cron job. The complexity is in the operating manual, not the infrastructure. We initially ran this on a homemade framework, but we have since moved to openclaw. If you use openclaw too, spend enough time locking it down: it has a large security surface area.
Lessons learned
The integrations aren’t the hard part. OAuth flows, API clients, Pub/Sub subscriptions: these take a day or two to get right but they’re well-documented and mostly mechanical. The hard part is defining what the assistant should actually do with the information it has access to.
Autonomy is a dial, not a switch. Some things the assistant should do without asking: send the morning briefing, archive a handled email, update a tracking file. Other things need a human in the loop. Getting this calibration right takes iteration.
The operating manual is never done. Every week there’s a situation it didn’t handle well. That’s not a failure. It’s input. The manual gets better over time, and so does the assistant.
Treating it like a new hire changes everything. When you ask “what would I tell a new person joining the team?” you write better instructions than when you ask “what prompt should I give the AI?” The former produces judgment. The latter produces rules.
Beyond the executive assistant
We built this for an executive assistant, but the underlying pattern applies anywhere someone manages high information volume across multiple tracks of work.
A venture capital investor faces the same structural problem: hundreds of portfolio companies, thousands of inbound pitches, LP communications, market research, and a calendar that fills up faster than it empties. Lane-based tracking maps directly to sectors or stages. The morning briefing surfaces which portfolio companies need attention and which inbound pitches are worth reading.
A law firm partner juggles client matters, court deadlines, associate work product, and business development. Email threading and reply quality matter enormously in a profession where written communication is everything. Routine sweeps for approaching deadlines are table stakes.
An executive assistant is an important hire. An AI agent powered by Claude, GPT, or whichever frontier model fits the task does most of the same work at a fraction of the cost, around the clock, without needing to be managed.
The common thread: any role that involves managing information across multiple domains, maintaining relationships over time, and keeping many things from falling through the cracks is a candidate for this kind of system.
The technology is available. The LLM APIs are mature. The integrations are straightforward. The real investment is in the operating manual. Think carefully about how good judgment in your domain actually works, and write it down.

