Role Overview
We’re a high-volume eCommerce company looking for an AI Automation Engineer who can design and ship agent-based workflows that automate day-to-day operations: order handling, inventory/fulfillment coordination, customer operations, and marketing operations.
This is not a “prompt writing” role. You will build production-grade automations that integrate with our systems through APIs and webhooks, run under real-world constraints (rate limits, data quality issues, edge cases), and include monitoring and safe fallbacks.
Core Responsibilities
You will:
- Design and implement agent-based workflows that solve multi-step operational tasks (triage → decision → action → audit trail), combining deterministic logic with LLM tool-calling where appropriate.
- Build and maintain integrations with Shopify via the Admin API and webhooks (event-driven automation instead of constant polling).
- Engineer systems that respect Shopify API rate limits (cost-based limits for GraphQL; request-based limits for REST where applicable) and implement backoff, retry, and idempotency patterns.
- Create robust automation services/tools: job runners, queues, state persistence, human-in-the-loop review, and audit logging.
- Implement structured outputs (schemas) for reliable downstream automation (e.g., extracting order exception reasons, generating a validated action plan) using function/tool calling.
- Partner with Ops, CX, and Marketing to identify the highest-leverage automations; define success metrics; iterate based on performance and error rates.
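To give a sense of the webhook and idempotency work described above, here is a minimal sketch, not our actual implementation. It assumes the webhook sender signs the raw request body with a base64-encoded HMAC-SHA256 and supplies a unique delivery ID. The names `verify_hmac`, `WebhookDeduplicator`, and `handle_webhook` are hypothetical, and the in-memory set stands in for a durable store.

```python
import base64
import hashlib
import hmac
from typing import Callable


def verify_hmac(secret: str, raw_body: bytes, header_hmac: str) -> bool:
    """Check a base64-encoded HMAC-SHA256 signature over the raw body."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, header_hmac.encode("utf-8"))


class WebhookDeduplicator:
    """Remembers delivery IDs so a redelivered event is not acted on twice.

    Assumption: a real service would use a durable store (database row,
    Redis key with a TTL) instead of this in-memory set.
    """

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_delivery(self, delivery_id: str) -> bool:
        if delivery_id in self._seen:
            return False
        self._seen.add(delivery_id)
        return True


def handle_webhook(
    secret: str,
    raw_body: bytes,
    header_hmac: str,
    delivery_id: str,
    dedup: WebhookDeduplicator,
    action: Callable[[bytes], None],
) -> str:
    """Verify, deduplicate, then run the downstream action once."""
    if not verify_hmac(secret, raw_body, header_hmac):
        return "rejected"
    if not dedup.first_delivery(delivery_id):
        return "duplicate"
    action(raw_body)
    return "processed"
```

One design question this sketch leaves open is when to record the delivery ID: marking it before the action (as here) risks skipping a redelivery after a crash, while marking it after risks running the action twice. Handling that trade-off is part of the role.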
Day-to-Day Activities
A typical week may include:
- Mapping an ops workflow (e.g., “inventory discrepancy → investigate → decide hold/release → notify 3PL → update tags/notes”) and translating it into a stateful workflow with clear stop conditions.
- Implementing or updating a webhook-driven pipeline for “order created/updated/fulfilled” events to trigger downstream actions.
- Building a toolset for agents: Shopify queries/mutations, WMS/OMS API calls, ticket creation, Slack/email notifications, Google Sheets updates, and internal dashboards.
- Tuning reliability: retries, dead-letter queues, deduplication, sandbox/test modes, and rollbacks.
- Reviewing production telemetry: latency, failure modes (e.g., Shopify THROTTLED errors), token/cost impact, and business KPIs (manual hours saved, exception volume).
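As an illustration of what "a stateful workflow with clear stop conditions" can look like, the sketch below models the inventory-discrepancy example as a small state machine. Everything in it is an assumption for illustration: the step names, the context keys (`counted`, `expected`, `tolerance`), the `MAX_STEPS` budget, and the `save` callback that stands in for real state persistence.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorkflowState:
    step: str = "investigate"
    completed: list[str] = field(default_factory=list)
    context: dict = field(default_factory=dict)


# Each step mutates the context and returns the name of the next step.
def investigate(ctx: dict) -> str:
    ctx["discrepancy"] = ctx["counted"] - ctx["expected"]
    return "decide"


def decide(ctx: dict) -> str:
    over = abs(ctx["discrepancy"]) > ctx["tolerance"]
    ctx["decision"] = "hold" if over else "release"
    return "notify_3pl"


def notify_3pl(ctx: dict) -> str:
    ctx["notified"] = True  # placeholder for a real 3PL API call
    return "update_tags"


def update_tags(ctx: dict) -> str:
    ctx["tags"] = [f"inventory-{ctx['decision']}"]
    return "done"


STEPS: dict[str, Callable[[dict], str]] = {
    "investigate": investigate,
    "decide": decide,
    "notify_3pl": notify_3pl,
    "update_tags": update_tags,
}
MAX_STEPS = 10  # hard stop so a faulty transition cannot loop forever


def run(state: WorkflowState, save: Callable[[WorkflowState], None]) -> WorkflowState:
    """Advance until 'done', checkpointing after every completed step."""
    for _ in range(MAX_STEPS):
        if state.step == "done":
            return state
        next_step = STEPS[state.step](state.context)
        state.completed.append(state.step)
        state.step = next_step
        save(state)  # checkpoint: a restart resumes from state.step
    raise RuntimeError("step budget exhausted; escalate to human review")
```

Because the current step is saved after each transition, a run that is interrupted can be reloaded and passed back to `run` without repeating the steps already listed in `completed`.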
Qualifications, Skills, and Tooling
Required Qualifications
- Production engineering experience building automation/integration services (APIs, webhooks, queues, background jobs).
- Strong Python or TypeScript/Node.js.
- Experience building AI applications that use tool calling/function calling and structured outputs (schemas) to reliably trigger actions.
- Hands-on agent workflow orchestration experience with one or more of: LangGraph/LangChain patterns, Microsoft AutoGen patterns, CrewAI-style role/task orchestration, or a custom state machine approach.
- Strong API integration skills with event-driven design; familiarity with Shopify webhooks and Admin API patterns (GraphQL preferred).
- Practical reliability engineering: idempotency, retries/backoff, state persistence for long-running flows, and safe rollbacks/human review steps.
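For candidates unsure what we mean by retries/backoff, the following is a minimal sketch of exponential backoff with full jitter. It assumes the API client raises a hypothetical `ThrottledError` when a request is rate-limited. The function name, parameters, and defaults are illustrative and do not come from any specific library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical error raised by a client when the API throttles a call."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Retry fn on ThrottledError, doubling the delay cap on each attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the cap so that many
            # workers do not retry in lockstep.
            sleep(cap * rng())
    raise ValueError("max_attempts must be at least 1")
```

Retrying is only safe when the wrapped call is idempotent, which is why this list pairs backoff with idempotency.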
Preferred Qualifications
- Experience with cost-aware and rate-limit-aware designs (e.g., for Shopify’s GraphQL Admin API).
- Familiarity with stateful agent systems and enterprise features such as session/state management and telemetry.
- Experience with modern retrieval patterns (RAG) and vector search for internal knowledge bases.
- Observability: logs/metrics/traces, structured logging for auditability, alerting for failures.
- Security: secrets management, least privilege scopes, PII-safe logging.
What We Are Explicitly Not Looking For
- “OpenCrawl/Common Crawl-first” or “scrape-the-web-at-scale” solutions. We prioritize first-party data and stable integrations.
- Fragile UI scraping as a primary strategy. If any scraping is proposed, it must be compliant, rate-limited, cached, and have an API-based fallback.
Pay: $130,000.00 - $170,000.00 per year
Benefits:
- Employee discount
- Health insurance
- Paid time off
People with a criminal record are encouraged to apply
Application Question(s):
- Describe an agent-based workflow you deployed to production. What tools did the agent call, and how did you prevent unsafe actions?
- How do you implement idempotency and deduplication for webhook-driven systems (e.g., Shopify order events) to ensure actions aren't repeated?
- Provide an example of using function/tool calling with a JSON schema to trigger real actions like database updates or API calls.
- Shopify has strict API limits (including GraphQL query cost limits). How have you designed systems to avoid throttling while still processing high volumes of data?
- When an agent fails mid-workflow, how do you resume safely without repeating previous actions or losing the context of the task?
Work Location: In person