Federation Troubleshooting

Practical solutions to real problems encountered during local federation development. Each entry follows the same pattern: Symptom (what you see), Cause (why it happens), Fix (what to do).


Startup Failures

Model API fails to start -- Docker container exits immediately

Symptom: docker compose up -d in cred-model-api exits. Logs show connection errors to PostgreSQL or Elasticsearch.

Cause: The .env has stale or unreachable database/ES endpoints. Manually assembled .env files often use older key names (MODEL_DATABASE_URL instead of MODEL_API_DATABASE_URL, ELASTICSEARCH_NODE instead of ES_ENDPOINT) or stale hostnames.

Fix:

  1. Verify the .env uses correct key names: MODEL_API_DATABASE_URL (not MODEL_DATABASE_URL), ES_ENDPOINT (not ELASTICSEARCH_NODE).
  2. Pull fresh values from Cloud Run: cd cred-model-api && ./source-cloudrun-variables.sh (requires gcloud).
  3. Confirm the DB host is reachable from your machine.
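
The key-name check in step 1 can be scripted. This is a minimal sketch rather than part of the repo tooling: the function name check_legacy_keys is hypothetical, and it only knows the two renames listed above.

```shell
#!/usr/bin/env bash
# Report legacy key names in a .env file.
# Returns 0 when the file is clean, 1 when at least one legacy key is present.
check_legacy_keys() {
  local env_file="$1" found=0 pair old new
  # old:new pairs taken from the entry above
  for pair in "MODEL_DATABASE_URL:MODEL_API_DATABASE_URL" "ELASTICSEARCH_NODE:ES_ENDPOINT"; do
    old="${pair%%:*}"
    new="${pair##*:}"
    # Anchor on line start so MODEL_API_DATABASE_URL is not mistaken for the old name.
    if grep -q "^${old}=" "$env_file"; then
      echo "legacy key ${old} found; rename it to ${new}"
      found=1
    fi
  done
  return "$found"
}

# Usage:
#   check_legacy_keys cred-model-api/.env
```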

Model API seed preflight fails -- "Unknown resource"

Symptom: Startup prints: Model API is up, but personById(2) cannot reach its backing search data.

Cause: Model API started but Elasticsearch is misconfigured or unreachable.

Fix: Check the Elasticsearch settings in cred-model-api/.env, then test connectivity:

  • ES_CLOUD_ID and ES_API_KEY (preferred), or ES_ENDPOINT + ES_USER + ES_PASSWORD
  • ES_INDEX_ENV should be dev (indices are named <index>-dev)
  • Verify ES connectivity: curl -s https://<ES_ENDPOINT> from your host

Model API seed preflight fails -- "UNAUTHENTICATED"

Symptom: Startup prints: Model API rejected the bootstrap query token.

Cause: CRED_MODEL_API_TOKEN in commercial's .env does not match API_TOKEN in model-api's .env.

Fix:

  1. Open cred-api-commercial/.env and find CRED_MODEL_API_TOKEN.
  2. Open cred-model-api/.env and find API_TOKEN.
  3. They must be the same value. Copy one to the other.
  4. Restart model-api: cd cred-model-api && docker compose down && docker compose up -d
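
Steps 1-3 can be checked without opening either file. The sketch below is an assumption-laden helper, not repo tooling: env_values_match is a hypothetical name, and it compares raw text, so a value quoted in one file and unquoted in the other counts as a mismatch.

```shell
#!/usr/bin/env bash
# Succeeds only when both keys are set and hold the same single-line value.
# Nothing is printed, so the token itself never reaches the terminal.
env_values_match() {
  local file_a="$1" key_a="$2" file_b="$3" key_b="$4" a b
  a=$(sed -n "s/^${key_a}=//p" "$file_a" | head -n 1)
  b=$(sed -n "s/^${key_b}=//p" "$file_b" | head -n 1)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage:
#   if env_values_match cred-api-commercial/.env CRED_MODEL_API_TOKEN \
#                       cred-model-api/.env API_TOKEN; then
#     echo "tokens match"
#   else
#     echo "tokens differ or are unset"
#   fi
```

The same function can compare JWT_SECRET in the commercial .env with COMMERCIAL_JWT_SECRET in the model-api .env.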

Model API seed preflight fails -- personById returns null

Symptom: Model API is up, but personById(2) returned null.

Cause: The Elasticsearch index does not contain the seed person (ID 2). ES_INDEX_ENV points at an empty or wrong index suffix.

Fix: Verify ES_INDEX_ENV=dev in cred-model-api/.env.

Router composition fails -- schema mismatch

Symptom: Apollo Router fails to start. Logs show rover supergraph compose errors about conflicting type definitions.

Cause: Commercial API was built against a stale model-api schema. The generated model client in src/services/model/generated/ has outdated type definitions.

Fix:

  1. Ensure model-api is running and healthy at http://localhost:3000/graphql.
  2. Regenerate the commercial model client:
    cd cred-api-commercial
    docker compose run --rm --no-deps web ./generate-model-api-client.sh
    
  3. Rebuild the commercial image:
    docker compose --profile with_federation up -d --build
    

Note

The fed CLI runs codegen automatically as a pre-task. This issue mainly occurs when starting services manually.

Router composition fails -- subgraph unreachable

Symptom: Router logs show connection refused to http://host.docker.internal:3000/graphql.

Cause: The model-api or filter-api container is not running, or host.docker.internal is not resolving.

Fix:

  1. Verify model-api is up: curl -s http://localhost:3000/graphql -H "Content-Type: application/json" -d '{"query":"{ __typename }"}'
  2. Verify Docker Desktop is running (it provides host.docker.internal).
  3. On Linux without Docker Desktop, add extra_hosts: ["host.docker.internal:host-gateway"] to the compose service.

Startup script fails -- "Missing cred-api-commercial/.env"

Symptom: Federation startup exits immediately with a missing .env error.

Cause: The commercial .env is gitignored and must be created manually.

Fix:

  1. Pull from Cloud Run: cd cred-api-commercial && ./source-cloudrun-variables.sh
  2. Or manually: gcloud run services describe cred-api-commercial-dev --region us-central1 --format 'yaml(spec.template.spec.containers[0].env)'
  3. Verify it contains at minimum: JWT_SECRET and CRED_MODEL_API_TOKEN

Startup script fails -- "Could not locate cred-model-api"

Symptom: Could not locate cred-model-api near /path/to/cred-api-commercial.

Cause: The workspace layout is non-standard and the tooling cannot find sibling repos.

Fix: Create .local-workspace.env at the workspace root:

MODEL_API_DIR=/absolute/path/to/cred-model-api

Startup script fails -- multiline PEM breaks shell parsing

Symptom: Cryptic shell errors like unexpected end of file when reading .env.

Cause: The commercial .env contains multiline PEM private keys. Running source on the file by hand fails because the shell cannot parse those values.

Fix: Never manually source cred-api-commercial/.env. The tooling uses dotenv-utils.sh to safely extract single-line values.
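
If you need a single value outside the tooling, read that one line instead of sourcing the file. The following is a rough sketch of the idea only; it is not the contents of dotenv-utils.sh, and the function name dotenv_get is hypothetical.

```shell
#!/usr/bin/env bash
# Print the value of one KEY=value line from a .env file without sourcing it.
# Only the line that starts with KEY= is read, so multiline PEM blocks elsewhere
# in the file are never parsed by the shell. Multiline values are not supported:
# asking for one returns just its first line.
dotenv_get() {
  local key="$1" env_file="$2" value
  value=$(grep -m 1 "^${key}=" "$env_file") || return 1
  value="${value#*=}"
  # Strip one pair of matching surrounding quotes, if present.
  case "$value" in
    \"*\") value="${value#\"}"; value="${value%\"}" ;;
    \'*\') value="${value#\'}"; value="${value%\'}" ;;
  esac
  printf '%s\n' "$value"
}

# Usage:
#   dotenv_get CRED_MODEL_API_TOKEN cred-api-commercial/.env
```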

fed start preflight fails

Symptom: ./fed start exits before starting any services, showing preflight errors.

Cause: The fed CLI validates Docker, directories, .env files, required env keys, and compose overlay files before starting anything, and at least one of those checks failed.

Fix: Read the specific error messages. Common fixes:

  1. Docker not running: open -a Docker
  2. Missing .env: Copy from reference or pull from Cloud Run
  3. Missing required env key: Add the key to the service's .env. Check fed.toml required_env_keys for the full list
  4. Directory not found: Ensure repos are cloned in the expected layout

Startup takes too long -- prep-db rebuilds every time

Symptom: Every startup runs yarn prep-db (build + migrate + seed), adding 3-5 minutes.

Cause: The skip check looks for the User table in pg_tables. If anonymous Docker volumes lose the dist directory between restarts, the build step re-runs.

Fix:

  • Use --clean only when you need a full reset (it removes volumes)
  • The --build flag on docker compose up regenerates dist without a full DB reset

Service-Specific Issues

Commercial API -- "Cannot find module" or stale TypeScript errors

Symptom: The web container crashes with TypeScript/import errors referencing model/generated/ paths.

Cause: The generated model-api client is stale or missing. Codegen failed silently during build because model-api was down.

Fix:

cd cred-api-commercial
# Ensure model-api is running first, then:
docker compose run --rm --no-deps web ./generate-model-api-client.sh
docker compose up -d --build web

Commercial API -- Worker not running

Symptom: Background jobs (imports, CRM sync, scheduled tasks) do not execute. BullMQ shows queued but unprocessed jobs.

Cause: The worker is a separate Docker Compose service that may not have started or may have crashed.

Fix:

cd cred-api-commercial
docker compose ps worker
# If not running:
docker compose up -d worker
# Check logs:
docker compose logs -f worker

Commercial API -- OOM (Out of Memory) in web/worker container

Symptom: Container crashes with FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory.

Cause: Default Node.js heap is too small for TypeScript compilation and runtime.

Fix: Verify docker-compose.override.yml sets the heap limit:

services:
  web:
    environment:
      - NODE_OPTIONS=--max-old-space-size=4096
  worker:
    environment:
      - NODE_OPTIONS=--max-old-space-size=4096

Model API -- Redis connection errors

Symptom: Logs show Error: connect ECONNREFUSED 127.0.0.1:6379.

Cause: Inside Docker, model-api connects to redis://model_api_cache, not localhost. The Redis container name is model_api_cache.

Fix: Ensure the Redis container is running: cd cred-model-api && docker compose ps. If REDISCLOUD_URL in .env points to localhost, that is the host-side URL; the Docker container uses the compose service name internally.

Filter API -- auth tokens rejected against remote dev

Symptom: Filter-dependent features fail with 401/403 errors even though the rest of the federation works.

Cause: When filter-api is not started locally, the router routes to the remote dev filter-api. Local auth tokens (signed with your local JWT_SECRET) are not accepted by the remote service.

Fix:

  1. Start filter-api locally (remove --skip filter-api)
  2. Or accept that filter-dependent features will not work in the hybrid setup

Agent AI -- WebSocket auth hangs or times out

Symptom: iOS or web connects to ws://localhost:8080/ws but auth never succeeds.

Cause: Agent-ai cannot validate the JWT because the user does not exist in its database.

Fix: Ensure cred-agent-ai/.env has:

LOCAL_FEDERATION_MODE=true
COMMERCIAL_API_URL=http://host.docker.internal:8000
DATABASE_URL=postgres://cred@host.docker.internal:5432/cred_commercial
MCP_SERVER_URL=http://host.docker.internal:5000/mcp

All three targets (COMMERCIAL_API_URL, DATABASE_URL, and MCP_SERVER_URL) must point at local services for the auth fallback to activate.
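
A quick way to confirm the settings above is a small check like the one below. It is a sketch under one assumption: "local" is taken to mean the host.docker.internal hostname used in the values above. The function name agent_targets_local is hypothetical.

```shell
#!/usr/bin/env bash
# Verify that an agent-ai .env enables local federation mode and that the three
# target URLs point at host.docker.internal. Only key names are printed, because
# DATABASE_URL may embed credentials.
agent_targets_local() {
  local env_file="$1" key value rc=0
  if ! grep -q '^LOCAL_FEDERATION_MODE=true' "$env_file"; then
    echo "LOCAL_FEDERATION_MODE is not true"
    rc=1
  fi
  for key in COMMERCIAL_API_URL DATABASE_URL MCP_SERVER_URL; do
    value=$(sed -n "s/^${key}=//p" "$env_file" | head -n 1)
    case "$value" in
      *host.docker.internal*) ;;
      *) echo "${key} does not point at host.docker.internal"; rc=1 ;;
    esac
  done
  return "$rc"
}

# Usage:
#   agent_targets_local cred-agent-ai/.env && echo "agent-ai is fully local"
```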

Agent AI -- MCP tools fail for local users

Symptom: Agent responses that use MCP tools fail with tool call errors.

Cause: The MCP server is not running locally, or it points at the remote router instead of the local one. Local-only users do not exist in the remote system.

Fix:

  1. Verify the GraphQL MCP server is running: curl -s http://localhost:5000/mcp
  2. Check cred-mcp/graphql-mcp-server/.env:
    APOLLO_MCP_ENDPOINT=http://host.docker.internal:4000/graphql
    MCP_ALLOWED_HOST=host.docker.internal
    
  3. Verify agent-ai points to local MCP: MCP_SERVER_URL=http://host.docker.internal:5000/mcp

GraphQL MCP Server -- platform mismatch (Docker)

Symptom: MCP container fails to start on Apple Silicon Mac. Logs mention architecture mismatch or binary exec format error.

Cause: The Apollo MCP binary is x86_64 only. The Docker Compose file sets platform: linux/amd64 for Rosetta emulation, which requires Docker Desktop with Rosetta enabled.

Fix:

  1. Open Docker Desktop > Settings > General > "Use Rosetta for x86_64/amd64 emulation on Apple Silicon" -- enable it.
  2. Rebuild: cd cred-mcp/graphql-mcp-server && docker compose up -d --build

Database & Data Issues

FDW bootstrap fails -- "could not connect to server"

Symptom: bootstrap-local-sql-federation.sh fails with could not connect to server: Connection refused.

Cause: The local commercial PostgreSQL cannot reach the remote model-api database through the FDW.

Fix:

  1. Verify the model DB host is reachable from inside the Docker container (not just from your host).
  2. Check cred-model-api/.env for MODEL_API_DATABASE_URL.
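  3. Inspect the foreign server definition from inside the container. This shows the configured host and options; it does not itself open a connection to the remote database: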
    cd cred-api-commercial
    docker compose exec db psql -U cred -d cred_commercial -c "SELECT * FROM pg_foreign_server;"
    

FDW bootstrap fails -- "schema credentity does not exist" on remote

Symptom: IMPORT FOREIGN SCHEMA credentity fails because the remote model DB does not have the schema.

Cause: The connection string points at the wrong database.

Fix: Verify MODEL_API_DATABASE_URL points at the main model-api DB, not the prediction or generated DB.

Model API -- "relation X does not exist" (PgBouncer search_path -- COM-33540)

Known Issue: COM-33540

This is a persistent issue with the remote dev database connection. It affects all services connecting through PgBouncer.

Symptom: Model API queries intermittently fail with relation "X" does not exist for tables in public or credentity schemas. Non-deterministic -- may succeed on retry.

Cause: The remote dev database connects through PgBouncer at port 40431 in transaction mode. PgBouncer's server_reset_query = DISCARD ALL is configured but not executing (query/xact ratio = 1.001). Some backend connections are contaminated with search_path = pg_catalog from unknown prior clients. Without the reset, contamination persists indefinitely.

In PgBouncer transaction mode, every SQL statement issued outside an explicit transaction block is its own transaction and can be routed to a different backend. Standalone SET search_path commands (including Knex's built-in searchPath config) are therefore ineffective: the SET and the query that follows can hit different backends.

Fix: Set KNEX_SEARCH_PATH=public,credentity,pg_catalog in the model-api .env:

# In cred-model-api/.env:
KNEX_SEARCH_PATH=public,credentity,pg_catalog

This activates a _query override in model-api-db.ts that wraps standalone queries in BEGIN / SET LOCAL search_path / <query> / COMMIT, guaranteeing the SET and query hit the same backend via PgBouncer's transaction pinning.
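
To make the wrapping concrete, the sketch below prints the statement sequence for one query. It only illustrates the shape described above; the real override lives in model-api-db.ts, which is not shown here, and the function name wrap_with_search_path is hypothetical.

```shell
#!/usr/bin/env bash
# Print a query wrapped in a transaction that sets search_path first.
# SET LOCAL lasts only until COMMIT, and PgBouncer keeps one backend for the
# whole transaction, so the SET and the query run on the same connection.
wrap_with_search_path() {
  local search_path="$1" query="$2"
  # "${query%;}" drops one trailing semicolon so the output never contains ";;".
  printf 'BEGIN;\nSET LOCAL search_path TO %s;\n%s;\nCOMMIT;\n' \
    "$search_path" "${query%;}"
}

wrap_with_search_path "public,credentity,pg_catalog" 'SELECT 1;'
```

Running the last line prints four statements: BEGIN, the SET LOCAL, the query, and COMMIT.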

Permanent fix: COM-33540 -- add search_path to PgBouncer's track_extra_parameters or fix the non-executing server_reset_query on the postgres-model-api-dev-bouncer deployment.

DataDescription missing commercial entities (Contact, Account, etc.)

Symptom: Features that depend on credentity.DataDescription (e.g., contact import) fail or return incomplete data. The table only shows model-api entities (Person, Company) but no commercial entities.

Cause: The FDW only mirrors the model-api's raw table. In production, cred-commercial-dbt merges model + commercial entities. Without the credentity-bootstrap task, the local table is incomplete.

Fix: Run the credentity bootstrap:

cd cred-api-commercial
python3 ../../federation/repo-tools/bootstrap_credentity.py .

Or restart the federation stack -- ./fed start runs this automatically. The script is idempotent.

Import reads stale file data after DB reset

Symptom: After a --clean restart or DB reset, imports produce wrong data from a previous database's files.

Cause: The import system caches files at temp/file-{id} inside the commercial API container. These persist via Docker bind mount across DB recreations. When file IDs recycle, new imports read stale cached files.

Fix:

cd cred-api-commercial
docker compose exec -T web bash -c 'rm -rf /usr/src/app/temp/file-*'

The fed CLI runs this automatically after db-init.

CustomFieldRecord tables are empty locally

Symptom: CustomFieldRecord_p_450931 has zero rows. Custom field resolvers return null.

Cause: This is expected for a fresh local environment. Custom field records are populated by the waterfall processing pipeline. The local seed does not populate them.

Fix: No fix is required. For local testing, either use the web client to trigger the data source provisioning flow or accept that custom field data will be empty.

Migration errors during prep-db

Symptom: yarn prep-db or yarn migrate fails inside the web container.

Cause: A migration references a missing table/column, targets a newer schema version, or a previous migration left the DB inconsistent.

Fix:

cd cred-api-commercial
# Check migration status:
docker compose exec web yarn knex migrate:status
# Try rolling back and re-running:
docker compose exec web yarn rollback
docker compose exec web yarn migrate
# Nuclear option -- full reset:
docker compose exec web yarn reset-db

Cannot connect to remote database from host

Symptom: psql cannot reach the remote dev database directly.

Cause: GCP Cloud SQL databases are not publicly accessible.

Fix: Connect through the running container instead:

docker compose exec web yarn knex-repl

Auth & Token Issues

JWT_SECRET mismatch across services

Symptom: Requests succeed on commercial API but fail with UNAUTHENTICATED on model-api or filter-api.

Cause: The JWT HMAC key must be identical across services.

Service          Env Var Name
commercial-api   JWT_SECRET
model-api        COMMERCIAL_JWT_SECRET
filter-api       COMMERCIAL_JWT_SECRET
agent-ai         JWT_SECRET

Fix: Verify all services use the same value:

# Anchor on line start so only the exact key is matched:
grep '^JWT_SECRET=' cred-api-commercial/.env
grep '^COMMERCIAL_JWT_SECRET=' cred-model-api/.env

Tip

When using ./fed start, the CLI injects COMMERCIAL_JWT_SECRET automatically from commercial-api's JWT_SECRET. This issue mainly occurs when starting services manually.

Login returns a token but subsequent queries fail

Symptom: Login succeeds but GraphQL queries return UNAUTHENTICATED.

Cause: The token was issued with a different JWT_SECRET than the validating service expects, the token expired (7-day TTL), or the issuer claim is not recognized.

Fix:

  1. Decode the token at jwt.io and check the iss claim.
  2. Verify the secret matches between issuing and validating services.
  3. For local dev, use the seed credentials: admin@credinvestments.com / P@ssword01.

Auth server port conflicts with model-api

Symptom: Starting cred-auth causes model-api to fail.

Cause: cred-auth and model-api both default to port 3000, and cred-auth is not part of the federation stack.

Fix: Override the auth server port: PORT=3001 npm run start:dev

CORS errors in the browser

Symptom: Browser console shows Access-Control-Allow-Origin errors.

Cause: The router CORS config only allows specific origins: http://localhost:8002, http://localhost:3000, http://localhost:8082, http://localhost:9001, and https://studio.apollographql.com.

Fix:

  1. Run the web dev server on port 8002 (the default).
  2. If using a different port, add it to router.local-federation.yml and restart the router.

Docker Issues

Volume staleness -- stale node_modules or dist

Symptom: After updating dependencies or switching branches, the container has outdated modules. Build or runtime crashes with "Cannot find module".

Cause: Docker anonymous volumes persist across restarts. Old node_modules in the volume shadow the freshly installed ones.

Fix:

# Recommended: use the --clean flag
./fed start --clean

# Or manually for a specific service:
cd cred-api-commercial
docker compose --profile with_federation down -v
docker compose --profile with_federation up -d --build

Port conflicts -- "address already in use"

Symptom: docker compose up fails with Bind for 0.0.0.0:XXXX failed: port is already allocated.

Cause: Another process or container is using the port.

Fix:

# Find what is using the port:
lsof -i :8000
# Or check Docker:
docker ps --format '{{.Ports}} {{.Names}}' | grep 8000

Port allocation reference:

Port   Service
3000   Model API
4000   Apollo Router
5000   GraphQL MCP
5432   Commercial Postgres
6379   Commercial Redis
6380   Model Redis
6382   Filter Redis
8000   Commercial API
8002   Web Frontend
8080   Agent AI
8081   Filter API

Container will not start -- no error in compose output

Symptom: docker compose up -d succeeds, but docker compose ps shows the service as exited/restarting.

Cause: The container started but the application inside crashed.

Fix:

docker compose logs web
# Or follow in real time:
docker compose logs -f web

Docker Desktop not running

Symptom: Any Docker command fails with Cannot connect to the Docker daemon.

Fix: Start Docker Desktop: open -a Docker or launch from Applications.

host.docker.internal not resolving

Symptom: Services inside Docker cannot reach host-network services. getaddrinfo ENOTFOUND host.docker.internal.

Cause: host.docker.internal is provided by Docker Desktop. On Linux with plain Docker Engine it does not resolve unless it is mapped explicitly.

Fix: Ensure docker-compose.yml has extra_hosts: ["host.docker.internal:host-gateway"] on services that need to reach the host.


Code Generation Issues

Schema fetch fails during codegen

Symptom: generate-model-api-client.sh fails with connection errors.

Cause: The target service is not running or not reachable.

Fix:

  1. For local codegen, ensure model-api is running at http://localhost:3000/graphql.
  2. Test manually: curl -s http://localhost:3000/graphql -H "Content-Type: application/json" -d '{"query":"{ __typename }"}'

Stale generated types -- web

Symptom: Web TypeScript errors referencing GraphQL types that should exist or have wrong shapes.

Cause: Turbo caches the gql task output. Generated types are outdated.

Fix:

cd cred-web-commercial
# Force regenerate (bypass Turbo cache):
turbo run gql --force
# Or against local federation:
bash scripts/gql-local.sh
# Or delete Turbo cache:
rm -rf .turbo/
bun run gql:local

Stale generated types -- iOS

Symptom: Xcode build errors in generated GraphQL types -- missing types or wrong enum cases.

Cause: iOS codegen was run against a different schema version.

Fix:

cd cred-ios-commercial
./cred graphql sync local    # Against local router
./cred graphql sync dev      # Against remote dev

Codegen generates but types do not match runtime

Symptom: Generated types compile but queries fail at runtime with unexpected null fields.

Cause: Codegen schema was fetched from one environment but the app runs against another.

Fix: Always regenerate types against the same environment you are running:

  • Local: bun run gql:local (web) or ./cred graphql sync local (iOS)
  • Dev: bun run gql (web) or ./cred graphql sync dev (iOS)

Web codegen fails -- "Could not fetch schema"

Symptom: bun run gql:local fails because it cannot introspect the local gateway.

Cause: The local router is not running at http://localhost:4000/graphql.

Fix:

  1. Start the federation stack first: ./fed start
  2. Verify: curl -s http://localhost:4000/graphql -H "Content-Type: application/json" -d '{"query":"{ __typename }"}'

Network & Tunnel Issues

Webhooks not reaching local services

Symptom: External webhooks (Unipile/LinkedIn, Slack, Nylas) are not received.

Cause: External services cannot reach localhost.

Fix: Use a tunnel:

# cloudflared (preferred):
cloudflared tunnel --url http://localhost:8000

# ngrok (used by agent-ai for Slack OAuth):
ngrok http 8080 --domain=clearly-full-wasp.ngrok-free.app

ngrok tunnel conflicts

Symptom: Agent AI tries to start ngrok automatically but fails.

Cause: NGROK_ENABLED=true in agent-ai .env, and the domain is already in use.

Fix: Set NGROK_ENABLED=false in cred-agent-ai/.env. You only need ngrok for Slack OAuth testing.

iOS simulator cannot reach localhost

Symptom: iOS app on the simulator fails to connect to http://localhost:4000/graphql.

Cause: App Transport Security (ATS) blocks cleartext HTTP unless the DebugLocal scheme is used.

Fix:

  1. Build with the DebugLocal scheme (./cred build).
  2. Physical devices cannot use localhost -- they need the Mac's LAN IP.

iOS xcconfig URL truncation

Symptom: iOS app resolves BASE_URL as empty or truncated.

Cause: xcconfig files treat // as a comment, truncating http:// URLs.

Fix: Already handled: DebugLocal.xcconfig stores the URL without the scheme, and Environment.swift prepends http:// at runtime.
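
The truncation is easy to reproduce outside Xcode. The snippet below only imitates the comment rule described above; it is not xcconfig's real parser, the function name xcconfig_value is hypothetical, and localhost:4000/graphql is an assumed example value.

```shell
#!/usr/bin/env bash
# Imitate the xcconfig comment rule: drop everything from the first "//" onward.
xcconfig_value() {
  printf '%s\n' "${1%%//*}"
}

xcconfig_value "http://localhost:4000/graphql"   # prints "http:"
xcconfig_value "localhost:4000/graphql"          # prints "localhost:4000/graphql"
```

Storing the value without a scheme leaves nothing for the comment rule to cut, which is why the scheme is added back at runtime.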


Diagnostic Commands Quick Reference

Check service health

# Apollo Router
curl -s http://localhost:4000/graphql -H "Content-Type: application/json" \
  -d '{"query":"{ __typename }"}'

# Commercial API
curl -s http://localhost:8000/graphql -H "Content-Type: application/json" \
  -d '{"query":"{ __typename }"}'

# Model API
curl -s http://localhost:3000/graphql -H "Content-Type: application/json" \
  -d '{"query":"{ __typename }"}'

# Filter API
curl -s http://localhost:8081/graphql -H "Content-Type: application/json" \
  -d '{"query":"{ __typename }"}'

# Agent AI
curl -s http://localhost:8080/

# GraphQL MCP
curl -s http://localhost:5000/mcp
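
The four GraphQL checks above can be folded into one loop. This is a convenience sketch, not part of the fed CLI: the function names are hypothetical, and "healthy" is assumed to mean that the response contains a __typename field.

```shell
#!/usr/bin/env bash
# Succeeds when stdin contains a "__typename" field, which is what a healthy
# endpoint returns for the { __typename } query.
graphql_ok() {
  grep -q '"__typename"'
}

# Send the probe query to one endpoint and print a one-line verdict.
check_graphql() {
  local name="$1" url="$2"
  if curl -s --max-time 5 "$url" -H "Content-Type: application/json" \
       -d '{"query":"{ __typename }"}' | graphql_ok; then
    echo "OK   $name"
  else
    echo "FAIL $name ($url)"
  fi
}

# Usage (ports taken from the commands above):
#   check_graphql "Apollo Router"  http://localhost:4000/graphql
#   check_graphql "Commercial API" http://localhost:8000/graphql
#   check_graphql "Model API"      http://localhost:3000/graphql
#   check_graphql "Filter API"     http://localhost:8081/graphql
```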

View Docker service status

cd cred-api-commercial && docker compose --profile with_federation ps
cd cred-model-api && docker compose ps
cd cred-filter-api && docker compose ps
cd cred-agent-ai && docker compose ps
cd cred-mcp/graphql-mcp-server && docker compose ps

Database inspection

# Connect to local commercial DB:
cd cred-api-commercial
docker compose exec db psql -U cred -d cred_commercial

# Check if DB is bootstrapped:
docker compose exec db psql -U cred -d cred_commercial -c \
  "SELECT exists (SELECT 1 FROM pg_tables WHERE tablename = 'User');"

# Check FDW status:
docker compose exec db psql -U cred -d cred_commercial -c \
  "SELECT * FROM pg_foreign_server;"

# Check credentity bootstrap status:
docker compose exec db psql -U cred -d cred_commercial -tAc \
  "SELECT EXISTS(SELECT 1 FROM information_schema.columns WHERE table_schema = 'credentity' AND table_name = 'DataDescription' AND column_name = 'isImportable');"

Verify shared secrets alignment

# JWT_SECRET in commercial:
grep "^JWT_SECRET=" cred-api-commercial/.env | head -1

# COMMERCIAL_JWT_SECRET in model-api:
grep "^COMMERCIAL_JWT_SECRET=" cred-model-api/.env | head -1

# Model API token alignment:
grep "^CRED_MODEL_API_TOKEN=" cred-api-commercial/.env
grep "^API_TOKEN=" cred-model-api/.env

Environment File Checklist

If the stack is broken on a new machine, verify these files exist and are populated:

#  File                              Required Keys
1  cred-api-commercial/.env          JWT_SECRET, CRED_MODEL_API_TOKEN, DATABASE_URL, REDISCLOUD_URL
2  cred-model-api/.env               MODEL_API_DATABASE_URL, ES_CLOUD_ID, ES_API_KEY, API_TOKEN, KNEX_SEARCH_PATH
3  cred-mcp/graphql-mcp-server/.env  APOLLO_MCP_ENDPOINT, APOLLO_GRAPH_REF, APOLLO_KEY, MCP_ALLOWED_HOST
4  cred-agent-ai/.env                JWT_SECRET, ANTHROPIC_API_KEY, COMMERCIAL_API_URL, DATABASE_URL, MCP_SERVER_URL
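
The checklist can be run as a script. The sketch below is a hypothetical helper (missing_keys is not a repo command) that treats a key as present only when it has a non-empty value on a single line.

```shell
#!/usr/bin/env bash
# Print every required key that is absent or empty in a .env file.
# Returns 0 when all keys are present, 1 otherwise (including a missing file).
missing_keys() {
  local env_file="$1" key rc=0
  shift
  if [ ! -f "$env_file" ]; then
    echo "missing file: $env_file"
    return 1
  fi
  for key in "$@"; do
    # "^KEY=." requires at least one character after the equals sign.
    if ! grep -q "^${key}=." "$env_file"; then
      echo "$env_file: missing $key"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (keys copied from the table above):
#   missing_keys cred-api-commercial/.env JWT_SECRET CRED_MODEL_API_TOKEN DATABASE_URL REDISCLOUD_URL
#   missing_keys cred-agent-ai/.env JWT_SECRET ANTHROPIC_API_KEY COMMERCIAL_API_URL DATABASE_URL MCP_SERVER_URL
```

For cred-model-api/.env, keep in mind that ES_ENDPOINT + ES_USER + ES_PASSWORD is an accepted alternative to ES_CLOUD_ID and ES_API_KEY, so a report of those two keys as missing is not necessarily an error.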