Federation Troubleshooting
Practical solutions to real problems encountered during local federation development. Each entry follows: Symptom (what you see), Cause (why it happens), Fix (what to do).
Startup Failures
Model API fails to start -- Docker container exits immediately
Symptom: After docker compose up -d in cred-model-api, the container exits immediately. Logs show connection errors to PostgreSQL or Elasticsearch.
Cause: The .env has stale or unreachable database/ES endpoints. Manually assembled .env files often use older key names (MODEL_DATABASE_URL instead of MODEL_API_DATABASE_URL, ELASTICSEARCH_NODE instead of ES_ENDPOINT) or stale hostnames.
Fix:
- Verify the .env uses correct key names: MODEL_API_DATABASE_URL (not MODEL_DATABASE_URL), ES_ENDPOINT (not ELASTICSEARCH_NODE).
- Pull fresh values from Cloud Run: cd cred-model-api && ./source-cloudrun-variables.sh (requires gcloud).
- Confirm the DB host is reachable from your machine.
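A quick scan can flag the older key names before you restart anything. The helper below is a hypothetical sketch, not part of the repo tooling: the function name is an assumption, and the old-to-new pairs are simply the key names listed above.

```shell
# Hypothetical helper: report deprecated key names in a .env file.
# Prints one line per stale key and returns 1 if any were found.
find_stale_env_keys() {
  env_file=$1
  stale=0
  # old_name:new_name pairs, taken from the key names listed above
  for pair in MODEL_DATABASE_URL:MODEL_API_DATABASE_URL ELASTICSEARCH_NODE:ES_ENDPOINT; do
    old=${pair%%:*}
    new=${pair#*:}
    if grep -q "^${old}=" "$env_file"; then
      echo "stale key ${old}; rename to ${new}"
      stale=1
    fi
  done
  return "$stale"
}

# Usage (path assumes you are at the workspace root):
#   find_stale_env_keys cred-model-api/.env
```

A clean file produces no output and a zero exit status, so the function can also gate a manual startup script.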
Model API seed preflight fails -- "Unknown resource"
Symptom: Startup prints: Model API is up, but personById(2) cannot reach its backing search data.
Cause: Model API started but Elasticsearch is misconfigured or unreachable.
Fix: Check cred-model-api/.env for:
- ES_CLOUD_ID and ES_API_KEY (preferred), or ES_ENDPOINT + ES_USER + ES_PASSWORD
- ES_INDEX_ENV should be dev (indices are named <index>-dev)
- Verify ES connectivity: curl -s https://<ES_ENDPOINT> from your host
Model API seed preflight fails -- "UNAUTHENTICATED"
Symptom: Startup prints: Model API rejected the bootstrap query token.
Cause: CRED_MODEL_API_TOKEN in commercial's .env does not match API_TOKEN in model-api's .env.
Fix:
- Open cred-api-commercial/.env and find CRED_MODEL_API_TOKEN.
- Open cred-model-api/.env and find API_TOKEN.
- They must be the same value. Copy one to the other.
- Restart model-api: cd cred-model-api && docker compose down && docker compose up -d
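The comparison can be done without printing either secret. This is a minimal sketch under stated assumptions: both function names are hypothetical, values are assumed to sit on a single line, and quoting is assumed to be the same in both files.

```shell
# Hypothetical helper: print the value of the first KEY=value line in a file.
env_value() {
  sed -n "s/^$1=//p" "$2" | head -n 1
}

# Return 0 only when both keys are set and hold the same value.
# Nothing is echoed, so secrets stay out of the terminal scrollback.
env_values_match() {
  left=$(env_value "$1" "$2")
  right=$(env_value "$3" "$4")
  [ -n "$left" ] && [ "$left" = "$right" ]
}

# Usage from the workspace root:
#   env_values_match CRED_MODEL_API_TOKEN cred-api-commercial/.env \
#                    API_TOKEN cred-model-api/.env && echo "tokens match"
```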
Model API seed preflight fails -- personById returns null
Symptom: Model API is up, but personById(2) returned null.
Cause: The Elasticsearch index does not contain the seed person (ID 2). ES_INDEX_ENV points at an empty or wrong index suffix.
Fix: Verify ES_INDEX_ENV=dev in cred-model-api/.env.
Router composition fails -- schema mismatch
Symptom: Apollo Router fails to start. Logs show rover supergraph compose errors about conflicting type definitions.
Cause: Commercial API was built against a stale model-api schema. The generated model client in src/services/model/generated/ has outdated type definitions.
Fix:
- Ensure model-api is running and healthy at http://localhost:3000/graphql.
- Regenerate the commercial model client: cd cred-api-commercial && docker compose run --rm --no-deps web ./generate-model-api-client.sh
- Rebuild the commercial image: docker compose --profile with_federation up -d --build
Note: The fed CLI runs codegen automatically as a pre-task. This issue mainly occurs when starting services manually.
Router composition fails -- subgraph unreachable
Symptom: Router logs show connection refused to http://host.docker.internal:3000/graphql.
Cause: The model-api or filter-api container is not running, or host.docker.internal is not resolving.
Fix:
- Verify model-api is up: curl -s http://localhost:3000/graphql -H "Content-Type: application/json" -d '{"query":"{ __typename }"}'
- Verify Docker Desktop is running (it provides host.docker.internal).
- On Linux without Docker Desktop, add extra_hosts: ["host.docker.internal:host-gateway"] to the compose service.
Startup script fails -- "Missing cred-api-commercial/.env"
Symptom: Federation startup exits immediately with a missing .env error.
Cause: The commercial .env is gitignored and must be created manually.
Fix:
- Pull from Cloud Run: cd cred-api-commercial && ./source-cloudrun-variables.sh
- Or manually: gcloud run services describe cred-api-commercial-dev --region us-central1 --format 'yaml(spec.template.spec.containers[0].env)'
- Verify it contains at minimum: JWT_SECRET and CRED_MODEL_API_TOKEN
Startup script fails -- "Could not locate cred-model-api"
Symptom: Could not locate cred-model-api near /path/to/cred-api-commercial.
Cause: The workspace layout is non-standard and the tooling cannot find sibling repos.
Fix: Create .local-workspace.env at the workspace root:
MODEL_API_DIR=/absolute/path/to/cred-model-api
Startup script fails -- multiline PEM breaks shell parsing
Symptom: Cryptic shell errors like unexpected end of file when reading .env.
Cause: The commercial .env contains multiline PEM private keys. Manual source of the file breaks.
Fix: Never manually source cred-api-commercial/.env. The tooling uses dotenv-utils.sh to safely extract single-line values.
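To read one value by hand without sourcing the file, a line-oriented extractor is enough. The function below is a hypothetical stand-in written for illustration; it is not the contents of dotenv-utils.sh, whose real behavior may differ. It only handles single-line values: a multiline PEM entry comes back truncated to its first line, which is why it should be used for keys such as JWT_SECRET and not for the private keys themselves.

```shell
# Hypothetical single-line extractor: prints the value of the first
# KEY=value line and never evaluates the file as shell code.
dotenv_get() {
  key=$1
  file=$2
  sed -n "s/^${key}=//p" "$file" \
    | head -n 1 \
    | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/"   # drop surrounding quotes
}

# Usage:
#   dotenv_get JWT_SECRET cred-api-commercial/.env
```

Because the file is treated as text, PEM body lines that look like broken shell syntax are simply skipped.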
fed start preflight fails
Symptom: ./fed start exits before starting any services, showing preflight errors.
Cause: The fed CLI validates Docker, directories, .env files, required env keys, and compose overlay files before starting.
Fix: Read the specific error messages. Common fixes:
- Docker not running: open -a Docker
- Missing .env: Copy from reference or pull from Cloud Run
- Missing required env key: Add the key to the service's .env. Check fed.toml required_env_keys for the full list
- Directory not found: Ensure repos are cloned in the expected layout
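When starting services manually, the required-key part of the preflight can be approximated by hand. This is a sketch only: the function name is hypothetical, and the keys to pass are whatever fed.toml lists under required_env_keys; the fed CLI's own checks may be stricter.

```shell
# Hypothetical preflight check: verify that a .env file defines every key
# given as an argument. Prints each problem found; returns 1 if any.
require_env_keys() {
  env_file=$1
  shift
  missing=0
  if [ ! -f "$env_file" ]; then
    echo "missing file: $env_file"
    return 1
  fi
  for key in "$@"; do
    if ! grep -q "^${key}=" "$env_file"; then
      echo "missing key: $key in $env_file"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_env_keys cred-api-commercial/.env JWT_SECRET CRED_MODEL_API_TOKEN
```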
Startup takes too long -- prep-db rebuilds every time
Symptom: Every startup runs yarn prep-db (build + migrate + seed), adding 3-5 minutes.
Cause: The skip check looks for the User table in pg_tables. If anonymous Docker volumes lose the dist directory between restarts, the build step re-runs.
Fix:
- Use --clean only when you need a full reset (it removes volumes)
- The --build flag on docker compose up regenerates dist without a full DB reset
Service-Specific Issues
Commercial API -- "Cannot find module" or stale TypeScript errors
Symptom: The web container crashes with TypeScript/import errors referencing model/generated/ paths.
Cause: The generated model-api client is stale or missing. Codegen failed silently during build because model-api was down.
Fix:
cd cred-api-commercial
# Ensure model-api is running first, then:
docker compose run --rm --no-deps web ./generate-model-api-client.sh
docker compose up -d --build web
Commercial API -- Worker not running
Symptom: Background jobs (imports, CRM sync, scheduled tasks) do not execute. BullMQ shows queued but unprocessed jobs.
Cause: The worker is a separate Docker Compose service that may not have started or may have crashed.
Fix:
cd cred-api-commercial
docker compose ps worker
# If not running:
docker compose up -d worker
# Check logs:
docker compose logs -f worker
Commercial API -- OOM (Out of Memory) in web/worker container
Symptom: Container crashes with FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory.
Cause: Default Node.js heap is too small for TypeScript compilation and runtime.
Fix: Verify docker-compose.override.yml sets the heap limit:
services:
  web:
    environment:
      - NODE_OPTIONS=--max-old-space-size=4096
  worker:
    environment:
      - NODE_OPTIONS=--max-old-space-size=4096
Model API -- Redis connection errors
Symptom: Logs show Error: connect ECONNREFUSED 127.0.0.1:6379.
Cause: Inside Docker, model-api connects to redis://model_api_cache, not localhost. The Redis container name is model_api_cache.
Fix: Ensure the Redis container is running: cd cred-model-api && docker compose ps. If REDISCLOUD_URL in .env points to localhost, that is the host-side URL; the Docker container uses the compose service name internally.
Filter API -- auth tokens rejected against remote dev
Symptom: Filter-dependent features fail with 401/403 errors even though the rest of the federation works.
Cause: When filter-api is not started locally, the router routes to the remote dev filter-api. Local auth tokens (signed with your local JWT_SECRET) are not accepted by the remote service.
Fix:
- Start filter-api locally (remove --skip filter-api)
- Or accept that filter-dependent features will not work in the hybrid setup
Agent AI -- WebSocket auth hangs or times out
Symptom: iOS or web connects to ws://localhost:8080/ws but auth never succeeds.
Cause: Agent-ai cannot validate the JWT because the user does not exist in its database.
Fix: Ensure cred-agent-ai/.env has:
LOCAL_FEDERATION_MODE=true
COMMERCIAL_API_URL=http://host.docker.internal:8000
DATABASE_URL=postgres://cred@host.docker.internal:5432/cred_commercial
MCP_SERVER_URL=http://host.docker.internal:5000/mcp
All three targets (COMMERCIAL_API_URL, DATABASE_URL, and MCP_SERVER_URL) must point at local services for the auth fallback to activate.
Agent AI -- MCP tools fail for local users
Symptom: Agent responses that use MCP tools fail with tool call errors.
Cause: The MCP server is not running locally, or it points at the remote router instead of the local one. Local-only users do not exist in the remote system.
Fix:
- Verify the GraphQL MCP server is running: curl -s http://localhost:5000/mcp
- Check cred-mcp/graphql-mcp-server/.env:
  APOLLO_MCP_ENDPOINT=http://host.docker.internal:4000/graphql
  MCP_ALLOWED_HOST=host.docker.internal
- Verify agent-ai points to local MCP: MCP_SERVER_URL=http://host.docker.internal:5000/mcp
GraphQL MCP Server -- platform mismatch (Docker)
Symptom: MCP container fails to start on Apple Silicon Mac. Logs mention architecture mismatch or binary exec format error.
Cause: The Apollo MCP binary is x86_64 only. The Docker Compose file sets platform: linux/amd64 for Rosetta emulation, which requires Docker Desktop with Rosetta enabled.
Fix:
- Open Docker Desktop > Settings > General > "Use Rosetta for x86_64/amd64 emulation on Apple Silicon" -- enable it.
- Rebuild: cd cred-mcp/graphql-mcp-server && docker compose up -d --build
Database & Data Issues
FDW bootstrap fails -- "could not connect to server"
Symptom: bootstrap-local-sql-federation.sh fails with could not connect to server: Connection refused.
Cause: The local commercial PostgreSQL cannot reach the remote model-api database through the FDW.
Fix:
- Verify the model DB host is reachable from inside the Docker container (not just from your host).
- Check cred-model-api/.env for MODEL_API_DATABASE_URL.
- Test from inside the container: cd cred-api-commercial && docker compose exec db psql -U cred -d cred_commercial -c "SELECT * FROM pg_foreign_server;"
FDW bootstrap fails -- "schema credentity does not exist" on remote
Symptom: IMPORT FOREIGN SCHEMA credentity fails because the remote model DB does not have the schema.
Cause: The connection string points at the wrong database.
Fix: Verify MODEL_API_DATABASE_URL points at the main model-api DB, not the prediction or generated DB.
Model API -- "relation X does not exist" (PgBouncer search_path -- COM-33540)
Known Issue: COM-33540
This is a persistent issue with the remote dev database connection. It affects all services connecting through PgBouncer.
Symptom: Model API queries intermittently fail with relation "X" does not exist for tables in public or credentity schemas. Non-deterministic -- may succeed on retry.
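Cause: The remote dev database connects through PgBouncer at port 40431 in transaction mode. PgBouncer's server_reset_query = DISCARD ALL is configured but not executing (query/xact ratio = 1.001). By default PgBouncer applies server_reset_query only in session pooling; in transaction pooling it runs only when server_reset_query_always is enabled, which may be why it never fires here. Some backend connections are contaminated with search_path = pg_catalog from unknown prior clients. Without the reset, contamination persists indefinitely.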
In PgBouncer transaction mode, every standalone SQL statement (one issued outside an explicit BEGIN ... COMMIT block) runs as its own transaction and can be routed to a different backend. Standalone SET search_path commands (including Knex's built-in searchPath config) are therefore ineffective: the SET and the query can hit different backends.
Fix: Set KNEX_SEARCH_PATH=public,credentity,pg_catalog in the model-api .env:
# In cred-model-api/.env:
KNEX_SEARCH_PATH=public,credentity,pg_catalog
This activates a _query override in model-api-db.ts that wraps standalone queries in BEGIN / SET LOCAL search_path / <query> / COMMIT, guaranteeing the SET and query hit the same backend via PgBouncer's transaction pinning.
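The shape of the wrapped statement sequence can be sketched as follows. This is an illustration of the sequence described above, not the code in model-api-db.ts: the shell function name is hypothetical and it only prints SQL, it does not execute anything.

```shell
# Hypothetical illustration: emit the statements that keep SET LOCAL and the
# query inside one transaction, so PgBouncer pins them to the same backend.
wrap_with_search_path() {
  search_path=$1
  query=$2
  printf 'BEGIN;\nSET LOCAL search_path = %s;\n%s;\nCOMMIT;\n' "$search_path" "$query"
}

# Usage:
#   wrap_with_search_path "public,credentity,pg_catalog" \
#     'SELECT * FROM "DataDescription" LIMIT 1'
```

SET LOCAL lasts only until the enclosing transaction ends, so the override does not leak into whichever client is handed that backend next.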
Permanent fix: COM-33540 -- add search_path to PgBouncer's track_extra_parameters or fix the non-executing server_reset_query on the postgres-model-api-dev-bouncer deployment.
DataDescription missing commercial entities (Contact, Account, etc.)
Symptom: Features that depend on credentity.DataDescription (e.g., contact import) fail or return incomplete data. The table only shows model-api entities (Person, Company) but no commercial entities.
Cause: The FDW only mirrors the model-api's raw table. In production, cred-commercial-dbt merges model + commercial entities. Without the credentity-bootstrap task, the local table is incomplete.
Fix: Run the credentity bootstrap:
cd cred-api-commercial
python3 ../../federation/repo-tools/bootstrap_credentity.py .
Or restart the federation stack -- ./fed start runs this automatically. The script is idempotent.
Import reads stale file data after DB reset
Symptom: After a --clean restart or DB reset, imports produce wrong data from a previous database's files.
Cause: The import system caches files at temp/file-{id} inside the commercial API container. These persist via Docker bind mount across DB recreations. When file IDs recycle, new imports read stale cached files.
Fix:
cd cred-api-commercial
docker compose exec -T web bash -c 'rm -rf /usr/src/app/temp/file-*'
The fed CLI runs this automatically after db-init.
CustomFieldRecord tables are empty locally
Symptom: CustomFieldRecord_p_450931 has zero rows. Custom field resolvers return null.
Cause: This is expected for a fresh local environment. Custom field records are populated by the waterfall processing pipeline. The local seed does not populate them.
Fix: No fix is needed. For local testing, either use the web client to trigger the data source provisioning flow or accept that custom field data will be empty.
Migration errors during prep-db
Symptom: yarn prep-db or yarn migrate fails inside the web container.
Cause: A migration references a missing table/column, targets a newer schema version, or a previous migration left the DB inconsistent.
Fix:
cd cred-api-commercial
# Check migration status:
docker compose exec web yarn knex migrate:status
# Try rolling back and re-running:
docker compose exec web yarn rollback
docker compose exec web yarn migrate
# Nuclear option -- full reset:
docker compose exec web yarn reset-db
Cannot connect to remote database from host
Symptom: psql cannot reach the remote dev database directly.
Cause: GCP Cloud SQL databases are not publicly accessible.
Fix: Connect through the running container instead:
docker compose exec web yarn knex-repl
Auth & Token Issues
JWT_SECRET mismatch across services
Symptom: Requests succeed on commercial API but fail with UNAUTHENTICATED on model-api or filter-api.
Cause: The JWT HMAC key must be identical across services, but each service reads it from its own environment variable:
| Service | Env Var Name |
|---|---|
| commercial-api | JWT_SECRET |
| model-api | COMMERCIAL_JWT_SECRET |
| filter-api | COMMERCIAL_JWT_SECRET |
| agent-ai | JWT_SECRET |
Fix: Verify all services use the same value:
grep JWT_SECRET cred-api-commercial/.env
grep COMMERCIAL_JWT_SECRET cred-model-api/.env
Tip: When using ./fed start, the CLI injects COMMERCIAL_JWT_SECRET automatically from commercial-api's JWT_SECRET. This issue mainly occurs when starting services manually.
Login returns a token but subsequent queries fail
Symptom: Login succeeds but GraphQL queries return UNAUTHENTICATED.
Cause: The token was issued with a different JWT_SECRET than the validating service expects, the token expired (7-day TTL), or the issuer claim is not recognized.
Fix:
- Decode the token at jwt.io and check the iss claim.
- Verify the secret matches between issuing and validating services.
- For local dev, use the seed credentials: admin@credinvestments.com / P@ssword01.
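If you would rather not paste a token into a website, the payload can also be decoded locally. The sketch below assumes a standard three-part JWT and a base64 command that accepts --decode (GNU coreutils and macOS both do); the function name is hypothetical. It only decodes the claims so you can read iss and exp, it does not verify the signature.

```shell
# Hypothetical helper: print the JSON payload (second segment) of a JWT.
jwt_payload() {
  # Take the middle segment and map base64url characters back to base64.
  jwt_seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that JWTs strip.
  case $(( ${#jwt_seg} % 4 )) in
    2) jwt_seg="${jwt_seg}==" ;;
    3) jwt_seg="${jwt_seg}=" ;;
  esac
  printf '%s' "$jwt_seg" | base64 --decode
}

# Usage:
#   jwt_payload "$TOKEN"
```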
Auth server port conflicts with model-api
Symptom: Starting cred-auth causes model-api to fail. Both default to port 3000.
Cause: cred-auth is not part of the federation stack.
Fix: Override the auth server port: PORT=3001 npm run start:dev
CORS errors in the browser
Symptom: Browser console shows Access-Control-Allow-Origin errors.
Cause: The router CORS config only allows specific origins: http://localhost:8002, http://localhost:3000, http://localhost:8082, http://localhost:9001, and https://studio.apollographql.com.
Fix:
- Run the web dev server on port 8002 (the default).
- If using a different port, add it to router.local-federation.yml and restart the router.
Docker Issues
Volume staleness -- stale node_modules or dist
Symptom: After updating dependencies or switching branches, the container has outdated modules. Build or runtime crashes with "Cannot find module".
Cause: Docker anonymous volumes persist across restarts. Old node_modules in the volume shadow freshly installed ones.
Fix:
# Recommended: use the --clean flag
./fed start --clean
# Or manually for a specific service:
cd cred-api-commercial
docker compose --profile with_federation down -v
docker compose --profile with_federation up -d --build
Port conflicts -- "address already in use"
Symptom: docker compose up fails with Bind for 0.0.0.0:XXXX failed: port is already allocated.
Cause: Another process or container is using the port.
Fix:
# Find what is using the port:
lsof -i :8000
# Or check Docker:
docker ps --format '{{.Ports}} {{.Names}}' | grep 8000
Port allocation reference:
| Port | Service |
|---|---|
| 3000 | Model API |
| 4000 | Apollo Router |
| 5000 | GraphQL MCP |
| 5432 | Commercial Postgres |
| 6379 | Commercial Redis |
| 6380 | Model Redis |
| 6382 | Filter Redis |
| 8000 | Commercial API |
| 8002 | Web Frontend |
| 8080 | Agent AI |
| 8081 | Filter API |
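To check every port in the table in one pass, the lookup can be looped. This is a rough sketch: both function names are hypothetical, lsof is assumed to be installed, and listeners owned by another user (for example a root-owned Docker proxy on Linux) may not be visible without sudo.

```shell
# Port/service pairs copied from the table above.
federation_ports() {
  cat <<'EOF'
3000 Model API
4000 Apollo Router
5000 GraphQL MCP
5432 Commercial Postgres
6379 Commercial Redis
6380 Model Redis
6382 Filter Redis
8000 Commercial API
8002 Web Frontend
8080 Agent AI
8081 Filter API
EOF
}

# Print every federation port that already has a TCP listener.
# Ports without a listener produce no output.
busy_federation_ports() {
  federation_ports | while read -r port service; do
    if lsof -nP -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1; then
      echo "$port ($service) is in use"
    fi
  done
}

# Usage, before starting the stack:
#   busy_federation_ports
```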
Container will not start -- no error in compose output
Symptom: docker compose up -d succeeds, but docker compose ps shows the service as exited/restarting.
Cause: The container started but the application inside crashed.
Fix:
docker compose logs web
# Or follow in real time:
docker compose logs -f web
Docker Desktop not running
Symptom: Any Docker command fails with Cannot connect to the Docker daemon.
Fix: Start Docker Desktop: open -a Docker or launch from Applications.
host.docker.internal not resolving
Symptom: Services inside Docker cannot reach host-network services. getaddrinfo ENOTFOUND host.docker.internal.
Cause: host.docker.internal is provided by Docker Desktop. It may not be available on Linux.
Fix: Ensure docker-compose.yml has extra_hosts: ["host.docker.internal:host-gateway"] on services that need to reach the host.
Code Generation Issues
Schema fetch fails during codegen
Symptom: generate-model-api-client.sh fails with connection errors.
Cause: The target service is not running or not reachable.
Fix:
- For local codegen, ensure model-api is running at http://localhost:3000/graphql.
- Test manually: curl -s http://localhost:3000/graphql -H "Content-Type: application/json" -d '{"query":"{ __typename }"}'
Stale generated types -- web
Symptom: Web TypeScript errors referencing GraphQL types that should exist or have wrong shapes.
Cause: Turbo caches the gql task output. Generated types are outdated.
Fix:
cd cred-web-commercial
# Force regenerate (bypass Turbo cache):
turbo run gql --force
# Or against local federation:
bash scripts/gql-local.sh
# Or delete Turbo cache:
rm -rf .turbo/
bun run gql:local
Stale generated types -- iOS
Symptom: Xcode build errors in generated GraphQL types -- missing types or wrong enum cases.
Cause: iOS codegen was run against a different schema version.
Fix:
cd cred-ios-commercial
./cred graphql sync local # Against local router
./cred graphql sync dev # Against remote dev
Codegen generates but types do not match runtime
Symptom: Generated types compile but queries fail at runtime with unexpected null fields.
Cause: Codegen schema was fetched from one environment but the app runs against another.
Fix: Always regenerate types against the same environment you are running:
- Local: bun run gql:local (web) or ./cred graphql sync local (iOS)
- Dev: bun run gql (web) or ./cred graphql sync dev (iOS)
Web codegen fails -- "Could not fetch schema"
Symptom: bun run gql:local fails because it cannot introspect the local gateway.
Cause: The local router is not running at http://localhost:4000/graphql.
Fix:
- Start the federation stack first: ./fed start
- Verify: curl -s http://localhost:4000/graphql -H "Content-Type: application/json" -d '{"query":"{ __typename }"}'
Network & Tunnel Issues
Webhooks not reaching local services
Symptom: External webhooks (Unipile/LinkedIn, Slack, Nylas) are not received.
Cause: External services cannot reach localhost.
Fix: Use a tunnel:
# cloudflared (preferred):
cloudflared tunnel --url http://localhost:8000
# ngrok (used by agent-ai for Slack OAuth):
ngrok http 8080 --domain=clearly-full-wasp.ngrok-free.app
ngrok tunnel conflicts
Symptom: Agent AI tries to start ngrok automatically but fails.
Cause: NGROK_ENABLED=true in agent-ai .env, and the domain is already in use.
Fix: Set NGROK_ENABLED=false in cred-agent-ai/.env. You only need ngrok for Slack OAuth testing.
iOS simulator cannot reach localhost
Symptom: iOS app on the simulator fails to connect to http://localhost:4000/graphql.
Cause: App Transport Security (ATS) blocks cleartext HTTP unless the DebugLocal scheme is used.
Fix:
- Build with the DebugLocal scheme (./cred build).
- Physical devices cannot use localhost -- they need the Mac's LAN IP.
iOS xcconfig URL truncation
Symptom: iOS app resolves BASE_URL as empty or truncated.
Cause: xcconfig files treat // as a comment, truncating http:// URLs.
Fix: Already handled: DebugLocal.xcconfig stores the URL without the scheme, and Environment.swift prepends http:// at runtime.
Diagnostic Commands Quick Reference
Check service health
# Apollo Router
curl -s http://localhost:4000/graphql -H "Content-Type: application/json" \
-d '{"query":"{ __typename }"}'
# Commercial API
curl -s http://localhost:8000/graphql -H "Content-Type: application/json" \
-d '{"query":"{ __typename }"}'
# Model API
curl -s http://localhost:3000/graphql -H "Content-Type: application/json" \
-d '{"query":"{ __typename }"}'
# Filter API
curl -s http://localhost:8081/graphql -H "Content-Type: application/json" \
-d '{"query":"{ __typename }"}'
# Agent AI
curl -s http://localhost:8080/
# GraphQL MCP
curl -s http://localhost:5000/mcp
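The four GraphQL probes above can be folded into one loop that prints a status line per service. A minimal sketch, assuming curl is installed and the default ports from this guide; the function names and the short service labels are hypothetical.

```shell
# Return 0 when the GraphQL endpoint answers the { __typename } probe.
# -f makes curl fail on HTTP errors; --max-time keeps a hung service from blocking.
graphql_up() {
  curl -sf --max-time 5 "$1" \
    -H "Content-Type: application/json" \
    -d '{"query":"{ __typename }"}' >/dev/null 2>&1
}

# Probe the four GraphQL services and print one status line each.
check_graphql_services() {
  for entry in router=4000 commercial=8000 model=3000 filter=8081; do
    name=${entry%%=*}
    port=${entry#*=}
    if graphql_up "http://localhost:${port}/graphql"; then
      echo "$name: up"
    else
      echo "$name: DOWN"
    fi
  done
}

# Usage:
#   check_graphql_services
```

Agent AI and the GraphQL MCP server are left out because they are not GraphQL endpoints; probe them with the plain curl commands above.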
View Docker service status
cd cred-api-commercial && docker compose --profile with_federation ps
cd cred-model-api && docker compose ps
cd cred-filter-api && docker compose ps
cd cred-agent-ai && docker compose ps
cd cred-mcp/graphql-mcp-server && docker compose ps
Database inspection
# Connect to local commercial DB:
cd cred-api-commercial
docker compose exec db psql -U cred -d cred_commercial
# Check if DB is bootstrapped:
docker compose exec db psql -U cred -d cred_commercial -c \
"SELECT exists (SELECT 1 FROM pg_tables WHERE tablename = 'User');"
# Check FDW status:
docker compose exec db psql -U cred -d cred_commercial -c \
"SELECT * FROM pg_foreign_server;"
# Check credentity bootstrap status:
docker compose exec db psql -U cred -d cred_commercial -tAc \
"SELECT EXISTS(SELECT 1 FROM information_schema.columns WHERE table_schema = 'credentity' AND table_name = 'DataDescription' AND column_name = 'isImportable');"
Verify shared secrets alignment
# JWT_SECRET in commercial:
grep "^JWT_SECRET=" cred-api-commercial/.env | head -1
# COMMERCIAL_JWT_SECRET in model-api:
grep "^COMMERCIAL_JWT_SECRET=" cred-model-api/.env | head -1
# Model API token alignment:
grep "^CRED_MODEL_API_TOKEN=" cred-api-commercial/.env
grep "^API_TOKEN=" cred-model-api/.env
Environment File Checklist
If the stack is broken on a new machine, verify these files exist and are populated:
| # | File | Required Keys |
|---|---|---|
| 1 | cred-api-commercial/.env | JWT_SECRET, CRED_MODEL_API_TOKEN, DATABASE_URL, REDISCLOUD_URL |
| 2 | cred-model-api/.env | MODEL_API_DATABASE_URL, ES_CLOUD_ID, ES_API_KEY, API_TOKEN, KNEX_SEARCH_PATH |
| 3 | cred-mcp/graphql-mcp-server/.env | APOLLO_MCP_ENDPOINT, APOLLO_GRAPH_REF, APOLLO_KEY, MCP_ALLOWED_HOST |
| 4 | cred-agent-ai/.env | JWT_SECRET, ANTHROPIC_API_KEY, COMMERCIAL_API_URL, DATABASE_URL, MCP_SERVER_URL |