Database Replication
This guide covers the logical replication setup between CRED's PostgreSQL databases and how to resynchronize tables when needed.
Overview
We use pglogical to replicate tables from cred-model (CloudSQL) to cred-model-api-dev.
Resynchronizing a Table
When a table gets out of sync or needs to be fully reloaded, you can trigger a resynchronization. This process truncates the target table and performs a fresh COPY from the source (cred-model CloudSQL).
1. Drop Table Indexes
Before resynchronizing, drop non-primary-key indexes on the target table to speed up the COPY operation. Run this on cred-model-api-dev.
Generate the DROP INDEX statements:
SELECT 'DROP INDEX IF EXISTS "' || indexname || '";'
FROM pg_indexes
WHERE tablename IN ('CompanyIdentifier', 'Person', 'PersonIdentifier_LINKEDIN')
AND indexname NOT IN (
SELECT conindid::regclass::text
FROM pg_constraint
WHERE contype = 'p'
AND conrelid::regclass::text IN ('"CompanyIdentifier"')
);
Review the output and execute the generated DROP INDEX statements.
Warning
Only drop non-primary-key indexes. The query above excludes primary key indexes via the pg_constraint subquery. Always review the generated statements before executing.
2. Trigger the Resynchronization
Run the following on cred-model-api-dev to resynchronize a table via the cred_tables pglogical subscription:
SELECT pglogical.alter_subscription_resynchronize_table(
subscription_name := 'cred_tables',
relation := 'public."Person"'
);
This will truncate the table on the subscriber (cred-model-api-dev) and start a full COPY from the publisher (cred-model CloudSQL).
Tip
Replace 'public."Person"' with the fully-qualified name of whichever table you need to resync.
3. Index Recreation
After the COPY completes, indexes are recreated automatically by the scheduled schema sync script:
cred-postgres/scripts/schedule-schema-sync.sh
No manual action is required for index recreation.
Manual Sync (Incremental)
Instead of resynchronizing entire tables from scratch, you can force pglogical to re-replicate only recently changed rows. This is a lightweight approach that triggers replication by performing a no-op update on the updatedAt column, which pglogical detects as a change and replicates to cred-model-api-dev.
1. Generate the Update Queries
Run this on cred-model (CloudSQL). Set the date to the point from which you want to sync changes:
SELECT string_agg(format(
$fmt$SELECT '%s' AS "table",
min(date_trunc('day', "updatedAt")) AS updated_at,
count(*) AS cnt,
%L || '''' || min(date_trunc('day', "updatedAt"))::varchar || ''';' AS update_query
FROM %s
WHERE "updatedAt" >= '2026-03-20'
GROUP BY 1$fmt$,
set_reloid::varchar,
format('UPDATE %s SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= ', set_reloid::varchar),
set_reloid::varchar
), E'\nUNION ALL\n')
FROM pglogical.replication_set_table
ORDER BY 1;
Tip
Replace '2026-03-20' with the date from which you want to start syncing.
2. Review the Output
The query returns a result set showing each table, its earliest updatedAt, the row count, and the UPDATE statement to execute. Example output:
| table | updated_at | cnt | update_query |
|---|---|---|---|
"Company" |
2026-03-20 | 1500 | UPDATE "Company" SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= '2026-03-20 00:00:00+00'; |
"SoccerPlayer" |
2026-03-20 | 800 | UPDATE "SoccerPlayer" SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= '2026-03-20 00:00:00+00'; |
3. Execute the Update Statements
Copy the update_query values from the output and run them on cred-model (CloudSQL). For example:
UPDATE "Company" SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= '2026-03-20 00:00:00+00';
UPDATE "SoccerPlayer" SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= '2026-03-20 00:00:00+00';
UPDATE "SoccerTeam" SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= '2026-03-26 00:00:00+00';
-- ... remaining tables
These no-op updates touch the updatedAt column without changing its value, which is enough to trigger pglogical to replicate those rows to cred-model-api-dev.
Warning
Review the row counts (cnt column) before executing. Large tables may take longer to process on the publisher side.
Tracking Replication Progress
To quickly check row counts on a table without running a full SELECT count(*) (which can be slow on large tables), use the PostgreSQL statistics estimate:
SELECT relname AS table_name,
reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname IN ('Person', 'Company', 'CompanyIdentifier')
ORDER BY relname;
Run this on both cred-model (CloudSQL) and cred-model-api-dev to compare counts and verify replication is catching up.
Tip
reltuples is updated by ANALYZE and after bulk operations like COPY. It may lag slightly behind the actual count, but is fast enough to poll repeatedly during a long-running resync.
Related Documentation
- Database Backup Restoration — Point-in-time recovery via CloudSQL Clone
- Connect to Cloud SQL — How to connect to CloudSQL instances
- Data Architecture — Database structure and data management