Database Replication

This guide covers the logical replication setup between CRED's PostgreSQL databases and how to resynchronize tables when needed.

Overview

We use pglogical to replicate tables from cred-model (CloudSQL) to cred-model-api-dev.

Resynchronizing a Table

When a table gets out of sync or needs to be fully reloaded, you can trigger a resynchronization. This process truncates the target table and performs a fresh COPY from the source (cred-model CloudSQL).

1. Drop Table Indexes

Before resynchronizing, drop non-primary-key indexes on the target table to speed up the COPY operation. Run this on cred-model-api-dev.

Generate the DROP INDEX statements:

SELECT 'DROP INDEX IF EXISTS "' || indexname || '";'
FROM pg_indexes
WHERE tablename IN ('CompanyIdentifier', 'Person', 'PersonIdentifier_LINKEDIN')
  AND indexname NOT IN (
    SELECT conindid::regclass::text
    FROM pg_constraint
    WHERE contype = 'p'
      AND conrelid::regclass::text IN ('"CompanyIdentifier"')
  );

Review the output and execute the generated DROP INDEX statements.

Warning

Only drop non-primary-key indexes. The query above excludes primary key indexes via the pg_constraint subquery. Always review the generated statements before executing.

2. Trigger the Resynchronization

Run the following on cred-model-api-dev to resynchronize a table via the cred_tables pglogical subscription:

SELECT pglogical.alter_subscription_resynchronize_table(
  subscription_name := 'cred_tables',
  relation          := 'public."Person"'
);

This will truncate the table on the subscriber (cred-model-api-dev) and start a full COPY from the publisher (cred-model CloudSQL).

Tip

Replace 'public."Person"' with the fully-qualified name of whichever table you need to resync.

3. Index Recreation

After the COPY completes, indexes are recreated automatically by the scheduled schema sync script:

cred-postgres/scripts/schedule-schema-sync.sh

No manual action is required for index recreation.

Manual Sync (Incremental)

Instead of resynchronizing entire tables from scratch, you can force pglogical to re-replicate only recently changed rows. This is a lightweight approach that triggers replication by performing a no-op update on the updatedAt column, which pglogical detects as a change and replicates to cred-model-api-dev.

1. Generate the Update Queries

Run this on cred-model (CloudSQL). Set the date to the point from which you want to sync changes:

SELECT string_agg(format(
  $fmt$SELECT '%s' AS "table",
       min(date_trunc('day', "updatedAt")) AS updated_at,
       count(*) AS cnt,
       %L || '''' || min(date_trunc('day', "updatedAt"))::varchar || ''';' AS update_query
  FROM %s
  WHERE "updatedAt" >= '2026-03-20'
  GROUP BY 1$fmt$,
  set_reloid::varchar,
  format('UPDATE %s SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= ', set_reloid::varchar),
  set_reloid::varchar
), E'\nUNION ALL\n')
FROM pglogical.replication_set_table
ORDER BY 1;

Tip

Replace '2026-03-20' with the date from which you want to start syncing.

2. Review the Output

The query returns a result set showing each table, its earliest updatedAt, the row count, and the UPDATE statement to execute. Example output:

table	updated_at	cnt	update_query
`"Company"`	2026-03-20	1500	`UPDATE "Company" SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= '2026-03-20 00:00:00+00';`
`"SoccerPlayer"`	2026-03-20	800	`UPDATE "SoccerPlayer" SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= '2026-03-20 00:00:00+00';`

3. Execute the Update Statements

Copy the update_query values from the output and run them on cred-model (CloudSQL). For example:

UPDATE "Company" SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= '2026-03-20 00:00:00+00';
UPDATE "SoccerPlayer" SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= '2026-03-20 00:00:00+00';
UPDATE "SoccerTeam" SET "updatedAt" = "updatedAt" WHERE "updatedAt" >= '2026-03-26 00:00:00+00';
-- ... remaining tables

These no-op updates touch the updatedAt column without changing its value, which is enough to trigger pglogical to replicate those rows to cred-model-api-dev.

Warning

Review the row counts (cnt column) before executing. Large tables may take longer to process on the publisher side.

Tracking Replication Progress

To quickly check row counts on a table without running a full SELECT count(*) (which can be slow on large tables), use the PostgreSQL statistics estimate:

SELECT relname AS table_name,
       reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname IN ('Person', 'Company', 'CompanyIdentifier')
ORDER BY relname;

Run this on both cred-model (CloudSQL) and cred-model-api-dev to compare counts and verify replication is catching up.

Tip

reltuples is updated by ANALYZE and after bulk operations like COPY. It may lag slightly behind the actual count, but is fast enough to poll repeatedly during a long-running resync.

Database Backup Restoration — Point-in-time recovery via CloudSQL Clone
Connect to Cloud SQL — How to connect to CloudSQL instances
Data Architecture — Database structure and data management