LinkedIn Data Updates

Overview

When we receive new data from LinkedIn, two jobs propagate the updates throughout the system:

  1. LinkedIn Update Job - Rebuilds only the LinkedIn models
  2. Downstream Refresh Job - Updates dependent models that rely on LinkedIn data

LinkedIn Update Job

The LinkedIn update job rebuilds only the LinkedIn models with the latest data received.


Downstream Refresh Job

The downstream refresh job updates the main dependent models that rely on LinkedIn data.

Affected Models

Person Models:

  • PersonJob, PersonEducation, PersonLanguage, PersonSkill
  • PersonInterest, PersonCertification, PersonCategory
  • PersonAudienceSkill, PersonAddress, PersonFieldValue

Address Models:

  • all_addresses, address_chatgpt_components
  • all_addresses_distinct, all_addresses_distinct_components
  • all_addresses_matched_components, all_company_hq_addresses
  • address_regionid, address_sources_prebuild
  • company_office_location_normalized, CompanyAddress
  • person_education_address, ProfileAddress
  • address_prebuild, Address

Field Models:

  • LICompanyFields, LIPersonFields, CompanyFieldValue

Search Models:

  • PersonSearch, CompanySearch

Other:

  • CompanyFieldsPrebuild
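
The groups above can be kept as a single mapping so the list of affected models lives in one place. This is a minimal sketch, not the job's actual configuration: the model names are copied from the list above, while `AFFECTED_MODELS`, `all_affected_models`, and `group_of` are hypothetical names introduced only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical mapping; group keys are illustrative, model names come from the list above.
AFFECTED_MODELS: Dict[str, List[str]] = {
    "person": [
        "PersonJob", "PersonEducation", "PersonLanguage", "PersonSkill",
        "PersonInterest", "PersonCertification", "PersonCategory",
        "PersonAudienceSkill", "PersonAddress", "PersonFieldValue",
    ],
    "address": [
        "all_addresses", "address_chatgpt_components",
        "all_addresses_distinct", "all_addresses_distinct_components",
        "all_addresses_matched_components", "all_company_hq_addresses",
        "address_regionid", "address_sources_prebuild",
        "company_office_location_normalized", "CompanyAddress",
        "person_education_address", "ProfileAddress",
        "address_prebuild", "Address",
    ],
    "field": ["LICompanyFields", "LIPersonFields", "CompanyFieldValue"],
    "search": ["PersonSearch", "CompanySearch"],
    "other": ["CompanyFieldsPrebuild"],
}


def all_affected_models() -> List[str]:
    """Flatten every group into one list, keeping the documented order."""
    return [name for names in AFFECTED_MODELS.values() for name in names]


def group_of(model: str) -> Optional[str]:
    """Return the group a model belongs to, or None if it is not affected."""
    for group, names in AFFECTED_MODELS.items():
        if model in names:
            return group
    return None
```

A lookup such as `group_of("PersonSearch")` answers whether a given model is touched by the downstream refresh, which may help when deciding if a manual run is worth triggering.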

Workflow

When new LinkedIn data arrives:

1. Automatic: LinkedIn Update Job runs (Wednesdays)
        ↓
2. Then either:
     a. Optional: Trigger Downstream Refresh Job manually
     b. Alternative: Wait for weekly jobs to propagate

Weekly Jobs vs Manual Refresh

We already have weekly jobs that propagate these updates gradually across all models. The manual downstream refresh job is a shortcut: use it when we specifically want to refresh everything closely linked to LinkedIn data without waiting for the weekly jobs.
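
The choice between the two paths can be sketched as a small helper. This is an illustration under stated assumptions, not how the jobs are actually scheduled or triggered: it assumes "Wednesdays" means every Wednesday, it ignores the time of day the job runs, and `next_linkedin_update` and `propagation_path` are hypothetical names.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday() counts Monday as 0


def next_linkedin_update(today: date) -> date:
    """Return the next Wednesday on or after `today` (assumed run day)."""
    days_ahead = (WEDNESDAY - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


def propagation_path(need_downstream_now: bool) -> str:
    """Pick how LinkedIn changes reach the dependent models.

    The returned labels are illustrative, not real job identifiers.
    """
    if need_downstream_now:
        return "manual_downstream_refresh"
    return "weekly_jobs"
```

For example, `next_linkedin_update(date(2024, 1, 4))` gives the following Wednesday, and `propagation_path(False)` reflects the default of letting the weekly jobs do the work.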