LinkedIn Data Updates
Overview
When we receive new data from LinkedIn, two jobs propagate the updates through the system. The first runs automatically; the second is triggered manually when needed:
- LinkedIn Update Job - Rebuilds only the LinkedIn models
- Downstream Refresh Job - Updates dependent models that rely on LinkedIn data
LinkedIn Update Job
The LinkedIn update job rebuilds only the LinkedIn models with the latest data received.
- Schedule: Runs every Wednesday (automated)
- Link: LinkedIn Update Job
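To check when the next automated run is due, the Wednesday schedule can be sketched in a few lines of Python. This is only an illustration: the function name `next_update_run` is hypothetical, and because the job's time of day is not stated here, a Wednesday is assumed to count as its own run day.

```python
from datetime import date, timedelta

# date.weekday() numbers Monday as 0, so Wednesday is 2.
WEDNESDAY = 2


def next_update_run(today: date) -> date:
    """Return the next Wednesday on or after `today`.

    Assumption: the run time of day is unknown, so a Wednesday
    is treated as its own run day.
    """
    days_ahead = (WEDNESDAY - today.weekday()) % 7
    return today + timedelta(days=days_ahead)
```

For example, calling `next_update_run(date.today())` returns the date of the upcoming scheduled rebuild of the LinkedIn models.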
Downstream Refresh Job
The downstream refresh job updates the main dependent models that rely on LinkedIn data.
- Trigger: Manual
- Link: Downstream Refresh Job
Affected Models
Person Models:
- PersonJob, PersonEducation, PersonLanguage, PersonSkill
- PersonInterest, PersonCertification, PersonCategory
- PersonAudienceSkill, PersonAddress, PersonFieldValue
Address Models:
- all_addresses, address_chatgpt_components
- all_addresses_distinct, all_addresses_distinct_components
- all_addresses_matched_components, all_company_hq_addresses
- address_regionid, address_sources_prebuild
- company_office_location_normalized, CompanyAddress
- person_education_address, ProfileAddress
- address_prebuild, Address
Field Models:
- LICompanyFields, LIPersonFields, CompanyFieldValue
Search Models:
- PersonSearch, CompanySearch
Other:
- CompanyFieldsPrebuild
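To check quickly whether a given model is covered by the Downstream Refresh Job, the list above can be kept as a lookup table. The sketch below is illustrative only: the model names are copied from the list, while the group keys, `AFFECTED_MODELS`, and `affected_group` are hypothetical names and not part of the job's configuration.

```python
from typing import Optional

# Model names copied from the Affected Models list; the group keys are illustrative.
AFFECTED_MODELS = {
    "person": [
        "PersonJob", "PersonEducation", "PersonLanguage", "PersonSkill",
        "PersonInterest", "PersonCertification", "PersonCategory",
        "PersonAudienceSkill", "PersonAddress", "PersonFieldValue",
    ],
    "address": [
        "all_addresses", "address_chatgpt_components",
        "all_addresses_distinct", "all_addresses_distinct_components",
        "all_addresses_matched_components", "all_company_hq_addresses",
        "address_regionid", "address_sources_prebuild",
        "company_office_location_normalized", "CompanyAddress",
        "person_education_address", "ProfileAddress",
        "address_prebuild", "Address",
    ],
    "field": ["LICompanyFields", "LIPersonFields", "CompanyFieldValue"],
    "search": ["PersonSearch", "CompanySearch"],
    "other": ["CompanyFieldsPrebuild"],
}


def affected_group(model_name: str) -> Optional[str]:
    """Return the group a model belongs to, or None if the refresh does not cover it."""
    for group, models in AFFECTED_MODELS.items():
        if model_name in models:
            return group
    return None
```

A model that returns `None` here is not refreshed by the manual job and relies on the weekly jobs instead.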
Workflow
When new LinkedIn data arrives:
1. Automatic: LinkedIn Update Job runs (Wednesdays)
↓
2. Optional: Trigger the Downstream Refresh Job manually to propagate the updates right away
or
3. Alternative: Wait for the weekly jobs to propagate the updates
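The choice made after the LinkedIn Update Job has run can be sketched as a small decision function. This is a minimal illustration, not an existing tool: `Propagation`, `choose_propagation`, and the `needs_immediate_refresh` flag are hypothetical names, and the decision itself stays with whoever is handling the update.

```python
from enum import Enum


class Propagation(Enum):
    """The two ways updates reach dependent models after the LinkedIn Update Job."""
    MANUAL_REFRESH = "Trigger the Downstream Refresh Job manually"
    WEEKLY_JOBS = "Wait for the weekly jobs to propagate"


def choose_propagation(needs_immediate_refresh: bool) -> Propagation:
    """Pick the manual shortcut only when the dependent models must be fresh right away."""
    if needs_immediate_refresh:
        return Propagation.MANUAL_REFRESH
    return Propagation.WEEKLY_JOBS
```

Both paths end with the same models updated; they differ only in how soon the dependent models reflect the new LinkedIn data.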
Weekly Jobs vs Manual Refresh
We already have weekly jobs that propagate these updates gradually across all models. The manual Downstream Refresh Job is a shortcut: use it when the models closely linked to LinkedIn data need to be refreshed without waiting for the weekly jobs.