
Member enrichment ProgAI Linkedin scraper source #2674

Open · wants to merge 8 commits into base: main
Conversation

epipav (Collaborator)

@epipav epipav commented Nov 5, 2024

Changes proposed ✍️

What

copilot:summary

copilot:poem

Why

How

copilot:walkthrough

Checklist ✅

  • Label appropriately with Feature, Improvement, or Bug.
  • Add screenshots to the PR description for relevant FE changes.
  • Unit-test new backend functionality.
  • Update API documentation if necessary (see docs on API documentation).
  • Meet quality standards.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new enrichment source for LinkedIn scraping via PROGAI_LINKEDIN_SCRAPER.
    • Added a function to retrieve member enrichment cache data for all sources.
  • Enhancements

    • Updated methods to support asynchronous operations, improving performance and responsiveness.
    • Enhanced data structures to accommodate more flexible handling of member enrichment data, including new properties in interfaces.
    • Improved logic for handling member identities to prioritize the most active identity.
    • Added a new interface for LinkedIn enrichment data, incorporating additional metadata.
  • Documentation

    • Updated interface definitions to reflect new properties and method signatures.


coderabbitai bot commented Nov 5, 2024

Walkthrough

The pull request introduces several enhancements to the member enrichment worker service. A new function, findMemberEnrichmentCacheForAllSources, retrieves member enrichment cache data for all sources based on a member ID. The normalizeEnrichmentData function's return type is modified for greater flexibility. Additionally, a new enrichment service for LinkedIn scraping is added, and several existing methods are updated to support asynchronous operations. The overall structure remains intact while enhancing data handling and expanding enrichment sources.

Changes

File Change Summary
src/activities/enrichment.ts - Added method: findMemberEnrichmentCacheForAllSources(memberId: string): Promise<IMemberEnrichmentCache<IMemberEnrichmentData>[]>
- Updated method signature: normalizeEnrichmentData(...) to return more flexible types.
src/factory.ts - Updated getEnrichmentSourceService to include MemberEnrichmentSource.PROGAI_LINKEDIN_SCRAPER, returning an instance of EnrichmentServiceProgAILinkedinScraper.
src/sources/clearbit/service.ts - Updated method signature: isEnrichableBySource(input: IEnrichmentSourceInput): Promise<boolean>.
src/sources/progai-linkedin-scraper/service.ts - Added class: EnrichmentServiceProgAILinkedinScraper with several methods for LinkedIn data enrichment.
src/sources/progai/service.ts - Updated method signature: isEnrichableBySource(input: IEnrichmentSourceInput): Promise<boolean>.
src/sources/serp/service.ts - Updated method signature: isEnrichableBySource(input: IEnrichmentSourceInput): Promise<boolean>.
- Updated normalization logic in normalize method.
src/types.ts - Added property: memberId: string to IEnrichmentSourceInput.
- Updated IMemberEnrichmentData to allow for an array of types.
- Updated method signatures in IEnrichmentService to return Promise<boolean>.
src/workflows/enrichMember.ts - Modified enrichmentInput to include memberId: input.id and adjusted logic for handling member identities.
src/workflows/getMembersToEnrich.ts - Added MemberEnrichmentSource.PROGAI_LINKEDIN_SCRAPER to the sources array in getMembersToEnrich.
src/old/apps/premium/members_enrichment_worker/index.ts - Added method: findMemberEnrichmentCacheForAllSourcesDb<T>(tx: DbConnOrTx, memberId: string): Promise<IMemberEnrichmentCache<T>[]>.
src/enums/enrichment.ts - Added enum member: PROGAI_LINKEDIN_SCRAPER = 'progai-linkedin-scraper'.

Possibly related PRs

Suggested labels

Feature, Improvement

Suggested reviewers

  • sausage-todd

Poem

In the fields where data flows,
A rabbit hops where knowledge grows.
With new sources and methods bright,
Enrichment shines, a joyful sight!
From LinkedIn's grasp to caches wide,
Our members' tales we now can guide! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b565f7a and 5632d0e.

📒 Files selected for processing (1)
  • services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts

@epipav epipav marked this pull request as ready for review November 5, 2024 12:43
@epipav epipav self-assigned this Nov 5, 2024
@epipav epipav requested a review from themarolt November 5, 2024 12:44
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (16)
services/libs/types/src/enums/enrichment.ts (1)

5-5: Ensure compliance with LinkedIn's terms of service.

Web scraping should be implemented responsibly. Please verify:

  1. Compliance with LinkedIn's robots.txt and terms of service
  2. Implementation of appropriate delays between requests
  3. Proper error handling for blocked requests
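Points 2 and 3 above (delays between requests, handling blocked requests) can be sketched together in one small helper. This is an illustrative sketch only, not the PR's code: `politeRequest`, the delay values, and the `isBlocked` predicate are all assumptions.

```typescript
// Illustrative sketch: pace requests and back off when blocked.
// All names and values here are hypothetical, not part of the PR.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function politeRequest<T>(
  doRequest: () => Promise<T>,
  isBlocked: (err: unknown) => boolean, // e.g. detects HTTP 429/403 from the scraper
  minDelayMs = 2000,
  maxRetries = 3,
): Promise<T> {
  await sleep(minDelayMs); // fixed pacing between successive requests
  for (let attempt = 0; ; attempt++) {
    try {
      return await doRequest();
    } catch (err) {
      if (!isBlocked(err) || attempt >= maxRetries) throw err;
      await sleep(minDelayMs * 2 ** (attempt + 1)); // exponential backoff on blocks
    }
  }
}
```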
services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (1)

27-27: Consider implementing rate limiting for LinkedIn scraping.

LinkedIn has strict rate limiting policies. Consider implementing a rate limiter at the service level to prevent potential IP blocks or account restrictions.

Suggestions:

  1. Implement a token bucket rate limiter for LinkedIn requests
  2. Add configurable delays between requests
  3. Consider using proxy rotation if high throughput is needed

Would you like me to provide a code example for implementing these rate-limiting mechanisms?
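As a sketch of suggestion 1 (token bucket), the core bookkeeping is small. Capacity, refill rate, and the in-process scope are illustrative assumptions; a real deployment would need a limiter shared across workers.

```typescript
// Token bucket: refills continuously, allows bursts up to `capacity`.
// Parameters are illustrative; timestamps are injectable for testability.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Consumes one token if available at time `now`; returns whether it did.
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```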

services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts (2)

Line range hint 63-84: Consider optimizing GitHub identity selection.

The logic for handling multiple verified GitHub identities is correct but could be optimized and made more maintainable.

Consider extracting this logic into a helper function:

+ async function selectPrimaryGithubIdentity(
+   memberId: string,
+   identities: MemberIdentity[]
+ ): Promise<MemberIdentity | undefined> {
+   const verifiedGithubIdentities = identities.filter(
+     (i) =>
+       i.verified &&
+       i.platform === PlatformType.GITHUB &&
+       i.type === MemberIdentityType.USERNAME,
+   )
+
+   if (verifiedGithubIdentities.length <= 1) {
+     return verifiedGithubIdentities[0]
+   }
+
+   const identityWithMostActivities =
+     await findMemberIdentityWithTheMostActivityInPlatform(memberId, PlatformType.GITHUB)
+
+   return identityWithMostActivities
+     ? verifiedGithubIdentities.find(
+         (i) => i.value === identityWithMostActivities.username
+       )
+     : verifiedGithubIdentities[0]
+ }

- if (verifiedGithubIdentities.length > 1) {
-   const ghIdentityWithTheMostActivities =
-     await findMemberIdentityWithTheMostActivityInPlatform(input.id, PlatformType.GITHUB)
-   if (ghIdentityWithTheMostActivities) {
-     enrichmentInput.github = input.identities.find(
-       (i) =>
-         i.verified &&
-         i.platform === PlatformType.GITHUB &&
-         i.type === MemberIdentityType.USERNAME &&
-         i.value === ghIdentityWithTheMostActivities.username,
-     )
-   }
- } else {
-   enrichmentInput.github = verifiedGithubIdentities?.[0] || undefined
- }
+ enrichmentInput.github = await selectPrimaryGithubIdentity(input.id, input.identities)

Consider caching the activity counts to improve performance for frequently accessed members.
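The caching idea could be as simple as an in-memory TTL map keyed by member and platform. The TTL value and key shape below are assumptions for illustration, not the service's actual caching layer.

```typescript
// Minimal TTL cache sketch; `now` is injectable so expiry is testable.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private readonly ttlMs: number) {}

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= now) {
      this.store.delete(key); // drop expired entries lazily
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Hypothetical usage: cache activity counts per member and platform.
const activityCountCache = new TtlCache<number>(60_000);
activityCountCache.set('memberId:github', 42);
```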


Line range hint 115-115: Track the pending data squasher implementation.

The TODO comment indicates a significant pending feature for LLM-based data squashing. This should be tracked separately to ensure it's not overlooked.

Would you like me to create a GitHub issue to track the implementation of the LLM-based data squasher and member entity enrichment logic?

services/apps/premium/members_enrichment_worker/src/sources/serp/service.ts (2)

Line range hint 39-49: Consider keeping the method synchronous.

The method has been changed to async but contains no asynchronous operations. Unless there's a specific requirement for consistency across enrichment services or future async operations are planned, keeping it synchronous would be more efficient and avoid unnecessary Promise wrapping.

-  async isEnrichableBySource(input: IEnrichmentSourceInput): Promise<boolean> {
+  isEnrichableBySource(input: IEnrichmentSourceInput): boolean {

Line range hint 39-129: Consider enhancing the service's robustness.

As this service is crucial for member enrichment and LinkedIn profile discovery, consider these improvements:

  1. Add rate limiting to prevent SERP API quota exhaustion
  2. Implement exponential backoff for API failures
  3. Add documentation about expected LinkedIn URL formats and username patterns
  4. Consider caching SERP API results to optimize quota usage

Example retry-with-backoff implementation (note: a shared rate limiter should gate calls before they reach the API; express-rate-limit is Express middleware and cannot simply be instantiated inside a service method):

private async querySerpApiWithBackoff(...args) {
  // Exponential backoff on HTTP 429: waits 2s, 4s, then 8s before retrying
  const backoff = async (retries: number) => {
    if (retries === 0) throw new Error('Max retries reached');
    try {
      return await this.querySerpApi(...args);
    } catch (error) {
      if (error.response?.status === 429) { // Too Many Requests
        await new Promise(resolve => setTimeout(resolve, Math.pow(2, 4 - retries) * 1000));
        return backoff(retries - 1);
      }
      throw error;
    }
  };

  return backoff(3);
}
services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts (1)

104-108: Add documentation and error handling

The new function would benefit from:

  1. JSDoc documentation explaining its purpose and return value
  2. Error handling for database failures

Consider updating the implementation:

+/**
+ * Retrieves enrichment cache data for all sources for a given member
+ * @param memberId - The ID of the member to fetch cache for
+ * @returns Promise resolving to an array of cache entries from all sources
+ * @throws {Error} If database query fails
+ */
 export async function findMemberEnrichmentCacheForAllSources(
   memberId: string,
 ): Promise<IMemberEnrichmentCache<IMemberEnrichmentData>[]> {
+  try {
-    return findMemberEnrichmentCacheForAllSourcesDb(svc.postgres.reader.connection(), memberId)
+    return await findMemberEnrichmentCacheForAllSourcesDb(svc.postgres.reader.connection(), memberId)
+  } catch (error) {
+    svc.log.error('Failed to fetch enrichment cache for all sources', {
+      error,
+      memberId,
+    });
+    throw error;
+  }
 }
services/apps/premium/members_enrichment_worker/src/sources/clearbit/service.ts (2)

59-61: Consider adding email format validation.

Since this is a critical validation point for Clearbit enrichment, consider adding more robust email validation:

 async isEnrichableBySource(input: IEnrichmentSourceInput): Promise<boolean> {
-  return !!input.email?.value && input.email?.verified
+  const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
+  return !!input.email?.value && 
+         input.email?.verified && 
+         emailRegex.test(input.email.value);
 }

Line range hint 19-22: Consider improving configuration management.

The service has several hardcoded values that could be externalized to configuration:

  • enrichMembersWithActivityMoreThan = 10
  • cacheObsoleteAfterSeconds = 60 * 60 * 24 * 120 (120 days)

Consider moving these to environment variables or a configuration service for better maintainability and flexibility across environments.
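One way to externalize those two values, keeping the current numbers as defaults. The environment variable names below are hypothetical, not existing configuration keys.

```typescript
// Read an integer from the environment, falling back to the current hardcoded value.
function intFromEnv(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Hypothetical variable names; defaults match the values quoted above.
const enrichMembersWithActivityMoreThan = intFromEnv(
  'CROWD_ENRICHMENT_CLEARBIT_MIN_ACTIVITY_COUNT',
  10,
);
const cacheObsoleteAfterSeconds = intFromEnv(
  'CROWD_ENRICHMENT_CLEARBIT_CACHE_TTL_SECONDS',
  60 * 60 * 24 * 120, // 120 days
);
```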

services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts (4)

Line range hint 121-125: The suggested Promise.resolve() wrapper is unnecessary.

While the async signature change aligns with the interface updates, the current implementation doesn't perform any async operations. An async function already wraps its return value in a Promise, so wrapping it again in Promise.resolve() adds nothing; the body can remain unchanged:

  async isEnrichableBySource(input: IEnrichmentSourceInput): Promise<boolean> {
    const enrichableUsingGithubHandle = !!input.github?.value
    const enrichableUsingEmail = this.alsoUseEmailIdentitiesForEnrichment && !!input.email?.value
    return enrichableUsingGithubHandle || enrichableUsingEmail
  }

Line range hint 279-309: Enhance error handling and security for API calls.

The current error handling in API calls could be improved:

  1. Add specific error types for different failure scenarios
  2. Include retry mechanism for transient failures
  3. Validate API configuration at startup
  4. Add proper error context

Consider implementing this enhanced error handling:

class ProgAIError extends Error {
  constructor(message: string, public readonly cause?: Error) {
    super(message);
    this.name = 'ProgAIError';
  }
}

async getDataUsingGitHubHandle(githubUsername: string): Promise<IMemberEnrichmentDataProgAI> {
  if (!process.env['CROWD_ENRICHMENT_PROGAI_API_KEY']) {
    throw new ProgAIError('ProgAI API key not configured');
  }

  try {
    const url = new URL(`${process.env['CROWD_ENRICHMENT_PROGAI_URL']}/get_profile`);
    const config = {
      method: 'get',
      url: url.toString(),
      params: {
        github_handle: githubUsername,
        with_emails: true,
        api_key: process.env['CROWD_ENRICHMENT_PROGAI_API_KEY'],
      },
      headers: {},
    };

    const response = (await axios(config)).data;
    return response.profile;
  } catch (err) {
    if (axios.isAxiosError(err)) {
      if (err.response?.status === 404) {
        throw new ProgAIError(`Profile not found for GitHub handle: ${githubUsername}`);
      }
      if (err.response?.status === 401) {
        throw new ProgAIError('Invalid ProgAI API key');
      }
      throw new ProgAIError(`Failed to fetch profile: ${err.message}`, err);
    }
    throw new ProgAIError('Unexpected error during API call', err as Error);
  }
}

Line range hint 143-157: Add input validation for normalized data.

The normalize method handles complex data transformation but lacks input validation. Consider adding runtime checks to ensure data integrity:

interface DataValidator {
  validateData(data: IMemberEnrichmentDataProgAI): void;
}

class ProgAIDataValidator implements DataValidator {
  validateData(data: IMemberEnrichmentDataProgAI): void {
    if (data.work_experiences?.some(exp => !exp.company || !exp.title)) {
      throw new Error('Invalid work experience data: missing required fields');
    }
    if (data.skills?.some(skill => typeof skill.weight !== 'number')) {
      throw new Error('Invalid skill data: weight must be a number');
    }
    // Add more validation rules
  }
}

Add validation before normalization:

  normalize(data: IMemberEnrichmentDataProgAI): IMemberEnrichmentDataNormalized {
+   const validator = new ProgAIDataValidator();
+   validator.validateData(data);
    let normalized: IMemberEnrichmentDataNormalized = {
      identities: [],
      attributes: {},
      contributions: [],
      memberOrganizations: [],
    }
    // ... rest of the method
  }

Line range hint 251-262: Improve LinkedIn URL handling.

The current LinkedIn URL parsing is overly simplistic and might break with different URL formats. Consider implementing a more robust LinkedIn URL parser:

function parseLinkedInHandle(url: string): string | null {
  try {
    const linkedinUrl = new URL(url);
    if (!linkedinUrl.hostname.includes('linkedin.com')) {
      return null;
    }
    
    // Handle different LinkedIn URL formats
    const path = linkedinUrl.pathname.split('/').filter(Boolean);
    if (path[0] === 'in' && path[1]) {
      return path[1].split('?')[0]; // Remove query parameters
    }
    return null;
  } catch {
    return null;
  }
}

// Usage in fillPlatformData:
if (data.linkedin_url) {
  const linkedinHandle = parseLinkedInHandle(data.linkedin_url);
  if (linkedinHandle) {
    normalized = normalizeSocialIdentity(
      {
        handle: linkedinHandle,
        platform: PlatformType.LINKEDIN,
      },
      MemberIdentityType.USERNAME,
      true,
      normalized,
    );
  }
}
services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts (1)

535-550: Consider adding tenant isolation and optimizing query performance.

Since this function retrieves cache data from all sources, consider the following improvements:

  1. Add tenant isolation to prevent unauthorized access across tenants, by modifying the query:

 select *
 from "memberEnrichmentCache"
 where 
-  "memberId" = $(memberId) and data is not null
+  "memberId" = $(memberId) 
+  and "tenantId" = $(tenantId)
+  and data is not null

  2. Add the following composite index to improve query performance:

CREATE INDEX IF NOT EXISTS idx_member_enrichment_cache_member_tenant 
ON "memberEnrichmentCache" ("memberId", "tenantId", "updatedAt" DESC)
WHERE data IS NOT NULL;
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (2)

59-62: Consider using optional chaining to simplify the conditions.

The conditions checking for the presence and validity of the input.linkedin object can be simplified using optional chaining. This will make the code more concise and easier to read.

Apply this diff to refactor the conditions:

-    return (
-      hasEnrichableLinkedinInCache ||
-      (input.linkedin && input.linkedin.value && input.linkedin.verified)
-    )
+    return hasEnrichableLinkedinInCache || (!!input.linkedin?.value && input.linkedin.verified)

...

-    if (
-      input.linkedin &&
-      input.linkedin.value &&
-      input.linkedin.verified &&
-      !linkedinUrlHashmap.get(input.linkedin.value)
-    ) {
+    if (input.linkedin?.value && input.linkedin.verified && !linkedinUrlHashmap.get(input.linkedin.value)) {

Also applies to: 94-99

🧰 Tools
🪛 Biome

[error] 61-61: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


144-157: Consider handling the case when normalizedProfiles is empty.

The normalize method currently returns null if normalizedProfiles is empty. However, the return type of the method is IMemberEnrichmentDataNormalized[], which does not include null.

To handle this case and maintain consistency with the return type, consider returning an empty array instead of null:

-    return normalizedProfiles.length > 0 ? normalizedProfiles : null
+    return normalizedProfiles
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d228e2c and be18b12.

📒 Files selected for processing (11)
  • services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts (3 hunks)
  • services/apps/premium/members_enrichment_worker/src/factory.ts (2 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/clearbit/service.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/serp/service.ts (2 hunks)
  • services/apps/premium/members_enrichment_worker/src/types.ts (4 hunks)
  • services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (1 hunks)
  • services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts (1 hunks)
  • services/libs/types/src/enums/enrichment.ts (1 hunks)
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts

[error] 61-61: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 95-96: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (13)
services/apps/premium/members_enrichment_worker/src/factory.ts (2)

6-6: LGTM!

The import statement follows the established pattern and is correctly organized within the sources directory.


28-29: Verify configuration requirements for LinkedIn scraper.

Other enrichment services (e.g., ProgAI) utilize configuration flags like ALSO_USE_EMAIL_IDENTITIES_FOR_ENRICHMENT. Consider whether the LinkedIn scraper service needs similar configuration parameters for consistency and flexibility.

Let's check the implementation of the LinkedIn scraper service:

✅ Verification successful

Configuration flags are not needed for LinkedIn scraper service

The LinkedIn scraper service is designed to be a specialized service focused solely on enriching LinkedIn profiles, unlike the ProgAI service which supports multiple input types (GitHub, email). Key differences:

  • LinkedIn scraper has a single, fixed enrichment path using LinkedIn URLs
  • It already has built-in flexibility by consuming LinkedIn URLs from multiple sources (input, SERP cache, Clearbit cache)
  • The service doesn't need email-based enrichment as it's specifically for LinkedIn profile scraping
  • Configuration parameters like alsoUseEmailIdentitiesForEnrichment wouldn't add value to its focused functionality
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the LinkedIn scraper service uses any configuration flags
# and compare its implementation with other services

# Check the LinkedIn scraper service implementation
echo "Checking LinkedIn scraper service implementation:"
cat services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts

# Compare with ProgAI service to see configuration patterns
echo -e "\nComparing with ProgAI service implementation:"
cat services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts

Length of output: 15989

services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (1)

27-27: LGTM! Verify timeout settings for LinkedIn scraping.

The addition of MemberEnrichmentSource.PROGAI_LINKEDIN_SCRAPER follows the existing pattern. However, LinkedIn scraping operations might require different timing configurations.

Let's verify the timeout and retry configurations in the LinkedIn scraper service:

✅ Verification successful

Timeout configuration is appropriate for LinkedIn scraping

The current implementation is well-configured for LinkedIn scraping operations:

  1. The parent workflow's 15-minute timeout is sufficient because:

    • The LinkedIn scraper makes a single API call to ProgAI service
    • The enrichMember activities have a 20-second timeout with proper retry configuration
    • The implementation includes proper error handling and retries
  2. The retry strategy in enrichMember is well-configured:

    • Initial interval: 5 seconds
    • Backoff coefficient: 2.0
    • Maximum interval: 30 seconds
    • Maximum attempts: 4
  3. The service includes cache management:

    • 120-day cache duration
    • Proper cache invalidation checks
    • Deduplication of LinkedIn URLs
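The retry schedule quoted in point 2 follows the usual formula interval_n = min(initial × coefficient^(n−1), maximum), for attempts 1 through maximumAttempts − 1 (no interval follows the final attempt). A small helper reproduces it; this is a sketch of the schedule, not the worker's actual retry code.

```typescript
// Compute the wait intervals implied by an exponential-backoff retry policy.
function retryIntervalsMs(
  initialMs: number,
  coefficient: number,
  maximumMs: number,
  maximumAttempts: number,
): number[] {
  const intervals: number[] = [];
  for (let n = 1; n < maximumAttempts; n++) {
    intervals.push(Math.min(initialMs * coefficient ** (n - 1), maximumMs));
  }
  return intervals;
}

// With the quoted config (5s initial, 2.0 coefficient, 30s cap, 4 attempts):
// waits of 5s, 10s, and 20s between the four attempts.
const schedule = retryIntervalsMs(5000, 2.0, 30000, 4);
```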
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any specific timeout or rate limit configurations in the LinkedIn scraper service

# Test: Look for timeout or rate limit configurations
rg -A 5 "timeout|rate.?limit|delay" "services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/"

Length of output: 122


Script:

#!/bin/bash
# Let's try a broader search to understand the LinkedIn scraper implementation

# First, check if the directory exists and list its contents
fd . "services/apps/premium/members_enrichment_worker/src/sources"

# Then search for LinkedIn scraper related files in the entire codebase
rg -l "linkedin" "services/apps/premium/members_enrichment_worker/src/"

# Look for timeout configurations in the workflow file
rg -A 5 "timeout|executeChild" "services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts"

# Check the enum definition to understand available sources
rg "enum MemberEnrichmentSource" -A 10 "services/apps/premium/members_enrichment_worker/src/"

Length of output: 3138


Script:

#!/bin/bash
# Let's check the implementation of the LinkedIn scraper service and enrichMember workflow

# Check the LinkedIn scraper service implementation
cat "services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts"

# Check the enrichMember workflow to understand how sources are handled
cat "services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts"

# Look for retry configurations in the LinkedIn scraper
rg "retry|catch|error" "services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts"

Length of output: 9618

services/apps/premium/members_enrichment_worker/src/types.ts (4)

17-17: LGTM: Good addition of memberId field.

Adding a required memberId field improves the data structure by ensuring a stable identifier for member enrichment operations.


54-56: LGTM: Return type properly supports array normalization.

The normalize method's return type now correctly supports both single and array results, maintaining consistency with the IMemberEnrichmentData type changes.


40-40: LGTM: Good conversion to async operation.

Converting isEnrichableBySource to return a Promise is appropriate for external API checks. This matches the async pattern of hasRemainingCredits.

#!/bin/bash
# Verify all implementations handle the async change
ast-grep --pattern 'class $_ implements IEnrichmentService {
  $$$
  isEnrichableBySource($_) {
    $$$
  }
  $$$
}'

29-29: Verify handling of multiple ProgAI enrichment records.

The type now supports arrays of ProgAI enrichment data. Ensure that consumers of this type properly handle both single items and arrays.

✅ Verification successful

Based on the codebase analysis, I can now generate the final response since I have found sufficient evidence about how ProgAI enrichment data arrays are handled:

Array handling for ProgAI enrichment data is properly implemented

The codebase correctly handles arrays of ProgAI enrichment data:

  • EnrichmentServiceProgAILinkedinScraper properly returns and processes arrays in its getData() and normalize() methods
  • The normalize() method correctly handles arrays by iterating through profiles and normalizing each one
  • The IMemberEnrichmentData type union correctly includes both single IMemberEnrichmentDataProgAI and IMemberEnrichmentDataProgAI[]
  • The EnrichmentService interface's normalize() method is designed to handle both single items and arrays in its return type
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for places where IMemberEnrichmentData is consumed to verify array handling
rg -t typescript "IMemberEnrichmentData" -B 2 -A 5

Length of output: 118


Script:

#!/bin/bash
# Search for IMemberEnrichmentData usage with more context and including type definitions
rg -t typescript "IMemberEnrichmentData" -C 10

# Search for ProgAI related enrichment handling
rg -t typescript "ProgAI" -C 10

# Search for type imports of IMemberEnrichmentData
rg -t typescript "import.*IMemberEnrichmentData"

Length of output: 234


Script:

#!/bin/bash
# Search for IMemberEnrichmentData usage with more context
rg "IMemberEnrichmentData" -C 10

# Search for ProgAI related enrichment handling
rg "ProgAI" -C 10

# Look for files that might handle enrichment data
fd -e ts -e js enrichment

Length of output: 141360

services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts (1)

46-46: Verify proper handling of memberId across services.

The addition of memberId to enrichment input is architecturally sound and enables better tracking. However, we should verify its proper handling across services.

✅ Verification successful

Based on the results, I can now generate the final response:

The memberId usage in enrichment input is properly implemented and safe.

The verification shows that:

  • memberId is correctly defined as a required field in IEnrichmentSourceInput interface
  • It's properly used in cache operations across the enrichment workflow
  • The implementation in LinkedIn scraper service demonstrates consistent handling with proper cache lookups
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify memberId handling across enrichment services
# Expected: All enrichment services should handle memberId consistently

# Check enrichment service implementations
ast-grep --pattern 'class $_EnrichmentService {
  $$$
  getEnrichmentData($_: IEnrichmentSourceInput) {
    $$$
  }
}'

# Check cache operations with memberId
rg -A 5 "memberId.*IEnrichmentSourceInput"

Length of output: 165


Script:

#!/bin/bash
# Let's try a broader search to understand the enrichment service implementations

# Check for enrichment service classes
ast-grep --pattern 'class $_ implements IEnrichmentService {
  $$$
}'

# Look for IEnrichmentSourceInput interface definition and usage
rg -A 10 "interface IEnrichmentSourceInput"

# Search for memberId usage in enrichment context
rg -A 5 "memberId.*enrich"

# Look for cache operations
rg -A 5 "cache.*memberId"

Length of output: 3578

services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts (2)

4-4: LGTM: Import addition is consistent with the codebase pattern

The new import follows the established pattern of importing database functions from the data access layer.


45-47: Verify handling of array return type in downstream code

The return type change from a single object to potentially an array of normalized data suggests the LinkedIn scraper might return multiple profiles. This could impact downstream consumers expecting a single object.

Let's verify the consumers of this function:

✅ Verification successful

Let me analyze the consumer code further to understand how the array return type is handled.


Let me check one more aspect to understand how the data is being handled in the workflow.


Return type change is safely handled in the workflow

The array return type is safely handled in the workflow implementation:

  • The normalized data is stored in a toBeSquashed object keyed by source
  • There's a TODO for implementing a data squasher that will handle the consolidated data
  • The array return type actually provides more flexibility for the upcoming LLM-based data squashing implementation
  • No immediate consumers are making assumptions about single vs array return types
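The accumulation pattern described above (normalized data stored in a `toBeSquashed` object keyed by source) can be sketched as follows. The names and shapes here are assumptions for illustration, not the workflow's actual code.

```typescript
// Placeholder shape for normalized enrichment data (assumption).
type NormalizedProfile = Record<string, unknown>

// Normalized results collected per source, awaiting the future data squasher.
const toBeSquashed: Record<string, NormalizedProfile[]> = {}

// Accepts either a single normalized object or an array, mirroring the
// widened return type of normalizeEnrichmentData.
function accumulate(
  source: string,
  normalized: NormalizedProfile | NormalizedProfile[],
): void {
  const items = Array.isArray(normalized) ? normalized : [normalized]
  toBeSquashed[source] = (toBeSquashed[source] ?? []).concat(items)
}
```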
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for calls to normalizeEnrichmentData to verify array handling
rg -l "normalizeEnrichmentData" | while read -r file; do
  echo "=== $file ==="
  rg -A 5 "normalizeEnrichmentData" "$file"
done

Length of output: 2102


Script:

#!/bin/bash
# Check the implementation of data squashing logic and array handling
ast-grep --pattern 'const normalized = await normalizeEnrichmentData($source, $data)
  $$$'

Length of output: 93


Script:

#!/bin/bash
# Check the type definition to understand the data structure
rg -A 10 "interface IMemberEnrichmentDataNormalized" 

Length of output: 1780


Script:

#!/bin/bash
# Look for any data squashing implementation
rg -B 2 -A 5 "data squasher" 

Length of output: 521


Script:

#!/bin/bash
# Check the implementation details around the normalized data usage
rg -B 5 -A 10 "const normalized = await normalizeEnrichmentData"

Length of output: 1540

services/apps/premium/members_enrichment_worker/src/sources/clearbit/service.ts (1)

59-61: Verify if async operation is necessary for email validation.

The method has been made async but contains no asynchronous operations. While this change aligns with the interface, it adds unnecessary Promise overhead for a synchronous operation.

services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts (1)

535-550: Verify the usage pattern across the codebase.

Let's verify how this new function aligns with existing patterns and its usage.

✅ Verification successful

Let me generate additional verification scripts to check the implementation details and usage patterns more thoroughly.


Based on the comprehensive verification results, I can now provide the final response:

The implementation follows established patterns and is correctly integrated

The function findMemberEnrichmentCacheForAllSourcesDb is:

  • Properly typed with the generic IMemberEnrichmentCache<T> interface
  • Consistently uses the DbConnOrTx transaction pattern like other DB functions
  • Correctly used in the enrichment activities layer
  • Follows the same database access patterns as related functions

The only difference from findMemberEnrichmentCacheDb is the omission of the source filter, which is intentional as per the function's purpose to fetch all sources.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage patterns and similar functions

# Check for similar functions and their usage patterns
echo "Checking for similar functions and their patterns..."
rg --type ts "findMemberEnrichmentCache.*Db" -A 5

# Check for tenant isolation patterns in similar functions
echo "Checking tenant isolation patterns..."
rg --type ts "from \"memberEnrichmentCache\".*where" -A 5

# Look for potential callers of this function
echo "Looking for potential callers..."
rg --type ts "findMemberEnrichmentCacheForAllSourcesDb"

Length of output: 4552


Script:

#!/bin/bash

# Check the implementation of findMemberEnrichmentCache function to compare patterns
ast-grep --pattern 'function findMemberEnrichmentCache$_($$$) {
  $$$
}'

# Check for any enrichment-related interfaces and types
rg --type ts "interface.*MemberEnrichment" -A 5

# Look for any test files that might use these functions
fd -e test.ts -e spec.ts -x rg -l "findMemberEnrichmentCache"

# Check for any transaction usage patterns in similar functions
rg --type ts "tx: DbConnOrTx" -B 5 -A 10

Length of output: 16279

services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (1)

1-157: LGTM!

The implementation of the EnrichmentServiceProgAILinkedinScraper class looks good overall. It follows the expected structure, implements the necessary methods from the IEnrichmentService interface, and handles the retrieval and normalization of LinkedIn profile data using the ProgAI service.

The class effectively utilizes caching mechanisms, checks for enrichable identities, and ensures no duplicates are processed. Error handling is also implemented when making API calls to the ProgAI service.

The integration with the EnrichmentSourceServiceFactory and the updates to the getMembersToEnrich function ensure that this new enrichment source is properly incorporated into the existing system.

Great job on implementing this new feature!

🧰 Tools
🪛 Biome

[error] 61-61: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 95-96: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts (2)

Line range hint 121-126: Consider documenting the async change rationale.

While making the method async aligns with other enrichment services, the implementation remains synchronous. Consider adding a comment explaining why this method is async for future maintainers.

+  // Method is async to maintain consistency with other enrichment services
   async isEnrichableBySource(input: IEnrichmentSourceInput): Promise<boolean> {

312-342: Consider adding architectural safeguards.

The service makes external API calls but lacks several important architectural safeguards:

  1. Environment variable validation
  2. Retry mechanism for transient failures
  3. Rate limiting to prevent API quota exhaustion

Consider implementing these improvements:

  1. Add environment variable validation in the constructor
  2. Implement retry logic using exponential backoff
  3. Add rate limiting using a token bucket algorithm

Would you like me to provide implementation examples for any of these improvements?
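As a starting point for the second suggestion, a retry helper with exponential backoff can be sketched like this. The attempt count and base delay are placeholder assumptions, not values from the PR.

```typescript
// Retry an async operation with exponential backoff between attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      if (i < attempts - 1) {
        // Exponential backoff: baseDelayMs, then 2x, 4x, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i))
      }
    }
  }
  throw lastError
}
```

An enrichment call could then be wrapped as `withRetry(() => axios(config))`, keeping transient network failures from surfacing immediately.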

services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (4)

59-60: Simplify condition using optional chaining

You can enhance readability by simplifying the condition with optional chaining.

Apply this diff:

     return (
       hasEnrichableLinkedinInCache ||
-      (input.linkedin && input.linkedin.value && input.linkedin.verified)
+      (input.linkedin?.value && input.linkedin.verified)
     )
🧰 Tools
🪛 Biome

[error] 60-60: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


108-108: Simplify condition using optional chaining

Similarly, simplify the condition here using optional chaining for better clarity.

Apply this diff:

-    if (input.linkedin && input.linkedin.value && input.linkedin.verified) {
+    if (input.linkedin?.value && input.linkedin.verified) {
🧰 Tools
🪛 Biome

[error] 108-108: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


145-146: Return an empty array instead of null

Returning an empty array when no profiles are found is preferable to returning null. This approach avoids potential null reference errors and simplifies the handling in consuming code.

Apply this diff:

-    return profiles.length > 0 ? profiles : null
+    return profiles

180-181: Ensure consistent return type by returning an empty array

For consistency and to prevent null reference issues, consider returning an empty array instead of null when there are no normalized profiles.

Apply this diff:

-    return normalizedProfiles.length > 0 ? normalizedProfiles : null
+    return normalizedProfiles
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between be18b12 and 3f6ca89.

📒 Files selected for processing (5)
  • services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts (2 hunks)
  • services/apps/premium/members_enrichment_worker/src/types.ts (5 hunks)
  • services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • services/apps/premium/members_enrichment_worker/src/types.ts
  • services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts

[error] 60-60: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 108-108: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (4)
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts (2)

1-1: LGTM!

Clean import statement following TypeScript best practices.


3-8: Add JSDoc documentation to improve maintainability.

The interface structure is well-defined, but adding JSDoc documentation would improve maintainability and provide better context for the metadata fields.

Consider adding documentation like this:

+/**
+ * Represents enriched member data from LinkedIn scraping.
+ * Extends the base ProgAI enrichment data with LinkedIn-specific metadata.
+ */
 export interface IMemberEnrichmentDataProgAILinkedinScraper extends IMemberEnrichmentDataProgAI {
+  /**
+   * Additional metadata for LinkedIn enrichment
+   * @property {number} repeatedTimesInDifferentSources - Number of times this data appeared in different sources
+   * @property {boolean} isFromVerifiedSource - Indicates if the data comes from a verified LinkedIn source
+   */
   metadata: {
     repeatedTimesInDifferentSources: number
     isFromVerifiedSource: boolean
   }
 }

Also, consider adding runtime validation for these fields when processing the enrichment data to ensure data integrity.
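A runtime validation of those fields could take the form of a user-defined type guard. This is an illustrative sketch of the suggestion, not code from the PR.

```typescript
// Shape of the LinkedIn scraper metadata under discussion.
interface ILinkedinScraperMetadata {
  repeatedTimesInDifferentSources: number
  isFromVerifiedSource: boolean
}

// Type guard: narrows an unknown value to the metadata shape at runtime.
function isValidMetadata(value: unknown): value is ILinkedinScraperMetadata {
  if (typeof value !== 'object' || value === null) {
    return false
  }
  const v = value as Record<string, unknown>
  return (
    typeof v.repeatedTimesInDifferentSources === 'number' &&
    typeof v.isFromVerifiedSource === 'boolean'
  )
}
```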

✅ Verification successful

Based on the search results, I can see that these fields are actively used in the service implementation with proper validation and tracking logic. The service.ts file shows that:

  1. repeatedTimesInDifferentSources is properly tracked using a HashMap to count occurrences
  2. isFromVerifiedSource is set based on the source validation
  3. Both fields are properly initialized and updated throughout the enrichment process

JSDoc documentation would enhance code clarity, but the implementation shows proper field handling.

Consider adding the suggested JSDoc documentation to improve code readability and maintainability:

/**
 * Represents enriched member data from LinkedIn scraping.
 * Extends the base ProgAI enrichment data with LinkedIn-specific metadata.
 */
export interface IMemberEnrichmentDataProgAILinkedinScraper extends IMemberEnrichmentDataProgAI {
  /**
   * Additional metadata for LinkedIn enrichment
   * @property {number} repeatedTimesInDifferentSources - Number of times this data appeared in different sources
   * @property {boolean} isFromVerifiedSource - Indicates if the data comes from a verified LinkedIn source
   */
  metadata: {
    repeatedTimesInDifferentSources: number
    isFromVerifiedSource: boolean
  }
}
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if validation is implemented for these fields
rg -A 5 "repeatedTimesInDifferentSources|isFromVerifiedSource" --type ts

Length of output: 6590

services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (2)

42-55: Verify the necessity of the 'break' statements within the loop

The break statements at lines 51 and 54 may cause the for loop over caches to exit prematurely after processing the first matching cache. This could prevent subsequent caches from being evaluated, potentially missing additional enrichable LinkedIn identities.

Please verify if this immediate exit is intentional. If not, consider removing or adjusting the break statements to ensure all relevant caches are processed.


161-163: Improve error handling for Axios request

As previously noted, the getDataUsingLinkedinHandle method lacks specific error handling for the Axios request. Enhancing error handling will make the code more robust and provide better error insights.

Comment on lines +329 to +342
    const url = `${process.env['CROWD_ENRICHMENT_PROGAI_URL']}/get_profile`
    const config = {
      method: 'get',
      url,
      params: {
        email,
        with_emails: true,
        api_key: process.env['CROWD_ENRICHMENT_PROGAI_API_KEY'],
      },
      headers: {},
    }

    const response = (await axios(config)).data
    return response?.profile || null

🛠️ Refactor suggestion

⚠️ Potential issue

Add error handling and reduce code duplication.

The method has similar issues as getDataUsingGitHubHandle and contains duplicated HTTP request logic.

  1. Add error handling and timeout:
   async getDataUsingEmailAddress(email: string): Promise<IMemberEnrichmentDataProgAI> {
     const url = `${process.env['CROWD_ENRICHMENT_PROGAI_URL']}/get_profile`
     const config = {
       method: 'get',
       url,
       params: {
         email,
         with_emails: true,
         api_key: process.env['CROWD_ENRICHMENT_PROGAI_API_KEY'],
       },
-      headers: {},
+      headers: {},
+      timeout: 5000, // 5 seconds timeout
     }
 
-    const response = (await axios(config)).data
-    return response?.profile || null
+    try {
+      const response = await axios<IMemberEnrichmentDataProgAIResponse>(config)
+      return response.data?.profile || null
+    } catch (error) {
+      this.log.error('Failed to fetch email profile from ProgAI', {
+        error,
+        email,
+      })
+      return null
+    }
   }
  2. Consider extracting the common HTTP request logic:
private async fetchProfile(params: Record<string, string>): Promise<IMemberEnrichmentDataProgAI> {
  const config = {
    method: 'get',
    url: `${process.env['CROWD_ENRICHMENT_PROGAI_URL']}/get_profile`,
    params: {
      ...params,
      with_emails: true,
      api_key: process.env['CROWD_ENRICHMENT_PROGAI_API_KEY'],
    },
    headers: {},
    timeout: 5000,
  }

  try {
    const response = await axios<IMemberEnrichmentDataProgAIResponse>(config)
    return response.data?.profile || null
  } catch (error) {
    this.log.error('Failed to fetch profile from ProgAI', {
      error,
      params,
    })
    return null
  }
}

Then update both methods to use it:

async getDataUsingGitHubHandle(githubUsername: string): Promise<IMemberEnrichmentDataProgAI> {
  return this.fetchProfile({ github_handle: githubUsername })
}

async getDataUsingEmailAddress(email: string): Promise<IMemberEnrichmentDataProgAI> {
  return this.fetchProfile({ email })
}

Comment on lines +312 to +325
    const url = `${process.env['CROWD_ENRICHMENT_PROGAI_URL']}/get_profile`
    const config = {
      method: 'get',
      url,
      params: {
        github_handle: githubUsername,
        with_emails: true,
        api_key: process.env['CROWD_ENRICHMENT_PROGAI_API_KEY'],
      },
      headers: {},
    }

-    return response.profile
+    const response: IMemberEnrichmentDataProgAIResponse = (await axios(config)).data
+    return response?.profile || null

⚠️ Potential issue

Add error handling and request timeout.

The HTTP request lacks error handling and timeout configuration, which could lead to unhandled exceptions or hanging requests.

Consider applying these improvements:

   async getDataUsingGitHubHandle(githubUsername: string): Promise<IMemberEnrichmentDataProgAI> {
     const url = `${process.env['CROWD_ENRICHMENT_PROGAI_URL']}/get_profile`
     const config = {
       method: 'get',
       url,
       params: {
         github_handle: githubUsername,
         with_emails: true,
         api_key: process.env['CROWD_ENRICHMENT_PROGAI_API_KEY'],
       },
-      headers: {},
+      headers: {},
+      timeout: 5000, // 5 seconds timeout
     }
 
-    const response: IMemberEnrichmentDataProgAIResponse = (await axios(config)).data
-    return response?.profile || null
+    try {
+      const response = await axios<IMemberEnrichmentDataProgAIResponse>(config)
+      return response.data?.profile || null
+    } catch (error) {
+      this.log.error('Failed to fetch GitHub profile from ProgAI', {
+        error,
+        githubUsername,
+      })
+      return null
+    }
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    const url = `${process.env['CROWD_ENRICHMENT_PROGAI_URL']}/get_profile`
    const config = {
      method: 'get',
      url,
      params: {
        github_handle: githubUsername,
        with_emails: true,
        api_key: process.env['CROWD_ENRICHMENT_PROGAI_API_KEY'],
      },
      headers: {},
    }
    const response: IMemberEnrichmentDataProgAIResponse = (await axios(config)).data
    return response?.profile || null

    const url = `${process.env['CROWD_ENRICHMENT_PROGAI_URL']}/get_profile`
    const config = {
      method: 'get',
      url,
      params: {
        github_handle: githubUsername,
        with_emails: true,
        api_key: process.env['CROWD_ENRICHMENT_PROGAI_API_KEY'],
      },
      headers: {},
      timeout: 5000, // 5 seconds timeout
    }
    try {
      const response = await axios<IMemberEnrichmentDataProgAIResponse>(config)
      return response.data?.profile || null
    } catch (error) {
      this.log.error('Failed to fetch GitHub profile from ProgAI', {
        error,
        githubUsername,
      })
      return null
    }


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (3)

61-61: Remove redundant break statement.

This break statement is redundant as there's already a break statement at line 59 within the same block.

-        break

75-131: LGTM with minor improvement suggestion.

The implementation effectively tracks and deduplicates LinkedIn identities while preserving metadata about verification status and occurrence frequency.

Consider using optional chaining for safer property access:

-if (input.linkedin && input.linkedin.value && input.linkedin.verified)
+if (input.linkedin?.value && input.linkedin?.verified)
🧰 Tools
🪛 Biome

[error] 115-115: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


133-165: LGTM with minor improvement suggestion.

The method effectively handles data fetching and cache management.

Consider using optional chaining for safer property access:

-if (existingProgaiCache && existingProgaiCache.data && (existingProgaiCache.data as IMemberEnrichmentDataProgAI).id == data.id)
+if (existingProgaiCache?.data && (existingProgaiCache.data as IMemberEnrichmentDataProgAI).id == data.id)
🧰 Tools
🪛 Biome

[error] 148-149: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3f6ca89 and b565f7a.

📒 Files selected for processing (1)
  • services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (1 hunks)
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts

[error] 67-67: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 115-115: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 148-149: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (3)
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (3)

23-42: LGTM! Well-structured class declaration with clear configuration.

The class is properly structured with appropriate inheritance and interface implementation. The configuration properties are well-defined, including a reasonable cache expiration time of 90 days for LinkedIn data.


185-201: LGTM! Well-implemented normalization logic.

The method correctly normalizes profiles while preserving metadata, following the established pattern of using the ProgAI service for base normalization.


167-183: ⚠️ Potential issue

Add proper error handling following established patterns.

The method needs robust error handling for API calls.

As mentioned in the previous review, implement proper error handling:

-    const response: IMemberEnrichmentDataProgAIResponse = (await axios(config)).data
-    return response?.profile || null
+    try {
+      const response: IMemberEnrichmentDataProgAIResponse = (await axios(config)).data
+      return response?.profile || null
+    } catch (err) {
+      if (axios.isAxiosError(err)) {
+        this.log.warn(
+          `Axios error occurred while getting ProgAI data: ${err.response?.status} - ${err.response?.statusText}`
+        )
+        if (err.response?.status === 404) {
+          throw new Error404NotFound('LinkedIn profile not found')
+        }
+        throw new Error(`ProgAI enrichment request failed with status: ${err.response?.status}`)
+      }
+      throw new Error500InternalServerError(`Unexpected error during ProgAI enrichment: ${err.message}`)
+    }

Comment on lines +71 to +73
  async hasRemainingCredits(): Promise<boolean> {
    return true
  }

⚠️ Potential issue

Implement actual credit checking.

The method currently always returns true without verifying available credits. This could lead to issues if the ProgAI service has rate limits or credit restrictions.

Consider implementing actual credit checking by:

  1. Checking remaining credits from ProgAI API
  2. Maintaining a local counter for rate limiting
  3. Implementing proper error handling for when credits are exhausted

Would you like me to help implement this functionality?
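The local-counter approach could be sketched as below; the quota value is a placeholder assumption, not a real ProgAI limit.

```typescript
// Tracks locally-consumed credits against a configured quota.
class CreditTracker {
  private used = 0

  constructor(private readonly quota: number) {}

  hasRemainingCredits(): boolean {
    return this.used < this.quota
  }

  // Returns false (instead of throwing) when the quota is exhausted,
  // so callers can skip the source gracefully.
  consume(): boolean {
    if (!this.hasRemainingCredits()) {
      return false
    }
    this.used += 1
    return true
  }
}
```

The service's `hasRemainingCredits()` could then delegate to such a tracker, ideally reconciled periodically against the provider's actual quota API if one exists.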
