[db/models] Deduplicate repos by repo_src_id on CLI add #3719

Draft
devs6186 wants to merge 1 commit into chaoss:main from devs6186:fix/3056-repo-uniqueness-src-id

@devs6186

Changeset

  • is_valid_github_repo now returns repo_src_id (GitHub numeric ID) alongside repo_type
  • add_cli_repo passes repo_src_id through to insert_github_repo and insert_gitlab_repo
  • insert_github_repo checks for an existing Repo row with the same repo_src_id before inserting; returns its repo_id if found
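The pre-insert lookup in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: it uses the standard-library `sqlite3` module and a hypothetical three-column `repo` table, and the helper names `make_db` and `insert_repo_deduped` are invented for the example. The numeric ID `123` is a placeholder, not a real GitHub repo ID.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the repo table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE repo ("
        "repo_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "repo_git TEXT NOT NULL, "
        "repo_src_id INTEGER)"
    )
    return conn


def insert_repo_deduped(conn, url, repo_src_id):
    """Return the repo_id for this repo, reusing an existing row when the
    source ID already exists, and inserting a new row otherwise."""
    if repo_src_id is not None:
        row = conn.execute(
            "SELECT repo_id FROM repo WHERE repo_src_id = ?",
            (repo_src_id,),
        ).fetchone()
        if row is not None:
            # Same upstream repo under a different URL: do not insert again.
            return row[0]
    cur = conn.execute(
        "INSERT INTO repo (repo_git, repo_src_id) VALUES (?, ?)",
        (url, repo_src_id),
    )
    conn.commit()
    return cur.lastrowid


conn = make_db()
old_id = insert_repo_deduped(conn, "https://github.com/openai/triton", 123)
new_id = insert_repo_deduped(conn, "https://github.com/triton-lang/triton", 123)
```

Under these assumptions, both calls return the same `repo_id` and the table holds a single row. A lookup-then-insert like this is not atomic, so a unique constraint on the source ID would be the stronger guarantee if concurrent adds are a concern.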

Notes

When a GitHub repo moves (e.g. openai/triton -> triton-lang/triton), both the old and new URLs can be added via CLI. Because deduplication was URL-only, this created two separate rows for the same repo, and neither finished collection. The GitHub numeric repo ID is stable across renames and transfers, so we now capture it from the validation API call and check it before inserting. No schema change is needed: repo_src_id already exists on the table; we just start using it for lookups.
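As a rough sketch of the capture step: GitHub's repository endpoint returns a JSON object with a numeric `id` field and an `owner.type` field, so the validation result can carry both. The function name `summarize_validation`, the status strings, and the shape of the returned dict below are assumptions made for illustration and may not match what is_valid_github_repo actually returns. The payloads are trimmed, made-up samples.

```python
from typing import Any, Dict, Tuple


def summarize_validation(
    status_code: int, payload: Dict[str, Any]
) -> Tuple[bool, Dict[str, Any]]:
    """Turn a repository API response into (is_valid, details).

    Hypothetical helper: on success, details carries the owner type and
    the numeric repo ID that stays the same across renames and transfers.
    """
    if status_code != 200:
        return False, {"status": f"Invalid repo (HTTP {status_code})"}
    return True, {
        "status": "Valid repo",
        "repo_type": payload["owner"]["type"],
        "repo_src_id": payload["id"],
    }


# Made-up payloads for the same repo fetched via its old and new names.
old_payload = {"id": 123, "full_name": "old-org/project", "owner": {"type": "Organization"}}
new_payload = {"id": 123, "full_name": "new-org/project", "owner": {"type": "Organization"}}

_, old_details = summarize_validation(200, old_payload)
_, new_details = summarize_validation(200, new_payload)
same_repo = old_details["repo_src_id"] == new_details["repo_src_id"]
```

Comparing `repo_src_id` rather than the URL is what lets the two names resolve to one repo in this sketch.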

Related issues/PRs

Description

  • Added repo_src_id capture and pre-insert lookup to prevent duplicate rows for renamed repos

This PR fixes #3056

Notes for Reviewers
This only applies to the CLI add path. The web/frontend path was fixed separately in PR #2929.

Signed commits

  • Yes, I signed my commits.

Fixes chaoss#3056

- is_valid_github_repo now returns repo_src_id (GitHub numeric ID)
  alongside repo_type; the numeric ID is stable across renames
- add_cli_repo passes repo_src_id through to insert_github_repo and
  insert_gitlab_repo
- insert_github_repo checks for an existing Repo row with the same
  repo_src_id before inserting; if found, returns its repo_id so the
  renamed repo is not ingested twice under a different URL

Signed-off-by: devs6186 <devyanshsomvanshi@gmail.com>
@MoralCode added the stale label (Stuff that's abandoned or not making forward progress and may need taking over/reassignment/closing) on Feb 19, 2026
@MoralCode marked this pull request as draft on February 24, 2026 04:04
@MoralCode
Contributor

seems like an okay change. not reviewing in detail since IMO this needs a deeper look to see exactly where/how the frontend handles this behavior in case there is a chance for code reuse here.

Contributor indicated this PR is abandoned so marking as draft and keeping the stale tag.

Development

Successfully merging this pull request may close these issues.

Change repo uniqueness to be based on repo_src_id not url