feat(scripts): Add dependency version scanner tool#16867

Open
chalmerlowe wants to merge 38 commits into main from feat/add-version-scanner

@chalmerlowe chalmerlowe commented Apr 29, 2026

This adds a utility with the ability to scan for common references to dependencies (Python runtimes and package dependencies) to facilitate updating code when runtimes and dependencies change.

  • It can be run against an entire repo OR against specific packages within a monorepo
  • It is customizable with regex patterns and examples here
  • The test suite checks each regex against the examples to ensure the efficacy of the patterns
  • The current patterns account for edge cases such as finding < 3.8 when searching for references to 3.7 since they are semantically equivalent even if syntactically different.
  • The scanner produces a CSV report with:
path/filename, package name, line number, matching pattern, full line for context, etc.
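
For illustration, here is a minimal sketch of the idea described above: one pattern for the literal version and one for the equivalent upper bound, with each hit emitted as a CSV row. The pattern names, column names, and helper functions are assumptions made for this example and are not taken from the PR's actual configuration.

```python
import csv
import io
import re

# Hypothetical patterns for locating references to Python 3.7; the real
# patterns live in the PR's configuration and may differ.
PATTERNS = {
    # "3.7" or "3.7.x", but not "13.7" or "3.70".
    "literal_version": re.compile(r"(?<![\d.])3\.7(?!\d)"),
    # "< 3.8" excludes the same runtimes that a 3.7 search cares about.
    "upper_bound": re.compile(r"<\s*3\.8(?!\d)"),
}

# Assumed report columns, loosely following the list in the description.
FIELDNAMES = ["path", "package", "line_number", "pattern", "line"]


def scan_lines(path, package, lines):
    """Yield one report row per (line, pattern) match."""
    for line_number, line in enumerate(lines, 1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                yield {
                    "path": path,
                    "package": package,
                    "line_number": line_number,
                    "pattern": name,
                    "line": line.strip(),
                }


def write_report(rows, stream):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = ['python_requires=">=3.7"', 'version = "13.7.1"', 'requires-python = "<3.8"']
    buffer = io.StringIO()
    write_report(scan_lines("setup.py", "example-package", sample), buffer)
    print(buffer.getvalue())
```

The lookarounds keep unrelated numbers such as `13.7.1` out of the report, which is the kind of edge case the example-driven tests are meant to pin down.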

@chalmerlowe chalmerlowe changed the title feat(scripts): Add dependency version scanner tool feat(scripts): [WIP] Add dependency version scanner tool Apr 29, 2026
@chalmerlowe chalmerlowe added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Apr 29, 2026
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new dependency version scanner, including a configuration-driven regex scanner, a benchmarking tool, and comprehensive unit and integration tests. The review feedback highlights several areas for improvement: optimizing regex compilation in the scanner to avoid performance bottlenecks, using the tempfile module in the benchmark script to prevent race conditions, removing redundant code, improving test robustness by checking subprocess exit codes, and adhering to PEP 8 by moving imports to the top of files.
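
As a rough sketch of three of the suggestions above (compile patterns once, use `tempfile` for scratch files, and check subprocess exit codes), assuming hypothetical helper names that do not appear in the PR:

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path


def compile_patterns(raw_patterns):
    """Compile each pattern once, up front, rather than once per scanned line."""
    return {name: re.compile(expr) for name, expr in raw_patterns.items()}


def count_matches(compiled, text):
    """Count (line, pattern) hits using the precompiled patterns."""
    return sum(
        1
        for line in text.splitlines()
        for pattern in compiled.values()
        if pattern.search(line)
    )


def run_in_temp_dir(code):
    """Run a snippet in a unique scratch directory and raise on a non-zero exit."""
    # TemporaryDirectory picks a unique name, so parallel runs do not collide.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            cwd=tmp,
        )
        if result.returncode != 0:
            raise RuntimeError(
                f"exit code {result.returncode}: {result.stderr.strip()}"
            )
        return result.stdout
```

Checking `returncode` explicitly means a crashed scanner run fails the test instead of being mistaken for a run that found no matches.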

Comment thread scripts/version_scanner/version_scanner.py Outdated
Comment thread scripts/version_scanner/benchmark.py Outdated
Comment thread scripts/version_scanner/benchmark.py Outdated
Comment thread scripts/version_scanner/tests/integration/test_scanner_integration.py Outdated
Comment thread scripts/version_scanner/tests/unit/test_version_scanner.py Outdated
Comment thread scripts/version_scanner/tests/unit/test_version_scanner.py Outdated
Comment thread scripts/version_scanner/version_scanner.py Outdated
@chalmerlowe chalmerlowe marked this pull request as ready for review May 5, 2026 13:03
@chalmerlowe chalmerlowe requested a review from a team as a code owner May 5, 2026 13:03
@chalmerlowe chalmerlowe removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label May 5, 2026
@chalmerlowe chalmerlowe changed the title feat(scripts): [WIP] Add dependency version scanner tool feat(scripts): Add dependency version scanner tool May 5, 2026
@chalmerlowe chalmerlowe added this to the Drop support for 3.7-3.9 milestone May 5, 2026
@parthea parthea self-assigned this May 6, 2026
@@ -0,0 +1,34 @@
import csv

It looks like the copyright header is missing (applies to all code files)

Run the script from the repository root:

```bash
python3 scripts/version_scanner/version_scanner.py -d <dependency> -v <version> [options]
```

When I ran this, I got a ModuleNotFound error. Is there a requirements.txt or anything that captures the dependencies?

This plan outlines the approach to update Python packages to drop support for end-of-life Python runtimes (3.7, 3.8, 3.9) OR for deprecated dependencies, and ensure the packages are configured for modern Python.

#### High-Level Strategy
- **One Branch Per Package**: To keep PRs manageable and isolated, we suggest a dedicated worktree and branch for each package (e.g., `feat/drop-<dependency>-<version>-<package-name>` i.e. `feat/drop-protobuf-4.25.8-google-cloud-bigquery`).

This is only for hand-written packages, right? I assume others would get their updates through the generator?

Should we recommend doing a generator update first, to clean up most of the packages?

@@ -0,0 +1,5 @@
packages/google-cloud-access-context-manager

what is this?

self.variables = self._compute_variables()

def _compute_variables(self) -> Dict[str, str]:
    """Compute variables for interpolation from version string."""

nit: more detailed comments/examples could be helpful for future maintainers. I'm not sure what a variable is, or the expected version string format
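
To make the question concrete, here is one guess at what such a docstring example could cover. This is a standalone sketch, not the PR's implementation: the variable names (`major`, `minor`, `next_minor`), the accepted version format, and the `interpolate` helper are all assumptions.

```python
import re
from typing import Dict


def compute_variables(version: str) -> Dict[str, str]:
    """Derive interpolation variables from a dotted numeric version.

    Hypothetical example:
        "3.7" -> {"version": "3.7", "major": "3", "minor": "7", "next_minor": "8"}
    """
    parts = version.split(".")
    if len(parts) < 2 or not all(part.isdigit() for part in parts):
        raise ValueError(
            f"expected a dotted numeric version such as '3.7', got {version!r}"
        )
    major, minor = parts[0], parts[1]
    return {
        "version": version,
        "major": major,
        "minor": minor,
        # Lets a pattern target "< 3.8" when the search is for 3.7.
        "next_minor": str(int(minor) + 1),
    }


def interpolate(template: str, variables: Dict[str, str]) -> str:
    """Fill {name} placeholders with regex-escaped values.

    Uses str.format, so templates must double any literal braces
    (for example regex quantifiers such as {{2}}).
    """
    escaped = {name: re.escape(value) for name, value in variables.items()}
    return template.format(**escaped)


if __name__ == "__main__":
    variables = compute_variables("3.7")
    print(interpolate(r"<\s*{major}\.{next_minor}(?!\d)", variables))
```

A docstring along these lines would state both the expected input format and the keys a pattern author can rely on.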

try:
    with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
        skip_next = False
        for line_num, line in enumerate(f, 1):

are there any issues with statements that span lines?
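
A small sketch of the concern, using a made-up pattern rather than one from the PR: a line-by-line scan cannot see a version check that is split across lines, while a search over the whole file text can, at the cost of computing the line number from the match offset.

```python
import re

# Hypothetical pattern; \s also matches newlines, so it can span lines
# when it is applied to the whole file text.
PATTERN = re.compile(r"version_info\s*<\s*\(\s*3\s*,\s*8\s*\)")


def scan_by_line(text):
    """Return line numbers where the pattern matches within a single line."""
    return [
        number
        for number, line in enumerate(text.splitlines(), 1)
        if PATTERN.search(line)
    ]


def scan_whole_text(text):
    """Return the starting line number of every match in the full text."""
    return [
        text.count("\n", 0, match.start()) + 1
        for match in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    source = "import sys\nif sys.version_info < (\n    3, 8\n):\n    pass\n"
    print(scan_by_line(source))     # the split statement is not found
    print(scan_whole_text(source))  # reported at the line where it starts
```

Whether this matters in practice depends on how often the targeted files wrap such expressions; the excerpt above does not show how `skip_next` is used, so the scanner may already handle some of these cases.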

def upload_to_drive(csv_path: str, matches: List[Dict[str, str]], github_repo: str = None, branch: str = "main") -> str:
"""
Upload matches to a Google Sheet in Drive.
"""

Is this necessary? It seems to add extra complexity, dependencies and test surface area, when Google Sheets makes it pretty easy to import a csv natively already

parts = rel_root.split(os.sep)

# Monorepo filtering
if target_packages and parts[0] == "packages":

There's talk of separating the packages directory into separate ones for generated and handwritten libraries. Will that be easy to address here?
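
One possible shape for that, sketched under assumptions: the top-level directory names become a parameter instead of the hard-coded `"packages"` check. The function names, the `generated`/`handwritten` directory names, and the choice to exclude non-package directories when a filter is active are all hypothetical.

```python
import os
from typing import Optional, Sequence


def package_name(
    rel_root: str, package_roots: Sequence[str] = ("packages",)
) -> Optional[str]:
    """Return the package a directory belongs to, or None if it is outside all roots."""
    parts = rel_root.split(os.sep)
    if len(parts) >= 2 and parts[0] in package_roots:
        return parts[1]
    return None


def in_scope(
    rel_root: str,
    target_packages: Sequence[str],
    package_roots: Sequence[str] = ("packages",),
) -> bool:
    """Decide whether a directory should be scanned.

    Assumption: with no filter everything is scanned; with a filter, only
    directories inside a targeted package are scanned.
    """
    if not target_packages:
        return True
    return package_name(rel_root, package_roots) in set(target_packages)


if __name__ == "__main__":
    roots = ("generated", "handwritten")
    path = os.path.join("handwritten", "google-cloud-bigquery", "tests")
    print(in_scope(path, ["google-cloud-bigquery"], roots))
```

With the roots passed in (or read from configuration), a directory split would only require changing the default value.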


package_group.add_argument(
    "--package",
    help="Specific subdirectory filter (useful for monorepos)"

Is this specific to the structure of the monorepo's package directory? Or is this more of a generic subdirectory filter?


3 participants