Skip to content

Tree node listing helpers + server-authoritative S3 tagging#403

Draft
ErykKul wants to merge 13 commits intodevelopfrom
feature/configurable-uploads
Draft

Tree node listing helpers + server-authoritative S3 tagging#403
ErykKul wants to merge 13 commits intodevelopfrom
feature/configurable-uploads

Conversation

@ErykKul
Copy link
Copy Markdown
Contributor

@ErykKul ErykKul commented Dec 5, 2025

What this PR does / why we need it:

Brings the dataverse-client-javascript SDK in line with the
server-authoritative S3 tagging design and adds two new SDK surfaces
that the upcoming reusable-components / tree-view track will consume.

1. Server-authoritative S3 tagging

Removes FilesConfig and the client-side useS3Tagging flag. The
x-amz-tagging header value is now read from the upload destination
response (FileUploadDestination.tagging), which is populated by the
server based on its dataverse.files.<driverId>.disable-tagging JVM
setting.

This fixes a class of silent upload failures: x-amz-tagging is part
of the presigned URL signature, so the client must send the header
exactly when the server included it in the signature. A client-side
boolean flag that has to stay in sync with a server JVM setting drifts
in practice. The server is now the single source of truth.

Pairs with a small additive change in the dataverse repo
(S3AccessIO.generateTemporaryS3UploadUrls adds the tagging field
to its JSON response when tagging is enabled).

2. Tree node listing helpers (#6691 prep)

Two new use cases backing the upcoming
GET /api/datasets/{id}/versions/{versionId}/tree endpoint:

  • listDatasetTreeNode — single-page lookup. Accepts path, limit,
    cursor, include (all/folders/files), order
    (NameAZ/NameZA), includeDeaccessioned, originals.
  • iterateDatasetTreeNode — async generator that walks the cursor
    chain so callers can consume all children of a folder without
    driving pagination by hand.

Wire format (folders first, then files; opaque keyset cursor; per-file
downloadUrl) is documented in the matching backend PR's Sphinx
section. The transformer falls back to defaults on unknown
order/include values for forward-compat.

3. Public re-export of DataverseApiAuthMechanism

Adds DataverseApiAuthMechanism to core/index.ts alongside
ApiConfig, so consumers can import the enum from the public package
surface instead of the deep path
@iqss/dataverse-client-javascript/dist/core/infra/repositories/ApiConfig.
Non-breaking additive change.

4. CI test-helper update for IQSS/dataverse#12182

The integration test helper setStorageDriverViaApi was using the old
/api/admin/dataverse/{alias}/storageDriver endpoint that PR #12182
moved to /api/dataverses/{alias}/storageDriver. Updated.

5. Build hygiene

format and lint npm scripts scoped to ./src so npm run format
and the husky pre-commit hook don't fail on systems where
test/environment/docker-dev-volumes is owned by container users
(typical local-Docker setups).

Which issue(s) this PR closes:

Prepares the SDK side of:

Related Dataverse PRs:

  • A small additive change to dataverse (S3AccessIO adds the
    tagging field) is landing alongside this on the
    IQSS/dataverse#6691_reusable_components branch — this PR can merge
    before or after that one (the field is optional).
  • The matching dataverse-frontend consumer is on
    feature/standalone-file-uploader.

Special notes for your reviewer:

  • FilesConfig is removed, not deprecated. There were no released
    versions exposing it as a public API — it lived briefly on this
    branch only. CHANGELOG covers it.
  • The tree node helpers don't depend on the matching backend endpoint
    being deployed — they will return whatever the server returns, with
    defaults on unknown enum values. The frontend repository in the
    consumer PR has a fallback path for installations without the
    endpoint.
  • iterateDatasetTreeNode is a for await...of generator, not a list.
    Callers control consumption.

Disclosure

Implementation was iterated with Claude Code (Anthropic) in a
pair-with-AI workflow; reviewers should evaluate accordingly. All
commits are authored by a human reviewer.

Suggestions on how to test this:

  1. Unit tests: npm run test:unit — covers the transformer, both
    tree-node use cases, and the updated DirectUploadClient /
    FilesRepository.
  2. Integration tests: npm run test:integration — exercises
    single-part and multipart upload against the LocalStack-backed
    container, including the tagging behaviour.
  3. Manual: point the dataverse-frontend standalone uploader at a
    prerelease of this package and verify file upload works against a
    Dataverse with dataverse.files.<driverId>.disable-tagging=true
    (no tagging header expected) and =false / unset (header expected).

Is there a release notes or changelog update needed for this change?:

Yes. CHANGELOG.md has an "Unreleased" entry covering the tree-node
helpers and the DataverseApiAuthMechanism re-export. The
server-authoritative tagging change is documented inline in the
Removed section of the prior FilesConfig work.

Additional documentation:

  • The new SDK surfaces are exported from the package root and from
    src/datasets/index.ts.
  • Cross-repo plan + decisions live in
    dataverse-context/tree_view_plan.md (working doc, not part of this
    PR).

Copilot AI review requested due to automatic review settings December 5, 2025 14:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds configurable file upload options to support S3-compatible storage systems that may not fully implement all S3 features, particularly object tagging. The implementation introduces a FilesConfig class for runtime configuration with three options: useS3Tagging (to disable S3 tagging headers for incompatible storage), maxMultipartRetries (configurable retry count), and fileUploadTimeoutMs (configurable timeout).

  • Introduces FilesConfig class with static configuration pattern for file upload settings
  • Refactors DirectUploadClient constructor to accept a configuration object instead of a plain number
  • Implements conditional S3 tagging header inclusion based on configuration

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/unit/files/FilesConfig.test.ts New test suite covering FilesConfig initialization and configuration retrieval
test/unit/files/DirectUploadClient.test.ts Updated tests to verify S3 tagging behavior and use new config object pattern
src/files/infra/clients/DirectUploadClient.ts Implements DirectUploadClientConfig interface and conditional S3 tagging in single-part uploads
src/files/index.ts Adds FilesConfig class with lazy initialization pattern for DirectUploadClient and UploadFile instances
CHANGELOG.md Documents new features and breaking changes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread CHANGELOG.md Outdated
Comment thread src/files/index.ts Outdated
Comment thread test/unit/files/FilesConfig.test.ts Outdated
Comment thread test/unit/files/DirectUploadClient.test.ts Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@qqmyers
Copy link
Copy Markdown
Member

qqmyers commented Dec 5, 2025

FYI - there's already a dataverse.files..disable-tagging setting - is that the same as the new one here?

@ErykKul
Copy link
Copy Markdown
Contributor Author

ErykKul commented Dec 5, 2025

FYI - there's already a dataverse.files..disable-tagging setting - is that the same as the new one here?

If there is one already, then we do not need the new one. However, I know there is one in Java, I did not know there is one in Javascript too? I made this PR a draft, it is a part of a bigger idea: reusing the SPA parts in other UI's. I think it is something for the tech hours to discuss first. There are other 3 PR's coming to illustrate what I mean (one in the new SPA, one in dvwebloader and one in the JSF frontend). The dvwebloader that uses the SPA file upload can function without these PR's ever being merged, and the dvwebloader one is also optional (it is something that we want to use at KU Leuven, but it is fine if it is not mainstream).

@qqmyers
Copy link
Copy Markdown
Member

qqmyers commented Dec 5, 2025

A tech hour makes sense for the bigger picture.

re: the tagging - the store is configured on the server and the signed URLs either will/won't allow a tag header based on that (calls will fail if they don't match the server setting). Do we need an API so you can discover that setting? (We've recently been adding more settings to the storageDriver GET API).

ErykKul added 7 commits May 4, 2026 12:24
The x-amz-tagging header value is now taken from destination.tagging,
which is populated by the server when tagging is enabled. This removes
the client-side FilesConfig/useS3Tagging flag that duplicated the
backend DISABLE_S3_TAGGING JVM setting and would drift silently.

- FileUploadDestination: add tagging?: string
- fileUploadDestinationsTransformers: pass tagging through both paths
- DirectUploadClient: use destination.tagging as header value; remove
  useS3Tagging field and config option
- files/index.ts: remove FilesConfig class and lazy-init pattern;
  uploadFile is now a plain UploadFile instance
- package.json: scope prettier/eslint scripts to ./src to avoid
  permission errors scanning test/environment/docker-dev-volumes
@ErykKul
Copy link
Copy Markdown
Contributor Author

ErykKul commented May 4, 2026

I reverted the template functional test isolation change from this PR to keep the upload PR scoped.

The failing test appears to be shared-state/flakiness around :root templates, not caused by the upload changes. A proposed fix existed in commit 5a9f204, which isolated SetTemplateAsDefault.test.ts by creating a temporary collection and cleaning it up.

That commit can be reused in a separate PR dedicated to functional test isolation.

@ErykKul ErykKul marked this pull request as ready for review May 4, 2026 18:35
@ErykKul ErykKul marked this pull request as draft May 5, 2026 08:20
ErykKul added 3 commits May 5, 2026 11:55
`npm run format` and `npm run lint:eslint` traverse the whole repo by
default, which fails on systems where `test/environment/docker-dev-volumes`
contains directories owned by container users (e.g. solr/data) and is
not readable by the developer's UID. The pre-commit hook then aborts.

Narrowing the globs to `./src` keeps the formatters and linters running
on what we actually care about — application source — and lets the
pre-commit hook succeed regardless of how container volumes are
provisioned.
Adds DataverseApiAuthMechanism to the existing core/index.ts re-export
alongside ApiConfig so consumers don't have to deep-import it from
`@iqss/dataverse-client-javascript/dist/core/infra/repositories/ApiConfig`.

This is the SDK side of a small two-line change agreed with the
dataverse-frontend reusable-components track: once a prerelease ships
this export, the standalone uploader can import the enum from the
package's public surface. Until then, consumers can keep the deep
import.

Non-breaking additive change.
New use cases backing the paginated dataset version tree endpoint:

  GET /api/datasets/{id}/versions/{versionId}/tree

- listDatasetTreeNode: single-page lookup. Accepts path, limit,
  cursor, include (all/folders/files), order (NameAZ/NameZA),
  includeDeaccessioned, originals.
- iterateDatasetTreeNode: async generator that walks the cursor
  chain so callers can consume one folder's children without
  driving pagination by hand.

Wire format mirrors the backend response 1:1 (folder items carry
optional `counts`, file items add id/size/contentType/access/
checksum/downloadUrl). Order/include parsing falls back to
defaults on unknown values for forward-compat.

Includes Jest unit tests for the use cases and the transformer.
@ErykKul ErykKul force-pushed the feature/configurable-uploads branch from 9771c3b to 397d662 Compare May 5, 2026 10:28
PR #12182 merged on dataverse develop and moved the per-collection
storage-driver endpoint:

  OLD: PUT /api/admin/dataverse/{alias}/storageDriver
  NEW: PUT /api/dataverses/{alias}/storageDriver

The CI integration tests on PR #403 now run against a Dataverse
container that includes the move, so setStorageDriverViaApi was
hitting the old admin path and getting 404, which cascaded into
every dataset/file test that depends on the directUploadTestCollection
having LocalStack as its storage driver.

Fix: update setStorageDriverViaApi to use the new public endpoint.
The endpoint still requires X-Dataverse-Key for write operations
(superuser only), so authentication is unchanged.
@ErykKul ErykKul changed the title feat: add configurable file upload options and related tests Tree node listing helpers + server-authoritative S3 tagging May 5, 2026
- docs/useCases.md: add 'List a Folder of a Dataset Version (Tree View)'
  and 'Iterate a Folder of a Dataset Version (Tree View)' under Datasets
  read use cases, with example calls and notes on cursor / ETag /
  ordering. Adds matching TOC entries.
- CHANGELOG.md (Unreleased): add a one-line note about re-exporting
  DataverseApiAuthMechanism from the public surface so the standalone
  reusable-component bundles can import it without a deep path.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants