About the Top-K mechanism #3

@zhuhz22

Description

Thank you for the excellent work! I have a few questions regarding the top-k KV cache selection mechanism, and I would greatly appreciate your clarification.

  1. Is the top-k mechanism applied during training as well, or is it only used during inference?

  2. As I understand it, the top-k mechanism is only used to select which tokens participate in the actual attention computation, while the historical KV cache itself still remains in memory without eviction. If so, the memory usage would still grow linearly with the generated sequence length. Is this understanding accurate?
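To make question 2 concrete, here is a minimal sketch of my understanding (the function name, shapes, and scoring are my own assumptions for illustration, not taken from this repo): the full KV cache is retained, and for each query only the k highest-scoring cached positions participate in the attention computation.

```python
import numpy as np

def topk_attention(q, k_cache, v_cache, top_k):
    # Score the query against the FULL cache -- nothing is evicted,
    # so memory still grows linearly with the generated length.
    scores = k_cache @ q / np.sqrt(q.shape[-1])   # (seq_len,)
    # Keep only the top_k highest-scoring positions...
    idx = np.argsort(scores)[-top_k:]
    # ...and compute softmax attention over just those entries.
    sel = scores[idx]
    w = np.exp(sel - sel.max())
    w /= w.sum()
    return w @ v_cache[idx]

d, seq_len = 8, 32
rng = np.random.default_rng(0)
q = rng.standard_normal(d)
k_cache = rng.standard_normal((seq_len, d))   # cached keys, kept in memory
v_cache = rng.standard_normal((seq_len, d))   # cached values, kept in memory
out = topk_attention(q, k_cache, v_cache, top_k=4)
print(out.shape)  # (8,)
```

If this matches the actual implementation, then top-k only reduces attention compute per step, not peak KV memory. Please correct me if the selection works differently.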

Thank you for your time.
