Archives
- 15 Jun Parameter Efficient Fine Tuning Notes
- 13 Jun Basic Principles of Distributed Training
- 13 May Notes on PyTorch's Distributed Data Parallel (DDP)
- 22 Jan Different Transformers Attention Variants
- 28 Sep Context Parallelism in Transformers: A Brief Overview
- 28 Sep Distributed Tensor (DTensor) in PyTorch: Overview
- 28 Sep Zero Redundancy Optimizer (ZeRO): Paper Summary
- 30 Aug Distributed training technologies for Transformers: Overview