transformers (10 notes)
- Parameter-Efficient Fine-Tuning (PEFT) Notes
- Notes on PyTorch's Distributed Data Parallel (DDP)
- Attention Variants in Transformers
- Context Parallelism in Transformers: A Brief Overview
- Distributed Tensor (DTensor) in PyTorch: Overview
- Zero Redundancy Optimizer (ZeRO): Paper Summary
- Distributed Training Technologies for Transformers: Overview
- Named Entity Recognition (NER) as Machine Reading Comprehension (MRC)
- Training BERT for the Question Answering Task
- Abstractive Text Summarization with GPT-2