deep-learning 9
- FP8 Quantization and Beyond: From First Principles to Intel Neural Compressor
- PyTorch Memory Deep Dive: view, reshape, transpose, permute and the Contiguity Puzzle
- Notes on PyTorch's Distributed Data Parallel (DDP)
- Different Transformers Attention Variants
- Distributed training technologies for Transformers: Overview
- Named Entity Recognition (NER) as Machine Reading Comprehension (MRC)
- Train BERT for Question Answering Task
- Abstractive Text Summarization with GPT2
- How to Improve YOLOv3