Merged
20 commits
45820d9  fixed transformers, callbacks and quantization link (Jan 4, 2023)
d53cf6d  Merge branch 'Lightning-AI:master' into fix-sphinx-docs-links (shenoynikhil, Jan 4, 2023)
93a94f4  Updated the link for the on-line training paper (Jan 5, 2023)
7ad0b45  Updated efficient data batching link (Jan 5, 2023)
38c0dc8  Updated graph analyser opening reports link (Jan 5, 2023)
c3a812f  fixed deepspeed offload documentation (Jan 5, 2023)
e7fa581  Merge branch 'master' into fix-sphinx-docs-links (shenoynikhil, Jan 5, 2023)
ec72c8e  Updated ModelPruning Callback remove pytorch link (Jan 5, 2023)
5bea85c  Merge branch 'master' into fix-sphinx-docs-links (shenoynikhil, Jan 5, 2023)
1d24572  Updated relative links with absolute links (Jan 5, 2023)
964bd75  Merge branch 'fix-sphinx-docs-links' of github.com:shenoynikhil/light… (Jan 5, 2023)
9601baa  Merge branch 'master' into fix-sphinx-docs-links (shenoynikhil, Jan 5, 2023)
8991a77  Fixed model pruning link (Jan 5, 2023)
875b55b  Merge branch 'fix-sphinx-docs-links' of github.com:shenoynikhil/light… (Jan 5, 2023)
bf9e8cb  Reset relative links and updated conf.py with ignore (Jan 6, 2023)
a37068f  Merge branch 'master' into fix-sphinx-docs-links (shenoynikhil, Jan 6, 2023)
b1c4771  Set other relative links to old format, and added them to linkcheck_i… (Jan 6, 2023)
ac17ad8  Updated with why we are ignoring certain links (Jan 9, 2023)
ef90978  Merge branch 'master' into fix-sphinx-docs-links (shenoynikhil, Jan 9, 2023)
75f6ecc  Merge branch 'master' into fix-sphinx-docs-links (shenoynikhil, Jan 9, 2023)
2 changes: 1 addition & 1 deletion docs/source-pytorch/accelerators/ipu_intermediate.rst
@@ -60,4 +60,4 @@ Lightning supports dumping all reports to a directory to open using the tool.
trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(autoreport_dir="report_dir/"))
trainer.fit(model)

-This will dump all reports to ``report_dir/`` which can then be opened using the Graph Analyser Tool, see `Opening Reports <https://docs.graphcore.ai/projects/graph-analyser-userguide/en/latest/graph-analyser.html#opening-reports>`__.
+This will dump all reports to ``report_dir/`` which can then be opened using the Graph Analyser Tool, see `Opening Reports <https://docs.graphcore.ai/projects/graph-analyser-userguide/en/latest/opening-reports.html>`__.
2 changes: 1 addition & 1 deletion docs/source-pytorch/advanced/model_parallel.rst
@@ -614,7 +614,7 @@ DeepSpeed ZeRO Stage 3
======================

DeepSpeed ZeRO Stage 3 shards the optimizer states, gradients and the model parameters (also optionally activations). Sharding model parameters and activations comes with an increase in distributed communication, but it allows you to scale your models massively from one GPU to multiple GPUs.
-**The DeepSpeed team report the ability to fine-tune models with over 40B parameters on a single GPU and over 2 Trillion parameters on 512 GPUs.** For more information we suggest checking the `DeepSpeed ZeRO-3 Offload documentation <https://www.deepspeed.ai/news/2021/03/07/zero3-offload.html>`__.
+**The DeepSpeed team report the ability to fine-tune models with over 40B parameters on a single GPU and over 2 Trillion parameters on 512 GPUs.** For more information we suggest checking the `DeepSpeed ZeRO-3 Offload documentation <https://www.deepspeed.ai/2021/03/07/zero3-offload.html>`__.

We've run benchmarks for all these features and given a simple example of how all these features work in Lightning, which you can see at `minGPT <https://github.com/SeanNaren/minGPT/tree/stage3>`_.
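A minimal sketch of what the paragraph above describes, assuming a GPU machine with the ``deepspeed`` package installed and a ``MyModel`` LightningModule defined elsewhere (illustrative only, not part of this diff):

import pytorch_lightning as pl

# Enable DeepSpeed ZeRO Stage 3 with CPU offload via the built-in strategy shorthand.
# Optimizer states, gradients and model parameters are sharded across the 8 processes.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=8,
    strategy="deepspeed_stage_3_offload",
    precision=16,
)
trainer.fit(MyModel())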

2 changes: 1 addition & 1 deletion docs/source-pytorch/common/lightning_module.rst
@@ -1210,7 +1210,7 @@ and the Trainer will apply Truncated Backprop to it.

(`Williams et al. "An efficient gradient-based algorithm for on-line training of
recurrent network trajectories."
-<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.7941&rep=rep1&type=pdf>`_)
+<https://ieeexplore.ieee.org/document/6797135>`_)

`Tutorial <https://d2l.ai/chapter_recurrent-neural-networks/bptt.html>`_
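For context on what "the Trainer will apply Truncated Backprop to it" means in practice, here is a minimal sketch assuming the 1.x ``truncated_bptt_steps`` LightningModule attribute (illustrative only, not part of this diff):

import torch
import pytorch_lightning as pl

class TBPTTModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.rnn = torch.nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
        self.head = torch.nn.Linear(20, 1)
        # Split each batch along the time dimension into chunks of 2 steps.
        self.truncated_bptt_steps = 2

    def training_step(self, batch, batch_idx, hiddens):
        x, y = batch
        out, hiddens = self.rnn(x, hiddens)
        loss = torch.nn.functional.mse_loss(self.head(out), y)
        # Returning ``hiddens`` lets the Trainer carry state into the next chunk.
        return {"loss": loss, "hiddens": hiddens}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)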

8 changes: 8 additions & 0 deletions docs/source-pytorch/conf.py
@@ -412,3 +412,11 @@ def package_list_from_file(file):

# ignore all links in any CHANGELOG file
linkcheck_exclude_documents = [r"^(.*\/)*CHANGELOG.*$"]

+# ignore the following relative links (false positive errors during linkcheck)
+linkcheck_ignore = [
+    r"^starter/installation.html$",
+    r"^installation.html$",
+    r"^../cli/lightning_cli.html$",
+    r"^../common/trainer.html#trainer-flags$",
+]
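A small sketch of why these patterns silence the false positives: Sphinx's linkcheck builder matches each ``linkcheck_ignore`` pattern against the link URI with ``re.match``-style semantics, so the relative documentation links above are skipped while external URLs are still checked (illustrative, not part of this diff):

import re

linkcheck_ignore = [
    r"^starter/installation.html$",
    r"^installation.html$",
    r"^../cli/lightning_cli.html$",
    r"^../common/trainer.html#trainer-flags$",
]

def is_ignored(uri: str) -> bool:
    # linkcheck skips any URI matched by one of the ignore patterns
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)

print(is_ignored("../cli/lightning_cli.html"))         # True: skipped
print(is_ignored("https://pytorch.org/docs/stable/"))  # False: still checked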
2 changes: 1 addition & 1 deletion docs/source-pytorch/ecosystem/transformers.rst
@@ -16,7 +16,7 @@ In Lightning Transformers, we offer the following benefits:
- Backed by `HuggingFace Transformers <https://huggingface.co/transformers/>`_ models and datasets, spanning multiple modalities and tasks within NLP/Audio and Vision.
- Task Abstraction for Rapid Research & Experimentation - Build your own custom transformer tasks across all modalities with little friction.
- Powerful config composition backed by `Hydra <https://hydra.cc/>`_ - simply swap out models, optimizers, schedulers, tasks, and many more configurations without touching the code.
-- Seamless Memory and Speed Optimizations - Out-of-the-box training optimizations such as `DeepSpeed ZeRO <https://pytorch-lightning.readthedocs.io/en/latest/multi_gpu.html#deepspeed>`_ or `FairScale Sharded Training <https://pytorch-lightning.readthedocs.io/en/latest/multi_gpu.html#sharded-training>`_ with no code changes.
+- Seamless Memory and Speed Optimizations - Out-of-the-box training optimizations such as `DeepSpeed ZeRO <https://pytorch-lightning.readthedocs.io/en/stable/advanced/model_parallel.html#deepspeed>`_ or `FairScale Sharded Training <https://pytorch-lightning.readthedocs.io/en/stable/advanced/model_parallel.html#fairscale-sharded-training>`_ with no code changes.

-----------------

8 changes: 4 additions & 4 deletions docs/source-pytorch/extensions/callbacks.rst
@@ -61,10 +61,10 @@ Examples
********
You can do pretty much anything with callbacks.

-- `Add a MLP to fine-tune self-supervised networks <https://lightning-bolts.readthedocs.io/en/stable/deprecated/callbacks/self_supervised.html#sslonlineevaluator>`_.
-- `Find how to modify an image input to trick the classification result <https://lightning-bolts.readthedocs.io/en/stable/deprecated/callbacks/vision.html#confused-logit>`_.
-- `Interpolate the latent space of any variational model <https://lightning-bolts.readthedocs.io/en/stable/deprecated/callbacks/variational.html#latent-dim-interpolator>`_.
-- `Log images to Tensorboard for any model <https://lightning-bolts.readthedocs.io/en/stable/deprecated/callbacks/vision.html#tensorboard-image-generator>`_.
+- `Add a MLP to fine-tune self-supervised networks <https://lightning-bolts.readthedocs.io/en/latest/callbacks/self_supervised.html#sslonlineevaluator>`_.
+- `Find how to modify an image input to trick the classification result <https://lightning-bolts.readthedocs.io/en/latest/callbacks/vision.html#confused-logit>`_.
+- `Interpolate the latent space of any variational model <https://lightning-bolts.readthedocs.io/en/latest/callbacks/variational.html#latent-dim-interpolator>`_.
+- `Log images to Tensorboard for any model <https://lightning-bolts.readthedocs.io/en/latest/callbacks/vision.html#tensorboard-image-generator>`_.
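As a minimal illustration of the callback mechanism behind the examples linked above (a generic sketch, not one of the linked Bolts callbacks and not part of this diff):

import pytorch_lightning as pl
from pytorch_lightning.callbacks import Callback

class PrintingCallback(Callback):
    def on_train_start(self, trainer, pl_module):
        # Runs once when training begins; the Trainer and the LightningModule are passed in.
        print("Training is starting")

    def on_train_end(self, trainer, pl_module):
        print("Training has ended")

# Callbacks are passed to the Trainer like any other configuration.
trainer = pl.Trainer(callbacks=[PrintingCallback()])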


--------------
4 changes: 3 additions & 1 deletion docs/source-pytorch/fabric/fabric.rst
@@ -115,7 +115,9 @@ Here is how you run DDP with 8 GPUs and `torch.bfloat16 <https://pytorch.org/doc

lightning run model ./path/to/train.py --strategy=ddp --devices=8 --accelerator=cuda --precision="bf16"

-Or `DeepSpeed Zero3 <https://www.deepspeed.ai/news/2021/03/07/zero3-offload.html>`_ with mixed precision:

+Or `DeepSpeed Zero3 <https://www.deepspeed.ai/2021/03/07/zero3-offload.html>`_ with mixed precision:


.. code-block:: bash

2 changes: 1 addition & 1 deletion src/pytorch_lightning/callbacks/pruning.py
@@ -261,7 +261,7 @@ def _wrap_pruning_fn(pruning_fn: Callable, **kwargs: Any) -> Callable:
    def make_pruning_permanent(self, module: nn.Module) -> None:
        """Removes pruning buffers from any pruned modules.

-        Adapted from https://github.com/pytorch/pytorch/blob/1.7.1/torch/nn/utils/prune.py#L1176-L1180
+        Adapted from https://github.com/pytorch/pytorch/blob/v1.7.1/torch/nn/utils/prune.py#L1118-L1122
        """
        for _, module in module.named_modules():
            for k in list(module._forward_pre_hooks):
3 changes: 2 additions & 1 deletion src/pytorch_lightning/callbacks/quantization.py
@@ -151,7 +151,8 @@ def custom_trigger_last(trainer):
not be controlled by the callback.

.. _PyTorch Quantization: https://pytorch.org/docs/stable/quantization.html#quantization-aware-training
-.. _torch.quantization.QConfig: https://pytorch.org/docs/stable/torch.quantization.html#torch.quantization.QConfig
+.. _torch.quantization.QConfig:
+    https://pytorch.org/docs/stable/generated/torch.quantization.qconfig.QConfig.html#qconfig
"""

OBSERVER_TYPES = ("histogram", "average")
2 changes: 1 addition & 1 deletion src/pytorch_lightning/strategies/ipu.py
@@ -64,7 +64,7 @@ def __init__(

device_iterations: Number of iterations to run on device at once before returning to host.
This can be used as an optimization to speed up training.
-https://docs.graphcore.ai/projects/poptorch-user-guide/en/0.1.67/batching.html
+https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/batching.html
autoreport: Enable auto-reporting for IPUs using PopVision
https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html
autoreport_dir: Optional directory to store autoReport output.
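A minimal sketch of how ``device_iterations`` is typically set, assuming an IPU machine with PopTorch available and a ``MyModel`` LightningModule defined elsewhere (illustrative only, not part of this diff):

import pytorch_lightning as pl
from pytorch_lightning.strategies import IPUStrategy

# Run 32 iterations on the IPU per host call to cut down host/device round trips,
# and dump PopVision autoReport output for the Graph Analyser.
trainer = pl.Trainer(
    accelerator="ipu",
    devices=4,
    strategy=IPUStrategy(device_iterations=32, autoreport=True, autoreport_dir="report_dir/"),
)
trainer.fit(MyModel())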