Search Results

  1. 4 days ago · Machine learning has enabled major advances in the field of partial differential equations. This Review discusses some of these efforts and other ongoing challenges and opportunities for development.

  2. 4 days ago · Pipeline parallelism improves both the memory and compute efficiency of deep learning training by partitioning the layers of a model into stages that can be processed in parallel. DeepSpeed’s training engine provides hybrid data and pipeline parallelism and can be further combined with model parallelism such as Megatron-LM.

  3. 5 days ago · In this tutorial, we will apply the ZeRO optimizer to the Megatron-LM GPT-2 model. ZeRO is a powerful set of memory optimization techniques that enable effective training of very large models, from GPT-2 and Turing-NLG 17B up to models with trillions of parameters. Compared to the alternative model parallelism approaches for training large models ...

  4. 5 days ago · We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each ...

    • Andrew M. Saxe, James L. McClelland, Surya Ganguli
    • 2014
  5. 4 days ago · In this tutorial we describe how to enable DeepSpeed-Ulysses. DeepSpeed-Ulysses is a simple but highly communication- and memory-efficient sequence parallelism approach for training large transformer models with massive sequence lengths.

  6. 4 days ago · The work by [] introduced the approach of solving IVPs using deep learning models by optimizing a model’s dynamics rather than solely its outputs. This approach allows deep learning models to approximate the dynamics of a system, provided an accurate description of these dynamics is known, typically in the form of differential equations.

  7. 1 day ago · A deep learning framework, BubbleID, is proposed for bubble dynamic analysis. Unlike existing static image-based methods, our approach can mitigate the impact of bubble coalescence or overlapping on the accuracy of bubble dynamics analysis and can extract static and dynamic features simultaneously.
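
Result 4 states that deep linear networks have a linear input-output map but nonlinear gradient descent dynamics on the weights. A minimal sketch of that point follows, using a toy two-layer scalar network y = w2 * w1 * x trained to match a single target gain. The function name, initial weights, learning rate, and target are all illustrative assumptions and are not taken from the cited paper, which treats the general matrix case.

```python
def train_scalar_deep_linear(target=2.0, w1=0.01, w2=0.01, lr=0.05, steps=400):
    """Gradient descent on L = 0.5 * (target - w2 * w1)**2.

    Toy setup (an assumption for illustration): a two-layer scalar
    linear network whose overall gain is the product w2 * w1.
    Returns the gain after each update step.
    """
    products = []
    for _ in range(steps):
        err = target - w2 * w1
        # dL/dw1 = -err * w2 and dL/dw2 = -err * w1, so each weight's
        # update is scaled by the other weight. That coupling is what
        # makes the dynamics nonlinear even though the map is linear.
        w1, w2 = w1 + lr * err * w2, w2 + lr * err * w1
        products.append(w1 * w2)
    return products


if __name__ == "__main__":
    products = train_scalar_deep_linear()
    # Print the gain at a few steps to show the slow start followed
    # by a rapid transition toward the target.
    for step in (1, 25, 50, 75, 400):
        print(f"step {step:3d}: w2*w1 = {products[step - 1]:.4f}")
```

With small initial weights the gain barely moves at first, then rises quickly and settles at the target. A single-layer linear model trained on the same loss would instead approach the target with a plain exponential decay of the error.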