Course Outline

Performance Concepts and Metrics

  • Latency, throughput, power consumption, and resource utilization
  • Differentiating system-level vs. model-level bottlenecks
  • Profiling techniques for inference versus training

Profiling on Huawei Ascend

  • Utilizing CANN Profiler and MindInsight
  • Diagnostics for kernels and operators
  • Understanding offload patterns and memory mapping

Profiling on Biren GPU

  • Performance monitoring via the Biren SDK
  • Kernel fusion, memory alignment, and execution queues
  • Power- and temperature-aware profiling

Profiling on Cambricon MLU

  • Using BANGPy and Neuware performance tools
  • Kernel-level visibility and log interpretation
  • Integrating the MLU profiler with deployment frameworks

Graph and Model-Level Optimization

  • Strategies for graph pruning and quantization
  • Operator fusion and restructuring the computational graph
  • Standardizing input sizes and tuning batch parameters

Memory and Kernel Optimization

  • Optimizing memory layout and reuse
  • Efficient buffer management across different chipsets
  • Platform-specific kernel tuning techniques

Cross-Platform Best Practices

  • Performance portability: abstraction strategies
  • Creating shared tuning pipelines for multi-chip environments
  • Example: Tuning an object detection model across Ascend, Biren, and MLU

Summary and Next Steps

Requirements

  • Experience with AI model training or deployment pipelines
  • Knowledge of GPU/MLU compute principles and model optimization
  • Basic familiarity with performance profiling tools and metrics

Audience

  • Performance engineers
  • Machine learning infrastructure teams
  • AI system architects

21 Hours