Course Outline
Overview of CANN Optimization Capabilities
- Understanding how inference performance is managed within CANN
- Defining optimization goals for edge and embedded AI systems
- Understanding AI Core utilization and memory allocation
Using Graph Engine for Analysis
- Introduction to the Graph Engine and execution pipeline
- Visualizing operator graphs and runtime metrics
- Modifying computational graphs for optimized performance
Profiling Tools and Performance Metrics
- Using the CANN profiling tool for workload analysis
- Analyzing kernel execution time and identifying bottlenecks
- Conducting memory access profiling and applying tiling strategies
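As a rough illustration of the tiling idea named in the last item, the sketch below splits a matrix multiplication into small blocks so that each block's working set stays small, which is the same motivation as fitting data into a limited on-chip buffer. It is plain Python, not CANN or TIK code; the function names and the tile size are hypothetical and chosen only for this example.

```python
def matmul_naive(a, b):
    """Reference matrix multiply over lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
        for i in range(n)
    ]


def matmul_tiled(a, b, tile=2):
    """Blocked matrix multiply: the result is accumulated tile by tile.

    `tile` is a hypothetical block edge length; on real hardware it would be
    chosen so that the blocks of a, b and c fit in on-chip memory together.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):              # block row of the output
        for j0 in range(0, m, tile):          # block column of the output
            for p0 in range(0, k, tile):      # block along the reduction axis
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    # Both versions should agree regardless of the tile size.
    print(matmul_tiled(a, b, tile=2) == matmul_naive(a, b))
```

The tiled version performs the same arithmetic as the naive one; only the loop order changes, which is why profiling memory access before and after such a change is the natural way to judge whether a given tile size helps.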
Custom Operator Development with TIK
- Overview of TIK and the operator programming model
- Implementing custom operators using the TIK DSL
- Testing and benchmarking operator performance
Advanced Operator Optimization with TVM
- Introduction to TVM integration with CANN
- Auto-tuning strategies for computational graphs
- Guidelines for choosing between TVM and TIK
Memory Optimization Techniques
- Managing memory layout and buffer placement
- Techniques to minimize on-chip memory consumption
- Best practices for asynchronous execution and resource reuse
Real-World Deployment and Case Studies
- Case study: Performance tuning for a smart city camera pipeline
- Case study: Optimizing the inference stack for autonomous vehicles
- Guidelines for iterative profiling and continuous improvement
Summary and Next Steps
Requirements
- Solid understanding of deep learning model architectures and training workflows
- Practical experience deploying models via CANN, TensorFlow, or PyTorch
- Familiarity with Linux CLI, shell scripting, and Python programming
Target Audience
- AI performance engineers
- Specialists in inference optimization
- Developers working with edge AI or real-time systems
Duration: 14 hours