Utilization and Performance Optimization Intern - Non-CMU Graduate Student - Pit
Pittsburgh, PA 
Posted 21 days ago
Job/Internship Description

Pittsburgh Supercomputing Center (PSC) is a joint computational research center of Carnegie Mellon University and the University of Pittsburgh. Established in 1986, PSC is supported by several federal agencies, the Commonwealth of Pennsylvania and private industry.

PSC provides university, government and industrial researchers with access to several of the most powerful systems for high-performance computing, communications and data storage available to scientists and engineers nationwide for unclassified research. PSC advances the state of the art in high-performance computing, communications and data analytics and offers a flexible environment for solving the largest and most challenging problems in computational science.

We are seeking a motivated and technically skilled intern to join our team for a summer project focused on optimizing GPU utilization for the Bridges-2 supercomputer. The intern will engage in a comprehensive project aiming to monitor and enhance the efficiency of GPU jobs running on Bridges-2. This project is critical for ensuring that researchers are fully utilizing the resources they request, thereby enhancing computational efficiency and research output. The intern will gain hands-on experience in high-performance computing (HPC), data analysis, and software development while contributing to the advancement of computational research capabilities.

The intern will be responsible for developing and implementing a system to monitor GPU utilization across all jobs running on the Bridges-2 supercomputer. This includes the generation of alerts for researchers when their jobs underutilize the requested resources. The project comprises several key steps, as outlined below:

  • Framework Identification: Identify popular frameworks to establish a baseline for GPU utilization. This step is essential for understanding the current landscape and setting realistic benchmarks.
  • Base Code Development: Create base code examples for efficiently running jobs on Bridges-2, serving as templates for researchers.
  • Configuration Definition: Define optimal node and GPU count configurations for various types of jobs on the clusters.
  • Performance Benchmarking: Obtain base performance numbers using NGC containers and Bridges-2 modules, comparing these against Bridges-2-validation and reference Nvidia performance numbers.
  • Automated Testing: Create unittest sbatch jobs or similar for automating job submissions and performance evaluations.
  • GPU Utilization Monitoring: Develop a method to measure GPU utilization automatically from running jobs and deploy this system to the real cluster environment.
  • Data Ingest Prototype: Implement a data ingest Slurm-buffer prototype configuration (burst buffer) for enhanced data handling efficiency.
  • Performance Comparison: Compare job performance numbers running with original data in Ocean, Jet, and any other specified locations to optimize data transfer and processing speeds.
  • Network Optimization: Configure multiple IB interfaces on machines equipped with them and develop methods to easily measure InfiniBand throughput.
  • Performance Re-Evaluation: Re-run jobs under optimized conditions, including RDMA support if applicable, and compare new performance metrics to initial benchmarks.
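
As an illustration of the GPU Utilization Monitoring step, the following is a minimal Python sketch, not the project's actual design. It assumes utilization is sampled with the standard nvidia-smi query interface; the function names, the 30% threshold, and the idea of averaging a few samples per GPU are hypothetical choices made for the example.

```python
import subprocess

# nvidia-smi query that prints one "index, utilization" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Parse 'index, utilization' lines into {gpu_index: percent}."""
    result = {}
    for line in csv_text.strip().splitlines():
        try:
            index, util = (field.strip() for field in line.split(","))
            result[int(index)] = float(util)
        except ValueError:
            # Skip blank or malformed lines, e.g. a GPU reporting "[N/A]".
            continue
    return result


def underutilized(samples, threshold=30.0):
    """Return sorted GPU indices whose mean utilization is below threshold.

    The threshold is a hypothetical default, not a PSC policy.
    """
    per_gpu = {}
    for sample in samples:
        for index, util in sample.items():
            per_gpu.setdefault(index, []).append(util)
    return sorted(
        index
        for index, values in per_gpu.items()
        if sum(values) / len(values) < threshold
    )


def sample_once():
    """Take one utilization sample on the current node (requires nvidia-smi)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_utilization(out)


if __name__ == "__main__":
    # Two canned samples stand in for real nvidia-smi output.
    samples = [parse_utilization("0, 95\n1, 5\n"), parse_utilization("0, 85\n1, 15\n")]
    print(underutilized(samples))
```

In a real deployment the samples would come from repeated calls to sample_once() on each node allocated to a job, and the flagged GPUs would feed whatever alerting mechanism the project selects.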

Our internships offer the opportunity to gain:
  • Practical experience in monitoring and optimizing GPU utilization on one of the most popular supercomputers in the US.
  • Knowledge of high-performance computing (HPC) practices and challenges.
  • Skills in data analysis, software development, and system optimization.
  • An opportunity to contribute to the efficiency and effectiveness of computational research.

Flexibility, excellence, and passion are vital qualities within PSC. Inclusion, collaboration and cultural sensitivity are valued competencies at CMU. Therefore, we are in search of a team member who is able to effectively interact with a varied population of internal and external partners at a high level of integrity. We are looking for someone who shares our values and who will support the mission of the university through their work.

You should demonstrate:
  • Strong programming skills, preferably in Python, or similar languages relevant to system monitoring and performance analysis.
  • Basic understanding of high-performance computing (HPC) environments and GPU computing.
  • Ability to work independently and collaboratively in a research-focused environment.
  • Keen interest in computational research and performance optimization.

Qualifications:
  • Candidates must be pursuing a Master's degree. Examples of relevant majors are computer science, computer engineering, or any major with a significant computational/programming component.
  • Excellent communication skills and ability to work in a team environment.
  • Excellent problem-solving skills and creativity.

Are you interested in this exciting opportunity? Apply today!

Location

Pittsburgh, PA

Job Function

Non-CMU Students

Position Type

Intern (Fixed Term)

Full Time/Part time

Part time

Pay Basis

Hourly

More Information:

  • Please visit "" to learn more about becoming part of an institution inspiring innovations that change the world.

  • Carnegie Mellon University is an Equal Opportunity Employer/Disability/Veteran.


Carnegie Mellon University considers applicants for employment without regard to, and does not discriminate on the basis of, gender, race, protected veteran status, disability, or any other legally protected status.

 

Position Summary
Start Date
As soon as possible
Employment Type
Full or Part Time
Period of Employment
Open
Type of Compensation
Paid
College Credits Earned
No
Tuition Assistance
No
Required Student Status
Open
Preferred Majors
Other