
Reminder and update regarding One-U Responsible AI Initiative resources on CHPC systems
Reminder: Resources from the One-U Responsible AI Initiative are available to researchers with AI-related projects
Computational resources and storage from the One-U Responsible AI Initiative (One-U RAI) are available to groups with AI-related research projects. To request priority access to the GPU nodes and access to the storage, send an email to [email protected] with a brief description of your AI-related project(s) and a request to access One-U RAI resources. In addition, all CHPC users can use the GPU nodes in a preemptable (guest) fashion.
The One-U RAI resources include ten GPU nodes (eight in the General Environment and two in the Protected Environment) and 1.8 PB of storage (50 TB per user) in the General Environment. Each GPU node has eight NVIDIA H200 GPUs, though some of these GPUs are partitioned into smaller instances with the Multi-Instance GPU (MIG) feature to increase throughput.
If you have any questions or concerns about available resources, please contact us by emailing [email protected].
One-U Responsible AI Initiative GPU reconfiguration on Redwood
The CHPC runs hardware purchased through the university's One-U Responsible AI (RAI) Initiative, which includes GPU compute nodes and storage. Two of these GPU compute nodes are in the Protected Environment (PE) in the Redwood cluster. One compute node has eight full H200 GPUs, while the other has its GPUs divided into smaller slices via the Multi-Instance GPU (MIG) feature. The smaller slices are not used as widely as the full H200 GPUs. Therefore, we are reconfiguring the MIG node to contain six full H200s and two H200s partitioned into MIG instances (eight h200_1g.18gb instances and three h200_2g.35gb instances). We hope this change will improve utilization of the node while still supporting high-throughput jobs with more modest GPU requirements.
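As a rough sanity check on the planned MIG layout, the short Python sketch below tallies how many GPU compute slices the listed instances would occupy on the two partitioned H200s. It assumes that each H200 exposes seven MIG compute slices and that the number before "g" in a profile name is the number of compute slices the instance uses; the function and variable names are illustrative only and are not part of any CHPC tooling.

```python
# Assumption: each H200 exposes 7 MIG compute slices, and the "Ng" part of a
# profile name gives the number of compute slices that one instance uses.
SLICES_PER_GPU = 7


def compute_slices(profile: str) -> int:
    """Return the compute-slice count encoded in a name like 'h200_2g.35gb'."""
    size = profile.split("_")[1].split(".")[0]  # e.g. '2g'
    return int(size.rstrip("g"))


# Planned instances on the two partitioned H200s, as listed in the announcement.
planned = {"h200_1g.18gb": 8, "h200_2g.35gb": 3}
mig_gpus = 2

used = sum(compute_slices(profile) * count for profile, count in planned.items())
available = mig_gpus * SLICES_PER_GPU

print(f"{used} of {available} compute slices used")
```

Under these assumptions, the eleven MIG instances account for all of the compute slices on the two partitioned GPUs, and the node would offer six full H200s alongside them.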