NVIDIA NIM
GPU-accelerated inference with TensorRT-LLM optimization.
Overview
NVIDIA NIM (NVIDIA Inference Microservices) provides GPU-accelerated AI model serving with TensorRT-LLM optimization for enterprise-grade performance.
Official Website: https://build.nvidia.com
Key Features
- GPU Acceleration — TensorRT-LLM optimized inference
- Enterprise-Grade — High availability and security
- Self-Hosted Options — Deploy on your own infrastructure
- Comprehensive Catalog — LLMs, vision, multimodal models
- Free Tier — Available for prototyping
Usage Example
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.yuhuanstudio.com/v1"
)

response = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```
Available Models
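Because the endpoint is OpenAI-compatible, token-by-token streaming should also work through the standard `stream=True` parameter. The helper below is a sketch under that assumption; the client construction and model name mirror the example above and are illustrative, not tested against this gateway.

```python
def stream_chat(client, model, prompt):
    """Yield content deltas from a streaming chat completion.

    `client` is any OpenAI-compatible client object; assumes the
    gateway supports the standard stream=True parameter.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g. the final one) carry no content
            yield delta

# Usage (hypothetical model id):
# client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.yuhuanstudio.com/v1")
# print("".join(stream_chat(client, "model-id", "Hello!")))
```

Streaming is useful for long generations, since the first tokens arrive before the full response is complete.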
Use the Models API to query available models:
```shell
curl "https://api.yuhuanstudio.com/v1/models?provider=nvidia" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Models and pricing are synced automatically from NVIDIA. Check the dashboard for current availability and rates.