By the end of this lesson, you will be able to measure the performance of Manus AI systems, optimize API usage through batching, caching, and connection pooling, and scale deployments to meet demand while controlling cost.

Understanding and measuring the performance of Manus AI systems starts with a small set of key metrics:
| Metric | Description | Target Range | Measurement Method |
|---|---|---|---|
| Response Time (P95) | 95th percentile of time to complete requests | < 2000 ms | Application logs, APM tools |
| Throughput | Requests processed per second | > 100 RPS | Load testing, monitoring tools |
| CPU Utilization | Percentage of CPU capacity used | 60-80% | System monitoring |
| Memory Usage | Percentage of available memory used | 70-85% | System monitoring |
| Error Rate | Percentage of requests resulting in errors | < 0.1% | Application logs, error tracking |
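To make these targets concrete, here is a minimal sketch that computes the P95 response time, throughput, and error rate from a window of logged requests. The record format and helper names are illustrative assumptions, not part of a Manus SDK.

# Example: computing key metrics from request logs (illustrative record format)
import math

def percentile(values, pct):
    """Return the pct-th percentile of a list of numbers (nearest-rank method)."""
    ordered = sorted(values)
    rank = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

def compute_metrics(requests, window_seconds):
    """
    requests: list of dicts like {"latency_ms": 850, "status": 200}
    window_seconds: length of the measurement window
    """
    latencies = [r["latency_ms"] for r in requests]
    errors = [r for r in requests if r["status"] >= 500]  # count server-side failures
    return {
        "p95_response_time_ms": percentile(latencies, 95),
        "throughput_rps": len(requests) / window_seconds,
        "error_rate_pct": 100 * len(errors) / len(requests),
    }

# Example: metrics = compute_metrics(logged_requests, window_seconds=60)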
Approaches for establishing performance baselines and comparisons:
Figure 1: Performance Optimization Process
# Example load testing script using Locust
import os

from locust import HttpUser, task, between

# Read the API token from the environment rather than hard-coding it
TOKEN = os.environ.get("TOKEN", "")
AUTH_HEADERS = {"Authorization": f"Bearer {TOKEN}"}

class ManusAPIUser(HttpUser):
    wait_time = between(1, 3)  # Wait 1-3 seconds between tasks

    @task(3)
    def generate_content(self):
        self.client.post("/api/generate", json={
            "prompt": "Write a product description for a smartphone",
            "max_tokens": 500,
            "temperature": 0.7
        }, headers=AUTH_HEADERS)

    @task(2)
    def analyze_text(self):
        self.client.post("/api/analyze", json={
            "text": "The new product exceeded our expectations with its innovative features.",
            "analysis_type": "sentiment"
        }, headers=AUTH_HEADERS)

    @task(1)
    def summarize_document(self):
        self.client.post("/api/summarize", json={
            "url": "https://example.com/article",
            "length": "medium"
        }, headers=AUTH_HEADERS)

# Run with: locust -f locustfile.py --host=https://api.manus.ai
Continuous tracking of system performance:
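A simple way to start tracking these numbers continuously is to instrument every API call with timing and error counters and forward them to your monitoring backend. The sketch below assumes a placeholder `record_metric` function; in practice it would publish to a tool such as Prometheus or CloudWatch.

# Example: instrumenting calls so latency and errors are tracked continuously
import time
from functools import wraps

def record_metric(name, value, tags=None):
    # Placeholder: forward to your metrics backend (Prometheus, CloudWatch, ...)
    print(f"metric={name} value={value} tags={tags}")

def track_performance(operation_name):
    """Decorator that records call latency and error counts for an operation."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                record_metric("errors", 1, tags={"operation": operation_name})
                raise
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                record_metric("latency_ms", elapsed_ms, tags={"operation": operation_name})
        return wrapper
    return decorator

@track_performance("generate_content")
def generate_content(prompt):
    ...  # call the Manus AI API here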
Improving the efficiency of API interactions:
// Example of request batching in JavaScript
async function batchProcessDocuments(documents) {
  // Instead of sending one request per document,
  // send a batch of documents in a single request.
  const batchSize = 10;
  const results = [];

  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);
    try {
      const response = await fetch('https://api.manus.ai/batch/analyze', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': 'Bearer ' + API_KEY  // API_KEY is a placeholder for your key
        },
        body: JSON.stringify({
          documents: batch.map(doc => ({
            id: doc.id,
            text: doc.content,
            analysis_type: 'sentiment'
          }))
        })
      });

      if (!response.ok) {
        throw new Error(`Batch request failed with status ${response.status}`);
      }

      const batchResults = await response.json();
      results.push(...batchResults);

      // Respect rate limits with a small delay between batches
      if (i + batchSize < documents.length) {
        await new Promise(resolve => setTimeout(resolve, 100));
      }
    } catch (error) {
      console.error('Error processing batch:', error);
      // Handle the error appropriately (retry, dead-letter queue, etc.)
    }
  }

  return results;
}
Using caching to improve performance and reduce load:
Figure 2: Multi-Level Caching Architecture
# Example of response caching in Python with Redis
import redis
import json
import hashlib
from functools import wraps

# Initialize Redis client
redis_client = redis.Redis(host='localhost', port=6379, db=0)

def cache_response(expiration=3600):
    """
    Decorator to cache API responses in Redis

    Args:
        expiration: Cache expiration time in seconds (default: 1 hour)
    """
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Create a cache key based on function name and arguments
            key_parts = [func.__name__]
            key_parts.extend([str(arg) for arg in args])
            key_parts.extend([f"{k}:{v}" for k, v in sorted(kwargs.items())])

            # Create a hash of the key parts for a compact cache key
            cache_key = hashlib.md5(":".join(key_parts).encode()).hexdigest()

            # Try to get from cache
            cached_result = redis_client.get(cache_key)
            if cached_result:
                return json.loads(cached_result)

            # If not in cache, call the original function
            result = await func(*args, **kwargs)

            # Store in cache
            redis_client.setex(
                cache_key,
                expiration,
                json.dumps(result)
            )
            return result
        return wrapper
    return decorator

# Example usage
@cache_response(expiration=1800)  # Cache for 30 minutes
async def analyze_sentiment(text):
    # This would normally call the Manus AI API
    # For expensive operations, caching saves time and resources
    response = await manus_client.analyze(
        text=text,
        analysis_type='sentiment'
    )
    return response
Efficient use of computational resources:
# Example of connection pooling in Python
import aiohttp
import asyncio
from contextlib import asynccontextmanager

class ManusClientPool:
    def __init__(self, api_key, max_connections=10):
        self.api_key = api_key
        self.max_connections = max_connections
        self.semaphore = asyncio.Semaphore(max_connections)
        self.session = None

    async def initialize(self):
        """Initialize the HTTP session pool"""
        if self.session is None or self.session.closed:
            self.session = aiohttp.ClientSession(
                headers={"Authorization": f"Bearer {self.api_key}"}
            )

    async def close(self):
        """Close the HTTP session pool"""
        if self.session and not self.session.closed:
            await self.session.close()

    @asynccontextmanager
    async def acquire(self):
        """Acquire a connection from the pool"""
        await self.initialize()
        async with self.semaphore:
            yield self.session

    async def generate_content(self, prompt, **kwargs):
        """Generate content using a connection from the pool"""
        async with self.acquire() as session:
            async with session.post(
                "https://api.manus.ai/generate",
                json={"prompt": prompt, **kwargs}
            ) as response:
                return await response.json()

    async def analyze_text(self, text, analysis_type, **kwargs):
        """Analyze text using a connection from the pool"""
        async with self.acquire() as session:
            async with session.post(
                "https://api.manus.ai/analyze",
                json={"text": text, "analysis_type": analysis_type, **kwargs}
            ) as response:
                return await response.json()

# Usage example
async def main():
    client_pool = ManusClientPool(api_key="YOUR_API_KEY", max_connections=5)
    try:
        # Process multiple requests efficiently using the connection pool
        tasks = []
        for i in range(20):
            if i % 2 == 0:
                tasks.append(client_pool.generate_content(
                    f"Write a short paragraph about topic {i}",
                    max_tokens=200
                ))
            else:
                tasks.append(client_pool.analyze_text(
                    f"This is sample text {i} for analysis",
                    analysis_type="sentiment"
                ))
        results = await asyncio.gather(*tasks)
        print(f"Processed {len(results)} requests")
    finally:
        await client_pool.close()

# Run the example
asyncio.run(main())
Processing multiple operations simultaneously:
// Example of concurrent processing in Node.js
const axios = require('axios');
// Note: p-limit v3.x supports require(); newer releases are ESM-only
const pLimit = require('p-limit');

// Create a concurrency limit of 5 simultaneous requests
const limit = pLimit(5);

async function processDocuments(documents) {
  try {
    // Map each document to a limited promise
    const promises = documents.map(doc => {
      return limit(() => processDocument(doc));
    });

    // Wait for all promises to resolve
    const results = await Promise.all(promises);
    console.log(`Successfully processed ${results.length} documents`);
    return results;
  } catch (error) {
    console.error('Error in batch processing:', error);
    throw error;
  }
}

async function processDocument(document) {
  try {
    // First analyze the document
    const analysisResult = await axios.post('https://api.manus.ai/analyze', {
      text: document.content,
      analysis_type: 'comprehensive'
    }, {
      headers: { 'Authorization': `Bearer ${process.env.MANUS_API_KEY}` }
    });

    // Then generate a summary based on the analysis
    const summaryResult = await axios.post('https://api.manus.ai/generate', {
      prompt: `Summarize the following document, focusing on ${analysisResult.data.key_topics.join(', ')}:\n\n${document.content.substring(0, 500)}...`,
      max_tokens: 200
    }, {
      headers: { 'Authorization': `Bearer ${process.env.MANUS_API_KEY}` }
    });

    return {
      document_id: document.id,
      analysis: analysisResult.data,
      summary: summaryResult.data.content
    };
  } catch (error) {
    console.error(`Error processing document ${document.id}:`, error.message);
    return {
      document_id: document.id,
      error: error.message
    };
  }
}
Adding more instances to handle increased load:
Figure 3: Horizontal vs. Vertical Scaling
Increasing the resources of existing instances:
Designing systems that operate across multiple locations or environments:
# Example Kubernetes configuration for a distributed Manus AI deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: manus-api-gateway
  namespace: manus-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: manus-api-gateway
  template:
    metadata:
      labels:
        app: manus-api-gateway
    spec:
      containers:
      - name: api-gateway
        image: manus/api-gateway:v1.2.3
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: manus-content-generator
  namespace: manus-system
spec:
  replicas: 5
  selector:
    matchLabels:
      app: manus-content-generator
  template:
    metadata:
      labels:
        app: manus-content-generator
    spec:
      containers:
      - name: content-generator
        image: manus/content-generator:v1.2.3
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: manus-analyzer
  namespace: manus-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: manus-analyzer
  template:
    metadata:
      labels:
        app: manus-analyzer
    spec:
      containers:
      - name: analyzer
        image: manus/analyzer:v1.2.3
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: manus-api-gateway
  namespace: manus-system
spec:
  selector:
    app: manus-api-gateway
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: manus-content-generator
  namespace: manus-system
spec:
  selector:
    app: manus-content-generator
  ports:
  - port: 8080
    targetPort: 8080
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: manus-analyzer
  namespace: manus-system
spec:
  selector:
    app: manus-analyzer
  ports:
  - port: 8080
    targetPort: 8080
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: manus-content-generator-hpa
  namespace: manus-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: manus-content-generator
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
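The HorizontalPodAutoscaler above follows Kubernetes' proportional scaling rule: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped between minReplicas and maxReplicas. A quick sketch of that calculation (the helper function is ours, for illustration only):

# Example: the proportional scaling rule used by the Kubernetes HPA
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=3, max_replicas=10):
    """Scale replicas proportionally to utilization, clamped to configured bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# Example: 5 replicas at 90% CPU with a 70% target -> ceil(5 * 90 / 70) = 7 replicas
print(desired_replicas(5, 90, 70))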
Anticipating resource needs before they occur:
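One straightforward approach is to forecast near-term load from recent history and provision capacity ahead of it. The sketch below uses a naive moving-average forecast with a headroom factor; the per-replica capacity and thresholds are illustrative assumptions.

# Example: naive predictive scaling based on a moving average of recent load
import math

def forecast_rps(recent_rps, window=5):
    """Forecast the next interval's request rate as the mean of the last `window` samples."""
    samples = recent_rps[-window:]
    return sum(samples) / len(samples)

def replicas_for_forecast(recent_rps, rps_per_replica=50, headroom=1.3,
                          min_replicas=2, max_replicas=10):
    """Provision enough replicas for the forecast load plus 30% headroom."""
    expected = forecast_rps(recent_rps) * headroom
    needed = math.ceil(expected / rps_per_replica)
    return max(min_replicas, min(max_replicas, needed))

# Example: replicas_for_forecast([180, 210, 240, 260, 300]) plans capacity for ~310 RPS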
Balancing performance with resource costs:
# Example Terraform configuration for cost-optimized AWS deployment
module "manus_api_service" {
  source = "./modules/ecs-service"

  name       = "manus-api"
  cluster_id = aws_ecs_cluster.main.id

  # Use a mix of instance types for cost optimization
  instance_types = {
    on_demand = {
      type   = "t3.medium"
      count  = 2  # Minimum guaranteed capacity
      weight = 1
    }
    spot = {
      types  = ["t3.medium", "t3a.medium", "t2.medium"]
      count  = 8  # Maximum additional capacity
      weight = 3
    }
  }

  # Auto-scaling based on demand
  auto_scaling = {
    min_capacity           = 2
    max_capacity           = 10
    target_cpu_utilization = 70
    scale_in_cooldown      = 300
    scale_out_cooldown     = 60
  }

  # Schedule-based scaling for known peak times
  scheduled_scaling = [
    {
      name         = "business-hours"
      schedule     = "cron(0 8 ? * MON-FRI *)"  # 8 AM weekdays
      min_capacity = 4
      max_capacity = 10
    },
    {
      name         = "evening-scale-down"
      schedule     = "cron(0 18 ? * MON-FRI *)"  # 6 PM weekdays
      min_capacity = 2
      max_capacity = 6
    },
    {
      name         = "weekend-scale-down"
      schedule     = "cron(0 0 ? * SAT,SUN *)"  # Midnight on weekends
      min_capacity = 2
      max_capacity = 4
    }
  ]

  # Lifecycle policy to terminate instances based on cost
  termination_policies = [
    "OldestLaunchTemplate",
    "OldestLaunchConfiguration",
    "ClosestToNextInstanceHour"
  ]
}
Optimizing for particular scenarios:
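Different usage scenarios justify different trade-offs: an interactive chat path is tuned for low latency, while an offline batch pipeline is tuned for throughput and cost. The profile values below are illustrative assumptions, not recommended Manus settings.

# Example: per-scenario request profiles (values are illustrative, not Manus defaults)
SCENARIO_PROFILES = {
    # Interactive chat: keep responses short and latency low
    "interactive": {"max_tokens": 150, "timeout_s": 2, "use_cache": True},
    # Offline batch jobs: allow longer outputs and longer deadlines
    "batch": {"max_tokens": 500, "timeout_s": 30, "use_cache": False},
}

def build_request(prompt, scenario="interactive"):
    """Build request settings tuned for the given usage scenario."""
    profile = SCENARIO_PROFILES[scenario]
    return {
        "prompt": prompt,
        "max_tokens": profile["max_tokens"],
        "timeout": profile["timeout_s"],
        "use_cache": profile["use_cache"],
    }

# Example: build_request("Summarize this quarter's results", scenario="batch")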
Develop a performance optimization plan for a Manus AI implementation:
Which of the following is NOT a key performance metric for Manus AI systems?
Code complexity is not a key performance metric for Manus AI systems. While code complexity can affect maintainability and development efficiency, it is not directly related to runtime performance. The key performance metrics include response time (how long it takes to complete a request), throughput (number of requests processed per unit of time), resource utilization, and error rate.
Which caching strategy involves storing and reusing intermediate results during complex operations?
Partial result caching involves storing and reusing intermediate results during complex operations. This strategy is particularly useful when operations involve multiple steps or calculations, allowing the system to avoid repeating work that has already been done. For example, in a multi-stage analysis pipeline, the results of early stages can be cached and reused for different final outputs.
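As a minimal sketch of partial result caching, the example below caches an expensive first-stage analysis so that several downstream outputs can reuse it; the function names and payloads are hypothetical.

# Example: caching an intermediate analysis so several outputs can reuse it
import hashlib
import json

_partial_cache = {}  # in production this could live in Redis, as shown earlier

def _cache_key(stage, payload):
    digest = hashlib.md5(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    return f"{stage}:{digest}"

def analyze_document(text):
    """Expensive first stage; its result is cached and shared by later stages."""
    key = _cache_key("analysis", {"text": text})
    if key not in _partial_cache:
        # Placeholder for an expensive call (e.g. a Manus analysis request)
        _partial_cache[key] = {"word_count": len(text.split()), "sentiment": "positive"}
    return _partial_cache[key]

def build_summary(text):
    analysis = analyze_document(text)   # hits the cache after the first call
    return f"Summary ({analysis['sentiment']}): {text[:100]}"

def build_report(text):
    analysis = analyze_document(text)   # reuses the same intermediate result
    return {"words": analysis["word_count"], "sentiment": analysis["sentiment"]}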
Which scaling approach involves adding more CPU, memory, or storage to existing instances?
Vertical scaling involves adding more CPU, memory, or storage to existing instances. This approach increases the capacity of individual servers rather than adding more servers (which would be horizontal scaling). Vertical scaling is often simpler to implement but has upper limits based on the maximum capacity of available hardware.
You've completed the quiz on Performance Optimization for Manus AI.