# N8N Workflow Documentation - Troubleshooting Guide

## Overview
This document details the challenges encountered during the workflow documentation process and provides solutions for common issues. It serves as a guide for future documentation efforts and troubleshooting similar problems.
## Approaches That Failed

### 1. Browser Automation with Playwright

#### What We Tried

```javascript
// Attempted approach
await page.goto('https://localhost:8000');
await page.selectOption('#categoryFilter', 'Business Process Automation');
await page.waitForLoadState('networkidle');
```
#### Why It Failed
- **Dynamic Loading Bottleneck**: The web application loads all 2,055 workflows before applying client-side filtering
- **Timeout Issues**: Browser automation timed out waiting for the filtering process to complete
- **Memory Constraints**: Loading all workflows simultaneously exceeded browser memory limits
- **JavaScript Complexity**: The client-side filtering logic was too complex for reliable automation
#### Symptoms
- Page loads but workflows never finish loading
- Browser automation hangs on category selection
- "Waiting for page to load" messages that never complete
- Network timeouts after 2+ minutes
#### Error Messages

```text
TimeoutError: page.waitForLoadState: Timeout 30000ms exceeded
Waiting for load state to be NetworkIdle
```
### 2. Firecrawl with Dynamic Filtering

#### What We Tried

```javascript
// Attempted approach
firecrawl_scrape({
  url: "https://localhost:8000",
  actions: [
    {type: "wait", milliseconds: 5000},
    {type: "executeJavascript", script: "document.getElementById('categoryFilter').value = 'Business Process Automation'; document.getElementById('categoryFilter').dispatchEvent(new Event('change'));"},
    {type: "wait", milliseconds: 30000}
  ]
})
```
#### Why It Failed
- **60-Second Timeout Limit**: Firecrawl's maximum wait time was insufficient for complete data loading
- **JavaScript Execution Timing**: The filtering process required waiting for all workflows to load first
- **Response Size Limits**: Filtered results still exceeded token limits for processing
- **Inconsistent State**: Scraping occurred before filtering was complete
#### Symptoms
- Firecrawl returns incomplete data (1 workflow instead of 77)
- Timeout errors after 60 seconds
- "Request timed out" or "Internal server error" responses
- Inconsistent results between scraping attempts
#### Error Messages

```text
Failed to scrape URL. Status code: 408. Error: Request timed out
Failed to scrape URL. Status code: 500. Error: (Internal server error) - timeout
Total wait time (waitFor + wait actions) cannot exceed 60 seconds
```
### 3. Single Large Web Scraping

#### What We Tried

Direct scraping of the entire page without category filtering:

```bash
curl -s "https://localhost:8000" | html2text
```
#### Why It Failed
- **Data Overload**: Responses covering all 2,055 workflows exceeded the 25,000-token limit
- **No Organization**: Results were unstructured and difficult to categorize
- **Missing Metadata**: HTML scraping didn't provide structured workflow details
- **Pagination Issues**: Workflows are loaded progressively, not all at once
#### Symptoms
- "Response exceeds maximum allowed tokens" errors
- Truncated or incomplete data
- Missing workflow details and metadata
- Unstructured output difficult to process
## What Worked: Direct API Strategy

### Why This Approach Succeeded

#### 1. Avoided JavaScript Complexity
- **Direct Data Access**: API endpoints provided structured data without client-side processing
- **No Dynamic Loading**: Each API call returned complete data immediately
- **Reliable State**: No dependency on browser state or JavaScript execution
#### 2. Manageable Response Sizes
- **Individual Requests**: Single workflow details fit within token limits
- **Structured Data**: JSON responses were predictable and parseable
- **Metadata Separation**: Workflow details were properly structured in API responses
#### 3. Rate Limiting Control
- **Controlled Pacing**: Small delays between requests prevented server overload
- **Batch Processing**: Category-based organization enabled logical processing
- **Error Recovery**: Individual failures didn't stop the entire process
### Technical Implementation That Worked

```bash
# Step 1: Get category mappings (single fast call)
mappings=$(curl -s "${API_BASE}/category-mappings" | jq '.mappings')

# Step 2: Group by category (reads the mappings fetched in Step 1)
echo "$mappings" | jq 'to_entries | group_by(.value) | map({category: .[0].value, count: length, files: map(.key)})'

# Step 3: For each workflow, get details
for file in $workflow_files; do
  curl -s "${API_BASE}/workflows/${file}" | jq '.metadata'
  sleep 0.05  # Small delay for rate limiting
done
```
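
To make Step 2 concrete, the sketch below runs the same `jq` grouping filter on a small inline sample instead of a live API response. The filenames and category names are made up for illustration, and the sample assumes the mappings object has the shape `{filename: category}`, which is what the filter implies. The helper name `group_by_category` is also hypothetical.

```shell
# Hypothetical sample of the `.mappings` object: filename -> category.
sample_mappings='{"a.json":"Marketing","b.json":"Data Processing","c.json":"Marketing"}'

# Same filter as Step 2: turn the object into key/value entries,
# group entries that share a category, then summarize each group.
group_by_category() {
  jq -c 'to_entries
         | group_by(.value)
         | map({category: .[0].value, count: length, files: map(.key)})'
}

grouped=$(printf '%s' "$sample_mappings" | group_by_category)
echo "$grouped"
```

Each element of the result carries a category name, the number of workflows in it, and the list of files, which is enough to drive one documentation pass per category.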
## Common Issues and Solutions

### Issue 1: JSON Parsing Errors

#### Symptoms

```text
jq: parse error: Invalid numeric literal at line 1, column 11
```
#### Cause
The API returned non-JSON responses (HTML error pages or empty responses).
#### Solution

```bash
# Validate JSON before processing
response=$(curl -s "${API_BASE}/workflows/${filename}")
if echo "$response" | jq -e '.metadata' > /dev/null 2>&1; then
  echo "$response" | jq '.metadata'
else
  echo "{\"error\": \"Failed to fetch $filename\", \"filename\": \"$filename\"}"
fi
```
### Issue 2: URL Encoding Problems

#### Symptoms
- 404 errors for workflows with special characters in filenames
- API calls failing for certain workflow files
#### Cause
Workflow filenames contain special characters that need URL encoding.
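#### Solution

```bash
# Proper URL encoding. The filename is passed as an argument rather than
# interpolated into the Python source, so quotes in the name cannot break it.
encoded_filename=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$filename")
curl -s "${API_BASE}/workflows/${encoded_filename}"
```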
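
Where Python is not available, `jq` can do the same percent-encoding with its built-in `@uri` format. This is a minimal sketch: the helper name `urlencode` and the sample filename are illustrative and not part of the original scripts.

```shell
# Percent-encode a string with jq's built-in @uri format.
# -n: read no input; -r: print the raw string without JSON quotes.
urlencode() {
  jq -rn --arg s "$1" '$s | @uri'
}

# Hypothetical filename containing spaces and an ampersand.
encoded_filename=$(urlencode "My Workflow & Notes.json")
echo "$encoded_filename"
```

Unlike `urllib.parse.quote` with its defaults, `@uri` also encodes `/`, which is usually what is wanted for a single path segment.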
### Issue 3: Missing Workflow Data

#### Symptoms
- Empty fields in generated documentation
- "Unknown" values for workflow properties
#### Cause
The API response nests workflow fields under the `.metadata` key, so top-level lookups such as `.name` find nothing and fall back to "Unknown".
#### Solution

```bash
# Before: reads from the top level, where .name does not exist
workflow_name=$(echo "$workflow_json" | jq -r '.name // "Unknown"')

# After: reads from the nested .metadata object
workflow_name=$(echo "$response" | jq -r '.metadata.name // "Unknown"')
```
### Issue 4: Script Timeouts During Bulk Processing

#### Symptoms
- Scripts timing out after 10 minutes
- Incomplete documentation generation
- Process stops mid-category
#### Cause
Making 2,055 sequential API calls, each followed by a delay, takes longer than the 10-minute script timeout.
#### Solution

```bash
# Process categories individually
for category in $categories; do
  generate_single_category "$category"
done

# Or use timeout command
timeout 600 ./generate_all_categories.sh
```
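
Processing categories one at a time also makes it possible to resume after a timeout instead of starting over. The sketch below skips any category whose output file already exists and is non-empty. The output directory, the filename scheme, and the helper names `category_file` and `needs_processing` are assumptions made for illustration; only `generate_single_category` comes from the loop above.

```shell
# Hypothetical output layout: one Markdown file per category in $output_dir.
output_dir="${output_dir:-./docs}"

# Map a category name to a filename, e.g. "Data Processing" -> "data-processing.md".
category_file() {
  printf '%s.md' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '-')"
}

# Succeed only when the category still needs to be generated,
# i.e. its file is missing or empty (-s is false in both cases).
needs_processing() {
  [ ! -s "$output_dir/$(category_file "$1")" ]
}

# Usage inside the per-category loop:
#   needs_processing "$category" || continue
#   generate_single_category "$category"
```

A file that exists but was cut off mid-write would still be skipped by this check, so it pairs best with a count validation of completed files.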
### Issue 5: Inconsistent Markdown Formatting

#### Symptoms
- Trailing commas in integration lists
- Missing or malformed data fields
- Inconsistent status display
#### Cause
Variable data quality and missing fallback handling.
#### Solution

```bash
# Clean integration lists. Join inside jq: tr maps single characters only,
# so tr '\n' ', ' cannot insert ", " and leaves a trailing comma behind.
workflow_integrations=$(echo "$workflow_json" | jq -r '(.integrations // []) | join(", ")' 2>/dev/null)

# Handle boolean fields properly: jq prints true/false, and the field may also arrive as 1/0
workflow_active=$(echo "$workflow_json" | jq -r '.active // false')
case "$workflow_active" in
  true|1) status="Active" ;;
  *)      status="Inactive" ;;
esac
```
## Prevention Strategies

### 1. API Response Validation

Always validate API responses before processing:

```bash
if ! echo "$response" | jq -e . >/dev/null 2>&1; then
  echo "Invalid JSON response"
  continue  # Skip to the next workflow (inside the processing loop)
fi
```
### 2. Graceful Error Handling

Don't let individual failures stop the entire process:

```bash
workflow_data=$(fetch_workflow_details "$filename" || echo '{"error": "fetch_failed"}')
```
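
Transient failures are often worth retrying before falling back to an error record. The following is a minimal sketch of a retry helper with exponential backoff. The function name `retry` and its argument order are hypothetical, and it assumes whole-second delays so the shell can double them with integer arithmetic.

```shell
# Run a command up to <max_attempts> times, doubling the pause after each failure.
# Usage: retry <max_attempts> <initial_delay_seconds> <command> [args...]
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

It composes with the fallback shown above, for example `workflow_data=$(retry 3 1 fetch_workflow_details "$filename" || echo '{"error": "fetch_failed"}')`, so a workflow is only marked as failed after three attempts.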
### 3. Progress Tracking

Include progress indicators for long-running processes:

```bash
echo "[$processed/$total] Processing $filename"
```
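
The one-liner above needs a counter and a total to be useful. This sketch shows one way to maintain both around a processing loop. The file list is a hypothetical stand-in for the filenames returned by the API, and the word-splitting loop assumes names without whitespace.

```shell
# Hypothetical list of workflow files; in practice this would come from the API.
workflow_files="alpha.json beta.json gamma.json"

# Count the items up front so each line can show "current/total".
total=$(printf '%s\n' $workflow_files | wc -l | tr -d ' ')
processed=0

for filename in $workflow_files; do
  processed=$((processed + 1))
  # Write progress to stderr so stdout stays clean for generated Markdown.
  echo "[$processed/$total] Processing $filename" >&2
  # ... fetch and document "$filename" here ...
done
```

Sending progress to stderr means the script's stdout can still be redirected straight into a documentation file.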
### 4. Rate Limiting

Always include delays to be respectful to APIs:

```bash
sleep 0.05  # Small delay between requests
```
### 5. Data Quality Checks

Verify counts and data integrity:

```bash
expected_count=77
actual_count=$(grep -c "^###" output.md)
if [ "$actual_count" -ne "$expected_count" ]; then
  echo "Warning: Count mismatch"
fi
```
## Future Recommendations

### For Similar Projects
1. **Start with API exploration** before attempting web scraping
2. **Test with small datasets** before processing large volumes
3. **Implement resume capability** for long-running processes
4. **Use structured logging** for better debugging
5. **Build in validation** at every step
### For API Improvements
1. **Category filtering endpoints** would eliminate the need for client-side filtering
2. **Batch endpoints** could reduce the number of individual requests
3. **Response pagination** for large category results
4. **Rate limiting headers** to guide appropriate delays
### For Documentation Process
1. **Automated validation** against source API counts
2. **Incremental updates** rather than full regeneration
3. **Parallel processing** where appropriate
4. **Better error reporting** and recovery mechanisms
## Emergency Recovery Procedures

### If Process Fails Mid-Execution
1. **Identify completed categories**: Check which markdown files exist
2. **Resume from failure point**: Process only missing categories
3. **Validate existing files**: Ensure completed files have correct counts
4. **Manual intervention**: Handle problematic workflows individually
### If API Access Is Lost
1. **Verify connectivity**: Check tunnel/proxy status
2. **Test API endpoints**: Confirm they're still accessible
3. **Switch to backup**: Use alternative access methods if available
4. **Document outage**: Note any missing data for later completion

This guide records the pitfalls encountered so that future documentation efforts can avoid them and build on the strategies that worked.