# N8N Workflow Documentation: Troubleshooting Guide

## Overview

This document details the challenges encountered during the workflow documentation process and provides solutions for common issues. It serves as a guide for future documentation efforts and troubleshooting similar problems.

## Approaches That Failed

### 1. Browser Automation with Playwright

#### What We Tried

```javascript
// Attempted approach
await page.goto('https://localhost:8000');
await page.selectOption('#categoryFilter', 'Business Process Automation');
await page.waitForLoadState('networkidle');
```

#### Why It Failed

- **Dynamic Loading Bottleneck**: The web application loads all 2,055 workflows before applying client-side filtering

- **Timeout Issues**: Browser automation timed out waiting for the filtering process to complete

- **Memory Constraints**: Loading all workflows simultaneously exceeded browser memory limits

- **JavaScript Complexity**: The client-side filtering logic was too complex for reliable automation

#### Symptoms

- Page loads but workflows never finish loading

- Browser automation hangs on category selection

- "Waiting for page to load" messages that never complete

- Network timeouts after 2+ minutes

#### Error Messages
```text
TimeoutError: page.waitForLoadState: Timeout 30000ms exceeded
Waiting for load state to be NetworkIdle
```

### 2. Firecrawl with Dynamic Filtering

#### What We Tried
```javascript
// Attempted approach
firecrawl_scrape({
  url: "https://localhost:8000",
  actions: [
    {type: "wait", milliseconds: 5000},
    {type: "executeJavascript", script: "document.getElementById('categoryFilter').value = 'Business Process Automation'; document.getElementById('categoryFilter').dispatchEvent(new Event('change'));"},
    {type: "wait", milliseconds: 30000}
  ]
})
```

#### Why It Failed

- **60-Second Timeout Limit**: Firecrawl's maximum wait time was insufficient for complete data loading

- **JavaScript Execution Timing**: The filtering process required waiting for all workflows to load first

- **Response Size Limits**: Filtered results still exceeded token limits for processing

- **Inconsistent State**: Scraping occurred before filtering was complete

#### Symptoms

- Firecrawl returns incomplete data (1 workflow instead of 77)

- Timeout errors after 60 seconds

- "Request timed out" or "Internal server error" responses

- Inconsistent results between scraping attempts

#### Error Messages
```text
Failed to scrape URL. Status code: 408. Error: Request timed out
Failed to scrape URL. Status code: 500. Error: (Internal server error) - timeout
Total wait time (waitFor + wait actions) cannot exceed 60 seconds
```

### 3. Single Large Web Scraping

#### What We Tried

Direct scraping of the entire page without category filtering:

```bash
curl -s "https://localhost:8000" | html2text
```

#### Why It Failed

- **Data Overload**: 2,055 workflows generated responses exceeding 25,000 token limits

- **No Organization**: Results were unstructured and difficult to categorize

- **Missing Metadata**: HTML scraping didn't provide structured workflow details

- **Pagination Issues**: Workflows are loaded progressively, not all at once

#### Symptoms

- "Response exceeds maximum allowed tokens" errors

- Truncated or incomplete data

- Missing workflow details and metadata

- Unstructured output difficult to process

## What Worked: Direct API Strategy

### Why This Approach Succeeded

#### 1. Avoided JavaScript Complexity

- **Direct Data Access**: API endpoints provided structured data without client-side processing

- **No Dynamic Loading**: Each API call returned complete data immediately

- **Reliable State**: No dependency on browser state or JavaScript execution

#### 2. Manageable Response Sizes

- **Individual Requests**: Single workflow details fit within token limits

- **Structured Data**: JSON responses were predictable and parseable

- **Metadata Separation**: Workflow details were properly structured in API responses

#### 3. Rate Limiting Control

- **Controlled Pacing**: Small delays between requests prevented server overload

- **Batch Processing**: Category-based organization enabled logical processing

- **Error Recovery**: Individual failures didn't stop the entire process

### Technical Implementation That Worked

```bash
# API_BASE is assumed to point at the local docs server API,
# e.g. API_BASE="http://localhost:8000/api"

# Step 1: Get category mappings (single fast call)
mappings=$(curl -s "${API_BASE}/category-mappings" | jq '.mappings')

# Step 2: Group filenames by category
echo "$mappings" | jq 'to_entries | group_by(.value) | map({category: .[0].value, count: length, files: map(.key)})'

# Step 3: For each workflow, get details
for file in $workflow_files; do
    curl -s "${API_BASE}/workflows/${file}" | jq '.metadata'
    sleep 0.05  # Small delay for rate limiting
done
```
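
The `$workflow_files` variable in Step 3 comes from the grouping in Step 2. A minimal sketch of pulling the file list for one category out of the mappings (the category name here is just an example):

```bash
# List the workflow files assigned to a single category
workflow_files=$(echo "$mappings" | jq -r 'to_entries
    | map(select(.value == "Business Process Automation"))
    | .[].key')
```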

## Common Issues and Solutions

### Issue 1: JSON Parsing Errors

#### Symptoms

```text
jq: parse error: Invalid numeric literal at line 1, column 11
```

#### Cause

API returned non-JSON responses (HTML error pages, empty responses)

#### Solution
```bash
# Validate JSON before processing
response=$(curl -s "${API_BASE}/workflows/${filename}")
if echo "$response" | jq -e '.metadata' > /dev/null 2>&1; then
    echo "$response" | jq '.metadata'
else
    echo "{\"error\": \"Failed to fetch $filename\", \"filename\": \"$filename\"}"
fi
```

### Issue 2: URL Encoding Problems

#### Symptoms

- 404 errors for workflows with special characters in filenames

- API calls failing for certain workflow files

#### Cause

Workflow filenames contain special characters that need URL encoding

#### Solution

```bash
# Proper URL encoding (pass the filename as an argument to avoid quoting issues)
encoded_filename=$(python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))" "$filename")
curl -s "${API_BASE}/workflows/${encoded_filename}"
```
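
If you'd rather not shell out to Python, jq (already a dependency of these scripts) can produce the same percent-encoding with its `@uri` filter:

```bash
# URL-encode the filename using jq alone
encoded_filename=$(jq -rn --arg f "$filename" '$f | @uri')
```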

### Issue 3: Missing Workflow Data

#### Symptoms

- Empty fields in generated documentation

- "Unknown" values for workflow properties

#### Cause

The API response nests workflow metadata under the `.metadata` key

#### Solution

```bash
# Wrong: extracts from the top level
workflow_name=$(echo "$workflow_json" | jq -r '.name // "Unknown"')

# Right: extract from the nested .metadata path
workflow_name=$(echo "$response" | jq -r '.metadata.name // "Unknown"')
```

### Issue 4: Script Timeouts During Bulk Processing

#### Symptoms

- Scripts timing out after 10 minutes

- Incomplete documentation generation

- Process stops mid-category

#### Cause

Processing 2,055 API calls with delays takes significant time

#### Solution

```bash
# Process categories individually instead of all at once
for category in $categories; do
    generate_single_category "$category"
done

# Or cap total runtime with the timeout command (600 s = 10 minutes)
timeout 600 ./generate_all_categories.sh
```

### Issue 5: Inconsistent Markdown Formatting

#### Symptoms

- Trailing commas in integration lists

- Missing or malformed data fields

- Inconsistent status display

#### Cause

Variable data quality and missing fallback handling

#### Solution

```bash
# Clean integration lists: jq's join() avoids trailing commas entirely
workflow_integrations=$(echo "$workflow_json" | jq -r '.integrations // [] | join(", ")' 2>/dev/null)

# Handle boolean fields properly (jq prints booleans as "true"/"false")
workflow_active=$(echo "$workflow_json" | jq -r '.active // false')
status=$([ "$workflow_active" = "true" ] && echo "Active" || echo "Inactive")
```

## Prevention Strategies

### 1. API Response Validation
Always validate API responses before processing:
```bash
if ! echo "$response" | jq -e . >/dev/null 2>&1; then
    echo "Invalid JSON response"
    continue
fi
```

### 2. Graceful Error Handling
Don't let individual failures stop the entire process:
```bash
workflow_data=$(fetch_workflow_details "$filename" || echo '{"error": "fetch_failed"}')
```
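
The `fetch_workflow_details` helper isn't defined elsewhere in this guide; a minimal sketch, assuming the same API shape as the working implementation above:

```bash
fetch_workflow_details() {
    local filename="$1"
    local response
    # -f makes curl fail on HTTP errors, so the fallback above kicks in
    response=$(curl -sf "${API_BASE}/workflows/${filename}") || return 1
    # Succeed only if the response is valid JSON with a .metadata key
    echo "$response" | jq -e '.metadata' > /dev/null 2>&1 || return 1
    echo "$response"
}
```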

### 3. Progress Tracking
Include progress indicators for long-running processes:
```bash
echo "[$processed/$total] Processing $filename"
```
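
In context, the counters just need initializing before the loop and incrementing inside it:

```bash
# Track progress across the whole run
processed=0
total=$(echo "$workflow_files" | wc -l)
for filename in $workflow_files; do
    processed=$((processed + 1))
    echo "[$processed/$total] Processing $filename"
done
```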

### 4. Rate Limiting
Always include delays to be respectful to APIs:
```bash
sleep 0.05  # Small delay between requests
```

### 5. Data Quality Checks
Verify counts and data integrity:
```bash
expected_count=77
actual_count=$(grep "^###" output.md | wc -l)
if [ "$actual_count" -ne "$expected_count" ]; then
    echo "Warning: Count mismatch"
fi
```

## Future Recommendations

### For Similar Projects

1. **Start with API exploration** before attempting web scraping

2. **Test with small datasets** before processing large volumes

3. **Implement resume capability** for long-running processes (see the sketch after this list)

4. **Use structured logging** for better debugging

5. **Build in validation** at every step
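
For item 3, a minimal resume-capable loop, assuming the per-category generator from Issue 4 and a hypothetical one-file-per-category naming scheme:

```bash
# Skip categories whose output file already exists
while IFS= read -r category; do
    outfile="docs/${category// /_}.md"   # hypothetical naming scheme
    if [ -f "$outfile" ]; then
        echo "Skipping $category (already generated)"
        continue
    fi
    generate_single_category "$category"
done <<< "$categories"
```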

### For API Improvements

1. **Category filtering endpoints** would eliminate the need for client-side filtering

2. **Batch endpoints** could reduce the number of individual requests

3. **Response pagination** for large category results

4. **Rate limiting headers** to guide appropriate delays

### For Documentation Process

1. **Automated validation** against source API counts

2. **Incremental updates** rather than full regeneration

3. **Parallel processing** where appropriate (see the sketch after this list)

4. **Better error reporting** and recovery mechanisms
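
For item 3, one low-risk option is bounded concurrency with `xargs`; keep the worker count small so it still behaves like the rate limiting above. A sketch (filenames containing quotes would need extra care):

```bash
# Fetch up to 4 workflow detail responses concurrently
echo "$workflow_files" | xargs -P 4 -I {} \
    sh -c "curl -s '${API_BASE}/workflows/{}' | jq '.metadata'"
```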

## Emergency Recovery Procedures

### If Process Fails Mid-Execution

1. **Identify completed categories**: Check which markdown files exist (see the audit sketch after this list)

2. **Resume from failure point**: Process only missing categories

3. **Validate existing files**: Ensure completed files have correct counts

4. **Manual intervention**: Handle problematic workflows individually
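
A quick audit sketch covering steps 1-3, assuming the same hypothetical naming scheme as above and a tab-separated `expected_counts.txt` (category, expected workflow count) derived from the category mappings:

```bash
# Flag categories that are missing or have the wrong number of entries
while IFS=$'\t' read -r category expected; do
    outfile="docs/${category// /_}.md"
    if [ ! -f "$outfile" ]; then
        echo "MISSING  $category"
    elif [ "$(grep -c '^###' "$outfile")" -ne "$expected" ]; then
        echo "PARTIAL  $category ($(grep -c '^###' "$outfile")/$expected)"
    fi
done < expected_counts.txt
```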

### If API Access Is Lost

1. **Verify connectivity**: Check tunnel/proxy status

2. **Test API endpoints**: Confirm they're still accessible (a quick check follows this list)

3. **Switch to backup**: Use alternative access methods if available

4. **Document outage**: Note any missing data for later completion
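
A quick connectivity check, using the category-mappings endpoint from earlier as a known-good target:

```bash
# Fail fast if the API is unreachable; -f treats HTTP errors as failures
if curl -sf --max-time 10 "${API_BASE}/category-mappings" > /dev/null; then
    echo "API reachable"
else
    echo "API unreachable - check tunnel/proxy before resuming"
fi
```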

This troubleshooting guide ensures that future documentation efforts can avoid the pitfalls encountered and build upon the successful strategies identified.