Gospider Methods
Gospider is a fast web crawling and content discovery tool written in Go. This guide covers installation, crawling techniques, and automation methods.
Phase 1
Installation
1. Install Gospider using Go (the @latest install syntax needs Go 1.16 or newer)
    GO111MODULE=on go install github.com/jaeles-project/gospider@latest
2. Verify installation and check version (-v is the verbose flag, so use --version)
    gospider --version
Phase 2
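If the gospider command is not found after the install step, the binary is probably in Go's bin directory, which may not be on PATH. The following is a minimal sketch of a check, assuming the usual layout where `go install` writes to `GOBIN`, or to `GOPATH/bin` when `GOBIN` is unset. The helper name `check_tool` is hypothetical and used only for illustration.

```shell
#!/bin/sh
# check_tool NAME: report whether NAME resolves on PATH (hypothetical helper).
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
        return 0
    fi
    echo "$1 not found on PATH" >&2
    return 1
}

if ! check_tool gospider; then
    # Suggest the Go bin directory only when the go command itself is available.
    if command -v go >/dev/null 2>&1; then
        echo "try: export PATH=\"\$PATH:$(go env GOPATH)/bin\"" >&2
    fi
fi
```

The suggestion is printed rather than applied, so the script does not change the current shell's environment.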
Basic Crawling
1. Crawl single site with 10 concurrent requests, depth 1
    gospider -s "https://google.com/" -o output -c 10 -d 1
2. Crawl multiple sites from list
    gospider -S sites.txt -o output -c 10 -d 1
3. Crawl 20 sites at same time with 10 bots each
    gospider -S sites.txt -o output -c 10 -d 1 -t 20
Phase 3
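The site list and the -t/-c pair from the basic crawling commands can be sketched as follows. This assumes sites.txt holds one full URL (scheme included) per line. The host names and the hosts.txt file are placeholders, and the product of the two flags is only a rough ceiling on simultaneous requests.

```shell
#!/bin/sh
# Placeholder host list; substitute targets that are in scope.
printf '%s\n' example.com example.org > hosts.txt

# Prefix each host with a scheme to build the list passed to -S.
sed 's#^#https://#' hosts.txt > sites.txt
cat sites.txt

# -t sets how many sites run in parallel, -c the concurrent requests per site,
# so a rough ceiling on in-flight requests is their product.
threads=20
concurrent=10
echo "up to $((threads * concurrent)) requests in flight"

# Run the crawl only when gospider is actually installed.
if command -v gospider >/dev/null 2>&1; then
    gospider -S sites.txt -o output -c "$concurrent" -d 1 -t "$threads"
fi
```

Lowering either value reduces load on the targets, which matters more than speed on most engagements.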
3rd Party Sources
1. Get URLs from Archive.org, CommonCrawl, VirusTotal, AlienVault
    gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source
2. Include subdomains from 3rd party sources
    gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --include-subs
Phase 5
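With --include-subs, third-party results may cover hosts beyond the main domain. A minimal sketch for listing the distinct hosts in a URL list is shown here. The sample URLs are made up, and real input would be URLs taken from the crawl output.

```shell
#!/bin/sh
# Made-up sample URLs standing in for third-party results.
urls='https://example.com/a
https://api.example.com/v1/users
http://example.com/b?x=1
https://cdn.example.com/app.js'

# Splitting on "/" leaves the host in field 3 (scheme: is field 1, field 2 is empty).
hosts=$(printf '%s\n' "$urls" | awk -F/ '{print $3}' | sort -u)
printf '%s\n' "$hosts"
```

Comparing this list with and without --include-subs gives a quick view of how much extra surface the flag pulls in.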
Blacklist & Filtering
1. Blacklist file extensions (default: jpg,jpeg,gif,css,tif,tiff,png,ttf,woff,woff2,ico)
    gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(jpg|jpeg|gif|css|tif|tiff|png|ttf|woff|woff2|ico)"
2. Blacklist extensions and filter by response length
    gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(woff|pdf)" --length --filter-length "6871,24432"
Phase 6
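The --blacklist value in the commands above is a regular expression, and its leading dot is unescaped, so it matches any character. The sketch below previews the effect offline, using grep -E as a stand-in for the matching gospider does internally. The sample URLs and the stricter pattern are assumptions for illustration, not taken from the gospider documentation.

```shell
#!/bin/sh
# Made-up sample URLs; replace with real crawl output.
urls='https://example.com/index.html
https://example.com/fonts/site.woff
https://example.com/docs/report.pdf
https://example.com/pdfs/index.html'

# Pattern as written in the command above: unescaped dot, unanchored.
loose='.(woff|pdf)'
# Stricter variant (an assumption): literal dot, and the extension must end
# the URL or be followed by a query string.
strict='\.(woff|pdf)(\?|$)'

echo "kept with loose pattern:"
printf '%s\n' "$urls" | grep -Ev "$loose"
echo "kept with strict pattern:"
printf '%s\n' "$urls" | grep -Ev "$strict"
```

The loose pattern also drops /pdfs/index.html, because "/pdf" satisfies "any character followed by pdf". The strict variant keeps it.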
Output & Integration
1. Quiet output mode
    gospider -s "https://google.com/" -o output -c 10 -d 1 -q
2. Save output for further processing (-o names an output folder, not a single file)
    gospider -s "https://google.com/" -o gospider_output -c 10 -d 1
3. Pipe output to grep for API documentation discovery (results print to stdout; -E enables the | alternation)
    gospider -S targets.txt -c 10 -d 1 | grep -E "swagger|openapi|redoc|rapidoc"
Phase 7
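When piping crawl output to grep, alternation with | only works in extended mode (grep -E). In basic mode the | is a literal character and the pattern matches nothing useful. The sketch below shows the difference on sample lines. The line format and URLs are assumptions modeled on gospider's tagged output, not captured from a real run.

```shell
#!/bin/sh
# Sample lines in the assumed gospider output format (made-up URLs).
sample='[url] - [code-200] - https://example.com/swagger/index.html
[url] - [code-200] - https://example.com/about
[javascript] - https://example.com/static/redoc.standalone.js
[url] - [code-200] - https://example.com/api/openapi.json'

# Basic grep treats "|" literally, so no sample line matches.
basic=$(printf '%s\n' "$sample" | grep -c "swagger|openapi|redoc|rapidoc" || true)
echo "basic grep matches: $basic"

# Extended mode enables alternation; -i ignores case; -o keeps only the URL.
hits=$(printf '%s\n' "$sample" \
    | grep -Ei "swagger|openapi|redoc|rapidoc" \
    | grep -Eo 'https?://[^ ]+' \
    | sort -u)
printf '%s\n' "$hits"
```

Stripping the tags with grep -o leaves a clean URL list that can be fed to other tools.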
Features Overview
1. High-performance crawling using Go's goroutines
    Fast web crawling with Go concurrency
2. Discovers and parses XML sitemaps (with the --sitemap flag)
    Brute force and parse sitemap.xml
3. Parses robots.txt directives to find additional URLs
    Parse robots.txt
4. Extracts URLs from JS files for complete coverage
    Generate and verify link from JavaScript files
5. Detects S3 bucket references in page source
    Find AWS-S3 from response source
6. Extracts subdomains found in responses
    Find subdomains from response source
7. Uses a random mobile or web User-Agent
    Random mobile/web User-Agent
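Each finding type in the feature list shows up in the output under its own bracketed tag, so a crawl can be summarized by counting tags. The sketch below assumes tag names such as [url], [robots], [sitemap], [javascript], [linkfinder], [subdomains] and [aws-s3]. The sample lines are made up for illustration and should be checked against real output.

```shell
#!/bin/sh
# Made-up sample lines in the assumed tagged output format.
sample='[url] - [code-200] - https://example.com/
[robots] - https://example.com/admin
[sitemap] - https://example.com/sitemap-posts.xml
[javascript] - https://example.com/static/app.js
[linkfinder] - [from: https://example.com/static/app.js] - /api/v1/users
[subdomains] - api.example.com
[aws-s3] - example-assets.s3.amazonaws.com
[url] - [code-200] - https://example.com/login'

# Take the first bracketed tag on each line and count occurrences per tag.
summary=$(printf '%s\n' "$sample" \
    | sed -n 's/^\[\([^]]*\)].*/\1/p' \
    | sort | uniq -c \
    | awk '{print $2 ": " $1}')
printf '%s\n' "$summary"
```

A tag that is missing from the summary is a hint that the matching feature found nothing or was disabled for that run.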