Gospider Methods

Gospider is a fast web-crawling and content-discovery tool written in Go. This guide covers installation, crawling techniques, and automation methods, with copy-ready commands.

Phase 1: Installation

1. Install Gospider with Go (requires Go 1.11+):
GO111MODULE=on go install github.com/jaeles-project/gospider@latest
2. Verify the installation and print the version:
gospider --version
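If the shell reports "gospider: command not found" after installation, the likely cause is that `go install` drops binaries into `$GOPATH/bin` (default `~/go/bin`), which may not be on PATH. A minimal fix, assuming the default Go layout:

```shell
# go install places binaries in $GOPATH/bin (default: ~/go/bin);
# add that directory to PATH so the shell can find gospider
export PATH="$PATH:$HOME/go/bin"
command -v gospider && gospider --version
```

Add the export line to ~/.bashrc or ~/.zshrc to make it permanent.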
Phase 2: Basic Crawling

1. Crawl a single site with 10 concurrent requests at depth 1:
gospider -s "https://google.com/" -o output -c 10 -d 1
2. Crawl multiple sites from a list:
gospider -S sites.txt -o output -c 10 -d 1
3. Crawl 20 sites at the same time, with 10 concurrent requests per site:
gospider -S sites.txt -o output -c 10 -d 1 -t 20
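The -S flag expects a plain list of target URLs, one per line. A hypothetical sites.txt can be generated like this (the hosts are placeholders):

```shell
# Hypothetical sites.txt for -S: one target URL per line
cat > sites.txt <<'EOF'
https://example.com/
https://example.org/
https://example.net/
EOF
wc -l < sites.txt
```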
Phase 3: 3rd Party Sources

1. Also pull URLs from third-party sources (Archive.org, CommonCrawl, VirusTotal, AlienVault):
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source
2. Include subdomains discovered via third-party sources:
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --include-subs
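For context, --other-source queries public URL archives; the Wayback Machine CDX API is one such source and can also be queried by hand. The endpoint and parameters below illustrate that public API, not gospider's exact internals, and the domain is a placeholder:

```shell
# Build a Wayback Machine CDX query for all archived URLs under a domain
# (endpoint/parameters illustrate the public API; domain is an example)
domain="example.com"
cdx_url="https://web.archive.org/cdx/search/cdx?url=*.${domain}/*&output=text&fl=original&collapse=urlkey"
echo "$cdx_url"
# Fetch the URL list with: curl -s "$cdx_url"
```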
Phase 4: Custom Headers & Cookies

1. Send custom headers and cookies with every request:
gospider -s "https://google.com/" -o output -c 10 -d 1 -H "Accept: */*" -H "Test: test" --cookie "testA=a; testB=b"
2. Integrate with Burp Suite by loading a raw request exported from Burp:
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --burp burp_req.txt
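When a target needs many headers, repeating -H inline gets unwieldy. A sketch that collects the flags in the positional parameters (POSIX sh; the header names and values are examples):

```shell
# Collect repeated -H/--cookie flags, then pass them to gospider in one go
set --
set -- "$@" -H "Accept: */*"
set -- "$@" -H "X-Test: test"
set -- "$@" --cookie "testA=a; testB=b"
printf '%s\n' "$@"
# Then run: gospider -s "https://google.com/" -o output -c 10 -d 1 "$@"
```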
Phase 5: Blacklist & Filtering

1. Blacklist file extensions with a regex (default: jpg, jpeg, gif, css, tif, tiff, png, ttf, woff, woff2, ico):
gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(jpg|jpeg|gif|css|tif|tiff|png|ttf|woff|woff2|ico)"
2. Blacklist extensions and filter responses by content length:
gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(woff|pdf)" --length --filter-length "6871,24432"
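To extend the default extension blacklist without hand-editing the regex, the pattern can be generated from a word list. A small sketch (the extension set here is the documented default plus pdf):

```shell
# Build the --blacklist regex from a space-separated list of extensions
exts="jpg jpeg gif css tif tiff png ttf woff woff2 ico pdf"
blacklist=".($(echo "$exts" | tr ' ' '|'))"
echo "$blacklist"
# Then run: gospider -s "https://google.com/" -o output --blacklist "$blacklist"
```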
Phase 6: Output & Integration

1. Quiet mode (suppress extra logging, print results only):
gospider -s "https://google.com/" -o output -c 10 -d 1 -q
2. Save output for further processing (-o names an output folder; gospider writes one file per site into it):
gospider -s "https://google.com/" -o gospider_output -c 10 -d 1
3. Pipe stdout to grep to hunt for API documentation endpoints (grep needs -E for the alternation pattern):
gospider -S targets.txt -c 10 -d 1 | grep -E "swagger|openapi|redoc|rapidoc"
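Gospider tags each output line with its finding type (e.g. [url], [javascript], [subdomains]), which makes post-processing with standard tools easy. A sketch that reduces saved output to unique URLs; the sample lines are fabricated to illustrate the assumed format:

```shell
# Fabricated sample of gospider output (format assumed: tag first, URL last)
cat > gospider_sample.txt <<'EOF'
[url] - [code-200] - https://example.com/api/v1/docs
[javascript] - https://example.com/static/app.js
[url] - [code-200] - https://example.com/api/v1/docs
EOF
# Keep only [url] findings, take the last field, deduplicate
grep '^\[url\]' gospider_sample.txt | awk '{print $NF}' | sort -u
```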
Phase 7: Features Overview

1. Fast web crawling with Go concurrency: high-performance crawling built on goroutines.
2. Sitemap parsing: brute-forces and parses sitemap.xml to discover URLs automatically.
3. robots.txt parsing: respects and parses robots.txt directives.
4. JavaScript link extraction: generates and verifies links found in JS files for fuller coverage.
5. AWS S3 detection: finds S3 bucket references in response sources.
6. Subdomain extraction: finds subdomains mentioned in responses.
7. User-Agent rotation: randomizes between mobile and web User-Agent strings.