Gospider Methods

Gospider is a fast web crawler and content discovery tool written in Go. This guide covers installation, crawling techniques, and output handling for automation.


Phase 1

Installation

1. Install Gospider with Go (the @latest form of go install needs Go 1.16 or newer)

GO111MODULE=on go install github.com/jaeles-project/gospider@latest

2. Verify the installation and check the version (-v is the short form of --verbose, so use the long flag)

gospider --version
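If the shell cannot find the binary after installation, the usual cause is that Go's bin directory is not on PATH. The following is a minimal pre-flight sketch. `have_tool` is a hypothetical helper name, and `$HOME/go/bin` is the install location only when GOPATH and GOBIN are left at their defaults.

```shell
# have_tool: succeed if the named command is reachable through PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool gospider; then
    echo "gospider found at $(command -v gospider)"
else
    # Assumption: default GOPATH; adjust if GOBIN or GOPATH is set.
    echo "gospider not on PATH; try: export PATH=\"\$PATH:\$HOME/go/bin\""
fi
```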

Phase 2

Basic Crawling

1. Crawl a single site with 10 concurrent requests to depth 1

gospider -s "https://google.com/" -o output -c 10 -d 1

2. Crawl multiple sites from a list file

gospider -S sites.txt -o output -c 10 -d 1

3. Crawl 20 sites at the same time (-t 20) with 10 concurrent requests each (-c 10)

gospider -S sites.txt -o output -c 10 -d 1 -t 20
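The `-S` flag reads its targets from a file. Below is a rough sketch of turning a raw host list into such a file. It assumes each line should be a full URL with a scheme, as in the `-s` examples above. `normalize_sites` is a hypothetical helper and the hosts are placeholders.

```shell
# normalize_sites: read hosts or URLs on stdin, one per line, and print each
# unique target as a URL. Lines that already carry a scheme are kept as-is.
normalize_sites() {
    awk '
        { gsub(/^[ \t]+|[ \t]+$/, "") }          # trim surrounding whitespace
        $0 == "" || /^#/ { next }                # skip blanks and comments
        !/^https?:\/\// { $0 = "https://" $0 }   # add a scheme if missing
        !seen[$0]++                              # print each target once
    '
}

# Placeholder hosts; in practice, redirect the result into sites.txt.
printf '%s\n' 'example.com' 'https://example.org/' '' 'example.com' | normalize_sites
```

The result can then be written with `normalize_sites < hosts.txt > sites.txt`, where `hosts.txt` is whatever raw list is at hand.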

Phase 3

3rd Party Sources

1. Get URLs from Archive.org, CommonCrawl, VirusTotal, and AlienVault

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source

2. Also include subdomains found in 3rd party sources

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --include-subs

Phase 4

Custom Headers & Cookies

1. Send custom headers and cookies

gospider -s "https://google.com/" -o output -c 10 -d 1 -H "Accept: */*" -H "Test: test" --cookie "testA=a; testB=b"

2. Load headers and cookies from a raw HTTP request saved from Burp Suite

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --burp burp_req.txt

Phase 5

Blacklist & Filtering

1. Blacklist URLs by regex, here by file extension (already blacklisted by default: jpg, jpeg, gif, css, tif, tiff, png, ttf, woff, woff2, ico)

gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(jpg|jpeg|gif|css|tif|tiff|png|ttf|woff|woff2|ico)"

2. Blacklist extensions, show response lengths (--length), and filter by response length (--filter-length)

gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(woff|pdf)" --length --filter-length "6871,24432"
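The `--blacklist` value is a regular expression, not a list of extensions, so a pattern can match more than intended. The sketch below previews patterns with `grep -E`. It assumes the pattern is applied unanchored to the full URL, and that these simple patterns behave the same in POSIX extended regex as in Go's regexp syntax. `would_blacklist` and the URLs are made up for illustration.

```shell
# would_blacklist PATTERN: read URLs on stdin, print those the pattern matches.
would_blacklist() {
    grep -E -- "$1" || true
}

urls='https://example.com/fonts/a.woff
https://example.com/report.pdf?download=1
https://example.com/pdfs/index.html'

# Loose pattern: "." matches any character, so the /pdfs/ directory is
# caught as well and all three URLs are printed.
printf '%s\n' "$urls" | would_blacklist '.(woff|pdf)'

# Tighter variant: a literal dot, the extension, then end of URL or "?".
# Only the .woff and .pdf files are printed.
printf '%s\n' "$urls" | would_blacklist '\.(woff|pdf)(\?|$)'
```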

Phase 6

Output & Integration

1. Quiet mode: suppress other output and print only URLs

gospider -s "https://google.com/" -o output -c 10 -d 1 -q

2. Save output for further processing (-o names an output folder, not a single file)

gospider -s "https://google.com/" -o gospider_output -c 10 -d 1

3. Pipe quiet output to grep to find API documentation endpoints (grep needs -E to treat | as alternation)

gospider -S targets.txt -c 10 -d 1 -q | grep -E "swagger|openapi|redoc|rapidoc"
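Plain `grep` treats `|` as a literal character, so a multi-keyword filter needs `grep -E`. Below is a small sketch of such a filter run over made-up URLs standing in for crawler output. `find_api_docs` is a hypothetical helper name, and `-i` is added on the assumption that case should not matter.

```shell
# find_api_docs: keep only lines that mention a common API documentation path.
find_api_docs() {
    grep -E -i 'swagger|openapi|redoc|rapidoc' || true
}

# Made-up URLs; only the swagger-ui and openapi.json lines are printed.
printf '%s\n' \
    'https://example.com/index.html' \
    'https://example.com/swagger-ui/' \
    'https://api.example.com/v1/openapi.json' |
    find_api_docs
```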

Phase 7

Features Overview

1. High-performance crawling using Go's goroutines

Fast web crawling with Go concurrency

2. Brute-forces and parses sitemap.xml when sitemap crawling is enabled (--sitemap)

Brute force and parse sitemap.xml

3. Parses robots.txt and uses its entries as a source of URLs

Parse robots.txt

4. Extracts links from JavaScript files and verifies them

Generate and verify link from JavaScript files

5. Detects AWS S3 bucket references in response source

Find AWS-S3 from response source

6. Extracts subdomains found in response source

Find subdomains from response source

7. Sends a random web or mobile User-Agent (-u web or -u mobi)

Random mobile/web User-Agent
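These features surface in the output as different finding types. The sketch below tallies findings per type. It assumes that non-quiet output lines begin with a bracketed tag such as `[url]` or `[subdomains]`. That format, the helper name `count_by_tag`, and the sample lines are assumptions to check against real output.

```shell
# count_by_tag: tally lines by their leading "[tag]" and print "count tag".
count_by_tag() {
    awk '
        match($0, /^\[[^]]+\]/) {
            tag = substr($0, RSTART + 1, RLENGTH - 2)   # strip the brackets
            n[tag]++
        }
        END { for (t in n) print n[t], t }
    ' | sort -k2
}

# Assumed sample lines, not captured from a real run.
printf '%s\n' \
    '[url] - [code-200] - https://example.com/' \
    '[subdomains] - https://cdn.example.com' \
    '[url] - [code-200] - https://example.com/about' |
    count_by_tag
```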

Tools

Tools & Resources

Gospider GitHub Repository

Official Gospider repository with full documentation

Go Language Download

Download Go to install Gospider (the @latest install syntax needs Go 1.16 or newer)

CommonCrawl

One of the 3rd party sources integrated with Gospider

Archive.org

Wayback Machine integration for historical URL discovery