Overview

Websites are their own beast when it comes to enumeration. Countless combinations of ports, web server configurations, and applications could each be the weak point that exposes the host.

Firewalls

Public-facing infrastructure should be properly hardened to prevent an attack. As an attacker, checking whether a web application firewall (WAF) is in place is a good first step.

# detect firewall (stops after a signature match)
wafw00f https://target.xyz

# full signature detection (tests all signatures instead of stopping at the first match)
wafw00f -a https://target.xyz
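
The difference between the two modes can be illustrated with a minimal Python sketch. It is not how wafw00f is implemented: the two header fingerprints below are illustrative assumptions, and `guess_waf` and its `stop_on_first` parameter are hypothetical names chosen for this example.

```python
# Illustrative header fingerprints only. Real tools use far larger
# signature sets and also inspect cookies, status codes, and bodies.
HEADER_SIGNATURES = {
    "Cloudflare": lambda h: "cf-ray" in h or h.get("server", "").lower() == "cloudflare",
    "Sucuri": lambda h: "x-sucuri-id" in h,
}


def guess_waf(headers, stop_on_first=True):
    """Return the names of signatures that match the response headers."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {key.lower(): value for key, value in headers.items()}
    matches = []
    for name, check in HEADER_SIGNATURES.items():
        if check(normalized):
            matches.append(name)
            if stop_on_first:  # default mode: stop after the first match
                break
    return matches


sample = {"CF-RAY": "abc123", "X-Sucuri-ID": "1"}
print(guess_waf(sample))                       # first match only
print(guess_waf(sample, stop_on_first=False))  # every matching signature
```

An empty result only means none of these sample fingerprints matched, not that no WAF is present.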

VHOST Enumeration

Virtual hosts, or VHOSTs, allow a single server to host multiple websites, and therefore multiple subdomains, from one IP. This is done through the configuration of web servers such as Apache, Nginx, or IIS. There are 3 methods for vhosting:

  • Name-Based
    • uses the HTTP Host header to determine where to direct the traffic, so it needs only a single IP and port on the server
    • tends to be the most common, as it is easy to set up and cost-effective
  • IP-Based
    • each website has its own unique IP, even if it's hosted on the same server
    • tends to be more complex to set up, but offers better isolation between websites
  • Port-Based
    • each website has its own unique port on the same IP
    • again more complex, and not as common

# map the IP to the domain in the hosts file first, then brute-force vhosts from a wordlist
gobuster vhost -u http://<url> -w <wordlist> --append-domain -o <output_file>
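
The idea behind a name-based vhost scan can be sketched in Python. This is a simplified illustration, not gobuster's actual implementation: `candidate_hosts`, `find_vhosts`, the `fetch` callable, and the toy `SITES` table are all hypothetical. A real `fetch` would send a request to the target IP with the given Host header.

```python
def candidate_hosts(words, domain):
    """Append the domain to each wordlist entry to build Host header values."""
    return [f"{word.strip()}.{domain}" for word in words if word.strip()]


def find_vhosts(words, domain, fetch):
    """Return candidates whose response differs from a baseline.

    `fetch` is a hypothetical callable that takes a Host header value and
    returns (status_code, body_length) for a request to the target IP.
    """
    # A name that should not exist shows what the default site looks like.
    baseline = fetch(f"does-not-exist-baseline.{domain}")
    return [host for host in candidate_hosts(words, domain) if fetch(host) != baseline]


# Toy stand-in for a name-based server: one IP, routing on the Host header.
SITES = {
    "target.xyz": (200, 5120),
    "dev.target.xyz": (200, 1337),
    "admin.target.xyz": (403, 199),
}


def fake_fetch(host):
    # Unknown hosts fall through to the default site, as many servers do.
    return SITES.get(host, SITES["target.xyz"])


print(find_vhosts(["www", "dev", "admin", ""], "target.xyz", fake_fetch))
```

Comparing against a baseline matters because a name-based server often answers unknown Host values with its default site rather than an error.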

Crawling

Crawling (or spidering) is an automated process to build a full map of a website. It starts at the homepage, finds all links on that page, then follows each child link recursively. Search engines use this process to build the indexes that their search results are served from.

Types

  • Breadth-First Crawling
    • top-down approach that only goes one level at a time
    • if a starting page has 5 links, it captures these first, then goes another layer deep and crawls all links found there
  • Depth-First Crawling
    • immediately goes as deep as possible on the first link it sees before backtracking and going down the next link
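
The two orderings can be compared on a small in-memory link graph. This is a minimal sketch: the `SITE` dictionary is a made-up site map, and a real crawler would fetch each page and extract its links instead of looking them up.

```python
from collections import deque

# Hypothetical site map: page -> links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": [],
    "/a/2": [],
    "/b/1": ["/"],  # link back to the homepage, must not be revisited
}


def crawl_bfs(start, links):
    """Visit pages level by level using a FIFO queue."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def crawl_dfs(start, links, seen=None):
    """Follow the first unseen link as deep as possible, then backtrack."""
    seen = set() if seen is None else seen
    seen.add(start)
    order = [start]
    for nxt in links.get(start, []):
        if nxt not in seen:
            order.extend(crawl_dfs(nxt, links, seen))
    return order


print(crawl_bfs("/", SITE))  # ['/', '/a', '/b', '/a/1', '/a/2', '/b/1']
print(crawl_dfs("/", SITE))  # ['/', '/a', '/a/1', '/a/2', '/b', '/b/1']
```

Both versions track visited pages so that circular links do not cause an endless crawl.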

Scrapy is a Python package that can be used to build a custom site scraper.

# run with: scrapy runspider quotes_spider.py -o quotes.jsonl
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/tag/humor/",
    ]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "author": quote.xpath("span/small/text()").get(),
                "text": quote.css("span.text::text").get(),
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

Information

  • Links
  • Comments
  • Metadata
  • Sensitive Files (backup files, config files, log files, etc.)

Source and Directory Browsing

Viewing the website's source code, or monitoring requests in the browser's network tab, may reveal further information:

  • look for notes left in comments
  • record the folders referenced by local assets (images, CSS, JS). These are good candidates for checking whether directory browsing is enabled, or for feeding into gobuster
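
Both checks can be automated with the standard library. This is a minimal sketch that assumes the page source has already been saved: the `SourceAudit` class name and the sample HTML are hypothetical, and only root-relative asset paths are collected.

```python
from html.parser import HTMLParser
from posixpath import dirname
from urllib.parse import urlparse


class SourceAudit(HTMLParser):
    """Collect HTML comments and the folders of root-relative local assets."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.asset_dirs = set()

    def handle_comment(self, data):
        self.comments.append(data.strip())

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("src", "href") or not value:
                continue
            parsed = urlparse(value)
            # Skip external hosts, mailto:/javascript: links, and relative paths.
            if parsed.scheme or parsed.netloc or not parsed.path.startswith("/"):
                continue
            self.asset_dirs.add(dirname(parsed.path).rstrip("/") + "/")


SAMPLE_HTML = """
<!-- TODO: remove /backup before launch -->
<link rel="stylesheet" href="/static/css/site.css">
<img src="/static/img/logo.png">
<script src="https://cdn.example.com/lib.js"></script>
"""

audit = SourceAudit()
audit.feed(SAMPLE_HTML)
print(audit.comments)            # ['TODO: remove /backup before launch']
print(sorted(audit.asset_dirs))  # ['/static/css/', '/static/img/']
```

Each collected folder can then be requested directly to see whether the server returns a directory listing.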

robots.txt

This file follows a standard that tells a particular user-agent (typically a crawler) which portions of the site it is allowed to crawl. Its Disallow lines may point to secrets or other interesting locations.
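
Pulling the Disallow entries out of a downloaded robots.txt takes only a few lines. This is a minimal sketch, assuming the file content is already available as a string: `disallowed_paths` and the sample file are hypothetical, and user-agent groups are ignored on purpose, since every listed path is of interest here.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow value, regardless of user-agent group."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /backup.zip  # old export
Disallow:
Allow: /public/
"""

print(disallowed_paths(SAMPLE_ROBOTS))  # ['/admin/', '/backup.zip']
```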

Well-Known URLs

RFC 8615 defines /.well-known/, a standard directory at the root of a website that holds site-wide metadata. The full list of registered URIs is maintained in the IANA Well-Known URIs registry. For example, /.well-known/openid-configuration describes how the site's OpenID Connect provider, which is built on OAuth 2.0, is configured.
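
A short Python sketch can build the URLs to probe from any page on the target. The `well_known_urls` helper is hypothetical, and the three names are only a small sample of registered suffixes, not the full registry.

```python
from urllib.parse import urljoin

# A small sample of registered well-known names, not an exhaustive list.
WELL_KNOWN_NAMES = ["openid-configuration", "security.txt", "change-password"]


def well_known_urls(base_url, names=WELL_KNOWN_NAMES):
    """Build /.well-known/ URLs at the root of the site that serves base_url."""
    root = urljoin(base_url, "/.well-known/")
    return [urljoin(root, name) for name in names]


for url in well_known_urls("https://target.xyz/app/login"):
    print(url)
```

The directory always sits at the site root, so the path of the starting URL is discarded.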