Not exactly wizard stuff today, more like back to basics perhaps – but sometimes they’re worth revisiting. I’ve had some good DirBuster finds three tests in a row so I thought I’d write them up as a case study. It’s a reminder that there’s some very low-hanging fruit out there that may not always get picked. I’ve also put together a walk-through for many of DirBuster’s features and I aim to show that, as with many tools, a few minutes of manual work can produce a faster set of more meaningful results.
If you know what DirBuster is then you can skip this paragraph. If you don’t, then DirBuster is designed to brute-force directory and file names on web servers, the point being to find content to which there are no links. It’s an OWASP project and you can find it here. While you can run it in a pure brute-force mode, you’ll most likely be using a dictionary to maximise your chances of finding something in the time available. DirBuster comes with a set of dictionaries that were generated by crawling the internet for real directory and file names.
Cheer number 1
On a test of a web portal DirBuster found pages at /users/ and /organisations/. The portal was a closed system used by the owner to exchange financial information with many other organisations in (what was supposed to be) an isolated way. Sorry to be vague but you understand why! Navigating to /users/ opened up a whole user management area, with full names, email addresses, roles, last login etc. At /organisations/ there was an organisation management area, from where you could access the same user details from other organisations. Oops. While unauthorised data access was possible, attempts to execute administrative functions failed – but the fact that these functions were exposed was useful in itself because there was no CSRF protection. Moreover it was simple to target an administrator (of any organisation) because you could look them up from the user listings. The only saving grace was that you had to be authenticated – a point I’ll return to later.
Cheer number 2
On a public website for a high-street company, DirBuster found the page /staff/. This revealed a staff discount page where you could go through and order stuff at significant discounts, meaning lost revenue to the client. Of course, this sort of thing has a habit of getting out on to discount sites and the like. The page was available unauthenticated (although since anyone could register for an account, that’s by the bye).
Cheer number 2½: DirBuster also found a page that had a special offer for readers of a particular publication. Not as important this one since it was obviously there for the taking but it clearly wasn’t designed to be available to all.
Cheer number 3
On a test of a web portal, while authenticated, DirBuster found a positive response from /admin. This turned out to be an authorisation flaw and a short time later, after some fuzzing of user IDs, I had some 2,300 usernames and email addresses as well as plaintext passwords for about a third of those accounts. This portal was used by many different organisations – and a user from one of them could log in to another user’s account from another organisation. Oops.
In fact I had a fourth cheer yesterday, where I found a page that allowed me to self-register unauthenticated on (what was supposed to be) a closed site! But “four cheers for DirBuster” sounds a bit naff.
Walk-through
The rest (and majority) of this article is a walk-through of the main DirBuster configuration options. Note that I’m describing a general case in what follows and obviously there may be times when you need to do things differently. That’s an important part of pentesting: adapting your test to suit the target. Having said that, let’s take a look at the starting screen (of version 1.0 RC1, on which this article is based):
Target URL
For the “Target URL” consider HTTP vs HTTPS. HTTP is obviously faster but a website will often redirect some or all requests to the HTTPS equivalent whether the page is actually there or not, which will spoil your results. You can enable “Follow Redirects” from the Options menu but that’s a considerable overhead if it’s happening with every request. If the redirect happens only when the page exists then a HTTP-based scan should be speedier. My personal preference is that if the site is happy serving its pages over HTTPS, which is normal, I’ll go for HTTPS. Despite the overhead slowing down the request rate, it does tend to rule out excessive redirects since it would be unusual for a HTTPS request to be redirected to a HTTP equivalent. Redirects may also confuse the “fail case”, which DirBuster uses to decide whether or not a guess is correct, and that could lead to false negatives as well as false positives. More on this later.
A similar situation may arise with the domain in that https://site.com/page may always redirect to https://www.site.com/page so use https://www.site.com:443 as your base URL.
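A quick way to see which of these situations you’re in is to request one page you know exists and one you know doesn’t, without following redirects, and compare where each Location header points. The sketch below only classifies a redirect you’ve already observed (for example in your proxy history). The function name classify_redirect and the example URLs are my own illustrations, not anything DirBuster provides.

```python
from urllib.parse import urljoin, urlsplit


def classify_redirect(request_url, status, location):
    """Describe how a redirect differs from the URL that was requested.

    request_url: the URL that was requested
    status:      the HTTP status code that came back
    location:    the Location header value, or None if absent
    """
    if status not in (301, 302, 303, 307, 308) or not location:
        return "no redirect"

    # Location may be relative, so resolve it against the request URL.
    target = urlsplit(urljoin(request_url, location))
    source = urlsplit(request_url)

    changes = []
    if source.scheme != target.scheme:
        changes.append(f"scheme {source.scheme} -> {target.scheme}")
    if source.hostname != target.hostname:
        changes.append(f"host {source.hostname} -> {target.hostname}")
    if source.path != target.path:
        changes.append(f"path {source.path} -> {target.path}")
    return ", ".join(changes) or "redirect to same URL"


print(classify_redirect("http://site.com/page", 301, "https://www.site.com/page"))
```

If the made-up page gives the same scheme or host change as the real one, the redirect tells you nothing about whether the page exists, so start the scan from the redirected base URL instead.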
Work Method
The default “Auto Switch” mode is probably best for the majority of cases. DirBuster will first try to see if it can get sensible results from HEAD requests, the reason being that the responses will be smaller. Even though it makes a GET request on 200 responses, this will save time when the 404 message (or equivalent) is relatively large. On the site I was looking at when writing this bit, the full HTML 404 response was about 19kB bigger than the disembodied 404 set of headers you’d get with HEAD. A crude bit of testing showed this took on average twice as long to arrive and be processed, adding 200ms to the response time. Given that you’re getting 404s most of the time this could mean a saving, even with the small dictionary, of over 1.4 gigabytes or 4 hours of waiting!
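To put rough numbers on that, here is the back-of-envelope sum as a sketch. It assumes every entry in the small case-sensitive list (87,650 entries, per List Info) comes back as a 404 and that requests are made one after another, so it gives an upper bound rather than the more conservative figures quoted above. The function and constant names are just for illustration.

```python
# Figures quoted in the article; the names are illustrative.
EXTRA_BYTES_PER_404 = 19 * 1000   # full 404 page vs. HEAD headers, about 19 kB
EXTRA_SECONDS_PER_404 = 0.2       # about 200 ms added per response
SMALL_LIST_ENTRIES = 87_650       # directory-list-2.3-small.txt


def head_savings(entries, extra_bytes, extra_seconds):
    """Return (gigabytes, hours) saved if every guess returns a 404.

    Assumes requests are issued serially, which overstates the time
    saved when several threads are running in parallel.
    """
    gigabytes = entries * extra_bytes / 1e9
    hours = entries * extra_seconds / 3600
    return gigabytes, hours


gb, hours = head_savings(SMALL_LIST_ENTRIES, EXTRA_BYTES_PER_404, EXTRA_SECONDS_PER_404)
print(f"~{gb:.2f} GB less data, ~{hours:.1f} hours less waiting")
# prints: ~1.67 GB less data, ~4.9 hours less waiting
```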
Number Of Threads
Running DirBuster with a high number of threads can slow down the target server, which may not go down too well if you’re testing a live site. You’ll probably find the default (10) to be a little over-enthusiastic, especially as you’ll be running other tests simultaneously. If you examine the number of threads in the DirBuster process (javaw.exe) while it’s running, you’ll see it jump up by more than the number you set in this field. I haven’t looked at the source code but I’m assuming that DirBuster is indeed honouring this field. I imagine that the “number of threads” refers to “Workers” that handle the actual requests and responses over the network while the other threads, for example, manage different queues depending on what you tick at the bottom of the screen.
As an aside, I’ve noticed that when you run a number of scans without re-starting DirBuster, the number of threads at rest tends to increase. I’m not sure if this is an issue that could degrade performance but just bear it in mind. (I did try to contact the project lead, James Fisher, to ask about threading but I got no reply. And it’s not that big a deal to warrant rummaging through the source code!)
I have DirBuster running on another monitor so I can keep an eye on the requests per second and any sudden scrolling, which usually means errors! Bear in mind that, say, 20 requests per second over HTTPS will be working the server harder than 20 requests per second over HTTP. A nice feature is that once the scan is running, you can dynamically change the number of threads.
Dictionary
Assuming you opt for “List based brute force” you’ll now need to choose a dictionary – and for this you need to know whether or not your directories are case sensitive. Although you can often guess this from the server in use, e.g. IIS isn’t case sensitive, it’s always best to check. So test a page that you know to exist, i.e. does /page return the same as /Page? Even when the server is case-sensitive, a look over the site map in your web proxy may show that all the pages you’ve requested are in fact lower case. But don’t go thinking that using the case-sensitive lists will take all that much longer. Clicking “List Info” brings up some statistics on the dictionaries, a portion of which is shown below:
You can see that the case-sensitive lists are nowhere near even twice the size of the lowercase versions, which you might have imagined as a minimum. That’s because the lists are based on real names found by crawling the internet. The file “directory-list-2.3-small.txt” has 87,650 entries while the lowercase version has 81,629 entries so it’s only 6,021 entries longer (about 7% bigger). For the medium-sized lists the numbers are 220,546 vs 207,629 so the case-sensitive version is 12,917 entries longer (about 6% bigger). So using the case-sensitive lists may not involve as big a hit as you might expect. (You can also see from the List Info what the actual difference is between big, medium and small: the entries were found on at least 1, 2 and 3 hosts respectively.)
Before you even start your attack you could consider putting together a small dictionary of a few directories and files you’ve found, together with some gibberish entries, to use on a test run. If you don’t see the results you expect, review your configuration bearing in mind some of the points from this article. A short test run might save you hours of wasted effort.
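As a rough sketch of that idea, the snippet below mixes a few names you already know exist with some random strings that almost certainly don’t. The function name, the example names and the seed are all illustrative. Save the output to a text file, one entry per line, and point DirBuster at it for the test run.

```python
import random
import string


def build_test_dictionary(known_names, gibberish_count=5, length=12, seed=None):
    """Return known names plus random strings that should not exist on the target."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    gibberish = [
        "".join(rng.choice(alphabet) for _ in range(length))
        for _ in range(gibberish_count)
    ]
    entries = list(known_names) + gibberish
    rng.shuffle(entries)  # interleave the expected hits and misses
    return entries


# Hypothetical names already seen in the proxy's site map.
words = build_test_dictionary(["images", "login", "css"], seed=1)
print("\n".join(words))
```

On the test run the known names should show up as hits and the gibberish should not. If the gibberish is reported as found, look at the fail case and redirect settings before committing to a long scan.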
Starting options
The “Standard start point” will assume directories end with / and files end with whatever you configure underneath. The “URL Fuzz” option allows you to insert the dictionary entries into the URL in a non-standard way. A good illustration is to discuss why there’s an Apache user enumeration list included in the set of dictionaries (apache-user-enum-2.0.txt). This is because if the userdir module is enabled (more on this here) you can go hunting for usernames based on the fact that the user “bob” will have a folder mapped to http://site.com/~bob/. So in this example the URL to fuzz would be /~{dir}/ where {dir} is a placeholder for the words in the chosen dictionary.
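To make the substitution concrete, here is a minimal sketch of how a fuzz template expands into request URLs. It is not DirBuster’s code: the function name is mine, and percent-encoding each word is a choice made for this sketch rather than something I’ve confirmed DirBuster does.

```python
from urllib.parse import quote


def expand_fuzz_url(base_url, template, words):
    """Yield one URL per dictionary word, substituting it for {dir}."""
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines in the dictionary
        # Percent-encode so odd characters in a word can't break the URL
        # (an assumption of this sketch).
        yield base_url.rstrip("/") + template.replace("{dir}", quote(word))


for url in expand_fuzz_url("http://site.com", "/~{dir}/", ["bob", "alice"]):
    print(url)
# prints:
# http://site.com/~bob/
# http://site.com/~alice/
```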
The remaining options are self-explanatory but there are still a few things to consider. Obviously the more options you tick the longer the scan will take. So look first at the style of URL the website uses. For example, you might find that requests to /page produce redirects to /page/ or that both of these return the same response. Either way, don’t run “Brute Force Dirs” together with “Brute Force Files”+“Use Blank Extension” because you’re doing twice the amount of work to get the same result. Conversely if you spot that there doesn’t seem to be much content in directories, i.e. none of the pages end with a / character, then don’t run “Brute Force Dirs”, rely on “Brute Force Files” instead.
If you enable the “Be Recursive” option, remember that DirBuster’s multi-threaded approach means that all those queues of work will be competing for a limited set of Workers. It’s easy to get into a situation where the Workers are looking in sub-folders of no real interest, slowing down the search for better candidates. In a time-limited test you could try looking at just the root content first by disabling this option. Where you go from there can be both manual and automated – and there’s always the option to create a custom dictionary for further scans based on the results of the first scan.
Options Menu
I’ve already mentioned “Follow Redirects” – in general, tick this only if you have to because it has the capacity to slow down the scan. Without this ticked, you’ll see 301 and 302 responses in the final results and you can just manually target the ones of interest later.
Choosing “Debug Mode” will only make a difference if you’re launching DirBuster from a command window that remains open in the background:
The references to Worker[n] are to the threads doing the networking so for n threads that you set you’ll see Workers from [0] to [n-1].
The option “Parse HTML”, which is on by default, instructs DirBuster to read the HTML of files that it discovers, looking for files and folders it then doesn’t have to guess. These can be found, for example, in the href attributes of <a> tags. You might decide this is overkill since DirBuster will quickly begin to download a lot of stuff you’ll see elsewhere during testing e.g. in Burp’s Proxy and Site Map. Overall this may add an overhead for results you simply don’t need – at least not from this tool on the first scan. There’s another possible benefit to disabling this when running authenticated scans, which we’ll come to momentarily.
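For a feel of what this kind of parsing involves, here is a small sketch that pulls same-host paths out of the href attributes of <a> tags using Python’s standard library. It illustrates the general idea only and is not how DirBuster itself is implemented. The class name, page URL and HTML snippet are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect same-host paths from href attributes of <a> tags."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        target = urlsplit(urljoin(self.page_url, href))
        if target.hostname == urlsplit(self.page_url).hostname:
            self.paths.add(target.path)


collector = LinkCollector("https://www.site.com/admin/")
collector.feed(
    '<a href="users/">Users</a> '
    '<a href="/help.php?x=1">Help</a> '
    '<a href="https://other.example/">Away</a>'
)
print(sorted(collector.paths))
# prints: ['/admin/users/', '/help.php']
```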
Advanced Options
I’ll skip the first two tabs, which are self-explanatory, and start with the tab that’s active in the screenshot above…
Http Options
First, DirBuster allows you to add custom headers to your requests so you could, for example, add an authenticated session management cookie. Whoa! Did you say run an automated scanning tool authenticated? Yes I did. After getting a feel of the site you may be comfortable doing this – it can pull out some interesting finds (as shown by the case studies at the start of this article). Anything you find authenticated that you didn’t find unauthenticated is really worth a look. Although the risk of side effects is much lower than running a full-on active web application scanner authenticated across a site, of course I have to say that it’s not without risk! I disable “Parse HTML” and “Be Recursive” as a safety measure.
Underneath is the “Http User Agent” and you can see the default looks nothing like a real User-Agent string. If you’re getting odd results from DirBuster that you’re not seeing in Burp, you could try changing that option, e.g. to “Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0”.
Lastly, the option to use a proxy is useful for troubleshooting – as well as learning! You could also take advantage of your upstream proxy’s features to handle more complex cases (adding an overhead, of course).
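If you want to reproduce a single request outside DirBuster with the same custom headers, for instance to compare the response against what you see in Burp, something like the following sketch will do. It only builds the request and doesn’t send it. The cookie name and value are placeholders, and build_request is an illustrative helper of my own.

```python
import urllib.request


def build_request(url, session_cookie=None, user_agent=None):
    """Build (but do not send) a GET request carrying custom headers."""
    headers = {}
    if session_cookie:
        headers["Cookie"] = session_cookie
    if user_agent:
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_request(
    "https://www.site.com/admin/",
    session_cookie="SESSIONID=placeholder-value",  # hypothetical cookie name and value
    user_agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0",
)
# urllib normalises header names, so "User-Agent" is stored as "User-agent".
print(request.header_items())
```

Passing the request to urllib.request.urlopen() would send it, which you should only do against a target you’re authorised to test.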
Scan Options
Here lies the all-important “Fail Case String”, which by default is “thereIsNoWayThat-You-CanBeThere”. The response from this page is used to determine whether or not a guessed page/directory is there so it’s critical for the success of the scan. DirBuster will request this often in fact – for every file type in every directory that it finds. So starting from / with all the scan options enabled (directories, files, recursive and blank), having found /admin/users/, for example, DirBuster will request:
/admin/users/thereIsNoWayThat-You-CanBeThere/
/admin/users/thereIsNoWayThat-You-CanBeThere
/admin/users/thereIsNoWayThat-You-CanBeThere.php
If you’re getting strange results from DirBuster, consider changing this string. It may even be worth getting into the habit of manually testing the fail case string as a directory and page before you start a lengthy scan.
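For the manual check, this sketch lists the fail-case paths for a given directory so you can request each one by hand and see what comes back. It simply mirrors the three requests listed above. The function name and parameters are illustrative.

```python
FAIL_CASE = "thereIsNoWayThat-You-CanBeThere"  # DirBuster's default fail case string


def fail_case_paths(directory, extensions, dirs=True, blank_extension=True,
                    fail_case=FAIL_CASE):
    """List the fail-case paths worth checking by hand for one directory."""
    if not directory.endswith("/"):
        directory += "/"
    paths = []
    if dirs:
        paths.append(f"{directory}{fail_case}/")
    if blank_extension:
        paths.append(f"{directory}{fail_case}")
    for ext in extensions:
        paths.append(f"{directory}{fail_case}.{ext.lstrip('.')}")
    return paths


for path in fail_case_paths("/admin/users/", ["php"]):
    print(path)
# prints:
# /admin/users/thereIsNoWayThat-You-CanBeThere/
# /admin/users/thereIsNoWayThat-You-CanBeThere
# /admin/users/thereIsNoWayThat-You-CanBeThere.php
```

If any of these returns a 200, a redirect or anything else that looks like a real page, expect false positives or false negatives until you change the string or the scan settings.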
DirBuster Options
The last tab serves as a reminder that most of the Options and Advanced Options discussed above get reset when you re-start DirBuster. Only the proxy settings persist beyond the options listed in this tab, which cover the default number of threads, dictionary and file extensions. These options will be pre-populated when you start DirBuster from fresh. Although you’ll lose many of your options on restart, being forced to reconsider them maybe isn’t such a bad thing.
And finally
It’s worth starting DirBuster relatively early on in the test because it can take a while to complete, and obviously you want some time left over to explore anything interesting it finds. Keep an eye on the results while it’s running to make sure you’re getting something sensible – and that you’re not causing a slew of 500 errors. Version 1.0 RC1 will pause automatically after 20 consecutive errors but that’s client-side errors, not 500 responses. Equally if you’re getting mostly redirects, try to alter your parameters or, as a last resort, enable the “Follow Redirects” option.
Despite – or because of – your efforts to optimise your scan, you can often get a large number of hits. On the reporting side, the CSV option is useful because you get the Location, Response Code and Content Length on one line so you can quickly begin to process this and weed out the cruft.
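As a sketch of that sort of post-processing, the snippet below filters and sorts rows from a CSV report. The sample rows are made up, and the column headers are assumed from the three fields mentioned above, so check them against the header line of your own report before relying on this.

```python
import csv
import io

# Made-up sample rows; the header names follow the three fields mentioned
# above and may not match the exact header DirBuster writes.
SAMPLE_REPORT = """Location,Response Code,Content Length
/admin/,200,5120
/images/,403,310
/old/,302,0
/staff/,200,18233
"""


def interesting_hits(report_file, ignore_codes=("403", "404")):
    """Return rows whose response code is not in ignore_codes, largest first."""
    rows = [
        row for row in csv.DictReader(report_file)
        if row["Response Code"] not in ignore_codes
    ]
    return sorted(rows, key=lambda row: int(row["Content Length"]), reverse=True)


for row in interesting_hits(io.StringIO(SAMPLE_REPORT)):
    print(row["Response Code"], row["Content Length"], row["Location"])
# prints:
# 200 18233 /staff/
# 200 5120 /admin/
# 302 0 /old/
```

Which codes to ignore is a judgment call for each test. A 403 can be worth a look in its own right, so adjust ignore_codes to suit.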
Finally, note that you can invoke a command line interface by running DirBuster in headless mode. Check out the options with java -jar <DirBuster_jar_file> -h. The parameters don’t comprehensively match the GUI options, though, so if you need a command-line scanner of this type and DirBuster isn’t up to the job, try dirb (on Kali).
Unfortunately DirBuster is an inactive project.
However, we (the OWASP ZAP Team) have essentially forked it. It is now included in the ZAP Marketplace as a ZAP add-on rather than as a stand-alone tool.
I’ve updated the DirBuster homepage with this info.
Simon
Nice explanation
Thank you for sharing this information.