It’s with great regret that I need to shut down the hardware and say goodbye to my gitdigger project. There are a few reasons, and I will try to address them in this post.

Bandwidth

Where I am located, my only option is DSL (and yes, I know how much this service sucks). After two years the family is a bit tired of the slow internet and the crappy-looking Netflix while I hog all the bandwidth. The only fix would be to purchase a second business account (either cable or DSL), both of which are pretty pricey.

Time

There are so many changes I’d like to make to the code base to make it better and faster, but I just don’t have the time to do so. At this point any change requires a good bit of thought about how to apply it across the existing dataset, which is huge. I also don’t have time to write the code to pull out all the exploits and interesting things I find, as I get more and more into my new job.

Cost

To truly take this project to the next step, it’s going to require funding for bandwidth, hosting, and data analytics. Without funding I’ll only ever be able to look at a small (and outdated) part of the picture.

Desire

I think it’s time for something else, perhaps something with a bit more of a challenge. Not much more I can say about this, but I’m sure many of you will understand what I mean.

I want to thank everyone who has helped support this project over the last few years. I’ve had a great time giving talks about it and the many things the data can be used for (both good and bad).

Thank you,

Jaime

After DEF CON I thought it would be wise to maybe blog about my Vegas antics and the classes I attended. I also thought it would be a smart idea to update Octopress. The former was a good idea; the latter caused my site to die. I’ve rebuilt everything from scratch with the latest and greatest, so if you notice any errors on the site please let me know.

Thank you!

At my new job (and in some of my personal projects) I found myself looking at a ton of text. Upon reviewing all this data, which could be anything from JSON to CSV to just plain text, my eyes tend to bleed. It can quite quickly become confusing which data I’ve looked at or how that data relates to everything else.

With all of that being said, it was time to find a new way of looking at things. This is where python-graphviz has come in handy. The first thing I needed to do (after spending some time Googling and then finally learning Graphviz anyway) was to find a way to normalize my data.

In an attempt to keep things nice and simple, we are just going to look at some basic data in a CSV.

The CSV Data

FC_EVENT,jar_cache234234343.tmp,WindowsXP
FC_EVENT,jar_cache543454345.tmp, Windows7
FC_EVENT,~spawn098343434.tmp,WindowsXP
FC_EVENT,~spawn093565831.tmp,Windows7
FD_EVENT,fred.txt,WindowsXP
FD_EVENT,wilma.txt,Windows7
FD_EVENT,wilma.txt,WindowsXP
FD_EVENT,betty.txt,WindowsXP
FD_EVENT,betty.txt,Windows7
FD_EVENT,betty.txt,WindowsXP

The Python Code

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import hashlib
import argparse
from graphviz import Digraph

def setup():
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--file", action="store", dest="file", required=True, help="File to create dot file from")
    global args
    args = parser.parse_args()

def main():
    setup()
    with open(args.file, 'r') as fp:
        fp_lines = fp.readlines()

    normalized_data = []

    # normalize each CSV row into a dict, skipping comments and blank lines
    for line in fp_lines:
        if not line.strip().startswith("#") and line.strip() != "":
            line_data = line.strip().split(",")
            tmp_data = {
                "action": line_data[0],
                "file": line_data[1],
                "os": line_data[2],
            }
            normalized_data.append(tmp_data)

    dot = Digraph(comment='No Comments Here')
    dot.node_attr['shape'] = 'box'
    dot.graph_attr['ranksep'] = '1.5'
    dot.graph_attr['nodesep'] = '0.8'
    dot.graph_attr['splines'] = "ortho"
    n = 0
    lookup = {}
    existing_edges = []

    if normalized_data:
        # sort on the field values so the output is stable between runs
        for root in sorted(normalized_data, key=lambda d: (d["action"], d["file"], d["os"])):
            # create a node for each value we haven't seen before
            for itm in root.items():
                if itm[1] not in lookup:
                    lookup[itm[1]] = n
                    dot.node('n%d' % n, '%s' % itm[1])
                    n += 1

            # hash each edge's endpoints so duplicate edges only get drawn once
            t_hash = hashlib.md5()
            t_hash.update(("%s%s" % (root["action"], root["file"])).encode("utf-8"))
            if t_hash.hexdigest() not in existing_edges:
                existing_edges.append(t_hash.hexdigest())
                dot.edge('n%s' % lookup[root["action"]], 'n%s' % lookup[root["file"]])

            t_hash = hashlib.md5()
            t_hash.update(("%s%s" % (root["file"], root["os"])).encode("utf-8"))
            if t_hash.hexdigest() not in existing_edges:
                existing_edges.append(t_hash.hexdigest())
                dot.edge('n%s' % lookup[root["file"]], 'n%s' % lookup[root["os"]])

    # write the dot source once, after the whole graph has been built
    with open('%s.dot' % os.path.basename(args.file), 'w') as fp:
        fp.write(dot.source)

if __name__ == "__main__":
    main()
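
Running the script against the CSV above is as simple as this (csv_to_dot.py is just a stand-in for whatever you name the script):

python csv_to_dot.py -f mydata.csv

Since the output is named after the input file, the Graphviz source ends up in mydata.csv.dot in the current directory.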

The Pretty Results

The following results can be saved as a .dot file and then opened directly in Graphviz in order to see the pretty chart we just made.

// No Comments Here
digraph {
  graph [nodesep=0.8 ranksep=1.5 splines=ortho]
  node [shape=box]
      n0 [label=FC_EVENT]
      n1 [label=WindowsXP]
      n2 [label="jar_cache234234343.tmp"]
          n0 -> n2
          n2 -> n1
      n3 [label=" Windows7"]
      n4 [label="jar_cache543454345.tmp"]
          n0 -> n4
          n4 -> n3
      n5 [label=Windows7]
      n6 [label="~spawn093565831.tmp"]
          n0 -> n6
          n6 -> n5
      n7 [label="~spawn098343434.tmp"]
          n0 -> n7
          n7 -> n1
      n8 [label=FD_EVENT]
      n9 [label="betty.txt"]
          n8 -> n9
          n9 -> n5
          n9 -> n1
      n10 [label="fred.txt"]
          n8 -> n10
          n10 -> n1
      n11 [label="wilma.txt"]
          n8 -> n11
          n11 -> n5
          n11 -> n1
}
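
If you would rather render the chart straight from Python instead of opening the .dot file by hand, the graphviz package can drive that too. A minimal sketch, assuming the Graphviz binaries are installed and on your PATH (the file names here are just examples):

from graphviz import Source

# read back the .dot source we generated and render it to a PNG
src = Source(open('mydata.csv.dot').read())
src.format = 'png'
src.render('mydata_graph', cleanup=True)  # writes mydata_graph.png

The plain dot command line tool works just as well: dot -Tpng mydata.csv.dot -o mydata.png.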

Summary

This is a VERY simple example of what you can do with python-graphviz. I would also like to mention that I used hashlib to help make sure multiple edges between the same nodes did not get created, which wasn’t really needed in this example, but may prove useful later on with larger datasets.
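
The trick boils down to hashing an edge’s two endpoints into a stable key and keeping a record of the keys you have already drawn. A stripped-down sketch of just that idea (the sample pairs are made up):

import hashlib

seen = set()

def edge_key(src, dst):
    # md5 of the concatenated endpoints gives a short, stable key per edge
    return hashlib.md5(("%s%s" % (src, dst)).encode("utf-8")).hexdigest()

for src, dst in [("FD_EVENT", "betty.txt"), ("FD_EVENT", "betty.txt")]:
    key = edge_key(src, dst)
    if key not in seen:
        seen.add(key)
        print("drawing edge %s -> %s" % (src, dst))  # fires only once

With data this small a plain set of (src, dst) tuples would do the same job, but short fixed-size keys help keep memory in check once you are tracking millions of edges.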

The Book:

The Hacker Playbook was written by Peter Kim. You can find it on Amazon in Paperback and Kindle versions HERE.

The Good:

The Hacker Playbook is filled with tips and tricks, from setting up your pentesting rig to virus scanner evasion. This book covers a wide range of tools (almost all open source) with a very nice section of resources for keeping up to date with the latest security news.

The Bad:

I’m not much of a sports fan, and the chapters are broken up using football terminology.

Summary:

This is a book that everyone should have in his or her collection. It’s laid out well, and even though it’s meant to be read from start to finish, it makes for a great reference book.

ENJOY!

The Book:

RTFM: Red Team Field Manual was written by Ben Clark. You can find it on Amazon in Paperback version HERE.

The Good:

The Red Team Field Manual (RTFM) is a no-fluff but thorough reference guide for serious Red Team members who routinely find themselves on a mission without Google or the time to scan through a man page. The RTFM contains the basic syntax for commonly used Linux and Windows command-line tools, but it also encapsulates unique use cases for powerful tools such as Python and Windows PowerShell. The RTFM will repeatedly save you time looking up hard-to-remember Windows nuances such as the wmic and dsquery command-line tools, key registry values, scheduled task syntax, startup locations and Windows scripting.

The Bad:

This is not a book you purchase to sit down and read from front to back. It’s more of a desk reference for when you get stuck and have no access to Google. It is a peek inside the mind of a red teamer and his notes, and it shows in the layout of the book.

Summary:

I would recommend this book for those just starting out or looking for other ways of doing things. I personally found some good tips and tricks that I didn’t know, which made the book totally worth the purchase.

ENJOY!