One of the most important techniques when gathering intelligence (i.e. intel) from open source repositories is analytic reading. The second is properly presenting that intel with relevant context.
This isn’t the kind of reading you do in the summer with a children’s book and a litter of rug rats gathered at your feet. This is the kind of reading where you look for hints or clues about a person based on phrasing or word choice. You don’t need a degree in psychology or grammar to do this; you simply have to pay attention, take notes, and apply a little common sense.
Let’s take my request for help from the #InfoSecFam on ideas for my first blog. Here were the responses I got (thank you to the brave souls who dare support me) :p
- Well, you could start with those lovely examples of people posting pics of credit cards…
- Then folks posting about going on vacation on their facebooks…
- Maybe some military types posting pics with intact exif?
- <graphic> #internetfeds
- google hacking is still incredibly viable, and it’s a huge OSINT fail.
- specifically anonymous FTP servers indexed by google.
- <graphic> bad admins everywhere. Really bad. Ive seen some sh1t man
- Boarding passes are now a big thing… “I Know Where You LIve: all the sh*t that people post”
- You could do reviews of OSINT web-tools
- ok, an oldie being forgotten, ‘don’t run with admin/root’.
Just a Little Intel…
So, let’s analyze what we’ve read. [Note: this example is trivial; the principles presented are not.]
- Q1: What’s the culture/industry of the authors here?
- A1: #InfoSec
- Q2: What are underlying characteristics of this group’s communication styles?
- A2: InfoSec culture is heavily sarcastic
- Q3: Are there clues to anyone’s profession/hobby listed in these comments?
- A3: Yes – acronym and word choice: FTP, intact exif, bad admins everywhere, ‘admin/root’
- Q4: Any clues to age or experience?
- A4: Yes – still incredibly viable, an oldie being forgotten
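The word-choice check in A3 can be roughed out in a few lines of Python. This is only a sketch: the `JARGON` watch list and the `flag_jargon` helper are made up for illustration, and a keyword match merely points an analyst at lines worth a closer read. It does not do the reading for you.

```python
import re

# Hypothetical watch list: terms that hint at an IT/security background.
JARGON = ["ftp", "exif", "admin", "root", "osint", "google hacking"]

# A few of the replies quoted above.
comments = [
    "Maybe some military types posting pics with intact exif?",
    "specifically anonymous FTP servers indexed by google.",
    "You could do reviews of OSINT web-tools",
    "Then folks posting about going on vacation on their facebooks…",
]


def flag_jargon(text, terms=JARGON):
    """Return the watch-list terms that appear in text as whole words."""
    lowered = text.lower()
    return [t for t in terms
            if re.search(r"\b" + re.escape(t) + r"\b", lowered)]


for comment in comments:
    print(flag_jargon(comment), "<-", comment)
```

The vacation comment comes back with no hits, which is exactly why the tool is only a first pass: that line still says something about the author's view of oversharing, and only a human reader will catch it.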
The list of questions above is a trivial example of how to glean the not-so-obvious intel that is implied. The questions asked and answered should be driven by at least two things: a profile template and a threat model [otherwise you’re out there going all willy-nilly and traipsing through minefields of soggy cow patties]. SO! Before you even start gathering intel, your leadership should have identified WHAT they want to know (identified in the threat model) and HOW you will collect it (defined in the documentation standards and profile template). As you answer these very valuable questions, you’re looking for the same data points every time, essentially filling in the pieces of a puzzle one at a time. Keep in mind that they may not all be present, but at least you’re looking for them. As you find them, capture them in the profile template.
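To make "the same data points, all the time" concrete, here is one way a profile template might look in Python. The class name `ProfileTemplate` and its fields are assumptions that simply mirror Q1 through Q4 above; a real template would come from your own threat model and documentation standards.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ProfileTemplate:
    """Hypothetical template; the fields mirror questions Q1-Q4 above."""
    subject: str
    culture_industry: Optional[str] = None        # Q1
    communication_style: Optional[str] = None     # Q2
    profession_clues: List[str] = field(default_factory=list)   # Q3
    experience_clues: List[str] = field(default_factory=list)   # Q4

    def missing(self):
        """Names of the data points that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


profile = ProfileTemplate(subject="thread respondents")
profile.culture_industry = "InfoSec"
profile.profession_clues += ["FTP", "intact exif", "admin/root"]

# The still-empty pieces of the puzzle.
print(profile.missing())
```

The `missing()` helper is the useful part: it keeps the gaps visible, so a data point that was never found is not mistaken for one that was never looked for.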
The list of questions could go on and on depending on how much of the ocean you’re planning on boiling. Tools such as the IBM Tone Analyzer (demo link here) or the IBM Personality Analyzer can offer valuable insight and may enhance or even expedite the analysis process, but they cannot replace an Analyst’s instinct and skills of discernment. It is the Analyst who reads something and decides what “box” to put it in: whether it is relevant, indicates personality traits, warrants in/exclusion, or is a thread that needs to be pulled to see what else unravels.
Takeaway: Read closely, carefully, and never underestimate the human factors at work. Read between the letters AND the lines. You may find the clues you need when building a profile or finding a target simply in the nuances of their tiniest commentary.
So let’s talk about the biggest mistake with the list… It’s in numerical order! If you were reading this only as an OSINT report, you might think these came from 10 different people, or that one person provided all 10 ideas. By creating a pure LIST of comments rather than a LIST with logical grouping, we lose context, because several of the comments were made by the same individuals.
Let’s fix that….
P1-1. Well, you could start with those lovely examples of people posting pics of credit cards…
P1-2. Then folks posting about going on vacation on their facebooks…
P1-3. Maybe some military types posting pics with intact exif?
P2-1. <graphic of chat> #internetfeds
P2-2. <graphic of man hiding in a chair> bad admins everywhere. Really bad. Ive seen some sh1t man (BTW @MyTinehNimjeh I <3 u man LOL)
P3-1. google hacking is still incredibly viable, and it’s a huge OSINT fail.
P3-2. specifically anonymous FTP servers indexed by google.
P4-1. Boarding passes are now a big thing… “I Know Where You LIve: all the sh*t that people post”
P5-1. You could do reviews of OSINT web-tools
P6-1. ok, an oldie being forgotten, ‘don’t run with admin/root’.
Now you can see that 6 people replied, not 10. (P# is the person: Person 1, Person 2, Person 3, and so on. The -1, -2 suffix is the number of that person’s comment.)
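If the comments carry labels like the ones above, the regrouping is easy to check mechanically. The sketch below assumes the `P<person>-<comment>.` labeling used in this article; the `group_by_person` helper and the `LABEL` pattern are made up for illustration.

```python
import re
from collections import defaultdict

# The labeled comments from the regrouped list above.
labelled = [
    "P1-1. Well, you could start with those lovely examples of people posting pics of credit cards…",
    "P1-2. Then folks posting about going on vacation on their facebooks…",
    "P1-3. Maybe some military types posting pics with intact exif?",
    "P2-1. <graphic of chat> #internetfeds",
    "P2-2. <graphic of man hiding in a chair> bad admins everywhere. Really bad. Ive seen some sh1t man",
    "P3-1. google hacking is still incredibly viable, and it’s a huge OSINT fail.",
    "P3-2. specifically anonymous FTP servers indexed by google.",
    "P4-1. Boarding passes are now a big thing…",
    "P5-1. You could do reviews of OSINT web-tools",
    "P6-1. ok, an oldie being forgotten, ‘don’t run with admin/root’.",
]

# "P", person number, "-", comment number, ".", then the comment text.
LABEL = re.compile(r"^P(\d+)-(\d+)\.\s*(.*)$")


def group_by_person(lines):
    """Map each person number to the list of comments that person made."""
    groups = defaultdict(list)
    for line in lines:
        match = LABEL.match(line)
        if match is None:
            continue  # skip anything that does not carry a P#-# label
        groups[int(match.group(1))].append(match.group(3))
    return dict(groups)


groups = group_by_person(labelled)
counts = {person: len(items) for person, items in groups.items()}

print(len(labelled), "comments from", len(groups), "people")
print(counts)
```

The per-person counts are what a flat list hides: Person 1 alone accounts for three of the ten comments.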
Additionally, this context exposes something taken for granted by the statisticians and API monkeys: it isn’t always the total volume that matters. Sometimes it’s the volume from one person, or even the lack of replies to others who may have forked a conversation thread. If this thread were reported as a statistic stating that there were 10 comments, that would also be incorrect. There were actually a few different forks: some took a humorous path, others were simply “neutral” suggestions, AND there were more than 10 interactions in total. This list, however, represents only the comments that were actually relevant to the request for ideas, which were extracted and placed in this article. Again, in your OSINT reports, ensure you represent relevant intel accurately and give the reader proper context through commentary and presentation.
Takeaway: ensure that HOW you present data in a report represents it with as much relevant context as possible.