
OSINT Data Sources: Trust but Verify

Thanks to @seamustuohy and @ginsberg5150 for editorial contributions

For new readers, welcome, and please take a moment to read a brief message From the Author.  This article’s primary audience is analysts; however, if you are in leadership and seek to optimize or maximize the analysis your threat intelligence program is producing, you may find this walk-through valuable.


In the recent blog Outlining a Threat Intel Program, steps 4 and 5 discussed identifying information needs/requirements and resources.  This blog is going to expand on the latter, identifying the resources.  Previously we discussed that when identifying resources, you start with what you have inside your organization and that sometimes the information is in a source to which you do not have access or it is sold by a vendor.  There is another category not previously mentioned, an obvious one: open source, freely available (usually on the Internet).  Free is good. Free is affordable. Free can be dangerous.  Before you decide to trust/use a free source you need to adequately vet that source.  Below I will talk about a recent project, a free source that was considered, how it was vetted, and the takeaways, which are 1) ensure accuracy and completeness of OSINT sources before they are considered reliable or relevant; 2) know the limitations of your OSINT sourced data; and 3) thoroughly understand any filtering and calculations that occur before source data is provided to you.


Raw Data on github here.  file name 20170719_OSINT_Data_Sources_Trust_but_Verify_Dyn_Original_Data.7z

Over the years I have worked on a few projects with ties to the Ukraine.  During a conversation with colleagues, we came up with a question we wanted to answer:  Is there any early warning indicator of a cyb3r attack on a power grid that might be found in network outages?  We reasoned that even though businesses, data centers and major communications companies have very robust UPS/generators available, an extended power outage would surely tap them and thereby cripple communications, which could have a ripple effect during an armed conflict, biological epidemic, civil unrest etc.  So, I decided to see what data was out there with respect to network outages and then collect data related to power outages.  I settled on my first source, Dyn’s Internet Outage bulletins from here (filtered on Ukraine).  According to their site, the organization “helps companies monitor, control, and optimize online infrastructure for an exceptional end-user experience. Through … unrivaled, objective intelligence into Internet conditions…”

The effort to script this versus the time it would take to copy/paste the few pages of data I needed to retrieve led me to opt to do this old school and copy the data straight off the web pages.  If this is something that I find I will do more than two or three times, then I will consider asking for API access and automating this task. However, for the time being, this was a one-off-I-am-curious-and-bored investigation & manual labor settled it.  I scraped the approximately 480 entries available at the time, going back to the first entries in March 2012 (there are more now of course).

I used a mix of regex-fu & normal copy/paste to morph the data into semicolon-delimited lines.  I chose semicolons because there were already commas, spaces, and dashes in the data that I was not sure I wanted to touch.  Since I had fewer than 2 million rows of data and I love MS Office, semicolons also let me manipulate the data in MS Excel more easily.

Below is an example of a typical entry:

6 networks were restored in the Ukraine starting at 16:03 UTC on April 11. This represents less than 1% of the routed networks in the country.

100% of the networks in this event reached the Internet through: UARNet (AS3255).

Let the metamorphosis begin:

replace -> networks were restored in the Ukraine starting at ->

6;16:03 UTC on April 11. This represents less than 1% of the routed networks in the country.

100% of the networks in this event reached the Internet through: UARNet (AS3255).

delete ->  on

6;16:03 UTC April 11. This represents less than 1% of the routed networks in the country.

100% of the networks in this event reached the Internet through: UARNet (AS3255).

replace -> . This represents -> YYYY; [I saved each year’s data as separate text files to make this search/replace easier later so that I can do one year per file]

6;16:03 UTC April 11 YYYY; less than 1% of the routed networks in the country.

100% of the networks in this event reached the Internet through: UARNet (AS3255).

replace -> less than -> < (the less-than symbol)

6;16:03 UTC April 11 YYYY; < 1% of the routed networks in the country.

100% of the networks in this event reached the Internet through: UARNet (AS3255).

replace ->  of the routed networks in the country. (including the newline characters) -> ;

6;16:03 UTC April 11 YYYY; < 1% ; 100% of the networks in this event reached the Internet through: UARNet (AS3255).

replace ->  of the networks in this event reached the Internet through:  -> ;

6;16:03 UTC April 11 YYYY; < 1% ; 100;UARNet (AS3255).

replace ->  (AS -> ;

6;16:03 UTC April 11 YYYY; < 1% ; 100;UARNet;3255).

delete -> ).

6;16:03 UTC April 11 YYYY; < 1% ; 100;UARNet;3255


At this point, each of the individual files contains mostly lines that look like the one above, with the added caveat that I replaced YYYY with the appropriate year for each file.
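For readers who would rather script the metamorphosis than do it by hand, the search/replace chain above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the names STEPS and to_semicolon_row are mine, the patterns follow the resulting lines shown above and only cover the “were restored” wording, and the year 2017 in the usage line is a placeholder because the example entry does not state its year.

```python
import re

# Ordered (pattern, replacement) pairs mirroring the manual search/replace steps.
# They only handle the "were restored" wording from the example entry.
STEPS = [
    (r" networks were restored in the Ukraine starting at ", ";"),
    (r" on ", " "),
    (r"\. This represents ", " YYYY; "),
    (r"less than ", "< "),
    # \s* also swallows the newline(s) between the two sentences.
    (r" of the routed networks in the country\.\s*", " ; "),
    (r"% of the networks in this event reached the Internet through: ", ";"),
    (r" \(AS", ";"),
    (r"\)\.", ""),
]


def to_semicolon_row(entry: str, year: int) -> str:
    """Turn one bulletin entry into a semicolon-delimited row."""
    row = entry.strip()
    for pattern, replacement in STEPS:
        row = re.sub(pattern, replacement, row)
    # Each year's data lived in its own file, so the caller supplies the year.
    return row.replace("YYYY", str(year))


entry = (
    "6 networks were restored in the Ukraine starting at 16:03 UTC on April 11. "
    "This represents less than 1% of the routed networks in the country.\n\n"
    "100% of the networks in this event reached the Internet through: UARNet (AS3255)."
)
print(to_semicolon_row(entry, 2017))  # 2017 is a placeholder year
```

A scripted version like this fails quietly on wording it does not expect, so the output still needs the same eyeball check described next.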

And like any good analyst I began to scroooooooolllllllllll through over 400 lines hoping my eyes would not catch anything that wasn’t uniform.  Of course, I wasn’t that lucky, and instead I found entries that were ALMOST worded the same, but just different enough to make me wonder if this “reporting” was automated or if it was human.

  • 59 networks experienced an outage…
  • 30 networks experienced a momentary outage…
  • 6 networks were restored…

Do you see it?  I now had three kinds of entries: “restored”, which implies that there was an outage, along with “outage” and “momentary outage”.  I performed a few more rounds of semicolon replacement so that I could keep moving and noted the issues.
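Instead of scrolling, the wording variants could be counted with a small script. The sketch below is my own illustration, not something I ran against the full data set; the names ENTRY_PATTERN, KIND_LABELS and classify are hypothetical, and the three sample strings are the truncated bullets above.

```python
import re
from collections import Counter

# The three wordings seen in the bulletins; anything else is flagged for review.
ENTRY_PATTERN = re.compile(
    r"^(?P<count>\d+) networks "
    r"(?P<kind>were restored|experienced an outage|experienced a momentary outage)"
)
KIND_LABELS = {
    "were restored": "restored",
    "experienced an outage": "outage",
    "experienced a momentary outage": "momentary outage",
}


def classify(entry: str) -> str:
    """Return the kind of entry, or 'unrecognized' if the wording is new."""
    match = ENTRY_PATTERN.match(entry.strip())
    if match is None:
        return "unrecognized"
    return KIND_LABELS[match.group("kind")]


samples = [
    "59 networks experienced an outage…",
    "30 networks experienced a momentary outage…",
    "6 networks were restored…",
]
print(Counter(classify(sample) for sample in samples))
```

Anything that comes back as “unrecognized” is a line worth reading by hand before it goes into the spreadsheet.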


I noted that I had inconsistencies in the entries, but I wasn’t exactly ready to write off the source.  After all, they are a large, global, reputable firm and I believed that there could be some reasonable explanation, and continued to prepare the data for analysis.

The statement “This represents less than 1%” isn’t an exact number, so I reduced the statement to “;<1%” to get it into my spreadsheet for now.  I wanted to perform some analysis on the trends related to the impact of these outages and planned to use the ‘percentage of networks affected’ value.  To do this, I was going to need a number, and converting them seemed like it should be relatively easy.  I considered that new networks are probably added regularly, although routable networks not very often, and decided I should try to estimate the number of networks that existed just prior to any reported outage/restoration based on a known number.  So, I turned to the statements that were more definite.  Unfortunately, to my surprise, there were some (more) serious discrepancies.  For example, on 5/8/2012 there were two entries, one for 14:18 UTC, the other for 11:39 UTC.  The first stated that 97 networks were affected, which represented 1% of the routed networks in the country (not less than 1%, exactly 1%), and math says 97/.01 = 9,700 networks.  The second entry stated that 345 networks were affected, which was 5% of the routed networks, and math says that 345/.05 = 6,900 networks.  How could this be?  In less than three hours, did someone stand up 2,800 routable networks in Ukraine?  I don’t think so.  I performed the same calculations on other definite numbers for entries very close together, most of them on the same day, and found the same problem in the data.  This is no small discrepancy, and I can only assume that the percentages shown on the web page have undergone some serious rounding or truncation.  I therefore decided to remove this column from the data set and not perform any analysis related to it.  Instead I kept the value that, for the time being, I trusted: the exact number of networks affected.
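The arithmetic behind that discrepancy, and one way to test the rounding suspicion, can be sketched as follows. The function names are hypothetical, and the half-point window is an ASSUMPTION that the site rounds to the nearest whole percent; I do not know how Dyn actually computes or rounds the figure.

```python
def implied_total(affected: int, percent: float) -> float:
    """Total routed networks implied by 'affected is percent% of the total'."""
    return affected * 100 / percent


def implied_range(affected: int, percent: float, half_width: float = 0.5):
    """Range of totals if the published percent was rounded to a whole number.

    ASSUMPTION: round-to-nearest, so the true value is within +/- half_width.
    """
    low = affected * 100 / (percent + half_width)
    high = affected * 100 / (percent - half_width)
    return low, high


total_1418 = implied_total(97, 1)   # 14:18 UTC entry
total_1139 = implied_total(345, 5)  # 11:39 UTC entry
print(total_1418, total_1139, total_1418 - total_1139)

range_1418 = implied_range(97, 1)
range_1139 = implied_range(345, 5)
# The ranges overlap when the larger lower bound is below the smaller upper bound.
overlap = max(range_1418[0], range_1139[0]) <= min(range_1418[1], range_1139[1])
print(range_1418, range_1139, overlap)
```

Under that rounding assumption the two ranges overlap (roughly 6,467 to 7,667 networks), which fits the suspicion that the published percentages are simply too coarse to back out a reliable total.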

In the data set, there were 10 entries that did not list a percentage of networks that were able to reach the Internet either during the outage or after the restoration (the range of networks affected was from 7 to 105).  These 10 entries also lacked data for the affected ASN(s).  To avoid skewing the analysis, these incomplete entries were removed from the data set.  There was one entry of 23 affected networks on 2/17/2014 where the affected ASN was not provided but all other information was available, and an entry of UNK (unknown) was entered for this value.

Additional name standardization was completed where the AS number (ASN) was the same, but the AS names varied slightly such as LLC FTICOM vs. LLC “FTICOM”.
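A minimal sketch of that kind of standardization is below. It assumes the rule “strip quote characters, then keep the most frequently seen name per ASN”, which is my illustration rather than the exact procedure used. The function name is hypothetical, and 64500 is a documentation-reserved AS number used as a stand-in, not the real ASN for FTICOM.

```python
from collections import Counter, defaultdict

# Straight and curly double quotes that may wrap part of an AS name.
QUOTE_CHARS = '"“”'


def canonical_names(rows):
    """Map each ASN to its most frequently seen name after stripping quotes."""
    seen = defaultdict(Counter)
    for asn, name in rows:
        cleaned = "".join(ch for ch in name if ch not in QUOTE_CHARS)
        cleaned = " ".join(cleaned.split())  # collapse repeated whitespace
        seen[asn][cleaned] += 1
    return {asn: names.most_common(1)[0][0] for asn, names in seen.items()}


# 64500 is a placeholder ASN, not the real one for this provider.
rows = [(64500, "LLC FTICOM"), (64500, "LLC “FTICOM”")]
print(canonical_names(rows))
```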

The final data set content that remained was 97.8947% of the original data set with data from March 22, 2012 16:55 UTC to July 5, 2017 21:40 UTC.

Let’s review the list of concerns/modifications with the data set:

  1.  inconsistent reporting language
  2.  lack of corresponding outage and restoration entries
  3.  inconsistent calculations for the percentage of routable networks affected
  4.  incomplete entries in the data set
  5.  AS names varied while the AS number remained the same


First, I started with the low hanging fruit.

  • Smallest number of networks affected: 6
  • Largest number of networks for one ASN affected: 651 (happened on 5/18/2013)
  • Average number of networks affected: 74.93534483
  • How many times were more than one ASN affected? 253 which is slightly more than half of the total events.
  • Were there any outages of the same size for the same provider that were recurring? YES
    • ASN 31148, Freenet (O3), had three identical outages of 332 networks 10/2/2014, 1/8/2015, 11/25/2015
    • ASN 31148, Freenet (O3), had four identical outages of 331 networks 8/11/2014, 10/6/2014, 11/5/2015, & 8/3/2016
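The recurring-outage question is a simple grouping exercise. The sketch below uses only the seven Freenet events listed above; the function name and the (ASN, networks, date) tuple layout are assumptions for illustration.

```python
from collections import defaultdict


def recurring_outages(events, min_repeats=2):
    """Group events by (ASN, networks affected) and keep the sizes that repeat."""
    groups = defaultdict(list)
    for asn, networks, date in events:
        groups[(asn, networks)].append(date)
    return {key: dates for key, dates in groups.items() if len(dates) >= min_repeats}


events = [
    (31148, 332, "10/2/2014"),
    (31148, 332, "1/8/2015"),
    (31148, 332, "11/25/2015"),
    (31148, 331, "8/11/2014"),
    (31148, 331, "10/6/2014"),
    (31148, 331, "11/5/2015"),
    (31148, 331, "8/3/2016"),
]
for (asn, networks), dates in recurring_outages(events).items():
    print(asn, networks, len(dates), dates)
```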

At a glance, it seems like the low hanging fruit are what was expected, but I am still not sold that this data set is all that it is cracked up to be, and I decided to check for expected knowns.

The Ukraine power grid was hit on Dec 23, 2015 with power outage windows of 1-6 hours; however, the only network outages reflected in the data for Dec 2015 are on 1, 2, 8, 27, 30 & 31 December.  The next known major outage was more recent, just before midnight on 17 December 2016, and again, there is not a single network outage recorded in the data set.  This puzzled me, so I began to look for other network outages that one would reasonably expect to occur.  In India in 2012 there were power outages on 30 & 31 July; according to The Guardian, “Power cuts plunge 20 of India’s 28 states into darkness as energy suppliers fail to meet growing demand”, and if you follow the article, one of many, you get reports that power was out for up to 14 hours.  [I encourage readers to check sources, so you can find 41 citations on this wiki page].  Then I went back to the data source, filtered on India, and magically there were no reported network outages that paralleled the power outages.  Then I decided to start spot checking USA outages listed here because this list has the following criteria:

  1.     The outage must not be planned by the service provider.
  2.     The outage must affect at least 1,000 people and last at least one hour.
  3.     There must be at least 1,000,000 person-hours of disruption.

And sadly, I chose three different 2017 power outage events, none of which seemed to be reflected in the data set of network outages.  It was time to decide if the data set I wanted to use was going to meet my needs.  After reviewing what I had learned from normalizing and analyzing the data, I summarized the concerns:

  1.  inconsistent reporting language
  2.  lack of corresponding outage and restoration entries
  3.  inconsistent calculations for the percentage of routable networks affected
  4.  incomplete entries in the data set
  5.  AS names varied while the AS number remained the same
  6. known events aren’t found in the data set

All things considered, I opted not to use this OSINT data source for my analysis as there were too many discrepancies, especially the lack of “known” events being reflected.
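The “known events” check is easy to repeat against any candidate source. Below is a minimal sketch using the December 2015 dates mentioned earlier; the function name and the one-day tolerance are assumptions, and what counts as “close enough” in time is a judgment call for each analysis.

```python
from datetime import date, timedelta


def missing_known_events(known_events, observed_dates, tolerance_days=1):
    """Return known events with no observed record within the tolerance window."""
    observed = set(observed_dates)
    missing = []
    for event in known_events:
        window = {
            event + timedelta(days=offset)
            for offset in range(-tolerance_days, tolerance_days + 1)
        }
        if not window & observed:
            missing.append(event)
    return missing


# Days in Dec 2015 on which the data set showed Ukrainian network outages.
observed = [date(2015, 12, day) for day in (1, 2, 8, 27, 30, 31)]
# The known power grid event.
known = [date(2015, 12, 23)]
print(missing_known_events(known, observed))
```

An empty list would not prove the source is complete, but a non-empty one is a prompt to find out WHY before relying on it.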


Because this was a pseudo-personal endeavor that only paralleled some of the work I have previously done, I took a very loose approach to the analysis and did not verify up front that the data set I was considering met my needs regarding completeness & accuracy.  I did, however, assume that because the data was being made public by a large, reputable, global company, they themselves were exerting some level of quality control in the reporting and posting of the data.  This assumption of course is flawed.  Currently, I am unable to come up with an explanation for so many data points being absent from the dataset.  It is possible that this major global Internet backbone company simply has no sensors in the affected 71% of India or they are only in the 29% that weren’t affected.  Perhaps all their sensors in the Ukraine were down on the exact days of the power outages there.  It is also very possible that there were enough UPSs and generators in place in Ukraine and India that despite power going out, the networks never went down.  At this point, I’ve decided to search for answers elsewhere and am not spending time to try to figure out where the data is as this was just a personal project.


Data sources need to be vetted for accuracy and completeness before they are considered reliable or relevant.  When considering a source, check it for known events that should be reflected.  If you do not find these events and you wish to use the source, ensure you understand WHY the events are not being reflected.  Knowing the limitations of your OSINT sourced data is critical, and thoroughly understanding any filtering and calculations that occur before it is provided to you is just as vital to performing successful analysis.  This kind of verification and validation should also be repeated if a source is used in OSINT collection.  Just because a data source is not appropriate for the kind of analysis that you wish to do for one project does not mean that it is invalid for other projects.  All OSINT sources have some merit, even if it is that they are an example of what you DO NOT want.




Outlining a Threat Intel Program

(estimated read time 27min)

For new readers, welcome, and please take a moment to read a brief message From the Author.

Executive Summary

I recently crunched the high-level basics of setting up a threat intelligence (abbreviated as Threat Intel) program into a 9-tweet thread, which was met with great appreciation. The feedback I solicited unanimously agreed that I should expand on the thread in a blog, so here we go.

This blog elaborates on a nine-step process for creating a Threat Intel program. It is packed full of thought provoking questions, suggestions, and even a few lessons learned to help you avoid bumps in the road. The concepts shared here aren’t necessarily earth shattering; in fact they come from military experience, time spent in combat zones, 24/7 shifts in intelligence facilities, information assurance, governance/risk/compliance, and information security (InfoSec) programs in both government and civilian sectors. Additionally, I take every opportunity to pick the brain of someone (anyone) who has been doing threat intel or InfoSec and occasionally even sit still long enough to read a book, article, or paper on the topic. Threat Intel isn’t anything new. It’s been around since humans have been at odds with each other and doing anything from sending out spies to eavesdropping in a bar, but we seem to struggle with developing a program around it in the digital space. This blog aims to close that gap, and provide you a practical outline for designing your own Threat Intel program.


Many of you are used to the long-standing saying “You can have your project: fast, cheap, or right. You’re only allowed to choose two.” But what about quality? I remember when I first learned to drive, my mother gave me $5, told me to be back in 15 minutes, and to bring her some dish detergent. I ran to the store, grabbed the bargain brand, hurried back home and handed it to her. She looked and shrieked “What’s this!?” I learned more about dish detergent in the 15 minutes that followed than I care to remember. The lesson here is that I had completed the task on time, under budget, and provided exactly what she required. It was fast, cheap AND right, but it didn’t meet her preferred standard of quality.

Taking this lesson learned, I include a fourth constraint for tasks/projects: Quality. Imagine our four factors like a diamond, perfectly balanced, with four equal sections. The rules are simple: if you wish to increase volume in one of the sections, you must decrease volume in another. For this threat intel discussion we label our four sections: time, money, design/accuracy, and quality. Threat intel is rarely, if ever, black and white, therefore we will use the term ‘accuracy’ instead of ‘right’, as the latter implies binary ‘right or wrong’ thinking. As we discuss building out a Threat Intel program in this blog, we’ll refer back to our balanced diamond to help remind us of something Tim Helming so eloquently commented: that at the end of the day the micro (1’s & 0’s of threat hunting) has to translate to the macro (a valuable Threat Intel program that pays the bills).



The first tweet in the series starts simply with “list your top 3-5 assets”. This may sound very straightforward; however, I suspect that if you individually asked each C-level executive, you’d probably wind up with a very diverse list. Try to answer 1) what is it that your organization actually DOES and 2) what assets do you need to do it?

I’d encourage you to have your top two leadership tiers submit their answers via survey or host them at a collaborative meeting where all participants come with write-ups on their thoughts, then toss them out on a whiteboard to avoid “group think”. You can have as many as you want, but understand that when hunting threats, you are time constrained and the quality of data is important. There’s a finite value in automation, and at the end of the day threat analysts and threat hunters have “eyes on glass” reading, analyzing, interpreting, and reporting. If your list of “most critical assets” is longer than five (and usually three is optimal if there’s stark diversity) then the hunting & analysis teams’ efforts will usually be proportionally divided according to weight of priorities so that they may perform their jobs to the best of their abilities. A large list will mean you’ll need to invest commensurate amounts of money in staffing to achieve adequate accuracy, quality (and thoroughness) of investigation, analysis and the level of reporting desired.


Tweet number two in the series calls for an organization to consider “who would kill to have/destroy those assets? (think of what lethal/strategic value they hold to another)”. This is an exercise in not only giving names to the boogeymen that keep you up at night, but also in identifying who’s the most feared. This sounds simple enough right? When asking groups to do this, there are usually three adversaries named 1) your largest competitor(s), 2) hostile former/current employees, & 3) “hackers”. That third group is a bit too vague for your hunting team to effectively and efficiently execute their duties or provide you a quality threat assessment/intel report. Imagine your threat intelligence report template as “$threat tried to hack/attack us…”, now substitute “hacker” for $threat and read that aloud. [Be honest, you’d probably fire someone for that report.]

Obviously “hacker” needs to be refined. Let’s break that term down into the following groups:

  • advanced persistent threats (APT): one or more actors who are PERSISTENT, which usually means well funded and they don’t stop, ever, they don’t go find ‘an easier’ target, & rarely take holidays, or sleep, or at least so it seems; they are your nemesis. A nation state [hacker] actor (someone working for a foreign country/government) is an APT, but not all APTs are nation states! They ARE all persistent.
  • criminals: entities driven by monetary gain, resorting to anything from phishing & fraud to malware and 0-days
  • hacktivists: a group seeking to promote a political agenda or effect social change, usually not in it for the money
  • script kiddies: usually seek bragging rights for disrupting business

Now, using these groups instead of “hacker”, try to think of someone (or some group) who meets one of these definitions and would go to great lengths to steal/destroy the assets listed in step one. Depending on what services or products your organization provides, your answers will vary. A video game company probably has very different threats than a banker, unless of course the owners or employees bank with the banker. A stationery company will have different threats than a pharmaceutical company. Sometimes, however, threats are target-neutral; these threats would be addressed by your security operations center (SOC) first, then escalated to your threat hunters/analysts if necessary. Remember, your threat intel team can’t chase every boogeyman 24/7.

Another thing you’ll want to do is score the threat actors. There are a number of systems out there and the specifics of that activity are beyond the scope of this article. However, it may be helpful to use a matrix when trying to prioritize what/who threatens you. For example, on a scale of 1 to 5, 1 being the lowest, what is each threat actor’s:

  1. level of determination
  2. resources
  3. skill
  4. team size
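To make the matrix idea concrete, here is a minimal sketch. Everything in it is hypothetical: the actor names and scores are invented for illustration, and the unweighted sum is just one possible way to combine the four ratings, not a recommended scoring system.

```python
from dataclasses import dataclass

DIMENSIONS = ("determination", "resources", "skill", "team_size")


@dataclass(frozen=True)
class ThreatActorScore:
    name: str
    determination: int
    resources: int
    skill: int
    team_size: int

    def __post_init__(self):
        # Each dimension is rated on the 1 to 5 scale, 1 being the lowest.
        for dimension in DIMENSIONS:
            value = getattr(self, dimension)
            if not 1 <= value <= 5:
                raise ValueError(f"{dimension} must be between 1 and 5, got {value}")

    @property
    def total(self) -> int:
        # ASSUMPTION: an unweighted sum; a real program may weight dimensions.
        return sum(getattr(self, dimension) for dimension in DIMENSIONS)


def rank(actors):
    """Highest total first, so the most concerning actor tops the list."""
    return sorted(actors, key=lambda actor: actor.total, reverse=True)


# Hypothetical actors and scores, for illustration only.
actors = [
    ThreatActorScore("largest competitor", 3, 4, 3, 2),
    ThreatActorScore("hacktivist group", 4, 2, 2, 3),
    ThreatActorScore("script kiddie", 2, 1, 1, 1),
    ThreatActorScore("APT", 5, 5, 5, 4),
]
for actor in rank(actors):
    print(actor.name, actor.total)
```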


Next in the tweet thread I asked “…What [are] the 3-5 most important things to prevent? Physical/Virtual Theft? Destruction? Corruption? Manipulation? Modification?…” You may think of these within any context you wish, and some to consider are data, hosts/nodes, code execution, people & processes.

During a debate over what was the minimum security requirement for something highly sensitive, an executive said, to paraphrase, that he didn’t care who could READ the documents, just as long as they couldn’t STEAL them. Needless to say, explaining digital thievery left his brain about to explode and me with carte blanche authority to deny access to everyone and everything as I saw fit. The takeaway is: identify and understand what end state is beyond your acceptable risk threshold; this unacceptable risk is what you MUST stop.

For example, in some cases a breach of a network segment may be undesirable but it is data exfiltration from that segment that you MUST stop. Another example might be an asset for which destruction is an acceptable risk because you are capable of restoring it quickly. However that asset becoming manipulated, remaining live and online might have far greater reaching consequences. Think of a dataset that has Black Friday’s pricing (in our oversimplified and horribly architected system). The data is approved and posted to a drop folder where a cron job picks it up, pushes price changes to a transactional database and it’s published to your e-commerce site. If an attacker were to destroy or corrupt the file, you’re not alarmed because there’s an alert that will sound and a backup copy from which you can restore. However, consider a scenario in which an attacker modifies the prices, the “too-good-to-be-true” prices are pushed to the database and website, and it takes two hours to detect this, on Black Friday.

Perhaps you have something that is a lethal agent, thus you MUST prevent physical theft by ensuring the IoT and networked security controls have five-nines (99.999%) uptime (not down for more than about 5 min per year), are never compromised, and that an unauthorized person is never allowed to access or control it. These are just a couple of scenarios to get you thinking, but the real importance lies in ensuring your list of “must stops” is manageable and each objective can be allocated sufficient manpower/support when hunting for threats and when your SOC is monitoring events that they’ll escalate to your Threat Intel Team.
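The “5 min per year” figure for five nines is simple arithmetic, sketched below. The function name is hypothetical, and using 365.25 days per year is an assumption; with 365 days the result is only slightly smaller.

```python
def allowed_downtime_minutes(nines: int, days_per_year: float = 365.25) -> float:
    """Minutes of downtime per year permitted by an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return unavailability * days_per_year * 24 * 60


for nines in (3, 4, 5):
    print(nines, round(allowed_downtime_minutes(nines), 2))
```

Each additional nine cuts the allowance by a factor of ten, which is worth keeping in mind when weighing the cost side of the balanced diamond.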

Identifying and understanding the activities that must be prevented will drive and prioritize the corresponding hunting activities your teams will conduct when looking for bad guys who may already be in your systems. Referring back to our balanced diamond, consider that an investment in technologies to support constant monitoring should probably not be part of the budget for your threat intel team, however analytic tools used on the historical outputs from your continuous monitoring systems, security sensors, logs etc. probably would be. Also consider the cost for manpower, time to be spent performing activities in support of these strategic objectives, and how the quality of the investigations and reporting will be affected by available manpower and tools.


Next in the series of tweets comes

“4) identify the data/information you [would] NEED to have to prevent actions…[from step] 3 (not mitigate to acceptable risk, PREVENT)”

After completing the first three steps we should know 1) what we need to protect, 2) who we believe we’ll be defending against/hunting for, and 3) what we must prevent from happening. So what are the most critical resources needed for us to achieve our goals? Data and information. At this point in the process we are simply making a list. I recommend a brainstorming session to get started. You may be in charge of developing the Threat Intel program, but you can’t run it by yourself. This step in the process is a great way to give your (potential) team members a chance to have some skin in the game, and really feel like they own it. Before you consider asking C-levels for input on this, be considerate of their time and only ask those who have relevant experience, someone who has been a blue/red/purple team member.

Here’s a suggestion to get you started. Gather your security geeks and nerds in a room, make sure everyone understands 1-3, then ask them to think of what data/information they believe they would need to successfully thwart attackers. Next, put giant post-it-note sheets on the walls, title them “Network”, “Application”, “Host”, “Malware”, “Databases” and “InfoSec Soup”, give them each a marker, then give everyone five minutes to run around the room and brain dump information on each sheet (duplication among participants is fine). Whatever doesn’t fit into the first five categories listed goes on the last one (something like 3rd-party svc provider termination of disgruntled employee reports so you can actually revoke their credentials in your own system expeditiously). After the five minutes are up, take some time to go over the entries on each sheet, not in detail, just read them off so you make sure you can read them. Allow alibi additions as something on the list may spark an idea from someone. Then walk away. You may even repeat this exercise with your SOC, NOC, and developers. You’d be surprised how security minded some of these individuals are (you might even want to recruit them for your Threat Intel team later). If your team is remote, a modified version of this could be a survey.

Come back the next day with fresh eyes, take the note sheets, review and organize them into a list. Follow up with the teams and begin to prioritize the list into that which exists and we NEED versus WANT, and my favorite category ‘Unicorns and Leprechauns’, better known as a wishlist, which are things that, as far as we know, do not exist but might be built/created.


Some feedback I received regarding the next tweet, where I ask if “you [can] get this information from internal sources in sufficient detail to PREVENT items in 3? If not can you get there?”, was that it could be combined with the previous step. Depending on the organization, this is a true statement. However, I expect that in order to complete the task above, there will be multiple meetings and a few iterations of list revision before the step is complete. From a project management view, having these as separate milestones makes it easier to track progress toward the goal of creating the program. Additionally, seeing another milestone complete has immeasurable positive effects as it creates a sense of accomplishment. Whether you combine or separate them, once it is complete, we now have a viable list of information sources we’ve identified as necessary, and now we can start working on identifying how we might source the information.

Information is data that has been analyzed and given context. In some cases, we trust the data analysis of a source, and we are comfortable trusting the information it produces, such as our internal malware reverse engineers, a vetted blacklist provider, or even just “a guy I know” (which ironically sometimes provides the most reliable tips/information out there). In other cases, such as a pew-pew map, we want to see the raw data so that we may perform our own analysis and draw our own conclusions. The challenge in this step, for internal sources, is to identify all the data sources. This will have secondary and tertiary benefits as you will not only identify redundant sources/reporting (which can help reduce costs later) but you will have to decide on which source is your source of truth. You may also discover other unexpected goodies some sources provide that you hadn’t thought of. As an example (not necessarily an endorsement) log files will be on your list of necessary data, and perhaps you find that only portions of these files are pumped into Splunk versus the raw log files which contain data NOT put into Splunk. In most cases when hunting, the raw data source is preferred. However by listing both sources, your discovery of this delta in the sources may even prompt a modification to data architecture to allow the extra fields you want to be added to the Splunk repository.
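Spotting that kind of delta between a raw source and what reaches your analytics platform can be as simple as comparing field lists. The sketch below is illustrative only: the function name and every field name are hypothetical, and it does not use or represent any Splunk API.

```python
def field_delta(raw_fields, indexed_fields):
    """Return fields present in the raw logs but missing from the indexed copy."""
    return sorted(set(raw_fields) - set(indexed_fields))


# Hypothetical field names for illustration.
raw_log_fields = ["timestamp", "src_ip", "dst_ip", "user_agent", "session_id"]
indexed_fields = ["timestamp", "src_ip", "dst_ip"]
print(field_delta(raw_log_fields, indexed_fields))
```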

In other cases, the data which you seek is not currently captured, such as successful login attempts to a resource listed in step one, but it could be if someone turned on that logging. Finally, some of the data/information you’ve listed simply is not something you have access to, such as underground extremist threats against your industry or gang activity in the region of an asset from step one. However, you still need this information, and listing all possible sources for it usually identifies a need for relationships to be established and/or monitoring of open sources to be created. Another data point that will emerge is the set of potential vendors that market/promise that they have the kind(s) of information you want. These will each require a cost/benefit analysis and a “bake off” between vendors to see who truly provides something that is value added to your program and meets your needs. NOTE: most threat intel feeds are at best industry-specific, not organization-specific or even region-specific, so be mindful of purchasing “relative” threat intelligence feeds.


The next step in the process is identifying the gaps, meaning the data/information you need but don’t have: “6) if no to 5, can you buy this information? If yes, what’s your budget? Can you eventually generate it yourself?” It’s not surprising to anyone that sometimes the information we’d like to have is closely held by state and federal agencies. If you’re building this program from the ground up, you will want to establish relationships with these agencies and determine if there’s a cost associated with receiving it. As mentioned earlier ISACs for your industry might be a good source, but most of them are not free.

Other information you might be able to generate yourself, but someone else already develops it. In many cases, not only do they develop it, they do it well, it’s useful, and you couldn’t generate it to the quality standards they do unless that was absolutely the only thing on which your team worked. For example, consider Samuel Culper’s Forward Observer. He provides weekly executive summaries and addresses current indicators of:

  • Systems disruption or instability leading to violence
  • An outbreak of global conflict
  • Organized political violence
  • Economic, financial, or monetary instability

All of the above, could be used to cover the tracks of, or spawn a digital (cyber) attack. As an independent threat researcher, this information is something I do not have the time to collect & analyze, and it costs me about the same as grits & bacon once a month at my favorite breakfast place.

In considering our balanced diamond, money/cost is a resource that if we need a lot of it for one area of our program, we usually have to give up something else inside that same category, and it is usually manpower or tools, as everyone is pushed to “do more with less”. So how do we prioritize the allocation of funds? Use the ABC prioritization rules: acquire, buy, create. First, see if you can acquire what you need in-house (from another team, tool, repository, etc.), as this is the cheapest route. If you cannot acquire it, can you buy it? This may be more expensive, but depending on your timeline and availability of personnel in-house to create it, this is sometimes cheaper than the next option, creating it. Finally, if you cannot acquire it or buy it, then consider creating it. This is probably the most time-consuming and costly option (from a total cost of ownership perspective) when first standing up a program; however, it may be something that goes on a roadmap for later. Creating a source can allow greater flexibility, control, and validation over your threat intelligence data/information.

Whether to choose A, B or C will depend on your balanced diamond. If time is not a resource you have, and the program needs to be stood up quickly, you may take the hit on the cost section of your diamond as you need to buy the data/information from a source. The talent pool from which you have to choose may also affect your decision: the time and cost associated with hiring the talent (if you can’t train someone up) may force your hand into buying instead of creating. In some instances the cost of the data may be prohibitive and you do not have it in-house, so you may have to adjust the time section of your diamond to allow you to hire that resource in. The bottom line is that there is no cookie-cutter “right” answer to how you go about selecting each data resource, and one way or another you must select something and you may need to revise your needs, goals, and long term objectives.



The next tweet in the series is where we really start to get into the “HOW” of our program: “7) Once you get the information, how will you evaluate, analyze & report on it? How much manpower will you need? How will you assess ROI?” There’s a lot packed into this tweet and the questions build on each other. Beginning with the first question, you’ll be looking at your day-to-day and weekly activities. How will you evaluate the data & information received? Take, for example, an aggregate customized social media feed: will the results need manual review? If so, how often? Will you be receiving threat bulletins from an Information Sharing and Analysis Center (ISAC)? Who’s going to read/take action on them?  One key thing to include in your reporting is the WHO, not just the when and how.  A great tool for this is a RACI chart.

For each information source you listed in steps 5 & 6, you should have a plan to evaluate, analyze & report on it. You will find, that as your team analyzes and evaluates these sources, some of them will become redundant.

The second question in the tweet was “How much manpower will you need?” There are a variety of estimating models, but I urge you to consider basing it on 1) the number of information sources you’ve identified as necessary and 2) the number of employees in your organization. What’s the point of having a source, if you don’t have anyone to use/analyze/report on or mine it?  Your own employees are sensors, sometimes they’re also an internal threat. Another point to consider is how much of each analysis effort will be manual at first, that can become automated? Remember, you can never fully automate all analyses, because you can never fully predict human behavior, and every threat still has a human behind it.

The third question in the tweet, “How will you assess ROI?” is critical. Before you begin your program, you want to define HOW you will evaluate these. Will it be based on bad actors found? The number of incoming reports from a source that you read, but tell you nothing new? Remember our balanced diamond, there are finite finances, and time that can be invested into the program. As the daily tasks go on, new information and talent needs will emerge but more importantly, the internal data and information sources will either prove to be noise or niche. Other sources, such as an Intel feed, or membership in an ISAC might not prove to be producing valuable information or intelligence. I’d recommend at minimum, annual evaluation (using your pre-defined metrics for your qualitative ROI) if not semi-annual review of any external/paid sources to ensure they are reliable, and providing value. If your team tracks this at least monthly, it’ll be much easier when annual budget reviews convene.

REMINDER: Defining the metrics for ROI in advance does not mean you cannot add or refine the metrics as the program progresses. I recommend reviewing them every 6 months to determine if they need revising. Also, don’t forget that new information needs will emerge as your program grows. Take them, and go back through steps 5-7 before asking for them.


Good advice I’ve heard time and time again is, always begin with the end in mind. The next tweet in the series touches on this by asking “8) what will success look like? # of compromises? Thwarted attempts? Time before bad guys detected? Lost revenue? Good/Bad press hits?” Granted 140 characters is not nearly enough to list all of the possible metrics one could use, but the objective of that tweet and this blog are not to list them for you, rather to encourage you to think of your own.

Before you start hunting threats and developing a threat intelligence program, you’ll need a measuring stick for success, for without one how will you know if you’re on the right path or have achieved your goals? As with everything in business, metrics are used to justify budgets and evaluate performance (there’s a buzz word called key performance indicators KPIs you should become familiar with, also known as status or stop light reporting red, yellow, green).

In a very young program, I’d encourage you to include a list of “relationships” you need/want to establish outside vs inside the organization, and the number of them that you do create. You can find other ideas for metrics with this search:



The final tweet in the series addresses the three most important things that, in my experience, are heavily overlooked, if not completely forgotten, in most threat intelligence (and InfoSec) programs. Summed up in three questions to fit into the 140 character limit: “9) How can you continue to improve? How will you training & staying current? How will you share lessons learned with the community?”

Addressing them in reverse order, sharing experiences (and threat intelligence) can be likened to your body’s ability to fight off disease. If you’re never exposed to a germ, your body won’t know how to fight it off. If you have an immune deficiency (lack of threat intel and InfoSec knowledge) your body is in a weakened state and you get sick (compromised) more easily. Sharing what you know/learn at local security group meetings, conferences, schools and universities etc. not only helps others it will help you. It pays dividends for years to come. Additionally, people will come to trust you, and will share information with you that you might not get anywhere else except the next news cycle and by then it is too late.

Next, once you’ve designed this awesome threat intelligence program, how are you going to keep this finely tuned machine running at top notch levels? The answer is simple: invest in your people. Pay for them to attend security conferences, and yes it is fair to mandate they attend specific talks and provide a knowledge sharing summary. It is also important to understand that much of the value of attending these events lies in the networking that goes on and the information shared at “lobby-con” and “smoker-con” where nerds are simply geeking out and allowing their brains to be picked. Additionally, you can find valuable trainings at conferences, sometimes at discounted prices that you won’t find anywhere else. Also, these are great places to find talent if you’re looking to build or expand a team.

Speaking of training, include in your budget funds to send your people to at least one training per year if not more. Of course you want to ensure they stay on after you pay for it so it is understandable if you tie a prorated repayment clause to it. It is easier to create a rock star than it is to hire one.

Finally, how can you continue to improve? The answer for each team will be different, but if you aren’t putting it on your roadmaps and integrating it into your one-on-one sessions with your employees, you’ll quickly become irrelevant and outdated.  Sometimes a great idea for improvement pops into your head and then two hours later you cannot remember it.  Create a space (virtual or physical) where people can drop ideas that can later be reviewed in a team meeting or a one-on-one session.  I find that whiteboard walls are great for this (paint a wall with special paint that allows it to act as a whiteboard).  Sometimes an IRC-styled channel, shared doc, or wiki page will work too.


This blog provides a practical outline for designing a threat intelligence program in the digital realm, also known as cyberspace, and introduces a four-point constraint model: time, money, design/accuracy, and quality.


As with any threat intelligence, we must understand the digital landscape and know what it is that must be protected.  In order to protect it, we must have good visibility, and simply having more data does not mean we have better visibility or better intelligence. Instead, an abundance of data that isn’t good data (or is redundant) becomes noise. Discussed above was the next critical step in defining the program: identify what we need to know, where we can get the answers and information we need, and how much, if anything, those answers and information will cost.  Some programs will run on a shoestring budget while others will be swimming in a sea of money.  Either way, reasonable projections and responsible spending are a must.


Once the major outlining is done, we start to dig a little deeper into the actual executions of the program, and we discussed figuring out exactly how we will (or would like to) develop and report the threat intelligence so that you can adequately source/hire the manpower and talent needed to meet these goals. Then we highlighted the all important task of defining success for without a starting definition, how can we show whether or not we are succeeding or failing?  Remember to revisit the definition and metrics regularly, at least semi-annually, and refine them as needed.


Finally, we close out the program outline by remembering to plan growth into our team.  That growth should include training, sharing lessons learned internally and externally.  Remember to leverage your local security community social groups, and the multi faceted benefits of security conferences which include networking, knowledge from talks, and knowledge/information gained by collaborating in the social hangout spots.  
Thank you for your time. Please share your experiences and constructive commentary below and share this blog on your forums of choice. For consultation inquiries, the fastest way to reach me is via DM on Twitter.

Phishing the Affordable Care Act

Recently, while working on a project I was asked to gather some information on Blue Cross Blue Shield (BCBS) and something scary began to unfold.  I noticed that states have individual BCBS websites, and that there is no real consistency in the URL naming convention.  Then I began imagining the methods an attacker could use to exploit this. This is especially disconcerting since tax season is here and, thanks to the Affordable Care Act, we’ll all be needing forms showing proof of medical coverage, but more on that later. Back to the BCBS domains….

The first thing I noticed was the inconsistent use of the dash (-) character.  For example, if I want to visit Georgia’s BCBS site I can use,, or  I found that only four other states returned a 200 status for names with the dash, ex: bcbs-$

  • is under construction, and the owner listed is BlueCross BlueShield of Vermont
  • resolves to
  • and are currently parked for free at GoDaddy, and the owner information is not available.

I have not inquired with SC/NC BCBS to determine if they own the domains listed above (the ones with the dash).  I also cannot elaborate as to why there is no DNS record resolving each of the Carolina domains above to a primary one as MT did.  It is possible a malicious actor/s own/s the NC/SC domains, although currently that is purely speculation. The final observation that made me decide to script this out and just see how much room there is  for nefarious activity was finding that some states don’t even use BCBS in the URL for example

Deciding where to start wasn’t very difficult.  There are many logical names that could be used for a phishing expedition, but I wanted to stay as close as possible to the logical and already known naming conventions. So I opted not to check for domains like “” or iterations with the state spelled out.  I settled on eight different possible combinations.   As seen with the domains for BCBS of GA, the state abbreviation always appears after BCBS, so I checked for domains with the state at the front as well, and both an HTTP and HTTPS response.  I also checked for domains with the dash before and after the state abbreviation.  Math says that 8 combinations (seen below) * 50 states = 400 possible domains.
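A minimal sketch of how a candidate list like this could be generated and checked. It is not the original 18-line script: the four name patterns, the two schemes, the .com TLD, and the helper names candidate_urls and check_status are all assumptions reconstructed from the description above.

```python
import urllib.error
import urllib.request
from itertools import product

# Two-letter postal abbreviations for the 50 states.
STATES = """AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT
VT VA WA WV WI WY""".split()

# ASSUMPTION: state after/before "bcbs", with and without a dash,
# each tried over HTTP and HTTPS, gives 4 x 2 = 8 combinations per state.
PATTERNS = ["bcbs{st}", "{st}bcbs", "bcbs-{st}", "{st}-bcbs"]
SCHEMES = ["http", "https"]
TLD = "com"  # ASSUMPTION: only .com is considered


def candidate_urls(states=STATES):
    """Build every scheme/pattern/state combination as a URL string."""
    urls = []
    for st, pattern, scheme in product(states, PATTERNS, SCHEMES):
        name = pattern.format(st=st.lower())
        urls.append(f"{scheme}://{name}.{TLD}")
    return urls


def check_status(url, timeout=5):
    """Return the HTTP status code, or None if the host does not answer."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered with 4xx/5xx
    except (urllib.error.URLError, OSError):
        return None  # no DNS record, refused connection, timeout, etc.
```

Calling check_status on each entry of candidate_urls() and tallying the results would produce a status breakdown of the same shape as the one reported here. A None result only suggests a name might be unregistered; confirming availability would still need a registrar or WHOIS lookup.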


The results were a bit unnerving…

It took ~13.5 minutes using 18 lines of Python (could be fewer but I was being lazy) on an old, slow laptop, to check the 400 possibilities and learn the following:

  • 200 status = 69 domains
  • 403 status = 02 domains
  • 404 status = 02 domains

Leaving 329 domains available for purchase, and the price for many of them was less than $10.  Keep in mind, I did not verify ownership of the 69 domains, but if I’m a bad guy, I don’t really care who owns them because I’m only looking for what’s available for me to use.

Now back to the tax forms I mentioned earlier….

We teach users not to click on links or open emails that they aren’t expecting, so can you blame them if they click on a link in an email that says “click here to download your 2017 proof of medical coverage, IRS form 1095”?  After all, the IRS website even tells us that we will receive them, and that for the B & C forms the “Health insurance providers (for example, health insurance companies) will send Form 1095-B to individuals they cover, with information about who was covered and when.  And, certain employers will send Form 1095-C to certain employees, with information about what coverage the employer offered.”

Remember all that information lost in the Anthem breach a few years ago? Or the Aug 2016 BCBS breach in Kansas? Hrmmm, I wonder how those might play into potential phishing attacks.



How you choose to mitigate this vulnerability is up to you and the solution(s) you come up with will vary depending on your company size, geographic dispersion of employees, and network architecture among other things.  Some of you may choose to update your whitelists, blacklists or both.  Some of you may use this opportunity as an educational phishing exercise soon, but whatever your solution is, I hope it includes pro-active messaging and education for your users.

Finally, if you or someone you know works at a healthcare provider and has the ability to influence them to purchase domains that could be used to phish the employees and/or individuals they cover, I strongly encourage you to share this article with them. You can also try convincing management that not only are you preventing a malicious actor from having them, you could use them for training. While BCBS is the example used here, they are not the only provider out there and this problem is not unique to BCBS or its affiliates.  However, if BCBS licenses its affiliates, then 1) enforcing standardized naming conventions for URLs and 2) requiring them to purchase a minimum set of domains to minimize risk of malicious phishing doesn’t seem unreasonable.  Considering the prudent man rule, I think a prudent man would agree the financial burden of purchasing a few extra domains is easily justified by the impact of the risk reduction.

Thanks for taking time to read, and for those of you with mitigation ideas, please share your knowledge in the comments, and if you’re new to infosec and want to ask a question about mitigations please ask it.  I only require that comments be constructive and helpful, not negative, insulting, derogatory or anything else along those lines.

Specific details for the 1095 forms can be found here.

Thank you my dear friends for your proofreading, for the laughs, and most of all your time and support.

What’s Under that Threshold?

This blog post is meant to be short, sweet and to the point so please forgive the brevity if you were looking for something in depth this time….


Many of us are trained to get the big fish, find the next cutting edge threat, defend against the big blob of red in the graphic of some ridiculous C-level slide presentation. We sit, eyes locked on some SOC tool waiting for bells & whistles to go off, the emails to start flying, the lights to flash to wake us up because we’ve fallen asleep from boredom, all because we’ve placed our trust in a tool to tell us on what we should focus our attention. So, how often do you go digging, or lift up the lid on something, peeking to see what’s inside? What are you doing about the quiet, smart bad guy who’s tiptoeing in just under your alert criteria? You know, the one who isn’t making a lot of noise on your network, the customer doing the dirtiest of deeds, just under your thresholds for your automated alarms?


Well, if you know what your thresholds are for automated alerts, why aren’t YOU looking at what lies beneath them? Is it because you think nobody with malicious intent would take the time to do X in such small quantities because it wouldn’t pay off? Is it because your tool is awesome and perfect *cough*cough*cough*cries*grabs water*? If you answered yes to the 2nd or 3rd question, please allow me to share some good ol’ country advice that has served me well: “He who underestimates his enemy, has lost the first battle in the war.”


So without divulging the details of my current research, I’ll share a few things I’ve been noticing lately. First is bad guys doing a little here, a little there regarding purchasing domains. Instead of buying in bulk, they’re buying a few at a time each day. So, if you’re selling domains, maybe you want to take a look at any customers who are buying in quantities just below your “alarm” threshold and who are NOT buying via your bulk discount programs. I mean seriously, what does one individual need with a couple hundred domains, that he/she wouldn’t want to take advantage of bulk discounts? I mean, they could just be a legit business that doesn’t know any better, but I’m gonna guess not. It might be worth checking those domains out using tools such as OpenDNS, Domain Tools, Threat Grid, and Virus Total. Are the domains registered, more than 30 days old and still do not have a website? What’s the aggregate amount of domains purchased in the last 30 days and how old is the customer account? Does the data on the domain registrations match that on your customer’s account? Does the data on the domain registration match ANOTHER customer account? If you find that your customer’s domains are popping hot, ya just might want to take a leeeetle-bit closer look at their activities.
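One way to go looking under that threshold, sketched under loud assumptions: the record layout, the DAILY_ALARM and AGGREGATE_REVIEW values, and the function name flag_quiet_buyers are all hypothetical, and the input is assumed to already be limited to the last 30 days of purchases.

```python
from collections import defaultdict
from datetime import date

DAILY_ALARM = 25        # hypothetical: purchases per day that trip the automated alert
AGGREGATE_REVIEW = 200  # hypothetical: 30-day total worth a manual look


def flag_quiet_buyers(purchases, daily_alarm=DAILY_ALARM,
                      aggregate_review=AGGREGATE_REVIEW):
    """Return customers who never tripped the daily alarm, never used the
    bulk discount, yet bought a large aggregate number of domains.

    purchases: iterable of (customer_id, purchase_date, domain_count, used_bulk)
    """
    daily = defaultdict(int)       # (customer, day) -> domains bought that day
    used_bulk = defaultdict(bool)  # customer -> ever bought via bulk discount
    for customer, day, count, bulk in purchases:
        daily[(customer, day)] += count
        used_bulk[customer] = used_bulk[customer] or bulk

    totals = defaultdict(int)      # customer -> domains bought in the window
    tripped = set()                # customers the existing alarm already sees
    for (customer, _day), count in daily.items():
        totals[customer] += count
        if count >= daily_alarm:
            tripped.add(customer)

    return sorted(
        customer for customer, total in totals.items()
        if customer not in tripped
        and not used_bulk[customer]
        and total >= aggregate_review
    )


# Illustrative data only: "a" buys 12 domains a day for 20 days.
sample = [("a", date(2017, 1, d), 12, False) for d in range(1, 21)]
sample += [("b", date(2017, 1, 5), 300, True), ("c", date(2017, 1, 9), 3, False)]
quiet = flag_quiet_buyers(sample)
```

The resulting list is only a starting point for the follow-up questions above (account age, registration data matches, domain reputation), not a verdict on any customer.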

Let’s look at another OSINT source you have….customer access logs. The second thing I’ve been noticing is bad guys creating DNS entries a little here, a little there. So you found a guy, flying below the radar (could be a girl, but just go with me here) with the daily number of domains being purchased under your alarm level. Maybe you provide infrastructure not domains, so you offer DNS, and you have a customer flying below the radar making lots of DNS records. Do your tools alert you when a customer logs into his/her account from multiple ASNs or ASNs in different countries? I mean if a guy logs in for <5 minutes, makes DNS records, and logs out, all from Romania on Sunday, Russia on Monday, Great Britain on Tues....etc either he's racking up some serious frequent flyer miles or he might be up to no good. AGAIN, there COULD be a perfectly legitimate explanation (none come to mind immediately) but you won't even know unless you go looking.

If you're providing website hosting, do you have a customer that has hundreds of completely unrelated domains pointing to a single IP? I once found a guy with over 900 malicious domains all pointing/pointed to a single IP...I wanted to say to the provider "Seriously you don't notice?"

*SUMMARY* So the point of today's topic - start looking BELOW your automated thresholds for the really bad guys. Be pro-active, stop waiting for bad guys to wave, shake your hand, and say hello.

Thanks again for taking time to read the blog and feel free to share comments, DM me on twitter, or just tag and say hi!

Shodan – A Boogeyman’s BFF

If you’ve ever heard me talk on OSINT one of the points I drive home is one I learned early from a colleague, Ian Amit (@iiamit) that if what you present doesn’t cause a change in behavior, it isn’t threat intel, it is intel/information.  Here’s a story on how I used OSINT techniques on my own organization in multiple ways, to cause a change in behavior.

Once upon a time in a land far far far away….there were device administrators that secured their devices properly….

/me wakes up disappointed

During my governance, risk and compliance days, before OSINT was a buzz word in the industry, one of the things organizations wanted to know (without hiring/contracting a pen-tester) was how vulnerable they were to “hackers” [I use that word sparingly as it has a very evil connotation to the ignorant masses].   Knowing they just asked me to boil the ocean, I worked to get them to narrow it down, and identify three things:

  1.  WHAT are you worried about being attacked (i.e. specific assets)

And let me be the first to say that if the org doesn’t have a decent Asset and Data Classification Policy that’s actually implemented HA! sucks to be you.

2.  WHICH attack vectors concern you the most

3.  HOW do you want me to answer you (reporting format)

So after getting those nailed down,  I decided to finally put all the hours of education to good use so I felt less guilty about spending all that money getting a degree just to get past the HR gremlins that eat resumes.

We didn’t exactly have a threat model, and being in the “Risk Department”  (pfft!) they weren’t going to listen to me tell them they needed one.  [BTW Risk Analysis != Threat Modeling] Nonetheless, I realized the scope of concern they had included threats to network assets [as opposed to software, people, places etc].  Thus I went forth to identify vulnerabilities that c/would be exploited, and immediately went to a wonderful site called Shodan

[Screen capture: Shodan most popular searches]

that will tell you all kinds of “wonderful” things about an organization’s threat vectors.  Leveraging a little knowledge of SQL and URL hacking I began running queries to check for some basic vulnerabilities that were not only available for my own perusal, but were equally available for every other evil derp that didn’t like “us”.    I proceeded to exclaim rather loudly in the office “Are you Fuc41n6 Kidding ME?!” as I saw the results pour in.  So – now I knew it was not just bad…it was like Satan just gave a free pass on the bullet train straight to hell and you could hear him laughing like it was a carnival ride.

I hung my head in dismay, thinking – how am I going to communicate to “Management” just how bad this is?  After all, they get vulnerability scans quarterly, monthly, weekly and in some cases daily – and they STILL don’t think the problem is “that bad.”  Technically, the Shodan results are nothing more than another data set reflecting vulnerabilities.

Then I remembered some very wise words

The supreme art of war is to subdue the enemy without fighting -Sun Tzu

So I put together an initial OSINT report of generic threat actor profiles that would like to exploit (and probably already were exploiting) what was exposed via Shodan, but I didn’t send it. Instead, first I took what I learned in Shodan and I created a “How to Sho-Dan” (pun on a C-level’s name)  slide deck.  I mean, nobody is ever going to believe my report, I’ll be lucky if 1/3rd of them click on a single link and even luckier if 1/10th of them even understand what they’re reading/clicking on.

Then, I OSINT’d (ummm yeah that’s a word now just roll with me here) so I OSINT’d my fellow employees.  I read their social media profiles, eavesdropped at the water cooler, socially engineered (SE’d) them over coffee to figure out what were 1) their favorite & most hated places for work-hosted events, 2) their favorite conference room, and 3) their idea of “fun” learning at work.  Then I SE’d my boss into spending money, used his corporate credit card (with his approval), and set up a Lunch & Learn for non-security IT people including devs, netops team, help desk etc.  With food & drink in hand, and a promise of a prize for anyone who could tell me what the query revealed, we began learning How To Sho-Dan.


When it was all over they realized some very critical things:

  1. NONE of them had to even create an account to run a query…wut?! this is Open Source?!
  2. They didn’t have to know SQL or URL hacking, they only had to know key words and use the search boxes
  3. If they did have an account, they could get even more comprehensive reports

THE SINGLE MOST IMPORTANT LESSON:  If they could do it – so could bad guys, and there were definitely some serious boogeymen in the world.


I had successfully moved from data to information to intel to threat intel because the Lunch & Learn, combined with the OSINT report I provided caused a change in behavior, otherwise it was just intel and more vulnerability data.

I sent the OSINT report to the managers that had signed up for (even those that didn’t attend) the Lunch & Learn, and now with them empowered with context and a better understanding of the threat vectors,  I watched change explode.

  1. The vulnerability remediation tickets started getting a lot more love by all departments.
  2. The network team implemented changes to their firewall approval process, patching firmware, and network architecture.
  3. The developers began reconsidering what ports they really needed
  4. The server team modified their provisioning process to include a security review/approval milestone that was a show stopper.
  5. I even convinced C-levels to plan for an internal pen-testing team.


  1. If minimally tech savvy people can do/google/youtube it then so can the bad guys
  2. OSINT on your own team is not evil 🙂
  3. Sometimes an OSINT report is far less valuable than an OSINT hands-on


If you want to see a very hilarious and scary presentation go watch my colleague Dan Tentler’s (@Viss) talk from #DEFCON2015 as he exposes ridiculously huge #Fail of things accessible via the Internet.

Below is a list of the (sterilized) Shodan queries that I used during the training and to generate a report on an OSINT tool that could be/was being leveraged by threat actors targeting the organization.

  1. Hosts found w/ banner details stating “230 – Any Password will work”
  2. Hosts found with banner stating “Use ‘passwd’ to set your login password this will disable telnet and enable SSH”
  3. Hosts found with banner stating “230 Anonymous access granted, restrictions apply” (query: "230+Anonymous"+"root"+org%3A"Company_Name")
  4. FTP Servers reflected as allowing Anonymous access
  5. Anything Company_Name (query: "Company_Name")
  6. Company_Name & Default Passwords
  7. Company_Name, Password
  8. Company_Name and OpenSSH Ports
  9. Company_Name and Splunk on port 8089
  10. Company_Name, MySQL on port 3306
  11. Company_Name, “200 OK”, “Set-Cookie expires 2016”

For use with the Search Box if you don’t like the URLs

  • city:”$city”
  • country:$country
  • geo:$lat,$lon
  • os:$operatingSystem
  • net:$ipRange/$cidr
  • org:”$OrgName”
  • product:”$product name in here”
  • isp:”$ISP Name Here”
  • asn:”AS######”
  • devicetype:”firewall”
  • ports:80, 443
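For anyone who would rather script the queries than type them, here is a minimal sketch of how search-box terms and filters could be combined into a URL-encoded query. The helper names build_query and search_url are hypothetical, the base URL is an assumption about Shodan's web search address, and "Company Name" is a placeholder, as in the list above.

```python
from urllib.parse import quote_plus

# ASSUMPTION: the web search takes the query in a "query" URL parameter.
SHODAN_SEARCH = "https://www.shodan.io/search?query="


def build_query(*terms, **filters):
    """Join quoted banner terms and name:value filters into one query string."""
    parts = [f'"{term}"' for term in terms]
    for name, value in filters.items():
        value = str(value)
        if " " in value:
            value = f'"{value}"'  # values with spaces need quotes
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def search_url(query):
    """URL-encode the query (spaces become +, quotes %22, colons %3A)."""
    return SHODAN_SEARCH + quote_plus(query)


query = build_query("230 Anonymous", org="Company Name", country="US")
url = search_url(query)
```

The encoded result has the same shape as the URL fragments in the numbered list above, which is all the "URL hacking" mentioned earlier really amounts to.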

Words Matter

One of the single most important techniques/activities when gathering intelligence (i.e. intel) from open source repositories is analytic reading. The second is properly presenting data/intel with relevant context.


This isn’t the kind of reading you do in the summer with a children’s book and litter of rug rats gathered at your feet, this is the kind of reading one does where you look for hints or clues about a person based on phrasing or word choice. Now you don’t need to have a degree in psychology or grammar to do this, you simply have to pay attention, take notes, and apply a little common sense.

Let’s take my request for help from the #InfoSecFam on ideas for my first blog. Here were the responses I got (thank you to the brave souls who dare support me) :p

  1. Well, you could start with those lovely examples of people posting pics of credit cards…
  2. Then folks posting about going on vacation on their facebooks…
  3. Maybe some military types posting pics with intact exif?
  4. <graphic> #internetfeds
  5. google hacking is still incredibly viable, and it’s a huge OSINT fail.
  6. specifically anonymous FTP servers indexed by google.
  7. <graphic> bad admins everywhere. Really bad. Ive seen some sh1t man
  8. Boarding passes are now a big thing… “I Know Where You LIve: all the sh*t that people post”
  9. You could do reviews of OSINT web-tools
  10. ok, an oldie being forgotten, ‘don’t run with admin/root’.

Just a Little Intel…

So, let’s analyze what we’ve read. [Note this example is very trivial, however the principles presented are not.]

  • Q1: What’s the culture/industry of the authors here?
    • A1: #InfoSec
  • Q2: What are underlying characteristics of this group’s communication styles?
    • A2:  InfoSec culture is heavily sarcastic
  • Q3:  Are there clues to anyone’s profession/hobby listed in these comments?
    • A3:  Yes – acronym and word choice: FTP, intact exif, bad admins everywhere, ‘admin/root’
  • Q4:  Any clues to age or experience?
    • A4: Yes –  still incredibly viable, oldie but goodie

The list of questions above is a trivial example of how to glean the not-so-obvious intel that is implied.  Nonetheless, the questions asked and answered, should be driven by a few things, two at minimum: a profile template and a threat model [otherwise you’re out there going all Willy Nilly and traipsing through minefields of soggy cow patties.]  SO! Before you even start gathering Intel, your leadership should have identified WHAT they want to know (identified in the threat model) and HOW you will collect it (defined in the documentation standards and profile template).  So as you do answer these very valuable questions, you’re looking for the same data points, all the time, essentially filling in the pieces of a puzzle one at a time.  Keep in mind, they may not all be present, but at least you’re looking for them.  As you get them, you should be capturing them in a profile template.

The list of questions could go on and on depending on how much of the ocean you’re planning on boiling, and tools such as the IBM Tone Analyzer (demo link here) or the IBM Personality Analyzer can offer valuable insight as well, but tools are no replacement for instinct.  While these tools may enhance or even expedite the analysis process, they cannot replace an Analyst’s instinct and skills of discernment as they read something and decide what “box” to put it in, if it is relevant, indicates personality traits, warrants in/exclusion or is a thread that needs to be pulled to see what else unravels.

Takeaway:  Read closely, carefully, and never underestimate the human factors at work.  Read between the letters AND the lines.  You may find clues you need when building a profile or finding a target simply by the nuances in their tiniest commentary.


So let’s talk about the biggest mistake with the list…. It’s in numerical order! If you were only reading this as an OSINT report, you might think these came from 10 different people or that one person provided 10 ideas. So, by creating a pure LIST of comments rather than a LIST with logical grouping, we lose context, because multiple comments were made by some of the same individuals.

Let’s fix that….

P1-1. Well, you could start with those lovely examples of people posting pics of credit cards…
P1-2. Then folks posting about going on vacation on their facebooks…
P1-3. Maybe some military types posting pics with intact exif?

P2-1. <graphic of chat> #internetfeds
P2-2. <graphic of man hiding in a chair> bad admins everywhere. Really bad. Ive seen some sh1t man (BTW @MyTinehNimjeh I <3 u man LOL)

P3-1. google hacking is still incredibly viable, and it’s a huge OSINT fail.
P3-2. specifically anonymous FTP servers indexed by google.

P4-1. Boarding passes are now a big thing… “I Know Where You LIve: all the sh*t that people post”
P5-1. You could do reviews of OSINT web-tools
P6-1. ok, an oldie being forgotten, ‘don’t run with admin/root’.

Now you see there were actually 6 people who replied, not 10 (P# is the person, as in Person 1, Person 2, Person 3…, and -1, -2 is the number of the comment that person made).
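If your comments come out of a tool or an export rather than a screenshot, the same regrouping can be sketched in a few lines of Python. This is a rough sketch under assumptions: the (author, comment) pairs are typed in by hand from the thread above, the comment text is abbreviated, and the function names group_by_author and label are made up for illustration.

```python
# (author, comment) pairs in the order they appeared in the flat list.
# Author labels follow the P# scheme above; comment text is abbreviated.
comments = [
    ("P1", "pics of credit cards"),
    ("P1", "going on vacation on their facebooks"),
    ("P1", "pics with intact exif"),
    ("P2", "#internetfeds"),
    ("P3", "google hacking is still incredibly viable"),
    ("P3", "anonymous FTP servers indexed by google"),
    ("P2", "bad admins everywhere"),
    ("P4", "boarding passes are now a big thing"),
    ("P5", "reviews of OSINT web-tools"),
    ("P6", "don't run with admin/root"),
]


def group_by_author(pairs):
    """Group comments per author, keeping first-appearance order."""
    grouped = {}
    for author, text in pairs:
        grouped.setdefault(author, []).append(text)
    return grouped


def label(grouped):
    """Produce 'P#-n. comment' lines like the regrouped list above."""
    return [
        f"{author}-{n}. {text}"
        for author, texts in grouped.items()
        for n, text in enumerate(texts, start=1)
    ]


grouped = group_by_author(comments)
labeled = label(grouped)
per_author = {author: len(texts) for author, texts in grouped.items()}

print(len(comments), "comments from", len(grouped), "people")
print("\n".join(labeled))
```

The per_author counts are the part a flat list hides: the volume from one person can matter as much as the total.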

Additionally, this context represents something else taken for granted by the statisticians and API monkeys – it isn’t always the total volume that matters, sometimes it’s the volume of one person, or even the lack of replies to others who may have forked a conversation thread.  If this thread were listed as a statistic, stating that there were 10 comments, that would also be incorrect.  There were actually a few different forks, some took a humorous path, others were simply “neutral” suggestions, AND there were more than a total of 10 interactions.  This list, however, only represents the comments that were actually relevant to the request for help with ideas, which were extracted and placed in this article.  Again, in your OSINT reports, ensure you represent relevant intel accurately, and provide the reader proper context through commentary and presentation.

Takeaway – ensure that HOW you present data in a report represents it with as much relevant context as possible.