
When and How to Hire a Threat Intelligence Analyst

WHEN…

Threat intelligence has become the latest marketing buzzword, often abused and misused in an effort to impress a customer base. So, when do you need threat intelligence, and when is the right time to hire someone to “provide customers” with threat intelligence? Well, you should never hire someone specifically to provide customers with threat intelligence, unless that is the product you are in business to produce. You can read more about this in the blog “Three Myths About Threat Intelligence.”

Typically, you would be ready to hire a threat intelligence analyst once you’ve established mature security practices for your organization. This is not to say that a Threat Intelligence team cannot be set up and designed to grow as the company grows; however, it is typically a strategic investment whose first role is to serve internally, supporting decision makers, while also strengthening the security posture and proactively detecting, deterring, and destroying or avoiding threats. While start-ups would benefit from understanding threats to their products, people, facilities, and customer data, they do not typically plan for the capital investment to support threat intelligence efforts. Additionally, Threat Intelligence teams do not normally generate products for revenue; rather, they serve to inform decision makers about potential threats on the horizon, protect the organization from internal and external threats to people, property, and assets, and, in rare instances, provide competitive advantage. In short, you are probably ready to hire once you are ready to make a strategic investment and take a proactive approach to security and threat detection, deterrence, and avoidance.

Below is a brief checklist of things an organization should achieve before being ready to hire a threat intelligence analyst.

  • Mature security processes and culture in place
  • Obtained CEO, CFO, and CIO support, plus buy-in from Legal, Marketing, and Physical & Information Security
  • Structured the Director of Threat Intelligence and his/her team to report directly to a C-level officer, optimally the Chief Security Officer
  • Completed a threat intelligence program charter and program outline
  • Defined the immediate intelligence requirements
  • Defined communications plans for intelligence dissemination internally and externally

ONE PERSON CANNOT EFFECTIVELY SERVE TWO MASTERS

Once you’ve completed the tasks above, you should be ready for the next phase – hiring in preparation for collection and analysis. You should not have started any intelligence collection aside from what may already be generated inside individual departments: network logs, market reports, incident reports, etc.

Your first hire should be a managerial role that will oversee the people performing collection and analysis. While it will be immensely beneficial to hire someone who has experience within the intelligence community, it is not a requirement. Experience managing “geeks” or “nerds,” however, is a minimal requirement.

When under tight budget constraints, companies often try to cut corners and hire someone skilled in both collection and analysis, having them perform both full-time roles, i.e., serve two masters. This does not scale and is not sustainable. While it may work initially, you will quickly learn that time spent serving the first master, Collection & Processing (collecting intelligence, developing tools, and tuning collectors), is time that cannot be spent serving the second master, Analysis & Reporting (performing robust analysis of the threat data that has been collected). One individual cannot serve two masters (do both jobs) indefinitely.

At a minimum, you should plan on having a developer focused on developing, integrating, and tuning intelligence collection tools. This person will also work with analysts to develop tools and processes for converting the collected data into formats the analysts can use, a phase known as intelligence processing. The team or person responsible for developing the tools will have an intimate relationship with the analysts consuming the data and information that has been collected and processed. Whether you hire the threat intelligence analyst or the developer first is not important; what matters is that they can communicate effectively with each other and have a solid understanding of what the other does.

HOW…

Know the traits you need in a threat intelligence analyst and realize a great analyst may not have “analyst” in their previous job titles. More importantly, a person’s mindset and character often make the difference between a good and great analyst, not their years on the job. A good threat intelligence analyst, while unique in their own way, shares many characteristics with analysts from other disciplines. So, what traits and skills should they possess?

First, they should be able to WRITE CONCISELY. This is a skill commonly found in journalists, historians, and researchers. Look for someone who has experience in public affairs, school newspapers, or blogging. If an analyst cannot communicate the importance of a threat in a short, concise manner, decision makers will likely not find value in their reporting. If an analyst cannot show value, leaders can (and often quickly do) form the opinion that threat intelligence is a useless money pit.

Second, a good analyst is a professional tin-foil hat model, never trusting an analysis without knowing what methods and data were used to generate the report and how that data was collected. They are skeptical, ask lots of questions, and think outside the box.

Third, they should be humble, admit their mistakes, and learn from them. Sometimes an analysis can go horribly wrong, and when it does, it makes front page news. This doesn’t necessarily mean the analyst is a bad analyst, at least as long as they learn from it. It may be they were pressured to provide a report based on insufficient or corrupted source data and didn’t push back for more time to consider other explanations of the data, or maybe they were unaware of their own bias. Whatever the cause, a good analyst can identify where the analysis went wrong and learn from the error(s).

Fourth, a threat intelligence analyst needs to have comprehensive knowledge on the subject or be able to quickly ramp up. For example, an analyst with one year of security experience who also has in-depth knowledge of religious and cultural practices from a geographic region where your biggest threats reside can be just as valuable to a threat intelligence team as someone with ten years of security experience and no relevant geographical or religious knowledge or experience.

Fifth, they know the tools and data resources available for collecting intelligence. Often, the hardest part of collecting intelligence is knowing where it is, how to get it, and the ability to find new sources.

Sixth, a good analyst has refined technical skills with respect to understanding how data is/was collected and processed, as well as knowing when data is missing and being able to explain why it is missing. This helps them know when to question the collection results and how to work with the collection team to tune the methods, techniques and processes. Additionally, they should have advanced skills when it comes to collating data points for analysis in order to identify relationships and trends.

Finally, they should have a solid understanding of and experience in developing and testing hypotheses, including communicating the methods used, the assumptions made, the data that is missing, and potential biases.

GOOD TO GREAT

A great analyst is one who is willing to review someone else’s hypothesis, theory, or model and, if the data supports it, admit that while their own assessment may differ, both are viable. Many times, the best analysis is a hybrid of theories from individuals who started from very different points, combining the best of each analysis to create the final product. Additionally, when a theory or hypothesis is disproven, or the data doesn’t support it, they need a “no-quit” mentality, continuing to chip away until they have a theory that the data does support.

In addition to willingly accepting others’ evaluations and assessments, a great analyst is also cognizant of his/her own bias. For example, a 50-year-old male analyst from Ohio who grew up in a Christian home and never traveled more than 250 miles from home is probably going to have a very different set of biases influencing his/her analyses than a 50-year-old male analyst from Mississippi who spent 20 years in the military and is an atheist. The ability to admit one’s own bias is often found in someone who can have academic discussions and say, “I understand your argument, I just don’t agree with it.” Being self-aware and able to admit one’s own bias is a trait often overlooked in the interview process.

IT’S ALL ABOUT THE BIAS…

So, which of all the things discussed above is the most critical characteristic of a threat intelligence analyst? The last one: the ability to admit one’s own bias. You certainly hope to find a threat intelligence analyst who embodies all of the traits that make a good or great analyst; at the end of it all, however, the ability to admit one’s own bias turns out to be the foundation upon which most of the other traits sit.

Above all, they are willing to admit when they are wrong and, harder still, when someone else is right.

Three Myths About Threat Intelligence


 

  1. Threat intelligence is something you should provide your customers

If threat intelligence products are not your flagship product or primary business function, then threat intelligence is not something you should provide as a product or service directly to your customers. Threat intelligence is more than just blogs about the latest malware; it is a full-scope business function that serves the organization strategically, operationally, and tactically. While threat intelligence may direct or influence the actions taken at the tactical level (i.e., to protect internal assets such as networks, intellectual property, and (customer) data), the intelligence itself and the methods by which it is developed should not be released to your customer base as a product. In some rare instances, corporations have full teams dedicated to developing threat intelligence, which in turn is disseminated internally; these are usually organizations with very mature security practices and processes. While they may eventually publish what they learn via a corporate blog, the team’s function is to serve the organization, not to provide a product to the customer.

NOTE: This should not be interpreted to mean that intelligence should never be shared or disseminated to customers. That is a discussion that goes beyond this article’s scope.

NOTE: In short, if you have not mastered the art of developing threat intelligence in-house, you should not be offering it as a service or product.

 

  2. Threat intelligence is nothing more than advanced information security or “googling”

Threat intelligence is a proactive approach to security, while an information security practice (or department) is a consumer of the details that threat intelligence generates. A true threat intelligence program consists of governance and compliance, data/intelligence collection, processing, analysis, reporting, and dissemination. A Threat Intelligence team combines data from the information (“cyber”) security domain with data from multiple other domains and disciplines, such as history, economics, political science, education, religion, industry/market-specific trends, and cultural studies, to define the threat. An information security department, because it is itself a target, often generates data (e.g., incident post-mortems) that may be synthesized with various other sources to build a holistic threat picture. While the Information Security team may generate threat data points at a tactical/operational level, such as details about the latest denial-of-service attack or phishing campaign, they are not generating actual intelligence; in other words, they are not defining the threat.

 

  3. Threat Intelligence is a “cyber” thing

While threat intelligence has many faces and a fully fledged Threat Intelligence Program serves multiple departments, its primary mission is to support C-suite decision making by educating decision makers so they can act on as much available information as possible. Supporting other departments is a secondary, albeit still important, role. The Marketing department benefits from information about threats to the corporate brand and works with the Legal department to thwart them. The Legal department benefits from information about threats to copyrights or trademarks, threats posed by specific individuals or business partners, and anything exposing the company to potential litigation. The Human Resources department benefits from information about threats posed by personnel, especially in mission-essential roles. Pretty much any department that works with sensitive strategic information, plans, projections, or forecasts, or with highly sensitive data such as intellectual property, customer data, or security-related information, can benefit from threat intelligence.

OSINT Data Sources: Trust but Verify

Thanks to @seamustuohy and @ginsberg5150 for editorial contributions

For new readers, welcome, and please take a moment to read a brief message From the Author. This article’s primary audience is analysts; however, if you are in leadership and seek to optimize or maximize the analysis your threat intelligence program is producing, you may find this walkthrough valuable.

OVERVIEW

In the recent blog Outlining a Threat Intel Program, https://www.osint.fail/2017/04/24/outlining-a-threat-intel-program/, steps 4 and 5 discussed identifying information needs/requirements and resources. This blog expands on the latter: identifying the resources. Previously we discussed that when identifying resources, you start with what you have inside your organization, and that sometimes the information lives in a source to which you do not have access or is sold by a vendor. There is another, obvious category not previously mentioned: open source, freely available (usually on the Internet). Free is good. Free is affordable. Free can be dangerous. Before you decide to trust or use a free source, you need to adequately vet it. Below I walk through a recent project, a free source that was considered, how it was vetted, and the takeaways, which are: 1) ensure the accuracy and completeness of OSINT sources before considering them reliable or relevant; 2) know the limitations of your OSINT-sourced data; and 3) thoroughly understand any filtering and calculations applied to the data before it is provided to you.

GETTING THE RAW DATA AND CONVERTING IT FOR ANALYSIS

The raw data is on GitHub at https://github.com/grcninja/OSINT_blog_data, file name 20170719_OSINT_Data_Sources_Trust_but_Verify_Dyn_Original_Data.7z.

Over the years I have worked on a few projects with ties to Ukraine. During a conversation with colleagues, we came up with a question we wanted to answer: is there any early warning indicator of a cyb3r attack on a power grid that might be found in network outages? We reasoned that even though businesses, data centers, and major communications companies have very robust UPS/generator capacity available, an extended power outage would surely tap it and thereby cripple communications, which could have a ripple effect during an armed conflict, biological epidemic, civil unrest, etc. So, I decided to see what data was out there with respect to network outages and then collect data related to power outages. I settled on my first source, Dyn’s Internet Outage bulletins from http://b2b.renesys.com/eventsbulletin (filtered on Ukraine). According to its site, the organization “helps companies monitor, control, and optimize online infrastructure for an exceptional end-user experience. Through … unrivaled, objective intelligence into Internet conditions… “

Weighing the effort to script this against the time it would take to copy/paste the few pages of data I needed, I opted to do it old school and copy the data straight off the web pages. If this turns out to be something I will do more than two or three times, I will consider asking for API access and automating the task. For the time being, though, this was a one-off, I-am-curious-and-bored investigation, and manual labor settled it. I scraped the approximately 480 entries available at the time, going back to the first entries in March 2012 (there are more now, of course).

I used a mix of regex-fu and normal copy/paste to morph the data into semicolon-delimited lines. I chose semicolons because the data already contained commas, spaces, and dashes that I was not sure I wanted to touch, and since I had fewer than 2 million rows and I love MS Office, semicolons would let me manipulate the data in MS Excel more easily.

Below is an example of a typical entry:

6 networks were restored in the Ukraine starting at 16:03 UTC on April 11. This represents less than 1% of the routed networks in the country.

100% of the networks in this event reached the Internet through: UARNet (AS3255).

Let the metamorphosis begin:

replace -> networks were restored in the Ukraine starting at -> ;

6;16:03 UTC on April 11. This represents less than 1% of the routed networks in the country.

100% of the networks in this event reached the Internet through: UARNet (AS3255).

delete ->  on

6;16:03 UTC April 11. This represents less than 1% of the routed networks in the country.

100% of the networks in this event reached the Internet through: UARNet (AS3255).

replace -> . This represents -> YYYY; [I saved each year’s data as separate text files to make this search/replace easier later so that I can do one year per file]

6;16:03 UTC April 11 YYYY; less than 1% of the routed networks in the country.

100% of the networks in this event reached the Internet through: UARNet (AS3255).

replace -> less than -> < (the less-than symbol)

6;16:03 UTC April 11 YYYY; < 1% of the routed networks in the country.

100% of the networks in this event reached the Internet through: UARNet (AS3255).

replace ->  of the routed networks in the country. (including the newline characters) -> ;

6;16:03 UTC April 11 YYYY; < 1% ; 100% of the networks in this event reached the Internet through: UARNet (AS3255).

replace -> % of the networks in this event reached the Internet through:  -> ;

6;16:03 UTC April 11 YYYY; < 1% ; 100;UARNet (AS3255).

replace -> (AS -> ;AS

6;16:03 UTC April 11 YYYY; < 1% ; 100;UARNet;3255).

delete -> ).

6;16:03 UTC April 11 YYYY; < 1% ; 100;UARNet;3255

 

At this point, each of the individual original files contains mostly lines that look like the one above, with the added caveat that I replaced YYYY with the appropriate year for each file.
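If the copy/paste routine ever needs to be repeated, the same metamorphosis is easy to script. Below is a minimal Python sketch (not the exact steps I used) that applies the substitutions above to one entry in the “restored” wording; the helper name and the column order are just illustrations of the semicolon-delimited rows shown.

```python
# One raw bulletin entry, in the "restored" wording shown in the example above.
raw = (
    "6 networks were restored in the Ukraine starting at 16:03 UTC on April 11. "
    "This represents less than 1% of the routed networks in the country.\n"
    "100% of the networks in this event reached the Internet through: UARNet (AS3255)."
)

def entry_to_row(entry: str, year: str) -> str:
    """Apply the find/replace steps walked through above to one entry, producing:
    count;timestamp;pct of routed networks;pct reachable;AS name;ASN
    """
    s = entry.replace("\n", " ")
    s = s.replace(" networks were restored in the Ukraine starting at ", ";")
    s = s.replace(" on ", " ")                         # drop " on " before the day
    s = s.replace(". This represents ", f" {year}; ")  # inject the file's year
    s = s.replace("less than ", "< ")
    s = s.replace(" of the routed networks in the country. ", "; ")
    s = s.replace("% of the networks in this event reached the Internet through: ", ";")
    s = s.replace(" (AS", ";")
    s = s.replace(").", "")
    # NOTE: entries worded "experienced an outage" / "experienced a momentary
    # outage" (discussed below) would each need their own first substitution.
    return s

print(entry_to_row(raw, "2012"))
# -> 6;16:03 UTC April 11 2012; < 1%; 100;UARNet;3255
```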

And like any good analyst I began to scroooooooolllllllllll through over 400 lines hoping my eyes would not catch anything that wasn’t uniform.  Of course, I wasn’t that lucky, and instead I found entries that were ALMOST worded the same, but just different enough to make me wonder if this “reporting” was automated or if it was human.

  • 59 networks experienced an outage…
  • 30 networks experienced a momentary outage…
  • 6 networks were restored…

Do you see it? I now had three kinds of entries: “restored” (which implies that there was an outage), “outage”, and “momentary outage”. I performed a few more rounds of semicolon replacement so that I could keep moving, and noted the issues.

DATA QUALITY AND ANOMALIES

I noted that I had inconsistencies in the entries, but I wasn’t ready to write off the source just yet. After all, Dyn is a large, global, reputable firm, and I believed there could be some reasonable explanation, so I continued to prepare the data for analysis.

The statement “This represents less than 1%” isn’t an exact number, so I reduced it to “;<1%” to get it into my spreadsheet for the time being. I wanted to perform some analysis on the trends related to the impact of these outages and planned to use the ‘percentage of networks affected’ value. To do this, I was going to need a number, and converting them seemed like it should be relatively easy. I considered that new networks are probably added regularly (although routable networks not very often) and decided I should try to estimate the number of networks that existed just prior to any reported outage/restoration based on a known number. So, I turned to the statements that were more precise. Unfortunately, to my surprise, there were some (more) serious discrepancies. For example, on 5/8/2012 there were two entries, one for 14:18 UTC, the other for 11:39 UTC. The first stated that 97 networks were affected, which represented 1% of the routed networks in the country (not less than 1%, exactly 1%), and math says 97/.01 = 9,700 networks. The second entry stated that 345 networks were affected, which was 5% of the routed networks, and math says 345/.05 = 6,900 networks. How could this be? In less than three hours, did someone stand up 2,800 routable networks in Ukraine? I don’t think so. I performed the same calculations on other exact figures from entries close together in time, most of them on the same day, and found the same problem in the data. This is no small discrepancy, and I can only assume the percentages reflected on the web page have undergone some serious rounding or truncation. I therefore decided to remove this column from the data set and not perform any analysis on it. Instead, I kept the value that, for the time being, I trusted: the exact number of networks affected.
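A quick back-of-the-envelope check makes the problem obvious. The sketch below uses the two 5/8/2012 entries described above to derive the implied total of routed networks from each report and compare them.

```python
# Implied total routed networks = networks affected / reported fraction.
entries = [
    {"time": "11:39 UTC", "affected": 345, "reported_fraction": 0.05},
    {"time": "14:18 UTC", "affected": 97,  "reported_fraction": 0.01},
]

for e in entries:
    e["implied_total"] = e["affected"] / e["reported_fraction"]
    print(f'{e["time"]}: {e["affected"]} networks / '
          f'{e["reported_fraction"]:.0%} => ~{e["implied_total"]:,.0f} routed networks')

delta = abs(entries[0]["implied_total"] - entries[1]["implied_total"])
print(f"Implied totals differ by {delta:,.0f} networks within roughly three hours,")
print("which only makes sense if the published percentages are heavily rounded or truncated.")
```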

In the data set, there were 10 entries that did not list a percentage of networks able to reach the Internet either during the outage or after the restoration (the number of networks affected in those entries ranged from 7 to 105); these entries also lacked data for the affected ASN(s). To avoid skewing the analysis, these incomplete entries were removed. There was one entry of 23 affected networks on 2/17/2014 where the affected ASN was not provided but all other information was available; a value of UNK (unknown) was entered for the ASN.

Additional name standardization was completed where the AS number (ASN) was the same, but the AS names varied slightly such as LLC FTICOM vs. LLC “FTICOM”.

The final data set that remained was 97.8947% of the original, with data from March 22, 2012 16:55 UTC to July 5, 2017 21:40 UTC.
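For readers who prefer a dataframe to a spreadsheet, the same cleanup can be expressed in a few lines. This is a minimal sketch rather than the exact steps I took; the file name and column names are placeholders for the semicolon-delimited fields built earlier.

```python
import pandas as pd

# Placeholder column names for the semicolon-delimited fields built earlier.
cols = ["affected", "timestamp", "pct_routed", "pct_reachable", "as_name", "asn"]
df = pd.read_csv("ukraine_outages_2012_2017.txt", sep=";", names=cols)

# Drop the entries missing both the reachability percentage and the ASN
# (the 10 incomplete entries described above).
df = df.dropna(subset=["pct_reachable", "asn"], how="all")

# Where only the ASN is missing but everything else is present (the 2/17/2014
# entry), mark it as unknown rather than dropping the row.
df["asn"] = df["asn"].fillna("UNK")

# Standardize AS names that differ only cosmetically, e.g. LLC FTICOM vs. LLC "FTICOM".
df["as_name"] = df["as_name"].str.replace('"', "", regex=False).str.strip().str.upper()
```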

Let’s review the list of concerns/modifications with the data set:

  1.  inconsistent reporting language
  2.  lack of corresponding outage and restoration entries
  3.  inconsistent calculations for the percentage of routable networks affected
  4.  incomplete entries in the data set
  5.  AS names varied while the AS number remained the same

THE ANALYSIS TO PREPARE FOR ANALYSIS

First, I started with the low-hanging fruit (a scripted pass that produces figures like these is sketched after the list).

  • Smallest number of networks affected: 6
  • Largest number of networks for one ASN affected: 651 (happened on 5/18/2013)
  • Average number of networks affected: 74.93534483
  • How many times were more than one ASN affected? 253 times, which is slightly more than half of the total events.
  • Were there any outages of the same size for the same provider that were recurring? YES
    • ASN 31148, Freenet (O3), had three identical outages of 332 networks 10/2/2014, 1/8/2015, 11/25/2015
    • ASN 31148, Freenet (O3), had four identical outages of 331 networks 8/11/2014, 10/6/2014, 11/5/2015, & 8/3/2016
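These figures fall out of a few one-liners once the cleaned rows are in a dataframe. Here is a minimal sketch, continuing from the df built earlier; note that it treats rows sharing a timestamp as one event, which is a simplifying assumption on my part.

```python
# Low-hanging-fruit statistics over the cleaned rows (df from the sketch above).
print("Smallest number of networks affected:", df["affected"].min())
print("Largest number of networks affected: ", df["affected"].max())
print("Average number of networks affected: ", round(df["affected"].mean(), 2))

# Events that touched more than one ASN (rows sharing a timestamp = one event).
asns_per_event = df.groupby("timestamp")["asn"].nunique()
print("Events affecting more than one ASN:", int((asns_per_event > 1).sum()))

# Recurring identical outages: the same ASN reporting the same network count repeatedly.
recurring = (
    df.groupby(["asn", "as_name", "affected"])
      .size()
      .reset_index(name="occurrences")
      .query("occurrences > 1")
      .sort_values("occurrences", ascending=False)
)
print(recurring.head(10))
```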

At a glance, the low-hanging fruit seems to be about what was expected, but I was still not sold that this data set is all it’s cracked up to be, so I decided to check it for expected knowns.

The Ukraine power grid was hit on Dec 23, 2015, with power outage windows of 1-6 hours; however, the only network outages reflected in the data for Dec 2015 are on 1, 2, 8, 27, 30 & 31 December. The next known major outage was more recent, just before midnight on 17 December 2016, and again there is not a single network outage recorded in the data set. This puzzled me, so I began to look for other network outages that one would reasonably expect to occur. In India in 2012 there were power outages on 30 & 31 July (https://www.theguardian.com/world/2012/jul/31/india-blackout-electricity-power-cuts); according to The Guardian, “Power cuts plunge 20 of India’s 28 states into darkness as energy suppliers fail to meet growing demand,” and if you follow the article, one of many, you get reports that power was out for up to 14 hours. [I encourage readers to check sources; you can find 41 citations on this wiki page: https://en.wikipedia.org/wiki/2012_India_blackouts.] I went back to the data source, filtered on India, and magically there were no reported network outages that paralleled the power outages. Then I decided to start spot-checking USA outages listed at https://en.wikipedia.org/wiki/List_of_major_power_outages because this list has the following criteria:

  1.  The outage must not be planned by the service provider.
  2.  The outage must affect at least 1,000 people and last at least one hour.
  3.  There must be at least 1,000,000 person-hours of disruption.

And sadly, I chose three different 2017 power outage events, none of which seemed to be reflected in the data set of network outages. It was time to decide whether the data set I wanted to use was going to meet my needs. After reviewing what I had learned while normalizing and analyzing the data, I summarized the concerns:

  1.  inconsistent reporting language
  2.  lack of corresponding outage and restoration entries
  3.  inconsistent calculations for the percentage of routable networks affected
  4.  incomplete entries in the data set
  5.  AS names varied while the AS number remained the same
  6.  known events aren’t found in the data set

All things considered, I opted not to use this OSINT data source for my analysis as there were too many discrepancies, especially the lack of “known” events being reflected.
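Next time, that known-event check is the one I would script first. Below is a minimal sketch, continuing from the cleaned dataframe built earlier (same placeholder column names) and using the known dates discussed above; the timestamp format and the size of the date windows are assumptions on my part.

```python
import pandas as pd

# Known power events that should plausibly leave a trace as network outages.
# NOTE: the India dates only make sense against bulletins re-pulled filtered on India.
known_events = {
    "Ukraine grid attack (2015)": "2015-12-23",
    "Ukraine grid attack (2016)": "2016-12-17",
    "India blackout, day 1":      "2012-07-30",
    "India blackout, day 2":      "2012-07-31",
}

# Parse the "16:03 UTC April 11 2012"-style field built earlier.
df["event_time"] = pd.to_datetime(
    df["timestamp"].str.strip(), format="%H:%M UTC %B %d %Y", errors="coerce"
)

for label, day in known_events.items():
    day = pd.Timestamp(day)
    in_window = df["event_time"].between(day - pd.Timedelta(days=1), day + pd.Timedelta(days=2))
    print(f"{label}: {int(in_window.sum())} outage/restoration entries within the window")
```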

THE LESSON & CONCLUSION

Because this was a pseudo-personal endeavor that only paralleled some of the work I have previously done, I took a very loose approach to the analysis and did not verify that the data set I was considering met my needs regarding completeness and accuracy. I did, however, assume that because the data was being made public by a large, reputable, global company, they themselves were exerting some level of quality control in the reporting and posting of the data. That assumption, of course, was flawed. Currently, I am unable to come up with an explanation for so many data points being absent from the data set. It is possible that this major global Internet backbone company simply has no sensors in the affected 71% of India, or has them only in the 29% that wasn’t affected. Perhaps all of their sensors in Ukraine were down on the exact days of the power outages there. It is also very possible that there were enough UPSs and generators in place in Ukraine and India that, despite the power going out, the networks never went down. At this point, I’ve decided to search for answers elsewhere and am not spending time trying to figure out where the data went, as this was just a personal project.

TAKEAWAYS

Data sources need to be vetted for accuracy and completeness before they are considered reliable or relevant. When considering a source, check it for known events that should be reflected. If you do not find these events and you still wish to use the source, make sure you understand WHY the events are not reflected. Knowing the limitations of your OSINT-sourced data is critical, and thoroughly understanding any filtering and calculations applied before the data is provided to you is just as vital to performing successful analysis. This kind of verification and validation should also be repeated for sources you continue to use in OSINT collection. Just because a data source is not appropriate for the kind of analysis you wish to do on one project does not mean it is invalid for other projects. All OSINT sources have some merit, even if that merit is being an example of what you DO NOT want.