Open Source Intelligence (OSINT) is the practice of collecting information from published or publicly available sources for intelligence purposes. The term "Open Source" in Open Source Intelligence refers to the public nature of the analyzed data. Publicly available information includes blogs, forums, social media sites, traditional media (TV, radio, and print publications), research papers, government records, and academic journals. The scope of this information is almost limitless, covering countless people, companies, and organizations. People who leverage Open Source Intelligence range from IT security professionals and state-sanctioned intelligence operatives with ethical intentions to malicious hackers with unethical ones.

Understanding The History of Open Source Intelligence

The history of Open Source Intelligence dates back to the emergence of intelligence as a means of supporting a government's decisions and actions. However, it was not used systematically until the United States established the Foreign Broadcast Monitoring Service (FBMS) in 1941, the year of the Japanese attack on Pearl Harbor. By 1947, as the Foreign Broadcast Information Service (FBIS), it had come under the newly established CIA. In 2005, following the 9/11 attacks and the passage of the Intelligence Reform and Terrorism Prevention Act, FBIS, together with other research elements, was transformed into the Director of National Intelligence's Open Source Center (OSC). Since its establishment, the OSINT effort has been responsible for filtering, transcribing, translating and interpreting, and archiving news items and information from many foreign media sources.

What Role Does Open Source Intelligence Play in Different Industries?

OSINT is essential for many fields, such as law enforcement, risk and fraud management, human resources, cybersecurity, and military operations.
It can be used to identify data breaches, uncover vulnerabilities, support decision-making, aid customer due diligence, or help users stay up to date. In business, OSINT can be used for penetration testing, breach detection, ethical hacking, and chatter monitoring. OSINT is also crucial for keeping tabs on vast amounts of information.

IT professionals using OSINT typically focus on three essential tasks: discovering public-facing assets, discovering relevant information outside the organization, and collecting and grouping discovered information into an actionable form. By finding public-facing assets, IT professionals learn what anyone can find on or about a company's assets without resorting to unethical means such as hijacking. Discovering relevant information outside the organization lets them look beyond tightly defined networks, widening their scope of discovery. Tools that collect and group this information help turn it into more valuable, actionable intelligence.

Within fraud detection and prevention, OSINT can support manual review alongside anti-fraud systems. For instance, if an anti-fraud system's ruleset is insufficient to assess a case correctly, OSINT can serve as a backup assessment. Analysts can also use OSINT to search carder forums or the dark web to see what information is trending and what they should prepare for.

What Techniques Are Used in Open Source Intelligence?

OSINT reconnaissance involves using publicly available resources to gather information on a person or organization. OSINT reconnaissance techniques fall into three categories: passive, semi-passive, and active. Passive reconnaissance often involves searching the web with applications such as search engines. This method is hard to detect because it involves no direct engagement with the target and only archived information is collected.
Semi-passive reconnaissance usually consists of searching the web to find data, but can also use software to gather information non-intrusively. Active reconnaissance collects data directly from the target, offering more accurate and timely information, but this type of probing can be detected.

The best reconnaissance technique depends on a team's organizational needs. However, following a general process lays the foundation for effective intelligence gathering. The Open Web Application Security Project (OWASP) outlines a five-step OSINT process. It begins with source identification, where we identify the sources that may hold information for the specific intelligence requirement. Next comes harvesting, which collects relevant information from the identified sources. Data processing then handles the harvested data and extracts meaningful insights. The analysis step combines the processed data from multiple sources. Reporting is the last step, producing a final report on the findings.

Using OSINT investigative skills, such as identifying visual clues in photos (e.g., terrain, architecture, shadows, street signs), and tools like Google Earth or reverse image search, investigators can geolocate images effectively and uncover critical insights.

What Types of Open Source Intelligence Tools Exist?

OSINT tools can be divided into three main categories. Discovery tools are used to search for any information that might be found on the web; good discovery tools can be as simple as search engines. Scraping tools filter out everything except the required information before it is extracted to a database. This helps avoid bulky, conspicuous data transfers and keeps irrelevant information from mixing with relevant information.
Aggregation tools combine related information from scraping tools to give a clearer, presentable picture of what the data represents, for example by showing relations and connections between datasets.

There are many free and paid open source intelligence tools for a variety of purposes, such as searching metadata and code, researching phone numbers, investigating identities, verifying email addresses, analyzing images, detecting wireless networks, and analyzing packets. However, some of these tools are limited by a paywall. Here is a list of the latest open source intelligence tools that are free and can be used to their full potential:

Nmap Scraping Tool

Nmap (Network Mapper) is a free, open source tool for vulnerability checking, port scanning, and network mapping. It allows you to scan your network, discover everything connected to it, and gather a wide variety of information about what is connected. At its heart lies port scanning, which is helpful for administrators. Nmap supports a large number of scanning techniques, such as UDP, TCP connect(), TCP SYN (half-open), and FTP proxy (bounce attack) scans. It also offers scan types such as reverse-ident, ICMP (ping sweep), FIN, ACK sweep, Xmas, SYN sweep, IP protocol, and Null scans. Nmap can also run limited or scheduled network port scans, which helps because massive port scans would likely trigger the target's security alerts. Users can control the depth of each scan, from light scans that report port status to more detailed scans that reveal information about the operating systems using those ports.
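Whatever depth a scan runs at, Nmap can save its results as XML with the -oX option, which makes them easy to post-process during an inventory sweep. The following is a minimal Python sketch, not an official parser: it assumes the usual layout of Nmap's XML report (host, address, ports/port, state, service, and os/osmatch elements), and the SAMPLE report is a hand-written, trimmed illustration rather than real scan output.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed illustration of Nmap's -oX layout (not real scan output).
SAMPLE = """<nmaprun>
  <host>
    <status state="up"/>
    <address addr="127.0.0.1" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="open"/><service name="ssh"/></port>
      <port protocol="tcp" portid="25"><state state="closed"/></port>
    </ports>
    <os><osmatch name="Linux 3.7 - 3.9" accuracy="100"/></os>
  </host>
</nmaprun>"""


def summarize_nmap_xml(xml_text):
    """Map each IPv4 host to its open ports and OS guesses from an Nmap XML report."""
    root = ET.fromstring(xml_text)
    summary = {}
    for host in root.findall("host"):
        addr = host.find("address[@addrtype='ipv4']")
        if addr is None:
            continue  # skip hosts that have no IPv4 address in the report
        open_ports = []
        for port in host.findall("ports/port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue
            service = port.find("service")
            name = service.get("name") if service is not None else "unknown"
            open_ports.append((int(port.get("portid")), port.get("protocol"), name))
        # osmatch elements only appear when OS detection was enabled.
        os_guesses = [(m.get("name"), int(m.get("accuracy", "0")))
                      for m in host.findall("os/osmatch")]
        summary[addr.get("addr")] = {"open_ports": open_ports, "os_guesses": os_guesses}
    return summary


if __name__ == "__main__":
    for ip, info in summarize_nmap_xml(SAMPLE).items():
        print(ip, info["open_ports"], info["os_guesses"])
```

For a real report, run something like nmap -A -oX scan.xml localhost and pass the file's contents to summarize_nmap_xml. OS guesses only appear when OS detection was enabled (-O, or -A, which includes it).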
Nmap can perform operating system detection via TCP/IP fingerprinting, stealth scanning, dynamic delay and retransmission calculations, parallel scanning, detection of down hosts via parallel pings, decoy scanning, port filtering detection, direct (non-portmapper) RPC scanning, fragmentation scanning, and flexible target and port specification. These capabilities make Nmap very versatile. Controlling these scans used to require training in console commands; the Zenmap graphical interface now lets experienced admins build and run those commands more easily when identifying a target. This makes Nmap a helpful tool for experts and professionals involved in penetration testing. However, the tool is still very technical and not recommended for novice users.

Use Scenario: A user wants to use Nmap to identify a host's operating system because they are performing an inventory sweep of their network and want to find any older assets. The user passes the -A switch, which enables OS detection along with version detection, script scanning, and traceroute, to determine the OS of the target system. For example, running:

$ nmap -A localhost

yields output saying the host is running Linux 3.7 - 3.9. Using Nmap, the user could identify that the host was running a deprecated operating system.

Wireshark Scraping Tool

Wireshark, a packet analyzer, lets users put their network traffic under a microscope and zoom in on the root cause of a particular problem. Wireshark captures network traffic on local networks such as Ethernet, Bluetooth, wireless (IEEE 802.11), Token Ring, etc. (packet capture). It then breaks the captured packets down (filtering) before storing their data for purposes such as offline analysis (visualization). Wireshark has many uses within the industry, such as network analysis and network security.
For instance, network administrators may use Wireshark to troubleshoot network problems, while network security engineers may use it to examine security problems. Quality assurance engineers may use Wireshark to verify network applications, while developers may use it to debug protocol implementations.

Beyond these industry uses, Wireshark also works as a learning tool. Those new to information security can use it to understand network traffic analysis, how communication occurs when particular protocols are involved, and where it goes wrong when certain issues arise. Wireshark can also help novice users learn about network protocol internals, such as those of TCP/IP. However, to use Wireshark properly, a user should first learn how a network operates, including the TCP three-way handshake and protocols such as TCP, UDP, DHCP, and ICMP.

Use Scenario: A user has an issue with their home network: their internet connection is very slow. Using Wireshark, the user drills down into individual packets to identify the problem. Through the Wireshark interface, they quickly discover that their router thinks a common destination (YouTube) is unreachable. The issue is easy to find because Wireshark's default coloring rules highlight problem packets, such as TCP errors, in black. Realizing this, the user restarts the cable modem to fix the problem.

GHunt Discovery Tool

This OSINT tool lets users analyze a target's Google history starting from data such as a Gmail address. From a Gmail address, GHunt can extract the target's name, Google ID, YouTube account, and active Google services. GHunt can also discover a target's phone make and model, firmware and installed software, public photos, and, with the right data, even the target's physical location.
Within the industry, white hat hackers and penetration testers may use GHunt to test whether the email addresses they find are valid and whether they leak other information. GHunt can also be used for threat hunting to identify and track threats, and to understand the extent of a user's or business's internet footprint. These qualities make GHunt a great threat intelligence collection and attack simulation tool.

Use Scenario: A user's friend has been receiving strange messages from a "secret admirer" by email. These messages contain statements that make them feel uncomfortable. The user decides to find the identity of this "secret admirer," but cannot find their name from the Gmail address alone, so they use GHunt to investigate the Gmail account. By typing:

$ python3 hunt.py
This article is the second in a series designed to help readers assess the risk that their Internet-connected systems are exposed to. In the first installment, we established the reasons for doing a technical risk assessment. In this installment, we'll start discussing the methodology that we follow in performing this kind of assessment.

Why all the fuss about a Methodology?

If you ever read anything SensePost publishes on assessments, or if you attend our training, you'll notice that we tend to go on a bit about methodology. A methodology is the set of steps we follow when performing a certain task. We try to work according to methodologies in just about everything we do. We're not fanatical about it, and the methodologies change and adapt to new and different environments, but they always play a big role in the way we approach our work. Why such a big fuss, then? There are a few good reasons for performing assessments according to a strict methodology:

Firstly, it gives us a game plan. Rather than stare blankly at a computer screen or a network diagram, an analyst has a fixed place to start and a clear task to perform. This takes the whole "guru" element out of what we do. The exact steps that we will follow are clear, both to us and to the customer, from the outset.

Secondly, a methodology ensures that our work is consistent and complete. I've worked on projects where the target organization had in excess of 150 registered DNS domains. Can you imagine how many IP addresses that eventually translates to? I don't have to imagine: I know it was almost 2000.
Consider how hard it must be to keep track of every DNS domain, every network, and every IP to ensure that you don't miss something. Consider also what happens when the actual "hacking" starts (we'll get to this later) and the analyst's heart is racing. A strict methodology ensures that we always cover all the bases and that our work is always of the same quality. This holds true no matter how big or small the environment you're assessing.

Finally, our methodology gives our customers something to measure us against. Remember, to date there are really no norms or standards for technical assessment work. How does the customer know that she's getting what she paid for? This is an especially pertinent question when the assessment findings are (how can I put this?) dull. By working strictly according to a sensible methodology with clear deliverables at each stage, we can verify the quality of the assessment even when there's very little to report.

A Methodology that Works

I'm completely sure that, when it comes to security assessment, there's more than one way to skin the cat. What follows is a description of a methodology that we like to use when performing security assessments over the Internet. It's certainly not the only way to approach this task, but it's one way that works, I believe.

1. Intelligence Gathering

The first thing we do when we begin an assessment is to try to figure out who the target actually is. Primarily we use the Web for this. Starting with the customer's own Web site(s), we mine for information about the customer that might be helpful to an attacker. Miscellaneous tidbits of useful data aside, our primary objective is to derive the DNS domain names that the target uses. If you're assessing your own infrastructure, you may already have this information, but if the organization is big, it can be a fascinating exercise. Later, these domain names will be mapped to the IP addresses we will actually analyze.
Some companies have a small Internet presence, and discovering the DNS names they use may be simple. Other companies we've worked with have hundreds of domains, and discovering all of them is no mean feat. How do we get the DNS domain names? Well, usually we have an e-mail address, the company's name, or some other logical place to begin. From there we have a number of techniques:

We use search engines to search for all instances of the company's name. This not only provides links to the company's own site (from which DNS domain information can be easily derived); we also obtain information about mergers and acquisitions, partnerships, and company structure that may be useful.

We use a tool like httrack to dump all the relevant Web sites to disk. We then scan those files to extract all mail and HTTP links, which are then parsed again to extract more DNS domains.

Then, we use the various domain registries. Tools like geektools.com, register.com and the like are simple and can often be used in one of two ways: to help verify whether the domains we have identified actually belong to the organization we are assessing, and to extract any additional information that may be recorded in a specific domain's record. For example, you'll often find that the technical contact for a given domain has provided an e-mail address at a different domain. The second domain then automatically falls under the spotlight as a potential part of the assessment. Many of the registries provide for wildcard searches. This allows us to search for all domains containing a certain string, like "*abc*". I would use such a search to identify all the domains that may be associated with the company ABC Apples Inc, for example.

Then, we need to apply some human intelligence: using the information we read on Web sites, company reports, and news items, we attempt to make logical jumps to other domains that may be relevant to our analysis.
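The httrack step above (scanning the dumped files for mail and HTTP links, then parsing them for more domains) and the registry-style wildcard search can be sketched in Python. This is a minimal sketch under stated assumptions: the regular expressions are deliberately rough, the mirror directory is whatever httrack produced, and TWO_LEVEL_SUFFIXES is a tiny hypothetical stand-in for a proper public-suffix list, so base_domain will mis-handle suffixes it does not know about.

```python
import re
from fnmatch import fnmatchcase
from pathlib import Path
from urllib.parse import urlparse

# Rough patterns: good enough for a first pass, not a full HTML/e-mail parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

# Hypothetical, deliberately tiny stand-in for the Public Suffix List.
TWO_LEVEL_SUFFIXES = {"co.uk", "co.za", "com.au"}


def extract_hosts(text):
    """Collect host names from e-mail addresses and HTTP(S) links in one file."""
    hosts = {m.group(1).lower() for m in EMAIL_RE.finditer(text)}
    for m in URL_RE.finditer(text):
        host = urlparse(m.group(0)).hostname  # urllib returns it lower-cased
        if host:
            hosts.add(host)
    return hosts


def base_domain(host):
    """Reduce a host name to its registered domain, e.g. shop.apples.co.uk -> apples.co.uk."""
    labels = host.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def domains_from_mirror(root):
    """Walk a mirrored site directory and return the registered domains it mentions."""
    domains = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".html", ".htm", ".txt"}:
            hosts = extract_hosts(path.read_text(errors="ignore"))
            domains |= {base_domain(h) for h in hosts}
    return domains


def wildcard_filter(domains, pattern):
    """Local equivalent of a registry wildcard search such as '*abc*'."""
    return sorted(d for d in domains if fnmatchcase(d, pattern))
```

Any new domain this turns up still needs the human check described above before it joins the target list; a link to a partner's site is not proof of ownership.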
The output of this phase is a comprehensive list of DNS domains that are relevant to the target company. You'll notice that the next phase of the assessment may provide additional domain names that weren't found during this phase. In that case, those domains are used as inputs during this phase and the entire process is repeated. Phases 1 and 2 may recur a number of times before we've located all the relevant domains. Typically, we'll check this list with the customer once we're done to ensure that we haven't missed anything or included something inappropriate.

2. Foot Printing

At the start of phase two we have a list of DNS domains: things like apples.com, apples-inc.com, applesonline.com, apples.co.uk, etc. The reason these domains exist is to provide Internet users with a simple way of reaching and using the resources they require. For example, instead of typing the server's numeric IP address, a user simply needs to remember https://sensepost.com/. Within a domain, therefore, there are a number of records: specific mappings between machine names and their actual Internet Protocol (IP) addresses. The objective of this phase is to identify as many of those IP/name mappings as we possibly can in order to understand which address spaces on the Internet are actually being used by the target organization.

There are a few different techniques for identifying these mappings. Without going into too much detail, these techniques are all derived from the same assumptions, namely:

Some IP/name mappings must exist for a domain to be functional. These include the name server records (NS) and the mail exchanger records (MX). If a company is actually using a domain, then you will be able to request these two special entries. Immediately you have one or more actual IP addresses to work with.

Some IP/name mappings are very likely to exist on an active domain. For example, "www" is a machine that exists in just about every domain. Names like "mail", "firewall" and "gateway" are also likely candidates.
We have a long list of common names that we test. This is by no means a watertight approach, but one is lucky more often than not.

An organization's machines usually live close together. This means that if we've found one IP address, we have a good idea of where to look for the rest of the addresses.

The Name -> IP mapping (the forward lookup) and the IP -> Name mapping (the reverse lookup) need not be the same.

The technology is fundamentally verbose. DNS, as a technology, was designed for the dissemination of what is essentially considered "public" information. With one or two simple tricks we can usually extract all the information there is to be had. The DNS zone transfer, a feature of DNS literally designed for the bulk transfer of DNS records, is a fine example of this. Other, craftier techniques fall beyond the scope of this paper.

Once we have all the relevant DNS names we can find, we attempt to identify the distinct network "blocks" in which the target organization operates. As stated previously, IPs tend to be grouped together; the nature of IP networking is to group addresses in what are known as subnets. The expected output of this phase is a list of all the IP subnets in which the target organization has machines active. At this stage, our broad reasoning is that if we find even a single IP in a given subnet, we include that entire subnet in the list. The technically astute among you will already be crying "False assumption! False assumption!" and you'd be right. But bear with me. At this stage we would rather over-estimate than under-estimate. Later, we will do our best to prune the list to a more accurate depiction of what's actually there.

3. Vitality

We ended the last phase with a list of IP subnets in which we believe the target organization to have a presence, and a horde of technocrats objecting loudly to our assumptions about the subnet size.
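Two of the phase 2 ideas recapped above, testing common host names and widening every discovered address to its whole subnet, can be sketched with Python's standard library. This is a minimal sketch, not SensePost's tooling: the four names are the examples from the text (a real list would be much longer), it only performs forward lookups through the system resolver (NS and MX queries or zone transfers would need a DNS library), and treating each "subnet" as a /24 is our own assumption. The resolve parameter exists so the logic can be exercised without network access.

```python
import ipaddress
import socket

# Example names taken from the text; a real wordlist would be much longer.
COMMON_NAMES = ("www", "mail", "firewall", "gateway")


def guess_hosts(domain, names=COMMON_NAMES, resolve=socket.gethostbyname):
    """Forward-resolve name.domain for each common name; return {fqdn: ipv4}."""
    found = {}
    for name in names:
        fqdn = f"{name}.{domain}"
        try:
            found[fqdn] = resolve(fqdn)
        except OSError:  # socket.gaierror (name not found) subclasses OSError
            continue
    return found


def candidate_subnets(ips, prefix=24):
    """Include the whole /prefix network around every discovered IPv4 address."""
    return sorted({ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips})
```

For a hypothetical target, candidate_subnets(guess_hosts("apples.com").values()) yields the deliberately generous subnet list that the Vitality phase then prunes.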
Let's quickly list some of the facts we need to know before we can move on with the process:

An organization does not need to own the entire subnet in which it operates. IP addresses can be lent, leased or shared. Nor do all of an organization's IPs have to be grouped together; they can be spread as widely across the Internet as the organization wishes.

Just because a Name / IP mapping exists for a machine doesn't mean that machine actually exists. Conversely, just because a Name / IP mapping doesn't exist for a machine doesn't mean the machine doesn't exist. There are thousands of nameless addresses on the Internet. Yes, it's sad, but true nevertheless.

Without a route to describe how an IP address can be reached, that address can never be used on the Internet.

So we see that, although DNS gives us a logical starting point for our search, it by no means provides a comprehensive list of potential targets. This is why we work with the rather loose subnet definitions we derived in the previous phase. The objective of the "Vitality" phase of the assessment is to determine, within the subnet blocks that we have, which IP addresses are actually active and being used on the Internet. We now leave the wonderful world of DNS behind us and begin to concentrate solely on the IP address space.

So how does one determine whether an address is active on the Internet? Well, let's recall the third "fact" from our list above. If there's no route to a given IP subnet, that subnet is as good as dead. Various core routers on the Internet graciously allow technicians and administrators to query them regarding routes to any given address. At the time of writing, one such router is route-views.oregon-ix.net. Such a router can't tell us that an IP address is alive. If there's no route for a subnet on the core routers, however, then we can conclude that all the IPs in that subnet are dead.

The next, and probably the most obvious, technique is the famous IP "ping".
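Before turning to ping, the route test just described can be expressed as a simple filter over the candidate subnets from phase 2. The following is a minimal Python sketch, assuming you have already collected the announced prefixes by hand from a route server such as the one named above (the prefixes in any example are hypothetical documentation ranges). A subnet with no overlapping route is discarded; a surviving subnet is only possibly alive.

```python
import ipaddress


def has_route(ip, announced_prefixes):
    """True if any announced prefix covers the address."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(p) for p in announced_prefixes)


def prune_unrouted(subnets, announced_prefixes):
    """Keep only candidate subnets that overlap at least one announced prefix.

    Having a route does not prove an address is alive; lacking one means the
    whole subnet is unreachable from the Internet and can be dropped.
    """
    routes = [ipaddress.ip_network(p) for p in announced_prefixes]
    return [net for net in subnets if any(net.overlaps(r) for r in routes)]
```

Using overlaps rather than strict containment keeps a candidate /24 even when only a more specific prefix inside it (say, a /25) is announced.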
Pinging works just like sonar. You send a ping (an ICMP echo request) to a specific address and the machine responds with a "pong", indicating that it is alive and received your request. Ping is a standard component of the Internet Protocol (IP) suite, and machines that talk IP are compelled to respond when they receive a ping request. With simple and freely available tools we are able to ping an entire subnet. This is known as a "ping scan". Without going into too much detail, the response of such a ping scan can be interpreted as follows:

A reply from an IP address indicates that the address is probably in use and accessible from the Internet.

Multiple replies from a single IP address indicate that the address is probably actually a subnet address or a broadcast address, and suggest a subnet border.

No reply can only be interpreted to mean that the machine is not replying to IP ping requests.

I realize that the latter point is a bit vague, but that really is the only conclusion that can be drawn from the information available. I said that all machines that speak IP are obliged to respond to ping requests. Why not simply conclude that if the IP doesn't respond, it isn't being used? The confusion is introduced by modern network security products like firewalls and screening routers. In the real world, one often sees networks configured in such a way that the IP ping packet is blocked by the firewall before it reaches the machine. Thus the machine would respond if it could, but it's prevented from doing so.

So we haul out the heavy artillery. Just about every machine on the Internet works with a series of little Internet "post boxes" called ports. Ports are used to receive incoming traffic for a specific service or application. Each port on a machine has a number, and there are 65536 possible port numbers. A modern machine that is connected to the Internet and actually functioning is almost certain to be using at least one port.
Thus, if an IP address does not respond to our ping request, we can probe it further using a tool called a "port scanner". Port scanners are freely available software utilities that attempt to establish a connection to every possible port within a specified range. If we can find just one port that responds when probed, we know that the IP is alive. Unfortunately, the amount of work required to probe all 65,000-plus ports is often prohibitive. Such an exercise can take hours per IP address and days or even weeks for an entire range. So we're forced to make yet another assumption: if an IP address is active on the Internet, then it's probably there for a reason. And there are only so many reasons to connect a machine to the Net:

The machine is a Web server (and thus uses port 80 or 443).
The machine is a mail server (and thus uses port 25).
The machine is a DNS server (and thus uses port 53).
The machine is another common server (FTP, database, time, news, etc.).
The machine is a client. In this case it is probably a Microsoft machine and uses port 139.

Thus, we can modify our scan to search for only a small number of commonly used ports. This approach is called a "host scan". It is by no means perfect, but it generally delivers accurate results and is efficient enough to be used with large organizations. The common ports we scan for can be adjusted to better suit the nature of the organization being analyzed, if required. The nmap network utility (available from https://insecure.org/) is a powerful tool that serves equally well as a ping scanner and a port scanner.

Thus, by the end of this phase we have fine-tuned our list of IP subnets and generated a list of actual IP addresses that we know to be "alive" and that therefore qualify as targets for the remainder of the process. At this point, our findings are usually presented to the customer to ensure that we're still on the right track.
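The host-scan idea, falling back to a handful of common ports only when ping gets no reply, can be sketched with plain TCP connect() probes, the same basic technique nmap offers as its connect scan. This is a minimal Python sketch, not a replacement for nmap: the ping result is passed in rather than measured (sending ICMP echo requests usually needs raw sockets and elevated privileges), the port list mirrors the reasons listed above with FTP's port 21 added as one example of "another common server", and a TCP probe on port 53 only detects DNS servers that also accept TCP.

```python
import socket

# Ports taken from the list above; 21 (FTP) is added as one "other common server".
COMMON_PORTS = (21, 25, 53, 80, 139, 443)


def port_open(ip, port, timeout=1.0):
    """TCP connect() probe: True if the three-way handshake completes."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def host_scan(ip, ports=COMMON_PORTS, timeout=1.0):
    """Return the subset of `ports` that accept a TCP connection."""
    return [port for port in ports if port_open(ip, port, timeout)]


def is_alive(ip, ping_replied, ports=COMMON_PORTS, timeout=1.0):
    """A ping reply is enough; otherwise one open common port proves the IP is alive."""
    if ping_replied:
        return True
    return bool(host_scan(ip, ports, timeout))
```

Applied to every address of a candidate subnet, this probes six ports per address instead of 65,536, which is what makes a host scan practical for large ranges; an address that fails both checks is still only "probably" unused.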
Conclusion

That concludes the first part of our discussion of Internet assessment methodology. In the next installment of this five-part series on Internet Risk Assessments, we will continue to discuss methodology, including visibility, vulnerability scanning, and analysis of Web applications.

Brittany Day