Open Source OSINT Tools and Techniques
Open Source Intelligence (OSINT) is the practice of collecting information from published or publicly available sources for intelligence purposes.
The term ‘Open Source’ within OSINT refers to the public nature of the analyzed data. Publicly available information includes blogs, forums, social media sites, traditional media (TV, radio, and publications), research papers, government records, and academic journals. The scope of this information is almost infinite and concerns all kinds of people, companies, and organizations. OSINT users range from IT security professionals and state-sanctioned intelligence operatives with ethical intentions to malicious hackers with unethical intentions.
The History of OSINT
OSINT is as old as the use of intelligence to support a government’s decisions and actions. However, it was not used in a systematic way until the United States established the Foreign Broadcast Monitoring Service (FBMS) in 1941, an effort that took on new importance after the Japanese attack on Pearl Harbor. In 1947 it was renamed the Foreign Broadcast Information Service (FBIS) under the newly established CIA. In 2005, following the 9/11 attacks and the passage of the Intelligence Reform and Terrorism Prevention Act, FBIS, together with other research elements, was transformed into the Director of National Intelligence's Open Source Center (OSC). Since its establishment, the OSINT effort has been responsible for filtering, transcribing, translating/interpreting, and archiving news items and information from many foreign media sources.
Importance in Industry
OSINT is essential in many fields, including law enforcement, risk and fraud management, human resources, cybersecurity, and military operations. It can be used to identify data breaches, uncover vulnerabilities, support decision-making, aid customer due diligence, and help users stay updated. In business, OSINT supports penetration testing, breach detection, ethical hacking, and chatter monitoring. It is also crucial for keeping tabs on vast amounts of information.
IT professionals using OSINT typically focus on three essential tasks: discovering public-facing assets, discovering relevant information outside the organization, and collecting and grouping discovered information into actionable form. Discovering public-facing assets shows IT professionals what anyone can learn about a company's assets without resorting to unethical means such as hijacking. Discovering relevant information outside the organization lets them look beyond tightly defined networks, widening their scope of discovery. Collecting and grouping the discovered information with OSINT tools turns it into more valuable, actionable intelligence.
In fraud detection and prevention, OSINT can support manual review alongside anti-fraud systems. For instance, if an anti-fraud system’s ruleset is insufficient to assess a case correctly, OSINT can serve as a backup assessment. Analysts can also use OSINT to search carder forums or the dark web to see what information is trending and what professionals should prepare for.
OSINT reconnaissance involves using publicly available resources to gather information on a person or organization. OSINT reconnaissance techniques fall into three categories: passive, semi-passive, and active. Passive reconnaissance often involves searching the web using applications such as search engines. This method is hard to detect since no direct engagement with the target is involved and only archived information is collected. Semi-passive reconnaissance usually consists of searching the web for data but can also use software to gather information non-intrusively. Active reconnaissance collects data directly from the target, offering more accurate and timely information, but this type of probing can be detected.
The best reconnaissance technique depends on the organizational needs of a team. However, following a general process helps lay the foundations for effective intelligence gathering. The Open Web Application Security Project (OWASP) outlines a five-step OSINT process. The process begins with source identification: where can we find the information for the specific intelligence requirement? Next comes harvesting, which collects relevant information from the identified sources. Data processing then extracts meaningful insights from the harvested data. The analysis step combines the processed data from multiple sources. Reporting is the last step, creating a final report on the findings.
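The five steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the in-memory SOURCES dictionary, the Finding class, and every function name are hypothetical stand-ins, and a real harvesting step would query live sources rather than a hard-coded dictionary.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Finding:
    source: str
    text: str


# Hypothetical in-memory "sources"; a real harvest step would query the web.
SOURCES = {
    "forum": ["Admin panel exposed at admin.example.com", "nothing to see"],
    "paste": ["leaked creds for admin.example.com", "build log for ci.example.com"],
}


def identify_sources(requirement: str) -> list[str]:
    """Step 1: decide which sources could answer the requirement."""
    return sorted(SOURCES)  # in this sketch every known source is considered relevant


def harvest(source_names: list[str]) -> list[Finding]:
    """Step 2: collect raw items from each identified source."""
    return [Finding(name, text) for name in source_names for text in SOURCES[name]]


def process(findings: list[Finding], keyword: str) -> list[Finding]:
    """Step 3: keep only the items that mention the keyword."""
    return [f for f in findings if keyword in f.text]


def analyze(findings: list[Finding]) -> Counter:
    """Step 4: combine sources by counting how often each host is mentioned."""
    hosts = Counter()
    for finding in findings:
        for word in finding.text.split():
            if word.endswith(".example.com"):
                hosts[word] += 1
    return hosts


def report(hosts: Counter) -> str:
    """Step 5: render a short, human-readable report."""
    return "\n".join(
        f"{host}: mentioned {count} time(s)" for host, count in hosts.most_common()
    )


def run_pipeline(requirement: str, keyword: str) -> str:
    names = identify_sources(requirement)
    return report(analyze(process(harvest(names), keyword)))
```

Calling run_pipeline("exposed hosts", "example.com") walks the sample data through all five steps and returns one report line per host, most frequently mentioned first.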
OSINT tools can be divided into three main categories. Discovery tools are used to search for any information that might be found on the web. Good discovery tools can be as simple as search engines. Scraping tools filter content so that only the required information is extracted to a database. They are helpful in hiding the presence of bulky data transfers and in keeping irrelevant information from mixing with relevant information. Aggregation tools combine related information from scraping tools to give a clearer picture of what the data represents, all in a presentable format. This can include the relations and connections between datasets.
There are many free and paid open-source intelligence tools available for purposes such as searching metadata and code, researching phone numbers, investigating identities, verifying email addresses, analyzing images, detecting wireless networks, and analyzing packets. However, some of these tools are limited by a paywall. The tools listed below are free and can be used to their full potential:
Nmap (Network Mapper) is a free, open-source tool for vulnerability checking, port scanning, and network mapping. It lets you scan a network to discover what is connected to it, along with a wide variety of information about each connected device. At its heart lies port scanning, which is helpful for administrators. Nmap supports a large number of scanning techniques, such as UDP, TCP connect(), TCP SYN (half-open), and FTP proxy (bounce attack) scans. It also offers scan types such as Reverse-ident, ICMP (ping sweep), FIN, ACK sweep, Xmas, SYN sweep, IP Protocol, and Null scan. Nmap can also run limited or scheduled port scans, which is helpful because a massive port scan would likely trigger security alerts on the target. Users can control the depth of each scan, from light scans that report only port status to more detailed scans that report the operating systems behind those ports. Other features include operating system detection via TCP/IP fingerprinting, stealth scanning, dynamic delay and retransmission calculations, parallel scanning, detection of down hosts via parallel pings, decoy scanning, port filtering detection, direct (non-portmapper) RPC scanning, fragmentation scanning, and flexible target and port specification. These qualities make Nmap very versatile. Controlling these scans used to require training in console commands. However, with the Zenmap graphical interface, experienced admins can more easily build the commands that help them identify a target. This makes Nmap a helpful tool for experts and professionals involved in penetration testing. However, the tool is still very technical and not recommended for novice users.
Use Scenario: A user is performing an inventory sweep of their network and wants to identify any older assets, so they use Nmap to identify a host’s operating system. The -A switch enables OS detection (along with version detection, script scanning, and traceroute) for a remote system. For example, running:
$ nmap -A localhost
yields an output that says the host is running Linux 3.7 - 3.9. Using Nmap, the user could identify that the host was running a deprecated operating system.
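To make the TCP connect() technique mentioned above more concrete, here is a minimal sketch of a connect scan using only Python's standard socket module. It is not how Nmap is implemented and it does none of Nmap's fingerprinting; the function name tcp_connect_scan and the default timeout are assumptions chosen for illustration.

```python
import socket


def tcp_connect_scan(host: str, ports: list[int], timeout: float = 0.5) -> dict[int, bool]:
    """Return {port: is_open} by attempting a full TCP connect() on each port."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise,
            # instead of raising an exception for a refused connection.
            results[port] = sock.connect_ex((host, port)) == 0
    return results


if __name__ == "__main__":
    # Only scan hosts you own or are explicitly authorized to test.
    for port, is_open in tcp_connect_scan("127.0.0.1", [22, 80, 443]).items():
        print(port, "open" if is_open else "closed or filtered")
```

Because a connect scan completes the full handshake, it is easy for the target to log, which is one reason tools like Nmap also offer half-open and other scan types.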
A packet analyzer tool, Wireshark effectively lets users put their network traffic under a microscope, allowing them to zoom in on the root cause of a particular problem. Wireshark captures traffic on local networks such as Ethernet, Bluetooth, wireless (IEEE 802.11), and Token Ring (packet capture). It then breaks the captured packets down (filtering) and stores their data for purposes such as offline analysis (visualization). Wireshark has many uses within the industry, such as network analysis and network security. For instance, network administrators may use Wireshark to troubleshoot network problems, while network security engineers may use it to examine security problems. Quality assurance engineers may use Wireshark to verify network applications, while developers may use it to debug protocol implementations. Beyond these industry uses, Wireshark also serves as a learning tool. Those new to information security can use Wireshark to understand network traffic analysis, how communication occurs when particular protocols are involved, and where it goes wrong when certain issues arise. Wireshark can also help novice users learn more about network protocol internals, such as those of TCP/IP. However, to use Wireshark properly, a user should first learn how a network operates, including the three-way TCP handshake and protocols such as TCP, UDP, DHCP, and ICMP.
Use Scenario: A user has an issue with their home network: their internet connection is very slow. Using Wireshark, the user drills down into individual packets to identify the problem. They quickly discover that their router is reporting a common destination (YouTube) as unreachable. The issue is easy to find because Wireshark’s interface marks packets that reflect an issue in black. Having realized this, the user restarts the cable modem to fix the problem.
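As a small illustration of what a packet analyzer decodes, and of the three-way TCP handshake mentioned above, the following sketch parses the fixed 20-byte part of a TCP header with the standard struct module. It is not Wireshark code; the function names are hypothetical, and the sample bytes would normally come from a capture rather than being built by hand.

```python
import struct

# Bit masks for six of the TCP control flags, as laid out in the TCP header.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    src, dst, seq, ack, offset_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        # The top 4 bits are the data offset, counted in 32-bit words.
        "header_len": (offset_flags >> 12) * 4,
        # The low bits of the same 16-bit field carry the control flags.
        "flags": [name for name, bit in TCP_FLAGS.items() if offset_flags & bit],
    }


def handshake_step(flags: list) -> str:
    """Name the handshake step that a segment's flags are consistent with."""
    if flags == ["SYN"]:
        return "1: client SYN"
    if flags == ["SYN", "ACK"]:
        return "2: server SYN-ACK"
    if flags == ["ACK"]:
        # A bare ACK also appears throughout a connection, not only in step 3.
        return "3: client ACK"
    return "not a bare handshake segment"
```

For example, a header packed with struct.pack("!HHIIHHHH", 54321, 443, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0) decodes to destination port 443 with only the SYN flag set, which corresponds to the first handshake step.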
GHunt is an OSINT tool that lets users analyze a target’s Google history based on identifiers such as a Gmail address. From a Gmail address, GHunt can extract the target’s name, Google ID, YouTube account, and active Google services. With the right data, GHunt can also discover a target’s phone model and make, firmware and installed software, public photos, and even the target’s physical location. Within the industry, white hat hackers and penetration testers may use GHunt to test whether the email addresses they find are valid and whether they leak other information. It can also be used for threat hunting to identify and track threats, and to understand the extent of a user’s or business’s internet footprint. These qualities make GHunt a great threat intelligence collection and attack simulation tool.
Use Scenario: A user’s friend has been receiving strange emails from a “secret admirer.” These messages contain statements that make the friend feel uncomfortable. The user decides to find the identity of this “secret admirer” but cannot find their name from the Gmail address alone, so the user chooses to investigate the Gmail account with GHunt. By running GHunt against the address from within the GHunt folder, the user finds the name of their friend’s “secret admirer” and, using that name, also finds out that the “secret admirer” goes to their university. The user gives this information to university authorities.
Google Dorks is a data querying method that uses advanced search arguments in a Google Search to reveal tough-to-find but public information. Its roots go back to 2002, when a man named Johnny Long started using custom queries to search for elements of certain websites that he could leverage in an attack. Since then, the role of Google Dorks has remained relatively the same: it is a way to use the search engine to find websites with flaws, vulnerabilities, and sensitive information that hackers can take advantage of. However, cybersecurity professionals can also use it to protect businesses and users from attacks, finding exposed information before hackers can leverage it for nefarious purposes. One of the most popular Google Dorks resources is the Google Hacking Database on Exploit Database. The site catalogs an extensive list of queries that can surface almost any type of exposed data on a target, such as usernames and passwords. This is why Google Dorks is a must for penetration testers.
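A dork is ultimately just a search string built from operators such as site:, filetype:, and intitle:. The sketch below shows one way such strings could be assembled and turned into a search link; the helper names build_dork and search_url are hypothetical, and example.com is a placeholder target.

```python
from urllib.parse import quote_plus


def build_dork(terms: str = "", **operators: str) -> str:
    """Combine free-text terms with operator:value pairs into one query string."""
    parts = [terms] if terms else []
    for name, value in operators.items():
        # Quote values containing spaces so they stay a single phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def search_url(query: str) -> str:
    """URL-encode the query into a Google search link."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    query = build_dork(site="example.com", filetype="pdf", intitle="annual report")
    print(query)
    print(search_url(query))
```

Keeping query construction separate from the request makes it easy to review exactly what will be searched, which matters when the queries are run against an organization's own domains as a defensive check.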
Modern software development is about collaboration and leveraging the power of open source. grep.app makes this easy, allowing users to search code from half a million public repositories on GitHub. In addition to a repository filter and a language filter, grep.app includes a path filter that restricts a search to particular folders. This can be useful for finding key details about code similarities and differences between various languages. If a user is interested in finding any code, regardless of punctuation, grep.app is a great OSINT tool to use.
Intel Owl is an OSINT solution for finding threat intelligence data about a specific file, IP, or domain from a single API request. A scalable API, Intel Owl can gather threat intelligence data about a particular file or observable (IP, domain, URL, hash) by querying many different analyzers and services that are externally or internally available. Built to scale up and speed up the retrieval of cyber threat information, Intel Owl can easily be integrated into a user’s stack of security tools to automate common jobs usually performed manually by security operations center analysts. This autonomy makes Intel Owl an effective tool for any user that needs a single point to query for information about a specific file or domain, IP, URL, hash, etc. Some of Intel Owl’s main features are its built-in web interface and more than 80 available analyzers that can be used to generate or retrieve data about a suspicious file or observable.
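Before any analyzer can be queried, a submitted observable has to be recognized as an IP, URL, hash, or domain. The following sketch shows one way that classification step might look using only the standard library. It is not Intel Owl's code or API; the function name, the regular expression, and the hash-length table are illustrative assumptions.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hex digest lengths of common hash types (MD5, SHA-1, SHA-256).
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
# A deliberately simple domain pattern; real validation is more involved.
DOMAIN_RE = re.compile(r"^(?=.{1,253}$)([a-z0-9-]{1,63}\.)+[a-z]{2,63}$", re.IGNORECASE)


def classify_observable(value: str) -> str:
    """Return 'ip', 'url', 'hash', 'domain', or 'unknown' for an input string."""
    value = value.strip()
    try:
        ipaddress.ip_address(value)  # accepts both IPv4 and IPv6 literals
        return "ip"
    except ValueError:
        pass
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if re.fullmatch(r"[0-9a-fA-F]+", value) and len(value) in HASH_LENGTHS:
        return "hash"
    if DOMAIN_RE.match(value):
        return "domain"
    return "unknown"
```

A dispatcher could then use the returned label to decide which analyzers are worth querying for a given input.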
A Python tool created to identify risky domains before they are used in an attack, O365 Squatting creates a list of typosquatted domains based on the domain provided by the user. The software then checks each domain against O365 infrastructure, singling out risky ones. This makes O365 Squatting an ideal tool for users searching for potential phishing domains before those domains are used in an attack.
Use Scenario: A user has received a strange email that appears to come from a Microsoft domain. Before blocking the domain, the user wants to check whether it is real. Using O365 Squatting, the user runs the following in a terminal:
python o365squatting.py -o micros0ft.com
The user receives an output of:
Checking domain micros0ft.com
Micros0ft.sharepoint.com is down / not available
By using O365 Squatting, the user finds out that the domain is fake and should be blocked.
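The first half of this workflow, generating typosquatted candidates from a legitimate domain, can be sketched in a few lines. This is not the tool's own algorithm: the substitution table and the three mutation rules below are assumptions chosen for illustration, and the infrastructure check is left out entirely.

```python
# Common look-alike substitutions; an illustrative, incomplete table.
LOOKALIKES = {"o": ["0"], "l": ["1"], "i": ["1", "l"], "e": ["3"], "s": ["5"]}


def typo_variants(domain: str) -> set[str]:
    """Generate simple typosquatting candidates for a domain like 'microsoft.com'."""
    # Assumes the input has the form name.tld; everything after the first dot is kept.
    name, _, tld = domain.lower().partition(".")
    variants = set()
    for i, ch in enumerate(name):
        # Look-alike substitution: microsoft -> micros0ft
        for sub in LOOKALIKES.get(ch, []):
            variants.add(name[:i] + sub + name[i + 1:])
        # Omission of one character: microsoft -> micrsoft
        variants.add(name[:i] + name[i + 1:])
        # Transposition of adjacent characters: microsoft -> mircosoft
        if i + 1 < len(name):
            variants.add(name[:i] + name[i + 1] + ch + name[i + 2:])
    variants.discard(name)  # swapping two equal letters reproduces the original
    variants.discard("")    # omission on a one-letter name leaves nothing
    return {f"{variant}.{tld}" for variant in variants}
```

With these rules, typo_variants("microsoft.com") includes micros0ft.com, the look-alike domain from the scenario above. Each candidate could then be checked for registration or for an associated tenant.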
If a user is looking for the best OSINT tools but is unsure which to choose for their target, the OSINT Framework is a very useful resource. As its name implies, the OSINT Framework is a cybersecurity framework that catalogs a vast collection of OSINT tools, within and outside Linux, that can help find information ranging from telephone numbers to IP addresses and email addresses. Though mostly used by security researchers and penetration testers for digital footprinting, OSINT research, intelligence gathering, and reconnaissance, it also covers tools for analyzing malicious files and exploring the dark web. The OSINT Framework presents its contents as an easy-to-use, interactive tree graph that helps users find the best free tools and resources for their work objectives.
Use Scenario: A user wants to do research on worldwide mobile coverage but does not know where to look. Since they want to use the most effective free tools and resources available, they look through the OSINT Framework. First, the user clicks on Geolocation Tools / Maps. From there they receive a massive list of map-related tools. Specifically, there is a parent node titled ‘Mobile Coverage’ that they find intriguing because it pertains to their research topic. Clicking on the ‘Mobile Coverage’ parent node, the user discovers the resources they need for their topic.
An automated reconnaissance framework, reNgine does end-to-end reconnaissance with the help of configurable scan engines. The beauty of reNgine is that users can use these configurable scan engines against multiple targets. Users can configure them to scan results, find endpoints, and quickly filter endpoints based on extension, HTTP status, page title, etc. These qualities make this tool great for penetration testing of web applications and organizations looking for asset discovery and continuous monitoring. If a user has a website that receives a large amount of web traffic, they might want to use reNgine to help protect and maintain their site.
Use Scenario: A user wants to periodically check a domain that continuously receives a lot of web traffic for vulnerabilities. Using reNgine, the user completes a full scan of that domain. A full scan includes subdomain discovery, a port scan, a directory and file search, fetching of endpoints (URLs), and a vulnerability scan. Looking at the vulnerability scan, the user finds that no vulnerabilities were discovered for the domain. To be safe, the user schedules reNgine to rescan the domain periodically so that vulnerabilities don’t go unnoticed.
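The endpoint filtering described above (by extension, HTTP status, or page title) amounts to applying a few predicates to each discovered URL. The sketch below illustrates that idea on already-collected results; the Endpoint class and filter_endpoints function are hypothetical and are not part of reNgine.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Endpoint:
    url: str
    status: int
    title: str


def filter_endpoints(endpoints, extensions=None, statuses=None, title_contains=None):
    """Return the endpoints that satisfy every filter that was supplied."""
    kept = []
    for ep in endpoints:
        # Take the extension from the URL path only, ignoring any query string.
        ext = posixpath.splitext(urlparse(ep.url).path)[1].lstrip(".").lower()
        if extensions is not None and ext not in extensions:
            continue
        if statuses is not None and ep.status not in statuses:
            continue
        if title_contains is not None and title_contains.lower() not in ep.title.lower():
            continue
        kept.append(ep)
    return kept
```

Leaving a filter as None skips it, so the same function can narrow by one criterion or by all three at once.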
Recon-ng is a reconnaissance framework designed to provide an environment for conducting open-source, web-based reconnaissance quickly and thoroughly. Written in Python, it has many modules, features for database interaction, built-in convenience functions, interactive help, and command completion. Its primary purpose is to serve as a web application/website scanner. Recon-ng can also be used to find the IP addresses of a target, look for error-based SQL injections, find sensitive files such as robots.txt, and more, using built-in features such as WHOIS lookup. For users looking for a reliable information-gathering tool, Recon-ng is an excellent choice.
Sublist3r is a Python tool designed to list subdomains of websites using search engines such as Google, Yahoo, Bing, Baidu, and Ask. It can help collect and gather subdomains of a target domain, making it useful for penetration testers and bug hunters. If a user is interested in finding the subdomains of their target domain, they should use Sublist3r.
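The core of search-engine-based enumeration is pulling hostnames out of result URLs and keeping those that belong to the target. The sketch below shows that step on a list of URLs that is assumed to have been collected already; it is not Sublist3r's implementation, and extract_subdomains is a hypothetical helper.

```python
from urllib.parse import urlparse


def extract_subdomains(urls, target: str) -> list[str]:
    """Return the sorted, de-duplicated subdomains of `target` found in `urls`."""
    target = target.lower().strip(".")
    found = set()
    for url in urls:
        # urlparse lowercases the hostname and drops any port number.
        host = (urlparse(url).hostname or "").lower()
        # Keep strict subdomains only; the leading dot in the suffix check
        # stops 'notexample.com' from matching 'example.com'.
        if host.endswith("." + target):
            found.add(host)
    return sorted(found)
```

The leading-dot suffix check is the important design choice here: a plain substring match would wrongly accept unrelated domains that merely contain the target's name.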
ZMap is a modular, open-source network scanner architected to perform Internet-wide scans. The tool is often used to discover vulnerabilities within a network, assess their impact, and detect affected IoT devices such as connected appliances. From user space on a single machine with one gigabit per second of network bandwidth, ZMap can scan a single port across the entire IPv4 address space in 44 minutes. With a ten-gigabit connection, the total time is reduced to just 5 minutes. This scanning speed makes ZMap an effective tool for network scanning. If users want to monitor their network for vulnerabilities, ZMap is a highly recommended tool.
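The figures above imply a very high probe rate, which a little arithmetic makes concrete. The sketch below assumes one probe per address and counts all 2^32 IPv4 addresses; real scans typically skip reserved ranges, so actual numbers would be somewhat lower. The function names are illustrative.

```python
IPV4_ADDRESSES = 2 ** 32  # 4,294,967,296 addresses, including reserved ranges


def required_probe_rate(minutes: float, addresses: int = IPV4_ADDRESSES) -> float:
    """Probes per second needed to send one probe to every address in `minutes`."""
    return addresses / (minutes * 60)


def scan_minutes(probes_per_second: float, addresses: int = IPV4_ADDRESSES) -> float:
    """Minutes needed to probe every address at a fixed send rate."""
    return addresses / probes_per_second / 60


if __name__ == "__main__":
    print(f"44 minutes needs {required_probe_rate(44):,.0f} probes per second")
    print(f"5 minutes needs {required_probe_rate(5):,.0f} probes per second")
```

Under these assumptions, a 44-minute scan needs roughly 1.6 million probes per second and a 5-minute scan roughly 14 million, which helps explain why bandwidth is the limiting factor.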
Is OSINT Legal?
The legality of OSINT depends on how it is used. The U.S. Code defines the legal use of open source intelligence as “... intelligence that is produced from publicly available information and is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement.” When OSINT is used for purposes such as “doxing” (publicly exposing identifying information about otherwise anonymous internet users), it can be illegal. Legal issues can also arise when sensitive information is handled improperly. If, for example, an organization accidentally leaked an employee’s credentials in a public storage bucket, it is up to the OSINT analyst who finds them to alert the organization so that it can remediate quickly. Without remediation, the exposed credentials remain available for anyone to abuse.
What is Operations Security (OPSEC), and How is it Related to OSINT?
Operations Security (OPSEC) is a process that identifies non-illicit means that a potential attacker can use to reveal critical or sensitive information and data, and then applies countermeasures to reduce or eliminate the attacker’s ability to exploit that information. Just like OSINT, OPSEC can trace its origins to U.S. defense and military interests. The term OPSEC was coined by the U.S. military during the Vietnam War, when unclassified information was found to have been inadvertently shared with the North Vietnamese and their allies.
The relationship between OSINT and OPSEC lies in how one balances the other. OSINT is the practice of collecting information from published or publicly available sources for intelligence purposes. OPSEC concerns the protection of individual pieces of data that can be aggregated to form a bigger, potentially critical/sensitive picture. Without OPSEC, there is a chance for OSINT tools and techniques to be used by potential attackers for illicit reasons. Therefore, to protect the legality of using OSINT tools and techniques, OPSEC is a necessary enforcer.
Open Source Intelligence (OSINT) is a formidable tool for finding valuable information. The information found using OSINT in the past has not only helped industries but also saved lives in military sectors and law enforcement. As the internet keeps becoming a larger part of daily human life, the need for OSINT and cybersecurity will continue to grow. We hope that the tools mentioned above will help you start utilizing OSINT in your daily life and that you share these tools with others.