Data Intelligence in Patent Search and Analytics

3/2020 13.5.2020

No matter how far away you are from Artificial Intelligence (AI), the automation and digitalization trend cannot be ignored. The evolution of AI can be seen, in a broader sense, as a continuation of the digitalization trend in the global economy. Machine learning (ML) in general, and deep neural networks in particular, provide efficient ways to process complex input data into limited output information. The advantage of ML lies in its adaptability to input data and its ability to generalize. Hence, tedious human tasks can be either fully automated or made much simpler and faster. Although the benefits of AI are sometimes exaggerated, there is a range of pragmatic and practical applications for AI-based technologies in many domains, including intellectual property. In this article, my goal is to identify the benefits and drawbacks of external analytics solutions based on AI/ML algorithms used for searching and analyzing patent data.

Why patents?

Legislation tends to adapt to technological advancements with skepticism. It could be said that lawyers love to follow a simple rule of engineering: do not touch the mechanism if it works. Yet does this apply in the context of the search and analysis of patent data?

Patent data consists of unique and valuable information gathered from across the world. The existing patent collection comprises millions of patents spread across various databases, integrated through up-to-date web services. Statistics confirm the growing value of patents and the importance of leveraging patent data effectively. The tremendous amount of data already available, and the enormous growth of new data produced daily, create a new infrastructure for search and analytics approaches. Here, AI-based instruments may reshape the techniques with which we work with data by enabling efficient data aggregation and exploitation.


A case study methodology lies at the heart of this study. Even though qualitative research methods refer to empirical methods originating from the social sciences, they have nowadays been widely used in many other areas, including legal research. The reasons for selecting a case study method based on semi-structured interviews are the limited number of studies on the application of AI/ML-based algorithms in patent analytics, as well as the possibility to address the broader real-life context of the problem in question.

Although the topic of AI/ML-based search and analytics is recognized as highly relevant for both scholars and practitioners, the phenomenon itself is relatively new and has not yet been well established, particularly from the legal perspective. A case study should help to gather feedback from real experts who are practically involved in the development of analytical solutions and can provide unique examples from their personal experience.

Empirical findings

AI as a buzzword and the democratization of patent data

As one of the respondents put it: ‘everybody says they have AI, but nobody knows what it is’. Most companies shared the view that there is a lot of hype around AI technologies and that AI has become a buzzword, also in the context of patent search and analytics. Users often exaggerate and oversimplify the capabilities of AI tools and can be frustrated if the results fail to meet their initial expectations. In large part, this may stem from the misleading perception of easy-to-use approaches advertised by the providers of AI/ML solutions.

An unintended depreciation of the expertise needed to interpret search results correctly can create the confusing impression that the problem lies in the technology, when in fact it may derive from a faulty post-analysis of the results carried out by humans. To analyze patent information with a high level of accuracy, the user should be able to ‘break the code’ and translate the output data into concrete answers. In practice, the democratization of the information embedded in patents does not automatically mean a better understanding of this data. Technology can bring users to a much more efficient starting point in analyzing data, but it cannot replace human expertise. Since a single patent application may contain a massive amount of fragmented information, the user should understand what is hidden in there. It is crucial to be able to infer whether a given piece of data constitutes a novelty obstacle, bears on the inventive step, or is simply mentioned in passing somewhere in the background of the application. A wrong interpretation of the search results may prove extremely costly in the future.

Generic character of the AI/ML-based search and analytics systems

Most of the respondents shared the opinion that two different tools may, and most probably will, provide the user with different results, even with the same set of input data. The currently available commercial search engines use different databases to ‘feed’ or train their systems. One of the respondents mentioned that on a macro level, most of the tools might look generic. Still, each has unique features that set it apart, so ‘placing them all under one general category won’t do justice to the developers of these tools.’ Certainly, there is some truth in the claim that these systems are, to a certain extent, generic, since it is too expensive to do otherwise. However, AI itself is not generic: it is capable of generalizing, but currently only within the limits of a given model and of the training data provided to it.

Industry-specific factors 

Several respondents emphasized that the more specific the sector, the easier it is to train a system to search efficiently and with a higher level of accuracy. For example, the respondents stated that a tool could provide much better results in a mechanical field than in the ICT or pharma sectors. Indeed, ICT is full of functional and abstract definitions. One term can encompass many steps that describe how a particular process is arranged and how it functions. If we take the descriptions of a table or a bicycle and, for instance, error protection, radio resource management, or network slicing, one can easily notice that these concepts differ in their nature and are, therefore, hard to compare. One of the challenges is to define in specific terms ideas that are, by nature, neither accurate nor concrete.

Another significant issue is that the ICT industry is one of the most innovative sectors in existence, and thus there is a massive number of patent applications and an almost endless amount of patent data. In this respect, a classification-based limitation, which can be useful in the mechanical field, will not give the same promising results in ICT: the user will fail to go through all the related inventions, and the set of findings will remain enormous and vague. One possible way to manage this problem is semantic search, in which the user submits a patent, patent passages, or any other text into the search system in order to avoid the routine selection of keywords and synonyms. However, one of the fundamental problems of semantic search is the inability of the machine to answer ‘why’ questions. Without a genuine conceptual understanding of the technology, semantic analysis can take us only up to a certain point. Currently, that point is not very far.
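As a rough illustration of the mechanics behind semantic search described above, the sketch below ranks a tiny corpus of invented patent-abstract snippets against a free-text query using TF-IDF weighting and cosine similarity. All names (`tfidf_vectors`, `cosine`) and the mini-corpus are hypothetical, and real patent search engines rely on far more sophisticated models (e.g. neural embeddings); this only shows the basic idea of matching text against text without hand-picked keywords and synonyms.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors (term -> weight dicts) for a list of texts."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in tokenized for term in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical mini-corpus of patent-abstract snippets
corpus = [
    "a method for error protection in wireless data transmission",
    "radio resource management for cellular network slicing",
    "an adjustable table with folding legs",
]
query = "network slicing and radio resource allocation"

vecs = tfidf_vectors(corpus + [query])
scores = [cosine(vecs[-1], v) for v in vecs[:-1]]
best = max(range(len(corpus)), key=lambda i: scores[i])
print(corpus[best])  # → "radio resource management for cellular network slicing"
```

Even this toy ranker retrieves the ICT snippet without any manually chosen keywords, but it also hints at the limitation noted above: it measures textual overlap, not understanding, and cannot say why the match is relevant.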

Accuracy of the results. Reliability of AI/ML algorithms

Is it possible to get 100% accurate results with AI/ML algorithms in patent search and analytics? The answer is ‘no’. However, technology can bring users to a much more efficient starting point in analyzing patent data. Moreover, the accuracy of the results cannot be judged in a vacuum. The final results depend on many factors: the level of technological development (machine accuracy), the quality of the input data (data accuracy), the correct formulation of the inquiry (purpose accuracy), the industry (industry accuracy factor), and the expertise applied in translating the output information (data interpretation accuracy). Users should make their decisions guided by the understanding that the probability of a 100% reliable output at every step is close to zero. Optimal choices are those where the user can define the minimum level of accuracy that is sufficient to validate their pick.
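The compounding effect described above can be made concrete with a small back-of-the-envelope calculation. The per-stage figures below are invented purely for illustration; the point is only that even optimistic individual accuracies multiply into a noticeably lower end-to-end reliability, assuming the stages are independent.

```python
# Hypothetical per-stage accuracies for the five factors named above;
# the 0.95 figures are invented for illustration only.
stages = {
    "machine accuracy": 0.95,
    "data accuracy": 0.95,
    "purpose accuracy": 0.95,
    "industry accuracy factor": 0.95,
    "data interpretation accuracy": 0.95,
}

end_to_end = 1.0
for accuracy in stages.values():
    end_to_end *= accuracy  # independent stages multiply

print(round(end_to_end, 3))  # → 0.774
```

Even at 95% accuracy per stage, the chain yields roughly 77% end-to-end reliability, which is why defining a minimum acceptable level of accuracy is more realistic than expecting perfection.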

What is coming next?

There are plenty of reasons that can explain the so-called AI winters that began around 1974 and 1987; finding their root causes is a topic for another study. Still, one important observation is that progress in any technology is cyclic, with its ups and downs. AI research is no exception, and some of the respondents confirmed that this type of technology develops in cycles, which depend largely on the evolution of one of its two layers: the bottom layer consists of the hardware, the upper layer of the algorithms. Sustained spiral growth in AI research is doubtful because, so far, progress has largely occurred at the bottom layer rather than in the algorithms.

One fundamental task is to increase the contextual awareness of AI/ML systems towards a holistic understanding of human behavior, particularly of cause-and-effect relationships. In patent analytics, the lack of causality has become a visible problem when it comes to making patent data readily accessible to ordinary users. Hence, a professional who is familiar with such concepts as novelty and inventive step can interpret the results produced by an AI/ML-enabled tool with a reasonably high level of certainty. The causality, in this case, is embedded in the collaboration between the machine and human expertise.

Yet there is plenty of room for improvement in solving this central problem. The respondents emphasized the increasing complexity of the tasks that AI/ML-based systems will be able to handle in the future. Several companies pointed out that their primary purpose is to upgrade the services they currently provide, without differentiating their product range at this stage. Furthermore, the improvement of these tools can be achieved faster only in close collaboration with the industry. Unless companies start adopting AI software in their IP practice, the service providers will struggle to move beyond the ‘trial and error’ phase.

Interview data: 

This study is based on my master's thesis research conducted at Hanken School of Economics. The data sample includes seven companies interviewed between June and September 2019. All of the selected companies are SMEs engaged in the development of patent search and/or analytics software tools involving AI/ML algorithms. The fundamental ethical principle of voluntary participation in research was respected, and anonymity was guaranteed: neither the companies' names nor the respondents' names are published in the study.

