For some time now, artificial intelligence that generates an image from a text input has been more or less freely available. Well-known examples are OpenAI's DALL-E and Google's Imagen. Not too long ago, Stability.ai's DreamStudio.ai was released, which, unlike the other AIs, is completely open source.

In fields where analytics, prediction, machine learning, and decision making are paramount, AI works best when categories can be strictly defined and there is an established ground truth: a definitive answer from which it can be modeled. Essentially, AI needs to be taught by examining a problem back to front. From there, it figures out which attributes are deterministic or descriptive and applies these learnings to new data sets. So, in a sense, AI is only as good as the education it receives.

The application of AI in patent offices

This template is seen in the application of AI to patent office examinations, patent classifications, reporting, and other critical workflows. Within examinations, it is used to greatly accelerate and increase the accuracy of a necessary step in the patent approval process – prior art searches.

Prior art is any published evidence that an invention is already known, which can take numerous forms, from just a description of an idea or formula to a centuries-old piece of technology or an existing product. When a new patent application is filed, patent office searchers and examiners spend much of their time performing searches of documents and other assets around the work and evaluating the results to determine if the target application encroaches on existing prior art.

The results of these searches determine whether an invention meets the patent protection criteria for novelty and non-obviousness. The former is the notion that an invention must be new and therefore not known in the public domain prior to the application filing date, while the latter is the notion that an invention must not be a logical extension of a pre-existing invention that any skilled member of that field could feasibly surmise. Across the millions of patents, a single instance of prior art can be used to reject a patent or to send it back to the applicant for revision.

The process of searching for prior art is complicated, iterative, and time-consuming. For each search, examiners must devise a search strategy, select which databases to search, create the search parameters, perform the search, evaluate the results, and then if needed, modify and rerun the search.

According to an analysis of search activity conducted by the European Patent Office, a comprehensive patent application search draws on around 1.3 billion technical records in 179 databases, leading to about 600 million documents appearing in search results monthly. Another study by the Japan Patent Office estimated that its staff spent around 40% of their time conducting and reviewing prior art searches through traditional and rather labor-intensive tools.

The rapid growth in patent applications and the complexity of inventions, coupled with the staggering volume of materials to search, means that patent offices are always considering new ways to accelerate the application process to avoid long pendencies and, in a few cases, backlogs. Indeed, according to WIPO (World Intellectual Property Organization), there were 5.7 million patent applications pending worldwide in 2019. To keep up with this flood of applications, patent offices hire more examiners and adopt technologies to improve productivity.

The integration of AI solutions in Brazil’s patent office

In 2020, one such office experiencing a sizeable patent backlog was INPI Brazil. With around 150,000 applications pending and an average wait time of more than 10 years, their backlog was significantly impacting innovation in Latin America’s largest economy and thereby limiting investments.

A sizeable chunk of their backlog, around 15%, consisted of chemistry patents. Chemistry patents require searches of both text and chemical structures within patent and non-patent publications and include full text and structure queries, which make finding similarities and relevance between the application patent and existing art a far more demanding review process than other patent applications.

To solve this problem, INPI partnered with CAS, who offered an AI solution that could analyze the complexities of chemistry prior art, streamline INPI's workflow processes, and tackle the backlog. In collaboration with INPI, a unique AI approach was created that accelerated the laborious task of discovering prior art. The solution's search algorithms focused on multiple facets of patents to determine similarity between the target patent application and existing patent and non-patent publications, and to refine results. An additional algorithm then created a relevance-ranked data set for examiners to review. The results of this solution were impressive: up to a 50% reduction in examination times, reduced search times for over 75% of applications processed, and a contribution to an overall reduction of 80% in the office's patent backlog. CAS arrived at this tailored solution through constant refinement and consideration of three factors.

Three considerations when implementing AI:

1. Quality data and human-curated data sets

While AI solutions can speed up prior art searches considerably, AI alone is not a silver bullet and cannot replace patent examiners. However, AI can become a powerful tool that patent examiners use to enhance the performance of their workflows. The secret lies in possessing curated and highly structured content that can train an algorithm correctly and then utilizing experts to maximize its application. In this regard, CAS sees AI as just the latest technology that it layers upon its continuously updated data sets to improve search and retrieval of information, supplementing this data and technology with extensive subject matter expertise and services.

Two waves in publishing have made the careful curation of content even more necessary, namely digitization and globalization. Digitization is the process of converting physical materials, such as books, illustrations, objects, and analog recordings and photos, into digital form. Globalization is the translation of these sorts of materials into other languages, as patents are territorial and must be filed in each country where protection is sought. These waves pose significant roadblocks to optimizing AI-powered prior art searches. Digitization often leads to transcription errors, mislabelled units, and overly complex patent language, while globalization leads to patents in dozens of languages. Each of these makes human curation a necessity for quality data that can be easily searched and retrieved.

Thankfully, CAS has a vast catalog of expertly human-curated data. In fact, CAS has been crowdsourcing data for over a century, gathering abstracts from public and private domains since 1907. This vast catalog has been normalized, prepared, and connected in a structured format that improves the training of AI algorithms and increases the performance of prior art searches. By augmenting AI technology with human expertise for INPI, CAS scientists fed clean and structured data to the AI solution, improving its predictive accuracy.

2. Domain expertise

Another consideration is to leverage the know-how of domain experts to refine the AI solution throughout a project. The INPI project required CAS to provide a wide array of expertise from distributed algorithms and machine learning to data science, cheminformatics, patent searching, and high-performance computing.

The CAS IP search team was therefore able to support the examiners' searches by validating algorithm results during development and performing highly complex searches to augment the office's capabilities. Because individual prior art searches often vary in scope, different search professionals are likely to design different strategies for a given search. Having a team of search experts available to analyze algorithm results yielded insights into how those algorithms could be fine-tuned to improve relevancy.

3. Choosing the right algorithms

As has been established, completing a comprehensive prior art search is a painstaking process that requires the consideration of multiple facets of possible similarity. Therefore, choosing only one algorithm focusing on a single type of analysis, such as semantics, will prove insufficient to the task. For the INPI project, CAS chose to integrate four types of algorithms for text-based and substance-based analysis, including deep learning and term frequency-inverse document frequency. Using multiple algorithms allowed the AI to find semantic, syntactic, and substance similarities all in one multifaceted solution.

Traditional knowledge graphs were also added to analyze the connectedness between the vast amounts of data. The INPI Brazil project deployed one for chemistry and one for non-chemistry to determine ontological similarity and connectedness between documents using keywords, scientific topics, roles, and nomenclature.
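The article does not describe how the knowledge graphs scored connectedness, so the following is only a minimal sketch of the general idea, assuming a graph that links documents to typed concept nodes (keyword, topic, nomenclature) and a simple Jaccard overlap as the measure. The class name `ConceptGraph`, the document ids, and the example concepts are all hypothetical.

```python
from collections import defaultdict


class ConceptGraph:
    """Bipartite graph linking documents to typed concept nodes (illustrative only)."""

    def __init__(self):
        self.doc_to_concepts = defaultdict(set)
        self.concept_to_docs = defaultdict(set)

    def link(self, doc_id, concept_type, concept):
        # A node is identified by its type and a normalized label.
        node = (concept_type, concept.lower())
        self.doc_to_concepts[doc_id].add(node)
        self.concept_to_docs[node].add(doc_id)

    def neighbors(self, doc_id):
        """Documents that share at least one concept node with doc_id."""
        related = set()
        for node in self.doc_to_concepts.get(doc_id, ()):
            related |= self.concept_to_docs[node]
        related.discard(doc_id)
        return related

    def connectedness(self, doc_a, doc_b):
        """Jaccard overlap of the concept nodes linked to each document."""
        a = self.doc_to_concepts.get(doc_a, set())
        b = self.doc_to_concepts.get(doc_b, set())
        union = a | b
        return len(a & b) / len(union) if union else 0.0


# Hypothetical example data: one target application and two earlier documents.
graph = ConceptGraph()
graph.link("application", "keyword", "catalyst")
graph.link("application", "topic", "cross-coupling")
graph.link("application", "nomenclature", "palladium acetate")
graph.link("prior-1", "keyword", "Catalyst")
graph.link("prior-1", "topic", "cross-coupling")
graph.link("prior-2", "keyword", "filter")

for doc in ("prior-1", "prior-2"):
    print(doc, round(graph.connectedness("application", doc), 3))
```

Keeping the concept type in the node means a term used as a keyword and the same term used as nomenclature stay distinct, which mirrors the separate facets listed above.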

The first-level algorithms evaluated the semantics of sections such as the title, abstract, and claims across patent and non-patent publications. They also included a syntax-driven algorithm that compared the prevalence of special terms in the target document to their uniqueness across all other documents to return an accurate set of similarity results.
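Weighing how often a term appears in one document against how rare it is across all documents is the idea behind term frequency-inverse document frequency (TF-IDF). The sketch below is a plain-Python illustration of that idea with cosine similarity for ranking. It is not the CAS implementation; the tokenizer, the smoothed IDF formula, the function names, and the sample texts are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF keeps weights positive for terms found in every document.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(target, corpus):
    """Return (corpus_index, score) pairs, most similar first."""
    vectors = tfidf_vectors([target] + corpus)
    target_vec, corpus_vecs = vectors[0], vectors[1:]
    scores = [(i, cosine(target_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical texts standing in for a target application and existing publications.
target = "Palladium catalyst for aryl halides coupling"
corpus = [
    "A palladium catalyst for cross-coupling of aryl halides",
    "A method for brewing coffee with a paper filter",
    "Aryl halides coupling using a nickel catalyst",
]
ranking = rank_by_similarity(target, corpus)
```

Here the two chemistry texts rank above the unrelated one because they share several comparatively rare terms with the target, while a common word such as "for" contributes little.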

Then, at the second level, an algorithm for a patented ensemble learning process combined the results to produce an optimal predictive model, which was then used to generate relevance-ranked results based on search context and each algorithm's strengths and limitations. The ensemble learning algorithm then analyzed the ranked results, arriving at a single prioritized list of the patent and non-patent publications most likely to conflict with the target patent for the examiners to review.
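The patented ensemble process itself is not described in this article. As a rough illustration of what merging several algorithms' results into one prioritized list can look like, here is a minimal sketch that swaps in a much simpler technique: min-max normalization followed by a fixed weighted sum. The algorithm names, weights, scores, and document ids are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale {doc_id: score} to the 0..1 range so algorithms are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_rankings(per_algorithm_scores, weights):
    """Merge per-algorithm scores into one prioritized list of document ids."""
    combined = {}
    for name, scores in per_algorithm_scores.items():
        for doc, s in min_max_normalize(scores).items():
            combined[doc] = combined.get(doc, 0.0) + weights[name] * s
    # Highest combined score first; ties are broken by document id.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


# Hypothetical first-level outputs for three candidate publications.
scores = {
    "semantic": {"US-1": 0.82, "US-2": 0.40, "NPL-3": 0.75},
    "syntactic": {"US-1": 0.30, "US-2": 0.10, "NPL-3": 0.90},
    "substance": {"US-1": 0.95, "US-2": 0.20},
}
weights = {"semantic": 0.4, "syntactic": 0.3, "substance": 0.3}

prioritized = combine_rankings(scores, weights)
print(prioritized)
```

A learned ensemble would fit its combination from examiner-validated results instead of using hand-set weights; the fixed weights here only make the merging step concrete.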

Worldwide applicability of tailored AI

When implemented correctly, as in the INPI project, AI can transform patent office workflows and remove tedious tasks to free up researchers' and examiners' time for value-add work. There is no one-size-fits-all solution for these complex workflows and undertakings. The key is close collaboration between the office and solutions experts to ensure the approach is aligned with the office's strategic objectives.

Global patent offices face fundamental challenges that put their operational sustainability at risk. By combining AI, human-curated data, and workflow transformation, CAS has established an extremely effective approach for improving patent office timeliness, patent quality, and efficiency to help accelerate innovation around the world.

Three Key Considerations When Implementing AI

How these helped to revolutionize patent office workflows
