Thursday, May 18, 2023

ChatGPT: Making Librarians Look Good Since 2022

Well, my mind hasn't changed: AI won't be coming for librarian jobs any time soon. Until someone builds an AI that is focused on data and research, rather than a language model like ChatGPT, replacing us is a laughable concept.

Some of my fellow PTRC representatives have been experimenting with ChatGPT since I last posted about my initial adventures in February. For those who may not recall, I found that the language model AI was unable to perform any sort of patent or trademark search, or provide specific data on patents or trademarks.

That has not changed despite the past few months of updates. I tested again by asking for the patent number assigned to the patented plant featured in my last post (Hibiscus 'Panama Red'). Again, ChatGPT was unable to give me a patent number.

ChatGPT still can't provide patent numbers

I decided to try the approach taken by one of the fellow reps mentioned above, Stella Mittelbach of the Los Angeles Public Library Science, Technology & Patents Department. She requested Cooperative Patent Classification (CPC) numbers that might apply to an invention, and was given an extensive list of potential matches. I was curious about the quality of the CPC numbers ChatGPT could provide.

In recent publications and casual conversations, I've learned that ChatGPT struggles to provide concrete data and references, sometimes producing "hallucinations": made-up information presented as fact. Furthermore, searching for CPC numbers is challenging. Narrowing the candidates down to the category that most closely matches an invention is comparable to the initial stages of a patent search. So it surprised me that ChatGPT could return such specific information at all.

Thanks to teaching courses on finding CPC numbers and using them to search for patents, I have become extremely familiar with a few groups and subgroups. I decided to ask ChatGPT to suggest CPC numbers for the hypothetical invention I typically use: a new contact lens material, an improved silicone hydrogel with higher oxygen permeability, that makes the lenses safe to sleep in. Based on that description alone, I know of two CPC numbers that could apply to this invention.

These numbers are so wrong; don't bother looking too closely.

ChatGPT was off-base. The numbers it supplied were mostly unrelated, although some used similar terminology. I followed up by asking for the definition of the CPC number G02C 7/001, which I knew was for bifocal contact lenses. Again, ChatGPT failed to give accurate information.

It's also wrong

I corrected the AI.

My correction spurred it to give some better information
 

Second correction, with a suggestion

Then I asked where it was obtaining its information. It vaguely cited "general knowledge," which means nothing. I would classify CPC numbers as "specific knowledge," something very few people know in any depth. I'd wager most people are unaware the CPC even exists.

ChatGPT seems unsure of its information sources

Since the AI is supposed to learn from interactions, I wanted to test whether my corrections had somehow been absorbed. I emailed a copy of the transcript to PTRC colleagues and asked someone else to try my questions and share the answers they received.

Dave Bloom, Science and Engineering/PTRC Librarian at the University of Wisconsin-Madison, tried my initial prompts verbatim, a few times each, using multiple ChatGPT accounts and one Google Bard account. He found each response was "noticeably different, not necessarily informed by the other answers, and, most importantly, not reliable."

A sample of Dave Bloom's interactions


 
Of course, I tested this myself on two subsequent days to see whether that made a difference.

Like Dave Bloom, I got multiple answers. When I submitted a correction, I was able to choose a "better" answer, but someone genuinely trying to find a CPC number with ChatGPT would have no way to judge which answer is better. Furthermore, the better answer was based on the specific feedback I provided. A different account might receive the wrong information and have no way to correct it. OpenAI should program it to stick with an instructional response, such as how to find CPC numbers, which is never wrong.

The next day

Before I made a suggestion through the system

After my suggestion through the system

Over a couple of days, and especially when I provided feedback through the feedback form rather than in the chat, the answers improved slightly.
On the second day...

the answers seemed slightly improved...

but were still inaccurate.

Dave Bloom later reminded me that when using a large language model (LLM) like ChatGPT, "it's tempting to evaluate each individual response on the basis of accuracy. [...] It’s not about whether any individual answer is right or wrong or whether ChatGPT or Bard is better at generating “correct” answers, but that, even with the exact same prompt, LLMs do not produce the same results. Replicability is a baseline expectation of credible scientific research, and we should apply it to search tools, too." LLMs don't reliably cite sources, because their function is to produce answers that read as if a human wrote them. As Dave explained, this is why their fake citations and URLs are often so convincing.

So even when ChatGPT can provide an answer, there is a high risk that it's a very bad one. Librarians have dodged the bullet again!

I wasn't worried about you, buddy; my concern is decision makers who like to save money

Friday, May 12, 2023

The End of Paper Plant Patents

Apparently, we have received the last of the paper copies of plant patents.* While a part of me is sad--looking through pictures of new plants was always entertaining--I understand the reasons for it. Paper copy distribution is expensive, uses resources, and the copies take up space in depository libraries/PTRCs.

Loyal readers and fellow plant patent fans are likely asking, "But how will we see the important color images of patented plants?" After all, the images in Patent Public Search, and therefore in all other patent search tools and databases, are grayscale and low quality.

Fortunately, you can still find color images of plant patents using Patent Center.

Unfortunately, you have to know the number of the plant patent you want to see in advance, and the color images only go back to about 2007. I've also found that this works in some internet browsers (Chrome) but not others (Firefox).

Here's a brief tutorial plus images for finding the color images we all crave.

  1. Search for a patented plant in Patent Public Search (please see other entries or resources for help with that) or another tool of your choice.
  2. Once you find the patented plant you want (from about 2007 or later), make a note of its patent number.
    Search results list and search query panels in Patent Public Search, showing Hibiscus plant patents
  3. On the home page of the USPTO, hover your mouse cursor over the Find It Fast menu and then select Patent Center (first listing).
    Find It Fast menu on USPTO.gov with a red arrow pointing to the first Patent Center link
  4. OPTIONAL: Log into Patent Center with your USPTO account. This is not necessary, but if you have one, why not? 
  5. On the home page, select Patents # from the drop-down menu next to the search bar.
    Patent Center search box with drop-down menu to select what type of number to search
  6. Enter the plant patent number, complete with the PP prefix, and click the search button (magnifying glass icon) or press Enter.
    Patent Center search for plant patent number PP20121
  7. The patented plant's page will open to its Application Data. Using the menu on the left side, select Supplemental Content.
    Patent Center information for plant patent PP20121
  8. On that page there will be an option to download JPEGs and PDFs or just preview a PDF.
    A listing of the supplemental documents that includes plant patent images
  9. Voila! You have found color images of plant patents, and all future patented plants should be findable here, too.

So, while it isn't as fun as having stacks of new plants to look through, and the quality of a scanned photo is never as good as an original print, at least we have something.

 

*I say "apparently" because immediately following the official announcement that April 18th would be the last issue date, more paper copies were issued. Clarification should be forthcoming...

Friday, May 5, 2023

Weeding

How familiar are you with weeding? I don't mean pulling unwanted plants from a garden, but rather removing books from a library collection.

Most people are unfamiliar with weeding in librarianship. Many first encounter it through its end product, a pile of books in a bin, and are immediately horrified that libraries and librarians are getting rid of them. It's unfortunate that they come in at that point and never see the beginning of the process. Every librarian understands that weeding is essential, and most will weed throughout their careers. We even share our interesting finds with each other on social media (#AdventuresInWeeding, #WeedingWednesday).

Books to be evaluated (top) and to be removed (lower two levels)

The point of weeding is to remove outdated material so it can be replaced with newer, more accurate versions. Information changes as we learn more and society evolves. 

People rarely question the need for discards and updates in other areas of life. No one wants to rent VHS tapes, laser discs, or even DVDs from Netflix, and a home that relies on a fireplace for heat and an icebox for food storage is undesirable.

Even in a specialized library service area like the PTRC, we have to weed our collections, so I thought it would be useful to share some of the process and explain why it's beneficial. After all, a large number of PTRC materials will be discarded in the near future, so any public outcry can be directed here.

For those who may be unfamiliar, US patent law changed enormously in 2011 with the Leahy-Smith America Invents Act. So much of the previous system was overhauled that just about anything about US patents published before 2011 is now irrelevant, and that includes almost every patent book in our collection from before 2011. So far, I've pulled all of the patent-focused books for evaluation; most will be discarded. Exceptions are patent case law series, books on the history of patents, and those concerned with patents abroad.

Some of the evaluated materials we are keeping

It's also been a shock to learn how many books we have from the 1990s and early 2000s on developing and protecting software and domains. Just as the patent legal landscape has transformed, so have software technology and the internet. Even if copyright laws had remained the same (and they most certainly have not), those books would still be completely out of touch.

A third category of materials has become obsolete within the past year: patent searching manuals. Even those published after 2011 are now mostly inaccurate, because the USPTO released a new search tool, Patent Public Search, in early 2022 and discontinued its predecessors later that year. Fortunately, this is a much smaller set.

New gaps in our shelves

Even after taking the three factors above into consideration, most of those books still need to be evaluated individually. Do they have any relevant information that makes them worth keeping? Are there newer editions in our section or the larger Fondren collection? Does a newer edition exist that could easily replace them?

More shelf gaps that will likely grow larger

Then we have to consider whether materials slated for discard that have no newer edition need some kind of replacement. Some don't, like guides to filing patent or trademark applications online; the only way to apply is online. Personal experience-based guides, academic monographs, and industry/professional association booklets offer valuable information but require review.

Come visit our refreshed PTRC collection in the next few weeks. Until then, chances are you will find me in the stacks a couple of times a day, carting books back and forth to my office.