Sunday, January 19, 2025

Summarizing Books as Podcasts – O’Reilly


Like almost everyone, we were impressed by NotebookLM’s ability to generate podcasts: two virtual people holding a discussion. You can give it some links, and it will generate a podcast based on them. The podcasts were interesting and engaging. But they also had some limitations.

The problem with NotebookLM is that, while you can give it a prompt, it largely does what it’s going to do. It generates a podcast with two voices (one male, one female) and gives you little control over the result. There’s an optional prompt to customize the conversation, but that single prompt doesn’t let you do much. Specifically, you can’t tell it which topics to discuss or in what order to discuss them. You can try, but it won’t listen. It also isn’t conversational, which is something of a surprise now that we’ve all gotten used to chatting with AIs. You can’t tell it to iterate by saying “That was good, but please generate a new version changing these details” like you can with ChatGPT or Gemini.



Can we do better? Can we integrate our knowledge of books and technology with AI’s ability to summarize? We’ve argued (and will continue to argue) that simply learning how to use AI isn’t enough; you need to learn how to do something with AI that is better than what the AI could do on its own. You need to integrate artificial intelligence with human intelligence. To see what that would look like in practice, we built our own toolchain that gives us much more control over the results. It’s a multistage pipeline:

  • We use AI to generate a summary for each chapter of a book, making sure that all the important topics are covered.
  • We use AI to assemble the chapter summaries into a single summary. This step essentially gives us an extended outline.
  • We use AI to generate a two-person dialogue that becomes the podcast script.
  • We edit the script by hand, again making sure that the summaries cover the right topics in the right order. This is also an opportunity to correct errors and hallucinations.
  • We use Google’s text-to-speech multispeaker API (still in preview) to generate a summary podcast with two participants.
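
To make the shape of the first three stages concrete, here is a minimal sketch in Python. It is not our production code. The `llm` argument is a hypothetical stand-in for whatever model call you use (any function that takes a prompt string and returns text), and the prompts and function names are illustrative assumptions. The hand-editing and speech synthesis stages are deliberately left out.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that takes a prompt and returns model text.
LLM = Callable[[str], str]


def summarize_chapters(chapters: Dict[str, str], llm: LLM) -> List[str]:
    """Stage 1: one summary per chapter, in book order."""
    return [
        llm(f"Summarize the chapter '{title}', covering every important topic:\n\n{text}")
        for title, text in chapters.items()
    ]


def assemble_outline(chapter_summaries: List[str], llm: LLM) -> str:
    """Stage 2: merge the chapter summaries into one extended outline."""
    joined = "\n\n".join(chapter_summaries)
    return llm(f"Combine these chapter summaries into one summary, keeping their order:\n\n{joined}")


def draft_script(outline: str, llm: LLM) -> str:
    """Stage 3: turn the outline into a two-person dialogue."""
    return llm(f"Write a two-person podcast dialogue that follows this outline in order:\n\n{outline}")


def build_draft(chapters: Dict[str, str], llm: LLM) -> str:
    """Run stages 1-3. The result still needs a human edit before any audio is made."""
    summaries = summarize_chapters(chapters, llm)
    outline = assemble_outline(summaries, llm)
    return draft_script(outline, llm)
```

Passing the model in as a plain function keeps each stage easy to test with a fake, and makes it obvious where a human can step in between stages.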

Why are we focusing on summaries? Summaries interest us for several reasons. First, let’s face it: Having two nonexistent people discuss something you wrote is fascinating, especially since they sound genuinely interested and excited. Hearing the voices of nonexistent cyberpeople discuss your work makes you feel like you’re living in a sci-fi fantasy. More practically: Generative AI is good at summarization. There are few errors and almost no outright hallucinations. Finally, our users want summarization. On O’Reilly Answers, our customers frequently ask for summaries: summarize this book, summarize this chapter. They want to find the information they need. They want to find out whether they really need to read the book and, if so, which parts. A summary helps them do that while saving time. It lets them discover quickly whether the book will be helpful, and does so better than the back cover copy or a blurb on Amazon.

With that in mind, we had to think through what the most useful summary would be for our members. Should there be a single speaker or two? When a single synthesized voice summarized the book, my eyes (ears?) glazed over quickly. It was much easier to listen to a podcast-style summary where the virtual participants were excited and enthusiastic, like the ones on NotebookLM, than to a lecture. The give and take of a discussion, even if simulated, gave the podcasts energy that a single speaker didn’t have.

How long should the summary be? That’s an important question. At some point, the listener loses interest. We could feed a book’s entire text into a speech synthesis model and get an audio version. We may yet do that; it’s a product some people want. But on the whole, we expect summaries to be minutes long rather than hours. I might listen for 10 minutes, maybe 30 if it’s a topic or a speaker that I find fascinating. But I’m notably impatient when I listen to podcasts, and I don’t have a commute or other downtime for listening. Your preferences and your situation may be quite different.
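
If you are writing the script yourself, you can get a rough sense of running time before generating any audio. The sketch below estimates duration from word count. The default of 150 words per minute is an assumption about conversational speaking pace, not a measured property of any particular voice, and the function name is ours.

```python
def estimated_minutes(script: str, words_per_minute: int = 150) -> float:
    """Rough spoken duration of a script, in minutes.

    Counts every whitespace-separated token, including any speaker labels,
    so treat the result as a ballpark figure only.
    """
    word_count = len(script.split())
    return word_count / words_per_minute


# A 900-word script at the assumed pace comes out to about six minutes.
print(estimated_minutes("word " * 900))  # 6.0
```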

What exactly do listeners expect from these podcasts? Do users expect to learn, or do they only want to find out whether the book has what they’re looking for? That depends on the topic. I can’t see someone learning Go from a summary. Maybe more to the point, I don’t see someone who is fluent in Go learning how to program with AI. Summaries are useful for presenting the key ideas in the book: For example, the summaries of Cloud Native Go gave a good overview of how Go could be used to address the issues faced by people writing software that runs in the cloud. But really learning this material requires looking at examples, writing code, and practicing, all of which are out of bounds in a medium that’s limited to audio. I’ve heard AIs read out source code listings in Python; it’s awful and useless. Learning is more likely with a book like Facilitating Software Architecture, which is more about concepts and ideas than code. Someone could come away from the discussion with some useful ideas and possibly put them into practice. But again, the podcast summary is only an overview. To get all the value and detail, you need the book. In a recent article, Ethan Mollick writes, “Asking for a summary is not the same as reading for yourself. Asking AI to solve a problem for you is not an effective way to learn, even if it feels like it should be. To learn something new, you are going to have to do the reading and thinking yourself.”

Another difference between the NotebookLM podcasts and ours may be more important. The podcasts we generated from our toolchain are all about six minutes long. The podcasts generated by NotebookLM are in the 10- to 25-minute range. The longer length could allow the NotebookLM podcasts to be more detailed, but in reality that’s not what happens. Rather than discussing the book itself, NotebookLM tends to use the book as a jumping-off point for a broader discussion. The O’Reilly-generated podcasts are more directed. They follow the book’s structure because we provided a plan, an outline, for the AI to follow. The virtual podcasters still express enthusiasm and still bring in ideas from other sources, but they’re headed in a direction. The longer NotebookLM podcasts, in contrast, can seem aimless, looping back around to pick up ideas they’ve already covered. To me, at least, that feels like an important point. Granted, using the book as the jumping-off point for a broader discussion is also useful, and there’s a balance that needs to be maintained. You don’t want it to feel like you’re listening to the table of contents. But you also don’t want it to feel unfocused. And if you want a discussion of a book, you should get a discussion of the book.

None of these AI-generated podcasts are without limitations. An AI-generated summary isn’t good at detecting and reflecting on nuances in the original writing. With NotebookLM, that obviously wasn’t under our control. With our own toolchain, we could certainly edit the script to reflect whatever we wanted, but the voices themselves weren’t under our control and wouldn’t necessarily follow the text’s lead. (It’s arguable that reflecting the nuances of a 250-page book in a six-minute podcast is a losing proposition.) Bias, a kind of implied nuance, is a bigger issue. Our first experiments with NotebookLM tended to have the female voice asking the questions and the male voice providing the answers, though that seemed to improve over time. Our toolchain gave us control, because we provided the script. We won’t claim that we were unbiased (nobody should make claims like that), but at least we controlled how our virtual people presented themselves.
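
Because the script is plain text, one crude way to spot the question-and-answer imbalance described above is to count how many turns each speaker ends with a question. The sketch below assumes a hypothetical script format of one turn per line, written as `Speaker: text`. It is a rough signal for a human editor, not a measure of bias.

```python
from collections import Counter


def question_counts(script: str) -> Counter:
    """Count, per speaker, the turns that end with a question mark.

    Assumes one turn per line in the form "Speaker: text" (a hypothetical
    format). Lines without a colon are ignored, and questions asked in the
    middle of a turn are not counted.
    """
    counts: Counter = Counter()
    for line in script.splitlines():
        speaker, separator, text = line.partition(":")
        if separator and text.strip().endswith("?"):
            counts[speaker.strip()] += 1
    return counts


sample = "A: What is Go?\nB: A language.\nA: Why use it?\nB: Is it fast?"
print(question_counts(sample))  # Counter({'A': 2, 'B': 1})
```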

Our experiments are finished; it’s time to show you what we created. We’ve taken five books, generated short podcasts summarizing each with both NotebookLM and our toolchain, and posted both sets on oreilly.com and in our learning platform. We’ll be adding more books in 2025. Listen to them and see what works for you. And please let us know what you think!


