In a significant move within the publishing sector, Microsoft has struck a three-year deal with HarperCollins to use its nonfiction catalogue for training an AI model, while ensuring authors receive a share of the earnings.
Microsoft has entered into a three-year agreement with HarperCollins to train an undisclosed AI model on the publisher’s catalogue of nonfiction books. Bloomberg reports that under the terms of the deal, Microsoft will pay $5,000 per eligible nonfiction title, split equally between the author and HarperCollins, so that each receives $2,500 per title. The payment is separate from any existing publishing arrangements and does not count against authors’ previous advances.
The agreement, which came to light through a report by 404 Media, stipulates that only select nonfiction works published before the deal are eligible for inclusion. Authors who want their works used in the AI training programme must actively opt in; those who decline will have their titles excluded from the dataset and will forgo the associated payout. Not all HarperCollins authors will receive the offer, as Microsoft is selecting specific books for the model’s training.
To alleviate concerns about potential misuse of content, the terms include specific limitations on how the authors’ works can be used. The agreement limits the model’s output to “no more than 200 consecutive words and/or five percent of a book’s text” from the selected works, alongside a commitment from Microsoft that it will not scrape text from piracy sites.
This move highlights a growing trend in which tech companies seek out large datasets to train their AI models, particularly large language models (LLMs). Because suitable training material in the public domain is limited, Microsoft’s arrangement with HarperCollins broadens the data available for developing its AI capabilities.
Previous agreements between tech firms and publishers often remained under wraps; however, the details of the HarperCollins deal not only illuminate Microsoft’s intentions but also establish a financial baseline for what technology companies may be willing to invest in similar initiatives. Although the specific purpose of the AI model remains undisclosed, Bloomberg noted that it will not be designed for book generation.
As businesses seek to harness emerging technologies, the Microsoft-HarperCollins collaboration represents a pivotal moment in the realignment of content creation and artificial intelligence within the publishing industry.
Source: Noah Wire Services