Microsoft is under fire after allegations suggest its ‘Connected Experiences’ feature may enable automatic data scraping from user documents for AI training, prompting concerns about user privacy.

Recent allegations that Microsoft uses customer data for artificial intelligence (AI) training have sparked considerable debate, amid claims that the technology giant's default settings could lead to unintended data usage. The controversy centres on Microsoft's 'Connected Experiences' feature, which some users allege is designed to gather user-generated content from applications such as Word and Excel for the purpose of training large language models (LLMs).

The allegations emerged from an X post by user @nixCraft, who claimed that Microsoft had enabled a feature by default that permits the automatic scraping of content from user documents to enhance its AI capabilities. According to this claim, users must actively uncheck a box within the settings to opt out of this data collection, raising concerns about the automatic harvesting of proprietary content.
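For administrators who prefer to enforce the opt-out centrally rather than rely on each user unchecking the box, Microsoft documents policy settings for managing privacy controls in Microsoft 365 Apps. The sketch below is an illustrative registry fragment only; the policy path and value name are assumptions drawn from that documentation and should be verified against your Office version before use.

```reg
Windows Registry Editor Version 5.00

; Illustrative sketch, not a verified fix: policy value said to disable
; optional connected experiences in Microsoft 365 Apps.
; Assumed semantics: 1 = enabled, 2 = disabled. Confirm the exact path
; and value name against Microsoft's privacy-controls documentation.
[HKEY_CURRENT_USER\Software\Policies\Microsoft\office\16.0\common\privacy]
"controllerconnectedservicesenabled"=dword:00000002
```

The same control is also exposed through Group Policy and, for individual users, via the privacy settings dialogue within the Office applications themselves.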

In response to these claims, Microsoft has firmly rebutted the allegations, asserting that it does not incorporate customer data from Microsoft 365 applications into its AI training processes. A spokesperson for Microsoft stated, “In the M365 apps, we do not use customer data to train LLMs. This setting only enables features requiring internet access like co-authoring a document.” The company’s public stance reaffirms its commitment to safeguarding user data privacy.

Microsoft has previously addressed these concerns in a blog post released in August 2024, emphasising that user data remains confidential and cannot be disclosed without prior consent. The company specifically noted that generative AI models do not store training data or return it to provide responses. Instead, these models are intended to generate entirely new responses based on their training.

Microsoft has also committed to notifying users transparently of any changes to how it handles data in relation to training its generative AI models, particularly regarding its Copilot feature.

A key aspect of Microsoft's position is the distinction between customer data and publicly available data, the latter of which the company has characterised as 'freeware' for AI training. This perspective was articulated by Microsoft AI CEO Mustafa Suleyman, who described the company's approach of utilising data obtained from open sources while ensuring that customer-generated content remains separate and protected.

As developments in AI technology continue to evolve, the questions surrounding data privacy and the ethical use of information in training AI models are likely to remain at the forefront of public discourse. Microsoft’s approach and responses to these concerns will play a significant role in shaping both customer trust and the broader landscape of AI implementation in business practices.

Source: Noah Wire Services
