The recent introduction of Generative AI technologies, and in particular Large Language Models (LLMs) such as ChatGPT (the most famous example), represents a major potential disruption to the technical content industry and to the ways in which such content is produced and consumed today. For the first time, LLMs are no longer perceived as expensive, high-end luxury technology that requires fleets of data scientists to tune and train; instead, they are increasingly seen as affordable, low-maintenance, commoditized infrastructure that consumers use routinely.
The ability to quickly produce and analyze technical content en masse, at a high degree of accuracy and consistency, and from a wide range of inputs, holds huge promise. Content can be generated faster and with less manual labor, leading to significant cost savings for organizations and reduced customer effort, and thus to happier and more productive content consumers. We believe that fast and smart implementation of LLMs throughout the technical content cycle will be vital for surviving the paradigm shift in content creation and delivery.
In mainstream research, content production and consumption are consistently mentioned among the top five areas that will be disrupted by these new technologies. We are already witnessing this shift in other content-intensive areas, such as content marketing and the media industry. Surveys we have issued show that the TechComm industry largely understands the ramifications and has already prepared itself to embrace the coming change. When asked whether these new technologies would result in a major disruption to the industry, 59% of respondents expected a major disruption and only 12% considered this a temporary buzz. Furthermore, when asked how far along they are in evaluating and adopting these technologies, 37% of respondents were already in some form of early pilot or rollout process, while the others said they were actively studying the art of the possible ahead of making decisions.
This high-level review is meant to outline the major predictions we have about how our industry will be affected by this upcoming and trending technology. Enjoy the read.
Enterprise content typically follows a similar workflow across industries. This workflow consists of the following four major stages:
In this white paper, we will present our top predictions of how Generative AI has the potential to disrupt all of the stages in this content workflow and reshape the content ecosystem as a whole.
Historically, the work of writing documentation was split between two primary roles: the subject matter expert (SME) and the technical writer. The SME was responsible for being the domain expert, educating the technical writer, and approving the final deliverable. The technical writer was responsible for taking the SME's knowledge and formalizing it into a consistent, clear, engaging piece of content that fits well with the rest of the company's information architecture and content standards. Much of the reason why SMEs, despite their seniority, weren't given full authority to generate formal customer-facing content was that companies lacked an objective method for assessing their writing quality and weren't sure the SME had the global knowledge of how a specific deliverable would fit in with the rest of the existing content.
The introduction of Generative AI changes this balance. Even today, you can give a ChatGPT-like bot a high-level overview of the knowledge itself and ask it to generate an extended, cohesive knowledge article that can be used as a scaffold, providing the writer with about 80% of the task in a matter of minutes. The more advanced these bots become, the more they will learn your company's writing style guides, tone of voice, and preferences; in addition, they will learn from prior knowledge to teach SMEs "what great writing looks like" for your organization, allowing SMEs to independently generate better deliverables. SMEs will no longer spend time training someone else (i.e., the writer) on the domain and will no longer need to review their work, leaving them more time to generate better content. Finally, given that there are always more SMEs than writers, organizations can quickly scale up content production and generate larger quantities of content in a shorter period of time.
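As a minimal sketch of what this scaffolding step could look like, the snippet below passes an SME's rough bullet points to a chat model and asks for a draft article. It assumes the OpenAI Python SDK and an API key in the environment; the model name, notes, and prompts are illustrative only, not a prescribed implementation.

```python
# Sketch: turn an SME's rough notes into a draft article scaffold.
# Assumes the OpenAI Python SDK (`pip install openai`) and OPENAI_API_KEY set;
# model name and prompts are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

SME_NOTES = """
- Feature X lets admins rotate API keys without downtime
- Rotation is triggered from the admin console or via the REST API
- Old keys remain valid for a configurable grace period (default 24h)
"""

STYLE_HINTS = "Use second person, present tense, short sentences, and task-oriented headings."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any capable chat model works here
    messages=[
        {"role": "system",
         "content": "You are a technical writer. Expand rough SME notes into a "
                    "cohesive knowledge article scaffold with an overview, "
                    "prerequisites, numbered steps, and a troubleshooting section. "
                    + STYLE_HINTS},
        {"role": "user", "content": SME_NOTES},
    ],
)

draft = response.choices[0].message.content
print(draft)  # the SME or writer reviews and completes the remaining ~20%
```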
XML-based structured content standards (e.g. DITA) offer a lot of power, but they require an extensive learning curve, appropriate tooling, and skilled personnel to build and maintain content operations. Even today, some organizations find them too complex and are reverting to, or balancing them with, simpler and cheaper (although less powerful) formats (e.g. Markdown) and all-in-one KB tools.
Over time and with the assistance of Generative AI, SMEs will take a primary role in the creation of content. These SMEs are not full-time writers and lack the skills, capacity, and desire to adopt and use complex structured content standards and tools; they just don't see the value.
The need for structured content and self-contained chunks is not going to go away. But as automated text analysis technologies improve, computers will be able to extract the semantics and structure even from large blobs of text that are not written in an ideal way, thus adding the structure that the SME's tool cannot create.
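The sketch below illustrates one possible way to recover structure after the fact: an unstructured SME write-up is sent to an LLM with a request for a machine-readable outline. The JSON schema is an assumption for illustration; a real pipeline might target DITA or another content model.

```python
# Illustrative sketch: extract implicit structure from an unstructured blob of text.
# Assumes the OpenAI Python SDK; the schema, model name, and sample text are made up.
import json
from openai import OpenAI

client = OpenAI()

RAW_BLOB = """Resetting a node is easy, you just drain it first, then reboot it,
and make sure you re-enable scheduling afterwards or workloads won't come back."""

prompt = (
    "Extract the implicit structure from the text below and return JSON with the "
    'keys "title", "prerequisites" (list) and "steps" (ordered list).\n\n' + RAW_BLOB
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},  # ask the API for well-formed JSON
)

structured = json.loads(response.choices[0].message.content)
print(json.dumps(structured, indent=2))  # title, prerequisites, steps recovered from prose
```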
However, structured content, if used correctly, can dramatically help with the definition of content semantics, making the content highly optimized for programmatic consumption via search, filtering, and personalization. Standards committees will need to modify their standards accordingly to make sure they can provide optimal support for what the AI models require.
In the last 20 years, users have accepted Google as the "holy grail" experience for finding answers. To be more specific, users developed habits of writing optimized, concise, and minimalistic search queries in order to get a collection of what would likely be the most useful links to potential results. In other words, users would not typically expect to get a direct authoritative answer, but instead references to places where they can research and find the answers on their own.
With LLMs gaining traction and with Microsoft and Google embedding conversational experiences into their search engines (with Bing and Bard, respectively), conversational language is becoming part of the new standard for getting answers. This shift changes the way customers expect to get answers: direct and conversational, rather than clicking links, reading text, and summarizing on their own.
With the emergence and media coverage of ChatGPT and the fact that the overall quality of the answers provided is above expectations, some people may think that LLMs are ready for prime time and can provide a magic solution for extracting complex answers from their content with super-high accuracy.
LLMs are just starting to reach the threshold of sufficient performance. In cases where the training set is not large enough or is mostly domain-specific and non-standard, it is possible that certain answers would be rejected by customers as being incorrect or just irrelevant. In extreme cases, this may cause customers to rely on false answers to operate complex products, and may lead to customer complaints, escalations, or even legal or punitive action.
Although technology is improving at a blazing speed and customer content consumption habits are changing dynamically, we still believe that with the right mitigations, companies can reach a balance that provides their customers an elevated level of experience and usability while limiting the impact of potential risks.
Today, the generation of technical content is done in silos, by different teams, using different formats and tools. This is not likely to change any time soon, and it may even get worse as the move to SME-generated content intensifies. Having said that, despite the silos, the market as a whole still sees the integration of siloed content as a nice-to-have, and most companies do not have a real solution in place that provides an always up-to-date, consolidated source of truth.
LLMs are only as good as the content they are fed. If content is siloed and inaccessible, AI will fail to deliver accurate and comprehensive answers and will not live up to expectations. Companies will realize that they have two big problems to solve: (1) consolidation, getting all of their content into one place where it can be fed to the LLM, and (2) governance, solving for security and privacy to make sure sensitive information does not leak out and make its way into the wrong hands.
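A minimal sketch of these two problems is shown below: siloed content is normalized into a single corpus (consolidation), and a visibility attribute controls what is allowed to reach the model at all (governance). The field names, visibility values, and sample records are assumptions for illustration.

```python
# Sketch of consolidation + governance before any content reaches an LLM.
# Record fields and the filtering rule are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ContentRecord:
    source: str       # e.g. "docs", "kb", "community", "training"
    visibility: str   # "public" or "internal" - the governance attribute
    text: str

# Consolidation: content from different silos normalized into one corpus.
corpus = [
    ContentRecord("docs", "public", "To rotate a key, open Settings > API Keys."),
    ContentRecord("kb", "public", "Error 4031 means the rotation grace period expired."),
    ContentRecord("wiki", "internal", "Runbook: the master key lives in the ops vault."),
]

def build_llm_context(records, audience="external"):
    """Return only the content this audience is cleared to see, merged into one blob."""
    allowed = [r for r in records if audience == "internal" or r.visibility == "public"]
    return "\n\n".join(f"[{r.source}] {r.text}" for r in allowed)

print(build_llm_context(corpus))  # the internal wiki entry is withheld from external users
```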
Today, different departments in companies write different content related to the same subject or issue. You can have documentation topics, knowledge base articles, peer-to-peer discussions, and training materials, all talking about the same subject but written by various functions who hardly know about one another. The typical customer experience today is that these various materials are all "thrown at" the user as they are surfaced in search results, and the customer is faced with choosing the most applicable result that best matches their question or problem. Typically, they will be choosing individual pieces of content, without seeing the "aggregate" nature of the information that could have been collated from all of the different content pieces written about the subject at hand.
LLMs can analyze all of these content pieces and dynamically build a "content mashup", i.e., a new "synthetic" summary made from fragments of the various content pieces, rewritten as a cohesive and personalized answer. In other words, LLMs build new content that was never explicitly authored by anyone, based on all available information, saving users the need to visit and read all individual content pieces one by one.
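The snippet below sketches such a mashup: fragments written by different teams are passed to an LLM, which rewrites them into one cohesive answer. The SDK usage is real, but the fragments, persona, and model name are illustrative assumptions.

```python
# Sketch of a "content mashup": synthesizing one answer from siloed fragments.
# Assumes the OpenAI Python SDK; fragments and model name are made up.
from openai import OpenAI

client = OpenAI()

fragments = [
    "DOC TOPIC: Key rotation is configured under Settings > API Keys.",
    "KB ARTICLE: Error 4031 during rotation means the grace period has expired.",
    "FORUM THREAD: Re-enabling the old key for 24h avoided downtime for one user.",
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption
    messages=[
        {"role": "system",
         "content": "Combine the fragments into one cohesive answer for a system "
                    "administrator. Do not add facts that are not in the fragments."},
        {"role": "user", "content": "\n".join(fragments)},
    ],
)
print(response.choices[0].message.content)  # a synthetic answer no one explicitly authored
```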
Content review today is a manual and tedious process. The review typically consists of a few components. Among others, the content is checked to ensure that it is accurate from a technical point of view, that it can be followed and is functional, that it is mechanically correct in terms of spelling, grammar, style, tone-of-voice, and writing guidelines, and that it is pedagogically correct in terms of where the information sits relative to other information. This process today involves both the SME and the professional writer.
LLMs will be able to do most of the stylistic editorial work on the content. Even today, they can largely enforce writing style guides and rewrite content to better fit the target style without significant reduction in accuracy.
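As a hedged sketch of this kind of automated stylistic pass, the snippet below asks a model to rewrite a draft sentence so it follows a stated style guide without changing its technical meaning. The style rules, draft text, and model name are illustrative assumptions.

```python
# Sketch of automated style-guide enforcement on a draft sentence.
# Assumes the OpenAI Python SDK; style rules, draft, and model name are illustrative.
from openai import OpenAI

client = OpenAI()

STYLE_GUIDE = (
    "Write in second person and active voice, spell out acronyms on first use, "
    "and keep sentences under 25 words."
)

draft = "The configuration of the DNS server should have been performed by the admin prior to install."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption
    messages=[
        {"role": "system",
         "content": "Rewrite the user's text so it follows this style guide without "
                    "changing its technical meaning: " + STYLE_GUIDE},
        {"role": "user", "content": draft},
    ],
)
print(response.choices[0].message.content)  # e.g. "Configure the Domain Name System (DNS) server before you install."
```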
Since a lot of the content will be written by SMEs who are also the domain experts, reviews are expected to be shorter and to focus more on the accuracy and completeness of the content, with style validation being almost entirely automated.
Technical content analytics is heavily influenced by web analytics today. Many KPIs, such as bounce rate, session duration, and click-through rate, are very popular and frequently tracked.
Conversational experiences change the user's experience and require new KPIs which are better suited to evaluate success. For example, the three KPIs outlined above might not even be relevant in a conversational context.
Today, analytics is mostly consumed manually by professional analysts or business functions who are relatively skilled at operating complex BI tools and dashboards. This means that analytics is underutilized, and insights about the consumption of technical content and its impact do not regularly make their way to leadership at most companies.
Just as LLMs are currently writing economic updates, summaries, and trend briefs for stock analysts by analyzing large amounts of market data, they will be able to analyze content consumption data and generate short briefs for leadership showing general trends and opportunities for improvement.
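A small sketch of this idea is shown below: weekly consumption metrics are summarized into a short leadership brief. The metric names, values, and model name are illustrative assumptions, not real data.

```python
# Sketch: turn raw content-consumption metrics into a short leadership brief.
# Assumes the OpenAI Python SDK; metrics and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

weekly_metrics = {
    "answered_by_ai_pct": 62,        # % of questions resolved by the conversational layer
    "escalated_to_support_pct": 11,  # % of conversations ending in a support ticket
    "top_unanswered_topics": ["SSO setup", "key rotation errors"],
}

prompt = (
    "Write a five-sentence brief for the documentation leadership team summarizing "
    f"these weekly content-consumption metrics and suggesting one improvement: {weekly_metrics}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```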
In this white paper, we discuss the impact of Generative AI on technical writing and structured content, making predictions for the future. We suggest that SMEs will take on a larger role in content creation, which will lead to the democratization of tools and technology stacks, the simplification of authoring tools, and an increased need for content orchestration. Structured content will undergo a transformation to stay relevant, with more automated text analysis technologies and computers extracting semantics and structure from large blobs of text.
This means companies using structured content should explore more deeply how they can follow content architecture best practices and embrace additional publication pipelines with non-structured content at their core. The consumption of content will shift from being search-based to being conversation-based, as conversational language becomes part of the new standard for getting answers. This shift changes the way customers expect to get answers and will require a high degree of relevancy and personalization.
To ensure that companies are ready to handle the impact of these changes, their next steps should include preparing their content to be compatible with GPT. Content readiness will be crucial in providing a seamless GPT experience, particularly for technical content.
This involves consolidating and governing their content to ensure that it is accurate, secure, and relevant. By doing so, companies can ensure that any user can effectively consume their content through a GPT solution.