Rewriting / Rephrasing content has been of specific interest to me for a long time. Even before NLP or GenAI was a thing, I ran a SaaS tool called Article Spinner back in the day (2008 to 2013). Selling 100 rewrites @ $5. Which was used by content farms in creating different variants of “unique content” from a single piece of content.
Now with tools like ChatGPT, it is no more a specialized task that requires specialized tools. But even while building Article Spinner, my focus used to be on how much of intent and details are retained from the original text after iterations of spinning.
We’ll take a source string, and iteratively rephrase the string using an LLM and look how the Dice Coefficient changes over iterations. Heavily inspired by this experiment of re-uploading a video over and over again to Youtube!
The text
The sun dipped below the horizon, painting the sky in hues of orange and pink, while a cool breeze carried the scent of blooming jasmine. Emma sat on the weathered wooden bench in the park, watching children chase fireflies and listening to the distant murmur of the stream. The tranquility of the moment was broken only by the occasional chirping of crickets, a natural symphony that seemed to amplify the serenity of her surroundings. She closed her eyes briefly, allowing herself to be fully immersed in the embrace of nature, feeling the rough texture of the bench against her palms. A sense of calm washed over her, reminding her of childhood evenings spent in a similar park, where worries seemed like a distant concept. Now, as an adult, moments like these were rare treasures, brief escapes from the relentless pace of life. The simplicity of the evening, she thought, held a profound beauty that modern distractions often masked. Emma inhaled deeply, savoring the fleeting peace.
|
We will be using this string as the source, and will iteratively rephrase it using an LLM dozens of time to see how the string changes over iterations.
The Prompt
Rephrase the following text. Return only the rephrased text without any additional text, explanations or formatting:
<The Text>
|
The experiment has no objective. We will just try to iteratively rephrase the text a bunch of times and see how the final text looks, does it retain the original internet, structure, meaning, and any other observations we may stumble upon.
GPT-4o-mini
Running the experiment with GPT-4o-mini - The dice coefficients with each iteration..
Iteration
|
Dice-Coefficient
|
0
|
1
|
1
|
0.8172
|
2
|
0.8467
|
3
|
0.8135
|
4
|
0.8344
|
5
|
0.8034
|
6
|
0.8349
|
7
|
0.8145
|
8
|
0.8062
|
9
|
0.8150
|
10
|
0.8246
|
Iterations
|
Text
|
Dice Coefficient
|
0
|
The sun dipped below the horizon, painting the sky in hues of orange and pink, while a cool breeze carried the scent of blooming jasmine. Emma sat on the weathered wooden bench in the park, watching children chase fireflies and listening to the distant murmur of the stream. The tranquility of the moment was broken only by the occasional chirping of crickets, a natural symphony that seemed to amplify the serenity of her surroundings. She closed her eyes briefly, allowing herself to be fully immersed in the embrace of nature, feeling the rough texture of the bench against her palms. A sense of calm washed over her, reminding her of childhood evenings spent in a similar park, where worries seemed like a distant concept. Now, as an adult, moments like these were rare treasures, brief escapes from the relentless pace of life. The simplicity of the evening, she thought, held a profound beauty that modern distractions often masked. Emma inhaled deeply, savoring the fleeting peace.
|
1
|
1
|
The sun sank beneath the horizon, tinting the sky with shades of orange and pink, while a refreshing breeze carried the fragrance of blooming jasmine. Emma sat on the weathered wooden bench in the park, observing children chasing fireflies and listening to the soft murmur of the stream nearby. The tranquility of the scene was occasionally interrupted by the distant chirping of crickets, a natural melody that seemed to enhance the calmness around her. She closed her eyes for a moment, allowing herself to be fully enveloped by nature, feeling the rough surface of the bench under her hands. A wave of serenity washed over her, evoking memories of childhood evenings spent in similar parks, where concerns felt far away. Now, as an adult, she cherished these rare moments as precious escapes from life's unyielding pace. The evening's simplicity, she mused, possessed a deep beauty often obscured by modern distractions. Emma took a deep breath, relishing the passing tranquility.
|
0.8170
|
10
|
The sun dipped below the horizon, casting orange and pink hues across the sky, while a soft breeze carried the sweet scent of blooming jasmine. Emma perched on a timeworn wooden bench in the park, watching children chase fireflies and listening to the gentle murmur of a nearby stream. The peaceful scene was occasionally broken by the distant chirping of crickets, adding a natural rhythm that seemed to enhance the tranquility around her. She closed her eyes for a moment, immersing herself in the outdoors and feeling the rough texture of the bench beneath her fingers. A wave of calm washed over her, bringing back memories of childhood evenings spent in similar parks, where worries felt far away. Now, as an adult, she cherished these rare moments as precious escapes from the unrelenting rush of life. Emma pondered how the evening's simplicity held a deep beauty often hidden by modern distractions. She breathed in deeply, relishing the fleeting peace.
|
0.8246
|
25
|
The sun sank below the horizon, casting the sky in hues of orange and pink, while a soft breeze wafted the sweet fragrance of blooming jasmine. Emma sat on an old wooden bench in the park, observing children chasing fireflies and listening to the calming sounds of a nearby stream. The tranquility was occasionally interrupted by the gentle chirping of crickets, adding a natural rhythm to the serene atmosphere. She closed her eyes for a moment, reconnecting with nature as her fingers brushed against the rough texture of the bench. A wave of calm enveloped her, bringing back memories of childhood evenings spent in similar parks, where worries seemed distant. Now, as an adult, she valued these rare moments as precious escapes from the ceaseless rush of life. Emma reflected on how the simplicity of the evening unveiled a profound beauty often obscured by the distractions of modern life. Breathing deeply, she cherished the passing tranquility.
|
0.8059
|
50
|
The sun sank beyond the horizon, coloring the sky in shades of orange and pink, while a soft breeze carried the sweet scent of blooming jasmine. Emma rested on a weathered wooden bench in the park, watching children chase fireflies and listening to the gentle flow of a nearby stream. Occasionally, the faint chirping of crickets interrupted the stillness, providing a natural rhythm to the serene scene. She briefly closed her eyes, reestablishing her connection with her surroundings as her fingers brushed against the rough texture of the bench. A wave of peace enveloped her, bringing to mind memories of childhood evenings spent in similar parks, where worries felt far away. Now, as an adult, she valued these moments as precious breaks from the relentless speed of everyday life. Emma contemplated how the simplicity of the evening uncovered a deep beauty often hidden by modern distractions. Taking a deep breath, she soaked in the fleeting tranquility.
|
0.7998
|
75
|
As the sun dipped below the horizon, it filled the sky with shades of orange and pink, accompanied by a soft breeze carrying the sweet fragrance of blooming jasmine. Emma settled onto an aged wooden bench in the park, observing children chasing fireflies and listening to the soothing sounds of a nearby stream. The serenity was occasionally interrupted by the gentle chirping of crickets, contributing a natural harmony to the peaceful setting. With her eyes softly shut, she immersed herself in the visual and auditory beauty around her, her fingers gliding over the bench's rough texture. A comforting tranquility enveloped her, bringing to mind memories of childhood evenings spent in similar parks, where anxieties seemed distant. Now that she was an adult, she treasured these moments as valuable reprieves from the unyielding pace of daily life. Emma pondered how the simplicity of the evening unveiled a profound beauty often obscured by the diversions of contemporary living. Taking a deep breath, she embraced the ephemeral tranquility.
|
0.7817
|
100
|
As the sun sank below the horizon, it painted the sky in hues of orange and pink, while a gentle breeze carried the sweet scent of blooming jasmine. Emma sat on an old wooden bench in the park, observing children joyfully chasing fireflies and listening to the soothing sounds of a nearby stream. The serenity was occasionally broken by the soft chirping of crickets, adding a natural cadence to the peaceful ambiance. With her eyes shut, she absorbed the visual and auditory beauty around her, her fingers brushing against the rough texture of the bench. A warm feeling of tranquility surrounded her, evoking memories of childhood evenings spent in similar parks, where concerns felt far away. Now, as an adult, she cherished these moments as valuable retreats from the hectic rhythm of everyday life. Emma reflected on how the evening's simplicity unveiled a deep beauty often hidden by the distractions of modern existence. Inhaling deeply, she embraced the momentary peace.
|
0.7610
|
Evaluation of the final text
Aspect
|
Metric
|
Evaluation Criteria
|
Performance
|
Clarity
|
Grammar/Fluency
|
No grammatical errors, fluency maintained.
|
Excellent. No grammatical errors, fluent prose.
|
|
Sentence Structure
|
Logical and clear sentence construction.
|
Excellent. Sentence structures remain clear.
|
Descriptive Language
|
Retention of Sensory Details
|
Are colors, sounds, and scents still vivid?
|
Good. Sensory details like colors, scents, and sounds are retained.
|
|
Specificity of Terms
|
Are original terms (e.g., "fireflies") intact?
|
Good. Specific terms like "fireflies" and "blooming jasmine" are preserved.
|
Emotion Retention
|
Conveyance of Mood
|
Does the text still evoke calm and nostalgia?
|
Excellent. Calm and nostalgic mood is well-preserved.
|
Narrative Coherence
|
Logical Flow of Ideas
|
Are sentences logically connected?
|
Excellent. The logical flow and connections are intact.
|
|
Focus on Original Topic
|
Is the story still about Emma in the park?
|
Excellent. The focus remains on Emma and her experience.
|
Meaning Preservation
|
Core Message Retained
|
Does simplicity and peace remain the theme?
|
Excellent. The core theme of simplicity and peace is preserved.
|
|
Reduction in Abstractions
|
Are phrases overly abstract or nonsensical?
|
Excellent. No unnecessary abstractions or nonsensical phrases.
|
Objective Changes
|
Word Count
|
Word count difference compared to original.
|
Minimal to no change. Original: 165, Rephrased: 165.
|
|
Unique Terms Lost
|
Number of unique words replaced or lost.
|
Moderate. Some unique terms slightly rephrased.
|
|
Repetition of Words/Phrases
|
Frequency of repeated terms.
|
Low. Minimal repetition introduced.
|
Creativity
|
Novel Expressions Introduced
|
Does the LLM introduce fitting rephrases?
|
Good. Introduced subtle rephrases like “valuable retreats” for variety.
|
|
Overgeneralization
|
Has text become vague or overly simplified?
|
Good. Text avoids overgeneralization but slightly smoother than original.
|
But Why? Why doesn't the similarity drop after each iteration, after the first?
What intrigues me is that the Dice Coefficient drops by 19% in the first iteration. Since each iteration is a fresh iteration, I was expecting it to drop at the same 19% on each iteration to eventually drop to close to zero by around 20th iteration.
But the actual drop after each iteration (after the first iteration) is close to zero! In fact, it actually the similarity to original text actually increases often, but over dozens of iterations, it drops on an average at the rate of only 0.05% per iteration.
To be doubly sure, I double checked my script to ensure
- I was comparing the text in each iteration with the original text and not the text from the previous iteration.
- Each iteration is a fresh initialization of the LLM call, with not an additional message on previous conversation.
So why does the Dice coefficient (ie, similarity with original text) drop significantly (15%-19%) on the first iteration, and very marginally (0.05%) later on with each iteration?
The observed behavior of the Dice Coefficient dropping significantly in the first iteration and then stabilizing over subsequent iterations is a fascinating phenomenon in text generation and iterative paraphrasing. This effect likely stems from how language models like GPT process and rephrase text.
In the first iteration, the model rephrases the original text with the most substantial stylistic and lexical variations, as it seeks to maximize the "paraphrasing effect" while retaining meaning. This initial rephrasing tends to introduce a shift in sentence structure, synonym usage, and narrative flow, which cumulatively accounts for the steep drop in the Dice Coefficient. However, after the initial rephrasing, the subsequent iterations have less "room to maneuver," as the rephrased text already represents an altered form of the original.
Think of this process as sculpting a statue: the first chisel strokes remove large chunks of marble, creating the rough shape. Subsequent chiseling refines details but doesn’t fundamentally alter the form. Similarly, in iterative paraphrasing, the first iteration involves broad linguistic changes, whereas later iterations introduce finer adjustments, resulting in smaller differences. This explains why the Dice Coefficient drop becomes progressively less significant with each iteration.
Another key factor is that language models aim to preserve semantic intent and clarity. After the first significant rephrasing, the model begins working with an already altered version of the text. Any additional paraphrasing tends to mirror the style and choices introduced in earlier iterations, leading to a stabilization of changes. This behavior reflects the model’s underlying optimization to generate coherent, meaningful text rather than diverging arbitrarily.
Finally, the diminishing Dice Coefficient drop can also be attributed to the nature of lexical similarity metrics. The initial changes impact surface-level features (e.g., word choice and sentence structure), which heavily influence the Dice Coefficient. As iterations progress, the changes increasingly affect subtler nuances like tone or phrasing, which don’t dramatically impact the metric. This creates a plateau effect, where even iterative changes do not result in significant additional divergence from the original text.
Based on my limited understanding of LLMs and some exploratory experiments like this, I believe that's what causes the Dice Coefficient to plateau. Of course, I could be mistaken. What are your thoughts?