[Teachable NLP] New Pride and Prejudice with GPT-2

Teachable NLP : The Finetuned GPT-2 Model with Pride and Prejudice
TabTab : TabTab!


[Spoiler]

‘Pride & Prejudice’ is one of the most-loved romantic novels. In the early 19th century, Elizabeth met Mr. Darcy at a ball and immediately felt he was arrogant. This prejudice grew and caused them to fall out. However, after Mr. Darcy and Elizabeth managed to overcome his pride and her prejudice, the two were finally united.

If events had turned out differently, would the love between the two still have come true?

If…

  • They fall in love at first sight?
  • Elizabeth doesn’t believe Mr. Wickham, who slanders Mr. Darcy?
  • Mr. Darcy doesn’t send a letter containing the truth to convince her?
  • Mr. Wickham and Lydia (Elizabeth’s younger sister) don’t run away in the night?
  • etc…

Thanks to Teachable NLP, I could fine-tune the model without writing any code. You can use the text file of ‘Pride & Prejudice’ to train and test an NLP model yourself. Feel free to try it here!

A. Acquiring the Data

I got the original ‘Pride & Prejudice’ from Project Gutenberg. It is free to use for machine learning or editing because the text is in the public domain.

B. Preprocessing the Data

During preprocessing, I removed 1) the headings and 2) the line breaks (CRLF). I kept the double quotation marks because quoted dialogue is common in the novel.

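file_name = "Pride and Prejudice.txt"
# Read the raw Project Gutenberg text, one list entry per line
with open(file_name, "rt", encoding="utf-8") as f:
    lines = f.readlines()

sentences = []
in_volume = False  # True while we are inside the body of a volume
for line in lines:
    # The title line marks the beginning of each volume
    if line == "PRIDE & PREJUDICE.\n":
        in_volume = True
        continue
    # Skip everything before the first title line or after an "END OF" line
    elif not in_volume:
        continue
    # Remove CHAPTER headings
    elif line.startswith("CHAPTER"):
        continue
    # Remove empty lines
    elif line == "\n":
        continue
    # An "END OF ..." line closes the current part of the text
    elif line.startswith("END OF"):
        in_volume = False
        continue
    # The row of asterisks marks the end of the data.
    # readlines() keeps the trailing newline, so drop it before comparing.
    elif line.rstrip("\n") == "       *       *       *       *       *":
        break
    # Remove symbols: underscores and double hyphens
    line = line.replace("_", "")
    line = line.replace("--", " ")
    # Remove surrounding whitespace, including the line break
    line = line.strip()
    # With the line breaks gone, add a space so adjacent lines do not run together
    sentences.append(line + " ")

training_data = "".join(sentences)
# Write as UTF-8 so non-ASCII characters are stored the same way on every platform
with open("preprocess_pp.txt", "w", encoding="utf-8") as training_file:
    training_file.write(training_data)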
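
To see what these filtering rules do without running them on the whole novel, here is a minimal sketch that applies the same kind of rules to a few in-memory lines. The `clean_lines` helper and the sample lines are hypothetical and made up for illustration; they are not part of the original script or of Teachable NLP.

```python
def clean_lines(lines):
    """Filter and join lines using the rules described in this section.

    Hypothetical helper for illustration only.
    """
    kept = []
    in_volume = False  # True while we are inside the body of a volume
    for line in lines:
        # The title line switches collection on
        if line == "PRIDE & PREJUDICE.\n":
            in_volume = True
            continue
        # Ignore anything outside a volume
        if not in_volume:
            continue
        # Drop chapter headings and empty lines
        if line.startswith("CHAPTER") or line == "\n":
            continue
        # An "END OF ..." line switches collection off again
        if line.startswith("END OF"):
            in_volume = False
            continue
        # The row of asterisks ends the data
        if line.rstrip("\n") == "       *       *       *       *       *":
            break
        # Remove underscores, turn double hyphens into spaces, trim whitespace
        line = line.replace("_", "").replace("--", " ").strip()
        kept.append(line + " ")
    return "".join(kept)


# Made-up sample laid out like the downloaded text file
sample = [
    "Some header text that should be skipped\n",
    "PRIDE & PREJUDICE.\n",
    "\n",
    "CHAPTER I.\n",
    "\n",
    "\"My dear Mr. Bennet,\" said his lady--\"have you _heard_\n",
    "the news?\"\n",
    "END OF THE SAMPLE\n",
    "Trailing text that should be skipped\n",
]

cleaned = clean_lines(sample)
print(cleaned)
```

The two dialogue lines come out joined into a single line, with the heading, the empty lines, the underscores, and the double hyphen removed, while the double quotation marks are kept.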

C. Teachable NLP

I fine-tuned the pre-trained GPT-2 small model for 5 epochs. After training GPT-2 in Teachable NLP, click Test your model. It opens TabTab linked to the fine-tuned model. In TabTab, you can try the ‘Pride & Prejudice’ model.

I tested the model by writing a scenario in which Elizabeth falls in love with Mr. Darcy right after dancing at the ball.

Will the romance ever succeed?

There are many different ways to rewrite Pride & Prejudice with your creative thoughts! I wonder how you’d change the story. Please share your own story and NLP model in the forum by clicking See what your friends made at the bottom left of TabTab.
