Dream Analysis in Python - help needed

**cedwards105** · 06-26-2023 01:43 AM

Hello! I have been trying to write some code in Python that would analyze my dream journal, recognize dreamsigns, and do two things (so far):

(1) Make a plot of dreamsign frequency per week, and
(2) Make a predictive model that would predict dreamsigns that might appear in the next dream

This way I have a visual output of where I have been, and a prediction as to where I'm headed.

I have been working with ChatGPT, and have some code for making a predictive model. What I am having trouble with now is simply making an XY graph of dreamsign frequency, which is probably a very simple fix for anyone out there who knows coding. Here is my code at the moment for this part:

Code:

import csv
import docx
import pandas as pd
from collections import Counter
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import matplotlib.pyplot as plt

# Function to remove stopwords and punctuation from the text
def preprocess_text(text):
    stop_words = set(stopwords.words('english'))
    tokens = word_tokenize(text.lower())
    filtered_tokens = [token for token in tokens if token.isalpha() and token not in stop_words]
    return filtered_tokens

filename = 'Dream_Journal.docx'  # Input: Word document filename
doc = docx.Document(filename)

# Combine all text in the document
text = ' '.join([paragraph.text for paragraph in doc.paragraphs])

# Preprocess the text by removing stopwords and punctuation
filtered_words = preprocess_text(text)

# Count the occurrences of each word
word_counter = Counter(filtered_words)

# Get the most common words as keywords
num_keywords = 200  # Specify the number of keywords you want
keywords = [word for word, _ in word_counter.most_common(num_keywords)]

data = []
categories = []

while True:
    date = input("Enter a date (YYYY-MM-DD) or press Enter to exit: ")
    if not date:
        break
    journal_entry = input("Enter a journal entry: ")
    row_keywords = []

    for keyword in keywords:
        if keyword in journal_entry:
            row_keywords.append(keyword)
            if keyword not in categories:
                categories.append(keyword)

    data.append([date, row_keywords])

csv_filename = 'data.csv'

with open(csv_filename, 'a', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(data)

print(f'CSV file "{csv_filename}" created successfully.')

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(csv_filename)

# Convert 'date' column to datetime type
df['date'] = pd.to_datetime(df['date'])

# Group by week and calculate the frequency of each keyword
df['week'] = df['date'].dt.to_period('W')
df_grouped = df.groupby(['week']).apply(lambda x: x['categories'].sum()).apply(Counter)

# Create the XY plot
plt.figure(figsize=(12, 6))
for keyword in categories:  # Plot based on the list of categories
    frequencies = [counter.get(keyword, 0) for counter in df_grouped]
    plt.plot(df_grouped.index.astype(str), frequencies, label=keyword)

plt.xlabel('Week')
plt.ylabel('Frequency')
plt.title('Keyword Frequency Over Weeks')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

When I run this, it plots something at 0 frequency and nothing else.