Predicting Relationship Quality from Relationship Attributes in 2022

Authors
Affiliation

Jasjot Parmar

University of British Columbia

Jade Chen

University of British Columbia

Eugene Tse

University of British Columbia

Johnson Leung

University of British Columbia

1 Summary

In this project, we used a dataset of survey responses to common relationship questions to develop a logistic regression model that classifies relationships into one of five relationship quality statuses: excellent, good, fair, poor, or very poor. The model developed below uses common relationship features such as marital status and the number of children in the relationship.

2 Introduction

Relationship quality classification is an important topic for couples, particularly when trying to maximize the satisfaction each partner receives from the relationship. Accurate classification lets couples assess the quality of their relationship and develop better-targeted strategies to improve or maintain it. Because it is often difficult for couples to estimate the perceived quality of their relationship, we investigate below whether a machine learning model can correctly classify relationship quality from common relationship attributes. This analysis asks: how well do relationship characteristics such as age, income category, marital status, relationship duration, and number of children predict relationship quality?

The dataset contains 1293 survey responses covering common relationship characteristics, such as the respondent's income category, along with their estimated relationship quality (our prediction target). This dataset (Diverse Data Hub (n.d.)) is originally based on the How Couples Meet and Stay Together survey (Rosenfeld, Thomas, and Hausen (2023)). We accessed the CSV version of this dataset directly from CRAN (R Core Team (n.d.)). The README.md file explains how to recreate the computational environment from the project's conda lock files; we followed the principles on conda lock files from the Reproducible and Trustworthy Workflows for Data Science textbook (Timbers et al. (n.d.)) to manage environments.

In our analysis below, we investigate whether a logistic regression model can correctly classify relationship attributes into one of five relationship quality statuses: excellent, good, fair, poor, or very poor.

Pointblank Validation

Data validation with pointblank on the pandas DataFrame (2025-12-13 23:42:47 UTC): the rows_distinct() check across all columns passed for 1293 of 1293 rows (pass fraction 1.00, 0 failures), confirming the dataset contains no duplicate survey responses.

3 EDA

We start by confirming our data was read into a pandas DataFrame object, as shown in Table 1.

Table 1: Peek of demographic and relationship variables included in this analysis.
subject_age subject_education subject_sex subject_ethnicity subject_income_category subject_employment_status same_sex_couple married sex_frequency flirts_with_partner ... relationship_duration children rel_change_during_pandemic inc_change_during_pandemic subject_had_covid partner_had_covid subject_vaccinated partner_vaccinated agree_covid_approach relationship_quality
0 53.0 high_school_grad female white 35k_40k working_paid_employee no not_married once_or_twice_a_week a_few_times_a_week ... 1.500000 2.0 better_than_before no_change no yes not_vaccinated not_vaccinated completely_agree excellent
1 72.0 some_college female white 75k_85k working_paid_employee no married once_a_month_or_less never ... 57.416668 1.0 no_change worse no no fully_vaccinated_and_booster fully_vaccinated_and_booster mostly_agree good
2 43.0 associate_degree male white 75k_85k working_paid_employee no married once_or_twice_a_week a_few_times_a_week ... 22.333334 5.0 no_change worse no no fully_vaccinated_and_booster fully_vaccinated_and_booster completely_agree excellent
3 64.0 some_college male white 75k_85k working_paid_employee no married once_or_twice_a_week 1_to_3_times_a_month ... 28.250000 2.0 no_change no_change no no fully_vaccinated_and_booster fully_vaccinated_and_booster completely_agree good
4 60.0 high_school_grad female black 75k_85k working_paid_employee no married once_or_twice_a_week a_few_times_a_week ... 38.916668 3.0 better_than_before no_change no no not_vaccinated partially_vaccinated completely_agree excellent

5 rows × 21 columns

We can see from the distribution of the relationship quality categorical variable that the dataset contains imbalanced classes: a very large number of respondents report excellent or good relationship quality, while far fewer report fair, poor, or very poor relationship quality.

Figure 1: Relationship Quality Score Distribution

Among our numeric input features, subject age and relationship duration are highly correlated (\(\rho\) = 0.736): older subjects tend to be in longer relationships. Subject age and number of children show a weak negative correlation (\(\rho\) = -0.326), indicating that the reported number of children tends to decrease slightly as subject age increases.
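As a minimal sketch of how such pairwise correlations are obtained (using invented values, not the survey data), pandas computes the full Pearson correlation matrix directly:

```python
import pandas as pd

# Hypothetical stand-ins for the three numeric features; the values
# here are illustrative only, not taken from the survey.
df = pd.DataFrame({
    "subject_age": [53, 72, 43, 64, 60, 35, 48],
    "relationship_duration": [1.5, 57.4, 22.3, 28.3, 38.9, 5.0, 20.1],
    "children": [2, 1, 5, 2, 3, 0, 2],
})

# Pairwise Pearson correlation matrix (symmetric, ones on the diagonal);
# this is the quantity visualized in a correlation heatmap.
corr = df.corr(method="pearson")
```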

Figure 2: Correlation Heatmap

The distribution of income categories is left skewed, with most respondents earning over 50k per year. For respondents earning at least 50k per year, incomes appear roughly uniformly spread across the categories from 50k to 250k+.

Figure 3: Income Category Distribution

We then select the relationship features used to predict relationship quality (subject age, subject income category, marital status, relationship duration, and number of children) before splitting the data into train and test sets. We also perform simple data cleaning: relevant numeric features such as age and number of children are cast to integers, and the income category feature is reordered so its levels ascend by income.
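These steps can be sketched as follows. The column names and category levels are assumptions based on the report's description, and the rows are synthetic, not drawn from the survey:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny synthetic stand-in for the survey data (illustrative values only).
data = pd.DataFrame({
    "subject_age": [53.0, 72.0, 43.0, 64.0, 60.0, 35.0, 48.0, 29.0],
    "subject_income_category": ["35k_40k", "75k_85k", "75k_85k", "75k_85k",
                                "75k_85k", "35k_40k", "75k_85k", "35k_40k"],
    "married": ["not_married", "married", "married", "married",
                "married", "not_married", "married", "not_married"],
    "relationship_duration": [1.5, 57.4, 22.3, 28.3, 38.9, 5.0, 20.1, 3.2],
    "children": [2.0, 1.0, 5.0, 2.0, 3.0, 0.0, 2.0, 0.0],
    "relationship_quality": ["excellent", "good", "excellent", "good",
                             "excellent", "fair", "good", "excellent"],
})

features = ["subject_age", "subject_income_category", "married",
            "relationship_duration", "children"]
X = data[features].copy()
y = data["relationship_quality"]

# Cleaning: cast age and number of children to integers, and give the
# income categories an explicit ascending order.
X["subject_age"] = X["subject_age"].astype(int)
X["children"] = X["children"].astype(int)
X["subject_income_category"] = pd.Categorical(
    X["subject_income_category"],
    categories=["35k_40k", "75k_85k"],  # ascending by income (illustrative)
    ordered=True,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=123)
```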

4 Methods

Numeric features have different scales, with age taking much larger values than relationship duration and number of children, so a StandardScaler is applied so that all numeric features contribute equally to the logistic regression model. Ordinal features such as subject income category are ordinal encoded, since their categories have a natural order based on the subject's income. Categorical features such as marital status are one-hot encoded, resulting in a single 0/1 column indicating marital status. These transformations are wrapped in a column transformer.
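A sketch of this preprocessing setup is below. The feature names and income levels are assumptions matching the report's description, and the demo rows are invented:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OrdinalEncoder, OneHotEncoder

# Illustrative rows standing in for the real data.
demo = pd.DataFrame({
    "subject_age": [53, 72, 43, 64],
    "relationship_duration": [1.5, 57.4, 22.3, 28.3],
    "children": [2, 1, 5, 2],
    "subject_income_category": ["35k_40k", "75k_85k", "75k_85k", "35k_40k"],
    "married": ["not_married", "married", "married", "married"],
})

preprocessor = ColumnTransformer([
    # Standardize numeric features so they contribute on the same scale.
    ("num", StandardScaler(),
     ["subject_age", "relationship_duration", "children"]),
    # Encode income categories as 0, 1, ... in ascending income order.
    ("ord", OrdinalEncoder(categories=[["35k_40k", "75k_85k"]]),
     ["subject_income_category"]),
    # drop="if_binary" leaves a single 0/1 column for marital status.
    ("cat", OneHotEncoder(drop="if_binary"), ["married"]),
])

X_t = preprocessor.fit_transform(demo)  # 3 scaled + 1 ordinal + 1 indicator
```

Using drop="if_binary" mirrors the single 0/1 marital-status column described above while leaving any multi-level categorical features fully one-hot encoded.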

A scikit-learn pipeline is used to preprocess and train the model on the training data in one step. The pipeline first applies the preprocessor above to the training set, standardizing numeric features and one-hot encoding categorical features, before training the logistic regression model. The logistic regression model addresses the class imbalance in relationship quality noted above by giving minority classes a larger penalty, so the model pays more attention to those observations.
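A minimal sketch of such a pipeline follows. The toy data and feature names are assumptions, and class_weight="balanced" is shown as one concrete way to penalize minority-class errors; whether the report's model used exactly this setting is our assumption:

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression

# Toy training frame (illustrative values, not the survey data).
X_train = pd.DataFrame({
    "subject_age": [53, 72, 43, 64, 60, 35, 48, 29],
    "married": ["not_married", "married", "married", "married",
                "married", "not_married", "married", "not_married"],
})
y_train = ["excellent", "good", "excellent", "good",
           "excellent", "fair", "good", "excellent"]

pipe = Pipeline([
    ("preprocess", ColumnTransformer([
        ("num", StandardScaler(), ["subject_age"]),
        ("cat", OneHotEncoder(drop="if_binary"), ["married"]),
    ])),
    # class_weight="balanced" up-weights errors on minority classes
    # (an assumed implementation of the penalty described above).
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

pipe.fit(X_train, y_train)   # preprocessing + training in one step
preds = pipe.predict(X_train)
```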

5 Results

To see how our model performed on the training data, we plot a confusion matrix (Figure 4). For respondents with excellent relationship quality, the model correctly predicts only 49.1% of them (recall = 0.491). Of all relationships predicted to have excellent relationship quality, the model is correct for 58.6% of them (precision = 0.586).

For respondents with Good relationship quality, the model only correctly predicts 2.0% of them. For all relationships that are predicted to have good relationship quality, the model correctly predicts 40.0% of them.

For respondents with Fair relationship quality, the model only correctly predicts 28.6% of them. Of all relationships that are predicted to have Fair relationship quality, the model correctly predicts 13.6% of them.

For respondents with Poor relationship quality, the model correctly predicts 22.7% of them. For all relationships that are predicted to have Poor relationship quality, the model only correctly predicts 3.0% of them.

For respondents with Very Poor relationship quality, the model correctly predicts 100.0% of them (there are very few observations with very poor relationship quality so this prediction should be used carefully). Out of all relationships that are predicted to have very poor relationship quality, the model correctly predicts only 0.9% of them.

Figure 4: Confusion Matrix Training Data
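The per-class recall and precision figures quoted above can be read off a confusion matrix; a minimal sketch with invented labels (not the report's actual predictions):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

labels = ["excellent", "good", "fair", "poor", "very_poor"]

# Illustrative true/predicted labels, not the model's real output.
y_true = ["excellent", "excellent", "good", "fair", "poor", "good"]
y_pred = ["excellent", "good",      "good", "fair", "fair", "excellent"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Recall: fraction of each true class that was predicted correctly.
recall = recall_score(y_true, y_pred, labels=labels, average=None,
                      zero_division=0)
# Precision: fraction of each predicted class that was truly that class.
precision = precision_score(y_true, y_pred, labels=labels, average=None,
                            zero_division=0)
```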

To see how our model performed on the test set, we plot a confusion matrix for predictions on the test set (Figure 5). For respondents with excellent relationship quality, the model correctly predicts only 51.1% of them (recall = 0.511). Of all relationships predicted to have excellent relationship quality, the model is correct for 54.3% of them (precision = 0.543).

For respondents with Good relationship quality, the model only correctly predicts 0.0% of them. For all relationships that are predicted to have good relationship quality, the model correctly predicts 0.0% of them.

For respondents with Fair relationship quality, the model only correctly predicts 15.8% of them. Of all relationships that are predicted to have Fair relationship quality, the model correctly predicts 10.3% of them.

For respondents with Poor relationship quality, the model correctly predicts 0.0% of them, as there is only 1 poor relationship quality observation in the test set. For all relationships that are predicted to have Poor relationship quality, the model only correctly predicts 0.0% of them, since the model predicted an observation to have Poor relationship quality 0 times.

For respondents with Very Poor relationship quality, the model correctly predicts 0.0% of them. Out of all relationships that are predicted to have very poor relationship quality, the model only correctly predicts 0.0% of them.

Figure 5: Confusion Matrix Test Data

Our micro-averaged ROC curve (Figure 6) shows an AUC of 0.621. This means our model has only weak ability to discriminate between relationship quality classes based on the input features we specified: it performs only slightly better than randomly guessing the relationship quality class.

Figure 6: ROC Curve
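A micro-averaged multi-class ROC AUC of the kind summarized in Figure 6 can be computed by flattening a one-vs-rest label matrix against the predicted class probabilities. This sketch uses invented probabilities, not the model's real output:

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

classes = ["excellent", "good", "fair", "poor", "very_poor"]

# Invented labels and predicted probabilities (each row sums to 1).
y_true = ["excellent", "good", "fair", "excellent", "good"]
proba = np.array([
    [0.60, 0.20, 0.10, 0.05, 0.05],
    [0.30, 0.40, 0.20, 0.05, 0.05],
    [0.20, 0.20, 0.50, 0.05, 0.05],
    [0.50, 0.30, 0.10, 0.05, 0.05],
    [0.25, 0.35, 0.30, 0.05, 0.05],
])

# Micro-averaging pools every (sample, class) pair into a single binary
# problem before computing one ROC curve and one AUC.
y_bin = label_binarize(y_true, classes=classes)
micro_auc = roc_auc_score(y_bin.ravel(), proba.ravel())
```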

6 Discussion

Our findings indicate poor performance across each class on the test set, with an overall test accuracy from the confusion matrix of 28.2%, meaning the model is poor at predicting the correct relationship quality from features such as age, income category, marital status, relationship duration, and number of children. Precision and recall for each relationship quality class are also quite low on both the training and test sets. This tells us that, with a logistic regression model, the features we included do not predict relationship quality well.

This is generally what we expected to find, because the features we chose are mostly external or demographic traits of the relationship that, one can argue, do not capture its emotional or personal state. Since our features exclude deeper characteristics that could matter more for relationship quality than demographic features like age, it makes sense that our logistic regression model performs poorly.

These findings could change how people in relationships and researchers think about what defines relationship quality. Since our above results show that demographic or external features like age, number of children, income category, and relationship duration were not good predictors of relationship quality with our Logistic Regression model, people could place less focus on these relationship features when gauging the relationship quality of their own relationship. The results could lead to people in relationships placing more importance on emotional or behavioural metrics in relationships instead like how often partners openly communicate about problems. These emotional and behavioural relationship features could be much better predictors of relationship quality. These results could ultimately impact how relationship quality is assessed, by changing the focus to deeper personal relationship dynamics instead of surface level demographic features.

Accuracy was used as an initial, intuitive measure of how often the model correctly classified relationship quality, providing a simple baseline for overall performance. However, because the classes were imbalanced, accuracy alone could be misleading, as high values may be driven primarily by correct predictions of the dominant categories. Therefore, ROC AUC was also reported to evaluate the model’s ability to discriminate between classes across decision thresholds using predicted probabilities, offering a more informative and imbalance-robust assessment of model performance.

Another issue we did not address was hyperparameter optimization. For logistic regression, tuning could focus on the regularization strength (C) and the type of penalty (L1, L2, or elastic net). Adjusting C controls the bias–variance trade-off: smaller values impose stronger regularization and can help prevent overfitting, particularly given correlated predictors such as age and relationship duration. If feature selection or sparsity were desired, an L1 or elastic-net penalty could be explored. These hyperparameters could be tuned using cross-validation, ideally with a class-weighted or balanced scoring metric (e.g., macro F1 or balanced accuracy).

More generally, cross-validated hyperparameter search (e.g., grid search or randomized search) would allow systematic comparison of models under consistent evaluation criteria. To avoid optimistic bias, hyperparameter tuning should be performed only on the training data, with the test set held out strictly for final evaluation. Given the class imbalance, stratified cross-validation would be especially important to preserve class proportions within folds.
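The tuning procedure described above can be sketched as follows, using synthetic data in place of the preprocessed survey features (the grid values and scoring choice are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic multi-class data standing in for the preprocessed features.
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_classes=3, n_clusters_per_class=1,
                           random_state=0)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}   # regularization strength
# Stratified folds preserve class proportions, important under imbalance.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1_macro",   # imbalance-aware scoring, as suggested above
    cv=cv,
)
search.fit(X, y)          # in a real analysis, fit on training data only
```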

Another improvement that could have been incorporated into our analysis is reweighting for class imbalance with class_weight="balanced", which reweights the loss function so that errors on minority classes carry more importance. Since our response classes are skewed towards the 'excellent' and 'good' categories, this could improve prediction performance on the minority classes. The approach is simple, does not alter the data distribution, and is often preferable to resampling for smaller datasets like ours, as it reduces the risk of overfitting while directly addressing imbalance at training time.
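To make the reweighting concrete, scikit-learn's "balanced" mode assigns each class the weight n_samples / (n_classes * class_count), so rarer classes count proportionally more in the loss. A sketch with invented class counts mimicking the skew toward 'excellent' and 'good':

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative class counts, not the dataset's real distribution.
y = np.array(["excellent"] * 60 + ["good"] * 30 + ["fair"] * 10)

# These are the weights class_weight="balanced" would apply:
# n_samples / (n_classes * class_count) per class.
classes = np.unique(y)   # alphabetical: excellent, fair, good
weights = compute_class_weight("balanced", classes=classes, y=y)
```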

We tested only one model: logistic regression. An SVM with a linear or kernelized decision boundary could also be effective, particularly if the relationship between the predictors and quality is not strictly linear, and class weights can be incorporated to address imbalance. However, SVMs are less interpretable and can be sensitive to feature scaling and hyperparameter choices, making them harder to explain in a social-science context.
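A sketch of such an alternative, again on synthetic imbalanced data rather than the survey features (the kernel choice and pipeline are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced 3-class data standing in for the survey features.
X, y = make_classification(n_samples=150, n_features=5, n_informative=3,
                           n_classes=3, n_clusters_per_class=1,
                           weights=[0.7, 0.2, 0.1], random_state=1)

# SVMs are scale-sensitive, hence the StandardScaler step, and
# class_weight="balanced" addresses the imbalance as discussed above.
svm = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", class_weight="balanced"))
svm.fit(X, y)
train_acc = svm.score(X, y)
```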

The future questions our above results could lead to are:

  • What emotional or personal relationship features (that relate to both partners) best predict relationship quality?

  • Could using a dataset that contains data for relationship metrics from both partners in the relationship improve accuracy of our above Logistic Regression model?

  • How much better (or worse) would non-linear models such as decision trees perform on the same above dataset?

  • Which relationship features that we used above contribute most to predicting relationship quality?

7 References

Diverse Data Hub. n.d. “How Couples Meet and Stay Together.” https://diverse-data-hub.github.io/website_files/description_pages/hcmst.html.
R Core Team. n.d. “Hcmst.csv [Data Set].” CRAN. https://cran.r-project.org/incoming/UL/diversedata/data-clean/hcmst.csv.
Rosenfeld, Michael J., Reuben J. Thomas, and Sonia Hausen. 2023. “How Couples Meet and Stay Together 2017–2020–2022 Combined Dataset [Data Set].” Stanford University Libraries. https://data.stanford.edu/hcmst2017.
Timbers, Tiffany A., Joel Ostblom, Florencia D’Andrea, Rodolfo Lourenzutti, and Daniel Chen. n.d. “Conda Lock: Reproducible Lock Files for Conda Environments.” In Reproducible and Trustworthy Workflows for Data Science. UBC Master of Data Science. https://ubc-dsci.github.io/reproducible-and-trustworthy-workflows-for-data-science/lectures/090-conda-lock.html.