We aim to build a baseline text classification model that predicts the sentiment toward a given entity in a tweet.
The TF-IDF score for a term \(t\) in a document \(d\) is given by:

\[\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \log\frac{N}{\text{DF}(t)}\]

where \(\text{TF}(t, d)\) is the frequency of \(t\) in \(d\), \(N\) is the total number of documents, and \(\text{DF}(t)\) is the number of documents containing \(t\).

The multi-class logistic regression model assigns class probabilities via the softmax function:

\[P(y = k \mid \mathbf{x}) = \frac{\exp(\mathbf{w}_k^\top \mathbf{x})}{\sum_{j=1}^{K} \exp(\mathbf{w}_j^\top \mathbf{x})}\]

where:
\(\mathbf{x}\) : The input feature vector (TF-IDF transformed)
\(\mathbf{w}_k\) : The weight vector for class \(k\)
\(K\) : The number of classes (4 in our case)
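To make the formula concrete, here is a small numerical sketch of the softmax computation with made-up TF-IDF values and weights (the numbers are illustrative only, not taken from our model):

Code
import numpy as np

# Toy example: 3 features, K = 4 classes (Irrelevant, Negative, Neutral, Positive)
x = np.array([0.4, 0.1, 0.7])              # TF-IDF feature vector (made-up values)
W = np.random.RandomState(0).randn(4, 3)   # one weight vector w_k per class (made-up values)

logits = W @ x                                    # w_k^T x for each class k
probs = np.exp(logits) / np.exp(logits).sum()     # softmax: probabilities sum to 1
print(probs, probs.argmax())                      # argmax gives the predicted class index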
3.2 Baseline: Logistic Regression with TF-IDF
Scikit-learn's LogisticRegression handles multi-class classification either with a one-vs-rest strategy, in which a separate binary classifier is trained for each class, or with the multinomial (softmax) formulation shown above; recent versions default to the multinomial model with the lbfgs solver. In both cases the model outputs a probability for each class, and the class with the highest probability is selected as the prediction.
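As a quick illustration of this behavior, a minimal sketch on a toy dataset (not our actual data) shows that predict is simply the argmax over the per-class probabilities:

Code
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny toy problem, just to obtain a fitted 4-class model
X = np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]])
y = np.array([0, 1, 2, 3])
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = clf.predict_proba(X)                 # shape (n_samples, K): one probability per class
pred = clf.classes_[proba.argmax(axis=1)]    # most probable class per sample
assert (pred == clf.predict(X)).all()        # predict() selects the highest-probability class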
3.2.1 Modeling Pipeline Summary
We build a baseline sentiment classification pipeline using Scikit-learn, consisting of:
3.2.1.1 TF-IDF Vectorization
Remove English stopwords
Limit vocabulary to the top 10,000 terms
Use unigrams and bigrams (n-gram range 1–2)
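A sketch of this configuration using Scikit-learn's TfidfVectorizer, with parameter values taken from the list above:

Code
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(
    stop_words="english",   # remove English stopwords
    max_features=10000,     # cap the vocabulary at the 10,000 most frequent terms
    ngram_range=(1, 2),     # use unigrams and bigrams
)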
3.2.1.2 Logistic Regression Classifier
A linear model used for multi-class classification
max_iter = 1000 raises the solver's iteration cap so it can converge on the large, sparse feature set
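Combining the vectorizer and classifier, the full pipeline might be assembled as follows; a sketch in which the name model is assumed to match the object serialized in Section 3.3:

Code
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", max_features=10000, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),  # raised iteration cap for the large feature set
])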
3.2.1.3 Evaluation
Dataset split: 80/20 train-test
Evaluation via classification_report (Precision, Recall, F1-score for each class)
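A sketch of this evaluation step, where texts and labels are hypothetical placeholders for the tweet texts and sentiment labels, and the random seed is likewise an assumption:

Code
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# texts, labels: placeholders for the tweets and their sentiment labels
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42  # 80/20 split; seed is an assumption
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))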
Here are the evaluation results for the Logistic Regression (TF-IDF) baseline model:
Class          Precision   Recall   F1-score   Support
Irrelevant     0.75        0.60     0.67        2568
Negative       0.77        0.81     0.79        4463
Neutral        0.75        0.70     0.72        3610
Positive       0.70        0.79     0.75        4124
Accuracy                            0.74       14765
Macro avg      0.74        0.73     0.73       14765
Weighted avg   0.74        0.74     0.74       14765
3.3 Save Trained Model for Interpretation
After training, we save the entire pipeline (including the TF-IDF vectorizer and the logistic regression classifier) using joblib. This serialized model will be used later for interpretability analysis with tools like LIME or SHAP.
Code
import joblib

# Save the full pipeline (TF-IDF vectorizer + logistic regression classifier)
joblib.dump(model, "scripts/baseline_pipeline.pkl")
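For the later interpretability work, the serialized pipeline can be reloaded and applied directly to raw text; a minimal sketch using the path from the dump call above (the example tweet is made up):

Code
import joblib

# Load the full pipeline; raw text goes in, sentiment predictions come out
pipeline = joblib.load("scripts/baseline_pipeline.pkl")
print(pipeline.predict(["I love this game!"]))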