In a binary classification modeling project, if there were 7…

Written by Anonymous on May 12, 2026 in Uncategorized with no comments.


In a binary classification modeling project, the dataset contains 70,000 data points: 3,500 in Class A and 66,500 in Class B. The rarer class, Class A, is the one we are most interested in detecting. How should the dataset be split into training and testing sets so as to avoid the under-sampling issue? You are told to apply the 60/40 rule, i.e., 60% of the data goes to training and 40% to testing. Fill in the blanks. (Hint: to deal with the under-sampling issue in this dataset, one of the classes will not follow the 60/40 rule once the split is manipulated manually.)

Training: data points from Class A = [trainA]; data points from Class B = [trainB]
Testing: data points from Class A = [testA]; data points from Class B = [testB]
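One way to reason about the blanks is sketched below in Python. It assumes a common oversampling recipe, which may not be the exact answer the question intends: the rare Class A keeps the 60/40 split, the training set is balanced 1:1 (so trainB equals trainA), and the test set keeps the original 5% Class A prevalence of the full dataset so that evaluation reflects real-world conditions. The helper names plan_split and split_indices, and the integer ids used for the data points, are hypothetical and only serve the illustration.

```python
import random


def plan_split(n_a, n_b, train_pct=60):
    """Return (trainA, trainB, testA, testB) counts.

    Assumed recipe (hypothetical, one common choice):
      * rare class A keeps the train_pct / (100 - train_pct) split;
      * the training set is balanced, so trainB == trainA;
      * the test set keeps the original A:B ratio of the full dataset.
    """
    train_a = n_a * train_pct // 100   # 60% of Class A goes to training
    test_a = n_a - train_a             # remaining 40% of Class A goes to testing
    train_b = train_a                  # balance the training set 1:1
    test_b = test_a * n_b // n_a       # preserve original prevalence in the test set
    return train_a, train_b, test_a, test_b


def split_indices(ids_a, ids_b, seed=0):
    """Randomly assign point ids to train/test according to plan_split."""
    rng = random.Random(seed)          # seeded for a reproducible split
    a, b = list(ids_a), list(ids_b)
    rng.shuffle(a)
    rng.shuffle(b)
    train_a, train_b, test_a, test_b = plan_split(len(a), len(b))
    train = a[:train_a] + b[:train_b]
    test = a[train_a:train_a + test_a] + b[train_b:train_b + test_b]
    return train, test


counts = plan_split(3_500, 66_500)
print(counts)  # (2100, 2100, 1400, 26600)

# Hypothetical ids: 0..3499 are Class A, 3500..69999 are Class B.
train, test = split_indices(range(3_500), range(3_500, 70_000))
print(len(train), len(test))  # 4200 28000
```

Under these assumptions the blanks come out as trainA = 2,100, trainB = 2,100, testA = 1,400 and testB = 26,600, so Class A follows the 60/40 rule while Class B does not, and 37,800 Class B points are left unused. If instead every remaining Class B point is placed in the test set, testB would be 66,500 - 2,100 = 64,400.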
