Even in the “age of AI”, with neural networks, Transformers and the like, more “traditional” and “simple” (e.g. statistical) methods such as Naive Bayes and kNN are still used, and they sometimes perform nearly as well or better. These simple methods are less affected by some of the issues with classification tasks, or sidestep them entirely (though they obviously have their own limits and downsides as well).
Some issues with classification, in particular with models based on deep neural networks and similar architectures, include:
Usually, the bigger the model, the more data it needs (though other variables have to be taken into account, such as the quality and variability of the data). Depending on the application, there might not be enough data for the model to learn from and converge.
The data might need to be labelled, and labels are not always available or feasible to produce, especially at the scale of data needed (e.g. tens of thousands of images, or languages that are poorly represented on the internet).
Black box: it is difficult to get a clear idea of what is going on in detail inside the model. There are ways to analyse it, but given the tendency towards ever bigger models with millions of parameters, this is extremely difficult to do at such scales.
Usually, the models are EXPENSIVE, both in terms of the (labelled?) data needed, which can itself be expensive to generate, and in terms of infrastructure, facilities, computing power and time, man-hours of preparation and fine-tuning, etc.
This cost is often also linked with pollution and extensive use of resources, for improvements of a mere 0.01% on specific tasks, which often have no real application value.
Scalability issues come in many forms. Modifying the classes usually requires fully re-training the model (you recognize dogs and cats, now you want to add giraffes, so you have to re-train from scratch with a new dataset that includes giraffes, and that might need different fine-tuning). This might also require a bigger facility and more computing power, amplifying the related problems.
Reproducibility: because only those with access to a lot of computing power can run computationally heavy models, it is difficult, if not outright impossible, for others to reproduce the results and validate the authors' claims. It also means the work will have little actual impact, since it cannot be used in real-life applications unless the authors provide a way to access the trained model (e.g. the ChatGPT website and APIs).
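To make the point about class modification concrete, here is a minimal sketch of why a method like kNN sidesteps the re-training problem: its "model" is just the stored examples, so adding giraffes means appending giraffe examples. The `TinyKNN` class, the two features (height, weight) and all the numbers are invented for illustration; a real setup would at least scale the features and use an optimised library.

```python
from collections import Counter
import math


class TinyKNN:
    """Minimal k-nearest-neighbours classifier (illustrative, not optimised)."""

    def __init__(self, k=3):
        self.k = k
        self.points = []  # stored feature vectors
        self.labels = []  # label of each stored vector

    def add(self, features, label):
        # "Training" is just memorising the example; nothing is re-fitted.
        self.points.append(tuple(features))
        self.labels.append(label)

    def predict(self, features):
        # Rank stored examples by Euclidean distance to the query.
        ranked = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], features),
        )
        # Majority vote among the k closest examples.
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical features: (height in m, weight in kg). Values are made up.
knn = TinyKNN(k=3)
for h, w in [(0.25, 4.0), (0.23, 3.5), (0.30, 5.0)]:
    knn.add((h, w), "cat")
for h, w in [(0.60, 30.0), (0.55, 25.0), (0.65, 35.0)]:
    knn.add((h, w), "dog")

# With only cats and dogs known, a giraffe-sized query gets a known label.
before = knn.predict((5.0, 1000.0))

# Adding a new class: append examples, with no re-training step.
for h, w in [(5.0, 1100.0), (4.5, 900.0), (5.5, 1200.0)]:
    knn.add((h, w), "giraffe")

after = knn.predict((5.0, 1000.0))
print(before, after)
```

The trade-off is that the cost moves from training to prediction: every query is compared against all stored examples, which is one of the limits of such methods mentioned above.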