Multi-Label NLP

 
Ranch Hand
Posts: 81
Hi all.

I am trying to train a simple CNN on this dataset, which is multi-class and natural language:
https://www.kaggle.com/badalgupta/stack-overflow-tag-prediction/data

I am using word embeddings from FastText.
I have converted the words to index numbers in my vocab (from FastText), then used a (non-trainable) TensorFlow Embedding layer to convert the index numbers to word vectors using the pre-trained FastText embeddings.
The labels are multi-hot encoded (there are 100 labels).
The output activation is sigmoid and the loss is binary crossentropy, as that is what many websites recommend.
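The setup described above can be sketched roughly like this in tf.keras. All sizes are placeholders, and `embedding_matrix` stands in for the matrix you would build from the FastText vectors (row i = vector for vocab index i):

```python
import numpy as np
import tensorflow as tf

# Placeholder sizes -- substitute your real vocab/embedding dimensions.
VOCAB_SIZE, EMBED_DIM, MAX_LEN, NUM_LABELS = 5000, 100, 200, 100

# Stand-in for the pre-trained FastText matrix (row i = vector for word i).
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        VOCAB_SIZE, EMBED_DIM,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),  # frozen pre-trained embeddings, as in the post
    tf.keras.layers.Conv1D(128, 5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    # One independent sigmoid per label for the multi-hot targets.
    tf.keras.layers.Dense(NUM_LABELS, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["binary_accuracy"])
```

This is just a sketch to make the discussion concrete, not your actual model; the key parts matching your description are the frozen `Embedding` layer, the sigmoid head, and the binary crossentropy loss.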
I have just split the train/validation/test sets randomly for now, so the splits do not take the label distribution into account.
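One quick check worth running on a purely random split: count how often each of the 100 labels appears in train vs. validation. With a skewed tag distribution, rare labels can easily end up with no positive examples in one side, which makes per-label precision/recall meaningless for those labels. A toy illustration (data here is made up to mimic the skew):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-hot label matrix: 1000 examples, 100 labels, heavily skewed
# so that high-index labels are rare (mimicking tag-frequency skew).
n, num_labels = 1000, 100
p_pos = np.linspace(0.3, 0.002, num_labels)
Y = (rng.random((n, num_labels)) < p_pos).astype(int)

# Plain random 80/20 split, ignoring labels (as in the post).
idx = rng.permutation(n)
train, val = Y[idx[:800]], Y[idx[800:]]

# Labels with no positive examples at all in the validation set.
missing_in_val = np.flatnonzero(val.sum(axis=0) == 0)
print(f"{len(missing_in_val)} of {num_labels} labels have no positive "
      f"validation examples")
```

If that count is non-trivial on your real data, a stratified multi-label split would be one of the things worth fixing early.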

When I train the CNN, the "accuracy" gets to 0.99 very quickly and the loss is low.
At the end of each epoch the precision, recall and F1 scores gradually improve, then plateau at around 0.35.

The predictions are poor: the maximum probability from the sigmoid output is often as low as 6%, so the network does not seem to be properly trained. And with the accuracy already high and the loss already low, further training progress will be very slow anyway.
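For what it's worth, the 0.99 "accuracy" is largely an artefact of the metric rather than evidence of training: with 100 mostly-zero labels, element-wise binary accuracy rewards predicting almost nothing. A toy demonstration (numbers are invented, ~3 positive labels per example):

```python
import numpy as np

# 1000 examples, 100 labels, ~3% of entries positive on average.
rng = np.random.default_rng(42)
Y = (rng.random((1000, 100)) < 0.03).astype(int)

# A useless model that predicts 0 for every label...
pred = np.zeros_like(Y)

# ...still scores very high element-wise binary accuracy,
# because it gets all the zeros "right".
acc = (pred == Y).mean()
print(f"binary accuracy of all-zeros: {acc:.3f}")  # roughly 0.97
```

This is why the precision/recall/F1 numbers plateauing at ~0.35 are the more honest signal here; the accuracy figure can safely be ignored for this problem.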

As there is a fairly large skew in the number of times each label has been allocated, I have used the class_weight parameter when fitting, to try to assist the training.
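One caution on that: as far as I know, the Keras `class_weight` argument is aimed at single-label (one class per sample) targets, and may not behave as you expect with multi-hot labels. A common alternative is a per-label positive weight applied inside the loss, e.g. via `tf.nn.weighted_cross_entropy_with_logits`. A sketch of computing such weights from a multi-hot label matrix (the matrix here is a made-up stand-in for your training labels):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy multi-hot label matrix standing in for the real training labels,
# skewed so that high-index labels are rare.
Y = (rng.random((1000, 100)) < np.linspace(0.3, 0.01, 100)).astype(int)

n = Y.shape[0]
n_pos = Y.sum(axis=0)                       # positive count per label
# Up-weight positives per label: rare tags get larger weights.
# (the clip avoids division by zero for labels with no positives)
pos_weight = (n - n_pos) / np.clip(n_pos, 1, None)

print(pos_weight.min(), pos_weight.max())
# These weights could then be used in a custom loss via
# tf.nn.weighted_cross_entropy_with_logits(labels, logits, pos_weight)
# (note: that op expects raw logits, not sigmoid outputs).
```

That keeps the skew correction per label rather than per sample, which matches the multi-label setting better.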

Does anyone have the experience to point to where I should start my investigation? There are so many things to twiddle!
Perhaps the transfer-learning expert will have some ideas.


Thanks
Don.
 
Don Horrell
Ranch Hand
Posts: 81
Just to clarify, this is a multi-LABEL problem, not multi-class.
Apologies for my mistake.


Don.
 