Five Takeaways on the State of Natural Language Processing
Thoughts following the 2015 "Text By The Bay" Conference
1. word2vec and doc2vec appear to be pervasive
Mikolov et al.’s work on embedding words as real-valued vectors using a skip-gram, negative-sampling model (word2vec code) was mentioned in nearly every talk I attended. Companies are either using one of the various word2vec implementations directly or building variants on top of the basic framework. Trained on large corpora, the vector representations encode concepts in a high-dimensional space (usually 200-300 dimensions). Beyond the “king - man = queen - woman” analogy party trick, such embeddings are finding real-world applications throughout NLP. For example, Mike Tamir ("Classifying Text without (many) Labels"; slide shown below) discussed how he is using the average word-vector representation over entire documents as features for text classification, outperforming bag-of-words (BoW) techniques by a large margin on heavily imbalanced classes. Marek Kolodziej ("Unsupervised NLP Tutorial using Apache Spark") gave a wonderful talk about the long history of concept embeddings, along with technical details of most of the salient papers. Chris Moody ("A Word is Worth a Thousand Vectors") showed how word2vec is being used in conjunction with topic modeling for improved recommendations over standard cohort analysis. He ended his talk by describing how word2vec can be extended beyond NLP to machine translation and graph analysis.
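To make the document-averaging idea concrete, here is a minimal sketch of how averaged word vectors can serve as document features. It is not Mike Tamir's implementation: the tiny 4-dimensional `EMBEDDINGS` table, the whitespace tokenizer, and the helper names `doc_vector` and `cosine` are all assumptions made for illustration. In practice the vectors would come from a trained word2vec model.

```python
import numpy as np

# Hypothetical toy embeddings (4 dimensions). Real word2vec vectors are
# learned from a large corpus and usually have 200-300 dimensions.
EMBEDDINGS = {
    "good":  np.array([0.9, 0.1, 0.0, 0.2]),
    "great": np.array([0.8, 0.2, 0.1, 0.1]),
    "bad":   np.array([-0.7, 0.1, 0.0, 0.3]),
    "movie": np.array([0.0, 0.9, 0.3, 0.0]),
}
DIM = 4


def doc_vector(text, embeddings=EMBEDDINGS, dim=DIM):
    """Average the vectors of all in-vocabulary tokens in `text`.

    Tokens missing from the vocabulary are skipped; a document with no
    known tokens maps to the zero vector.
    """
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


def cosine(a, b):
    """Cosine similarity between two vectors, 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


if __name__ == "__main__":
    reference = doc_vector("good movie")
    print(cosine(doc_vector("great movie"), reference))
    print(cosine(doc_vector("bad movie"), reference))
```

The resulting fixed-length vectors can be fed to any standard classifier in place of sparse bag-of-words counts. Plain averaging ignores word order and weights every word equally, so treat this as a baseline rather than a recommendation.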
2. Production-grade NLP is Spreading in Industry
3. Open tools are being used but probably not compensated in the way they should be
4. "RNNs for X"
5. A Big Problem: Massive Gender Imbalance
Predictive Analytics
Thanks for the summary, Josh. In addition to the women you mentioned, Diana Hu of Verizon was a speaker, Katelyn Lyster was a panelist, and we had dozens of women in attendance. We've accepted all talks submitted by women to our CFP that was open for months, and asked for submissions on meetup lists numbering tens of thousands. We've reached out to numerous women in technology. We'll be extremely grateful for any outreach and help on this you can provide to sftext.org every month and the next Text By the Bay.
Alexy-
Good to know about the outreach that the organizers undertook. I don't see Diana and Katelyn on the speaker roster (http://text.bythebay.io/schedule.html), so it appears they must have been added after the schedule was published.
Indeed, the schedule has a correction, and the closing panel was extended based on feedback. There are some good conference photos at @chiefscientist, e.g.
https://twitter.com/h2oai/status/592118127766347777
https://twitter.com/ChiefScientist/status/592107891449933824
https://twitter.com/ChiefScientist/status/592096177660657664
There will be a gallery that will show the whole diversity of our conference. Generally, it helps to spend a lot of time with the community to get a feeling for it. We had amazing feedback from the community and will work on growing it in all aspects, including diversity.
Bharat Shrinevas
Good summary, Joshua. Overall, I thought Alexy and the organizers did an excellent job. It takes a lot of effort to organize a conference of this scale with quality across both days. Kudos to them. I am sure the organizers will fix some of the shortcomings highlighted here next year and make it much better.
Pat Z
I am uncomfortable with your limited view on gender and the assumption that the majority were male. It is highly likely that some would have been genderfluid/genderqueer or genderless.
Please think before you post next time.
mali
Where did he write "male" or specifically that there was male over-representation? I don't see the word used once. The section was about female under-representation, not male over-representation. You may want to take your own advice.
mali
Where did he write the word "male" at all, let alone claim that the majority were male? I don't see it used once. I do see a section about female under-representation, very specifically. You may want to take your own advice before hitting post.
Alexy Khrabrov