How to automatically classify text using less data while achieving higher accuracy than previous methods

Editor’s note: This article is written by data scientist Jeremy Howard and natural language processing expert Sebastian Ruder. Its purpose is to help both newcomers and non-specialists better understand their new paper. The paper shows how to automatically classify text using less data while still being more accurate than previous methods. This article explains, in simple terms, natural language processing, text classification, transfer learning, language modeling, and how their method combines these concepts. If you are already familiar with NLP and deep learning, you can go directly to the project homepage.

Introduction

On May 14, we published the paper Universal Language Model Fine-tuning for Text Classification (ULMFiT), together with pre-trained models and open-source Python code. The paper has been peer reviewed and will be presented at ACL 2018. The project page provides an in-depth video walk-through of the method, the Python modules used, the pre-trained models, and scripts for building your own models.

This model significantly improves the efficiency of text classification. At the same time, the code and pre-trained models allow every user to better solve problems such as the following with this new approach:

Find documents related to a legal case;

Identify spam, malicious comments, or bot replies;

Classify positive and negative evaluations of products;

Classify political bias of articles;

And more.

ULMFiT requires far less data than other methods

So, what does this new technique change? First, let us look at what the abstract of the paper says, and then we will expand on what it means in the rest of the article:

Transfer learning has brought huge changes to computer vision, but existing NLP approaches still require task-specific modifications and training from scratch. We propose an effective transfer learning method that can be applied to any task in NLP, and we introduce techniques that are key for fine-tuning a language model. Our method outperforms the existing state of the art on six text classification tasks. In addition, with only 100 labeled examples, it matches the performance of training from scratch on 100 times more data.

NLP, deep learning and classification

Natural language processing (NLP) is an area of computer science and artificial intelligence that, as the name implies, uses computers to process natural languages. Natural languages are the languages we use to communicate every day, such as English or Chinese, as opposed to specialized languages such as computer code or musical notation. NLP has a very wide range of applications, including search, personal assistants, summarization, and so on. In general, natural language processing is very challenging, because hand-written computer code struggles to express the varied emotions and nuances of language and lacks flexibility. You have probably dealt with NLP in everyday life, for example when calling an automated phone answering system or talking to Siri, and found that the experience was not smooth.

In the past few years, deep learning has begun to outperform traditional approaches, with great success in the field of NLP. Instead of requiring a programmer to define a fixed set of rules, deep learning uses neural networks to learn rich non-linear relationships directly from data. Of course, the most notable achievements of deep learning are still in computer vision (CV), where we have seen rapid progress in the ImageNet image classification competition.

Deep learning has also achieved many successes in NLP. For example, the automatic translation covered by The New York Times is now used in many applications. These successful NLP tasks share a common feature: a large amount of labeled data was available for training the model. Until now, however, such applications have been limited to models for which large labeled datasets could be collected, and they have also required clusters of computers capable of running for a long time.

The most challenging problem for deep learning in NLP is exactly the problem where it has been most successful in CV: classification. Classification means assigning an item to a group, such as labeling a document or image as a dog or a cat, or as positive or negative, and so on. Many real-world problems can be framed as classification problems, which is why the success of deep learning classification on ImageNet has spawned many commercial applications. In NLP, current techniques do well at “identification”. For example, to know whether a movie review is positive or negative, what you need is “sentiment analysis”. But as the sentiment of a text becomes more ambiguous, the model struggles because there is not enough labeled data to learn from.

Transfer learning

Our goal is to solve these two problems:

In NLP problems, what should we do when we do not have large-scale data and computing resources?

Make NLP classification simple

The field that we (Jeremy Howard and Sebastian Ruder) work in addresses exactly this problem: transfer learning. Transfer learning means using a model trained to solve one problem (such as classifying ImageNet images) as the basis for solving a similar problem. A common approach is to fine-tune the original model. For example, Jeremy Howard once transferred an image classification model of this kind to CT image classification in order to detect cancer. Because the fine-tuned model does not have to learn from scratch, it can reach higher accuracy with less data and less computation time than a model trained from scratch.
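To make the idea of fine-tuning concrete, here is a deliberately tiny sketch. It is not the ULMFiT code or any real image model: a one-variable linear model stands in for the network, and the `source` and `target` datasets, the step counts, and the learning rates are all made-up values chosen for illustration. It only shows the mechanism described above: training on a small target task starts from weights learned on a related, data-rich task instead of from zero.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w, b, data, steps, lr):
    """Run plain gradient descent on the MSE, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical source task: plenty of data (stand-in for a large pre-training set).
source = [(i / 10, 2 * (i / 10) + 1.0) for i in range(11)]
# Hypothetical target task: related but not identical, only three labeled points.
target = [(0.0, 1.5), (0.5, 2.5), (1.0, 3.5)]

# "Pre-training": learn weights on the source task.
pretrained = train(0.0, 0.0, source, steps=2000, lr=0.1)

# Same tiny training budget on the target task, two different starting points.
fine_tuned = train(*pretrained, target, steps=5, lr=0.05)
from_scratch = train(0.0, 0.0, target, steps=5, lr=0.05)

print("fine-tuned loss:  ", mse(*fine_tuned, target))
print("from-scratch loss:", mse(*from_scratch, target))
```

With the same five update steps, the model that starts from the pre-trained weights ends with a lower loss on the target task than the model that starts from zero, which is the effect the paragraph above describes.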

For many years, simple transfer learning that uses only a single layer of weights has been very popular, for example Google’s word2vec embeddings. In practice, however, complete neural networks contain many layers, so applying transfer learning to only a single layer merely scratches the surface.

The question is: if we want to solve NLP problems, where should we transfer from? This question troubled Jeremy Howard for a long time, until his friend Stephen Merity announced the AWD LSTM language model, a significant improvement in language modeling. A language model is an NLP model that predicts what the next word in a sentence will be. For example, the language model built into your phone guesses which word you will type next when you write a message. This result matters because, for a language model to correctly predict what you are going to say next, it must have a great deal of knowledge and a thorough understanding of syntax, semantics, and other elements of natural language. We draw on this same ability when we read or classify text, usually without being aware of it.
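As a minimal illustration of what "predicting the next word" means, the sketch below builds a bigram model by counting which word follows which in a few sentences. This is far simpler than the AWD LSTM mentioned above and is not how ULMFiT works internally; the function names and the tiny `corpus` are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text, standing in for messages typed on a phone.
corpus = [
    "see you later",
    "see you soon",
    "see you later today",
    "thank you very much",
]
model = train_bigram_model(corpus)

print(predict_next(model, "see"))  # the word most often seen after "see"
print(predict_next(model, "you"))  # the word most often seen after "you"
```

A neural language model replaces these raw counts with learned representations, which is what lets it capture the syntax and semantics the paragraph above refers to.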

We found that using a language model in this way makes for a universal approach to NLP transfer learning:

It works regardless of document size, number of documents, and label type

It uses a single architecture and training process

It requires no custom feature engineering or pre-processing

It requires no additional related documents or labels

Making it work

High-level overview of the ULMFiT approach (using IMDb as an example)

This idea has been tried before, but it required millions of documents to achieve satisfactory performance. We found that we could get much better results by being more careful about how we fine-tune the language model. In particular, we found that the model adapts better to new datasets if its learning rate is carefully controlled and the pre-trained model is updated on the new data in a way that ensures it does not forget what it has previously learned. Excitingly, we found that the model can learn well even from limited examples. On a text classification dataset with two classes, we found that training our model on only 100 labeled examples achieved the same results as training from scratch on 10,000 labeled examples.
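The article does not spell out how the learning rate is controlled. The ULMFiT paper describes two techniques for this, slanted triangular learning rates and discriminative fine-tuning, and the sketch below follows the formulas and default values given there (`cut_frac=0.1`, `ratio=32`, a peak rate of `0.01`, and a per-layer factor of `2.6`). The function names are my own, and this is an illustrative sketch, not the authors' implementation.

```python
import math


def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at step t: a short linear warm-up, then a long linear decay.

    Assumes total_steps is large enough that the warm-up lasts at least one step.
    """
    cut = math.floor(total_steps * cut_frac)  # step at which the rate peaks
    if t < cut:
        p = t / cut  # fraction of the warm-up completed
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # fraction of the decay remaining
    return lr_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(top_lr, n_layers, decay=2.6):
    """Per-layer learning rates, lowest layer first; each lower layer gets a smaller rate."""
    return [top_lr / decay ** (n_layers - 1 - i) for i in range(n_layers)]


schedule = [slanted_triangular_lr(t, total_steps=1000) for t in range(1000)]
print("first, peak, last:", schedule[0], max(schedule), schedule[-1])
print("layer rates:", discriminative_lrs(top_lr=0.01, n_layers=3))
```

The quick rise lets the model move toward a suitable region of parameter space for the new data, and the slow decay refines it there. Giving lower layers smaller rates changes the general-purpose features less than the task-specific top layers, which is one way to avoid forgetting what was learned during pre-training.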

Another important insight is that we can use any sufficiently large and general corpus to build a universal language model, which can then be fine-tuned for any target purpose. We decided to use Stephen Merity’s WikiText 103 dataset, which contains a pre-processed subset of the English Wikipedia.

Much of the research in NLP is centered on English. Training a model for a non-English language brings its own set of difficulties. Typically, there are very few public non-English datasets, and if you want to train a text classification model for Thai, you have to collect the data yourself. Collecting non-English text data means you need to annotate it yourself or find annotators, since crowdsourcing services like Amazon’s Mechanical Turk often only have English-speaking annotators.

With ULMFiT, we can very easily train text classification models for languages other than English, and 301 languages are currently supported. To make this task easier, we will release a model zoo in the future with pre-trained models in various languages.

The future of ULMFiT

We have shown that this technique performs very well across different tasks with the same configuration. Beyond text classification, we hope that ULMFiT can help solve other important NLP problems in the future, such as sequence labeling or natural language generation.

The success of transfer learning and pre-trained ImageNet models in computer vision has now carried over to NLP. Many entrepreneurs, scientists, and engineers currently use fine-tuned ImageNet models to solve important vision problems. Now that this tool is available for language processing, we hope to see more applications in this field as well.

Although we have shown state-of-the-art results in text classification, a lot of work is still needed before NLP transfer learning reaches its full potential. In computer vision, a number of important papers have analyzed transfer learning in depth. Yosinski et al. tried to answer the question “How transferable are features in deep neural networks?”, while Huh et al. studied “What makes ImageNet good for transfer learning?”. Yosinski even created a rich visualization toolkit to help practitioners better understand the features in their computer vision models. If you use ULMFiT to solve a new problem or on a new dataset, please share your feedback in the forum!

Original title: ULMFiT, a universal language model built with transfer learning, reaches the state of the art in text classification

Source of the article: [WeChat ID: jqr_AI, WeChat public account: Lunzhi]. Please credit the source when reposting this article.
