[repost] ALIZE 3.0 – Open-source platform for speaker recognition


Anthony Larcher, Jean-Francois Bonastre and Haizhou Li

SLTC Newsletter, May 2013

ALIZE is a collaborative open-source toolkit developed for speaker recognition since 2004. The latest release (3.0) includes state-of-the-art methods such as Joint Factor Analysis, i-vector modelling and Probabilistic Linear Discriminant Analysis. The C++ multi-platform implementation of ALIZE is designed to handle the growing quantities of data required for speaker and language detection and to facilitate the development of state-of-the-art systems. This article describes the motivation behind the ALIZE open-source platform, its architecture, the collaborative community activities, and the functionality available in the 3.0 release.


Speaker and language detection systems have greatly improved over the last few decades. The performance observed in the NIST Speaker Recognition and Language Recognition Evaluations demonstrates the achievements of the scientific community in making systems more accurate and more robust to noise and channel nuisances. These advances have led to two major consequences for system development. First, automatic systems are more complex, and training their probabilistic models requires an amount of data that can only be handled through parallel computation. Second, evaluating system performance calls for enormous numbers of trials to maintain confidence in the results and provide statistical significance when dealing with low error rates. ALIZE offers a simple way to tackle these issues by providing a set of free, efficient, multi-platform tools that can be used to build a state-of-the-art speaker or language recognition system.


ALIZE was initiated by the LIA laboratory of the University of Avignon in 2004. Since then, many research institutes and companies have contributed to the project. This collaboration is facilitated by a number of tools available from the ALIZE website (http://alize.univ-avignon.fr/):

  • Online documentation, wiki and tutorials
  • Doxygen developer documentation to become familiar with the low-level API
  • A mailing list to be informed of the latest novelties and receive support from the community
  • An SVN server to download the latest version of the source code
  • A download platform to get the latest release of the toolkit

A LinkedIn group is also available to provide a means to learn about the facilities and people involved in the field of speaker recognition.


The ALIZE project consists of a low-level API (ALIZE) and a set of high-level executables that form the LIA_RAL toolkit. Together they make it possible to easily set up a speaker recognition system for research purposes as well as to develop industrial applications.

LIA_RAL is a high-level toolkit built on the low-level ALIZE API. It consists of three sets of executables: LIA_SpkSeg, LIA_Utils and LIA_SpkDet. LIA_SpkSeg and LIA_Utils respectively include executables dedicated to speaker segmentation and utility programs to handle ALIZE objects, while LIA_SpkDet fulfils the main functions of a state-of-the-art speaker recognition system, as described in the following figure.

Figure 1: General architecture of a speaker recognition system.

  • Feature mean and variance normalization
  • Energy-based speech activity detection
  • Removal of overlapping sections of speech features in two-channel recordings
  • GMM adaptation (Maximum A Posteriori [14], Maximum Likelihood Linear Regression [15])
  • Joint Factor Analysis [3]
  • Latent Factor Analysis [2]
  • SVM modelling based on LibSVM [16] and Nuisance Attribute Projection [1]
  • i-vector extraction [5] and normalization through Eigen Factor Radial [7], Length Normalization [8], Spherical Nuisance Normalization [9], Within-Class Covariance Normalization [5] and Linear Discriminant Analysis
  • Frame-by-frame scoring
  • Cosine similarity [5]
  • Mahalanobis scoring [7]
  • Two-covariance scoring [10]
  • Probabilistic Linear Discriminant Analysis (PLDA) [11] following implementations proposed in [12] and scoring functions described in [17]
  • EM-based GMM training
  • Total Variability matrix estimation [5] using minimum divergence criteria [6]
  • Probabilistic Linear Discriminant Analysis training [11]
  • Estimation of normalization meta-parameters [7,8,9]
  • T-norm [4]
  • Z-norm [13]
  • ALIZE does not include acoustic feature extraction but is compatible with SPro, HTK and RAW formats
  • Score matrices can be exported in binary format easily handled by the BOSARIS toolkit
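As an illustration of the i-vector scoring functions listed above, cosine similarity scoring [5] combined with length normalization [8] fits in a few lines of NumPy. This is an illustrative sketch, not ALIZE code, and the 3-dimensional vectors below are toy values (real i-vectors typically have a few hundred dimensions):

```python
import numpy as np

def length_normalize(w):
    """Project an i-vector onto the unit sphere (length normalization [8])."""
    return w / np.linalg.norm(w)

def cosine_score(w_enroll, w_test):
    """Cosine similarity between two i-vectors; higher suggests same speaker."""
    return float(np.dot(length_normalize(w_enroll), length_normalize(w_test)))

# Toy 3-dimensional "i-vectors" for illustration only.
enroll = np.array([1.0, 0.5, -0.2])
same   = np.array([0.9, 0.6, -0.1])   # similar direction: high score
diff   = np.array([-1.0, 0.2, 0.8])   # different direction: low score

assert cosine_score(enroll, same) > cosine_score(enroll, diff)
```

In a real system the enrolment and test i-vectors would first be compensated with the normalizations listed above (EFR, WCCN, LDA, etc.) before the cosine score is computed.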

LIA_RAL also includes a number of tools to manage objects in ALIZE format.
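Among the score normalizations listed above, Z-norm [13] and T-norm [4] both reduce to centering and scaling a raw trial score with impostor statistics; they differ only in how the impostor cohort scores are obtained. A minimal illustrative sketch (NumPy, not the ALIZE implementation; the cohort values are made up):

```python
import numpy as np

def z_norm(raw_score, impostor_scores):
    """Z-norm [13]: normalize a trial score using scores of the *target model*
    tested against a cohort of impostor utterances. T-norm [4] has the same
    form, but the cohort scores come from testing the *test utterance*
    against a set of impostor models."""
    mu = np.mean(impostor_scores)
    sigma = np.std(impostor_scores)
    return (raw_score - mu) / sigma

# Made-up impostor cohort scores for a given target model.
cohort = np.array([-1.2, -0.8, -1.0, -1.1, -0.9])
normalized = z_norm(0.5, cohort)   # a raw score well above the impostor mean
assert normalized > 0
```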


The ALIZE source code is available under the LGPL license, which imposes minimal restrictions on the redistribution of covered software.

The ALIZE software architecture is based on UML modelling and strict coding conventions in order to facilitate collaborative development and code maintenance. The platform includes a Visual Studio® solution as well as autotools support for easy compilation on UNIX-like platforms. Parallelization of the code has been implemented based on the POSIX standard library, and some executables can be linked against LAPACK for accurate and fast matrix operations. The whole toolkit has been tested under Windows, Mac OS and different Linux distributions, on both 32- and 64-bit architectures.

An open-source, cross-platform test suite enables ALIZE contributors to quickly run regression tests, increasing the reliability of future releases and making the code easier to maintain. The test cases include unit tests of the core ALIZE and the most important algorithmic classes, as well as integration tests of the high-level executable tools. Doxygen documentation is available online and can be compiled from the sources.


The ALIZE 3.0 release was partly supported by the BioSpeak project, part of the EU-funded Eurostar/Eureka program.


Subscribe to the mailing list by sending an email to: dev-alize-subscribe[AT]listes.univ-avignon.fr

The best way to contact the ALIZE maintainers is to send an email to: alize[AT]univ-avignon.fr


[1] W. Campbell, D. Sturim and D. Reynolds, “Support Vector Machines Using GMM Supervectors for Speaker Verification,” in IEEE Signal Processing Letters, 13(5), pp. 308-311, 2006

[2] D. Matrouf, N. Scheffer, B. Fauve and J.-F. Bonastre, “A straightforward and efficient implementation of the factor analysis model for speaker verification,” in Annual Conference of the International Speech Communication Association (Interspeech), pp. 1242-1245, 2007

[3] P. Kenny, G. Boulianne, P. Ouellet and P. Dumouchel, “Joint factor analysis versus eigenchannels in speaker recognition,” in IEEE Transactions on Audio, Speech, and Language Processing, 15(4), pp. 1435-1447, 2007

[4] R. Auckenthaler, M. Carey, and H. Lloyd-Thomas, “Score Normalization for Text-Independent Speaker Verification System,” in Digital Signal Processing, pp. 42-54, 2000

[5] N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, “Front-End Factor Analysis for Speaker Verification,” in IEEE Transactions on Audio, Speech, and Language Processing, 19, pp. 788-798, 2011

[6] N. Brummer, “The EM algorithm and minimum divergence,” in Agnitio Labs Technical Report, Online: http://niko.brummer.googlepages

[7] P.-M. Bousquet, D. Matrouf, and J.-F. Bonastre, “Intersession compensation and scoring methods in the i-vectors space for speaker recognition,” in Annual Conference of the International Speech Communication Association (Interspeech), pp. 485-488, 2011

[8] D. Garcia-Romero and C.Y. Espy-Wilson, “Analysis of i-vector length normalization in speaker recognition systems,” in Annual Conference of the International Speech Communication Association (Interspeech), pp. 249-252, 2011

[9] P.-M. Bousquet, A. Larcher, D. Matrouf, J.-F. Bonastre and O. Plchot, “Variance-Spectra based Normalization for I-vector Standard and Probabilistic Linear Discriminant Analysis,” in Odyssey Speaker and Language Recognition Workshop, 2012

[10] N. Brummer, and E. de Villiers, “The speaker partitioning problem,” in Odyssey Speaker and Language Recognition Workshop, 2010

[11] S.J. Prince and J.H. Elder, “Probabilistic linear discriminant analysis for inferences about identity,” in International Conference on Computer Vision, pp. 1-8, 2007

[12] Y. Jiang, K. A. Lee, Z. Tang, B. Ma, A. Larcher and H. Li, “PLDA Modeling in I-vector and Supervector Space for Speaker Verification,” in Annual Conference of the International Speech Communication Association (Interspeech), pp. 1680-1683, 2012

[13] K.-P. Li and J. E. Porter, “Normalizations and selection of speech segments for speaker recognition scoring,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, pp. 595-598, 1988

[14] J.-L. Gauvain and C.-H. Lee, “Maximum a posteriori estimation for multivariate gaussian mixture observations of Markov chains,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, pp. 291-298, 1994

[15] C. J. Leggetter and P. C. Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models,” in Computer Speech and Language, pp. 171-185, 1995

[16] C.-C. Chang and C.-J. Lin, “LIBSVM : a library for support vector machines,” in ACM Transactions on Intelligent Systems and Technology, pp. 1-27, 2011

[17] A. Larcher, K. A. Lee, B. Ma and H. Li, “Phonetically-Constrained PLDA Modelling for Text-Dependent Speaker Verification with Multiple Short-Utterances,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, 2013

Anthony Larcher is a research staff member in the Department of Human Language Technology at the Institute for Infocomm Research, Singapore. His interests are mainly in speaker and language recognition.

Jean-Francois Bonastre is an IEEE Senior Member. He is a Professor of Computer Science at the University of Avignon and Vice President of the university. He is also a member of the Institut Universitaire de France (Junior 2006). He has been President of the International Speech Communication Association (ISCA) since September 2011.

Haizhou Li is the Head of the Department of Human Language Technology at the Institute for Infocomm Research, Singapore. He is a Board Member of the International Speech Communication Association.

[repost] Deep learning research in speech recognition, plus common resources for speech processing


keywords: speech processing, speech recognition, speaker recognition, deep learning



deep learning and speech recognition


Li Deng (IEEE M’89, SM’92, F’04) received the Ph.D. degree from the University of Wisconsin-Madison. He was an assistant professor (1989-1992), tenured associate professor (1992-1996), and tenured full professor (1996-1999) at the University of Waterloo, Ontario, Canada. In 1999, he joined Microsoft Research, Redmond, WA, where he is currently Principal Researcher and Research Manager of the Deep Learning Technology Center. Since 2000, he has also been an Affiliate Full Professor and graduate committee member at the University of Washington, Seattle, teaching a graduate course on Computer Speech Processing and serving on Ph.D. thesis committees. Prior to joining Microsoft, he also worked and/or taught at the Massachusetts Institute of Technology, ATR Interpreting Telecom. Research Lab. (Kyoto, Japan), and HKUST. He has been granted over 60 US and international patents in acoustics/audio, speech/language technology, and machine learning. He has received numerous awards and honors from the IEEE, ISCA, ASA, Microsoft, and other organizations.

http://research.microsoft.com/pubs/217165/ICASSP_DeepTextLearning_v07.pdf Deep Learning for Natural Language Processing and Related Applications, Microsoft

http://www.cs.toronto.edu/~ndjaitly/techrep.pdf Application of Pretrained Deep Neural Networks to Large Vocabulary Conversational Speech Recognition (2012) interspeech (work at Google)

http://research.microsoft.com/pubs/189008/tasl-deng-2244083-x_2.pdf Li Deng, Xiao Li, Machine Learning Paradigms for Speech Recognition: An Overview


http://www.cs.toronto.edu/~hinton/ Geoffrey Everest Hinton FRS (born 6 December 1947) is a British-born computer scientist and psychologist, most noted for his work on artificial neural networks. He now works partly for Google. He is the co-inventor of the backpropagation and contrastive divergence training algorithms and is an important figure in the deep learning movement.

http://research.google.com/pubs/VincentVanhoucke.html Vincent Vanhoucke is a Research Scientist at Google. He is a technical lead and manager in Google’s deep learning infrastructure team. Prior to that, he led the speech recognition quality effort for Google Search by Voice. He holds a Ph.D. in Electrical Engineering from Stanford University and a Diplôme d’Ingénieur from the Ecole Centrale Paris.

http://psych.stanford.edu/~jlm/pdfs/Hinton12IEEE_SignalProcessingMagazine.pdf Deep Neural Networks for Acoustic Modeling in Speech Recognition (2012) IEEE Signal Processing Magazine

http://research.google.com/pubs/SpeechProcessing.html Google Speech processing

other research groups

http://mi.eng.cam.ac.uk/Main/Speech/ Cambridge University

  • Reply to @黄浩XJU: thanks for the correction. Cambridge's work is very comprehensive; currently Phil Woodland ( http://t.cn/RP8YGTX ) has a Chinese student, 张超 (Chao Zhang), doing deep learning research

http://www.speech.cs.cmu.edu/ CMU

http://www.speech.sri.com/ SRI

http://www.clsp.jhu.edu/people/ Center for Language and Speech Processing at Johns Hopkins University

speech processing resources

tools and open source tools


  • Quite a lot of software leverages the Google Speech API to provide online speech-to-text on mobile devices.

http://www.signalprocessingsociety.org/technical-committees/list/sl-tc/spl-nl/2013-05/ALIZE/ ALIZE 3.0 – Open-source platform for speaker recognition


http://kaldi.sourceforge.net/about.html Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.









http://www.technologyreview.com/news/427793/where-speech-recognition-is-going/ Where Speech Recognition Is Going

http://technav.ieee.org/tag/1597/speaker-recognition 48 resources related to Speaker Recognition

http://www.emory.edu/BUSINESS/speech/SpeechRecCase.pdf nuance white paper, business use cases


Popular speech recognition conferences held each year or two include SpeechTEK and SpeechTEK Europe, ICASSP, Interspeech/Eurospeech, and the IEEE ASRU. Conferences in the field of natural language processing, such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. Important journals include the IEEE Transactions on Speech and Audio Processing (now named IEEE Transactions on Audio, Speech and Language Processing), Computer Speech and Language, and Speech Communication.

http://www.interspeech2014.org/public.php?page=tutorial.html tutorial of interspeech 2014

http://www.icassp2014.org/tutorials.html icassp 2014

http://www.speechtek.com/2014/ SpeechTEK

http://www.asru2013.org/ ASRU

http://www.iscslp2014.org/public.php?page=keynote.html ISCSLP@INTERSPEECH 2014 – The 9th International Symposium on Chinese Spoken Language Processing


@血色又残阳 asks: looking for speech processing materials. Requirements: 1. papers ideally with companion code that runs; 2. what are the latest or mainstream techniques in academia and industry; 3. anything combining speech with deep learning; 4. ideally also papers and code on speaker recognition. https://github.com/memect/hao/issues/50

yongsun asks: is there any open-source or free English speech recognition software or project? I plan to translate some ice-hockey instructional videos and want to use the recognition output to help with the transcription and translation. https://github.com/memect/hao/issues/53

[repost] Dictation – Speech Recognition in the Browser


Meet Dictation v2.0, a web-based speech recognition app that will transcribe your voice into digital text using the Chrome Speech API. You can also install Dictation as a Chrome App.

Unlike the regular Chrome web apps that are nothing but fancy bookmarks, the Dictation App for Chrome will run entirely on your computer.

Dictation for Chrome

Dictation Gets New Voice Commands & Auto-Save

Getting started with Dictation is simple. Just plug a microphone into your computer, click the Start Dictation button and watch as your spoken words are magically transformed into text. You can also use a few voice commands:

  • Say “new sentence” to begin a new sentence. Dictation will automatically add a period to the previous sentence and capitalize the first letter of your new sentence.
  • Say “new paragraph” to move the cursor to the next paragraph.
  • Say “stop listening” to exit the dictation mode. If you wish to resume recording, hit the “Start” button again.

If you make a mistake, or if Chrome makes an error while recognizing your speech, simply click the incorrect word and edit it inline. The entire notepad is editable, as it uses the contenteditable attribute of HTML5.

Dictation 2.0 – What’s New

The first release of Dictation happened in August 2012 and much has changed since then. The Web Speech API is now part of Google Chrome though you still need an active network connection for Chrome to connect to the speech recognition servers.

The new version of the Dictation app sports a few extra features. First, it will auto-save your work in Chrome’s local storage, so you can close the browser and your session will still be available the next time you open Dictation.

Also, you can now export your transcriptions to Dropbox and Google Drive from Dictation itself. Please watch this YouTube video for a quick demo.

The Speech-to-Text support in Chrome is mostly accurate but because the API is still experimental, you are not allowed to submit Packaged Apps to the Chrome store that use the Web Speech API.

Related: Add Speech Recognition to your Website

[repost] Speaker Recognition


Conferences related to Speaker Recognition

ICASSP 2014 – 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

ICASSP 2014 will be the world’s largest and most comprehensive technical conference focused on the many facets of signal processing and its applications. The conference will feature world-class speakers, tutorials, exhibits, and oral/poster sessions on the most up-to-date topics in signal processing research.

2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

The ASRU workshop meets every two years and has a tradition of bringing together researchers from academia and industry in an intimate and collegial setting to discuss problems of common interest in automatic speech recognition and understanding.

2013 International Carnahan Conference on Security Technology (ICCST)

This international conference is a forum for all aspects of physical, cyber and electronic security research, development, systems engineering, testing, evaluation, operations and sustainability. The ICCST facilitates the exchange of ideas and information.

2013 National Conference on Communications (NCC)

Original contributions in the fields of communications, networking, and signal processing, based on theoretical, experimental, design, development, simulation, application, test, measurement and similar studies are solicited for presentation at NCC-2013.

2012 11th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA)

ISSPA 2012 is the eleventh event in the series of conferences, which since 1985 has brought together leading researchers and practitioners from academia and industry engaged in research and development related to signal processing theory and applications. In 2007, ISSPA extended its coverage to include the complementary field of Information Sciences.

More Conferences

Periodicals related to Speaker Recognition


Audio, Speech, and Language Processing, IEEE Transactions on

Speech analysis, synthesis, coding, speech recognition, speaker recognition, language modeling, speech production and perception, speech enhancement. In audio: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. The scope of the transactions includes SPEECH PROCESSING – transmission and storage of speech signals; speech coding; speech enhancement and noise reduction; …

Xplore Articles related to Speaker Recognition


Text-dependent speaker recognition using speaker specific compensation

Laxman, S.; Sastry, P.S. TENCON 2003. Conference on Convergent Technologies for the Asia-Pacific Region, 2003

This paper proposes a new method for text-dependent speaker recognition. The scheme is based on learning (what we refer to as) speaker-specific compensators for each speaker in the system. The compensator is essentially a speaker to speaker transformation which enables the recognition of the speech of one speaker through a speaker-dependent speech recognition system built for the other. Such a …

Analysis of effect of compensation parameter estimation for CMN on speech/speaker recognition

Longbiao Wang; Kitaoka, N.; Nakagawa, S. Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on, 2007

In a distant environment, channel distortion may drastically degrade speech recognition and speaker recognition performances. In this paper, we provide the analysis of effect of compensation parameter estimation for Cepstral Mean Normalization (CMN) on speech/speaker recognition. We first investigate the differences between the intra-speaker variation and the inter-speaker variation by analyzing the cepstrum distances of Japanese vowels. It is indicated …

Multi-speaker adaptation for robust speech recognition under ubiquitous environment

Po-Yi Shih; Jhing-Fa Wang; Yuan-Ning Lin; Zhong-Hua Fu Speech Database and Assessments, 2009 Oriental COCOSDA International Conference on, 2009

This paper presents a multi-speaker adaptation for robust speech recognition under ubiquitous environment. The goal is to adapt the speech recognition model for each speaker correctly in ubiquitous multi-speaker environment. We integrate speaker recognition and unsupervised speaker adaptation method to promote the speech recognition performances. Specifically we employ a confidence measure to reduce the possible negative adaptation caused by the …

Partially Supervised Speaker Clustering

Hao Tang; Chu, S.M.; Hasegawa-Johnson, M.; Huang, T.S. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2012

Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in …

A Word-Dependent Automatic Arabic Speaker Identification System

Al-Dahri, S.S.; Al-Jassar, Y.H.; Alotaibi, Y.A.; Alsulaiman, M.M.; Abdullah-Al-Mamun, K. Signal Processing and Information Technology, 2008. ISSPIT 2008. IEEE International Symposium on, 2008

Automatic speaker recognition is one of the difficult tasks in the field of computer speech and speaker recognition. Speaker recognition is a biometric process of automatically recognizing who is speaking on the basis of speaker dependent features of the speech signal. Currently, speaker recognition system is an important need for authenticating the personal like other biometrics such as finger prints …

More Xplore Articles

Educational Resources on Speaker Recognition



Biometrics for Recognition at a Distance

Sarkar, Sudeep Biometrics for Recognition at a Distance, 2010

It has been folklore that humans can identify others at a distance from their biological movement. This observation was somewhat bolstered by experiments with point-light displays by human perception researchers in the 70s, and it has been confirmed by recent human perception experiments. However, it is only recently that computer-vision-based gait biometrics has received much attention. Recent …

Voice: Technologies and Algorithms for Biometrics Applications

Beigi, Homayoon Voice: Technologies and Algorithms for Biometrics Applications, 2010

This tutorial provides an in-depth look at speaker recognition, a technique that uses the vocal characteristics of an individual’s voice to identify and verify that person. Different forms and modalities of speaker recognition will be discussed in this tutorial. Regardless of the form and modality, ultimately, speaker recognition …


No IEEE-USA E-Books are currently tagged “Speaker Recognition”

[repost] Introductory and survey resources for deep learning


contributors: @自觉自愿来看老婆微博 @邓侃 @星空下的巫师

created: 2014-09-16


http://en.wikipedia.org/wiki/Deep_learning Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using model architectures composed of multiple non-linear transformations.


http://cacm.acm.org/magazines/2013/6/164601-deep-learning-comes-of-age/abstract Deep Learning Comes of Age

http://www.datarobot.com/blog/a-primer-on-deep-learning/ A Primer on Deep Learning (2014)



http://deeplearning.net/tutorial/ Deep Learning Tutorials

http://neuralnetworksanddeeplearning.com/index.html Michael Nielsen (2014) – explains the concepts in great detail

  • also recommended by @自觉自愿来看老婆微博

@邓侃’s Deep Learning series

http://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ckdqtpe Berkeley professor Michael I. Jordan on deep learning, with reading notes: 1. layers, parallelism and ensembles are useful, but don't limit yourself to imitating human thinking; 2. backpropagation is the key, and it is essentially supervised learning; 3. many of the success stories are large-scale data plus supervised learning; 4. he rarely uses it in industrial consulting, where there are plenty of other problems (7 examples); 5. machine learning is more than AI, and it also needs to move closer to systems and databases



http://research.microsoft.com/pubs/204048/APSIPA-Trans2013-revised-final.pdf Li Deng, A Tutorial Survey of Architectures, Algorithms, and Applications for Deep Learning , in APSIPA Transactions on Signal and Information Processing, Cambridge University Press, 2014

Text NLP

http://nlp.stanford.edu/courses/NAACL2013/ Deep Learning for Natural Language Processing (without Magic)

  • natural language processing (NLP), mainly text

Speech NLP

http://research.microsoft.com/pubs/217165/ICASSP_DeepTextLearning_v07.pdf Deep learning for natural language processing and related applications (Tutorial at ICASSP)

  • Xiaodong He, Jianfeng Gao, and Li Deng
  • natural language processing (NLP), mainly speech but also text
  • spoken language understanding (SLU), machine translation (MT), and semantic information retrieval (IR) from text.

Computer Vision

https://sites.google.com/site/deeplearningcvpr2014/ TUTORIAL ON DEEP LEARNING FOR VISION

Yann LeCun’s Lecture on Computer Perception with Deep Learning in Course 9.S912: “Vision and learning – computers and brains”, Nov 12, 2013:






MATLAB deep learning toolbox

[repost] An incomplete compilation of introductory machine learning resources


2014-10-14 edition, compiled and edited by 好东西传送门; original link: http://ml.memect.com/article/machine-learning-guide.html

Thanks to contributor: tang_Kaka_back @新浪微博

Additions and corrections are welcome; please keep the original author and link when reposting. This article is a topic collection from the Machine Learning Daily (机器学习日报); to subscribe, send an email to hao@memect.com with the subject "订阅机器学习日报" (subscribe to Machine Learning Daily).



Machine learning: "Machine learning is a multidisciplinary field that has emerged over the last 20-odd years, drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory and more. Machine learning theory is mainly concerned with designing and analyzing algorithms that let computers 'learn' automatically. Machine learning algorithms are a class of algorithms that automatically discover regularities in data and use those regularities to make predictions on unseen data. Because these algorithms involve a great deal of statistical theory, machine learning is especially closely tied to statistical inference and is also known as statistical learning theory. On the algorithm-design side, machine learning theory focuses on learning algorithms that are feasible and effective." – from Wikipedia

How do you explain Machine Learning and Data Mining to non Computer Science people? @quora by Pararth Shah; Chinese version: 如何向小白介绍何谓机器学习和数据挖掘?买回芒果他就懂了 @36kr. This echoes the definition above: machine learning discovers statistical regularities in observations and then uses those regularities to predict. When a cartload of fruit is all mixed together, supervised learning can use the few apple samples you provide to separate all the apples from the pears and mangoes; unsupervised learning can, given known features and no labeled samples, automatically sort similar fruit into piles (perhaps red fruit and yellow fruit, or big apples and small apples, …); association rule learning helps you discover rule-based regularities, e.g. that small green apples tend to be sour.
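The fruit analogy maps directly onto code. Below is a toy NumPy sketch; the feature values and the `classify` helper are made up for this example, not taken from any of the resources listed here:

```python
import numpy as np

# Hypothetical features for each fruit: [weight in grams, redness 0-1]
fruits = np.array([[150.0, 0.90], [160.0, 0.85],   # apples
                   [180.0, 0.30], [170.0, 0.25]])  # pears

# Supervised learning: a couple of labeled samples are enough to sort the rest.
prototypes = {"apple": np.array([155.0, 0.88]), "pear": np.array([175.0, 0.28])}

def classify(x):
    """Nearest-prototype classifier: assign the label of the closest sample."""
    return min(prototypes, key=lambda label: np.linalg.norm(x - prototypes[label]))

assert classify(np.array([152.0, 0.92])) == "apple"
assert classify(np.array([178.0, 0.20])) == "pear"

# Unsupervised learning: with no labels at all, similar fruits can still be
# grouped by their features (here, each group's mean feature vector).
apple_center, pear_center = fruits[:2].mean(axis=0), fruits[2:].mean(axis=0)
```

A real pipeline would use a library classifier or clustering algorithm instead of this hand-rolled nearest-prototype rule, but the supervised/unsupervised distinction is the same.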


Figure 1: A machine learning example: the NLTK supervised-learning workflow (source: http://www.nltk.org/book/ch06.html)

Figure 2: A machine learning overview diagram by Yaser Abu-Mostafa (Caltech) (source: http://work.caltech.edu/library/181.html)

Figure 3: Machine learning in practice: choosing a machine learning algorithm in Python scikit-learn, by Nishant Chandra (source: http://n-chandra.blogspot.com/2013/01/picking-machine-learning-algorithm.html)

Figure 4: Machine learning and related disciplines: a data-science "subway map" by Swami Chandrasekaran (source: http://nirvacana.com/thoughts/becoming-a-data-scientist/)


These fall roughly into three categories: getting-started reflections, hands-on notes, and expert reading guides.



Tom Mitchell's and Andrew Ng's courses are both well suited for getting started.


2011: Tom Mitchell (CMU), Machine Learning

Original English videos and slides (PDF). His textbook Machine Learning is used as the text in many courses and has a Chinese translation.

  • Decision Trees
  • Probability and Estimation
  • Naive Bayes
  • Logistic Regression
  • Linear Regression
  • Practical Issues: Feature selection,Overfitting …
  • Graphical models: Bayes networks, EM,Mixture of Gaussians clustering …
  • Computational Learning Theory: PAC Learning, Mistake bounds …
  • Semi-Supervised Learning
  • Hidden Markov Models
  • Neural Networks
  • Learning Representations: PCA, Deep belief networks, ICA, CCA …
  • Kernel Methods and SVM
  • Active Learning
  • Reinforcement Learning

(The above is an excerpt of the lecture topics.)

2014: Andrew Ng (Stanford), Machine Learning

Original English videos; discussion at 果壳. This course is designed for self-study; it is free and offers a course certificate. "The lectures are accessible, so don't worry too much about the math. The assignments are also very beginner-friendly: the program skeletons are all provided, with assignment guides, and you just fill in the required parts." (see 白马's getting-started guide) "Recommended: enroll, follow the lectures, and do the homework and the final exam (just watching without doing teaches you nothing)." (see reyoung's advice)

  1. Introduction (Week 1)
  2. Linear Regression with One Variable (Week 1)
  3. Linear Algebra Review (Week 1, Optional)
  4. Linear Regression with Multiple Variables (Week 2)
  5. Octave Tutorial (Week 2)
  6. Logistic Regression (Week 3)
  7. Regularization (Week 3)
  8. Neural Networks: Representation (Week 4)
  9. Neural Networks: Learning (Week 5)
  10. Advice for Applying Machine Learning (Week 6)
  11. Machine Learning System Design (Week 6)
  12. Support Vector Machines (Week 7)
  13. Clustering (Week 8)
  14. Dimensionality Reduction (Week 8)
  15. Anomaly Detection (Week 9)
  16. Recommender Systems (Week 9)
  17. Large Scale Machine Learning (Week 10)
  18. Application Example: Photo OCR
  19. Conclusion


2013: Yaser Abu-Mostafa (Caltech), Learning from Data – better suited to intermediate learners. Course videos and slides (PDF) @Caltech

  1. The Learning Problem
  2. Is Learning Feasible?
  3. The Linear Model I
  4. Error and Noise
  5. Training versus Testing
  6. Theory of Generalization
  7. The VC Dimension
  8. Bias-Variance Tradeoff
  9. The Linear Model II
  10. Neural Networks
  11. Overfitting
  12. Regularization
  13. Validation
  14. Support Vector Machines
  15. Kernel Methods
  16. Radial Basis Functions
  17. Three Learning Principles
  18. Epilogue

2014: 林軒田 (Hsuan-Tien Lin, National Taiwan University), 機器學習基石 (Machine Learning Foundations) – better suited to intermediate learners; taught in Chinese. Course homepage

When Can Machines Learn? [何時可以使用機器學習] The Learning Problem [機器學習問題] — Learning to Answer Yes/No [二元分類] — Types of Learning [各式機器學習問題] — Feasibility of Learning [機器學習的可行性]

Why Can Machines Learn? [為什麼機器可以學習] — Training versus Testing [訓練與測試] — Theory of Generalization [舉一反三的一般化理論] — The VC Dimension [VC 維度] — Noise and Error [雜訊一錯誤]

How Can Machines Learn? [機器可以怎麼樣學習] — Linear Regression [線性迴歸] — Linear `Soft’ Classification [軟性的線性分類] — Linear Classification beyond Yes/No [二元分類以外的分類問題] — Nonlinear Transformation [非線性轉換]

How Can Machines Learn Better? [機器可以怎麼樣學得更好] — Hazard of Overfitting [過度訓練的危險] — Preventing Overfitting I: Regularization [避免過度訓練一:控制調適] — Preventing Overfitting II: Validation [避免過度訓練二:自我檢測] — Three Learning Principles [三個機器學習的重要原則]


2008: Andrew Ng, CS229 Machine Learning – these videos are a few years old and the lecturer has since become a big name, but the fundamental methods have not changed much, and the downloadable slides (PDF) are a plus. Videos with Chinese subtitles @网易公开课 | English videos @youtube | slides (PDF) @Stanford

  1. Motivation and applications of machine learning
  2. Supervised learning applications; gradient descent
  3. Underfitting and overfitting
  4. Newton's method
  5. Generative learning algorithms
  6. Naive Bayes
  7. Optimal margin classifiers
  8. Sequential minimal optimization (SMO)
  9. Empirical risk minimization
  10. Feature selection
  11. Bayesian statistics and regularization
  12. The k-means algorithm
  13. Gaussian mixture models
  14. Principal component analysis
  15. Singular value decomposition
  16. Markov decision processes
  17. Discretization and the curse of dimensionality
  18. Linear quadratic regulation (LQR)
  19. Differential dynamic programming
  20. Policy search

2012: 余凯 (Kai Yu, Baidu) and 张潼 (Tong Zhang, Rutgers), open machine learning course – better suited to intermediate learners. Course homepage @百度文库; slides (PDF) @龙星计划

  1. Introduction to ML and review of linear algebra, probability, statistics (Kai)
  2. Linear model (Tong)
  3. Overfitting and regularization (Tong)
  4. Linear classification (Kai)
  5. Basis expansion and kernel methods (Kai)
  6. Model selection and evaluation (Kai)
  7. Model combination (Tong)
  8. Boosting and bagging (Tong)
  9. Overview of learning theory (Tong)
  10. Optimization in machine learning (Tong)
  11. Online learning (Tong)
  12. Sparsity models (Tong)
  13. Introduction to graphical models (Kai)
  14. Structured learning (Kai)
  15. Feature learning and deep learning (Kai)
  16. Transfer learning and semi-supervised learning (Kai)
  17. Matrix factorization and recommendations (Kai)
  18. Learning on images (Kai)
  19. Learning on the web (Tong)



http://www.52ml.net/ 我爱机器学习 (I Love Machine Learning)

http://www.mitbbs.com/bbsdoc/DataSciences.html MITBBS – Computers & Networks – Data Science board

http://www.guokr.com/group/262/ 果壳 (Guokr) machine learning group

http://cos.name/cn/forum/22 统计之都 (Capital of Statistics) » Statistics World » Data Mining and Machine Learning

http://bbs.byr.cn/#!board/ML_DM 北邮人论坛 (BYR forum) » Academic & Tech » Machine Learning and Data Mining


https://github.com/josephmisiti/awesome-machine-learning a comprehensive collection of machine learning resources

http://work.caltech.edu/library/ Caltech machine learning video library, one video per topic

http://www.kdnuggets.com/ a well-known data mining site

http://www.datasciencecentral.com/ the Data Science Central site




  • Machine learning focuses on prediction, based on known properties learned from the training data
  • Data mining focuses on the discovery of previously unknown properties in the data

Dan Levin, What is the difference between statistics, machine learning, AI and data mining?

  • If there are up to 3 variables, it is statistics.
  • If the problem is NP-complete, it is machine learning.
  • If the problem is PSPACE-complete, it is AI.
  • If you don’t know what is PSPACE-complete, it is data mining.

Several high-level overviews of the machine learning field; see the original article.


  • Machine Learning in Action by Peter Harrington; Chinese edition 机器学习实战 @豆瓣 — "This book shows you that the much-hyped classification algorithms are surprisingly simple to implement; that seemingly deep mathematical theory can often be captured in a single sentence; and that every complex thing starts from a very simple idea." From Kord's review @豆瓣
  • Dr. 李航's book 统计学习方法 (Statistical Learning Methods) @豆瓣 — "First of all, this is a good book. If I knew nothing, a dense, traditional textbook like this would probably have made me hate machine learning (personal opinion). But as a reference it is excellent: fairly authoritative on the one hand, and concise on the other, speaking through formulas and logic without much informal explanation; far more compact than books like PRML, with its own appeal and audience." From chentingpc's review @豆瓣
  • Classic machine learning books @算法组, by the 算法组 group

[repost] MLSS Machine Learning Summer Schools


(forked from http://www.mlss.cc/, with more links added to the list)


  • Especially recommended: the 2009 UK MLSS. All slides are available as a ZIP download (51 MB). @bigiceberg recommends it: "the 2009 UK MLSS is the most classic."

Future (8)

  • MLSS Spain (Fernando Perez-Cruz), late spring 2016 (tentative)
  • MLSS London (tentative)
  • MLSS Tübingen, summer 2017 (tentative)
  • MLSS Africa (very tentative)
  • MLSS Kyoto (Marco Cuturi, Masashi Sugiyama, Akihiro Yamamoto), August 31 – September 11 (tentative), 2015
  • MLSS Tübingen (Michael Hirsch, Philipp Hennig, Bernhard Schölkopf), July 13-24, 2015
  • MLSS Sydney (Edwin Bonilla, Yang Wang, Bob Williamson), 16 – 25 February, 2015, http://www.nicta.com.au/research/machine_learning/mlss2015
  • MLSS Austin (Peter Stone, Pradeep Ravikumar), January 7-16, 2015 http://www.cs.utexas.edu/mlss/

Past (25)