JCP 2013 Vol.8(3): 638-644 ISSN: 1796-203X
doi: 10.4304/jcp.8.3.638-644
Speaker Change Detection based on Mean Shift
Ji-chen Yang, Qian-hua He, Yan-xiong Li, and Xue-yuan Zhang
School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
Abstract—To address the problem that the search for speaker change points (SCPs) is blind and exhaustive, this paper proposes using mean shift to seek SCPs by estimating the kernel density of the speech stream. The method has three steps: first, peak points are found using mean shift; second, the maximum likelihood ratio (MLR) value is computed at each peak point; third, SCPs are selected from the MLR values using the maximum method. The relationship between MLR and BIC is then given. Compared with metric-based or model-based methods, the search for SCPs is no longer blind, because the mean shift vector always points in the direction of maximum increase in the density. Experiments show that the proposed algorithm achieves results comparable to those of BIC and DISTBIC while reducing detection time: for a 3-second speech segment, the proposed algorithm takes about 60% of the time of DISTBIC and 45% of the time of BIC. Further investigation and improvement of this method are discussed at the end of the paper.
Index Terms—Speaker change detection, mean shift, kernel density estimation, peak point, maximum likelihood ratio
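The peak-seeking step can be illustrated with a minimal sketch of one-dimensional mean shift. This is not the authors' implementation: the abstract does not state the kernel, bandwidth, or feature representation, so the Gaussian kernel, the `bandwidth` value, and the function names `mean_shift_mode` and `find_peaks` are all assumptions made for illustration. Each iteration moves the current estimate to the kernel-weighted mean of the samples, which shifts it uphill on the estimated density until it settles on a peak.

```python
import math


def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Climb from `start` to a nearby peak of a Gaussian kernel density estimate.

    Hypothetical helper: kernel and bandwidth are illustrative assumptions.
    """
    x = float(start)
    for _ in range(max_iter):
        # Gaussian kernel weight of every sample relative to the current estimate.
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in points]
        total = sum(weights)
        if total == 0.0:
            break  # all weights underflowed; no usable density information here
        # The kernel-weighted mean is the mean shift update.
        new_x = sum(w * p for w, p in zip(weights, points)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def find_peaks(points, bandwidth=1.0, merge_tol=1e-2):
    """Run mean shift from every sample and keep the distinct peaks found."""
    peaks = []
    for p in points:
        mode = mean_shift_mode(points, p, bandwidth)
        # Treat modes closer than merge_tol as the same peak.
        if all(abs(mode - q) > merge_tol for q in peaks):
            peaks.append(mode)
    return sorted(peaks)


if __name__ == "__main__":
    # Toy data with two clusters; real input would be derived from speech features.
    samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
    print(find_peaks(samples, bandwidth=0.5))
```

In the pipeline described in the abstract, the peak points found this way would then be scored with MLR and the SCPs selected from those scores; those two steps are not shown here because the abstract gives no formula for them.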
Cite: Ji-chen Yang, Qian-hua He, Yan-xiong Li, and Xue-yuan Zhang, "Speaker Change Detection based on Mean Shift," Journal of Computers, vol. 8, no. 3, pp. 638-644, 2013.
General Information
ISSN: 1796-203X
Abbreviated Title: J.Comput.
Frequency: Bimonthly
Editor-in-Chief: Prof. Liansheng Tan
Executive Editor: Ms. Nina Lee
Abstracting/Indexing: DBLP, EBSCO, ProQuest, INSPEC, ULRICH's Periodicals Directory, WorldCat, etc.
E-mail: jcp@iap.org