求知若飢大智若愚 (Hungry for Knowledge, Wise Enough to Seem Foolish): 2008

Saturday, December 20, 2008

Clustering thousands of songs?

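I have been working on my own research lately, and it requires clustering several thousand songs.
The input data is a set of mp3 clips (about 30 seconds each); extracting the IMDCT coefficients yields 38 frames of IMDCT values per second.
For now the clustering is done at the frame level.
The clustering algorithm is CAST.
A rough estimate of the memory usage: 3k songs * 30 s * 38 frames/s * dimensions = 3.42M * dimensions.
Building a distance matrix takes n^2 / 2 entries of memory, where n is the dimension of the matrix (the number of frames); for a few thousand songs, n = 3.42M.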
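To make the estimate concrete, here is a small sketch of the arithmetic in Java. The class and method names are made up for illustration, the 4 bytes per distance (one float) is an assumption that the post does not state, and the pair count uses the strict upper triangle n(n-1)/2, which is essentially the n^2 / 2 above.

```java
// Back-of-the-envelope size of the frame-level distance matrix.
class DistanceMatrixEstimate {
    /** Total number of frames: songs * seconds per song * frames per second. */
    static long frameCount(long songs, long secondsPerSong, long framesPerSecond) {
        return songs * secondsPerSong * framesPerSecond;
    }

    /** Entries in the strict upper triangle of an n x n symmetric matrix. */
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    /** Storage in terabytes (10^12 bytes) for the given number of entries. */
    static double terabytes(long entries, int bytesPerEntry) {
        return entries * (double) bytesPerEntry / 1e12;
    }

    public static void main(String[] args) {
        long n = frameCount(3_000, 30, 38);
        long pairs = pairCount(n);
        System.out.println("frames n       = " + n);
        System.out.println("distance pairs = " + pairs);
        // ASSUMPTION: each distance is stored as a 4-byte float.
        System.out.printf("size           = %.1f TB%n", terabytes(pairs, 4));
    }
}
```

Whatever the exact entry size turns out to be, the result is several orders of magnitude beyond the RAM of a single machine.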
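That is far more than the physical memory of the current machine can hold, so the distance matrix is stored on disk; at the moment one block is about 64M.
When a similarity value is not in the cache buffer, the buffer is refreshed with the block that contains it, using fully buffered I/O.
A more advanced approach would imitate an OS page-replacement algorithm to decide which page gets swapped out, which is actually quite fun.
The only problem is that the clustering takes a really long time....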
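As an illustration of the page-replacement idea, here is a minimal sketch of a least-recently-used (LRU) block cache in Java. LRU is just one possible policy and not necessarily the one I will end up using; the class name, the `loader` callback (a stand-in for reading one block of the distance matrix from disk) and the tiny capacity are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Keeps at most maxBlocks blocks in memory and evicts the least recently used one.
class LruBlockCache {
    private final int maxBlocks;
    private final IntFunction<float[]> loader;   // hypothetical: reads block #id from disk
    private final LinkedHashMap<Integer, float[]> blocks;
    private int faults = 0;

    LruBlockCache(int maxBlocks, IntFunction<float[]> loader) {
        this.maxBlocks = maxBlocks;
        this.loader = loader;
        // accessOrder = true: iteration order runs from least to most recently used
        this.blocks = new LinkedHashMap<Integer, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, float[]> eldest) {
                return size() > LruBlockCache.this.maxBlocks;
            }
        };
    }

    /** Returns the block, loading it (a "page fault") when it is not cached. */
    float[] get(int blockId) {
        float[] block = blocks.get(blockId);     // a hit also marks the block as recently used
        if (block == null) {
            faults++;
            block = loader.apply(blockId);
            blocks.put(blockId, block);          // may evict the least recently used block
        }
        return block;
    }

    boolean isCached(int blockId) { return blocks.containsKey(blockId); }

    int faults() { return faults; }

    public static void main(String[] args) {
        LruBlockCache cache = new LruBlockCache(2, id -> new float[] { id });
        int[] accesses = { 0, 1, 0, 2 };
        for (int id : accesses) {
            cache.get(id);
        }
        System.out.println("faults = " + cache.faults());
        System.out.println("block 1 still cached? " + cache.isCached(1));
    }
}
```

With room for two blocks, the access sequence 0, 1, 0, 2 faults three times and evicts block 1, because block 0 was touched more recently.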

Saturday, November 8, 2008

Crawling data from Amazon.com

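My research needs music data, so I have been using a web service provided by Amazon (AWS).
The first step is of course to register a userid; after that, Amazon's programmer guide explains how to fetch the data.
A few things turned out to be interesting, though. I need the actual content of the music, and Amazon only offers samples of roughly 30 seconds. I crawled the sample tracks anyway... Orz (I am not sure whether this infringes copyright, but it is purely for research, not for commercial use).
The interesting part is that finding the original url of an mp3 is slow (1. fetching the html from Amazon is slow in itself; 2. parsing the html to find the correct mp3 location is slow too), so I imitated the HTTP 1.1 approach... XD
The system (the daemon thread) starts with 10 runnable threads by default, and whenever mp3 data has to be fetched it wakes one of them (called an mp3 thread). An mp3 thread does only one thing: it gets the url of the mp3, downloads the mp3 as a stream, and then goes back to sleep. If the daemon thread finds that all 10 mp3 threads are busy, it waits until one of them becomes available. The idea is simple, but it is a very basic synchronization problem in Java.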
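The scheme can be sketched roughly as follows. This is not my crawler code, only a minimal illustration built on `java.util.concurrent`: a fixed pool of workers plus a semaphore that makes the dispatcher wait while every worker is busy. The class name is made up, and the `Thread.sleep` call stands in for resolving the mp3 url and streaming the file.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// A fixed number of download workers; submit() blocks while all of them are busy.
class Mp3WorkerPool {
    private final Semaphore idleWorkers;
    private final ExecutorService workers;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    Mp3WorkerPool(int size) {
        idleWorkers = new Semaphore(size);
        workers = Executors.newFixedThreadPool(size);
    }

    /** Called by the dispatcher (the "daemon thread"); waits for an idle worker. */
    void submit(Runnable download) {
        idleWorkers.acquireUninterruptibly();
        try {
            workers.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    download.run();
                } finally {
                    running.decrementAndGet();
                    idleWorkers.release();       // the worker "goes back to sleep"
                }
            });
        } catch (RuntimeException e) {
            idleWorkers.release();               // the task never started
            throw e;
        }
    }

    /** Stops accepting work and waits up to one minute for running downloads. */
    boolean shutdownAndWait() {
        workers.shutdown();
        try {
            return workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int peakConcurrency() { return peak.get(); }

    public static void main(String[] args) {
        Mp3WorkerPool pool = new Mp3WorkerPool(10);
        for (int i = 0; i < 25; i++) {
            pool.submit(() -> {
                // Stand-in for "find the mp3 url and stream it to disk".
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdownAndWait();
        System.out.println("peak concurrent downloads: " + pool.peakConcurrency());
    }
}
```

The semaphore is what keeps the dispatcher from queueing work faster than the workers can take it; without it, a fixed thread pool would simply accumulate pending tasks.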
With too many mp3 threads, though, Amazon rejects the requests, because too many of them arrive within too short a time... Orz
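One way to stay under such a limit is to space the requests out. The sketch below enforces a minimum interval between request start times across all threads; the class name is made up, and the interval is a guess, since I do not know what rate Amazon actually tolerates.

```java
import java.util.concurrent.TimeUnit;

// Hands out request slots that are at least minInterval apart.
class RequestThrottle {
    private final long minIntervalNanos;
    private long nextAllowedNanos = Long.MIN_VALUE;   // no request issued yet

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis);
    }

    /** Reserves the next slot and returns how many nanoseconds the caller must wait. */
    synchronized long reserve(long nowNanos) {
        long start = Math.max(nowNanos, nextAllowedNanos);
        nextAllowedNanos = start + minIntervalNanos;
        return start - nowNanos;
    }

    /** Blocks the calling thread until its reserved slot arrives. */
    void acquire() {
        long waitNanos = reserve(System.nanoTime());
        if (waitNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: 200 ms between requests; the real limit is unknown.
        RequestThrottle throttle = new RequestThrottle(200);
        long begin = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            throttle.acquire();   // each mp3 thread would call this before its HTTP request
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
            System.out.println("request " + i + " at about " + elapsedMs + " ms");
        }
    }
}
```

Each mp3 thread would call `acquire()` right before talking to the server, so the number of threads and the request rate can be tuned independently.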

Sunday, August 10, 2008

Mail server hacked

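A few days ago a senior labmate told me that our mail server had been trying to attack other people's servers...
Checking the log files (mainly /var/log/secure and /var/log/messages) showed that someone had logged in with an existing user account and password on the server; the root cause was a password that was far too simple.
So I did the following:
1. Restricted SSH login to a single non-root account (see "FreeBSD-安裝到SSH遠端登入", i.e. FreeBSD: from installation to remote SSH login)
2. Changed every password that was too simple

This is not the best possible setup... but I think it is now much harder to break in...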

Saturday, August 2, 2008

A performance comparison between OpenMP and traditional multi-threading

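Last semester I took a networking course, and one of the assignments was to compare the performance of OpenMP, which is slowly gaining ground, with the traditional multi-thread approach. I am posting the results on the blog for anyone who needs them... If you have any questions, feel free to leave a comment below or mail me. But please do not plagiarize!!
-----------------
Requirements and Architecture:
This experiment compares performance for a network application; here we implemented a File Transfer Protocol (FTP) server. We implemented the following RFC959 commands: USER, CWD (change working directory), CDUP (change to parent directory), QUIT (exit), PORT (data port), TYPE (representation type), RETR (retrieve file), STOR (store), PWD (print working directory), LIST (list current directory), STAT (server status) and SYST (server system), and verified that the FileZilla Client can connect and operate freely. For the system architecture we implemented three modes: single-thread, multi-thread, and single-thread with OpenMP. The FSMs of the three modes are shown in the figure below.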

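Testing program:
To simulate many users connecting at the same time, we developed a program that forks multiple processes at once, each of which sends a predefined sequence of ftp commands. The client-side ftp software used here is the FTP Command built into Windows. The FSM of the testing program is shown in the figure below: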

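Experiment:
Hardware environment
Server: Intel(R) Core(TM)2 Duo CPU 1.80GHz 1GB RAM
Client: (Intel(R) Pentium(R) 4 CPU 3.00GHz 1GB RAM)*2
Development environment
Microsoft Visual Studio 2005 C++
Windows XP SP2

The three applications were compared in the following experiments

1. Varying the buffer size

2. Varying the number of users

3. Varying the user arrival time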

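Varying the buffer size:

In this experiment we use different buffer sizes to test how responsive the system is. The idea is that the faster the server responds, the smaller the buffer it can get by with; conversely, a slow server can only cope with a large buffer. Parameters: User = 150 processes (150 processes on each test machine), delay time 350ms (0.35 secs); each process issues 60 LIST commands and transfers one 8M image file. The figure below shows that both Multi and OpenMP can handle a buffer size of 30, whereas Single can only handle a buffer size of 50. Comparing Multi with OpenMP, Multi takes less time than OpenMP. A plausible explanation is that Multi has an extra thread that handles the clients currently sitting in the buffer, whereas OpenMP first has to process every element in the waiting queue in parallel with multiple threads, and only when that has finished does it handle the clients stored in the buffer. That is why Multi's processing time is lower than OpenMP's.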
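The relation between server speed and buffer size can be illustrated with a toy model. The sketch below is not the FTP server from the assignment and none of its numbers are measurements; it is a tick-based simulation, with made-up arrival and service rates, in which clients that find the buffer full are rejected (what the real servers do in that case is an assumption).

```java
import java.util.ArrayDeque;

// Toy model: clients arrive in bursts, wait in a bounded buffer, and are served per tick.
class BufferSizeModel {
    /** Returns how many of the clients found the buffer full and were turned away. */
    static int rejectedClients(int bufferSize, int clients, int arrivalsPerTick, int servedPerTick) {
        if (arrivalsPerTick < 1 || servedPerTick < 1) {
            throw new IllegalArgumentException("rates must be at least 1 per tick");
        }
        ArrayDeque<Integer> buffer = new ArrayDeque<>();
        int next = 0;
        int rejected = 0;
        while (next < clients || !buffer.isEmpty()) {
            // Arrival phase: new clients join the buffer if there is room.
            for (int a = 0; a < arrivalsPerTick && next < clients; a++, next++) {
                if (buffer.size() < bufferSize) {
                    buffer.add(next);
                } else {
                    rejected++;
                }
            }
            // Service phase: the server takes clients out of the buffer.
            for (int s = 0; s < servedPerTick && !buffer.isEmpty(); s++) {
                buffer.poll();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL rates: 150 clients arriving 10 per tick.
        System.out.println("slow server, buffer 30: " + rejectedClients(30, 150, 10, 8) + " rejected");
        System.out.println("slow server, buffer 50: " + rejectedClients(50, 150, 10, 8) + " rejected");
        System.out.println("fast server, buffer 30: " + rejectedClients(30, 150, 10, 10) + " rejected");
    }
}
```

In this model a server that keeps up with the arrival rate never fills the buffer, while a slightly slower one needs a larger buffer to avoid turning clients away, which is the effect the experiment is probing.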

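Varying the number of users:

In this experiment we test the upper bound on the number of users the system can handle. A robust system should allow as many users as possible to connect to the host and be served, so this experiment tests which architecture is the most robust. Parameters: Buffersize = 100, delay time 50ms (0.05 secs); each process issues 2 LIST commands. The data below show that Multi reaches the highest limit, User=450, while keeping the total processed time within 30 seconds; OpenMP handles at most User=400, also with a processed time under 30 seconds; Single handles only User=150 and needs about 134 seconds in total.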

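Varying the user arrival time:

In this experiment we test how small a delay time between test clients the system can accept. The more stable the architecture, the lower the delay time it can tolerate, which means it can accept more users connecting at nearly the same time. Parameters: buffer size 100, User=150; each process issues 60 LIST commands and transfers one 8M image file. The figure below shows that both Multi and OpenMP can accept a delay time of 200ms (0.2 secs) per process, with Multi giving the best result at about 134 seconds, whereas Single can only accept connections at a delay time of 350ms (0.35 secs) per process. From this we can see that Multi and OpenMP differ little in performance and that both perform better than Single.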

Thursday, July 3, 2008

Simple parallel processing on Windows

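One nice thing about playing with data mining is that when you process large amounts of data you can speed things up as much as you like... do whatever you can do :)

So these days, whenever I spot something in my code that can be sped up, I speed it up.

Here is a sample: put the commands you want to run in parallel into a list file, then call c_parallel() to run them in parallel.

If you use MinGW, remember to add -mwindows
Remember to include windows.h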

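#define UNICODE
#include <windows.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Start one command line as a child process.
// Returns a heap-allocated PROCESS_INFORMATION (the caller frees it), or NULL on failure.
PROCESS_INFORMATION* c_fork(const char* cmd) {
    // CreateProcessW may modify the command line, so convert into a writable buffer.
    wchar_t Tcmd[4096];
    size_t len = mbstowcs(Tcmd, cmd, 4095);  // the count is in wide characters, not bytes
    if (len == (size_t)-1) {
        printf("cannot convert command line: %s\n", cmd);
        return NULL;
    }
    Tcmd[len] = L'\0';  // mbstowcs does not terminate the string when the buffer fills up

    STARTUPINFO si;
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    PROCESS_INFORMATION* pi = (PROCESS_INFORMATION*)malloc(sizeof(PROCESS_INFORMATION));
    if (pi == NULL) return NULL;
    ZeroMemory(pi, sizeof(PROCESS_INFORMATION));

    // Start the child process.
    if (!CreateProcess(NULL,   // No module name (use command line)
                       Tcmd,   // Command line
                       NULL,   // Process handle not inheritable
                       NULL,   // Thread handle not inheritable
                       FALSE,  // Set handle inheritance to FALSE
                       0,      // No creation flags
                       NULL,   // Use parent's environment block
                       NULL,   // Use parent's starting directory
                       &si,    // Pointer to STARTUPINFO structure
                       pi))    // Pointer to PROCESS_INFORMATION structure
    {
        printf("CreateProcess failed (%lu)\n", GetLastError());
        free(pi);  // do not leak the structure on failure
        return NULL;
    }
    return pi;
}

// Run every non-empty line of the file cmdlist as its own process and wait for all of them.
void c_parallel(const char* cmdlist) {
    ifstream in(cmdlist, ios::binary);
    string line;
    vector<PROCESS_INFORMATION*> process;
    while (getline(in, line)) {  // stops cleanly at end of file
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);  // binary mode keeps the CR of CRLF line endings
        if (line.empty()) continue;
        cout << "do:" << line << endl;
        PROCESS_INFORMATION* pi = c_fork(line.c_str());
        if (pi != NULL) process.push_back(pi);  // skip commands that failed to start
    }
    // join
    for (size_t i = 0; i < process.size(); i++) {
        WaitForSingleObject(process[i]->hProcess, INFINITE);
    }
    // close
    for (size_t i = 0; i < process.size(); i++) {
        CloseHandle(process[i]->hProcess);
        CloseHandle(process[i]->hThread);
        free(process[i]);
    }
}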

Thursday, June 12, 2008

Simple byte-oriented file copy

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Simple byte-for-byte file copy (originally written for images).
 * @param src  the file to read
 * @param dest the file to write (overwritten if it already exists)
 */
public void copyImage(File src, File dest) {
    byte[] buffer = new byte[1024];
    // try-with-resources closes both streams even when an exception is thrown
    try (FileInputStream r = new FileInputStream(src);
         FileOutputStream w = new FileOutputStream(dest)) {
        int r_num;
        // read() returns the number of bytes read, or -1 at end of file
        while ((r_num = r.read(buffer, 0, buffer.length)) != -1) {
            w.write(buffer, 0, r_num);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Works for any byte-oriented file, e.g. an image

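Tuesday, May 27, 2008

Simple Java compression and decompression

I am currently integrating multiple models for a senior labmate's Video Annotation tool... The trained models always come out as really large files... (because I only tar them up XD)... So I wrote a simple compression/decompression program. It is adapted mainly from Simple String Compression Functions, but to fit my needs both the input and the output are files. This blog does not seem to support file uploads XD, so just take a look at the code below XD

For compression, infile is the original tar file and outfile is where the compressed result is written.
Decompression does the reverse.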

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/**
 * Compresses a file into a zip archive that holds a single entry.
 * @param infile  the file to compress
 * @param outfile the compressed file to write
 * @throws IOException if reading or writing fails
 */
static public void compression(String infile, String outfile) throws IOException {
    byte[] buffer = new byte[1024];
    int offset;
    // Stream straight to the output file; closing the zip stream writes the archive trailer.
    try (BufferedInputStream fin = new BufferedInputStream(new FileInputStream(infile));
         ZipOutputStream zout = new ZipOutputStream(
                 new BufferedOutputStream(new FileOutputStream(outfile)))) {
        zout.putNextEntry(new ZipEntry("0"));
        while ((offset = fin.read(buffer)) != -1) {
            zout.write(buffer, 0, offset);
        }
        zout.closeEntry();
    } catch (IOException e) {
        // rethrow instead of silently dropping the error
        throw new IOException("simpleCompression.compression exception", e);
    }
}

/**
 * Restores the original file from an archive written by compression().
 * @param infile  the compressed file
 * @param outfile the restored original file to write
 * @throws IOException if reading or writing fails, or the archive has no entry
 */
static public void decompression(String infile, String outfile) throws IOException {
    byte[] buffer = new byte[1024];
    int offset;
    try (ZipInputStream zin = new ZipInputStream(
                 new BufferedInputStream(new FileInputStream(infile)));
         BufferedOutputStream fout = new BufferedOutputStream(new FileOutputStream(outfile))) {
        if (zin.getNextEntry() == null) {
            throw new IOException("no zip entry found in " + infile);
        }
        // Copy raw bytes; going through a String/Writer would corrupt binary data such as tar files.
        while ((offset = zin.read(buffer)) != -1) {
            fout.write(buffer, 0, offset);
        }
    } catch (IOException e) {
        throw new IOException("simpleCompression.decompression exception", e);
    }
}
