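By definition, a decision tree is a method that, in order to reach some target (whether a mushroom is poisonous or not, whether a customer will buy a given product, and so on),
branches on conditions over each attribute of the data and splits the data into groups.
There are many routes to the target, and together they form a tree shape, which is why it is called a decision tree.
Furthermore, when the target variable is a categorical variable, the decision tree is called a classification tree.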
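To make the idea of branching on attributes concrete, here is a minimal sketch on a tiny made-up dataset. The feature names (cap_is_red, has_ring) and the six samples are hypothetical and are not taken from the mushroom dataset used in this post; export_text requires a reasonably recent scikit-learn.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [cap_is_red, has_ring] -> 1 = poisonous, 0 = edible
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]

toy_tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
toy_tree.fit(X, y)

# Print the learned conditional branches as text
rules = export_text(toy_tree, feature_names=["cap_is_red", "has_ring"])
print(rules)
```

In this toy data the label depends only on cap_is_red, so the printed tree should branch on that attribute alone.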

Environment

python 3.5.2
sklearn

Code

Python 3.
github.com

Dataset

The poisonous-mushroom classification problem

Processing

Use DecisionTreeClassifier to
train, predict, and compute the evaluation scores.

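from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Split into training and test data (X and Y are prepared earlier, not shown here)
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=50)

# Decision tree instance (entropy criterion, max depth 5)
tree_model = DecisionTreeClassifier(criterion='entropy', max_depth=5, random_state=50)

tree_model.fit(X_train, y_train)

# score() returns the mean accuracy on the given data
print("train:", tree_model.__class__.__name__, tree_model.score(X_train, y_train))
print("test:", tree_model.__class__.__name__, tree_model.score(X_test, y_test))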
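The criterion='entropy' setting means splits are chosen by information gain. The following is a small sketch of that calculation in plain Python; the labels "p" (poisonous) and "e" (edible) and the split shown are made-up illustration data, not values from the actual dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical split that separates the two classes perfectly
parent = ["p", "p", "p", "e", "e", "e"]
left = ["p", "p", "p"]
right = ["e", "e", "e"]
gain = information_gain(parent, left, right)
print("information gain:", gain)
```

A split that leaves both children pure removes all of the parent's entropy, which is the best a single split can do.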

2019-03-29

SVM (support vector machine) classification with sklearn


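index:

Overview
Environment
Code
Dataset
Processing
Results

Overview

In this post I try out a classification problem using SVM in machine learning.

SVM can be defined roughly as follows:
a method that takes the support vectors (the training points that lie closest to the other class) as its reference
and draws the decision boundary so that the distance to them (the margin) is as large as possible.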
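As a rough illustration of the margin, the sketch below fits a linear SVM on four made-up 2-D points and computes the margin width as 2 / ||w||. The data and the large C value (used here to approximate a hard margin) are assumptions chosen for the example; SVC with a linear kernel is used instead of LinearSVC only because it exposes the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 sits at x = 0, class 1 at x = 2
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C approximates a hard-margin SVM on separable data
svm = SVC(kernel="linear", C=1e6)
svm.fit(X, y)

w = svm.coef_[0]
margin = 2.0 / np.linalg.norm(w)  # distance between the two margin lines
print("support vectors:", svm.support_vectors_)
print("margin width:", margin)
```

For this data the widest possible gap lies between x = 0 and x = 2, so the computed margin should come out close to 2.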

Environment

python 3.5.2
sklearn

Code

Python 3.
github.com

Dataset

I used the load_breast_cancer data.

Processing

Use LinearSVC to
train, predict, and compute the evaluation scores.

import pandas as pd
from sklearn.svm import LinearSVC

model = LinearSVC()
clf = model.fit(X_train, y_train)

# Predict on the test data and show the first few predictions
pred = model.predict(X_test)
df = pd.DataFrame(pred)
print(df.head())

# score() returns the mean accuracy
print("train:", clf.__class__.__name__, clf.score(X_train, y_train))
print("test:", clf.__class__.__name__, clf.score(X_test, y_test))

Results

(426, 30) (143, 30)
(426,) (143,)
   0
0  1
1  1
2  1
3  1
4  0
train: LinearSVC 0.870892018779
test: LinearSVC 0.874125874126
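Scores of about 0.87 are on the low side for this dataset. One possible factor is that LinearSVC is sensitive to feature scale and the features were not standardized. The following is a sketch of the same experiment with a StandardScaler in front of the classifier; the random_state and max_iter values are assumptions made for this example and are not the settings used above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize each feature, then fit the linear SVM
scaled_svc = make_pipeline(
    StandardScaler(),
    LinearSVC(max_iter=10000, random_state=0),
)
scaled_svc.fit(X_train, y_train)

train_score = scaled_svc.score(X_train, y_train)
test_score = scaled_svc.score(X_test, y_test)
print("train:", train_score)
print("test:", test_score)
```

Putting the scaler inside the pipeline means it is fitted on the training data only, so no information from the test set leaks into the preprocessing.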

2019-03-27

k-NN (k-Nearest Neighbor Algorithm) classification with sklearn


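index:

Overview
Environment
Reference
Code
Dataset
Processing
Results

Overview

In this post I try out a classification problem using k-NN in machine learning.

Environment

python 3.5.2
sklearn

Reference

https://qiita.com/fujin/items/128ed7188f7e7df74f2c

Code

Python 3.
github.com

Dataset

I used the iris data.

Processing

Use KNeighborsClassifier to train for each k,
and append the evaluation result (score) to a list.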

from sklearn.neighbors import KNeighborsClassifier

list_nn = []
list_score = []
for k in range(1, 31):  # k = 1 to 30
    # Train a k-NN classifier with k neighbors
    knc = KNeighborsClassifier(n_neighbors=k)
    knc.fit(X_train, Y_train)

    # Predict on the test data
    Y_pred = knc.predict(X_test)

    # Evaluate: for a classifier, score() is the mean accuracy (not R^2)
    score = knc.score(X_test, Y_test)
    print("[{}] score: {:.2f}".format(k, score))

    list_nn.append(k)
    list_score.append(score)

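Results

Once k goes past roughly 20, the test score starts to drop (from k = 24 in the output below).
A large k smooths the decision boundary too much, which is closer to underfitting than overfitting,
so a smaller k looks like the better choice here.

Number of samples = 150  Number of features = 4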
[1] score: 0.97
[2] score: 0.97
[3] score: 0.97
[4] score: 0.97
[5] score: 0.97
[6] score: 0.97
[7] score: 0.97
[8] score: 0.97
[9] score: 0.97
[10] score: 0.97
[11] score: 0.97
[12] score: 0.97
[13] score: 0.97
[14] score: 0.97
[15] score: 0.97
[16] score: 0.97
[17] score: 0.97
[18] score: 0.97
[19] score: 0.97
[20] score: 0.97
[21] score: 0.97
[22] score: 0.97
[23] score: 0.97
[24] score: 0.95
[25] score: 0.95
[26] score: 0.95
[27] score: 0.95
[28] score: 0.92
[29] score: 0.89
[30] score: 0.95
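Because the scores above come from one train/test split of only 150 samples, they move in coarse steps. As a hedged alternative, the sketch below compares the same range of k with 5-fold cross-validation; the choice of cv=5 and the variable names are assumptions for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean cross-validated accuracy for each k from 1 to 30
cv_scores = {}
for k in range(1, 31):
    knc = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(knc, X, y, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)
print("best k:", best_k, "mean accuracy: {:.3f}".format(cv_scores[best_k]))
```

Averaging over several folds gives a less noisy basis for choosing k than a single test split.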

2019-03-16

Extracting haiku and tanka from input text: adding a web screen. Natural language processing (8)


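index:

Overview
Environment
Code
Dataset
Processing, etc.
Related pages

Overview

This continues the earlier natural language processing posts on TF-IDF:
the model is trained on haiku and tanka, and a web screen is built
that extracts the text most similar to the input text.

・The screen uses the same Bot UI library as last time.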


Environment

python 3.5.2
janome
sklearn
flask

Code

Python 3.

github.com

Dataset

The haiku text was taken from the NHK page below.

http://www.n-gaku.jp/taikai/tanka/h30/index.html

・Copied the text from the PDF into an editor,
  entered it into Excel, and saved it in CSV format.

Processing, etc.

train.py
Read the CSV with pandas,
fit TfidfVectorizer, save it with pickle,
and place it under flaskr/files/.

import pickle

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the sentences from the CSV file
csv_data = pd.read_csv("flaskr/files/bun_data.csv", encoding="SHIFT-JIS")

# Tokenize each sentence; get_token() is the tokenizer helper (not shown here)
tokens = []
for item in csv_data["bun"]:
    token = get_token(item)
    tokens.append(token)
docs = np.array(tokens)

# Fit TF-IDF; this token_pattern also keeps single-character tokens
vectorizer = TfidfVectorizer(use_idf=True, token_pattern=u'(?u)\\b\\w+\\b')
print(tokens)
vecs = vectorizer.fit_transform(docs)
print("#vecs :")
print(vecs.shape)

# Save the fitted vectorizer
file_name = "params.pkl"
with open(file_name, 'wb') as f:
    pickle.dump(vectorizer, f)
print("#save vectorizer OK!")

views.py
Evaluate the input text,
extract the most similar sentence, and output it to the screen.

# vect.predict() is defined elsewhere in the project (not shown here)
@app.route('/test3', methods=['GET', 'POST'])
def test3():
    print("#test3")
    ret = "sorry, nothing response."
    if len(request.form) > 0:
        text = request.form['intext']
        print("text=", text, "len=", len(text))
        if len(text) > 0:
            ret = vect.predict(text)
        print(ret)
    return ret
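The view above hands the input to vect.predict(), whose body is not shown. The following is a hypothetical sketch of what such a helper might look like: it restores a pickled TfidfVectorizer, vectorizes the input, and returns the stored sentence with the highest cosine similarity. The function signature, the toy English sentences (standing in for tokenized haiku), and the in-memory pickle round trip are all assumptions for illustration.

```python
import pickle

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical, already space-separated sentences standing in for the CSV data
sentences = [
    "old pond frog jumps in",
    "autumn moon over the mountain",
    "spring rain on the roof",
]

vectorizer = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")
vecs = vectorizer.fit_transform(sentences)

# Round trip through pickle, as train.py does with params.pkl (in memory here)
restored = pickle.loads(pickle.dumps(vectorizer))


def predict(text, vectorizer, vecs, sentences):
    """Return the stored sentence most similar to text (hypothetical helper)."""
    query = vectorizer.transform([text])
    sims = cosine_similarity(query, vecs)[0]
    if sims.max() == 0.0:
        # No shared vocabulary at all: fall back to the default message
        return "sorry, nothing response."
    return sentences[int(np.argmax(sims))]


print(predict("moon mountain", restored, vecs, sentences))
```

Returning the fallback message when every similarity is zero avoids presenting an arbitrary sentence as a match.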

Saving the result of sklearn.feature_extraction.text.TfidfVectorizer for TF-IDF. Natural language processing (7)


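index:

Overview
Environment
Code
Related pages

Overview

Continuing the earlier natural language processing posts,
I tested saving a TfidfVectorizer that was fitted for TF-IDF
and loading it again at evaluation time.

Environment

python 3.5.2
janome
sklearn

Code

github.com

Python 3.

train.py
After fit_transform, save the vectorizer with pickle.dump.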

vectorizer = TfidfVectorizer(use_idf=True, token_pattern=u'(?u)\\b\\w+\\b')
print(tokens)
vecs = vectorizer.fit_transform(docs)
print("#vecs :")
print(vecs.shape)

# Save the fitted vectorizer
file_name = "params.pkl"
with open(file_name, 'wb') as f:
    pickle.dump(vectorizer, f)
print("#save vectorizer OK!")

・Evaluation
Load the vectorizer with pickle.load,
vectorize the input with transform,
and output the cosine similarity.

file_name = "params.pkl"
vectorizer = None
with open(file_name, 'rb') as f:
    vectorizer = pickle.load(f)
print("load vectorizer OK!!")

# Vectorize the stored sentences with the loaded vectorizer
vecs = vectorizer.transform(docs)

# Input text (a name other than str avoids shadowing the built-in)
#text = "利用人数は？"
#text = "契約期間"
text = "価格は？"

instr = get_token(text).strip()
print("instr=", instr)
x = vectorizer.transform([instr])

# Compute the cosine similarity between the input and every stored sentence
num_sim = cosine_similarity(x, vecs)
print(num_sim)
index = np.argmax(num_sim)

print("word=", words[index])

・Evaluation result
=> The similar sentence was output.

　>python  pred.py
load vectorizer OK!!
instr= 価格 は ？
[[ 0.12541425  0.12972001  0.          0.14071807  0.52054432]]
word= 製品価格、値段はいくらですか？
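For reference, each number in the similarity row is a cosine similarity: the dot product of two vectors divided by the product of their lengths. Here is a small sketch that computes it by hand and checks the result against sklearn's cosine_similarity; the two vectors are made up for the example.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical vectors that share one of their two non-zero components
a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])

manual = cosine(a, b)
library = float(cosine_similarity([a], [b])[0, 0])
print(manual, library)
```

np.argmax over the row then picks the stored sentence whose vector points in the direction closest to the input.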

CodeIgniter basics: adding a form and more. #php #web


index:

Environment
Reference
Setup steps
Routing
Adding a form
Reference configuration examples

Overview:
This is an introductory post on setting up CodeIgniter 3, a lightweight PHP framework.

Environment

CodeIgniter-3.1.10
php 5.6
mysql

Reference

https://codeigniter.com/

https://github.com/bcit-ci/CodeIgniter

https://qiita.com/seihowlow24/items/a06eb2449e2760a13277

Setup steps

Click Download at
https://codeigniter.com/
then get the zip and extract it.

For Apache, place it under htdocs.

User code apparently goes under
application/

・DB settings
Create the MySQL database and tables beforehand.

config/database.php

Set username, password, and database
to your own values.

$db['default'] = array(
	'dsn'	=> '',
	'hostname' => 'localhost',
	'username' => 'user123',
	'password' => 'pass123',
	'database' => 'db123',
	'dbdriver' => 'mysqli',
	'dbprefix' => '',
	'pconnect' => FALSE,
	'db_debug' => (ENVIRONMENT !== 'production'),
	'cache_on' => FALSE,
	'cachedir' => '',
	'char_set' => 'utf8',
	'dbcollat' => 'utf8_general_ci',
	'swap_pre' => '',
	'encrypt' => FALSE,
	'compress' => FALSE,
	'stricton' => FALSE,
	'failover' => array(),
	'save_queries' => TRUE
);

Routing

As preparation,
the .htaccess file needed for routing
was not included, so add the following and put it in place.

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L]
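As a rough illustration of what these rules do, the Python sketch below imitates them: a request that does not point at an existing file or directory is rewritten so that it goes through index.php. The function name and the boolean flag are hypothetical stand-ins for Apache's -f and -d checks; this is only a model of the behavior, not how Apache implements it.

```python
import re


def rewrite(path, is_existing_file_or_dir=False):
    """Imitate the rewrite: send non-file, non-directory requests to index.php."""
    if is_existing_file_or_dir:
        # Both RewriteCond lines must hold; existing files and directories are served as-is
        return path
    # RewriteRule ^(.*)$ index.php/$1
    return re.sub(r"^(.*)$", r"index.php/\1", path, count=1)


print(rewrite("news/create"))
print(rewrite("css/site.css", is_existing_file_or_dir=True))
```

This matches the observation later in the post that the page opens both with and without index.php in the URL.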

Adding a form

Reference:
https://codeigniter.com/user_guide/tutorial/news_section.html

https://codeigniter.com/user_guide/tutorial/create_news_items.html

・List
Controller:
controllers/news.php
https://github.com/kuc-arc-f/codeig/blob/master/application/controllers/news.php


・Add screen
views/news/create.php
https://github.com/kuc-arc-f/codeig/tree/master/application/views/news


・Adding routes
config/routes.php

$route['news/edit/(:any)'] = 'news/edit/$1';
$route['news/create'] = 'news/create';
$route['news/(:any)'] = 'news/view/$1';
$route['news'] = 'news';
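The order of these rules matters, because CodeIgniter 3 tries routes in the order they are defined and uses the first match. The Python sketch below imitates that matching to show why news/create has to come before news/(:any). The resolve function and the ROUTES list are a simplified model written for this example, with (:any) treated as "one or more characters other than a slash".

```python
import re

# The same rules as in config/routes.php, in the same order
ROUTES = [
    ("news/edit/(:any)", "news/edit/$1"),
    ("news/create", "news/create"),
    ("news/(:any)", "news/view/$1"),
    ("news", "news"),
]


def resolve(uri, routes=ROUTES):
    """Return the target of the first route whose pattern matches the whole URI."""
    for pattern, target in routes:
        regex = pattern.replace("(:any)", "([^/]+)").replace("(:num)", "([0-9]+)")
        match = re.fullmatch(regex, uri)
        if match:
            result = target
            # Fill in $1, $2, ... with the captured segments
            for i, group in enumerate(match.groups(), start=1):
                result = result.replace("${}".format(i), group)
            return result
    return uri  # no rule matched


print(resolve("news/create"))
print(resolve("news/my-first-post"))

# With the wildcard rule first, "create" would be treated as an article slug
wrong_order = [ROUTES[2], ROUTES[1]]
print(resolve("news/create", wrong_order))
```

Keeping the specific rules above the wildcard rule is what lets news/create reach the create method instead of the view method.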

・The page opened at the URL below (for a local setup).

http://127.0.0.1/<project name>/news/

It also opened at the URL below.
http://127.0.0.1/<project name>/index.php/news/

Reference configuration examples

PHP 5.6.
github.com
