
Saturday, December 28, 2019

Five types of work left for humans in Era Three of automation (the 21st century), when machines take away decisions in addition to dirty, dangerous, and dull jobs

(1) Step up.
Move to higher intellectual ground than the machines: design, evaluate, apply, and extend them.

(2) Step aside.
Jobs that still need human intelligence and/or hands: understanding people's subtle feelings and caring for them, or fine-tuning by hand.

(3) Step in.
Jobs that bridge new technologies and business, including entrepreneurs.

(4) Step narrowly.
Jobs that are not cost-effective for machines to do, and very specialized jobs that only a few people can do.

(5) Step forward.
Jobs that build new systems, including IT specialists, data scientists, machine learning engineers, IT consultants, programmers, white-hat hackers, etc.


Reference:
Thomas H. Davenport and Julia Kirby, “Beyond Automation,” Harvard Business Review, June 2015.
https://hbr.org/2015/06/beyond-automation

Monday, December 9, 2019

Checklist: hiring an administrative / operational member

This is my own checklist for hiring an administrative/operational member (neither a specialist/creative nor a manager).

Skills:
1. Careful and passive listening
2. Process streamlining (What needs to be done, how, when, and by whom?)
3. Operations (Documentation, IT)

Mindset:
1. Not selfish: works “for the members and/or the executives/management of the organization”
2. Mentally stable: able to manage his/her own feelings
3. Can-do (proactive) attitude, with caution
4. Bases decisions on objective facts, not subjective opinions (personal preferences), knowing that acting out of a sense of “personal justice” is selfish and cheap entertainment
5. A doer, not a critic

Friday, August 23, 2019

My own principled approach

My own principled approach, based on my portfolio management career:

1. Accept reality
2. Don’t make an emotional decision - be objective
3. Make the right decision, rather than focusing on making a short-term profit/win
4. Play the game from a long-term perspective
5. Don’t have a big ego
6. Don’t dwell on realized losses and/or profits
7. Analyze, improve, and then repeat this process over and over again


Hope you like this.

Sunday, August 18, 2019

[Financial Analysts Journal] The Impact of Crowding in Alternative Risk Premia Investing

Nick Baltas (2019) The Impact of Crowding in Alternative Risk Premia Investing, Financial Analysts Journal, 75:3, 89-104, DOI: 10.1080/0015198X.2019.1600955

To link to this article: https://doi.org/10.1080/0015198X.2019.1600955 
The analysis shows that divergence premia, such as momentum, are more likely to underperform following crowded periods. Conversely, convergence premia, such as value, show signs of outperformance as they transition into phases of larger investor flows. 

[Financial Analysts Journal] Are Passive Funds Really Superior Investments? An Investor Perspective

Edwin J. Elton, Martin J. Gruber & Andre de Souza (2019) Are Passive Funds Really Superior Investments? An Investor Perspective, Financial Analysts Journal, 75:3, 7-19, DOI: 10.1080/0015198X.2019.1618097

In the last five years, passive funds have increased from 16.4% of assets under management to 26%.

An investor seeking to use passive portfolios to beat an active fund and attempting to use the Fama–French (market, small cap, value) or Carhart (these three plus momentum) methodology does not have an easily implementable strategy. 
To make this implementable, (1) the authors searched for a parsimonious set of indexes that correctly price other indexes, and (2) they show that exchange-traded funds (tradable assets), rather than indexes, can be used to construct a set of portfolios that outperform active mutual funds. ETFs can be bought and shorted.

The authors found that a combination of five ETFs captures most of the variation in all available ETFs: ETFs tracking the CRSP Market, Russell 1000 Growth, Russell 1000 Value, Russell 2000 Growth, and Russell Midcap Value indexes.

Investors can outperform active funds by buying the lowest-cost ETF that matches each fund’s benchmark, but they can do significantly better by using the five-ETF model the authors developed in this study.
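
A rough sketch of the idea (my own illustration with made-up return series, not the paper's data or exact procedure): regress a fund's returns on the five ETFs' returns; the fitted slopes define a replicating ETF portfolio, and the intercept (alpha) shows whether the fund beat that portfolio.

import numpy as np

rng = np.random.default_rng(1)
T = 120  # months of hypothetical data

# Stand-ins for monthly returns of the five ETFs (tracking CRSP Market,
# Russell 1000 Growth/Value, Russell 2000 Growth, Russell Midcap Value)
etf = rng.normal(0.006, 0.04, size=(T, 5))

# A hypothetical active fund: ETF exposures plus idiosyncratic noise,
# minus an assumed 10 bps per month of costs
true_beta = np.array([0.6, 0.2, 0.1, 0.05, 0.05])
fund = etf @ true_beta + rng.normal(0, 0.01, T) - 0.001

# OLS of fund returns on the five ETF returns; the intercept is alpha
X = np.column_stack([np.ones(T), etf])
coef, *_ = np.linalg.lstsq(X, fund, rcond=None)
print('alpha (monthly):', round(coef[0], 5))        # negative -> the ETF mix wins
print('replicating ETF weights:', np.round(coef[1:], 3))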

[Financial Analysts Journal] Choosing and Using Utility Functions in Forming Portfolios

My own summary of the paper "Choosing and Using Utility Functions in Forming Portfolios" in the Financial Analysts Journal, Volume 75, Number 3, Third Quarter 2019:

- Utility functions and related analysis should be tailored (i.e., purposefully selected) to reflect the investor's circumstances. The article illustrates this for four investor types (a private investor, an endowment fund, a defined-benefit fund, and a retired individual).
- Limitations of mean–variance analysis (essentially a single-period approach): (1) it considers only portfolio return and risk over a discrete horizon (a problem for long-term investors facing varying economic and market conditions), (2) it cannot accommodate diverse investor objectives, e.g., delivering a real return, a required income stream, or sufficient assets to cover liabilities, and (3) it ignores return distributions that are highly (negatively) skewed and/or have high kurtosis.

- Analyzing the mean and variance of his or her portfolio returns over some discrete time horizon is only vaguely relevant to the main concern—namely, the stream of income that can be drawn (from total assets) over time (and total liabilities).
- Choosing a utility function (with parameters) that is fit for a purpose seems more important than seeking validation from an unsettled literature (for functional forms and parameters).
- Advantages of utility functions: (1) available for return distributions of any shape, (2) considerable flexibility in available functions that can encapsulate a wide range of investor objectives and preferences (various time horizons, both up and down-sides of markets, combined strategies in investment and withdrawal in a dynamic framework)
- Utility functions examined: power utility and two variations of reference-dependent utility (a rough sketch of the power-utility case follows below)
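
A back-of-the-envelope illustration of the power-utility case (my own sketch, not from the paper; the return distributions and the risk-aversion parameter gamma below are made up): rank portfolios by expected power utility, or equivalently by the certainty-equivalent return it implies.

import numpy as np

def power_utility(wealth, gamma):
    # CRRA power utility: U(W) = W^(1 - gamma) / (1 - gamma), for gamma != 1
    return wealth ** (1 - gamma) / (1 - gamma)

def certainty_equivalent(returns, gamma):
    # The certain return giving the same expected utility as the risky returns
    eu = np.mean(power_utility(1 + returns, gamma))
    return (eu * (1 - gamma)) ** (1 / (1 - gamma)) - 1

rng = np.random.default_rng(0)
# Hypothetical distributions: a negatively skewed "equity" and a calmer "blend"
equity = rng.normal(0.07, 0.15, 100000) - 0.5 * rng.exponential(0.02, 100000)
blend = rng.normal(0.05, 0.08, 100000)

gamma = 5  # higher gamma = more risk-averse investor
for name, r in [('equity', equity), ('blend', blend)]:
    print(name, round(certainty_equivalent(r, gamma), 4))

Unlike mean–variance analysis, the power-utility ranking reacts to the negative skew of the "equity" distribution, and more strongly so as gamma rises.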

Reference:
Geoffrey J. Warren (2019) Choosing and Using Utility Functions in Forming Portfolios, Financial Analysts Journal, 75:3, 39-69, DOI: 10.1080/0015198X.2019.1618109
https://doi.org/10.1080/0015198X.2019.1618109

Saturday, August 3, 2019

R: Stepwise Regression

EQ_z.csv
SR,InstAUMNetFlow,Views,PassedScreens
0.166649,-0.156857166,-0.339654666,-1.707072202
0.154196,-0.291437111,-0.563715913,-1.665988363
-0.337022,-0.500222653,1.279987797,-0.017351088
1.247881,-0.581146411,-0.057977701,-0.779269670
0.501430,-0.509905650,-0.493296583,0.271775086
0.450267,-0.00902609893963425,0.114869456,-0.220068958
0.063694,-0.169655731,-0.083584568,1.192730775
-0.929903,-0.285324230831457,-0.602126250174628,-0.432616065
-0.509550507810622,-0.338434174,0.620607668,1.785622952
0.365224,-0.186896294,2.233848148,1.693394970
-1.580300,-0.233611745,-0.192414378,0.614118131
-0.482395,-0.169307101,-0.499698211,0.083536853
-0.514293,-0.165160241,-0.570117541,-1.113417949
-0.595051,-0.159739478,-0.589322781,-1.060947977
1.016973,-0.285852677,-0.422877253,-0.824485627
0.610773,-0.231633552,-0.461287732,0.714046898
-0.064182,-0.380821894,0.895882650,1.899020903
0.512515,-0.145408384,-0.218021245,1.154076782
-0.504917,-0.076575250,-0.602126250174628,-0.012381866
0.107556,-0.262600275,-0.493296583,-1.370406874
-1.022209,0.170890895,-0.512501823,-1.198996834
-1.724111,-0.156606065,-0.563715913,-1.367083438
1.166195,0.326738355,-0.083584568,1.734931596
0.809593,0.788629318,0.851070543,0.342095777
-0.649458,0.009107495,-0.256431724,0.921755174
-1.377667,-0.140383296,-0.320449427,-1.526152862
0.990153,1.411729505,-0.025968850,0.159684956
-1.614842,-0.211492345,-0.454886104,0.833875622
-0.264585,-1.069441697,-0.544510674,0.687914906
-1.241969,-0.985389063,-0.211619617,0.659963902
0.698007,-0.079714679,-0.448484476,0.257999353
1.089769,-0.162613459,-0.467689715,1.383121871
1.32550406943183,-0.585543466,-0.50610019449971,0.797317866
-0.600755,-0.181037499,-0.480492971,-0.430425758
-0.576928,-0.204406704,-0.352457922,1.217210296
0.082060,-0.447691794,0.703830254,1.369733531
1.747376,-0.240025438,0.102066199,1.511422435
1.04311685392753,-0.123980924,0.953498724,-0.047212071
0.970067,-0.199495276,0.242904505,1.674865264
-0.827666,0.118408847,-0.032370833,-1.451344886
1.611067,0.003951596,-0.525305434,-0.664270486
-0.265309,-0.291181306,-0.525305434,0.325396378
-0.283334,-0.450520959,-0.595724515499817,-0.409069499
-0.914353,-0.137189795,-0.173209138143544,-0.018567852
0.620408,0.116871706,-0.141200287,-0.515058937
-0.916319,-0.135987571,0.537385082,1.587913661
-1.118505,0.463280152,0.223699265,-0.033130971
-0.187776,-0.146433596,0.793454824,-0.003651244
0.438049,-0.368254819,0.530983098,0.614541695
-0.756349,-0.351659291,-0.410073997,-0.792100537
-2.007125,-0.340248485,-0.461287732,-1.092968001
1.213043,12.4498941614487,0.659018147,-1.629365689
1.481931,0.020246936,1.440031342,0.359641373
0.470195,-0.170614099,4.10956035415073,1.52176147007119
-0.098040,-0.243894430,4.51287074079907,0.668089908
0.958451,-0.521371297,-0.128397031,-1.091215545
1.526234,-0.585420110,-0.121995047,-0.338415089
-0.158304,-0.172770980,-0.557313929687203,0.954070028
-1.594917,-0.146478676,-0.454886104,0.584381433
0.184019,-0.216141007,-0.288440576,-1.142214482
0.193652,0.109340445,0.338930702,1.287215772
-0.025796,0.228702088,0.140476679,-0.910990804063595
-0.039563,-0.155894975,2.957245624,1.349714518
-0.069569,-0.179328085,-0.057977701,-0.262345150
0.664437,0.422490684,-0.365261533915373,-0.239523500
-0.093190,0.132424983,-0.077182940,-1.041415129
-0.627599,-0.245360205,5.153045275,2.428240960
-0.496841,5.156746073,0.466965751,-0.967271399
0.687281,-0.285607501,-0.62133152530456,-0.081582118
-0.046337,-0.264507667,-0.595724515499817,0.296519072
-0.516874,-0.278610357053612,-0.602126250174628,-0.351638819
1.212922,1.297108576,-0.352457922,-0.287037727
-0.555899,-0.153226884156467,-0.518903451,-1.436791707
-0.428942,-0.219646387,0.050852109,-0.183548979
1.671753,-0.240744285845635,0.082860960,-1.528688720
1.455031,0.086150665,0.812660064,1.330511103
-0.848843145917255,-0.238634721002062,-0.486894955,-0.144817964
-0.342203,-0.402923296,-0.006763610,0.722149688
-0.261341,-0.149054440,-0.499698211,0.591711257
0.145620,-0.148280710,3.226119334,1.419738418
-0.648728,-0.110501277,-0.371663162,-1.858500602
0.165561,-0.192266479,-0.0515760726638822,0.724504575
1.316033,0.287768037,2.765193228,0.608901887
1.07672117033893,-0.047766335,-0.314047443,-0.545639758
0.043489,-0.501971986310643,-0.422877253,-0.990917819
-0.335564,-0.098984794,-0.627733259979371,0.253520512
0.275134,-0.154490127,-0.467689715,0.671275591
-1.071435,-0.268930969,0.230100893,0.839687754
-1.677718,-0.367618028,-0.50610019449971,-0.736077500
-2.792272,-0.000985566,-0.205217989,0.713414334
-3.162712,0.554435563,-0.499698211,0.664862878
-0.476443,3.616921479,0.242904505,-0.502830936
-0.446883,0.240895436,-0.576519169,0.378166883
0.804497,1.80429838316979,0.441358528,-1.102479911
0.810396395383469,-0.209836082,0.659018147,1.230376055
1.512610,-0.283147993,-0.442082492,0.261949023
-1.125775,-0.177942350,-0.576519169,-1.538362044
-0.235747,-0.320532243,-0.50610019449971,0.563469366
-0.058605,-0.217158234,-0.422877253,1.147833714
-0.405116,-0.523915691,-0.365261533915373,0.304939696
0.558427,2.104307446,-0.589322781,-1.045967109
0.149047,-0.140040407,-0.614929755077,-1.901264881
0.221256,-0.343111123,-0.333252683,0.588021430
0.707488,-0.014892809,-0.589322781,-0.937741730
-0.439462259823633,-0.217419091,-0.403672013,-0.427858664
-2.666662,-0.191603729,-0.582921153,-0.616014648
-0.682666,-0.359022633,2.073804603,1.323476645
0.495517,-0.143849653,0.556590321,1.261554434
0.056090,0.629195981,-0.378065145,-1.342782390
-0.152172,-0.154590226,-0.454886104,0.609973799
-0.922913,-0.731770140,-0.614929755077,-1.672240955
0.503033,0.327271595,-0.474091343,-0.371997143416322
0.733606,4.334808131,-0.582921153,-1.114098241
-0.452835,0.054434454,-0.570117541,-1.151455365
0.646877,0.262949594,-0.461287732,1.027046109
0.338956,-0.508091644,-0.557313929687203,0.848812018
2.517237,0.172267272,0.057253737,-1.440530678
2.073795,-0.000422336,0.703830254,1.905521730
0.204219,-0.236326787,0.082860960,1.181597124
0.137541,-0.199108968,-0.346056294,0.390286333
-0.188497,0.269261788,-0.371663162,-0.139886013
-1.109823,-0.404126300,0.742240734,-0.815656001
-0.285400,-0.155012302,0.121271439,-0.517587686114939
0.045143,2.240875225,-0.538108690,-0.126841353
-1.334375,0.046212598,-0.371663162,1.201494118
-0.606802,0.019858054,-0.205217989,-0.248800259
-0.396725,-0.121030907,-0.429279236,1.320175759
-1.415686,0.211087536,-0.314047443,-1.711763381
-0.278711,0.226959336,-0.269235336,0.467789822
-0.096545,0.025187559,-0.50610019449971,-0.802137391
-0.516295,-0.057066761,-0.614929755077,0.779983516
-1.027395,0.094618509,-0.627733259979371,0.798629852
-0.545422,-0.015447395,-0.614929755077,-0.630656781
0.051858,6.624260555,-0.582921153,-1.123805035
-0.900951,-0.697362531,-0.563715913,0.263159810
-1.202283,0.115722825,-0.570117541,-0.515427876
-0.362307,-0.011262018,-0.627733259979371,0.283113289
-0.800682242758461,-0.224438992,-0.614929755077,-0.056554092
-0.652601,-0.387034906,-0.461287732,-1.791301809
-1.098117,-0.160149396,-0.147602271,-0.493400062
-1.63154368233283,-0.020919705,-0.557313929687203,0.281200510
0.244335,-0.176245011,-0.62133152530456,-1.460966125
1.465025,3.243307428,-0.614929755077,-1.135711218
-0.533428,-0.008721523,-0.62133152530456,-1.415419239
0.832427,-0.187895057,-0.614929755077,-1.556425900
0.503868,-0.203795809,-0.538108690,-0.240448608
-0.404268,-0.195043593,0.159681918212622,0.177761623
-0.342691339214384,0.367135767,1.011114443,-0.117205240
-1.363586,-0.714729981,-0.378065145,1.062625204
-0.0291831451466879,-0.163401047125453,-0.486894955,1.42436296959336
0.025463,-0.308040187,-0.282038948,-0.182677888
0.296520,-0.269652846,0.306921851,1.448246522
0.293719,-0.15204142156903,-0.365261533915373,-0.464952356
0.516443,-0.364967486,2.592346072,1.269076755
-2.058593,-0.212214199,-0.448484476,0.647099626
0.169856,-0.555138522,-0.230824857,-0.128908681
-1.959894,-0.218208718,-0.205217989,-0.805760065
1.17412613465405,-0.186229521,-0.486894955,0.537748894014036
-0.252419366590358,0.060113992,-0.544510674,1.282120303
0.042213,-0.139898112,-0.326851055,1.22619732272138
1.072650,-0.058675634,-0.224423229,-0.520257100881839
-0.131594,-0.374633821,-0.166807510,0.992511047
-0.193004187652273,-0.661820519,-0.416475625,-0.299986538
-0.869399,-0.369654087,-0.294842203623205,-0.044698105
-1.091420,-0.160339150281318,-0.493296583,-0.34309811719161
0.270032647051064,0.700990534,-0.62133152530456,-0.911526729601099
-0.045317,-1.174445386,-0.62133152530456,-0.840226033
2.3578064780604,-1.045646381,-0.557313929687203,-0.385457004
2.816186,0.065752922,-0.467689715,-0.084547627
0.218480,-0.250210386,-0.493296583,1.367042356
1.626564,-0.188026043,-0.557313929687203,0.178146145
-0.172926136966215,-0.149647665,-0.531707062,0.137729037
1.043252,-0.031921372,-0.50610019449971,0.878338724
-0.010146,1.362271028,-0.525305434,-1.102339316
0.026737,-0.043280690,-0.333252683,0.122800457159431
1.777929,-0.346557692,-0.179610766,-0.536432001
1.828100,-0.116434520,5.895648346,0.809680788
0.241189,-0.212997497,0.070057348,-0.836881342
0.390361,0.164100943,-0.275636964,0.110697408
-0.045645,-0.161859510,-0.570117541,-1.730992591
-1.088554,2.752605111,0.370939554,-0.617332944
0.288309,-0.083579066,-0.166807510,1.574413303
0.409491,-0.127920567,-0.442082492,0.037301354
-0.760782,0.071530230,1.478441821,1.418339733
-0.194476,-0.159968112,-0.544510674,1.356441840
-1.480631,-0.233582892,-0.531707062,0.686630040
-0.0215217837006922,-0.492219079,-0.518903451,-0.096512367
-1.203155,-0.406431609,0.166083546,0.534807512
-1.102299,-0.405891429,-0.435680864207542,0.772658759
-1.239126,-0.392437527,-0.013165594,0.405595217956958
-1.258811,-0.338175110,1.100738657,1.307767182
-0.0571074866929552,-0.631279907,0.422153289,-0.609279682
1.346689,-0.155446548,3.424573357,-0.186056969235076
-0.0816502618078865,0.399860313,1.996983645,0.830684509
-0.418116121725825,-0.517811055,0.0892625879204542,-0.054680719
-1.193879,-0.281633785,-0.454886104,-1.25341456140632
0.411747,1.622506213,-0.512501823,0.258452483
0.685310,-0.061515460,-0.512501823,-1.459990103
1.312342,0.759407719,7.272023967,-0.942706908
-0.784210,-0.241840120,0.198092397,-0.711260609
-0.835663,-0.244683603,-0.294842203623205,-1.057384345
-0.713810,-0.117720186,-0.192414378,-1.668115916
-0.939669906783922,-0.153969315,-0.147602271,1.16188089896552
1.115271,2.750537316,1.164756360,0.824033551
-2.451241,-0.247460359,-0.576519169,-0.0681878322310263
-0.320235624631293,0.204411186,3.354154027,2.551125835
0.663240,-0.0532670453316889,-0.602126250174628,-1.748162928
-0.120364,-0.637703869931426,-0.435680864207542,-0.148486314
0.017253,-0.244514746036811,0.204494025,1.175791069
-1.019264,-0.049853329,1.97777840555593,0.366614350
-0.835251,-0.179186037,1.683297828,1.15894766726893
-0.464607,-0.017757235,0.108467827,0.537597611
0.044342,-0.382634065,0.569393577,-0.094373922
0.109894,-0.511323338,-0.486894955,-1.32972863894046
0.191216,-0.267170012,3.296538308,1.552212567
-0.756611,-0.240729359,-0.493296583,0.851676984937539
-0.790173,-0.228603585,3.264529813,-0.941020219
-1.199834,1.049973503,-0.109191791,-0.621065443
1.497972,1.121229884,-0.083584568,-1.26519855279577
0.499586,0.199358720,0.755044345,1.036440054
0.372367,-0.340695473,0.178887158,1.072409253
0.311203,-0.000079106,-0.576519169,-0.094774138
0.545567,-0.255431323,-0.422877253,0.904930884
0.452658,-0.370998602,2.16983080132776,1.605752519
-0.025794,-0.146901427704502,-0.410073997,0.233828120
-0.325617,-0.156153531449457,-0.614929755077,-0.263386140
0.203416,-0.221852770,-0.147602271,1.243387721
0.128601,-0.170818940,-0.608528020402188,-0.0370282114183059
1.676133,-0.136762779,-0.442082492,0.738883366127731
-0.689085,-0.151702576,-0.480492971,-0.764820380
-0.673910,-0.129172625,-0.627733259979371,-1.912756834
0.974111763938075,0.054761779,0.562991949,-1.179019107
-0.198597,-0.297664641,-0.371663162,0.714594280
-0.371619583294954,0.345718689,-0.531707062,-1.649571416
0.388533,-0.157785535,-0.525305434,-0.588097943
-0.857524,-0.819750999,-0.467689715,0.347405212
-1.856405,0.104352887,-0.371663162,-0.895469935
-1.082708,0.590901176,-0.333252683,0.923876479
0.215122,-0.301174877,-0.62133152530456,0.815355288900369
0.254652,-0.191463154,0.537385082,1.530271189
0.037848,-0.234846224,0.300520223,1.429582803
-0.142788,-0.287995145,0.370939554,1.475348589
0.806460040709491,-0.222282352,-0.531707062,-0.304471637989094
-0.381605771662493,-0.251782626,0.159681918212622,0.428048172263782
1.306665,0.077742335,0.684625015,1.780265143
0.972270,0.687741540,-0.582921153,0.265497902
-1.11337146370607,-0.544544558,-0.550912302,-0.709958764510182
-1.304052,-0.177799093,-0.250030096,0.381748271
0.967195,0.034682666,1.023917699,-0.079014671
-0.103158,-0.363937505,0.146878307,0.597364715
-0.931717941972734,-0.207541263,-0.550912302,-0.636814373
-0.734726,-0.410900303,-0.512501823,0.873959090
-0.089697,-0.771316999,-0.109191791,0.492869701
0.223692,-0.264803867,-0.512501823,-0.150589367
0.737930483499447,-0.166002689,0.223699265,0.577656569
0.590394,0.677136447473134,-0.589322781,-1.043926244
0.524931,0.686809722,-0.595724515499817,-1.014711932
0.732996,-0.879252681,-0.243628468435712,0.594081817
-0.468198,-0.574295968,-0.454886104,1.302082377
-2.711148,0.150427943,0.550188338,-1.232840875
1.425062,-0.298976733,-0.531707062,-0.288633653
-0.110003,-0.151505624,-0.205217989,-1.264449845
-0.744926,-0.885666340,-0.429279236,-0.854629426
0.740105,-0.242456166,-0.608528020402188,-0.249465849
-0.677305,-0.010549902,0.178887158,0.590159641
-0.084509,2.357534389,-0.461287732,-0.511695781
-0.433706,-0.0172512210161253,-0.557313929687203,-1.549619231
-0.499007,-0.155567758,-0.531707062,0.047515235
-0.408959,0.527399709,0.204494025,0.520988955
-0.184534,-0.486048762,-0.480492971,0.185771374
-0.708437,1.897217469,-0.352457922,-1.104250184
-1.84439743809279,-0.198061101,-0.154003899,0.197852238
-1.502775,-0.341208777,3.091682657,-0.545238572
-1.829733,-0.818619480,-0.173209138143544,-1.021210029
-0.384179,-0.116631443,1.638485721,-0.519112050
0.732385,-0.141731222,-0.467689715,0.232170988
-1.111414,-0.187412706,-0.326851055,-0.188490860
1.108479,-0.317224465,-0.608528020402188,-0.541438739
-0.009665,0.026959857,-0.057977701,0.849520876
1.250764,-0.471297921,-0.422877253,0.331138779677439
0.492947,-0.702836291941178,-0.186012750,-0.728984576
-0.545842,-0.157233154,-0.442082492,0.409378731
-1.882971,-0.112367342,-0.486894955,0.679356863
-1.970670,-0.162827871,-0.557313929687203,0.032802947
-1.049814,-0.527002875,0.364537570,0.596416410
-0.959943,-0.680136478,0.761445973,0.383040437798008
1.741447,0.253355419,3.783071281,0.596550128
0.757488,-0.0336197525777512,0.153279935,1.971786223
1.097444,1.471448178,0.210895653400116,-1.064515300
0.178934,-0.161106559,-0.237226485,-1.108048521
0.670503,-0.235137977,-0.346056294,0.063795308
0.395771,-0.219854668320109,-0.333252683,0.174014786
-0.399063,-0.302874336,-0.416475625,-0.759541341
0.405636,0.045509736,0.070057348,-0.354785794
0.573560,0.236871863013435,-0.218021245,-0.609116659
-0.704099,-0.127951753,1.427228086,0.939933698
-0.670001,-0.148836488,0.268511372,0.171417842
-0.381476,-0.152078983065247,-0.339654666,0.042513887
0.875449,-0.003474752,0.281314983692284,1.58808205310127
2.031020,-0.190591743,2.329874346,1.938944039
2.040787,-0.007465437,0.121271439,-0.445784534
1.467411,0.186648828,-0.294842203623205,-1.282044037
0.695175,-0.094302346,-0.448484476,-0.799534754
-2.415240,-0.944593737,-0.531707062,-1.690336441
0.433555,-0.262029262,-0.442082492,0.962542445
1.157393,-0.109842424,0.236502877,-0.432336302
0.695518,0.297248393,-0.371663162,-0.492354158
-1.202278,-0.022344937,-0.531707062,-1.687989370
0.714938,-0.216532737,1.804931249,1.628805963
-0.178556,0.336258295,-0.371663162,0.604285748
0.522765,-0.253688730,-0.589322781,-0.397326572
1.665190,2.558613987,-0.576519169,-1.297565270
1.630445,2.849634370,-0.230824857,-1.213861801
0.935602,-0.186558306,1.075131790,0.628096434542784
-0.102339,-0.161678686,0.537385082,0.378207643
-0.829830,-0.160956954,-0.390868402,0.446230093
-1.46909640118997,-0.215211091,0.415751661,-1.413657976
-3.262978,-0.954704897,-0.198816006,-0.954171771
-0.15181776525811,-0.126891266,-0.179610766,-0.075594434
0.0620927300116409,-0.155963942,-0.518903451,-1.141563848
-1.20480584979087,-0.151044754,-0.416475625,-1.635384719
1.268614,-0.295772934,1.126345881,1.254622945
1.059076,-0.315136105,4.51287074079907,1.926750358
1.530604,-0.476441124,-0.346056294,-0.450292315
1.231314,-0.188197992,0.492572619,2.07821295566167
0.525074,-0.138420952,-0.013165594,0.407384249
0.499150,9.885039821,-0.499698211,-1.305134136
0.668022,-0.202565847,0.767847957,0.898125125
1.627578,-0.111744925,0.646214536,0.050228748
-1.254800,-0.268904897,-0.365261533915373,0.081133005
0.389032,-0.145988773,-0.077182940,1.184405172
0.364970,-0.351282286,0.819061692,1.614247869
0.479758,-0.308561781434144,-0.614929755077,-1.535290776
-0.23936863364442,3.301936435,-0.486894955,-0.119445935
-0.334578,-0.468379535,-0.480492971,-0.208696890771871
0.877980,1.534519813,-0.365261533915373,-0.269124406
-1.132144,-0.385541339,-0.499698211,0.223659368
0.295220,-0.193518849,-0.557313929687203,0.759138085
2.137764,0.008422585,3.14929837529159,2.214355303
0.114446,0.023807521,1.721708308,1.744185159
1.522818,-0.086887448,-0.218021245,-0.054958632
1.704026,-0.661284316,-0.352457922,-1.098737694
0.874527,1.138119296,0.082860960,0.168140476
-0.044894,-0.523755093,-0.62133152530456,-1.542571527
-0.026520,-0.752482262,-0.62133152530456,0.150502266
0.341862,-0.077074292,-0.544510674,1.061609919
2.464107,-0.0967605931637365,-0.570117541,-0.815210121
0.442809556902141,-0.244984842,-0.595724515499817,0.826127989
0.795623,-0.133124653,-0.544510674,1.015308288
0.441996,-0.181847248,-0.595724515499817,-1.411553457
1.420904,0.0497824856575289,-0.390868402,-0.669942610
0.942824,-0.124075659,-0.563715913,-1.016109787
0.564649,1.219310017,-0.608528020402188,-1.373670832
-0.247445,-0.157315103,-0.531707062,0.214442652
2.276633,-0.154364459,-0.454886104,-0.438763505
0.434580,-0.207871502,-0.307645815,0.710767772
-0.968751,-0.306200221,-0.525305434,-1.02386551953021
0.614313,-0.190117472,-0.416475625,-1.262056472
-0.291471,-0.197292298,-0.333252683,-0.054035980
-0.056669,-0.254420769,-0.243628468435712,-0.069056507
-0.380715,-0.149527323,-0.499698211,-0.329408749
-0.413120,-0.275988802,-0.62133152530456,-1.390418546
-1.371689,-0.152792445,-0.557313929687203,-1.49994334219897
-0.313005,-0.516142469,-0.557313929687203,-0.171501404
-1.674759,3.229295134,-0.525305434,-1.609434729
-0.047441,-0.065027251,-0.570117541,-1.103667633
-0.909167,-0.200667048,1.433629714,1.186109911
-0.235025,0.153613918,-0.480492971,-0.650551924267985
-1.477054,0.125802842,-0.480492971,-0.155299080
-1.818330,-0.16549222663597,3.39256450625091,1.316282381
-0.699647,-0.498381882,-0.173209138143544,0.396019644254124
-0.387011,-0.793298583,-0.429279236,0.953868551
-1.097578,-0.943316958,-0.237226485,-0.897277916
-1.451673,-0.952958736,-0.544510674,0.964119552
-1.415615,-0.299479029,-0.518903451,0.338227265
1.664279,-0.077347546,2.656363774,-0.867307928
-0.249136,-0.506432126,-0.595724515499817,-0.982377845
2.146229,-1.215952396,-0.243628468435712,-0.855057974720736
-1.824158,-0.296217838,-0.378065145,-1.559108146
-0.330858,-0.313921034,0.300520223,0.785759104
0.064679,0.357159130,0.178887158,-1.334002443
0.51350543010726,0.034647657,0.313323835,1.527808237
-1.048691,-0.896226492445245,-0.595724515499817,-1.187308763
-0.925660,0.051368953,-0.397270385,-0.701186634
1.822954,-0.136347992,-0.50610019449971,-1.321816366
1.688174,0.184423989,0.018843257628286,-0.461230950
0.877208750890101,-0.163504278,-0.243628468435712,-0.093275336
-0.408265,2.245431536,-0.480492971,0.567443537
1.147235,-0.459170076,-0.186012750,-0.106288467
0.092168,-0.415775037,-0.525305434,0.093621374
1.400327,-0.370769775,-0.326851055,2.236338938
0.961243,-0.057604391,-0.173209138143544,-0.667896012
1.074359,-0.244973639,-0.038772461,1.170851190
0.952095,-0.047662769,-0.435680864207542,-1.024253702
0.810367,-0.199987524,-0.531707062,0.531896991616595
0.911688,-0.155212479,-0.557313929687203,0.453788959
-0.687500,-0.207219748,1.612878498,0.934943523
-0.494756154673527,-0.721022599,-0.544510674,-0.434739999
-1.076658,-0.236241117,-0.467689715,-0.840005859
-0.507938980058189,-0.181331877,-0.512501823,1.074266427
-0.854066,-0.157575763,-0.282038948,-1.280770496
-1.140147,-0.195392347,0.831865303,1.123117466
-0.967197971014018,0.379356455,1.721708308,-0.115265604
-0.860956,-0.159196533,-0.211619617,-1.583913733
-0.440717,0.036285937,-0.346056294,1.749574497
-0.388028,-0.291306205,-0.493296583,-1.434824405
-0.561796,-0.188054601,-0.570117541,-1.380435112
-1.551181,-0.160951457,-0.563715913,-1.666234332
-0.114021,-0.154703671,-0.474091343,0.145694155
-0.306305,-0.222529972,-0.589322781,-0.048007061
0.161680,-0.202047413,-0.442082492,-0.670846241
-0.565885,-0.129611968,-0.403672013,1.034105047
-0.515516,-0.546654995648727,-0.288440576,0.365140768
-0.502967,-0.075181238,-0.282038948,0.225629835
-1.039492,-0.454758461891068,0.0892625879204542,0.544640037
-1.818037,-0.635297146,0.006039646,-1.055027476
0.821009,-0.349273383,-0.544510674,-0.362321979
0.410191,0.039217295,1.728109936,1.469233244
0.417522,-0.300787745,-0.627733259979371,-1.691315850
-0.704707,1.03632091993901,-0.486894955,-1.18394951112346
-0.359895,-0.087233760,-0.333252683,-0.521366902
-1.525286,-0.130679792,-0.531707062,-0.842573792
1.02240475953559,-0.203252704,0.466965751,1.065821970
-0.674584,-0.593110789,0.018843257628286,-1.227321217
-0.180194,-0.283588894,-0.563715913,-0.931897957
0.636734,0.855310804,-0.538108690,-0.070958183
-1.299804,-0.096556794,-0.525305434,-0.403833243822609
-0.016008,-0.666014835,-0.544510674,-0.447313216
-0.170290,-0.248346111,-0.147602271,0.792481299
-0.108623537795341,-0.108304165,0.716633866,1.238095850
-0.108145,-0.327844039,-0.614929755077,0.083448133
0.149448,-0.148195861,-0.550912302,0.423497366
-0.106037,0.0130674210538852,2.803603707,0.131235177
0.545910,-0.252934693,5.293883935,1.204003868
1.838081,1.391232466,0.198092397,-1.273305670
0.201479,-0.152505536,-0.550912302,-0.420579794
0.304702,-0.148996574,-0.614929755077,-0.520865346
0.560499,-0.161948560,-0.563715913,-1.002763229
0.127568,-0.365445542,-0.314047443,0.965878824
0.123632,-0.238246080,-0.269235336,1.246399309
0.292211,0.829398027702751,-0.154003899,1.693279797
0.920194,-0.014924300,-0.582921153,-1.119227887
0.651954,1.938659887,-0.538108690,-0.972557788
1.105303,-0.196373031,-0.070781312,-0.0897229382163552
-0.083019,-0.109429821,0.684625015,0.559897837
-1.581478,-0.267792660,-0.50610019449971,-0.012037053
0.103057,0.450195832,-0.544510674,-0.605625438
0.654057,-0.238170020,-0.595724515499817,-1.385036498
0.809953,-0.267373864,-0.486894955,0.305502701
-0.974830,-0.195544737750197,-0.371663162,-0.434309367
-1.351973,-0.338456215,0.332529074,0.757216084
-0.921372,0.024699722,-0.346056294,-1.502862458
-0.735370,0.381299360,-0.499698211,-0.201082127
-0.329823,-0.475889069,-0.275636964,-1.821750711
-0.227863,-0.121992821,1.472040193,0.453775257
1.916520,-0.152060200578572,-0.474091343,-0.741558147617734
-0.168374,0.085463446,-0.525305434,0.499786386
0.799647,0.036271866,0.300520223,0.841545160973985
0.494246352541292,0.071703013,-0.365261533915373,0.475471829
-0.821204,-0.241263981,-0.570117541,-0.510147290
2.03314968216439,-0.343729374,-0.371663162,0.615231603
0.848775,-0.471221233,-0.563715913,-0.701784039
-0.531753,1.012318558,-0.057977701,0.718696123
0.172966,-0.423786741,-0.480492971,-0.309795381
-0.528577,0.204569690,-0.057977701,-0.858220047
2.012559,-0.728330487,-0.186012750,-0.235091173
1.603728,-0.085330973,0.082860960,-0.927695805303649
1.548838,-0.029998010,2.445105783,0.482108781
0.949720,-0.951233525,-0.307645815,-0.595318235
0.215603,0.151540844,0.895882650,2.178508538
0.457194,-0.254290572,1.491245433,0.721525699
-1.445906,-0.189058693,-0.230824857,0.592648661
-0.493205,0.091922688,-0.454886104,-0.332288206
-0.101620,-0.387226194,-0.314047443,1.61735137353699
-0.319855,0.453043988,-0.243628468435712,-0.619507396
0.454695,-0.526448068,-0.166807510,-0.528845186
0.533224,-0.511902980,-0.589322781,-2.071079683
1.057605,-0.069010180,0.639812908,-0.361525836
0.735009,-0.147428812,-0.493296583,-1.529903036
0.143570,-0.416016747,-0.403672013,-0.991399009
1.070312,-0.412127416539272,-0.602126250174628,-0.521141206
0.012265,-0.546924244,-0.589322781,-1.712352161
2.492121,-0.402826384,-0.538108690,-1.634800884
-1.986114,-0.410855849,-0.397270385,0.222345269
0.094664,-0.241031635,-0.474091343,0.897479133
-1.335527,-0.498889435,0.076458976,1.454588027
-1.915453,-0.373366040,-0.480492971,-0.376311031
0.541208,-0.191442915,-0.230824857,0.917916461
0.275911,-0.237416440,-0.352457922,0.852711463
0.465545,-0.151577739,-0.602126250174628,0.394784487567216
0.248552,0.435357237,-0.301244187,-0.407157317
-0.703010,0.0331906461883284,-0.147602271,-2.460426633
-0.712809,-0.332667635,-0.000361982,0.871785639
-0.929787,-0.097169153,-0.282038948,0.216550083
-1.107520,-0.056195503,-0.397270385,-0.401844040
-0.005100,-0.191880111,-0.480492971,-0.864570896
-0.647156,-0.227929866,-0.550912302,-0.423969655
-0.876038,-0.239392707,0.940695113,1.026337595
0.546336,-0.154164826,-0.339654666,-1.321754461
0.722542,-0.25495778906871,-0.410073997,0.899924470
0.216564,-0.396342475,0.102066199,1.197443421
-0.170823,-0.240529786,0.044450481,1.01151515860971
-0.255846,-0.429677319,-0.403672013,0.595139046
-0.199170,-0.154668530,-0.576519169,-0.854955269
0.756563,-0.095075528,-0.230824857,0.494108103
-0.266764605435791,0.247520130,-0.614929755077,-1.686967461
0.204927,-0.161542113,-0.262833708,1.694011305
0.714035,-0.625189117,1.171157988,0.147106511
-2.544617,-0.396081644,0.038048497,0.579440018
0.285645973308248,-0.323214741,-0.589322781,0.392991321
-1.689353,0.328752793143277,0.364537570,-1.20682513877873
1.791098,-0.211447185,-0.154003899,-1.179785612
0.755692,0.377971087,-0.512501823,-0.450847968867323
1.093103,-0.180034953,1.587271630,2.29198263070722
1.551543,-1.19420401796333,-0.474091343,0.170128476
1.304110,-0.195374843,5.953264065,2.519789571
0.453232,1.388042988,0.262109744,0.089148995
2.15874317290323,0.171647374,1.996983645,1.995051954
-2.004631,-0.547060434,-0.570117541,-0.962753829370595
0.212549,-0.215911574,-0.262833708,0.126265883
0.197826,-0.3595151129571,-0.544510674,-1.050309714
0.572398,-0.301820977,-0.570117541,-0.262379439565368
0.116465,-0.668514750,-0.589322781,-0.110644682
0.800450,-0.223934934,-0.461287732,0.489188489
1.379630,-0.266054307,-0.121995047,-0.611487948
2.018530,1.064818149,-0.307645815,-1.373922616
-2.446533,-0.635182916961691,-0.499698211,-0.267967394
0.153731,-0.395597651,0.178887158,1.071482638
0.310769955153565,-0.182106606,-0.378065145,0.751844634
2.951171,-0.410083036,-0.262833708,-0.021675856
-0.072208,-0.203587533,-0.416475625,0.529752936
-0.441732,-0.176422307137254,-0.531707062,0.310237380360702
-0.161622,0.266965478,-0.390868402,-1.670662088
-0.111939,-0.076705160,-0.403672013,1.69949419744063
0.508922,-0.633342468,-0.50610019449971,0.730943983202699
-0.029011,-0.140601673,-0.448484476,-0.750606745
-0.199691,-0.698485003,-0.461287732,1.027599437
0.389439,0.301166133329801,0.947096741,-1.141114570
-0.574762,-0.053329130,-0.525305434,-1.132169832
0.053795,-0.387914951,-0.557313929687203,-1.61369202076382
-0.065107,2.720079780,-0.314047443,-0.534571581
-1.270810,-1.146934116,-0.512501823,0.987897425762039
-0.81357446477793,0.331134138,-0.250030096,-0.224159404
-0.244980,-0.660222161,0.863874155,1.22080328274542
-1.167461,-0.029841975,1.459236581,0.495164180
0.619671,-0.203955602,-0.346056294,0.534920309
1.258805,0.374096587,0.863874155,0.295114047
0.278193,0.128140860,0.134074695,1.730215815
0.592690,-0.497047606,-0.077182940,1.611223560
0.771603,-0.229815998240306,-0.339654666,1.534909200
-0.314994,-0.489528597,-0.070781312,0.781255155893192
-1.733675,-0.599647318,-0.512501823,-0.835398042
1.778214,-0.435126821,-0.582921153,-1.979623727
-2.749096,-0.168136819,-0.595724515499817,-1.408350928
0.791925,-0.488864312,-0.454886104,1.550932423
-0.497820699251925,-1.078584581,-0.416475625,0.429106918
-0.430395,-0.114320656,-0.538108690,1.080621159
1.05320871624604,-0.266050860111263,-0.544510674,-0.690431053575897
1.320083,-0.420999556,0.044450481,1.377347223
-0.696313,4.159591986,-0.186012750,-0.445636214
-0.541551,0.312952216,-0.563715913,-0.578121659
0.388860,-0.125810866,-0.467689715,1.267342399
0.837597,-0.298751852,-0.614929755077,0.503029039
-0.541987,-0.299008729,-0.186012750,-0.508676397
-0.083920,-0.169415736,0.486170991,1.460949018
0.135914,-0.328898445,-0.576519169,0.239521551


EQ_readme.txt

# start R Console on macOS

# get a working directory
getwd()

# change the working directory
setwd("/Users/yoshi/Downloads/")

dat <- read.csv('EQ_z.csv')

head(dat)
#         SR InstAUMNetFlow       Views PassedScreens
#1  0.168106    -0.07055196 -0.33920655   -1.70145662
#2  0.155648    -0.09568507 -0.56343473   -1.66043722
#3 -0.335729    -0.13467625  1.28164263   -0.01438553
#4  1.249691    -0.14978894 -0.05731972   -0.77510918
#5  0.502996    -0.13648457 -0.49296294    0.27428721
#6  0.451817    -0.04294418  0.11565621   -0.21678547

# All the data are expressed in z-scores.
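
# Note (an illustrative addition, not in the original transcript):
# a z-score is (x - mean(x)) / sd(x), so each column has mean 0 and SD 1.
# If the raw data were in a hypothetical data frame dat_raw, base R could
# standardize it with:  dat_z <- as.data.frame(scale(dat_raw))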




########## 1.1 Regression Analysis (SR ~ InstAUMNetFlow)

reg_InstAUMNetFlow <- lm(SR~InstAUMNetFlow,data=dat)
summary(reg_InstAUMNetFlow)

#Call:
#lm(formula = SR ~ InstAUMNetFlow, data = dat)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-3.2166 -0.6249 -0.0146  0.6306  2.9913
#
#Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
#(Intercept)     0.02248    0.04290   0.524   0.6006
#InstAUMNetFlow  0.51079    0.22452   2.275   0.0233 *
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 0.9966 on 564 degrees of freedom
#Multiple R-squared:  0.009094, Adjusted R-squared:  0.007337
#F-statistic: 5.176 on 1 and 564 DF,  p-value: 0.02328


#Multiple R-squared:  0.009094
# InstAUMNetFlow does not explain SR (Sharpe Ratio) very much.

plot(dat$InstAUMNetFlow,dat$SR,xlab='Inst AUM Net Flow 1Y (%)',ylab='Sharpe Ratio (USD, 1Y)')
abline(reg_InstAUMNetFlow)




########## 1.2 Regression Analysis (SR ~ Views)

reg_Views <- lm(SR~Views,data=dat)
summary(reg_Views)

#Call:
#lm(formula = SR ~ Views, data = dat)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-3.2360 -0.5970 -0.0080  0.6647  2.9892
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)  
#(Intercept) 0.001304   0.041664   0.031 0.975047  
#Views       0.140832   0.041670   3.380 0.000776 ***
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 0.9912 on 564 degrees of freedom
#Multiple R-squared:  0.01985, Adjusted R-squared:  0.01811
#F-statistic: 11.42 on 1 and 564 DF,  p-value: 0.0007759


#Multiple R-squared:  0.01985
# Views does not explain SR (Sharpe Ratio) very much.

plot(dat$Views,dat$SR,xlab='Views 1Y (%)',ylab='Sharpe Ratio (USD, 1Y)')
abline(reg_Views)




########## 1.3 Regression Analysis (SR ~ PassedScreens)

reg_PassedScreens <- lm(SR~PassedScreens,data=dat)
summary(reg_PassedScreens)

#Call:
#lm(formula = SR ~ PassedScreens, data = dat)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-3.2178 -0.5859 -0.0177  0.6604  2.9539
#
#Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
#(Intercept)   0.001163   0.041945   0.028   0.9779
#PassedScreens 0.081500   0.042048   1.938   0.0531 .
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 0.9979 on 564 degrees of freedom
#Multiple R-squared:  0.006617, Adjusted R-squared:  0.004856
#F-statistic: 3.757 on 1 and 564 DF,  p-value: 0.05309

#Multiple R-squared:  0.006617
# PassedScreens does not explain SR (Sharpe Ratio) very much.

plot(dat$PassedScreens,dat$SR,xlab='PassedScreens 1Y (%)',ylab='Sharpe Ratio (USD, 1Y)')
abline(reg_PassedScreens)




########## 2 Multiple Regression Analysis (SR ~ InstAUMNetFlow + Views + PassedScreens)


##### multiple regression (with all explanatory variables)

reg_multiple <- lm(SR~InstAUMNetFlow+Views+PassedScreens,data=dat)
summary(reg_multiple)

#Call:
#lm(formula = SR ~ InstAUMNetFlow + Views + PassedScreens, data = dat)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-3.1985 -0.5797 -0.0276  0.6324  3.0271
#
#Coefficients:
#               Estimate Std. Error t value Pr(>|t|)  
#(Intercept)     0.02399    0.04253   0.564  0.57288  
#InstAUMNetFlow  0.55353    0.22643   2.445  0.01481 *
#Views           0.11937    0.04466   2.673  0.00773 **
#PassedScreens   0.05587    0.04542   1.230  0.21916  
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 0.9872 on 562 degrees of freedom
#Multiple R-squared:  0.03122, Adjusted R-squared:  0.02605
#F-statistic: 6.036 on 3 and 562 DF,  p-value: 0.0004748

#Multiple R-squared:  0.03122
# R2 is still very small.




##### stepwise regression

reg0 <- lm(SR~1,dat)

step(reg0,direction='both', scope=list(upper=~InstAUMNetFlow+Views+PassedScreens))

#Start:  AIC=1.37
#SR ~ 1
#
#                 Df Sum of Sq    RSS     AIC
#+ Views           1   11.2227 554.15 -7.9799
#+ InstAUMNetFlow  1    5.1412 560.23 -1.8022
#+ PassedScreens   1    3.7411 561.63 -0.3894
#<none>                        565.37  1.3683
#
#Step:  AIC=-7.98
#SR ~ Views
#
#                 Df Sum of Sq    RSS      AIC
#+ InstAUMNetFlow  1    4.9513 549.19 -11.0598
#<none>                        554.15  -7.9799
#+ PassedScreens   1    0.6017 553.54  -6.5948
#- Views           1   11.2227 565.37   1.3683
#
#Step:  AIC=-11.06
#SR ~ Views + InstAUMNetFlow
#
#                 Df Sum of Sq    RSS      AIC
#<none>                        549.19 -11.0598
#+ PassedScreens   1    1.4748 547.72 -10.5818
#- InstAUMNetFlow  1    4.9513 554.15  -7.9799
#- Views           1   11.0328 560.23  -1.8022
#
#Call:
#lm(formula = SR ~ Views + InstAUMNetFlow, data = dat)
#
#Coefficients:
#   (Intercept)           Views  InstAUMNetFlow
#       0.02199         0.13965         0.50131

#As a result of stepwise regression, Views is selected first, InstAUMNetFlow is selected second, and then PassedScreens is rejected.


#The correlation matrix below shows why: PassedScreens is the variable most correlated with Views (0.36).
#Once Views is in the model, PassedScreens adds only 1.47 to the regression sum of squares (see the step output above), which is why it is rejected.
#
cor(dat)
#                       SR InstAUMNetFlow      Views PassedScreens
#SR             1.00000000     0.09536016 0.14089073    0.08134542
#InstAUMNetFlow 0.09536016     1.00000000 0.01267051   -0.17021810
#Views          0.14089073     0.01267051 1.00000000    0.36147622
#PassedScreens  0.08134542    -0.17021810 0.36147622    1.00000000



##### multiple regression (after removing PassedScreens)

reg_multiple2 <- lm(SR~InstAUMNetFlow+Views,data=dat)
summary(reg_multiple2)

#Call:
#lm(formula = SR ~ InstAUMNetFlow + Views, data = dat)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-3.1469 -0.6066  0.0004  0.6293  3.0273
#
#Coefficients:
#               Estimate Std. Error t value Pr(>|t|)  
#(Intercept)     0.02199    0.04252   0.517 0.605257  
#InstAUMNetFlow  0.50131    0.22251   2.253 0.024646 *
#Views           0.13965    0.04152   3.363 0.000823 ***
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 0.9877 on 563 degrees of freedom
#Multiple R-squared:  0.02861, Adjusted R-squared:  0.02516
#F-statistic:  8.29 on 2 and 563 DF,  p-value: 0.0002829



########## stepwise regression (explained variable: InstAUMNetFlow)

reg0 <- lm(InstAUMNetFlow~1,dat)
step(reg0,direction='both', scope=list(upper=~SR+Views+PassedScreens))

#Start:  AIC=-382.15
#InstAUMNetFlow ~ 1
#
#                Df Sum of Sq    RSS     AIC
#+ SR             1   1.30504 61.456 -385.74
#<none>                       62.761 -382.15
#+ Views          1   0.10122 62.659 -380.58
#+ PassedScreens  1   0.00000 62.761 -380.15
#
#Step:  AIC=-385.74
#InstAUMNetFlow ~ SR
#
#                Df Sum of Sq    RSS     AIC
#<none>                       61.456 -385.74
#+ Views          1   0.09613 61.359 -384.15
#+ PassedScreens  1   0.00373 61.452 -383.75
#- SR             1   1.30504 62.761 -382.15
#
#Call:
#lm(formula = InstAUMNetFlow ~ SR, data = dat)
#
#Coefficients:
#(Intercept)           SR
#   -0.05398      0.07050


reg_multiple3 <- lm(InstAUMNetFlow~SR,data=dat)
summary(reg_multiple3)

#Call:
#lm(formula = InstAUMNetFlow ~ SR, data = dat)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-0.7203 -0.1332 -0.0501  0.0531  5.9795
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.05398    0.02958  -1.825   0.0692 .
#SR           0.07050    0.02977   2.368   0.0186 *
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 0.4825 on 264 degrees of freedom
#Multiple R-squared:  0.02079, Adjusted R-squared:  0.01708
#F-statistic: 5.606 on 1 and 264 DF,  p-value: 0.01862

plot(dat$SR,dat$InstAUMNetFlow,xlab='Sharpe Ratio (USD, 1Y)',ylab='Inst AUM Net Flow 1Y (%)')
abline(reg_multiple3)







Saturday, July 13, 2019

Python: Linear Regression, Logistic Regression, and Regularized Logistic Regression


0_runme.txt

##### Open Terminal on Mac OS

# Run the following commands on Terminal.

#Current working directory
pwd
#For instance, your MacOS environment goes like this:
#/Users/xxx/


#Change your working directory
#cd /Users/xxx/Downloads


which python3
#For instance, my MacOS environment goes like this:
#/Library/Frameworks/Python.framework/Versions/3.7/bin/python3


which pip3
#For instance, my MacOS environment goes like this:
#/Library/Frameworks/Python.framework/Versions/3.7/bin/pip3


#Run after connecting to the Internet
pip3 install matplotlib
pip3 install sklearn
pip3 install dtreeviz
pip3 install IPython

pip3 install pandas
pip3 install scipy



python3 -V
#For instance, my MacOS environment goes like this:
#Python 3.7.4


#Starting Python 3
#python3
#You do not use this since you're running .py scripts from Terminal, rather than running your scripts in Python IDLE.



#Download files:
#ex1data1.txt
#ex1data2.txt
#ex2data1.txt
#ex2data2.txt
#
#From:
#https://github.com/LilianYe/Andrew-Ng-Machine-Learning-Programming-solutions-/find/master


#Running Python py scripts
python3 1a_liner_regression_w_one_variable.py
python3 1b_liner_regression_w_multiple_variables.py

python3 2.1_logistic_regression_or_classification.py
python3 2.2_regularized_logistic_regression.py



1a_liner_regression_w_one_variable.py
########## Python Implementation of Andrew Ng’s Machine Learning Course (Part 1)
########## Linear Regression with One Variable
#
# Reference:
#https://medium.com/analytics-vidhya/python-implementation-of-andrew-ngs-machine-learning-course-part-1-6b8dd1c73d80
#
#Here we will implement linear regression with one variable to predict profits for a food truck.
#
#ex1data1.txt
#contains the dataset for our linear regression exercise.
#column data
#1      population of a city
#2      profit of a food truck in that city


# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



###Reading and Plotting the data
data = pd.read_csv('ex1data1.txt', header = None) #read from dataset
X = data.iloc[:,0] # read first column
y = data.iloc[:,1] # read second column
m = len(y) # number of training example
data.head() # view first few rows of the data

plt.scatter(X, y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.savefig('fig_1a.1.png')    # Save an image file
plt.show()


###Adding the intercept term
X = X[:,np.newaxis]    #(*1)
y = y[:,np.newaxis]    #(*1)
theta = np.zeros([2,1])    #set one initial parameter theta to 0
iterations = 1500
alpha = 0.01    # set another initial parameter, the learning rate alpha, to 0.01
ones = np.ones((m,1))
X = np.hstack((ones, X)) # adding the intercept term

#(*1)
#Note on np.newaxis:
# When you read data into X and y, you will observe that they are rank 1 arrays.
#A rank 1 array has a shape of (m, ), whereas a rank 2 array has a shape of (m, 1).
#When operating on arrays, it's good to convert rank 1 arrays to rank 2 arrays, because rank 1 arrays often give unexpected results.
#To convert a rank 1 array to a rank 2 array, use someArray[:,np.newaxis].
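#
#A quick illustration (my own addition, not part of the original exercise):
#  a = np.array([1, 2, 3])    # a.shape == (3,)   -> rank 1
#  b = a[:, np.newaxis]       # b.shape == (3, 1) -> rank 2 column vector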


###Computing the cost
def computeCost(X, y, theta):
    temp = np.dot(X, theta) - y
    return np.sum(np.power(temp, 2)) / (2*m)
J = computeCost(X, y, theta)
print(J)
#You should expect to see a cost of 32.07. More precisely, 32.072733877455676.


###Finding the optimal parameters using Gradient Descent
def gradientDescent(X, y, theta, alpha, iterations):
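    # Vectorized batch gradient descent update: theta := theta - (alpha/m) * X^T (X theta - y)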
    for _ in range(iterations):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - (alpha/m) * temp
    return theta
theta = gradientDescent(X, y, theta, alpha, iterations)
print(theta)
#Expected theta values [-3.6303, 1.1664]
#Technically,
#[[-3.63029144]
# [ 1.16636235]]
#
# So, for instance, the first row of actual data goes like this.
# Population (10,000)    Profit ($10,000)
# 6.1101 17.592
#
# An estimated profit for this population is
# -3.6303 * 1 + 1.1664 * 6.1101 = 3.49652064

#We now have the optimized value of theta . Use this value in the above cost function.

J = computeCost(X, y, theta)
print(J)
#It should give you a value of 4.483 (4.483388256587726) which is much better than 32.07



### Plot showing the best fit line
plt.scatter(X[:,1], y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.plot(X[:,1], np.dot(X, theta))
plt.savefig('fig_1a.2.png')    # Save an image file
plt.show()


1b_liner_regression_w_multiple_variables.py
########## Python Implementation of Andrew Ng’s Machine Learning Course (Part 1)
########## Linear Regression with multiple variables
#(also called Multivariate Linear Regression)


# ex1data2.txt
# a training set of housing prices in Portland, Oregon
# column data
# 1      size of the house (in square feet)
# 2      number of bedrooms
# 3      price of the house


### import
import numpy as np
import pandas as pd


### data loading
data = pd.read_csv('ex1data2.txt', sep = ',', header = None)
X = data.iloc[:,0:2] # read first two columns into X
y = data.iloc[:,2] # read the third column into y
m = len(y) # no. of training samples
data.head()


### Feature Normalization
X = (X - np.mean(X))/np.std(X)


###Adding the intercept term and initializing parameters
ones = np.ones((m,1))
X = np.hstack((ones, X))
alpha = 0.01
num_iters = 400
theta = np.zeros((3,1))
y = y[:,np.newaxis]


###Computing the cost
def computeCostMulti(X, y, theta):
    temp = np.dot(X, theta) - y
    return np.sum(np.power(temp, 2)) / (2*m)
J = computeCostMulti(X, y, theta)
print(J)
#You should expect to see a cost of 65591548106.45744.


###Finding the optimal parameters using Gradient Descent
def gradientDescentMulti(X, y, theta, alpha, iterations):
    m = len(y)
    for _ in range(iterations):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - (alpha/m) * temp
    return theta
theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
print(theta)
# your optimal parameters will be [[334302.06399328],[ 99411.44947359], [3267.01285407]]
# For instance, if you look at the sample data in the first row,
# 2104 3 399900
# first column's average: 2000.68085106383
# first column's SD: 794.70235353389
# second column's average: 3.17021276595745
# second column's SD: 0.7609818867801
#
# So, the first column (2104-2000.68085106383)/794.70235353389 = 0.1300 (SD)
# Then the second column (3-3.17021276595745)/0.7609818867801 = -0.2237 (SD)

# 334302.06399328 * 1 + 99411.44947359 * 0.1300 + 3267.01285407 * (-0.2237)
# = 346494.721649391 (estimated data) ~ 399,900 (actual data)


#We now have the optimized value of theta . Use this value in the above cost function.
J = computeCostMulti(X, y, theta)
print(J)
#This should give you a value of 2105448288.6292474 which is much better than 65591548106.45744



2.1_logistic_regression_or_classification.py
########## Python Implementation of Andrew Ng’s Machine Learning Course (Part 2.1)
########## Logistic Regression or Classification (part 2.1)
#
# Reference:
# https://medium.com/analytics-vidhya/python-implementation-of-andrew-ngs-machine-learning-course-part-2-1-1a666f049ad6
#
#


##### Logistic Regression


### import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize as opt    # more on this later



### data
#ex2data1.txt
#column data
#1      Exam 1 score
#2      Exam 2 score
#3      0 (fail) or 1 (pass)

data = pd.read_csv('ex2data1.txt', header = None)
X = data.iloc[:,:-1]
y = data.iloc[:,2]
data.head()


### plotting

mask = y == 1
adm = plt.scatter(X[mask][0].values, X[mask][1].values)
not_adm = plt.scatter(X[~mask][0].values, X[~mask][1].values)
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend((adm, not_adm), ('Admitted', 'Not admitted'))
plt.savefig('fig_2.1.1.png')    # Save an image file
plt.show()


### Implementation

## Sigmoid Function

def sigmoid(x):
  return 1/(1+np.exp(-x))

## Cost Function
def costFunction(theta, X, y):
    J = (-1/m) * np.sum(np.multiply(y, np.log(sigmoid(X @ theta)))
        + np.multiply((1-y), np.log(1 - sigmoid(X @ theta))))
    return J


## Gradient Function
def gradient(theta, X, y):
    return ((1/m) * X.T @ (sigmoid(X @ theta) - y))

(m, n) = X.shape
X = np.hstack((np.ones((m,1)), X))
y = y[:, np.newaxis]
theta = np.zeros((n+1,1)) # initializing theta with all zeros
J = costFunction(theta, X, y)
print(J)
# This should give us a value of 0.693 for J.
# More precisely, 0.6931471805599453


### Learning parameters using fmin_tnc
temp = opt.fmin_tnc(func = costFunction,
                    x0 = theta.flatten(),fprime = gradient,
                    args = (X, y.flatten()))
#the output of above function is a tuple whose first element #contains the optimized values of theta
theta_optimized = temp[0]
print(theta_optimized)
# The above code should give [-25.16131862, 0.20623159, 0.20147149].

J = costFunction(theta_optimized[:,np.newaxis], X, y)
print(J)
#You should see a value of 0.203 (0.20349770158947486).
# Compare this with the cost 0.693 obtained using initial theta.


### Plotting Decision Boundary (Optional)
plot_x = [np.min(X[:,1])-2, np.max(X[:,1])+2]    # span the Exam 1 score axis
plot_y = -1/theta_optimized[2]*(theta_optimized[0]
          + np.dot(theta_optimized[1],plot_x))
mask = y.flatten() == 1
adm = plt.scatter(X[mask][:,1], X[mask][:,2])
not_adm = plt.scatter(X[~mask][:,1], X[~mask][:,2])
decision_boun = plt.plot(plot_x, plot_y)
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend((adm, not_adm), ('Admitted', 'Not admitted'))
plt.savefig('fig_2.1.2.png')    # Save an image file
plt.show()


def accuracy(X, y, theta, cutoff):
    pred = sigmoid(np.dot(X, theta)) >= cutoff
    acc = np.mean(pred == y)
    print(acc * 100)
accuracy(X, y.flatten(), theta_optimized, 0.5)

# This should give us an accuracy score of 89%. Hmm… not bad.


2.2_regularized_logistic_regression.py
########## Python Implementation of Andrew Ng’s Machine Learning Course (Part 2.2)
########## Regularized Logistic Regression (part 2.2)
#
# Reference:
# https://medium.com/analytics-vidhya/python-implementation-of-andrew-ngs-machine-learning-course-part-2-2-dceff1a12a12
#
#


##### Regularized logistic regression


### import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize as opt    # more on this later


### data
#ex2data2.txt
#column data
#1      test 1 result
#2      test 2 result
#3      0 (rejected) or 1 (accepted)

data = pd.read_csv('ex2data2.txt', header = None)
X = data.iloc[:,:-1]
y = data.iloc[:,2]
data.head()


### plotting
mask = y == 1
passed = plt.scatter(X[mask][0].values, X[mask][1].values)
failed = plt.scatter(X[~mask][0].values, X[~mask][1].values)
plt.xlabel('Microchip Test1')
plt.ylabel('Microchip Test2')
plt.legend((passed, failed), ('Passed', 'Failed'))
plt.savefig('fig_2.2.1.png')    # Save an image file
plt.show()


### Feature mapping
def mapFeature(X1, X2):
    # map the two features into all polynomial terms of X1 and X2 up to the 6th degree
    degree = 6
    out = np.ones(X1.shape[0])[:,np.newaxis]
    for i in range(1, degree+1):
        for j in range(i+1):
            out = np.hstack((out, np.multiply(np.power(X1, i-j), np.power(X2, j))[:,np.newaxis]))
    return out
X = mapFeature(X.iloc[:,0], X.iloc[:,1])


### Implementation

## Sigmoid Function
def sigmoid(x):
  return 1/(1+np.exp(-x))


## Cost Function
def lrCostFunction(theta_t, X_t, y_t, lambda_t):
    m = len(y_t)
    J = (-1/m) * (y_t.T @ np.log(sigmoid(X_t @ theta_t)) + (1 - y_t.T) @ np.log(1 - sigmoid(X_t @ theta_t)))
    reg = (lambda_t/(2*m)) * (theta_t[1:].T @ theta_t[1:])
    J = J + reg
    return J

## Gradient Function
def lrGradientDescent(theta, X, y, lambda_t):
    m = len(y)
    grad = (1/m) * X.T @ (sigmoid(X @ theta) - y)
    grad[1:] = grad[1:] + (lambda_t / m) * theta[1:]  # regularize all terms except the intercept
    return grad

(m, n) = X.shape
y = y[:, np.newaxis]
theta = np.zeros((n,1))
lmbda = 1
J = lrCostFunction(theta, X, y, lmbda)
print(J)
#This gives us a value of 0.69314718.


## Learning parameters using fmin_tnc
output = opt.fmin_tnc(func = lrCostFunction, x0 = theta.flatten(), fprime = lrGradientDescent, \
                         args = (X, y.flatten(), lmbda))
theta = output[0]
print(theta) # theta contains the optimized values


## Accuracy of model
pred = sigmoid(np.dot(X, theta)) >= 0.5
print(np.mean(pred == y.flatten()) * 100)
# This gives our model accuracy as 83.05% (83.05084745762711).


## Plotting Decision Boundary (optional)
u = np.linspace(-1, 1.5, 50)
v = np.linspace(-1, 1.5, 50)
z = np.zeros((len(u), len(v)))
def mapFeatureForPlotting(X1, X2):
    degree = 6
    out = np.ones(1)
    for i in range(1, degree+1):
        for j in range(i+1):
            out = np.hstack((out, np.multiply(np.power(X1, i-j), np.power(X2, j))))
    return out
for i in range(len(u)):
    for j in range(len(v)):
        z[i,j] = np.dot(mapFeatureForPlotting(u[i], v[j]), theta)
mask = y.flatten() == 1
X = data.iloc[:,:-1]
passed = plt.scatter(X[mask][0], X[mask][1])
failed = plt.scatter(X[~mask][0], X[~mask][1])
plt.contour(u, v, z.T, levels=[0])    # decision boundary: the zero contour of z
plt.xlabel('Microchip Test1')
plt.ylabel('Microchip Test2')
plt.legend((passed, failed), ('Passed', 'Failed'))
plt.savefig('fig_2.2.2.png')    # Save an image file
plt.show()



data

ex1data1.txt
6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
5.734,3.1551
8.4084,7.2258
5.6407,0.71618
5.3794,3.5129
6.3654,5.3048
5.1301,0.56077
6.4296,3.6518
7.0708,5.3893
6.1891,3.1386
20.27,21.767
5.4901,4.263
6.3261,5.1875
5.5649,3.0825
18.945,22.638
12.828,13.501
10.957,7.0467
13.176,14.692
22.203,24.147
5.2524,-1.22
6.5894,5.9966
9.2482,12.134
5.8918,1.8495
8.2111,6.5426
7.9334,4.5623
8.0959,4.1164
5.6063,3.3928
12.836,10.117
6.3534,5.4974
5.4069,0.55657
6.8825,3.9115
11.708,5.3854
5.7737,2.4406
7.8247,6.7318
7.0931,1.0463
5.0702,5.1337
5.8014,1.844
11.7,8.0043
5.5416,1.0179
7.5402,6.7504
5.3077,1.8396
7.4239,4.2885
7.6031,4.9981
6.3328,1.4233
6.3589,-1.4211
6.2742,2.4756
5.6397,4.6042
9.3102,3.9624
9.4536,5.4141
8.8254,5.1694
5.1793,-0.74279
21.279,17.929
14.908,12.054
18.959,17.054
7.2182,4.8852
8.2951,5.7442
10.236,7.7754
5.4994,1.0173
20.341,20.992
10.136,6.6799
7.3345,4.0259
6.0062,1.2784
7.2259,3.3411
5.0269,-2.6807
6.5479,0.29678
7.5386,3.8845
5.0365,5.7014
10.274,6.7526
5.1077,2.0576
5.7292,0.47953
5.1884,0.20421
6.3557,0.67861
9.7687,7.5435
6.5159,5.3436
8.5172,4.2415
9.1802,6.7981
6.002,0.92695
5.5204,0.152
5.0594,2.8214
5.7077,1.8451
7.6366,4.2959
5.8707,7.2029
5.3054,1.9869
8.2934,0.14454
13.394,9.0551
5.4369,0.61705

ex1data2.txt
2104,3,399900
1600,3,329900
2400,3,369000
1416,2,232000
3000,4,539900
1985,4,299900
1534,3,314900
1427,3,198999
1380,3,212000
1494,3,242500
1940,4,239999
2000,3,347000
1890,3,329999
4478,5,699900
1268,3,259900
2300,4,449900
1320,2,299900
1236,3,199900
2609,4,499998
3031,4,599000
1767,3,252900
1888,2,255000
1604,3,242900
1962,4,259900
3890,3,573900
1100,3,249900
1458,3,464500
2526,3,469000
2200,3,475000
2637,3,299900
1839,2,349900
1000,1,169900
2040,4,314900
3137,3,579900
1811,4,285900
1437,3,249900
1239,3,229900
2132,4,345000
4215,4,549000
2162,4,287000
1664,2,368500
2238,3,329900
2567,4,314000
1200,3,299000
852,2,179900
1852,4,299900
1203,3,239500

ex2data1.txt
34.62365962451697,78.0246928153624,0
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
60.18259938620976,86.30855209546826,1
79.0327360507101,75.3443764369103,1
45.08327747668339,56.3163717815305,0
61.10666453684766,96.51142588489624,1
75.02474556738889,46.55401354116538,1
76.09878670226257,87.42056971926803,1
84.43281996120035,43.53339331072109,1
95.86155507093572,38.22527805795094,0
75.01365838958247,30.60326323428011,0
82.30705337399482,76.48196330235604,1
69.36458875970939,97.71869196188608,1
39.53833914367223,76.03681085115882,0
53.9710521485623,89.20735013750205,1
69.07014406283025,52.74046973016765,1
67.94685547711617,46.67857410673128,0
70.66150955499435,92.92713789364831,1
76.97878372747498,47.57596364975532,1
67.37202754570876,42.83843832029179,0
89.67677575072079,65.79936592745237,1
50.534788289883,48.85581152764205,0
34.21206097786789,44.20952859866288,0
77.9240914545704,68.9723599933059,1
62.27101367004632,69.95445795447587,1
80.1901807509566,44.82162893218353,1
93.114388797442,38.80067033713209,0
61.83020602312595,50.25610789244621,0
38.78580379679423,64.99568095539578,0
61.379289447425,72.80788731317097,1
85.40451939411645,57.05198397627122,1
52.10797973193984,63.12762376881715,0
52.04540476831827,69.43286012045222,1
40.23689373545111,71.16774802184875,0
54.63510555424817,52.21388588061123,0
33.91550010906887,98.86943574220611,0
64.17698887494485,80.90806058670817,1
74.78925295941542,41.57341522824434,0
34.1836400264419,75.2377203360134,0
83.90239366249155,56.30804621605327,1
51.54772026906181,46.85629026349976,0
94.44336776917852,65.56892160559052,1
82.36875375713919,40.61825515970618,0
51.04775177128865,45.82270145776001,0
62.22267576120188,52.06099194836679,0
77.19303492601364,70.45820000180959,1
97.77159928000232,86.7278223300282,1
62.07306379667647,96.76882412413983,1
91.56497449807442,88.69629254546599,1
79.94481794066932,74.16311935043758,1
99.2725269292572,60.99903099844988,1
90.54671411399852,43.39060180650027,1
34.52451385320009,60.39634245837173,0
50.2864961189907,49.80453881323059,0
49.58667721632031,59.80895099453265,0
97.64563396007767,68.86157272420604,1
32.57720016809309,95.59854761387875,0
74.24869136721598,69.82457122657193,1
71.79646205863379,78.45356224515052,1
75.3956114656803,85.75993667331619,1
35.28611281526193,47.02051394723416,0
56.25381749711624,39.26147251058019,0
30.05882244669796,49.59297386723685,0
44.66826172480893,66.45008614558913,0
66.56089447242954,41.09209807936973,0
40.45755098375164,97.53518548909936,1
49.07256321908844,51.88321182073966,0
80.27957401466998,92.11606081344084,1
66.74671856944039,60.99139402740988,1
32.72283304060323,43.30717306430063,0
64.0393204150601,78.03168802018232,1
72.34649422579923,96.22759296761404,1
60.45788573918959,73.09499809758037,1
58.84095621726802,75.85844831279042,1
99.82785779692128,72.36925193383885,1
47.26426910848174,88.47586499559782,1
50.45815980285988,75.80985952982456,1
60.45555629271532,42.50840943572217,0
82.22666157785568,42.71987853716458,0
88.9138964166533,69.80378889835472,1
94.83450672430196,45.69430680250754,1
67.31925746917527,66.58935317747915,1
57.23870631569862,59.51428198012956,1
80.36675600171273,90.96014789746954,1
68.46852178591112,85.59430710452014,1
42.0754545384731,78.84478600148043,0
75.47770200533905,90.42453899753964,1
78.63542434898018,96.64742716885644,1
52.34800398794107,60.76950525602592,0
94.09433112516793,77.15910509073893,1
90.44855097096364,87.50879176484702,1
55.48216114069585,35.57070347228866,0
74.49269241843041,84.84513684930135,1
89.84580670720979,45.35828361091658,1
83.48916274498238,48.38028579728175,1
42.2617008099817,87.10385094025457,1
99.31500880510394,68.77540947206617,1
55.34001756003703,64.9319380069486,1
74.77589300092767,89.52981289513276,1

ex2data2.txt
0.051267,0.69956,1
-0.092742,0.68494,1
-0.21371,0.69225,1
-0.375,0.50219,1
-0.51325,0.46564,1
-0.52477,0.2098,1
-0.39804,0.034357,1
-0.30588,-0.19225,1
0.016705,-0.40424,1
0.13191,-0.51389,1
0.38537,-0.56506,1
0.52938,-0.5212,1
0.63882,-0.24342,1
0.73675,-0.18494,1
0.54666,0.48757,1
0.322,0.5826,1
0.16647,0.53874,1
-0.046659,0.81652,1
-0.17339,0.69956,1
-0.47869,0.63377,1
-0.60541,0.59722,1
-0.62846,0.33406,1
-0.59389,0.005117,1
-0.42108,-0.27266,1
-0.11578,-0.39693,1
0.20104,-0.60161,1
0.46601,-0.53582,1
0.67339,-0.53582,1
-0.13882,0.54605,1
-0.29435,0.77997,1
-0.26555,0.96272,1
-0.16187,0.8019,1
-0.17339,0.64839,1
-0.28283,0.47295,1
-0.36348,0.31213,1
-0.30012,0.027047,1
-0.23675,-0.21418,1
-0.06394,-0.18494,1
0.062788,-0.16301,1
0.22984,-0.41155,1
0.2932,-0.2288,1
0.48329,-0.18494,1
0.64459,-0.14108,1
0.46025,0.012427,1
0.6273,0.15863,1
0.57546,0.26827,1
0.72523,0.44371,1
0.22408,0.52412,1
0.44297,0.67032,1
0.322,0.69225,1
0.13767,0.57529,1
-0.0063364,0.39985,1
-0.092742,0.55336,1
-0.20795,0.35599,1
-0.20795,0.17325,1
-0.43836,0.21711,1
-0.21947,-0.016813,1
-0.13882,-0.27266,1
0.18376,0.93348,0
0.22408,0.77997,0
0.29896,0.61915,0
0.50634,0.75804,0
0.61578,0.7288,0
0.60426,0.59722,0
0.76555,0.50219,0
0.92684,0.3633,0
0.82316,0.27558,0
0.96141,0.085526,0
0.93836,0.012427,0
0.86348,-0.082602,0
0.89804,-0.20687,0
0.85196,-0.36769,0
0.82892,-0.5212,0
0.79435,-0.55775,0
0.59274,-0.7405,0
0.51786,-0.5943,0
0.46601,-0.41886,0
0.35081,-0.57968,0
0.28744,-0.76974,0
0.085829,-0.75512,0
0.14919,-0.57968,0
-0.13306,-0.4481,0
-0.40956,-0.41155,0
-0.39228,-0.25804,0
-0.74366,-0.25804,0
-0.69758,0.041667,0
-0.75518,0.2902,0
-0.69758,0.68494,0
-0.4038,0.70687,0
-0.38076,0.91886,0
-0.50749,0.90424,0
-0.54781,0.70687,0
0.10311,0.77997,0
0.057028,0.91886,0
-0.10426,0.99196,0
-0.081221,1.1089,0
0.28744,1.087,0
0.39689,0.82383,0
0.63882,0.88962,0
0.82316,0.66301,0
0.67339,0.64108,0
1.0709,0.10015,0
-0.046659,-0.57968,0
-0.23675,-0.63816,0
-0.15035,-0.36769,0
-0.49021,-0.3019,0
-0.46717,-0.13377,0
-0.28859,-0.060673,0
-0.61118,-0.067982,0
-0.66302,-0.21418,0
-0.59965,-0.41886,0
-0.72638,-0.082602,0
-0.83007,0.31213,0
-0.72062,0.53874,0
-0.59389,0.49488,0
-0.48445,0.99927,0
-0.0063364,0.99927,0
0.63265,-0.030612,0


After running the Python scripts above, you'll get these PNG files in your working directory:

[Figures: the plots saved by plt.savefig above, e.g. fig_2.2.1.png (scatter plot of the microchip test data) and fig_2.2.2.png (the same data with the regularized decision boundary).]

Saturday, July 6, 2019

R: Logistic Regression (OLD)

data.csv
number,age,blood_pressure,lung_capacity,sex,illness,weight
1,22,110,4300,M,1,79
2,23,128,4500,M,1,65
3,24,104,3900,F,0,53
4,25,112,3000,F,0,45
5,27,108,4800,M,0,80
6,28,126,3800,F,0,50
7,28,126,3800,F,1,43
8,29,104,4000,F,1,55
9,30,125,3600,F,1,47
10,31,120,3400,F,1,49
11,32,116,3600,M,1,64
12,32,124,3900,M,0,61
13,33,106,3100,F,0,48
14,33,134,2900,F,0,41
15,34,128,4100,M,1,70
16,36,128,3420,M,1,55
17,37,116,3800,M,1,70
18,37,132,4150,M,1,90
19,38,134,2700,F,0,39
20,39,116,4550,M,1,86
21,40,120,2900,F,1,50
22,42,130,3950,F,1,65
23,46,126,3100,M,0,58
24,49,140,3000,F,0,45
25,50,156,3400,M,1,60
26,53,124,3400,M,1,71
27,56,118,3470,M,1,62
28,58,144,2800,M,0,51
29,64,142,2500,F,1,40
30,65,144,2350,F,0,42


0_runme.txt

########## R: Logistic Regression

##### Run this script on your R Console


##### Background
#
# We use machine learning models to learn from training data.
# A trained model is expected to predict reliably even on new data (i.e., data different from the training data).
# If the model over-fits the training data (including its noise and outliers),
# its prediction accuracy on new data can suffer.
# This is because the model learns the noise and outliers of the training data and treats the entire data set, including them, as meaningful:
# the model is overly optimized just to explain the noise and outliers.
#
# The main causes of over-fitting are (1) too few data points, (2) too many explanatory variables, and (3) parameters (coefficients) that are too large.
#
# To avoid over-fitting, we can use regularization. This method is widely used across machine learning models.
#
# Regularization: a way to fit a model while avoiding over-fitting
#     L1 (Lasso): The penalty term is the sum of the absolute values of the model's parameters.
#                 By driving some weights to exactly 0, it effectively deletes unnecessary features.
#                 "Dimension compression to delete unnecessary explanatory variables"
#     L2 (Ridge): The penalty term is the sum of the squared values of the model's parameters.
#                 This yields a smoother model.
#                 "More accurate prediction while avoiding over-fitting"
# Under both L1 and L2 regularization, simpler models (with smaller parameters) incur a smaller penalty.
# If the training data contain exceptional points such as noise and outliers,
# the model would have to grow its parameters to explain them,
# and the penalty discourages exactly that.
# (L1 and L2 can also be combined as a linear sum; this is elastic net regularization.)
#
#
# Regression:
#     An objective variable Y is predicted from weighted explanatory variables X = {x0, x1, x2, ..., xn}:
#     Predicted Y = hθ(X) = θ0 * x0 + θ1 * x1 + ... + θn * xn = θ^T X
#
# Logistic regression:
#     In general, hθ(X) above is a continuous value with no upper or lower bound.
#     To force 0 ≤ hθ(X) ≤ 1, we apply the logistic function (AKA sigmoid function) g(z) = 1/(1 + e^(−z)):
#     hθ(X) = 1/(1 + e^(−θ^T X))
#
#     If hθ(x) ≥ 0.5, then predict Y = 1;
#     if hθ(x) < 0.5, then predict Y = 0.

# Set your working directory on your R Console
##### The following directory is dummy - set to your own directory where you save all the r files below.
setwd('/Users/XXX/Downloads/')

#source('1_install_packages.r') # You have to run this r script only for the first time.

#source('2_library.r')

source('3_logi_fun.r')

source('4_logistic_regression.r')



Source: https://qiita.com/katsu1110/items/e4ef613559f02f183af5
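

Although the R scripts below use plain glm (the glmnet install lines are commented out), the L1/L2 idea from the background notes can be tried quickly in Python with scikit-learn. This is a minimal sketch under the assumption that scikit-learn is installed; C is the inverse of the regularization strength, and the data here are random placeholders, not data.csv below.

# Minimal L1 vs L2 regularized logistic regression sketch (scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # placeholder features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # placeholder labels

# L1 (Lasso-like): tends to drive irrelevant coefficients to exactly 0.
l1 = LogisticRegression(penalty='l1', solver='liblinear', C=1.0).fit(X, y)
# L2 (Ridge-like): shrinks all coefficients toward 0 for a smoother model.
l2 = LogisticRegression(penalty='l2', C=1.0).fit(X, y)

print(l1.coef_)    # expect some coefficients at exactly 0
print(l2.coef_)    # expect small but nonzero values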



1_install_packages.r
########## install packages

#install.packages("glmnet")
#install.packages('glmnet_2.0-18.tgz')
# .tgz file downloaded from https://cran.r-project.org/web/packages/glmnet/index.html


2_library.r
########## library setting

#library('glmnet')


3_logi_fun.r
logi_fun <- function(data, file, disease){
    ans <- glm(Y ~ ., data = data, family = binomial)    # family = binomial for logistic regression
    s.ans <- summary(ans)
    coe <- s.ans$coefficient
    RR <- exp(coe[,1])                         # odds ratio = exp(estimate)
    RRlow <- exp(coe[,1] - 1.96*coe[,2])       # lower bound of the 95% CI
    RRup <- exp(coe[,1] + 1.96*coe[,2])        # upper bound of the 95% CI
    N <- nrow(data)
    aic <- AIC(ans)
    result <- cbind(coe, RR, RRlow, RRup, aic, N)
    colnames(result)[6:7] <- c("RR95%CI.low", "RR95%CI.up")
    if(nrow(result) >= 2){
        result[2:nrow(result), 8:9] <- ""      # show aic and N on the first row only
    }
    write.table(disease, file, append=T, quote=F, sep=",", row.names=F, col.names=F)
    write.table(matrix(c("", colnames(result)), nrow=1), file, append=T, quote=F, sep=",", row.names=F, col.names=F)
    write.table(result, file, append=T, quote=F, sep=",", row.names=T, col.names=F)
    write.table("", file, append=T, quote=F, sep=",", row.names=F, col.names=F)
}
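

For readers following along in Python, here is a rough equivalent of logi_fun using statsmodels. This is a sketch under the assumption that statsmodels is available, not a drop-in replacement for the R code above; the file and column names mirror data.csv.

# Rough Python equivalent of logi_fun (statsmodels sketch).
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('data.csv', index_col=0)    # index_col=0 matches row.names=1 in R
X = sm.add_constant(df[['age', 'blood_pressure', 'weight']])
res = sm.Logit(df['illness'], X).fit()

rr = np.exp(res.params)         # odds ratios, like the RR column
ci = np.exp(res.conf_int())     # 95% CI bounds, like RR95%CI.low / RR95%CI.up
print(res.summary())
print(pd.concat([rr, ci], axis=1))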


4_logistic_regression.r
df <- read.csv("data.csv",header=T,row.names=1)

dat <- df[,c(5,1,2,6)]
# 5th column (illness): explained (target) variable
# 1(age),2(blood_pressure),6(weight): explanatory variables

colnames(dat)[1] <- "Y"

logi_fun(dat,"results_logistic_reg.csv","illness")
# See this csv file.
# If the significance level = 0.05, then only "weight" has a Pr (p-value) below 0.05 (0.038665).





results_logistic_reg.csv
illness
,Estimate,Std. Error,z value,Pr(>|z|),RR,RR95%CI.low,RR95%CI.up,aic,N
(Intercept),-6.27037164366909,5.6269544356187,-1.11434555147231,0.265130972267466,0.00189152547909671,3.06938865463388e-08,116.566164818212,42.6982377386664,30
age,0.00171984691269398,0.0447696844915921,0.0384154351817462,0.969356454570465,1.00172132669761,0.91756786481243,1.09359280641977,,
blood_pressure,0.016973167573557,0.0445691192249947,0.380827978400756,0.703330897109487,1.01711803021436,0.932037428183773,1.10996517532894,,
weight,0.0801901371302698,0.0387817328646643,2.06772960378298,0.038665456531477,1.08349306035207,1.004186680477,1.16906271976586,,
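
A note on reading this output (my addition): the RR column is exp(Estimate), and the 95% CI bounds are exp(Estimate ± 1.96 × Std. Error). For weight, exp(0.08019) ≈ 1.0835, so each additional kilogram of body weight multiplies the estimated odds of illness by about 1.08 (95% CI roughly 1.004 to 1.169).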





Deep Learning (Regression, Multiple Features/Explanatory Variables, Supervised Learning): Implementation and Showing Biases and Weights
