Basic Commands
• help regress
• display "Hello"
• di (1-normal(1.96))*2
• di sqrt(3.14)
• describe x1 x2
• des using jeeshim.dta
• list in 1/10
• list male-income
• list pop* pro? if male==1
• list in -2
• summarize male grade
• sum x1-x10 y* post?
• sum income family if male~=.
• sum income if (male==1) & (class >=3)
• tabulate male educate
• tab male grade, chi2 row col
• tabi 12 33 \ 34 53, chi2 exact
• tabi 34 53 23 \ 23 56 34 \ 45 32 21, chi2 all
• tabstat math english, by(male) stats(n mean sum sd var max min skewness)
Management Commands
• codebook male
• label data "Pew Internet and American Life Project"
• label variable male "Gender"
• label variable male; // to remove a variable label
• label define yn 1 yes 0 no
• label values open yn
• compare math english
• cf math english using indiana.dta
• ci grade if male==0; /* confidence interval */
• count if math >90
• lookfor gnp
• notes var: Need to be verified
• update all
• net search Spost
• net from
• net describe spost9_ado
• net install spost9_ado
• net get spost9_do
• ssc whatsnew
• ssc describe
• ssc install
Operators
• + - * / ^
• >, >=, <, <=, == (equal), ~= (not equal)
• & (and), | (or), ~ (not)
• +=, =+, -=, =-, /=, %=, &=, |=, *=
• in (in if command)
• + (other variables), a/b (from a through b), . (missing values)
• wild card (*, ?, /, -)
• concatenation (+)
Math Functions
• abs(); sin(); cos(); tan(); asin(); acos(); atan()
• ceil(); floor(); int() or trunc(); round()
• exp(); sqrt(); log(); ln(); log10()
• min(); max(); sign(); sum(); mod(x,y); comb(x,k)
Probability Distribution
• binormal(h,k,rho) // joint cumulative distribution of bivariate normal
• chi2(df,x) // cumulative chi-squared distribution
• chi2tail(df,x) // reverse cumulative (upper-tail) chi-squared distribution
• F(df1, df2, f) // cumulative F distribution
• Ftail(df1, df2, f) // reverse cumulative (upper-tail) F
• normal(z); normal(1.96) // returns .9750002
• ttail(df, t) // reverse cumulative (upper-tail) T
• uniform() // uniform distribution
• di chi2tail(10,18.31) // returns .04995417, the p-value
• di F(5, 10, 3.325) // returns .9499661
• di Ftail(5, 10, 3.325) // returns .05000, the p-value
• di (1-normal(z))*2 // compute the p-value for the two-tailed test
• di ttail(20, 2.086) // returns .02499818
• di ttail(df, t)*2 // compute the p-value for the two-tailed test
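As a minimal sketch of how the tail functions above combine, the lines below compute two-tailed p-values from a z statistic and a t statistic. The scalar names zstat, tstat, and dfree are illustrative assumptions and do not come from the original card. Run the lines from a do-file so that the // comments are accepted.

```stata
* Two-tailed p-values from the cumulative and upper-tail functions
scalar zstat = 1.96     // assumed z statistic
scalar tstat = 2.086    // assumed t statistic
scalar dfree = 20       // assumed degrees of freedom

display (1 - normal(zstat))*2     // two-tailed p-value for the z statistic
display ttail(dfree, tstat)*2     // two-tailed p-value for the t statistic
display 1 - F(5, 10, 3.325)       // upper tail of F, same quantity as Ftail(5, 10, 3.325)
```

Doubling the upper tail is appropriate only for symmetric distributions such as the normal and t; the F and chi-squared tests use the upper tail directly.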
String Functions
• char(n); length(s); trim(s); ltrim(s); rtrim(s)
• string(n); substr(s,begin, length)
• real(s); reverse(s); word(); lower(); upper()
Handling Data Sets
• use "k:\kucc625\open.dta", clear
• use jeeshim.dta, clear nolabel
• use if gender==1 using jeeshim.dta, clear nolabel
• save "c:\kucc625\open.dta", replace
• save open.dta, replace nolabel
• saveold "c:\kucc625\jeeshim.dta", replace
• log using "c:\kucc625\open.log", append
• log using open.log, append text
• log on // log off; log close
• cmdlog using "c:\kucc625\open_cmd.log", replace
• cmdlog on // cmdlog off; cmdlog close
Import
• infile a b c using jeeshim.txt, clear
• infile str15 name float weight int height using student.txt
• inf id _skip(1) q1-q3 using student.txt, clear
• inf str20 id long (q1-q3) using student.txt, clear
• inf id double (q10-q13) income if income >50000 using student.txt
• inf using student.dct, clear
• infix year 1-4 gnp 5-9 interest 10-13 using macro.txt
• infix using macro.dct in 1/100, clear
• insheet using jeeshim.csv, clear
• insheet a b c d e f g using student.txt, comma clear
• insheet using student.txt, delim("#") clear
Export
• odbc list
• odbc load ID=year gnp interest in 1/500, table("macro") dsn("jeeshim")
• outfile using jeeshim.txt
• outfile x1-x10 using jeeshim.txt, wide replace
• outfile using jeeshim.txt, nolabel noquote replace
• outsheet using jeeshim.xls, nolabel /* tab delimited */
• outsheet using jeeshim.xls, comma replace
Editing
• keep gender grade korean math english if gender==1
• keep id q1-q20
• drop temp1-temp5 if gender==0
• drop temp* pro? if income <5000
• drop in 1/10
• drop if gender==1 in 1/100
• edit
• edit in 10/-5
• edit if gender==0
• edit in 1/100 if income >5000
• edit male class if income >=30000
• mark yn_miss
• markout yn_miss q1-q10 // set to 0 if any of the variables is missing
• isid college // to check for unique identifiers
Recoding
• generate gender; gen gender=male
• gen square=gnp^2
• gen grade=(score <= 90 | attendance==0) if final~=.
• egen avg = rowmean(english math stats)
• egen gnpbar = mean(gnp), by(country)
• replace gender=0
• replace gender=1 if male==1
• replace male=1 in 3
• recode class 1=0 2=1 *=.
• recode class 1/3=0 4=1 if male==0
• recode grade 1 2 3 5=1 4=2
• recode grade 9999=.
• recode grade min/5=min
• recode grade 6/max=max
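The recoding commands above can be tried end to end on Stata's bundled auto data set. This is only a sketch: the new variable names heavy, highrep, avgsize, and mpgbar are hypothetical, and the cutoffs are arbitrary illustrations.

```stata
* Recoding sketch on the bundled auto data (new variable names are hypothetical)
sysuse auto, clear

* 0/1 indicator; observations with missing weight would stay missing
generate heavy = (weight > 3000) if weight ~= .

* Collapse the five repair categories into two, keeping rep78 unchanged
recode rep78 (1/3 = 0) (4/5 = 1), generate(highrep)

* Row-wise mean across two variables, then a group mean by foreign
egen avgsize = rowmean(trunk headroom)
egen mpgbar = mean(mpg), by(foreign)

* Check the recode against the original, including missing values
tabulate rep78 highrep, missing
```

Writing the recoded values to a new variable with generate() keeps the original variable available for checking, which the in-place form of recode does not.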
Reshaping Data Sets
• set obs 100 // to change the number of observations
• sort male grade
• gsort -grade name, gen(rank)
• append using c:\data\class
• app using c:\data\class, keep(id state q1-q10)
• expand 5 in -10/-1 // duplicate observations n-1 times
• merge using school // one-to-one merging
• merge state using school // match merging
• merge state using school university, update replace
• joinby id using secondary, unmatched(master) // unm(both), unm(using)
• move male grade
• order grade male // order variables as listed
• rename male gender; /* from male to gender */
• expand 5 if state=="IN" // duplicate a subset of observations
• collapse a b (sd) c (count) d (max)
• collapse a b (sd) c (count) d (max), by(state)
• contract gender degree area, freq(count) zero
• reshape long choice, i(id) j(orders)
• stack best1-best3, into(best) clear
• pkshape id row col1-col3, order(abc cab bca) outcome(y) sequence(rows) treat(treat) period(columns)
• compress // all variable
• compress name grade
• xpose, clear varname
Ordinary Least Squares (OLS)
• regress dv iv1 iv2
• regress depend indep1-indep10, noconstant
• regress income school job location if gender==0
• regress income school job location if gender==0, noconstant level(95)
• predict p // xb
• predict r, residual
• fitstat
• quietly fitstat, saving(model1)
• fitstat, using(model1)
• fitstat, dif
• bgodfrey, lag(1 2 3); estat bgodfrey, lag(1 2 3)
• dwstat; estat dwatson, lag(1 2 3)
• stepwise, pr(.2): regress y x1-x10 // backward stepwise regression
• constraint define 1 d1+d2+d3=0 // LSDV2 in Stata
• constraint define 2 g1+g2+g3+g4=0
• cnsreg y x1 x2 d1-d3 g1-g4, constraint(1 2)
Hypothesis Test
• test school /* Wald Test */
• test school location; test school=location
• test job; test school, accumulate
• lrtest, saving(0); /* Likelihood Ratio Test */
• lrtest, saving(1)
• lrtest, using(1) model(0)
• boxcox //Box-Cox regression model
• eivreg // errors-in-variables regression
• fracpoly // Fractional polynomial regression
• frontier //Stochastic frontier models
• glm // generalized linear model
• intreg //interval regression
• ivreg //instrumental variables (two-stage least squares) regression
• ivreg dv iv1 iv2 (iv3 = x1 x2 x3) iv4 iv5
• mfp //multivariable fractional polynomial models
• mvreg //multivariate regression
• newey //Regression with Newey-West standard errors
• nl //nonlinear least-squares estimation
• orthog //Orthogonalize variables and compute orthogonal polynomials
• prais dv1 rhs, rho(tscorr) twostep // Prais-Winsten two-step
• prais dv1 rhs, rho(dw) // iterative two-step
• prais dv1 rhs, rho(dw) corc // Cochrane-Orcutt
• qreg //Quantile (including median) regression
• reg3 //three-stage estimation for systems of simultaneous equations
• reg3 (dv1 x1 x2) (dv2 x1 x3)
• reg3 (dv1 dv2 = x1 x2 x3)
• reg3 (dv1 dv2 = x1 x2 x3) (dv3 x1 x3)
• rocfit // fit ROC model
• rreg // robust regression
• stcox //fit Cox proportional hazards model
• streg //fit parametric survival model
• sureg //Zellner's seemingly unrelated regression
• stepwise //stepwise estimation
• treatreg //treatment-effects model
• treatreg y x1 x2 x3, treat(x4 = z1 z2) twostep
• vwls //variance-weighted least squares
• tsset group year // set group and time
• xtreg y x1 x2, re i(year) // random effect model
• xtreg y x1 x2, fe i(group) // fixed effect model
• xtreg y x1 x2, be i(group) // between effect model
• areg // linear regression with a large dummy-variable set
• xtabond // Arellano-Bond linear, dynamic panel-data estimator
• xtcloglog // Random-effects, population-averaged cloglog models
• xtgee // fit population-averaged panel-data models using GEE
• xtfrontier // stochastic frontier models for panel data
• xtgls // fit panel-data models using GLS
• xthtaylor // Hausman-Taylor estimator for error components models
• xtintreg // random-effects interval data regression models
• xtivreg // Instrumental variables and two-stage least squares
• xtlogit //fixed-effects, random-effects, population-averaged logit
• xtmixed // multilevel mixed-effects linear regression
• xtprobit // random-effects and population averaged probit models
• xttobit // random-effects tobit models
• xtnbreg //fixed-effects, random-effects, and population-averaged NB
• xtpcse // Prais-Winsten models with panel-corrected standard errors
• xtpoisson //fixed-effects, random-effects, population-averaged Poisson
• xtrc // random-coefficients models
• xtregar // fixed- and random-effects linear models with an AR(1) disturbance
Binary Logit/Probit
• logit dv iv1 iv2
• logit card income school job if gender==0, nolog nocon
• logistic dv iv1 iv2
• probit dv iv1 iv2
• predict p
• prchange; prchange income, x(school=1 job=1)
• prchange school, x(income=10000) help
• prtab income school, rest(mean)
• prgen income, from(0) to(10000) x(school=1) rest(mean)
• prvalue, rest(mean)
• prvalue, x(income=10000 job=1) rest(mean)
Ordinal and Multinomial
• ologit dv iv1 iv2 // Ordinal
• ologit grade income school job if gender==0, nolog nocon
• oprobit dv iv1 iv2
• omodel logit card income school job // Approximate LR test
• mlogit dv iv1 iv2; /* Nominal */
• mlogit mode income school job, basecategory(1) nolog
• mlogtest, lr
• mlogtest, wald
• mlogtest, hausman base
• mprobit // multinomial probit regression
• clogit dv iv1 iv2, group(var) // Conditional logit
• clogit mode income school job, group(gender) nolog
• nlogit // nested logit regression
Special Logit/Probit
• asmprobit // alternative-specific multinomial probit
• biprobit (dv1=rhs1) (dv2=rhs2) // bivariate probit
• glogit // logit and probit for grouped data
• heckprob dv rhs, select(rhs2) // probit model with selection
• hetprob // heteroskedastic probit
• ivprobit // probit model with endogenous regressors
• rologit // rank-ordered logistic
• scobit // Skewed logit
• slogit // stereotype logistic
• xtlogit // logit models for panel data
• xtprobit // probit models for panel data
Event Count Data Models
• poisson dv iv1 iv2
• nbreg dv iv1 iv2
• zip dv iv1 iv2 // zero-inflated Poisson Model
• zinb dv iv1 iv2 // zero-inflated NB Model
• ztp dv iv1 iv2 // zero-truncated Poisson Model
• ztnb dv iv1 iv2 // zero-truncated NB Model
Truncated/Censored/Self-selected
• cnreg // Censored-normal regression
• heckman // Heckman selection model
• ivtobit // Tobit model with endogenous regressors
• tobit // Tobit regression
• truncreg // truncated regression
• ztp dv iv1 iv2 // zero-truncated Poisson Model
• ztnb dv iv1 iv2 // zero-truncated NB Model
Related Commands
• bootstrap // bootstrap sampling and estimation
• bsample // Sampling with replacement
• jackknife // Jackknife estimation
• impute // imputation
• permute // Monte Carlo permutation test
• simulate // Monte Carlo simulation
• sampsi // sample size and power determination
T-Test
• ttest grade==10
• ttest grade, by(male)
• ttest grade, by(male) unequal welch
• ttest math==english; ttest math==english, unpaired
• ttesti 100 88.1 5.2 90; /* N mean sd hypothesis */
• ttesti 100 88.1 5.2 200 91 10.2; /* N1 mean1 sd1 N2 mean2 sd2 */
• ttesti 100 88.1 5.2 200 91 10.2, unequal
• mean // estimate means
• total // estimate totals
• ratio // estimate ratios
• proportion // estimate proportions
• ci // confidence intervals for means, proportions, and counts
ANOVA
• anova score gender
• anova score gender year gender*year
• oneway score gender, tabulate
• loneway //large one-way ANOVA, random effect, and reliability
• sdtest // Variance-comparison test
• Related: manova; pkshape; xtmixed
Correlation Analysis
• correlate gnp interest inflation
• corr gnp interest inflation, covariance
• pcorr x1-x10 // partial correlation coefficients
• pwcorr gnp interest inflation, sig
• pwcorr gnp interest inflation, print(.05) // .05 significance level
• pwcorr gnp interest inflation, sig star(.05) // .05 level
Factor Analysis
• factor x1-x30 // by default pf (principal factor)
• factor x1-x30, ml // maximum likelihood factor
• factor x1-x30, factors(5)
• rotate, varimax // orthogonal, oblique, quartimax, equamax, parsimax, promax
• pca // principal component analysis
Other Analysis
• alpha // Cronbach's alpha
• ca // correspondence analysis
• canon // Canonical correlation
• cluster // cluster analysis
• mvreg // multivariate regression
• manova // multivariate analysis of variance
• mds // multidimensional scaling for two way data
• mdslong
• mdsmat
• biplot
Nonparametric Analysis
• swilk math english
• sfrancia x1-x10
• ranksum //Equality tests on unmatched data
• signrank math=english // Equality tests on matched data
• runtest // test for random order
• spearman x1-x10
• kwallis score, by(gender)
• ksmirnov math, by(area)
• alpha x1-x10, item
• kappa eval1 eval2
• bitest //Binomial probability test
• prtest // one- and two-sample tests of proportions
Graphics Basics
• sysuse auto, clear
• graph bar (mean) mpg turn, by(foreign)
• graph bar (mean) mpg turn, over(foreign)
• graph hbar (mean) mpg turn, over(foreign)
• graph hbar (mean) mpg, over(foreign) over(class)
• graph hbar (mean) mpg, over(class) over(foreign)
• graph hbar (mean) mpg, over(class) over(foreign, sort(1) descending)
• graph hbar (sum) mpg turn, over(class) stack
• graph hbar (sum) mpg turn, over(class) by(foreign)
• graph hbar (sum) mpg turn, over(class) by(foreign) stack
• graph dot (mean) mpg, over(class)
• graph dot (mean) mpg, over(class) over(foreign)
• graph matrix mpg price turn, half
• graph pie, over(class)
• graph pie mpg turn trunk, plabel(_all name)
Scatter and Two-way Plotting
• scatter mpg weight
• scatter mpg weight, sort
• scatter mpg weight, sort connect(l)
• scatter mpg weight, sort title("MPG versus Weight") subtitle("Year 2006")
• scatter mpg weight, title("MPG versus Weight") caption(" Source: Stata Corp. 2006")
• scatter mpg weight, title("MPG versus Weight") xsize(4) ysize(3)
• scatter mpg weight, ytitle("MPG (Mileage)") xtitle("Car Weight")
• scatter mpg weight, title("MPG versus Weight") ylabel(#8) xlabel(0(2000)6000)
• scatter mpg weight, title("MPG versus Weight") ylabel(minmax) xlabel(minmax)
• scatter mpg weight, title("MPG versus Weight") yscale(log) xlabel(#5) // log scales
• scatter mpg weight, sort xline(4000) yline(25)
• scatter mpg weight, title("MPG versus Weight") msymbol(triangle)
• scatter mpg weight || fpfit mpg weight
• twoway fpfitci mpg weight
• twoway fpfitci mpg weight || scatter mpg weight, m(d)
• scatter mpg weight, sort title("MPG versus Weight") m(diamond) by(foreign)
• scatter mpg weight, sort m(t) by(foreign, total row(1))
• twoway fpfitci mpg weight, sort m(t) by(foreign, total row(1))
• twoway fpfitci mpg weight || scatter mpg weight, sort m(t) by(foreign, total row(1))
• scatter mpg turn weight
• scatter mpg turn weight, yline(30) xline(3500)
• scatter mpg trunk turn weight
• scatter mpg weight || scatter trunk weight || scatter turn weight
• scatter mpg weight, sort c(l) || line trunk weight, sort || scatter turn weight
• twoway (line mpg weight, sort c(l)) (dropline trunk weight, sort) (scatter turn weight)
Plotting by Functions
• twoway function y=x^3, range(-5 5) xsize(4) ysize(3) xlabel(#10) xline(0)
• twoway function y=normalden(x), range(-5 5) xsize(4) ysize(2) xlabel(#10) xline(0)
• twoway function y=1/sqrt(2*_pi)*exp(-x^2/2), range(-5 5) xsize(4) ysize(2) xlabel(#10) xline(0)
• twoway function y=normalden(x), range(-4 -1.96) xlabel(#10) xline(0) recast(area) || function y=normalden(x), range(1.96 4) recast(area) || function y=normalden(x), range(-1.96 1.96) lstyle(foreground)
• twoway function t=tden(3, x), range(-5 5) xsize(4) ysize(2) xline(0)
• twoway function t=tden(1, x), range(-5 5) xsize(4) ysize(2) color(blue) lstyle(plsolid) xlabel(-5(1)5) recast(area) || function z=normalden(x), range(-5 5) color(maroon) lwidth(thick)
• scatter gear_ratio headroom, xsize(4) || function y=x, range(0 5)
• twoway function c=chi2(1,x), range(0 5) xsize(4) ysize(3) yline(.5)
• twoway function c=Fden(5, 10, x), range(0 5) xsize(4) ysize(3) yline(.3)