EpiData Analysis: Command and Function Reference Guide          Document version 1.1

Read & Save Data, sorting
View and edit data
Frequency, Cross and Summary Statistics Tables
Descriptive analysis & testing
Life table & Kaplan Meier plot
Graphs
SPC graphs - Pareto Charts, Ichart etc.
Show results, output, files and run scripts
Select observations
Generate/Change variables
Label data
Clean up & stop
Information
Disk and file commands
Setup parameters
Programming commands

Changes in this version
Obsolete Commands
Key differences from Epi Info v6

Syntax: command variables [/option] [/option=a|b|...] [if condition]
[ ]: optional specification. a|b|...: alternative choices


Read Data, Save Data etc.
read read [filename[.{rec|dbf|csv}]] [/close] [/CB]
Read a copy of the data file into memory.
Options:
  • /CB : reads separated data from the clipboard
    E.g. go to a spreadsheet, mark a block, press Ctrl+C, then issue this command. Also found in the File menu.
  • /close : clears memory - use if you have another file open
    /close is only needed if you have changed the data, e.g. changed labels, added new variables or defined missing values.
  • Reading records marked for deletion is controlled with: set read deleted=ON[OFF]
Without parameters, the open file dialog is started.

Note: The default working folder changes when you use the open file dialog, but not when you provide the whole filename.
savedata savedata filename [field list] [/replace]
Save a copy of all variables in memory to a file, so that the data can be used again later.
Options:
  • /replace : Overwrite existing file
    Note: Chk files are also replaced or deleted with option /replace
Saves all variables, but only the records included in the current select.
append append [var1 var2...] [/file=filename[.{rec|dbf|csv}]]
Add records (observations) after all observations in the current file.
Options:
  • /filename= : append this file - without /filename the file open dialog is shown.
Only fields with the same name as variables in memory will be read.
Variables from the previous read which are not in the appended file will be set to missing for the appended records.
merge merge key1 key2 ....keyn [/file=filename] [/table] [/update|/updateall]
Merge the current data file with another data file based on key variables.
Default: Values from all variables in memory are left unchanged.
Options:
  • /file : Name of external file to merge
  • /update : All non-missing values in the external merged file replace values in common variables in memory
  • /updateall : All values for common variables are taken from the external merged file
  • /table : The external file is used as a lookup table,
    e.g. to add person information to a file with clinical results.
    Without the /table option the combination of key variables must be unique.
To keep information in variables with the same name (e.g. mergevar or name) from all files, rename the variables before you merge the next file. E.g. read pt; rename name to ptname; merge hospid hospital; rename name to hospname; etc ....

After merge, the variable mergevar indicates the source of information for each observation (record). mergevar is defined with value labels for these values:
  • 1 : Only in memory (original)
  • 2 : Only in external file
  • 3 : In both
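
The meaning of mergevar can be illustrated outside the program. The following Python sketch is not EpiData code and not the program's implementation; the function name, the record layout (lists of dicts) and the sample data are assumptions made for illustration. It mimics the default behaviour described above (values already in memory are left unchanged) for a single, unique key.

```python
def merge_with_indicator(memory, external, key):
    """Merge two lists of dict records on one key that is unique in both.

    mergevar: 1 = only in memory, 2 = only in external file, 3 = in both.
    """
    mem_by_key = {rec[key]: rec for rec in memory}
    ext_by_key = {rec[key]: rec for rec in external}
    merged = []
    for k in sorted(mem_by_key.keys() | ext_by_key.keys()):
        in_mem, in_ext = k in mem_by_key, k in ext_by_key
        rec = {}
        if in_ext:
            rec.update(ext_by_key[k])
        if in_mem:
            rec.update(mem_by_key[k])  # values in memory win (default, no /update)
        rec["mergevar"] = 3 if (in_mem and in_ext) else (1 if in_mem else 2)
        merged.append(rec)
    return merged


# Hypothetical example data
memory = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
external = [{"id": 2, "name": "X", "ward": "W2"}, {"id": 3, "ward": "W3"}]
result = merge_with_indicator(memory, external, "id")
for row in result:
    print(row)
```

Record 2 keeps the name from memory and gains the ward from the external file, which corresponds to the reason for renaming same-named variables before the next merge.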

aggregate
agg
aggregate variables /Options /statistics
Aggregate (collapse, combine) data when you wish to change from individual to group level. See the explanation in stattables for all options.
Options:
  • /close : replace current data in memory with result of aggregate
  • /save=filename
  • /replace : With /save, will overwrite your data!
  • Further Options:
    • Use names of statistics as indicated in stattables, but in aggregate put / in front of each name, e.g. /mci /mean /min /max
    • Example: With aggregate: aggregate school class /min="age weight" /max="age weight"
    • Example: With stattables: stattables age weight /stat="min max" /by="school class"
Respects current select.
sort sort variable1 [variable2] [variable3 ...]
sort the current dataset by one or more variables
Common analysis commands
count
count if
count
count if [logical statement]
Counts number of records.
With count if (logical statement) only records for which the logical statement is true are counted.
Count is saved as the result variable $count; try var result to see names and structure.
Note: if must be used with caution with float variables, e.g.
count if 10*dectime >= 31 // is ok if you wish to count records where dectime >= 3.1
Remember that the user is responsible for checking that complex if statements work correctly.
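
The caution about float variables reflects ordinary binary floating-point behaviour. The Python sketch below is an illustration only, not EpiData code; the value of dectime is invented, and the explicit rounding step is an addition that the count example above does not use.

```python
# Illustration only: a decimal comparison that fails in binary floating point.
dectime = 0.1 + 0.7                  # hypothetical computed value; mathematically 0.8

naive = dectime >= 0.8               # the stored value is slightly below 0.8
scaled = round(10 * dectime) >= 8    # compare whole numbers after scaling and rounding

print(dectime, naive, scaled)
```

Comparing scaled whole numbers avoids depending on the last binary digits of the stored value.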
describe
des
describe [variable1 ] [variable2 ...] [/Options]
Descriptive distributional statistics for each numeric field. Without variable names, all variables are described.
Options:
  • /NM: Records with missing in any variable are excluded.
  • Save as a data file with aggregate
  • Estimates saved as result variables, try result to see names and structure.
  • Specify design: set table design=line[box][filled][shaded][...]
Without variable names all are used. (des /NM does not work; use: des * /NM)

Percentiles can be imprecise for a small number of observations (< 11).
Note: percentiles are weighted for the distance between observations.
Ref: Bland M. Introduction to Medical Statistics, Oxford, 1995. Second Ed. p 55

means
mea
means variable1 [/by=variable2] [/T] [/q]
Basic descriptive statistics for numeric variable1, optionally stratified by variable2.
Options:
  • /by=variable2 : Stratify by this variable
  • /t : test for homogeneity of the mean across strata (same or different mean),
    including Bartlett's test for homogeneity of variance with more than two strata.
    An F-test is provided; with two strata it can be interpreted as a T-test.
  • Without /by and with /t : Test that mean=0
    (e.g. as a paired T-test for difference in before and after measure)
  • Estimates saved as result variables, try var result to see names and structure.
  • Specify design: set table design=line[box][filled][shaded][...]
    Anova table: set table design summary=line[box][filled][shaded][...]
Note: Confidence intervals given are based on the T-distribution with N-1 degrees of freedom.
kwallis kwallis variable1 /by=variable2
Kruskal-Wallis analysis of variance, where variable2 is a categorical factor

Specify design: set table design=line[box][filled][shaded][...]
regress regress yvar xvar1 [xvar2 xvar3 ...]
linear regression with yvar as the dependent variable
Maximum of 5000 records
Specify design: set table design=line[box][filled][shaded][...]
correlate
cor
correlate var1 var2 [var3 ...]
Calculate correlation coefficients between all variables. Record limits same as regression.
** if correlation is undefined a floating point error will be generated
Specify design: set table design=line[box][filled][shaded][...]
Tables
freq
fre
freq variable1 [variable2 variable3 ...] [/Options]
Frequency distribution for each variable.
(Alternatives to freq: tables ... /F or, for counts only, stattables, e.g. stab variable1)
Options:
  • /M include observations with missing data (.) in each variable
  • /NM Records with missing in any variable are excluded.
  • /CUM Add Cumulative percentage
  • /C Add percentage
  • /CI Include confidence interval
    CI is based on the proportion of all observations in each row (Altman et al. Statistics with confidence, London, BMJ books. ISBN 0 7279 1375 1, 2nd edition,2005, p 47)
  • /D1 /D0: Use one or zero decimals for the percentages, default is 2
Specify design: set table design freq=line[box][filled][shaded][...]
Notice you can obtain a table and graph of CI for several variables with CIPLOT
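
As an illustration of a confidence interval for the proportion in one row of a frequency table, here is a Python sketch of Wilson's score interval. That the cited page describes this method, and that the program computes exactly this, are assumptions; the function name and the use of z = 1.96 are also choices made for this example.

```python
import math


def wilson_ci(count, n, z=1.96):
    """Wilson score interval for the proportion count/n (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = count / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clip to [0, 1] to guard against rounding error at the boundaries
    return max(0.0, centre - half), min(1.0, centre + half)


low, high = wilson_ci(50, 100)   # hypothetical row: 50 of 100 observations
print(round(low, 4), round(high, 4))
```

Unlike the simple p +/- 1.96*SE interval, this interval stays within 0 and 1 also for rows with very few observations.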

tables
tab
tables variable1 [variable2 variable3 ...][/Options]
The tables (brief: tab) command shows frequency or cross tables for the variables chosen.

Default table without Options:

  • Type
    • one variable: frequency table.
    • Two variables: crosstable.
    • 3+ : First two variables stratified by remaining variables
  • Sorting depends on table type.
    • tables with epidemiological estimation (Odds Ratio, RR or outbreak table): Descending on value
    • All other tables: increasing on value.
    • Only counts are shown. To add estimates and/or percentages see Options below.

With user specified Options many aspects can be controlled:

  • Specify desired type of table :
    • Frequency tables: /F see freq
         (/F overrules other Options)
    • Compact Tables - notice outcome is given as first variable
      • Case-Control table: /CT /O Odds ratio and 95% CI
      • Risk ratio table: /CT /RR Risk Ratio and 95% CI
      • Outbreak analysis: /CT /AR [or /OA] Attack rates and RR, with optional /CI: 95% CI for attack rate. See epicurve
      • Stratified table of proportions with Confidence Interval: See ciplot
    • General: /FV: Show single cross tables of first by all other variables
    • Summary table of N and selected statistics: /S (Relevant with more than one stratified table)
  • Percentages and missing:
    • Crosstables: Row Column Total percentage: /R /C /RP /CP /TP
    • Frequency tables: cumulative percentage: /CUM Row: /C
    • Number of decimals in percentages: /D0 /D1 /D2
    • Place each type of percentage in single column: /PCT
      (Without /PCT all chosen percentages are written in the same column)
    • /M: include missing (.) or defined missing values in tables.
  • Estimation and testing:
    • /T: Chi2
    • /EX: Exact test
    • /GAM: Goodman & Kruskal's gamma
    • /ADV: extended and advanced tests
    • /OA: Outbreak table with attack rates (cohort assumption)
    • /CI: add Confidence Interval to estimates (outbreak tables)
    • /CI: 95% CI in frequency tables
      CI gives the CI for the proportion of all observations in each row
      (Altman et al. Statistics with confidence, London, BMJ books. ISBN 0 7279 1375 1, 2nd edition,2005, p 47)
  • Epidemiological tables (2x2 table): See also Options /OA /CC /FV
    • Estimation: /RR : Risk Ratio /O : Odds Ratio /OEX: exact MLE based estimates (as Epi6)
    • Default: highest value of variables considered exposure and case.
    • /SA: Reverse outcome and exposure to take low value,
      e.g. case=0 and exposure=0 (when coded 0 & 1 ). See sorting below.
       (Notice: from v2.2.2, Odds Ratios are standard Mantel-Haenszel estimates, unless you use /OEX)
       Estimates can be imprecise with a small number of observations. It is always the responsibility of the user to check for a "sparse data" problem in an analysis by looking at the stratified tables.
  • Sorting
    • Indicate by /Sxxx where the x indicate:
      R:row C:Column A:Ascending D:Descending T:Total L:label (else numerical)
    • Accepted combinations:
      • On value of category for Row and Column: /SA /SD
      • On label (text sort) for row and column: /SLA, /SLD
      • Row and/or Column Totals: /SRAT, /SCAT, /SRDT, /SCDT
      • Indicate specific column or row: /SRD=x, /SCD=x, /SRA=x, /SCA=x (e.g. /sca=2)
      • For string variables use "label" sorting, e.g. /SLA /SLD
      • Frequency tables: /SA /SD /SLA /SLD /SRAT /SRDT

  • Content and specification table :
    • Hide tables: /Q: all /NT:subtables. /NC: unstratified (crude) table
    • /OBS: show observations as 2x2 table in outbreak and summary tables
    • /W=variable: Use number of observations in the variable as frequency weight
    • metadata:
      • value labels:
        /v : show only values
        /vl: show values and value labels
      • Variable labels
        /vn: show variable name
        /vnl: show variable name and label

  • Specify design:
    • set table design=line[box][filled][shaded][...]
    • Specify design summary table: set table design summary=line[box][filled][shaded][...]
    • See also other set

  • Notes: Several result variables are saved, including the table: try result to see names.
    Match removed (try: tab disease outcome matchvar)
stattables
stab
stattables variables /stat="...statistics key words ..." [/by=] [Options]
Show a collapsed table with the same summary statistics for all the variables, optionally grouped or stratified.
Options:
  • /by="group variables" : use for aggregation level
  • /strata=variable : repeat table for each value of stratification variable
  • /header=", , , , , , ," : Use to specify your own header, e.g. /header=",Mean,(CI),Max"
    You can include valid html specifiers in the definitions, e.g. line breaks <br>
  • /page : add page change after each stratum
  • /close : close current data and use summary data set instead Note !! any unsaved data will be lost
  • /save="filename" Save summary table to a new file, to replace add: /replace
  • /m Include observations where byvariables have missing values
  • /q Hide table
  • Statistics: /stat=".....keyword for type of statistics ......"
    • mean SD max min p5 p10 p25 p50 p75 p90 p95 p99 sum
      • short forms, will show (contents):
      • des (min median max)
      • iqr (p25 p75)
      • idr (p10 p90)
      • isr (p5 p95)
      • mci (mean +/- 1.96*sd/sqrt(n) = 95% Confidence Interval)
      • mv (counts of defined missing values and blanks for each variable)
      • n Number of observations
      • nv Number of valid observations for each variable

Example Stab age weight /by="sex class" /stat="mci min max"
Specify design: set table design=line[box][filled][shaded][...]
See also: Aggregate
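
The mci keyword can be illustrated with a short calculation. The Python sketch below is not EpiData code; it assumes the usual normal-approximation interval, mean +/- 1.96 times the standard error (sample SD divided by the square root of n). The function name and the data are invented.

```python
import math
import statistics


def mean_ci(values, z=1.96):
    """Return (lower, mean, upper) using mean +/- z * SD / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)   # sample SD (n-1 denominator)
    return mean - z * se, mean, mean + z * se


lower, mean, upper = mean_ci([1, 2, 3, 4, 5])   # hypothetical ages
print(round(lower, 3), mean, round(upper, 3))
```

Note that the means command reports intervals based on the T-distribution instead, so the two can differ for small n.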

set parameters for Tables
All tables:
set table design=[stat][system][freq][summary]=line[box][filled][shaded][system]... (Design of tables)
Tables with percentages (Options: /r /c /to /pct)
set table percent format col="P1{}"     (Col Percents format, e.g. "P2 %")
set table percent format row="P1()"     (Row Percents format, e.g. "P0[]")
set table percent format total="P0[]"    (Total Percents format, e.g. "P0[]")
set table percent header="%" (Contents of column header for percents)
set table percent header [row][col][total]="%" (Contents of column header for percents row/col/total, one at a time!)
set table ct or header="OUTCOME:,CASE,NON CASE,N,EXPOSED,NON EXPOSED"
set table ct rr header="OUTCOME:,EXPOSED,NOT EXPOSED,N,N,ILL,RR,AR (%)"
Statistics:
Specify confidence interval text:
set TABLE CI FORMAT [HEADER]="()-"
set TABLE CI HEADER="(95% CI)"
Life Tables and Kaplan-Meier Plots
lifetable
ltab
lifetable outcome time [/by=group variable] [/Options]
lifetable outcome TimeStart TimeEnd [/by=group variable] [/Options]


The lifetable command creates a standard life table and Kaplan-Meier curve depending on Options. The time variable is read as integer. If a float variable is used a default interval of 1 will be used.
  • Indicate observation times in one variable: lifetable outcome time
  • Indicate two variables lifetable outcome Starttime Endtime
    where the command will subtract time = (End - Start) and use this in the analysis.
  • Default: Missing data handling:
    • missing value in outcome variable: observation excluded.
    • missing value in time variable(s): observation excluded
  • /MT ("Missing Time"): highest non-missing value assigned when time variable is missing
         (second time variable with two variables)
       All observations assigned time with /MT option are censored at exit.
  • /Exit=value: Observations are censored on this date if their END time is later than value.
    e.g. ltab out time1 time2 /MT /exit="01/08/2009"
    would censor observations on August 1st 2009 and assign this date when time2 is missing.
Options:
  • General
    • /BY=variable : Variable used as group indicator
    • /O=x : Use the value x to indicate outcome value (1 is default).
    • /I : apply intervals as defined by Set lifetable interval see below.
    • /I=bx : aggregate by the value x, e.g. /I=b7 would aggregate data to week if time is date or days
    • /I="x,y,z,,,,": aggregate at indicated values e.g. /I="0,3,10,100"
  • Hide/show:
    • /NG : do not show the Kaplan Meier plot
    • /NT : do not show summary documentation table
    • /NOLT: do not show lifetable
    • /E0 /E1 /E2 /E3 /E4 : show results with this number of decimals
    • /NOCI : Hide confidence intervals in tables and graph (e.g. if graphs overlap for groups)
  • Estimation:
    • /time=x :Calculate Survival estimate at indicated time value (values)
    • /p25 /p50 /p75 : Calculate Time span estimates at Quartiles and Median of Survival Proportions (0.75 0.5 0.25)
    • /t : Log-rank test and Hazard Ratio (only relevant with /by)
    • /ref=x : Indicates value of /by variable to use as reference for comparisons
      Difference from reference is estimated for /time /p and /t Options
    • /adj : Censored observations contribute to half the time of period (default: censored at start of period)
  • Other:
    • /CLOSE: close current file and replace with life table.
    • /MT /EXIT=value see explanation above
    • Most other standard graph Options apply.
Confidence Intervals and other estimates are based on the formulae found in Altman et al. Statistics with confidence. BMJ books. Second Edition p94. In this, the std. errors are adjusted for effective sample size (died + still in observation) and there is no weighting for censoring.
With the Statistics dialog only the lifetable is shown: lifetable outcome Time /NG
With the graph dialog, lifetable outcome Time /NOLT is used.
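
The survival column of a Kaplan-Meier table can be illustrated with the standard product-limit calculation. The Python sketch below is not the program's implementation: it leaves out intervals, confidence intervals and the /adj handling of censoring, and the function name and data are assumptions for this example.

```python
def kaplan_meier(times, outcomes, outcome_value=1):
    """Return [(time, at_risk, events, survival)] for each time with an event.

    Observations whose outcome differs from outcome_value are treated as censored.
    """
    records = sorted(zip(times, outcomes))
    at_risk = len(records)
    survival = 1.0
    table = []
    i = 0
    while i < len(records):
        t = records[i][0]
        same_time = [o for (tt, o) in records[i:] if tt == t]
        events = sum(1 for o in same_time if o == outcome_value)
        if events:
            survival *= 1 - events / at_risk
            table.append((t, at_risk, events, survival))
        at_risk -= len(same_time)   # events and censored both leave the risk set
        i += len(same_time)
    return table


# Hypothetical data: outcome 1 = event, 0 = censored
km = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for row in km:
    print(row)
```

The outcome_value argument plays the same role as the /O=x option: it names the value that counts as the outcome.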
set parameters for life tables
set table design=[stat][system][freq][summary]=line[box][filled][shaded][system]... (Design of tables)
set lifetable header="INTERVAL,NAT RISK,DEATHS,LOST,SURVIVAL,STD. ERROR"
set lifetable interval="0,7,15,30,60,90,180,360,540,720,3600,7200,15000"

set TABLE CI FORMAT [HEADER]="()-"
set TABLE CI HEADER="(95% CI)"
Basic graphs
bar bar variable1 [/by=...]
A bar graph shows counts of the given variable with a categorical x axis. Only values included in the variable will be shown. Bars are centered at each tick mark, and value labels (if defined) are shown. Compare with definition of histogram
Options:
  • /by=varname: show a bar chart by another variable
  • /PCT : use percentages instead of counts on Y axis.
  • /v /vl /vn /vnl : change the appearance of labels
histogram
his
histogram variable1[/xmin /xmax /xinc] [/by=...]
A histogram shows counts grouped into "bins", but scaled on the X-axis. Each group (bin) starts at the tick mark, but is centered on the tick mark with /by. If the By variable has more than four values, use bar. Compare with the definition at bar.
Options:
  • /by=varname: will technically show a bar chart by another variable - check axis
  • /PCT : use percentages instead of counts on Y axis.
  • /Bins=x : Group into x bins (e.g. /bins=7)
  • /start=x The first bin is started at this value
  • /width=x The width of each bin has this datavalue
    Bins takes precedence over width
boxplot boxplot variable1 [variable2 ....] [/out] [/by=...] [/R] [/P1090]
Box and whisker plot of field. The box shows interquartile range (25-75) with median highlighted.
Whiskers cover the interval from (p25 - 1.5*interquartile range) to (p75 + 1.5*interquartile range)
    (but only if a data value is present there; otherwise the nearest inside value is used).
Options:
  • /P1090 Length of whiskers are at 10 and 90 percentiles
  • /R Length of whiskers are at minimum and maximum of data
  • /out will show all outliers outside end of whiskers with a circle for each observation
  • /by=varname show one box for each value of varname
  • If "/by" boxes overlap, then increase size x. E.g. /sizex=600
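
The default whisker rule can be illustrated with a small calculation. The Python sketch below is not EpiData code; the quartile method (the default of Python's statistics.quantiles) is an assumption and may differ from the percentile calculation used by the program. The function name and data are invented.

```python
import statistics


def box_stats(values, k=1.5):
    """Quartiles, whisker ends and outliers for a box and whisker plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)   # quartile method is an assumption
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers end at the nearest data values inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }


box = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])   # hypothetical measurements
print(box)
```

In this example the upper fence lies at 16.5, so the upper whisker stops at the data value 9, and 50 is the kind of point that /out would show as a circle.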
line line xvar yvar [yvar2] [yvar3 ...] [/by=...]
line plot of one or more yvar against xvar; multiple y variables may be plotted against xvar
scatter
sca
scatter xvar yvar1 [yvar2 ...] [/by=...]
scatter plot of yvar against xvar; multiple y variables may be plotted against xvar
Options
  • /BY: a plot of xvar by yvar1 will be made for each value of "By variable"
dotplot dotplot variable [/by=group variable] [/Options]
Dotplot shows one dot per observation. A small displacement is added to the value, so that all observations can be seen. If groups overlap, extend the width of the graph, e.g. dotplot var /sizex=600.
The variable chosen is used as Y-axis
Options:
  • /c: Center dots instead of left align
  • /M : Include missing values in the group variable
  • /DI=x : Value for x-separation of dots. Default is /DI=0.015. To separate more try, e.g. /DI=0.045
  • Most standard graph Options apply
cdfplot cdfplot variable [/Options]
CDFplot shows a scatter plot of cumulated percentage points (counts) with variable used as X-axis
Options:
  • /P: Calculate and show a probit diagram instead.
  • Probits are calculated as inverse normal of halfway points for cumulative percentage on x axis
  • /All: Do not aggregate on X axis values. Use this with float variables or when you wish to see all points.
  • Most standard graph Options apply
ciplot ciplot outcome variables [/Options]
CIplot shows a table and a plot of proportions of outcome with 95% Confidence intervals in strata defined by individual values of the remaining variables.
Options:
  • /O=x : Use the value x to indicate outcome value. Default is highest non-missing value
  • /NOTOT : remove crude (total) estimate from the graph
  • /NOCI : remove coloured line indicating crude confidence interval
  • /NL : Do not show vertical dotted separation lines between by variables
  • /NT : Do not show summary table
  • /NM Records with missing in any variable are excluded.
  • Most other standard graph Options apply.
The number of observations depends on the included variables. Use /N to show the number of observations in the graph.
epicurve epicurve outcome time [/by=group] [/Options]
Epicurve shows development in a possible epidemic as stacked bars on each day from start to end of data.
Example: epicurve case dayonset /by=floor /legend /xa /frame /tab
Options:
  • /nt: Do not show a table of number of cases, min and max date of outcome
  • /Yinc : change default y ticks from 1.
    Notice that this also changes the number of cases per line in the graph
  • /xa: alternate label on x axis
  • EpiCurve shows bars centered on tick mark, not from tick mark
  • To indicate date for x-axis: EpiCurve outcome dayonset /xmin="01/10/2005" /xmax="31/07/2005"
    Notice that the dates must be DMY formatted not MDY.
  • Most other standard graph Options apply
  • Result variable naming $case# varies with coding of "by" variables
pie pie var1
pie chart of frequencies of the values of var1
Percentages can be incorrect in some instances
erasepng erasepng [/noconfirm] [/all]
Erases graph*.png files in the current folder (shown on the left side of the status bar). Confirms each erase.
Options:
  • /all includes ALL files of type png (*.png) regardless of name
  • /noconfirm NOTE: deletes without confirmation
  • /d deletes all graph*.png created earlier than current date
Common Graph Options
/save="file.png[.wmf][.bmp]" /xlabel="variable"
/text="x,y,text,box" /ti="title" /sub="subtitle" /noedit /nolegend

  • Overall
    • /by=varname: Group the graph by the indicated varname (scatter, line, box, bar, dotplot)
    • /edit : manual specification of a large number of graph Options
    • /q : do not show graph (useful for just saving a graph)
    • /save=" " : save file with that name and type - default is png with name of current time. The file must not already exist.
    • /replace : replace file if existing (used with /save).
    • /xlabel=variable : Use contents of variable as label along x-axis. Works only in SPC charts
    • Show labels or values: /v (include values) /vv (values only)
        /vn (name + label) /vnn (only name)

  • Text, size and content of graph:
    • /legend : Show legend
    • /ti="....." Use text as title
    • /sub="....." Use text as subtitle
    • /fn="....." Use text as footnote
    • /xtext="....." Use text as description below the X axis
    • /ytext="....." Use text as description to the left of the Y axis
    • /n Show number of observations in upper left corner of graph
    • /sizex=value /sizey=value change physical size of graph to values

  • reference lines, text and values:
  • /VGRID /HGRID : Show vertical or horizontal lines inside graph area
  • /FRAME : Frame the graph area
    Text and lines can be used several times in the same graph:
    • /xline= /yline= Add reference line __ at x or y value, e.g. /xline=12 /yline=500
    • /xlined= /ylined= Add reference dotted line ..... at x or y value, e.g. /xlined=12 /ylined=500
    • /Text="x,y,text,box": add text at x,y in pixels from top left. Box= 1:yes 0:no. E.g. /text="120,50,my text,1"
    • /yvalue: add box with counts (works only in SPC graphs)

  • Axis :
    • /xmin /xmax /ymin /ymax sets axis scales
    • /xinc /yinc defines increase of tick marks on axis. (Scatter, line and SPC plots)
    • /NoXTICK /NOYTICK: hide tick marks
    • /NoXLabel /NoYLabel : Do not show labels at the tick marks
    • /X90 /X45 X-axis unit texts at 90 or 45 degrees instead of horizontal
    • /XA : alternating X axis label texts
    • /YLOG /XLOG : use logarithmic scale on axis
    • /XINV /YINV: reverse axis from high to low values
    • /XHIDE /YHIDE: Hide axis

  • color setting:
    Set Graph Colour="1234567890"
    Use this to select colour sequence for graphs
    Indicate 10 numbers each from 0 to 9. Colour codes are:
    Red, Blue, Black, Green, Yellow, White, SkyBlue, Fuchsia, Gray , Aqua;
  • Colour for graph elements
    Set Graph Colour Text="2133"
    Defines colour for "tfxxxyyyb" where t:Titles f:footnote axis: xxx yyy (Axis,tickmarks,texts) b:other, e.g. text boxes
    Indicate 9 colour codes from 0 to 9. Default is: "213333333"
    Note: White (6) can be used to hide a text element, but it is more efficient to set to " ", e.g. titles or footnote.
    /bw: all colour in black and white
  • symbol setting: (line, scatter and pareto)
    Set Graph Symbol="1234567890"
    Use this to select symbols for graphs containing symbols
    Indicate 10 numbers each from 0 to 9. Sequence codes are:
    Circle, Upwards Triangle, Downwards Triangle, Cross, Star, Diamond, Left triangle, Right triangle, square, Small Dot
    The "Small Dot (9)" can be shown to minimise or hide the symbol.
set parameters for Graphs
set option graph="/sizex=value /sizey=value" (Default Options for all graphs; width of graph, e.g. value=600)
set graph savetype=png[wmf][bmp] (Which type of file to save)
set graph clipboard=on[off] (copy graph to clipboard after creation)
set graph footnote=text (footnote for graphs - default: EpiData Analysis Graph)
set graph filename show=off[on] (show name of file below the graph)
set graph filename folder=off[on] (Include folder in graph file name)
set graph font size=value (font size - default 10. Titles are scaled relatively)
set graph colour="1234567890" See above
set graph colour text="2133" See above
set graph symbol="1234567890" See above
Select observations
select if
select
select [logical expression]
work with selected records
- multiple select commands are joined by and
- without parameters, clears all select commands.
Current select will be shown when running analysis commands
Warning: Be careful when you select on float variables, e.g. v1 > 3.1
To test this use "browse ....". Make sure the program handles missing data as you expect.
Note: String variables are compared excluding trailing blanks
To exclude leading blanks use trim function:
e.g. select trim(User) = "Jorge" or select trim(upper(User)) = "JORGE"
Select cannot be used with UPDATE
temporary
select
follow another command with if (logical_expression)
processes the data file using only those records for which logical_expression is true
- for complex logical expressions, use parentheses; they are optional for simple expressions
Note also: Options must be placed before if
Always check what happens with missing data and with float variables with many decimals.
Note: String variables are compared right trimmed, that is, without trailing blanks,
e.g. "Lion " is the same as "Lion", but not the same as "  Lion".
Also make sure that upper and lower case are handled correctly!
You can query by asking the user. E.g. "count if sex = ?Write value 1 2?"
Current if and select will be shown when running analysis commands
Statistical Process Control
pareto pareto groupvar [/Options]
SPC: A Pareto diagram shows a bar chart of the variable, where columns are sorted in descending order of frequency. Superimposed is a line showing cumulative percentages.
Counts are shown with the left Y-axis and Cumulative Percentage with the right Y-axis.
Pareto charts are particularly suited for decision making when multiple outcomes are possible for a given situation, e.g. to find the aspects responsible for 80 percent of errors among many possible causes.
Options:
runchart runchart measurement [time] [/Options]
SPC: A runchart shows the median of the measurements

Without a time variable sequence of observations is used as x-axis
The chart includes median and indication of tests of special cause
Runchart is a process control type graph.
Runs and tests adapted in v2 (ref: )
Options: SPC Options, general graph Options and SPC tests for attributable cause
ichart ichart measurement [time] [/Options]
SPC: I-chart (Individual - also called XMR chart) showing overall mean, control limits and actual measurements
Without a time variable sequence of observations is used as x-axis
"Measurement" can be any continuous measurement or count at that time
pchart pchart count total [time] [/Options]
SPC: A P-chart is created with the proportion of count/total for each time value.
Without a time variable sequence of observations is used as x-axis
The chart includes overall mean, control limits and proportions
P-chart is a process control type graph
Options: SPC Options, general graph Options and SPC tests for attributable cause
The sampling basis for pcharts is a binomial process.
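
Because the sampling basis is binomial, the control limits of a P-chart vary with the total at each time point. The Python sketch below uses the conventional 3-sigma limits; that the program uses exactly these limits, and that a negative lower limit is set to zero, are assumptions. The function name and data are invented.

```python
import math


def pchart_limits(counts, totals):
    """Return (overall proportion, [(proportion, lcl, ucl), ...]) per time point."""
    p_bar = sum(counts) / sum(totals)
    rows = []
    for count, total in zip(counts, totals):
        half = 3 * math.sqrt(p_bar * (1 - p_bar) / total)   # binomial 3-sigma
        rows.append((count / total, max(0.0, p_bar - half), min(1.0, p_bar + half)))
    return p_bar, rows


# Hypothetical data: 3 time points with 100 observations each
p_bar, rows = pchart_limits([5, 10, 15], [100, 100, 100])
print(p_bar)
for row in rows:
    print(row)
```

With larger totals the limits narrow, which is why points with different totals get different limits in the same chart.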
xbar xbar measurement time/sequence
An X-bar chart is created with the mean value of measurement for each subgroup. A subgroup is one time/sequence or date value used for grouping individual measurements of different samples or observations observed at the same time. Subgroup values serve as X-axis. If any X-value has only one observation, a zero value is shown for Sigma (or Range) at this X, and the point value is shown in the Xbar chart.
Xbar-chart is a process control type graph. Notice that problems can occur with some date variables.
It is displayed together with either:
Range chart, which indicates the range between Max and Min measurements within each subgroup
or Sigma chart, which indicates the process variation using a weighted method. (Xbar and S should always be used when subgroup size >1)
The sigma limits for the S-chart are calculated with varying limits depending on n in each subgroup. The Sigma-average is an arithmetic overall average. Currently we are looking into the optimal way of calculating this, since there is some disagreement in the literature. The current implementation follows the principle mentioned in Hart & Hart, page 331, but with an arithmetic average of the individual Sigma-bars.
uchart uchart count volume [time] [/Options]
A U-chart is created with the ratio of count/volume.

Without a time variable sequence of observations is used as x-axis
The chart includes overall mean, control limits for the ratio
U-chart is a process control type graph. The basis for the count is "defectives", e.g. falls in a hospital ward. The total is the observation volume, which typically varies between time points.
Options: SPC Options, general graph Options and SPC tests for attributable cause
/Per=x Use x as multiplicator (e.g. to show per 1000)
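
The effect of a varying volume and of /Per=x can be illustrated with the conventional U-chart calculation. The Python sketch below assumes 3-sigma Poisson-based limits and a lower limit cut at zero; whether the program computes the limits in exactly this way is an assumption, and the function name and data are invented.

```python
import math


def uchart_limits(counts, volumes, per=1):
    """Return (centre line, [(rate, lcl, ucl), ...]), all multiplied by per."""
    u_bar = sum(counts) / sum(volumes)
    rows = []
    for count, volume in zip(counts, volumes):
        half = 3 * math.sqrt(u_bar / volume)   # limits narrow as volume grows
        rows.append((per * count / volume,
                     per * max(0.0, u_bar - half),
                     per * (u_bar + half)))
    return per * u_bar, rows


# Hypothetical data: falls and bed-days, shown per 1000 bed-days
centre, rows = uchart_limits([4, 6, 10], [100, 100, 200], per=1000)
print(centre)
for row in rows:
    print(row)
```

The multiplier only rescales the rates and limits; it does not change which points fall outside the limits.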
cchart cchart count [time] [/Options]
A C-chart is created on the basis of counts for each data point.
The volume/total is assumed to be a constant observation volume for all time points

Without a time variable, the sequence of observations is used as the x-axis
The chart includes the overall mean and control limits for the count
C-chart is a process control type graph. The basis for the count is "defectives", e.g. falls in a hospital ward. The total is the observation volume, which is constant for all time points.
Options: SPC Options, general graph Options and SPC tests for attributable cause
/Per=x Use x as multiplier (e.g. to show per 1000)
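For a constant observation volume the limits no longer vary by point. The Python sketch below (not EpiData commands) assumes the standard textbook C-chart formula, mean count ± 3 × sqrt(mean count). The name c_chart is illustrative and the actual EpiData calculation may differ.

```python
import math

def c_chart(counts, k=3.0):
    """Return (center, lcl, ucl) for a list of counts with constant volume."""
    c_bar = sum(counts) / len(counts)   # overall mean count (center line)
    half = k * math.sqrt(c_bar)         # same limit width for every point
    return c_bar, max(0.0, c_bar - half), c_bar + half  # assumption: LCL cut at 0

# Hypothetical counts for four time points.
center, lcl, ucl = c_chart([4, 9, 3, 0])
```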
gchart gchart variable
A G-chart is used for rare outcomes, e.g. infection following surgical procedures.
Typically the variable used contains the recorded date of occurrence, and the G-chart will calculate and graph the number of days between the occurrences. Each measurement in the variable is graphed as an x-value.
A G-chart is a process control type graph.
If you have data that are already summarised, use a technique such as the following (assuming the presummarised data are in "datavar"):
gen i sumdata
sumdata = datavar
if _n > 1 then sumdata = sumdata[_n-1] + datavar
gchart sumdata
Notice that for G-charts it is desirable to have many observations away from 0. For other SPC charts "good" performance is near the bottom.
Options: SPC Options, general graph Options and SPC tests for attributable cause
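The two data layouts mentioned for G-charts can be illustrated in Python (not EpiData commands). The sketch assumes that the graphed value is the number of days between consecutive dates, and that presummarised gaps are turned into a running total as in the sumdata example. The function names are illustrative only.

```python
from datetime import date
from itertools import accumulate

def days_between(dates):
    """Days between consecutive occurrences, after sorting the dates."""
    ordered = sorted(dates)
    return [(b - a).days for a, b in zip(ordered, ordered[1:])]

def running_total(gaps):
    """Cumulative sum of presummarised gaps (the role of sumdata above)."""
    return list(accumulate(gaps))

# Hypothetical dates of occurrence.
gaps = days_between([date(2008, 1, 1), date(2008, 1, 11), date(2008, 2, 1)])
totals = running_total(gaps)
```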
Options for process control charts
  • /F=y Freeze the calculations to the first y observations (e.g. /f=12).
    The remaining part of the graph is shown dashed. For variable limit charts (pchart, uchart, gchart, xbar), the limits are based on the mean of the freeze period, but calculated from point-specific data.
  • /B=x (one or more): divide the chart into subperiods after these observations (e.g. /b=6 or /b="01/10/2008").
      The value can be a date, "dd/mm/yyyy" or "mm/dd/yyyy", in the same format as the time field.
    /B=Bx : break for every x data points, e.g. /b=b30 would create a break every 30 observations
  • /t1 /t2 /t3 /t4 /t5: Perform one of tests 1-5 and mark it in the graph (see below)
    /t2=x /t3=y /t4=z /t5=w : use the value indicated for tests of the sequences, e.g. /t2=3
  • /t: Perform tests 1+2+3
  • /tlimit : Sigma depends on the size of the samples (ref: Hart & Hart)
  • /tab: Add table of counts below graph
  • /nt: Hide documentation table (but mark tests if added in graph)
  • /neglcl: show negative as well as positive lower control limit values
  • /xlabel=var : Variable contains the labels to show at the X-axis.
  • /NoXLabel : Do not show any label at the tick marks on X-axis
  • /point : Only show points for observations, do not connect with a line
  • /exv=y : Exclude all points with a Y-value >= the value given (here y)
  • /exp=x1 : Exclude the observation with the X-value indicated (here x1)
  • /exz : Exclude all observations with Y = zero
    Notice: /exv and /exz do not work with G-charts
  • /noinf : Exclude text below X-axis showing central values (no central value information)
  • /SL : Show lines at 1,2 and 3 Sigma (default: only at 3)
  • /NoL : Do not show Control Limit lines (when user created values are included with /yline=)
Most standard graph Options apply
Principles and tests
Many principles exist for SPC graph testing. The principles implemented in EpiData Analysis are documented further in a reference document, which also includes references (see epidata.dk). In short, these principles are implemented:
SPC charts use a time variable along the X-axis and the observation as the Y-value. I- and P-charts have control limits calculated as center line ± a multiple of Sigma. The multiplier is 3 by default, but with the option /tlimit it can depend on the number of observations in each subsection of the graph.
The graphs do not demand a specific number of observations. The assumption for using tests for special variation is that between 20 and 30 observations are included in each portion (break). With fewer observations the chance of a Type II error (overlooking an actual special cause) is larger, and with more observations the chance of a Type I error (false positive test) is larger. Therefore users should explore the usage of the /tlimit principle. See Hart & Hart for more information.
In SPC (Statistical Process Control) charts, notice that control limits are NOT the same as confidence limits. Confidence intervals indicate limits for the mean, phrased: "the mean or central line of the SPC chart is this ..., but with the current sample we cannot decide whether it could be as high as ... (upper CI) or as low as ... (lower CI)". Control limits, in contrast, are indications of the type: "Within these limits one should expect about 99.7 % of the observations to be contained, given that the process is in control" (the percentage depends on the sigma value; 99.7 % applies at 3 sigma for normally distributed data).
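How the expected percentage inside the control limits depends on the sigma value can be checked numerically. The Python sketch below (not EpiData commands) assumes normally distributed observations; the function name coverage is illustrative only.

```python
import math

def coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Expected share of in-control observations inside 1, 2 and 3 sigma limits.
shares = {k: round(coverage(k), 4) for k in (1, 2, 3)}
```

Under this assumption roughly 68 %, 95 % and 99.7 % of observations fall within 1, 2 and 3 sigma respectively.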
The choice of SPC chart depends on the data at hand; in technical terms, on which type of process generated the data. For an overview, open the "graph" menu and choose "spc", which shows a grid assisting you in deciding which graph to choose among the implemented ones. Currently: Run-Chart, I-Chart, P-Chart, Xbar-S, Xbar-R, U-Chart, C-Chart, G-Chart, Pareto.
  • Tests indicate signs of "Special Variation" or "Attributable Cause" in the data. Criteria depend on the type of chart. The rules below work for all charts, unless mentioned specifically.
    • Type 1:
      • Run chart: Total number of runs and limits for expected number of runs. Points on median ignored in runs. Expected numbers based on a standard table provided n > 13 and n < 40. A run is defined as a sequence of one or more numbers on the same side of the centerline. V2 adaptation: Points on the centerline are disregarded.
      • Control Charts: Special Variation occurs for each observation outside the control limits.
    • Type 2:
      In Run-, I- and P-charts: Eight or more points in sequence on the same side of centerline (shift in the process). Values on center line are excluded from the count. The test counts number of sequences of eight points.
    • Type 3:
      In Run-, I- and P-charts: Six or more points decreasing or increasing in sequence (Trend). Sequential values of same size count as one. The test counts number of sequences of six points.
    • Type 4:
      2 out of 3 successive points more than 2 standard deviations (sigma) away from the centerline.
    • Type 5:
      4 out of 5 successive points more than one standard deviation (sigma) away from the centerline.
Change limits: Notice that in all charts you can change the limit of when a test is positive:
  • By adding a value. E.g. /t2=6 would give a positive test with 6 points on the same side of the centerline.
  • By issuing the set command: set spc testlimit="1,6,6,2,3". The five numbers are test 1-5 (first number always 1).
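Tests 2 and 3 described above can be sketched in Python (not EpiData commands). This is an illustration under assumptions: it reports the longest run found, which can then be compared with the limit (8 and 6 by default, or the value given with /t2= or /t3=). How EpiData Analysis counts several sequences in one chart is not reproduced here, and the function names are illustrative.

```python
def longest_side_run(values, center):
    """Test 2: longest run of points on the same side of the center line."""
    best = run = 0
    side = 0
    for v in values:
        if v == center:              # points on the center line are not counted
            continue
        s = 1 if v > center else -1
        run = run + 1 if s == side else 1
        side = s
        best = max(best, run)
    return best

def longest_trend(values):
    """Test 3: longest run of points steadily increasing or decreasing."""
    # Sequential values of the same size count as one point.
    pts = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    best = run = 1 if pts else 0
    direction = 0
    for a, b in zip(pts, pts[1:]):
        d = 1 if b > a else -1
        run = run + 1 if d == direction else 2
        direction = d
        best = max(best, run)
    return best

# Hypothetical series: a shift is signalled when the run reaches the limit.
shift = longest_side_run([5, 6, 7, 4, 6, 6, 6, 6, 6, 6, 6, 6], center=5) >= 8
trend = longest_trend([1, 2, 2, 3, 4, 5, 6, 1]) >= 6
```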
Top Save & Clear output
cls cls
clears the output screen
notice F12 will do the same and can be used during execution if speed slows down
logopen logopen [filename[.{html|txt}]] [/close] [/append] [/replace]
start a log file
- without parameters, the open file dialogue is started
/Close will close an open logfile.
/Append adds to an existing file
/Replace replaces an existing logfile
logclose logclose
close the current log file
Top View data
browse browse [variable1 [variable2 ...]]
browse values in a spreadsheet for all variables listed
- without parameters, browse all variables
Note that browse is much faster than list
Note that the browse window closes when you move away from "browse", unless you keep the browser open with: set display databrowser=on
In the same way, "minimise" is equal to "close" for the browse window, unless you have: set display databrowser=on
But remember that the browse window can be quickly opened at any time by F6 key.
Notice use of right click on form (sorting, copy to clipboard)
list list [variable1 [variable2 ...]] [/no] [/v /vl] [if ... ]
show values on the screen for all variables listed, with one record per line and no limit to the width of the display
- without parameters, list all variables. Values are shown - not labels.
/NO : do not show record (observation) numbers
/v /vl: control whether values or labels are shown (or both)
Note that browse is much faster than list. The choice of font might make list display incorrect
Select or if: the sequence number shown is counted within the current select or "if". If you use list with a temporary if, the number is not the same as recnumber.
Update Update [variable1 [variable2 ...]] [/id=variable]
Allows grid editing of data - without parameters, works on all variables
/id=variable Indicate variable containing unique id
Update cannot be combined with SELECT
Notice use of right click on update form (sorting, select id, copy to clipboard)
Top Generate/change variables
define define var1 fieldtype [cumulative|global]
create a new variable based on an EpiData fieldtype (###, ___, <Y>, <AAAAAA>, or valid dateformat)
- var1 will initially be missing in all records
- cumulative variables retain their values from one record to the next (not currently functioning)
- global variables retain their values following a close command and are like constants (only one value)
gen gen var1 = expression | resultvar
create a new numeric variable based on the expression, or equal to a constant from a result variable
- equivalent to define and let, with the variable type implied by the expression
- if the result of expression is boolean, variable1 will be 0 (FALSE) or 1 (TRUE)
- Result variables are created by some commands, e.g. means and describe
IF the user specifies type, that type of variable is generated (examples):
gen s(10) var1 = expression
gen d var1 = expression
gen i var1 = expression
gen f var1 = expression
Compare with values from other records:
  gen i age = (age - age[_n+1]) if id = id[_n+1]
  let bmidif = (bmi - bmi[_n-1]) if id = id[_n-1] //-1 could be -4 +1 etc
Notice that integer variables are maximum 4 digits. For larger integers use type float with zero decimals.
Always verify generation of complex variables or logical statements,
e.g. by comparing gen .... if ... with
define ...
if ... then ...
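The record comparison with [_n-1] above can be pictured with a short Python sketch (not EpiData commands). It assumes records are held in sorted order and that the result stays missing (None here) when the previous record belongs to another id. The function name and the sample records are hypothetical.

```python
def diff_from_previous(records, key="id", field="bmi"):
    """Difference to the previous record, only when both share the same key."""
    out = []
    for i, rec in enumerate(records):
        prev = records[i - 1] if i > 0 else None
        if prev is not None and prev[key] == rec[key]:
            out.append(rec[field] - prev[field])
        else:
            out.append(None)         # no matching previous record: stays missing
    return out

# Hypothetical data: two visits for id 1 and one visit for id 2.
records = [
    {"id": 1, "bmi": 20.0},
    {"id": 1, "bmi": 22.5},
    {"id": 2, "bmi": 30.0},
]
bmidif = diff_from_previous(records)
```

Checking such a result against a few hand-calculated records is one way to follow the advice to verify complex generated variables.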
generate generate value
creates a new empty dataset with value records. E.g. for simulation or testing.
- note the difference to gen command, which creates variables.
if ... then if (logical_expression) then [let] ... [else [let] ...]
evaluates logical_expression for each record; the else clause is optional
- for complex logical expressions, use parentheses; they are optional for simple expressions
- some other commands might work, but only let is practical
let [let] var1= expression | resultvar
assign a value to an existing variable; the word let is optional
- if the result of expression is boolean, var1 will be 0 (FALSE) or 1 (TRUE)
- only means and describe commands create result variables
recode recode variable1 [to var2] values1 = newval1 [values2 = newval2 ...]
create or change codes for subgroups of records
- values1 takes one of three forms: a single value, a series of values separated by commas, or a range of consecutive values like 7-12 or "A"-"D"
e.g. recode v1 to v2 lo-18.499=1 18.50-hi=2 (values up to, but not including, 18.50 get the value 1)
e.g. recode x lo-3.00=1 3.0001-4.0000=4 4.0001-5.49000=5 5.5-hi=7
- if "to var2" is omitted, variable1 itself is recoded and the original values will be lost.
recode variable1 to var2 by value (value must be an integer > 0)
- the variable1 values will be recoded to the numerical variable var2, with value labels indicating the limits
E.g. recode age to agegroup by 10 recodes the age variable to 10 year age groups.
Note: define agegroup before recoding: define agegroup ###
Note: EpiData Analysis shows the if ... then and the labelvalue commands doing the recode
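The range form of recode can be pictured with a Python sketch (not EpiData commands) that mirrors the second example above, with lo and hi treated as open ends. It assumes that a value matching no range is left unchanged, which is an assumption of this sketch rather than documented EpiData behaviour. The names RULES and recode are illustrative.

```python
import math

# (low, high, new value) for: lo-3.00=1 3.0001-4.0000=4 4.0001-5.49000=5 5.5-hi=7
RULES = [
    (-math.inf, 3.00, 1),
    (3.0001, 4.0000, 4),
    (4.0001, 5.49, 5),
    (5.5, math.inf, 7),
]

def recode(value, rules=RULES):
    """Return the new code for the first range that contains value."""
    for low, high, new in rules:
        if low <= value <= high:
            return new
    return value    # assumption: values outside every range are left as they are

codes = [recode(v) for v in (2.5, 3.5, 5.0, 6)]
```

The gap between 5.49 and 5.5 shows why the limits of adjacent ranges need care: a value such as 5.495 matches none of the ranges.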
Top Label data - also called metadata
labeldata labeldata "text"
Assign the descriptive text as a label for the data file.
An existing label will be replaced with the new one.
To keep the label you must save the data.
label label var "text"
Assign the descriptive text as a label for the variable.
An existing variable label will be replaced with the new one.
To keep the variable label you must save the data.
labelvalue labelvalue var /x="text with spaces" /y=text2 /z=text3 [/clear]
Assign the descriptive text as a value label for the values (x y z)
/clear will remove any value label not mentioned on the line
For several variables in sequence: labelvalue v1-v17 /1="Yes" /0="No"
Note: If you change the value labels for a variable which shares labels with other variables, then the labels are changed for all of those variables!
Shared valuelabels are defined as part of dataentry in EpiData Entry
Note that valuelabels are automatically created by the command recode
missingvalue missingvalue var [var1-varx] /x /y /z [/clear]
Assign from 1 to 3 values as a defined missing value
/clear will remove any previous definition
For several variables in sequence: missingvalue v1-v17 /9
Top Clean up - stop
close close
stop using a dataset
- all unsaved variables and changes to existing fields will be lost
- global variables will remain in memory
quit or exit quit
exit
Exits from EpiData Analysis. Closes any open output file.
NOTE: To save data in memory before closing use the savedata command.
Automatic save of command history is done on exit. Filename defined by "set command history filename", default temp.pgm
If you write exit or quit in command prompt no confirmation question will be asked.
savepgm savepgm filename[.pgm]
saves recent commands in a program file
- without a parameter, the save file dialogue is opened
Automatic save of command history is done on exit. Filename defined by "set command history filename", default temp.pgm
clear output window cls
Clear the output screen with results - when output slows down press F12.
F12 (=cls) can be used in the middle of other commands running.
clear command buffer clh
Clear the buffer of previous commands.
- It is the same list shown when pressing F7, right click on "F7" window to clear
Notice - set commands on history. See below.
Top Set parameters
set set [parameter=value]
  • Change the value of an EpiData "set" parameter

    - without parameters, provides a list of available parameters and their current values

    - Add the set definitions to the file Epidatastat.ini to modify from default
  • values can be a number, text or ON/OFF, see table below
  • to see current value: set [parameter] = ? e.g.: set echo = ?
  • For sets with ON/OFF: set echo = off or set echo = on
  • For any command: set option [cmd] = [Options] e.g. set option means = /t
    When the specified command is executed the Options mentioned will be added to the command.
Set commands not mentioned above are shown in the next section.
| Option | Default value | Comments or function |
| --- | --- | --- |
| BROWSER FONT SIZE | 10 | Font size in browser and update |
| DEBUG LEVEL | 0 | Used for testing to write information from internal modules. When equal to 0, no information is written |
| DEBUG FILENAME | MODULETEST.LOG | Filename where debug information is written |
| DISPLAY COMMAND HISTORY | OFF | Show the command history window (F7) |
| DISPLAY COMMAND PROMPT | ON | Show command prompt |
| DISPLAY COMMANDTREE | OFF | Show commandtree window (F2) |
| DISPLAY DATABROWSER | OFF | Show databrowser when a datafile is open (F6) |
| DISPLAY MAINMENU | ON | Show main menu |
| DISPLAY TOOLBAR | ON | Show toolbar |
| DISPLAY VARIABLES | OFF | Show variable window (F3) |
| DISPLAY WORKTOOLBAR | ON | Show work toolbar |
| ECHO | ON | When = on show results, off: "silent" |
| EDITOR FONT SIZE | 10 | Font size in editor |
| EDITOR PRINT INFO | ON | When printing from editor add footer |
| HISTORY COMMAND PGM | OFF | Add commands from pgm files to history. When off only the name of the pgm is shown |
| HISTORY COMMENT | ON | Add comments when found in pgm files to history |
| HISTORY NAME | TEMP | Name of file to use as autosave, when closing the programme. The previous three files will be saved, e.g. as temp3.pgm temp2.pgm temp1.pgm |
| LANGUAGE | EN | Language file abbreviation |
| OPTION GRAPH | /sizex=600 /sizey=200 | Default Options for graphs, here size in pixels |
| OPTION TABLES | /t /R | Default Options for cross- and frequency tables |
| OPTION SPC | /sizex=600 /sizey=200 /t | Default Options for SPC graphs |
| OUTPUT FOLDER | ..folder name... | Logfiles and graphs are saved here |
| OUTPUT NAME | EAOUTPUT.HTM | Name of output files |
| OUTPUT OPEN | ON | Open logfile automatically |
| PRINT PREVIEW CM | ON | Use cm for measures when printing |
| RANDOM SEED | 9 | Use as seed for random number generation |
| RANDOM SIMULATIONS | 500 | Number of simulations |
| READ DELETED | OFF | Read records marked for deletion |
| RECODE INTERVAL TEXT | - | Marker to use in value labels with recode |
| SAVEDATA FIRSTWORD | ON | Use filetype first word. When off, Epi6 automatic naming will be used |
| SHOW COMMAND | ON | Show commands after execution |
| SHOW ERROR | ON | Show errors |
| SHOW INFO | ON | Show information type feedback from commands |
| SHOW RESULT | ON | Show results from commands |
| SHOW SYSTEMINFO | OFF | Show extended information (test purpose) |
| START PAGE | START.HTM | File to show when the programme starts (F8) |
| STYLE SHEET | "folder name\EPIOUT.CSS" | File containing the style sheet. Find extended definition of output etc. on www.epidata.dk/documentation.php |
| STYLE SHEET EXTERNAL | ON | In output files refer to css file. If off, then a copy of the current css definitions will be copied to the logfile header, e.g. if you wish to copy the output file to the internet. |
| VAR GENERATE TYPE | F | Default variable type with gen command |
| VIEWER FONT CHARSET | ISO-8859-1 | Character set for output window |
| VIEWER FONT NAME | VERDANA,COURIER | Must at least contain one proportional and one fixed type font |
| VIEWER FONT SIZE | 10 | Font size for output window |
| WINDOW FONT SIZE | 10 | Font size for other windows (prompt, variable, history, commands) |
Top Information
newpage newpage
When printing the output force top of page after this line
will not added to output as "hidden" information.
type type "Text to display [@$result1] " [/class=x] [/style=" "] [/h1] [/h2] [/h3] [/h4] [/h5]
echo Text to display [@$result1]

display text on the screen;
if Options are not used the text will be added as a standard paragraph (html: <p>)
Options add html specifications: /class: (html: <p class= > text </p>).
     (h1..h5: <hx> text </hx>)
/style="valid css style definition", e.g. /style="color:blue; font-size:0.6em"
result or globally defined variables may be displayed by putting @ before the variable name
Use ' ' (single quotes) to include quoted text in type commands, e.g. for <a href=' ....'>
title title "Text to display [@$result1] "
Display text on the screen as (html: <h1> text </h1>)
result or globally defined variables may be displayed by putting @ before the variable name
show show "filename"
Add the contents of "filename" to the output window

The file must be plain text, e.g. NOT a word processor file, but may contain HTML formatting blocks without header.
View View "filename"
View an html file in the viewer.
The file must be HTML formatted.
HelpView Helpview "filename"
View an html file in the help file viewer.
The file must be HTML formatted.
rename rename oldname to newname
rename the variable from "oldname" to "newname"
var or variables var
variables
list currently defined variable names, types, formats and labels
drop or var drop drop variable1 [variable2 ...]
remove the listed variables from memory
keep or var keep keep variable1 [variable2 ...]
remove all variables not listed from memory
result or var result result
list all current result variables and their values
list all current result variables and their values
- means, describe, tables and other estimation commands create result variables, e.g. $mean1 or $count
All result variables are cleared when running a new command, except for $assert and $assert_error.
See var temp clear and runtest.
var temp clear var temp clear
Removes ALL result variables and all temporary variables defined as global
$assert, $assert_error and other internal variables are also cleared
Version Version
Compare current version of EpiDataStat.exe with latest version (requires internet)

Note: No information is transferred from your PC
Latest version is read from Http://www.epidata.dk/version/epidatastat.version if you are connected to internet.
assert assert if (logical statement)
Check if the statement is correct (will not test all observations!). Returns the text "Assert failed" if the statement failed.
E.g. assert ((pregnant = "Yes" and age < 40) or (pregnant = "No")) if id = 1
? ? (statement)
Show result of statement, e.g. a calculation or logical check. Does not depend on or check any data.
E.g.
? 241/34
? (23>19)
? "a " + "b " + "c"
? findfile("myfile.pgm")
Top Obsolete commands
output output {describe ... | means ...}
Command replaced by new command aggregate
route command replaced by SAVEDATA and LOGOPEN commands
write command replaced by SAVEDATA and LOGOPEN commands
Top Disk commands
cd cd "directory name"
change the working directory
copyfile copyfile "filespec1" "filespec2"
copy file specified by filespec1 to new file specified by filespec2
- filespec must identify only one file - do NOT include wild cards (* or ?)
To overwrite, add /replace. Warning: this could overwrite your data!
erase erase filename
permanently erase file specified by filename
- filename must identify only one file - do NOT include wild cards (* or ?)
rename file use copyfile and erase
To rename a file, use copyfile to copy the existing file to the new name.
Afterwards you can erase the existing file with erase.
dir dir [filespec]
list files in a directory
- filespec may include wild cards (* or ?)
Define design by set table design system=line[box][filled][shaded][system]...
dos or ! dos text
execute any valid MS-DOS command and return to EpiData
- dos command will open an MS-DOS window
/open : Keep window open after execution
! works only on XP+ PCs
Top Programming aids - not normally used in interactive mode
* * [any text]
Use to document programs, usually as the first character in a line. * is not recognized in interactive mode.
\ \
Any command can be extended on next line, e.g. to specify many Options for graphs
; ;
to specify more than one command on a given command line in prompt or pgm
// [any command] // [any text]
Use to document programs and may appear anywhere on a line.
imif IMIF (logical condition) then ..... [else] ..... endif
Use to divert course in a pgm file depending on parameters, which could be acquired by "? ?"
closehelp closehelp
Will close the help window if this is open.
? ? [any command] [parameters] ?Prompt to user? [parameters]
The text between the two ? will be a prompt to the user to type a response, followed by <Enter>. The response will then be treated as part of the command.
E.g. for select if age<=?Maximum age to include?
if the user types 50 then EpiData sees select if age<=50
EpiData does no checking of the typed response before making the substitution.
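The substitution described above amounts to a plain text replacement before the command is executed. The Python sketch below (not EpiData code) illustrates the idea; the function name substitute_prompts and the ask parameter are hypothetical, and no checking of the response is done, as in the description.

```python
import re

def substitute_prompts(command, ask=input):
    """Replace every ?prompt text? in command with the user's response."""
    # ask receives the prompt text and returns what the user typed.
    return re.sub(r"\?([^?]+)\?", lambda m: ask(m.group(1) + " "), command)

# A stand-in for a user who always types 50.
result = substitute_prompts(
    "select if age<=?Maximum age to include?", ask=lambda prompt: "50"
)
```

With this stand-in response, result holds the text select if age<=50.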
run run [filename[.pgm]]
Execute sequence of commands saved in a pgm file
- without parameters, the open file dialogue is started
runtest runtest [filename|folder name]
Run all pgms in a folder, or a single pgm, to verify function.
- suited for testing of correct estimation etc.

Functions available in EpiData Analysis

In the following, takes indicates the variable type for each parameter and result indicates the type of the result of the function:
     s: string; b: boolean; d: date; i: integer; f: floating point; n: any numeric
parameters may be variables read from fields, new created variables, or any expression that evaluates to the correct type

Top String functions
| function | takes | result | example |
| --- | --- | --- | --- |
| length(str) | s | i | length("Abcde") => 5 |
| lower(str) | s | s | lower("Abcde") => "abcde" |
| pos(instr,findstr) | s | i | pos("Abcde","cd") => 3; pos("Abcde","z") => 0 |
| substr(str,start,len), copy(str,start,len) | s,i,i | s | substr("Abcde",2,3) => "bcd"; copy("Abcde",2,3) => "bcd" |
| trim(str) | s | s | trim("Abcde ") => "Abcde" |
| upper(str) | s | s | upper("Abcde") => "ABCDE" |
Top Arithmetic functions (including Random numbers)
| function | takes | result | example |
| --- | --- | --- | --- |
| abs(x) | n | n | abs(-12) => 12 |
| exp(x) | n | f | exp(1) => 2.71828182845905 |
| frac(x) | f | f | frac(12.34) => 0.34 |
| int(x), trunc(x) | f | f | int(12.34) => 12.0; trunc(12.34) => 12.0 |
| integer(x) | f | i | integer(12.34) => 12 |
| ln(x) | n | f | ln(2.71828182845905) => 1; ln(0) => missing |
| log(x) | n | f | log(10) => 1; log(0) => missing |
| power(x,a) | n,n | f | power(2,3) => 8 |
| round(x,digits) | f | f | round(12.44,1) => 12.4; round(12.5,0) => 13 |
| sqr(x) | n | f | sqr(4) => 16 |
| sqrt(x) | f | f | sqrt(4) => 2 |
| ran(x) | n | n | Random integer from 0 to x. gen integer x = ran(100) |
| rnd(1) | 1 | f | Random float from 0 to 1. gen float x=rnd(1) |
| rang(mean,sd) | f,f | f | Random based on mean and sd. gen float x=rang($mean1,$sd1) |
Top Trigonometry functions
| function | takes | result | example |
| --- | --- | --- | --- |
| arctan(x) | f | f | arctan(1) => pi/4 |
| cos(r) | f | f | cos(pi/2) => 6.12303176911189E-17; cos(pi) => -1 |
| pi | - | f | pi => 3.14159265358979 |
| sin(r) | f | f | sin(pi/2) => 1; sin(pi) => 6.12303176911189E-17 |
Top Date functions
| function | takes | result | example |
| --- | --- | --- | --- |
| today | - | d/i | returns today's date; may be assigned to a date variable or an integer |
| date(datestr) | s | d | date("31/12/04") => "31/12/2004"; datestr must be of form <dd/mm/yy> or <dd/mm/yyyy> |
| date(datestr,fmtstr) | s,s | d | date("12/31/04","%mdy") => "31/12/2004"; fmtstr must be "%mdy" or "%dmy". Date separator can be anything, e.g. "31-12-2004" is accepted |
| day(d) | d | i | day("31/12/2004") => 31 |
| dayofweek(d) | d | i | dayofweek("31/12/2004") => 5; Monday=1, Sunday=7 |
| dmy(d,m,y) | i,i,i | d | dmy(31,12,2004) => "31/12/2004" |
| month(d) | d | i | month("31/12/2004") => 12 |
| weeknum(d) | d | i | weeknum("22/02/2001") => 8 |
| year(d) | d | i | year("31/12/2004") => 2004 |
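The dayofweek and weeknum examples above can be cross-checked outside EpiData with Python's standard datetime module. The sketch assumes that weeknum follows ISO 8601 week numbering, which matches the example shown but is not stated in this guide.

```python
from datetime import date

d = date(2004, 12, 31)

day_of_week = d.isoweekday()                       # Monday=1 ... Sunday=7
week_number = date(2001, 2, 22).isocalendar()[1]   # ISO 8601 week number
parts = (d.day, d.month, d.year)                   # compare with day(), month(), year()
```

Here day_of_week is 5 (a Friday) and week_number is 8, in agreement with the table.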
Top Logic functions
| function | takes | result | example |
| --- | --- | --- | --- |
| b1 and b2 | b,b | b | (1=1) and (2=2) => TRUE; (1=1) and (1=2) => FALSE |
| b1 or b2 | b,b | b | (1=1) or (1=2) => TRUE; (1=2) or (2=3) => FALSE |
| not(b) | b | b | not(1=1) => FALSE; not(1=2) => TRUE |
| iif(b,x,y) | b,any,any | any | iif(1=1,2,0) => 2; iif(1=2,sqrt(4),sqr(4)) => 16 |
Top Conversion functions
| function | takes | result | example |
| --- | --- | --- | --- |
| boolean(x) | n | b | boolean(x) => TRUE, for any non-zero x; boolean(0) => FALSE |
| integer(x) | f | i | integer(1.23) => 1 |
| integer(s) | s | i | integer("12") => 12 |
| float(i) | i | f | float(1) => 1.000 |
| string(x) | n | s | string(1.23) => "1.23" |
Top Test and special functions
| function | takes | result | example |
| --- | --- | --- | --- |
| lre(x,y) | n | n | lre($mean1,1.23456789123456) returns number of digits precision of $mean1 |
| samenum(x,y) | n | b | samenum($mean1,1.23456789123456) returns true or false indicating if abs(x) = abs(y) |
| samenum(x,y,z) | n | b | samenum($mean1,1.23456789123456,10^-7) returns true or false indicating if abs(x-y) < z |
| mv(var) | variable name | 0,1,2 | Returns 0 if variable has a valid value, 1 if system missing (.), and 2 if a defined missing value |
| var[recnumber] | n | data value | Not a function, but a way to get a value for a given record. E.g. gen i x=age[recnumber] = age[recnumber-1] or gen i x=age[_n] = age[_n-1] |
| findfile("filename.ext") | s | 1 or 0 | Checks if the file exists and returns 1 if so, otherwise 0. Use e.g. imif findfile("myexport.csv") then ....... endif |
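The idea behind lre and the three-argument samenum can be illustrated in Python (not EpiData code). The sketch assumes the common definition of the log relative error, -log10(|x - c| / |c|), as used in numerical accuracy benchmarks; whether EpiData Analysis uses exactly this formula is an assumption. The function bodies are illustrative only.

```python
import math

def lre(x, c):
    """Approximate number of correct digits of x relative to the certified value c."""
    if x == c:
        return math.inf                     # exact agreement
    if c == 0:
        return -math.log10(abs(x))          # fall back to the absolute error
    return -math.log10(abs(x - c) / abs(c))

def samenum(x, y, z):
    """True when x and y differ by less than the tolerance z."""
    return abs(x - y) < z

digits = lre(100.1, 100.0)                  # roughly 3 correct digits
close = samenum(1.0, 1.00000001, 1e-7)
```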

Top Operators used in EpiData Analysis
| operator | syntax | result | meaning | example |
| --- | --- | --- | --- | --- |
| + | n+n | n | addition | 1+2 => 3 |
| + | s+any, any+s | s | concatenation | "A"+"B" => "AB"; "A"+1 => "A1" |
| + | d+n | d | date addition | "30/11/2004"+31 => "31/12/2004" |
| - | n-n | n | subtraction | 2-1 => 1 |
| - | d-d | n | date subtraction | "31/12/2004"-"30/11/2004" => 31 |
| - | d-n | d | date subtraction | "31/12/2004"-31 => "30/11/2004" |
| * | n*n | n | multiplication | 2*3 => 6 |
| / | n/n | n | division | 5/2 => 2.5; 5/0 => missing |
| div | n div n | i | integer result of division | 5 div 2 => 2; 5 div 0 => missing |
| ^ | n^n | f | exponentiation | 5^2 => 25; 4^0.5 => 2 |
| ( ) | | | group expressions | (5*(2+4))/2 => 15; 5*2+4/2 == (5*2)+(4/2) => 12 |
| < | n<n | b | less than | 1<2 => TRUE |
| > | n>n | b | greater than | 1>2 => FALSE |
| <= | n<=n | b | less than or equal | 1<=2 => TRUE; 2<=2 => TRUE |
| >= | n>=n | b | greater than or equal | 1>=2 => FALSE; 2>=2 => TRUE |
| <> | n<>n | b | not equal to | 1<>2 => TRUE; 1<>1 => FALSE |
| @ | @var1 | | value substitution | used in any command, replaces @var1 with the contents of var1 before executing the command |
| $ | $resultvar | | result value | used in let or gen, takes content of $resultvar as a constant |