EpiData Analysis Overview of commands & functions

EpiData Analysis: Command and Function Reference Guide Document version 1.1

- Read & Save Data, sorting
- View and edit data
- Frequency, Cross and Summary Statistics Tables
- Descriptive analysis & testing
- Life table & Kaplan Meier plot
- Graphs
- SPC graphs: Pareto Charts, Ichart etc.
- Show results, output, files and run scripts
- Select observations
- Generate/Change variables
- Label data
- Clean up & stop
- Information
- Disk and file commands
- Setup parameters
- Programming commands
- Changes in this version
- Obsolete Commands
- Functions
- Operators
- Key differences from Epi Info v6

Syntax: command variables [/option] [/option=a|b|...] [if condition]
[ ]: optional specification. a|b|... indicates alternative choices

Default table without Options:

Type
- one variable: frequency table.
- Two variables: crosstable.
- 3+ : First two variables stratified by remaining variables
Sorting depends on table type.
- tables with epidemiological estimation (Odds Ratio, RR or outbreak table): Descending on value
- All other tables: increasing on value.
- Only counts are shown. To add estimates and/or percentages see Options below.

With user specified Options many aspects can be controlled:

Specify desired type of table :
- Frequency tables: /F see freq
  (/F overrules other Options)
- Compact Tables - notice outcome is given as first variable
  - Case-Control table: /CT /O Odds ratio and 95% CI
  - Risk ratio table: /CT /RR Risk Ratio and 95% CI
  - Outbreak analysis: /CT /AR [or /OA] Attack rates and RR with optional /CI: 95% CI for attack rate See epicurve
  - Stratified table of proportions with Confidence Interval: See ciplot
- General: /FV: Show single cross tables of first by all other variables
- Summary table of N and selected statistics: /S (Relevant with more than one stratified table)
Percentages and missing:
- Crosstables: Row Column Total percentage: /R /C /RP /CP /TP
- Frequency tables: cumulative percentage: /CUM Row: /C
- Number of decimals in percentages: /D0 /D1 /D2
- Place each type of percentage in single column: /PCT
  (Without /PCT all chosen percentages are written in the same column)
- /M: include missing (.) or defined missing values in tables.
Estimation and testing:
- /T:Chi²
- /EX: Exact test
- /GAM: Goodman & Kruskal's gamma
- /ADV: extended and advanced tests
- /OA Outbreak table with attack rates (cohort assumption) /CI: add Confidence Interval to estimates (outbreak tables)
- /CI: 95% CI in frequency tables
  CI gives the CI for the proportion of all observations in each row
  (Altman et al. Statistics with confidence, London, BMJ books. ISBN 0 7279 1375 1, 2nd edition,2005, p 47)
Epidemiological tables (2x2 table): See also Options /OA /CC /FV
- Estimation: /RR : Risk Ratio /O : Odds Ratio /OEX: exact MLE based estimates (as Epi6)
- Default: highest value of variables considered exposure and case.
- /SA: Reverse outcome and exposure to take low value,
  e.g. case=0 and exposure=0 (when coded 0 & 1 ). See sorting below.
  (Notice: from v2.2.2, Odds Ratios are standard Mantel-Haenszel estimates, unless you use /OEX)
  Estimates can be imprecise with a small number of observations. It is always the user's responsibility to check for a "sparse data" problem in an analysis by looking at the stratified tables.
Sorting
- Indicate by /Sxxx, where each x indicates:
  R:row C:Column A:Ascending D:Descending T:Total L:label (else numerical)
- Accepted combinations:
  - On value of category for Row and Column: /SA /SD
  - On label (text sort) for row and column: /SLA, /SLD
  - Row and/or Column Totals: /SRAT, /SCAT, /SRDT, /SCDT
  - Indicate specific column or row: /SRD=x, /SCD=x, /SRA=x, /SCA=x (e.g. /sca=2)
  - For string variables use "label" sorting, e.g. /SLA /SLD
  - Frequency tables: /SA /SD /SLA /SLD /SRAT /SRDT
Content and specification table :
- Hide tables: /Q: all /NT:subtables. /NC: unstratified (crude) table
- /OBS: show observations as 2x2 table in outbreak and summary tables
- /W=variable: Use number of observations in the variable as frequency weight
- metadata:
  - value labels:
    /v : show only values
    /vl: show values and value labels
  - Variable labels
    /vn: show variable name
    /vnl: show variable name and label
Specify design:
- set table design=line[box][filled][shaded][...]
- Specify design summary table: set table design summary=line[box][filled][shaded][...]
- See also other set
Notes: Several result variables are saved, including the table: try result to see names.
Match removed (try: tab disease outcome matchvar)

stattables
stab

stattables variables /stat="...statistics key words ..." [/by=] [Options]
Show a collapsed table with the same summary statistics for all the variables optionally grouped or stratified Options:

/by="group variables" : use for aggregation level
/strata=variable : repeat table for each value of stratification variable
/header=", , , , , , ," : Use to specify your own header, e.g. /header=",Mean,(CI),Max"
You can include valid html specifiers in the definitions, e.g. line breaks <br>
/page : add page change after each stratum
/close : close current data and use summary data set instead. Note: any unsaved data will be lost
/save="filename" : Save summary table to a new file; to replace an existing file add /replace
/m : Include observations where by variables have missing values
/q Hide table
Statistics: /stat=".....keyword for type of statistics ......"
- mean SD max min p5 p10 p25 p50 p75 p90 p95 p99 sum
- short forms, which show (contents):
  - des (min median max)
  - iqr (p25 p75)
  - idr (p10 p90)
  - isr (p5 p95)
  - mci (mean +/- 1.96*sd/sqrt(n) = 95% Confidence Interval)
  - mv (counts of defined missing values and blanks for each variable)
  - n Number of observations
  - nv Number of valid observations for each variable
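
As an illustration of the mci keyword, the following is a minimal Python sketch of a 95% confidence interval for a mean. It assumes the conventional standard error sd/sqrt(n) with the sample standard deviation (n - 1 denominator); whether EpiData Analysis uses exactly this form is not confirmed here, and the function name `mean_ci95` is hypothetical.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, lower, upper) for mean +/- 1.96 * sd / sqrt(n).

    Assumption: sd is the sample standard deviation (n - 1 denominator).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    half_width = 1.96 * sd / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


weights = [2, 4, 4, 4, 5, 5, 7, 9]
mean, lower, upper = mean_ci95(weights)
print(f"mean={mean:.2f} CI=({lower:.2f}; {upper:.2f})")
```

The interval narrows as n grows, which is why the same spread gives a tighter interval in larger groups of a /by table.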

Example: stab age weight /by="sex class" /stat="mci min max"
Specify design: set table design=line[box][filled][shaded][...]
See also: Aggregate

set parameters for Tables

All tables:
set table design=[stat][system][freq][summary]=line[box][filled][shaded][system]... (Design of tables)
Tables with percentages (Options: /r /c /to /pct)
set table percent format col="P1{}" (Col Percents format, e.g. "P2 %")
set table percent format row="P1()" (Row Percents format, e.g. "P0[]")
set table percent format total="P0[]" (Total Percents format, e.g. "P0[]")
set table percent header="%" (Contents of column header for percents)
set table percent header [row][col][total]="%" (Contents of column header for percents row/col/total one at a time !)
set table ct or header="OUTCOME:,CASE,NON CASE,N,EXPOSED,NON EXPOSED"
set table ct rr header="OUTCOME:,EXPOSED,NOT EXPOSED,N,N,ILL,RR,AR (%)"
Statistics:
Specify confidence interval text:
set TABLE CI FORMAT [HEADER]="()-"
set TABLE CI HEADER="(95% CI)"

Life Tables and Kaplan-Meier Plots

lifetable
ltab

lifetable outcome time [/by=group variable] [/Options]
lifetable outcome TimeStart TimeEnd [/by=group variable] [/Options]

The lifetable command creates a standard life table and Kaplan-Meier curve depending on Options. The time variable is read as integer. If a float variable is used a default interval of 1 will be used.

Indicate observation times in one variable: lifetable outcome time
Indicate two variables: lifetable outcome Start_time End_time
where the command computes time = (End - Start) and uses this in the analysis.
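
To make the idea concrete, here is a minimal Python sketch of a Kaplan-Meier (product-limit) estimate built from start and end times. It is an illustration under stated assumptions, not the program's implementation: the function name `kaplan_meier` is hypothetical, outcome is coded 1 for the event and 0 for censored, and observations censored at the same time as an event are counted as still at risk at that time (the textbook convention, which may differ from the lifetable defaults described under Options).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival)] for each time with at least one event.

    events: 1 = outcome occurred, 0 = censored (assumed coding).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every observation leaves the risk set at its time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


start = [0, 0, 2, 1, 0]
end = [5, 3, 8, 4, 9]
outcome = [1, 0, 1, 1, 0]
times = [e - s for s, e in zip(start, end)]  # time = (End - Start)
for t, s in kaplan_meier(times, outcome):
    print(t, round(s, 3))
```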

Default: Missing data handling:
- missing value in outcome variable: observation excluded.
- missing value in time variable(s): observation excluded
/MT ("Missing Time"): highest non-missing value assigned when time variable is missing
(second time variable with two variables)
All observations assigned time with /MT option are censored at exit.
/Exit=value: Observations are censored on this date if their END time is later than value.
e.g. ltab out time1 time2 /MT /exit="01/08/2009"
would censor observations on August 1st 2009 and assign this date when time2 is missing.

Options:

General
- /BY=variable : Variable used as group indicator
- /O=x : Use the value x to indicate outcome value (1 is default).
- /I : apply intervals as defined by Set lifetable interval see below.
- /I=bx : aggregate by the value x, e.g. /I=b7 would aggregate data to week if time is date or days
- /I="x,y,z,,,,": aggregate at indicated values e.g. /I="0,3,10,100"
Hide/show:
- /NG : do not show the Kaplan-Meier plot
- /NT : do not show summary documentation table
- /NOLT: do not show lifetable
- /E0 /E1 /E2 /E3 /E4 : show results with this number of decimals
- /NOCI : Hide confidence intervals in tables and graph (e.g. if graphs overlap for groups)
Estimation:
- /time=x :Calculate Survival estimate at indicated time value (values)
- /p25 /p50 /p75 : Calculate time span estimates at quartiles and median of survival proportions (0.75 0.5 0.25)
- /t : Log-rank test and Hazard Ratio (only relevant with /by)
- /ref=x : Indicates value of /by variable to use as reference for comparisons
  Difference from reference is estimated for /time /p and /t Options
- /adj : Censored observations contribute to half the time of period (default: censored at start of period)
Other:
- /CLOSE: close current file and replace with life table.
- /MT /EXIT=value see explanation above
- Most other standard graph Options apply.

Confidence Intervals and other estimates are based on the formulae found in Altman et al. Statistics with confidence. BMJ books. Second Edition p94. In this the std. errors are adjusted for effective sample size (died + still in observation) and there is no weighting for censoring.
With the Statistics dialog only the lifetable is shown: lifetable outcome Time /NG
With the graph dialog this is used: lifetable outcome Time /NOLT

set parameters for life tables

set table design=[stat][system][freq][summary]=line[box][filled][shaded][system]... (Design of tables)
set lifetable header="INTERVAL,N_{AT RISK},DEATHS,LOST,SURVIVAL,STD. ERROR"
set lifetable interval="0,7,15,30,60,90,180,360,540,720,3600,7200,15000"

set TABLE CI FORMAT [HEADER]="()-"
set TABLE CI HEADER="(95% CI)"

Basic graphs

bar

bar variable1 [/by=...]
A bar graph shows counts of the given variable with a categorical x axis. Only values included in the variable will be shown. Bars are centered at each tick mark, and value labels (if defined) are shown. Compare with definition of histogram
Options:

/by=varname: show a bar chart by another variable
/PCT : use percentages instead of counts on Y axis.
/v /vl /vn /vnl : change appearance of labels

histogram
his

histogram variable1[/xmin /xmax /xinc] [/by=...]
A histogram shows counts grouped into "bins", scaled on the X-axis. Each group (bin) starts at the tick mark, but is centered on the tick mark with /by. If the By variable has more than four values, use bar. Compare with the definition at bar.
Options

/by=varname: will technically show a bar chart by another variable - check axis
/PCT : use percentages instead of counts on Y axis.
/Bins=x : Group into x bins (e.g. /bins=7)
/start=x The first bin is started at this value
/width=x The width of each bin has this datavalue
Bins takes precedence over width

boxplot

boxplot variable₁ [variable₂ ....] [/out] [/by=...] [/R] [/P1090]
Box and whisker plot of field. The box shows the interquartile range (p25-p75) with the median highlighted.
Whiskers cover the interval from (p25 - 1.5*interquartile range) to (p75 + 1.5*interquartile range),
but only if a data value is present there; otherwise the nearest inside value is used.
Options:

/P1090 Length of whiskers are at 10 and 90 percentiles
/R Length of whiskers are at minimum and maximum of data
/out will show all outliers outside end of whiskers with a circle for each observation
/by=varname show one box for each value of varname
If "/by" boxes overlap, then increase size x. E.g. /sizex=600
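
The default whisker rule can be sketched in Python as follows. This is a hedged illustration only: the function name `box_whiskers` is hypothetical, and the percentile method (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the guide does not state how EpiData Analysis interpolates percentiles.

```python
import statistics


def box_whiskers(values, k=1.5):
    """Quartiles, whisker ends and outliers for a box plot.

    Whiskers stop at the most extreme data values that lie within
    p25 - k*IQR and p75 + k*IQR (the nearest inside values).
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,  # the points that /out would draw as circles
    }


print(box_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```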

line

line xvar yvar [yvar2] [yvar3 ...] [/by=...]
line plot of one or more yvar against xvar; multiple y variables may be plotted against xvar

scatter
sca

scatter xvar yvar1 [yvar2 ...] [/by=...]
scatter plot of yvar against xvar; multiple y variables may be plotted against xvar
Options

/BY: a plot of xvar by yvar1 will be made for each value of "By variable"

dotplot

dotplot variable [/by=group variable] [/Options]
Dotplot shows one dot per observation. A small displacement is added to the value, such that all observations can be seen. If groups overlap, extend the width of the graph, e.g. dotplot var /sizex=600.
The variable chosen is used as Y-axis
Options:

/c: Center dots instead of left align
/M : Include missing values in the group variable
/DI=x : Value for x-separation of dots. Default is /DI=0.015. To separate more try, e.g. /DI=0.045
Most standard graph Options apply

cdfplot

cdfplot variable [/Options]
CDFplot shows a scatter plot of cumulated percentage points (counts) with variable used as X-axis
Options:

/P: Calculate and show a probit diagram instead.
Probits are calculated as inverse normal of halfway points for cumulative percentage on x axis
/All: Do not aggregate on X axis values. Use this with float variables or when you wish to see all points.
Most standard graph Options apply
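
A possible reading of the probit calculation for /P, sketched in Python. The interpretation of "halfway points" as the midpoint of each step in the cumulative proportion is an assumption, and `probit_points` is a hypothetical name; the inverse normal comes from the standard library's `statistics.NormalDist`.

```python
from collections import Counter
from statistics import NormalDist


def probit_points(values):
    """Return [(x, probit)] with one point per distinct x value.

    Assumption: the probit is the inverse standard normal of the
    midpoint of the cumulative proportion step at x.
    """
    counts = Counter(values)
    n = len(values)
    points = []
    cumulative = 0
    for x in sorted(counts):
        halfway = (cumulative + counts[x] / 2) / n
        points.append((x, NormalDist().inv_cdf(halfway)))
        cumulative += counts[x]
    return points


for x, z in probit_points([1, 2, 2, 3]):
    print(x, round(z, 3))
```

Data from a normal distribution would fall close to a straight line in such a diagram.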

ciplot

ciplot outcome variables [/Options]
CIplot shows a table and a plot of proportions of outcome with 95% Confidence intervals in strata defined by individual values of the remaining variables.
Options:

/O=x : Use the value x to indicate outcome value. Default is highest non-missing value
/NOTOT : remove crude (total) estimate from the graph
/NOCI : remove coloured line indicating crude confidence interval
/NL : Do not show vertical separation dotted lines btw. by variables
/NT : Do not show summary table
/NM Records with missing in any variable are excluded.
Most other standard graph Options apply.

The number of observations depends on the included variables. Use /N to show the number of observations in the graph.

epicurve

epicurve outcome time [/by=group] [/Options]
Epicurve shows development in a possible epidemic as stacked bars on each day from start to end of data.
Example: epicurve case dayonset /by=floor /legend /xa /frame /tab
Options:

/nt: Do not show a table of number of cases, min and max date of outcome
/Yinc : change default y ticks from 1.
Notice that this also changes the number of cases per line in the graph
/xa: alternate label on x axis
EpiCurve shows bars centered on tick mark, not from tick mark
To indicate date for x-axis: EpiCurve outcome dayonset /xmin="01/10/2005" /xmax="31/07/2005"
Notice that the dates must be DMY formatted not MDY.
Most other standard graph Options apply
Result variable naming $case# varies with coding of "by" variables

pie

pie var1
pie chart of frequencies of the values of var1
Percentages can be incorrect in some instances

erasepng

erasepng [/noconfirm] [/all] [/d]
Erases graph*.png files in the current folder (shown on the left side of the status bar). Each erase must be confirmed.
Options:

/all includes ALL files of type png (*.png) regardless of name
/noconfirm NOTE: deletes without confirmation
/d deletes all graph*.png created earlier than current date

Common Graph Options

/save="file.png[.wmf][.bmp]" /xlabel="variable"
/text="x,y,text,box" /ti="title" /sub="subtitle" /noedit /nolegend

Overall
- /by=varname: Group the graph by the indicated varname (scatter, line, box, bar, dotplot)
- /edit : manual specification of a large number of graph Options
- /q : do not show graph (useful for just saving a graph)
- /save=" " : save file with that name and type - default is png with name of current time. The file must not already exist.
- /replace : replace file if existing (used with /save).
- /xlabel=variable : Use contents of variable as label along x-axis. Works only in SPC charts
- Show labels or values: /v (include values) /vv (values only)
  /vn (name + label) /vnn (only name)
Text, size and content of graph:
- /legend : Show legend
- /ti="....." Use text as title
- /sub="....." Use text as subtitle
- /fn="....." Use text as footnote
- /xtext="....." Use text as description below the X axis
- /ytext="....." Use text as description to the left of the Y axis
- /n Show number of observations in upper left corner of graph
- /sizex=value /sizey=value change physical size of graph to values
reference lines, text and values:
/VGRID /HGRID : Show vertical or horizontal lines inside graph area
/FRAME : Frame the graph area
Text and lines can be used several times in the same graph:
- /xline= /yline= Add reference line __ at x or y value, e.g. /xline=12 /yline=500
- /xlined= /ylined= Add reference dotted line ..... at x or y value, e.g. /xlined=12 /ylined=500
- /Text="x,y,text,box": add text at x,y in pixels from top left. Box= 1:yes 0:no. E.g. /text="120,50,my text,1"
- /yvalue: add box with counts (works only in SPC graphs)
Axis :
- /xmin /xmax /ymin /ymax sets axis scales
- /xinc /yinc define the increment of tick marks on the axis. (Scatter, line and SPC plots)
- /NoXTICK /NOYTICK: hide tick marks
- /NoXLabel /NoYLabel : Do not show labels at the tick marks
- /X90 /X45 X-axis unit texts at 90 or 45 degrees instead of horizontal
- /XA : alternating X axis label texts
- /YLOG /XLOG : use logarithmic scale on axis
- /XINV /YINV: reverse axis from high to low values
- /XHIDE /YHIDE: Hide axis
color setting:
Set Graph Colour="1234567890"
Use this to select colour sequence for graphs
Indicate 10 numbers each from 0 to 9. Colour codes are:
Red, Blue, Black, Green, Yellow, White, SkyBlue, Fuchsia, Gray , Aqua;
Colour for graph elements
Set Graph Colour Text="2133"
Defines colour for "tfxxxyyyb" where t:Titles f:footnote axis: xxx yyy (Axis,tickmarks,texts) b:other, e.g. text boxes
Indicate 9 colour codes from 0 to 9. Default is: "213333333"
Note: White (6) can be used to hide a text element, but it is more efficient to set to " ", e.g. titles or footnote.
/bw: all colour in black and white
symbol setting: (line, scatter and pareto)
Set Graph Symbol="1234567890"
Use this to select symbols for graphs containing symbols
Indicate 10 numbers each from 0 to 9. Sequence codes are:
Circle, Upwards Triangle, Downwards Triangle, Cross, Star, Diamond, Left triangle, Right triangle, square, Small Dot
The "Small Dot (9)" can be used to minimise or hide the symbol.

set parameters for Graphs

set option graph="/sizex=value /sizey=value" (Default Options for all graphs; width of graph, e.g. value=600)
set graph savetype=png[wmf][bmp] (Which type of file to save)
set graph clipboard=on[off] (copy graph to clipboard after creation)
set graph footnote=text (footnote for graphs - default: EpiData Analysis Graph)
set graph filename show=off[on] (show name of file below the graph)
set graph filename folder=off[on] (Include folder in graph file name)
set graph font size=value (font size - default 10. Titles are scaled relatively)
set graph colour="1234567890" See above
set graph colour text="2133" See above
set graph symbol="1234567890" See above

Select observations

select if
select

select [logical expression]
work with selected records
- multiple select commands are joined by and
- without parameters, clears all select commands.
Current select will be shown when running analysis commands
Warning: Be careful when you select on float variables, e.g. v1 > 3.1
To test this use "browse ...." Make sure program handles missing data as you expect
Note: String variables are compared excluding trailing blanks
To exclude leading blanks use trim function:
e.g. select trim(User) = "Jorge" or select trim(upper(User)) = "JORGE"
Select cannot be used with UPDATE

temporary
select

follow another command with if (logical_expression)
processes the data file using only those records for which logical_expression is true
- for complex logical expressions, use parentheses; they are optional for simple expressions
Note also: Options must be placed before if
Always control what happens with missing data and with float variables with many decimals
Note: String variables are compared right trimmed, that is without trailing blanks.
e.g "Lion " is the same as "Lion", but not the same as " Lion".
Make sure that upper and lower case are handled as you expect.
You can query by asking the user. E.g. "count if sex = ?Write value 1 2?"
Current if and select will be shown when running analysis commands

Statistical Process Control

pareto

pareto groupvar [/Options]
SPC: A Pareto diagram shows a bar chart of the variable, where columns are sorted in descending order of frequency. Superimposed is a line showing cumulative percentages.
Counts are shown with the left Y-axis and Cumulative Percentage with the right Y-axis.
Pareto charts are particularly suited for decision making when multiple outcomes are possible for a given situation, e.g. to find the aspects responsible for 80 percent of errors among many possible causes.
Options:

general graph Options
SPC Options
/W = variable: Use numeric value in "variable" as weight
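
The numbers behind a Pareto diagram can be sketched in a few lines of Python: counts sorted in descending order plus a running cumulative percentage. The function name `pareto_table` and the example categories are hypothetical.

```python
from collections import Counter


def pareto_table(categories):
    """Return [(category, count, cumulative_percent)] in descending count order."""
    counts = Counter(categories).most_common()  # descending frequency
    total = sum(count for _, count in counts)
    rows = []
    running = 0
    for name, count in counts:
        running += count
        rows.append((name, count, 100 * running / total))
    return rows


errors = ["label"] * 6 + ["dose"] * 3 + ["other"] * 1
for name, count, cum_pct in pareto_table(errors):
    print(f"{name:6} {count:3} {cum_pct:6.1f}")
```

Reading down the cumulative column shows how few categories are needed to reach, for example, 80 percent of all errors.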

runchart

runchart measurement [time] [/Options]
SPC: A runchart shows the median of the measurements

Without a time variable sequence of observations is used as x-axis
The chart includes median and indication of tests of special cause
Runchart is a process control type graph.
Runs and tests adapted in v2 (ref: )
Options: SPC Options, general graph Options and SPC tests for attributable cause

ichart

ichart measurement [time] [/Options]
SPC: I-chart (Individual - also called XMR chart) showing overall mean, control limits and actual measurements
Without a time variable sequence of observations is used as x-axis
"Measurement" can be any continuous measurement or count at that time

Options: SPC Options, general graph Options and SPC tests for attributable cause
/MR add Moving Range double chart.
/ymin /ymax /yinc etc: use these twice to define values for the lower plot of the double plot

pchart

pchart count total [time] [/Options]
SPC: A P-chart is created with the proportion of count/total for each time value.
Without a time variable sequence of observations is used as x-axis
The chart includes overall mean, control limits and proportions
P-chart is a process control type graph
Options: SPC Options, general graph Options and SPC tests for attributable cause
The sampling basis for pcharts is a binomial process.
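
Given the binomial basis, the textbook P-chart limits can be sketched in Python as p-bar ± 3*sqrt(p-bar*(1 - p-bar)/n) for each time point. This is a sketch under assumptions: the function name `pchart_limits` is hypothetical, limits are truncated to the range 0 to 1, and it is not confirmed that EpiData Analysis computes its limits in exactly this way.

```python
import math


def pchart_limits(counts, totals):
    """Return (p_bar, rows) where rows are (proportion, lcl, ucl) per time point.

    Assumption: binomial sigma with point-specific n and limits at 3 sigma.
    """
    p_bar = sum(counts) / sum(totals)
    rows = []
    for count, n in zip(counts, totals):
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - 3 * sigma)
        ucl = min(1.0, p_bar + 3 * sigma)
        rows.append((count / n, lcl, ucl))
    return p_bar, rows


p_bar, rows = pchart_limits([5, 8, 4, 20], [100, 100, 100, 100])
for proportion, lcl, ucl in rows:
    flag = "outside" if not lcl <= proportion <= ucl else ""
    print(f"{proportion:.3f} [{lcl:.3f}; {ucl:.3f}] {flag}")
```

Because n enters sigma, time points with smaller totals get wider limits, which is why P-charts are variable limit charts.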

xbar

xbar measurement time/sequence
An X-bar chart is created with the mean value of measurement for each subgroup. A subgroup is one time/sequence or date value used for grouping individual measurements of different samples or observations observed at the same time. Subgroup values serve as the X-axis. If any X-value has only one observation, a zero value is shown for Sigma (or Range) at this X and the point value is shown in the Xbar chart.
Xbar-chart is a process control type graph. Notice problems can occur with some date variables
It is displayed together with either:
Range chart which indicates the range between Max and Min measurements within each subgroup
or Sigma chart which indicates the process variation using a weighted method. (Xbar and S should always be used when subgroup size >1)
The sigma limits for the S-chart are calculated with varying limits depending on n in each subgroup. The Sigma-average is an arithmetic overall average. Currently we are looking into the optimal way of calculating this, since there is some disagreement in the literature. The current implementation follows the principle mentioned in Hart & Hart, page 331, but with an arithmetic average of the individual Sigma-bars.

/range Show range in double plot
/sigma : Show sigma for each subgroup in double plot (default)
/ymin /ymax /yinc etc: use these twice to define values for the lower plot of the double plot
/mvlu : Center varies by subgroup size for the S-chart
Options: SPC Options, general graph Options and SPC tests for attributable cause

uchart

uchart count volume [time] [/Options]
A U-chart is created with the ratio of count/volume.

Without a time variable sequence of observations is used as x-axis
The chart includes overall mean, control limits for the ratio
U-chart is a process control type graph. The basis for the count is "defectives", e.g. falls in a hospital ward. The total is the observation volume, which typically varies btw. time points.
Options: SPC Options, general graph Options and SPC tests for attributable cause
/Per=x Use x as multiplier (e.g. to show per 1000)

Cchart

cchart count [time] [/Options]
A C-chart is created on the basis of counts for each data point.
Volume/total is assumed to be constant observation volume for all time points

Without a time variable sequence of observations is used as x-axis
The chart includes overall mean, control limits for the ratio
C-chart is a process control type graph. The basis for the count is "defectives", e.g. falls in a hospital ward. The total is the observation volume, which is constant for all time points.
Options: SPC Options, general graph Options and SPC tests for attributable cause
/Per=x Use x as multiplier (e.g. to show per 1000)

Gchart

gchart variable
A G-Chart is used for rare outcomes. E.g. infection following surgical procedures.
Typically the variable used contains the recorded date of occurrence, and the G-Chart will calculate and graph the number of days between the occurrences. Each measurement in the variable is graphed as an x-value.
A G-chart is a process control type graph.
If you have data already summarised, use a technique such as this (assuming the presummarised data are in "datavar"):
gen i sumdata
sumdata = datavar
if _n > 1 then sumdata = sumdata[_n-1] + datavar
gchart sumdata
Notice that for G-charts it is desirable to have many observations away from 0. For other SPC charts "good" performance is near the bottom.
Options: SPC Options, general graph Options and SPC tests for attributable cause
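
The two calculations described for G-charts can be sketched in Python: the number of days between successive occurrence dates, and the running total that the gen/if lines above build from presummarised intervals. The function names `days_between` and `running_total` are hypothetical and the dates are made-up example values.

```python
from datetime import date
from itertools import accumulate


def days_between(dates):
    """Days between successive occurrences, after sorting the dates."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def running_total(intervals):
    """Cumulative sum, matching sumdata = sumdata[_n-1] + datavar."""
    return list(accumulate(intervals))


occurrences = [date(2009, 1, 1), date(2009, 1, 11), date(2009, 2, 1)]
print(days_between(occurrences))
print(running_total([10, 21, 5]))
```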

Options for process control charts

/F=y Freeze the calculations to the first y observations. (e.g. /f=12 )
The remaining part of the graph is shown dashed. For variable limit charts (pchart, uchart, gchart, xbar) the limits for the freeze period are based on the initial mean, but limits are calculated on point-specific data.
/B (one or more) divide the chart into subperiods after these observations. (e.g. /b=6 or /b="01/10/2008")
Value can be date "dd/mm/yyyy" or "mm/dd/yyyy" of same format as the time field.
/B=Bx Breaks for every x data points, e.g. /b=b30 would create a break by 30 observations
/t1 /t2 /t3 /t4 /t5: Perform one of test 1-5 and mark in the graph (see below)
/t2=x /t3=y /t4=z /t5=w : use the value indicated for tests of the sequences. e.g. /t2=3
/t: Perform test 1+2+3
/tlimit : Sigma depends on size of samples (ref: Hart & Hart)
/tab: Add table of counts below graph
/nt: Hide documentation table (but mark tests if added in graph)
/neglcl: show negative as well as positive lower control limit values
/xlabel=var : Variable contains the labels to show at the X-axis.
/NoXLabel : Do not show any label at the tick marks on X-axis
/point : Only show points for observations, do not connect with a line
/exv=y : Exclude all points with Y-value >= the value (here y)
/exp=x1 : Exclude observation with the indicated value of x
/exz : Exclude all observations with Y = zero
Notice: /exv and /exz do not work with Gcharts
/noinf : Exclude text below X-axis showing central values (no central value information)
/SL : Show lines at 1,2 and 3 Sigma (default: only at 3)
/NoL : Do not show Control Limit lines (when user created values are included with /yline=)

Most standard graph Options apply

Principles and tests

Many principles exist for SPC graph testing. The principles implemented in EpiData Analysis are documented further in a reference document which also includes references (see epidata.dk). In short these principles are implemented:
SPC charts use a time variable along the X-axis and the observation as the Y-value. I- and P-charts have control limits calculated as center line ± x*Sigma, where x is 3 by default but, with the option /tlimit, can depend on the number of observations in each subsection of the graph.
The graphs do not demand a specific number of observations. The assumption for using tests for special variation is that btw. 20 and 30 observations are included in each portion (break). With fewer observations the chance of a Type II error is larger (overlooking an actual special cause) and with larger numbers of observations the chance of a type I error is larger (false positive test). Therefore users should explore the usage of the /tlimit principle. See Hart & Hart for more information.
In SPC (Statistical Process Control) charts, notice that Control Limits are NOT the same as Confidence limits. Confidence Intervals indicate limits for the mean (phrased: the mean or central line of the SPC chart is this ..., but with the current sample we cannot decide whether it could be as high as ... (upper CI) or as low as ... (lower CI)). Control limits, in contrast, are indications of the type "Within these limits one should expect that 99.5 % (the percentage depends on the sigma value) of the observations would be contained, given that the process is in control".

The choice of SPC chart depends on the data at hand. In technical terms which type of process generated the data. For an overview of this open the "graph" menu and choose "spc", which will show a grid assisting you in deciding which graph to choose among the implemented ones. Currently: Run-Chart, I-Chart, P-Chart, Xbar-S, Xbar-R, U-Chart, C-Chart, G-Chart, Pareto.

Tests indicate signs of "Special Variation" or "Attributable Cause" in the data. Criteria depend on the type of Chart. The rules below work for all charts, unless mentioned specifically.
- Type 1:
  - Run chart: Total number of runs and limits for expected number of runs. Points on median ignored in runs. Expected numbers based on a standard table provided n > 13 and n < 40. A run is defined as a sequence of one or more numbers on the same side of the centerline. V2 adaptation: Points on the centerline are disregarded.
  - Control Charts: Special Variation occurs for each observation outside the control limits.
- Type 2:
  In Run-, I- and P-charts: Eight or more points in sequence on the same side of centerline (shift in the process). Values on center line are excluded from the count. The test counts number of sequences of eight points.
- Type 3:
  In Run-, I- and P-charts: Six or more points decreasing or increasing in sequence (Trend). Sequential values of same size count as one. The test counts number of sequences of six points.
- Type 4:
  2 out of 3 successive points more than 2 standard deviations (sigma) away from the centerline.
- Type 5:
  4 out of 5 successive points more than one standard deviation (sigma) away from the centerline.
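
Tests of type 2 (shift) and type 3 (trend) can be sketched in Python as below. This is an illustration, not the program's code: the function names are hypothetical, and counting a trend in points rather than in steps is an assumption about how "six or more points" is meant.

```python
def longest_run_one_side(values, center):
    """Longest sequence of points on the same side of the centerline."""
    longest = current = 0
    side = 0
    for v in values:
        if v == center:
            continue  # points on the centerline are left out of the count
        s = 1 if v > center else -1
        current = current + 1 if s == side else 1
        side = s
        longest = max(longest, current)
    return longest


def longest_trend(values):
    """Longest strictly increasing or decreasing sequence, counted in points.

    Sequential values of the same size count as one point.
    """
    points = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    longest = current = 1 if points else 0
    direction = 0
    for prev, cur in zip(points, points[1:]):
        d = 1 if cur > prev else -1
        current = current + 1 if d == direction else 2
        direction = d
        longest = max(longest, current)
    return longest


series = [6, 7, 5, 6, 8, 7, 6, 9, 6, 4]
print("shift:", longest_run_one_side(series, center=5) >= 8)
print("trend:", longest_trend([1, 2, 2, 3, 4, 5, 6, 3]) >= 6)
```

The thresholds 8 and 6 are the defaults named above; passing another value corresponds to changing the limit for a test.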

Change limits: Notice that in all charts you can change the limit of when a test is positive:

By adding a value. E.g. /t2=6 would give a positive test with 6 points on the same side of the centerline.
By issuing the set command: set spc testlimit="1,6,6,2,3". The five numbers are test 1-5 (first number always 1).

Save & Clear output

cls

cls
clears the output screen
Notice: F12 does the same and can be used during execution if speed slows down

logopen

logopen [filename[.{html|txt}]] [/close] [/append] [/replace]
start a log file
- without parameters, the open file dialogue is started
/Close will close an open logfile.
/Append adds to existing file
/replace replace existing logfile

logclose

logclose
close the current log file

View data

browse

browse [variable1 [variable2 ...]]
browse values in a spreadsheet for all variables listed
- without parameters, browse all variables
Note that browse is much faster than list
Note that the browse window closes when you move away from "browse", unless you allow the browser to stay open with: Set display databrowser=on
In the same way, "minimise" in browse is equal to "close", unless you have: Set display databrowser=on
But remember that the browse window can be quickly opened at any time by F6 key.
Notice use of right click on form (sorting, copy to clipboard)

list

list [variable1 [variable2 ...]] [/no] [/v /vl] [if ... ]
show values on the screen for all variables listed, with one record per line and no limit to the width of the display
- without parameters, list all variables. Values are shown - not labels.
/NO : do not show record (observation) numbers
/v /vl: control whether values or labels are shown (or both)
Note that browse is much faster than list. The choice of font might make the list display incorrectly.
Select or if: the record sequence shown is within the current select or "if". If you use list with a temporary if, the number is not the same as recnumber.

Update

Update [variable1 [variable2 ...]] [/id=variable]
Allows grid editing of data - without parameters, works on all variables
/id=variable Indicate variable containing unique id
Update cannot be combined with SELECT
Notice use of right click on update form (sorting, select id, copy to clipboard)

Generate/change variables

define

define var1 fieldtype [cumulative|global]
create a new variable based on an EpiData fieldtype (###, ___, <Y>, <AAAAAA>, or valid dateformat)
- var1 will initially be missing in all records
- cumulative variables retain their values from one record to the next - not functioning
- global variables retain their values following a close command and are like constants (only one value)

gen

gen var1 = expression | resultvar
create a new numeric variable based on the expression, or equal to a constant from a result variable
- equivalent to define and let, with the variable type implied by the expression
- if the result of expression is boolean, variable1 will be 0 (FALSE) or 1 (TRUE)
- Result variables are created by some commands, e.g. means and describe
If the user specifies a type, that type of variable is generated (examples):
gen s(10) var1 = expression
gen d var1 = expression
gen i var1 = expression
gen f var1 = expression
Compare with values from other records:
  gen i age = (age - age[_n+1]) if id = id[_n+1]
  let bmidif = (bmi - bmi[_n-1]) if id = id[_n-1] //-1 could be -4 +1 etc
Notice that integer variables are maximum 4 digits. For larger integers use type float with zero decimals.
Always verify generation of complex variables or logical statements,
e.g. by comparing gen .... if ... with
define ...
if ... then ... .
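
To make the record-offset notation concrete, here is a small Python sketch (not EpiData code) of what let bmidif = (bmi - bmi[_n-1]) if id = id[_n-1] computes. The function name lagged_difference is hypothetical, None stands in for a missing value, and the records are assumed to be sorted so that records with the same id are adjacent.

```python
def lagged_difference(ids, values):
    """Return values[n] - values[n-1] where the previous record has the same id.

    Records without a matching previous record get None, standing in for
    a missing value.
    """
    result = []
    for n in range(len(values)):
        same_id = n > 0 and ids[n] == ids[n - 1]
        if same_id and values[n] is not None and values[n - 1] is not None:
            result.append(values[n] - values[n - 1])
        else:
            result.append(None)
    return result


ids = [1, 1, 2, 2, 2]
bmi = [22.0, 23.5, 30.0, 29.0, 31.0]
print(lagged_difference(ids, bmi))
```

The first record of each id has no earlier record to compare with, so its difference stays missing. This is the kind of result worth checking by hand when verifying a generated variable.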

generate

generate value
creates a new empty dataset with value records. E.g. for simulation or testing.
- note the difference to gen command, which creates variables.

if ... then

if (logical_expression) then [let] ... [else [let] ...]
evaluates logical_expression for each record; the else clause is optional
- for complex logical expressions, use parentheses; they are optional for simple expressions
- some other commands might work, but only let is practical

let

[let] var1= expression | resultvar
assign a value to an existing variable; the word let is optional
- if the result of expression is boolean, var1 will be 0 (FALSE) or 1 (TRUE)
- result variables are created by some commands, e.g. means and describe

recode

recode variable1 to var2 values1 = newval1 [values2 = newval2 ...]
create or change codes for subgroups of records
- values1 takes one of three forms: a single value, a series of values separated by commas, or a range of consecutive values like 7-12 or "A"-"D"
e.g. recode v1 to v2 lo-18.499=1 18.50-hi=2 (values up to, but not including, 18.50 get the value 1)
e.g. recode x lo-3.00=1 3.0001-4.0000=4 4.0001-5.49000=5 5.5-hi=7
- if "to var2" is omitted, variable1 itself is recoded and the original values will be lost.
recode variable1 to var2 by value (value must be an integer > 0)
- the variable1 values will be recoded to the numerical variable var2, with value labels indicating the limits
E.g. recode age to agegroup by 10 recodes the age variable to 10 year age groups.
Note: define agegroup before recoding: define agegroup ###
Note: EpiData Analysis shows the if ... then and the labelvalue commands that perform the recode
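
The range form of recode can be pictured with the following Python sketch (not EpiData code). The function name recode_value and the rule layout are hypothetical. Treating both limits as inclusive and returning None when no range matches are assumptions; this guide does not say what happens to values that fall between two ranges.

```python
import math


def recode_value(value, rules):
    """Return the new code for `value`, or None if no rule matches.

    rules is a list of (low, high, new_value); "lo" and "hi" mean unbounded.
    Both limits are treated as inclusive (assumption).
    """
    for low, high, new_value in rules:
        lower = -math.inf if low == "lo" else low
        upper = math.inf if high == "hi" else high
        if lower <= value <= upper:
            return new_value
    return None


# Mirrors: recode v1 to v2 lo-18.499=1 18.50-hi=2
rules = [("lo", 18.499, 1), (18.50, "hi", 2)]
for v in (17.2, 18.5, 25.0, 18.4995):
    print(v, recode_value(v, rules))
```

Under these assumptions a value such as 18.4995 matches neither range, which is why the limits in the examples above are written with as many decimals as the data.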

Label data - also called metadata

labeldata

labeldata "text"
Assign the descriptive text as a label for the data file.
An existing label will be replaced with the new one.
To keep the label you must save the data.

label

label var "text"
Assign the descriptive text as a label for the variable.
An existing variable label will be replaced with the new one.
To keep the variable label you must save the data.

labelvalue

labelvalue var /x="text with spaces" /y=text2 /z=text3 [/clear]
Assign the descriptive text as a value label for the values (x y z)
/clear will remove any value label not mentioned on the line
For several variables in sequence: labelvalue v1-v17 /1="Yes" /0="No"
Note: if you change value labels for a variable that shares labels with other variables, the labels are changed for all of those variables!
Shared value labels are defined as part of data entry in EpiData Entry
Note that value labels are automatically created by the recode command

missingvalue

missingvalue var [var1-varx] /x /y /z [/clear]
Assign from 1 to 3 values as defined missing values
/clear will remove any previous definition
For several variables in sequence: missingvalue v1-v17 /9

Clean up - stop

close
stop using a dataset
- all unsaved variables and changes to existing fields will be lost
- global variables will remain in memory

quit or exit

quit
exit
Exits from EpiData Analysis. Closes any open output file.
NOTE: To save data in memory before closing use the savedata command.
Automatic save of command history is done on exit. Filename defined by "set command history filename", default temp.pgm
If you write exit or quit in command prompt no confirmation question will be asked.

savepgm

savepgm filename[.pgm]
saves recent commands in a program file
- without a parameter, the save file dialogue is opened
Automatic save of command history is done on exit. Filename defined by "set command history filename", default temp.pgm

clear output window

cls
Clear the output screen with results - when output slows down press F12.
F12 (=cls) can be used in the middle of other commands running.

clear command buffer

clh
Clear the buffer of previous commands.
- It is the same list shown when pressing F7, right click on "F7" window to clear
Notice - set commands on history. See below.

Set parameters

set

set [parameter=value]

Change the value of an EpiData "set" parameter

- without parameters, provides a list of available parameters and their current values

- add the set definitions to the Epidatastat.ini file to change the defaults
- values can be a number, text or ON/OFF, see table below
- to see the current value: set [parameter] = ? e.g.: set echo = ?
- for sets with ON/OFF: set echo = off or set echo = on
- for any command: set option [cmd] = [Options] e.g. set option means = /t
When the specified command is executed, the Options mentioned will be added to the command.

Set commands not mentioned above are shown in the next section.

Option	Default Value	Comments or function
BROWSER FONT SIZE	10	Font size in browser and update
DEBUG LEVEL	0	Used for testing to write information from internal modules. When equal to 0, no information written
DEBUG FILENAME	MODULETEST.LOG	Filename where debug information is written
DISPLAY COMMAND HISTORY	OFF	Show the command history window (F7)
DISPLAY COMMAND PROMPT	ON	Show command prompt
DISPLAY COMMANDTREE	OFF	Show commandtree window (F2)
DISPLAY DATABROWSER	OFF	Show databrowser when a datafile is open (F6)
DISPLAY MAINMENU	ON	Show main menu
DISPLAY TOOLBAR	ON	Show toolbar
DISPLAY VARIABLES	OFF	Show variable window (F3)
DISPLAY WORKTOOLBAR	ON	Show work toolbar
ECHO	ON	When = on show results, off: "silent"
EDITOR FONT SIZE	10	Font size in editor
EDITOR PRINT INFO	ON	When printing from editor add footer
HISTORY COMMAND PGM	OFF	Add commands from pgm files to history. When off only the name of the pgm is shown
HISTORY COMMENT	ON	Add comments when found in pgm files to history
HISTORY NAME	TEMP	Name of file to use as autosave, when closing the programme. The previous three files will be saved, e.g. as temp3.pgm temp2.pgm temp1.pgm
LANGUAGE	EN	Language file abbreviation
OPTION GRAPH	/sizex=600 /sizey=200	Default Options for graphs, here size in pixels
OPTION TABLES	/t /R	Default Options for cross- and frequency tables
OPTION SPC	/sizex=600 /sizey=200 /t	Default Options for SPC graphs
OUTPUT FOLDER	..folder name...	Logfiles and graphs are saved here
OUTPUT NAME	EAOUTPUT.HTM	Name of output files
OUTPUT OPEN	ON	Open logfile automatically
PRINT PREVIEW CM	ON	Use cm for measures when printing
RANDOM SEED	9	Use as seed for random number generation
RANDOM SIMULATIONS	500	Number of simulations
READ DELETED	OFF	Read records marked for deletion
RECODE INTERVAL	TEXT-	Marker to use in value labels with recode
SAVEDATA FIRSTWORD	ON	Use filetype first word. When off, Epi6 automatic naming will be used
SHOW COMMAND	ON	Show commands after execution
SHOW ERROR	ON	Show errors
SHOW INFO	ON	Show information type feedback from commands
SHOW RESULT	ON	Show results from commands
SHOW SYSTEMINFO	OFF	Show extended information (test purpose)
START PAGE	START.HTM	File to show when the programme starts (F8)
STYLE SHEET	"folder name\EPIOUT.CSS"	File containing the style sheet. Find extended definition of output etc. on www.epidata.dk/documentation.php
STYLE SHEET EXTERNAL	ON	In output files refer to the css file. If off, a copy of the current css definitions will be copied to the logfile header, e.g. if you wish to copy the output file to the internet.
VAR GENERATE TYPE	F	Default variable type with the gen command
VIEWER FONT CHARSET	ISO-8859-1	Character set for output window
VIEWER FONT NAME	VERDANA,COURIER	Must at least contain one proportional and one fixed type font
VIEWER FONT SIZE	10	Font size for output window
WINDOW FONT SIZE	10	Font size for other windows (prompt, variable, history, commands)

Information

newpage

newpage
When printing the output, force top of page after this line.
The page break is added to the output as "hidden" information.

type

type "Text to display [@$result1] " [/class=x] [/style=" "] [/h1] [/h2] [/h3] [/h4] [/h5]
echo Text to display [@$result1]
display text on the screen;
If Options are not used, the text will be added as a standard paragraph (html: <p>)
Options add html specifications: /class: (html: <p class= > text </p>).
(h1..h5: <hx> text </hx>)
/style="valid css style definition", e.g. /style="color:blue; font-size:0.6em"
Result or globally defined variables may be displayed by putting @ before the variable name
Use ' ' to include quoted text in type commands, e.g. for <a href=' ....'>

title

title "Text to display [@$result1] "
Display text on the screen as (html: <h1> text </h1>)
result or globally defined variables may be displayed by putting @ before the variable name

show

show filename
Add the contents of "filename" to the output window
The file must be plain text, e.g. NOT a word processor file, but may contain HTML formatting blocks without header.

View

View filename
View an html file in the viewer.
The file must be HTML formatted.

HelpView

Helpview filename
View an html file in the help file viewer.
The file must be HTML formatted.

rename

rename oldname to newname
rename the variable from "oldname" to "newname"

var
variables

variables or var
list currently defined variable names, types, formats and labels

drop
var drop

drop variable1 [variable2 ...]
remove the listed variables from memory

keep
var keep

keep variable1 [variable2 ...]
Remove all variables not listed from memory

result
var result

result
list all current result variables and their values
- means, describe, tables and other estimation commands create result variables, e.g. $mean1 or $count
All result variables are cleared when running a new command, except for $assert and $assert_error.
See var temp clear and runtest

var temp clear

var temp clear
Removes ALL result variables and all temporary variables defined as global
$assert, $assert_error and other internal variables are also cleared

Version

Version
Compare current version of EpiDataStat.exe with latest version (requires internet)
Note: No information is transferred from your PC
Latest version is read from Http://www.epidata.dk/version/epidatastat.version if you are connected to internet.

assert

assert if (logical statement)
Check whether the statement is true (will not test all observations!). Returns the text "Assert failed" if the statement fails.
E.g. assert ((pregnant = "Yes" and age < 40) or (pregnant = "No")) if id = 1

? (statement)
Show result of statement, e.g. a calculation or logical check. Does not depend on or check any data.
E.g.
? 241/34
? (23>19)
? "a " + "b " + "c"
? findfile("myfile.pgm")

Obsolete commands

output

output {describe ... | means ...}
Command replaced by new command aggregate

route

command replaced by SAVEDATA and LOGOPEN commands

write

command replaced by SAVEDATA and LOGOPEN commands

Disk commands

cd "directory name"
change the working directory

copyfile

copyfile "filespec1" "filespec2"
copy file specified by filespec1 to new file specified by filespec2
- filespec must identify only one file - do NOT include wild cards (* or ?)
To overwrite an existing file add /replace. This could overwrite your data!!

erase

erase filename
permanently erase file specified by filename
- filename must identify only one file - do NOT include wild cards (* or ?)

rename file

use copyfile and erase
To rename a file, use copyfile to copy from the existing file.
Afterwards you can remove the existing file with erase.

dir

dir [filespec]
list files in a directory
- filespec may include wild cards (* or ?)
Define design by set table design system=line[box][filled][shaded][system]...

dos
!

dos text
execute any valid MS-DOS command and return to EpiData
- dos command will open an MS-DOS window
/open : Keep window open after execution
! works only on XP+ PCs

Programming aids - not normally used in interactive mode

* [any text]
Use to document programs, usually as the first character in a line. * is not recognized in interactive mode.

\
Any command can be extended on next line, e.g. to specify many Options for graphs

;

;
to specify more than one command on a given command line in prompt or pgm

[any command] // [any text]
Use to document programs and may appear anywhere on a line.

imif

IMIF (logical condition) then ..... [else] ..... endif
Use to divert course in a pgm file depending on parameters, which could be acquired by "? ?"

closehelp

closehelp
Will close the help window if this is open.

? ?

[any command] [parameters] ?Prompt to user? [parameters]
The text between the two ? will be a prompt to the user to type a response, followed by <Enter>. The response will then be treated as part of the command.
For example: select if age<=?Maximum age to include?
if the user types 50 then EpiData sees select if age<=50
EpiData does no checking of the typed response before making the substitution.
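
The substitution can be thought of as plain text replacement. The Python sketch below (not EpiData code) shows the idea; the function name substitute_prompts and the ask parameter are hypothetical, and like the description above it does no checking of the response.

```python
import re


def substitute_prompts(command, ask=input):
    """Replace each ?prompt text? in `command` with the text returned by `ask`.

    `ask` receives the prompt text and must return the user's response as a string.
    """
    return re.sub(r"\?([^?]+)\?", lambda match: ask(match.group(1)), command)


# Simulate a user who types 50 in response to the prompt.
command = "select if age<=?Maximum age to include?"
print(substitute_prompts(command, ask=lambda prompt: "50"))
```

Because the response is inserted as typed, a response that is not valid in that position will only be detected when the resulting command is executed.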

run

run [filename[.pgm]]
Execute sequence of commands saved in a pgm file
- without parameters, the open file dialogue is started

runtest

runtest [filename|folder name]
Run all pgm files in a folder, or a single pgm file, to verify function.
- suited for testing of correct estimation etc.

Date functions
Logic functions

Conversion functions
Test and special functions

Operators

In the following, takes indicates the variable type for each parameter and result indicates the type of the result of the function:
s: string; b: boolean; d: date; i: integer; f: floating point; n: any numeric
parameters may be variables read from fields, new created variables, or any expression that evaluates to the correct type

String functions
function	takes	result	example
length(str)	s	i	length("Abcde") => 5
lower(str)	s	s	lower("Abcde") => "abcde"
pos(instr,findstr)	s,s	i	pos("Abcde","cd") => 3 pos("Abcde","z") => 0
substr(str,start,len) copy(str,start,len)	s,i,i	s	substr("Abcde",2,3) => "bcd" copy("Abcde",2,3) => "bcd"
trim(str)	s	s	trim("Abcde ") => "Abcde"
upper(str)	s	s	upper("Abcde") => "ABCDE"
Arithmetic functions (including Random numbers)
function	takes	result	example
abs(x)	n	n	abs(-12) => 12
exp(x)	n	f	exp(1) => 2.71828182845905
frac(x)	f	f	frac(12.34) => 0.34
int(x) trunc(x)	f	f	int(12.34) => 12.0 trunc(12.34) => 12.0
integer(x)	f	i	integer(12.34) => 12
ln(x)	n	f	ln(2.71828182845905) => 1 ln(0) => missing
log(x)	n	f	log(10) => 1 log(0) => missing
power(x,a)	n,n	f	power(2,3) => 8
round(x,digits)	f,i	f	round(12.44,1) => 12.4 round(12.5,0) => 13
sqr(x)	n	f	sqr(4) => 16
sqrt(x)	f	f	sqrt(4) => 2
ran(x)	n	n	Random integer from 0 to x. gen integer x = ran(100)
rnd(1)	1	f	Random float from 0 to 1. gen float x=rnd(1)
rang(mean,sd)	f,f	f	Random based on mean and sd. gen float x=rang($mean1,$sd1)
Trigonometry functions
function	takes	result	example
arctan(x)	f	f	arctan(1) => pi/4
cos(r)	f	f	cos(pi/2) => 6.12303176911189E-17 cos(pi) => -1
pi	-	f	pi => 3.14159265358979
sin(r)	f	f	sin(pi/2) => 1 sin(pi) => 6.12303176911189E-17
Date functions
function	takes	result	example
today	-	d/i	returns today's date; may be assigned to a date variable or an integer
date(datestr)	s	d	date("31/12/04") => "31/12/2004" datestr must be of form <dd/mm/yy> or <dd/mm/yyyy>
date(datestr,fmtstr)	s,s	d	date("12/31/04","%mdy") => "31/12/2004" fmtstr must be "%mdy" or "%dmy". Date separator can be anything e.g. "31-12-2004" is accepted
day(d)	d	i	day("31/12/2004") => 31
dayofweek(d)	d	i	dayofweek("31/12/2004") => 5 Monday=1, Sunday=7
dmy(d,m,y)	i,i,i	d	dmy(31,12,2004) => "31/12/2004"
month(d)	d	i	month("31/12/2004") => 12
weeknum(d)	d	i	weeknum("22/02/2001") => 8
year(d)	d	i	year("31/12/2004") => 2004
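
The day-of-week and week-number examples in the table can be cross-checked outside EpiData. The Python sketch below uses the standard datetime module; the helper names simply mirror the table. That weeknum follows ISO 8601 week numbering is an assumption, since the table only gives one example.

```python
from datetime import date


def dmy(day, month, year):
    """Build a date from day, month and year, in the argument order used by dmy(d,m,y)."""
    return date(year, month, day)


def dayofweek(d):
    """Day of the week with Monday=1 and Sunday=7."""
    return d.isoweekday()


def weeknum(d):
    """ISO 8601 week number (assumed to match EpiData's weeknum)."""
    return d.isocalendar()[1]


print(dayofweek(dmy(31, 12, 2004)))  # 31/12/2004 was a Friday
print(weeknum(dmy(22, 2, 2001)))
```

Both values agree with the table: 5 for dayofweek("31/12/2004") and 8 for weeknum("22/02/2001").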
Logic functions
function	takes	result	example
b1 and b2	b,b	b	(1=1) and (2=2) => TRUE (1=1) and (1=2) => FALSE
b1 or b2	b,b	b	(1=1) or (1=2) => TRUE (1=2) or (2=3) => FALSE
not(b)	b	b	not(1=1) => FALSE not(1=2) => TRUE
iif(b,x,y)	b,any,any	any	iif(1=1,2,0) => 2 iif(1=2,sqrt(4),sqr(4)) => 16
Conversion functions
function	takes	result	example
boolean(x)	n	b	boolean(x) => TRUE, for any non-zero x boolean(0) => FALSE
integer(x)	f	i	integer(1.23) => 1
integer(s)	s	i	integer("12") => 12
float(i)	i	f	float(1) => 1.000
string(x)	n	s	string(1.23) => "1.23"
Test and special functions
function	takes	result	example
lre(x,y)	n,n	n	lre($mean1,1.23456789123456) returns number of digits precision of $mean1
samenum(x,y)	n,n	b	samenum($mean1,1.23456789123456) returns true or false indicating if |x| = |y|
samenum(x,y,z)	n,n,n	b	samenum($mean1,1.23456789123456,10^-7) returns true or false indicating if |(x-y)| < z
mv(var)	variable name	0,1,2	Returns 0 if variable has a valid value, 1 if system missing (.), and 2 if a defined missing value
var[recnumber]	n	data value	Not a function, but a way to get a value for a given record. E.g. gen i x=age[recnumber] = age[recnumber-1] or gen i x=age[_n] = age[_n-1]
findfile("filename.ext")	s	1 or 0	Checks if the file exists and returns a 1 if so otherwise a 0. use e.g. imif findfile("myexport.csv") then ....... endif
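
For readers who want to see what the two comparison functions measure, here is a Python sketch (not EpiData code). samenum follows the descriptions in the table. The lre formula is an assumption: it uses the common log relative error definition, -log10(|x-y|/|y|), which the table does not spell out.

```python
import math


def samenum(x, y, z=None):
    """Two-argument form: |x| = |y|. Three-argument form: |x - y| < z."""
    if z is None:
        return abs(x) == abs(y)
    return abs(x - y) < z


def lre(x, y):
    """Log relative error of x against reference value y (assumed definition).

    Roughly the number of significant digits on which x and y agree.
    """
    if x == y:
        return math.inf
    if y == 0:
        return -math.log10(abs(x))
    return -math.log10(abs(x - y) / abs(y))


print(samenum(1.0, 1.00000001, 1e-7))
print(round(lre(1.234, 1.235), 1))
```

A tolerance-based check such as samenum(x,y,z) is usually the safer choice when comparing an estimate with a published reference value, since exact equality of floating point numbers is rarely achieved.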

Operators used in EpiData Analysis
operator	syntax	result	meaning	example
+	n+n	n	addition	1+2 => 3
+	s+any any+s	s	concatenation	"A"+"B" => "AB" "A"+1 => "A1"
+	d+n	d	date addition	"30/11/2004"+31 => "31/12/2004"
-	n-n	n	subtraction	2-1 => 1
-	d-d	n	date subtraction	"31/12/2004"-"30/11/2004" => 31
-	d-n	d	date subtraction	"31/12/2004"-31 => "30/11/2004"
*	n*n	n	multiplication	2*3 => 6
/	n/n	n	division	5/2 => 2.5 5/0 => missing
div	n div n	i	integer result of division	5 div 2 => 2 5 div 0 => missing
^	n^n	f	exponentiation	5^2 => 25 4^0.5 => 2
( )			group expressions	(5*(2+4))/2 => 15 5*2+4/2 == (5*2)+(4/2) => 12
<	n<n	b	less than	1<2 => TRUE
>	n>n	b	greater than	1>2 => FALSE
<=	n<=n	b	less than or equal	1<=2 => TRUE 2<=2 => TRUE
>=	n>=n	b	greater than or equal	1>=2 => FALSE 2>=2 => TRUE
<>	n<>n	b	not equal to	1<>2 => TRUE 1<>1 => FALSE
@	@var1		value substitution	used in any command, replaces @var1 with the contents of var1 before executing the command
$	$resultvar		result value	used in let or gen, takes content of $resultvar as a constant