Comorbidity SAS Macro (2021 version)

The macro described below calculates comorbidity scores for both the Charlson and the NCI Comorbidity Indices. It considers both ICD-9 and ICD-10 diagnosis codes on the claims (procedure codes and HCPCS codes are not used).

Investigators who want to use the macro must decide whether to use only claims from the hospital file (MEDPAR) or to also include the diagnoses on claims submitted by physicians (NCH) and outpatient facilities (OUTPAT), as described in Klabunde et al. The rationale for including the latter claims is that many more people see a physician or receive care in an outpatient clinic than are hospitalized, so these claims increase the chance of identifying comorbid conditions.

If both hospital claims and physician/outpatient claims are used to identify comorbid conditions, then the rule-out algorithm, which is built into the 2021 version of the comorbidity macro, should be used. For physician and outpatient claims, this algorithm requires a patient's diagnosis (the comorbid condition, not necessarily the same specific code) to appear on at least two different claims that are more than 30 days apart. The reason is that diagnoses on physician and outpatient claims have not been validated, and a physician may have recorded a diagnosis as present when the correct coding would have been to “rule out” the condition. Conditions that do not appear on two claims more than 30 days apart are considered “rule-out” diagnoses and are not counted as comorbid conditions. This prevents overestimating comorbidity when physician or outpatient claims are used.
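The rule-out check can be illustrated with a short sketch. It is not the macro's own code: it assumes a hypothetical dataset DX_CONDITIONS in which each claim's ICD codes have already been mapped to comorbid conditions (one row per patient, condition and claim), with a hypothetical CLAIM_DATE variable. Because a condition appears on two claims more than 30 days apart exactly when its earliest and latest physician/outpatient claims are more than 30 days apart, one HAVING clause expresses the rule; MEDPAR conditions are counted without the check.

```sas
/* Toy input (hypothetical): ICD codes already mapped to conditions,
   one row per patient, condition, and claim */
data dx_conditions;
  input patient_id $ condition :$12. filetype $ claim_date :date9.;
  format claim_date date9.;
  datalines;
P1 diabetes N 10JAN2019
P1 diabetes O 20FEB2019
P1 chf N 05MAR2019
P1 chf N 25MAR2019
P2 copd M 01APR2019
P2 renal N 15MAY2019
;
run;

proc sql;
  /* Physician (N) and outpatient (O) conditions count only when the earliest
     and latest claims carrying the condition are more than 30 days apart */
  create table confirmed_no as
    select patient_id, condition
    from dx_conditions
    where filetype in ('N', 'O')
    group by patient_id, condition
    having max(claim_date) - min(claim_date) > 30;

  /* Hospital (M) conditions count as-is; UNION drops duplicate pairs */
  create table confirmed as
    select patient_id, condition from dx_conditions where filetype = 'M'
    union
    select patient_id, condition from confirmed_no;
quit;
```

With this toy input, P1's diabetes (claims 41 days apart) and P2's hospital COPD diagnosis are kept, while P1's CHF (claims 20 days apart) and P2's single renal claim are treated as rule-out diagnoses.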

The macro is available to download here: NCI.comorbidity.macro.sas (SAS, 19 KB).

Building an Input File for the Macro

Regardless of which Medicare files an investigator decides to use as input, the files should include only the limited set of variables described below. The “version” variables (i.e. DGNS_VRSN_CD_1-DGNS_VRSN_CD_25) are not needed, because treating codes as ICD-9 when the claim end date is before 10/1/2015 was found to give more accurate results.
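The macro applies this date rule itself, so no version variable has to be added to the input file. The toy step below only shows where the boundary falls; ICD_VERSION and DX_CODE are hypothetical names used for illustration.

```sas
data claims_with_version;
  input patient_id $ clm_thru_dt :date9. dx_code $;
  format clm_thru_dt date9.;
  /* Claims ending before 10/1/2015 are read as ICD-9, all others as ICD-10.
     In this sketch a missing CLM_THRU_DT compares below every date (ICD-9) */
  if clm_thru_dt < '01OCT2015'd then icd_version = 9;
  else icd_version = 10;
  datalines;
P1 30SEP2015 4280
P1 01OCT2015 I509
;
run;
```

A claim ending on 30SEP2015 is classified as ICD-9 and one ending on 01OCT2015 as ICD-10.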

Variables to keep for the macro:

  • MEDPAR - keep Patient_ID, admission date (ADMSN_DT), discharge date (DSCHRG_DT), length of stay (LOS_DAY_CNT), and diagnosis codes (ADMTG_DGNS_CD DGNS_1_CD--DGNS_25_CD). Set filetype='M'.
  • NCH - keep Patient_ID, claim from date (CLM_FROM_DT), claim thru date (CLM_THRU_DT), and diagnosis codes (PRNCPAL_DGNS_CD ICD_DGNS_CD1-ICD_DGNS_CD12 LINE_ICD_DGNS_CD). The carrier data can have more than one claim for the same date of service, and all claims for each date should be included. Set filetype='N'.
  • OUTPAT - keep Patient_ID, claim from date (CLM_FROM_DT), claim thru date (CLM_THRU_DT), and diagnosis codes (PRNCPAL_DGNS_CD ICD_DGNS_CD1-ICD_DGNS_CD25). Set filetype='O'.
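One way to stack the three extracts with these variable lists is sketched below. The WORK datasets MEDPAR, NCH and OUTPAT are one-row stand-ins with made-up values, and the helper macros %SEQ and %DX_RENAME are hypothetical. Because DGNS_1_CD does not end in a number, SAS cannot read DGNS_1_CD-DGNS_25_CD as a numbered range list, so the sketch generates the rename pairs with a macro loop. Renaming the MEDPAR admission and discharge dates to CLM_FROM_DT and CLM_THRU_DT, so that one pair of date variables can be passed to the macro, is an assumption of this sketch.

```sas
/* Hypothetical helper: %seq(dgns_,_cd,3) -> dgns_1_cd dgns_2_cd dgns_3_cd */
%macro seq(pre, suf, n);
  %local i;
  %do i = 1 %to &n; &pre&i&suf %end;
%mend seq;

/* Hypothetical helper: %dx_rename(2) -> dgns_1_cd=icd_dgns_cd1 dgns_2_cd=icd_dgns_cd2 */
%macro dx_rename(n);
  %local i;
  %do i = 1 %to &n; dgns_&i._cd=icd_dgns_cd&i %end;
%mend dx_rename;

/* One-row stand-ins for the real extracts (made-up values) */
data medpar;
  length patient_id $4 admtg_dgns_cd %seq(dgns_,_cd,25) $7;
  patient_id = 'P1'; admsn_dt = '05MAR2019'd; dschrg_dt = '09MAR2019'd;
  los_day_cnt = 4; admtg_dgns_cd = 'I214'; dgns_1_cd = 'I214';
run;

data nch;
  length patient_id $4 prncpal_dgns_cd icd_dgns_cd1-icd_dgns_cd12 line_icd_dgns_cd $7;
  patient_id = 'P1'; clm_from_dt = '02APR2019'd; clm_thru_dt = '02APR2019'd;
  prncpal_dgns_cd = 'E119'; line_icd_dgns_cd = 'E119';
run;

data outpat;
  length patient_id $4 prncpal_dgns_cd icd_dgns_cd1-icd_dgns_cd25 $7;
  patient_id = 'P1'; clm_from_dt = '20MAY2019'd; clm_thru_dt = '20MAY2019'd;
  prncpal_dgns_cd = 'J449';
run;

data claims;
  /* Declare common lengths first so codes are not truncated when stacking */
  length filetype $1 admtg_dgns_cd prncpal_dgns_cd
         icd_dgns_cd1-icd_dgns_cd25 line_icd_dgns_cd $7;
  set medpar (in=in_m
              keep=patient_id admsn_dt dschrg_dt los_day_cnt admtg_dgns_cd %seq(dgns_,_cd,25)
              rename=(admsn_dt=clm_from_dt dschrg_dt=clm_thru_dt %dx_rename(25)))
      nch    (in=in_n
              keep=patient_id clm_from_dt clm_thru_dt prncpal_dgns_cd
                   icd_dgns_cd1-icd_dgns_cd12 line_icd_dgns_cd)
      outpat (in=in_o
              keep=patient_id clm_from_dt clm_thru_dt prncpal_dgns_cd
                   icd_dgns_cd1-icd_dgns_cd25);
  /* Tag each record with its source, as the CLAIMTYPE parameter expects */
  if in_m then filetype = 'M';
  else if in_n then filetype = 'N';
  else if in_o then filetype = 'O';
  format clm_from_dt clm_thru_dt date9.;
run;
```

If the MEDPAR admitting diagnosis is kept under its own name as here, it also has to be listed in DXVARLIST, e.g. admtg_dgns_cd prncpal_dgns_cd icd_dgns_cd1-icd_dgns_cd25 line_icd_dgns_cd.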

The final SAS file that combines data from any of the above sources must include the variables used in the macro call. It is most efficient if the diagnosis code variables are renamed to be the same in each file (e.g. rename MEDPAR's DGNS_1_CD through DGNS_25_CD to ICD_DGNS_CD1 through ICD_DGNS_CD25). If the comorbidity score for the 12 months prior to diagnosis is to be calculated, the file should be subset to include only records with claim dates falling within that window. However, if the rule-out algorithm is invoked, claims from the 30 days before and after the window of analysis should also be included.
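The subsetting step can be sketched as follows, under assumptions not taken from the example program: a hypothetical DX_DATE holds each patient's diagnosis date on every claim row (in practice merged on from the cohort file), the window runs from the same calendar day 12 months earlier through the day before diagnosis, and a claim is kept if its from/thru span overlaps the window. START_DATE and END_DATE are stored on each row so they can be passed as STARTDATE and ENDDATE. The macro variable &RULEOUT is only a local stand-in that should match the RULEOUT value passed to the macro.

```sas
%let ruleout = Y;  /* hypothetical switch; match the RULEOUT value passed to %COMORB */

data claims_window;
  input patient_id $ dx_date :date9. clm_from_dt :date9. clm_thru_dt :date9.;
  format dx_date clm_from_dt clm_thru_dt start_date end_date date9.;
  /* Window: same day 12 months before diagnosis through the day before it */
  start_date = intnx('month', dx_date, -12, 'same');
  end_date   = dx_date - 1;
  /* Widen the window by 30 days on each side when rule-out is used */
  if "&ruleout" in ('Y', 'R') then pad = 30;
  else pad = 0;
  /* Keep claims whose from/thru span overlaps the (padded) window */
  if clm_thru_dt >= start_date - pad and clm_from_dt <= end_date + pad;
  drop pad;
  datalines;
P1 15JUN2019 01MAY2018 01MAY2018
P1 15JUN2019 20MAY2018 20MAY2018
P1 15JUN2019 01JAN2019 01JAN2019
P1 15JUN2019 01JUL2019 01JUL2019
P1 15JUN2019 01SEP2019 01SEP2019
;
run;
```

With &RULEOUT set to Y, the claims from 20MAY2018, 01JAN2019 and 01JUL2019 are kept; with N, only the 01JAN2019 claim falls inside the unpadded window.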

An example SAS program to build an input file is available here: comorbidity.example.program.sas (SAS, 5 KB).

These parameters are needed for the macro call:

INFILE - Dataset name: SAS dataset of Medicare claims.

PATID - Variable name(s): Unique ID for each patient.

STARTDATE - Variable name: Date the comorbidity window starts, in SAS date format.

ENDDATE - Variable name: Date the comorbidity window ends, in SAS date format.

CLAIMFROMDATE - Variable name: Start date of the claim found on the claim file, in SAS date format.

CLAIMTHRUDATE - Variable name: End date of the claim found on the claim file, in SAS date format.

CLAIMTYPE - Variable name: The source of the claim record ('M'=MEDPAR, 'O'=OUTPAT, 'N'=NCH). Note: do not use DME claims.

DXVARLIST - List of variable names: The ICD-9 or ICD-10 diagnosis code variables (e.g. ICD_DGNS_CD1-ICD_DGNS_CD25). If some of the variables cannot be included in a range, list each single variable or range separated by spaces (e.g. PRNCPAL_DGNS_CD ICD_DGNS_CD1-ICD_DGNS_CD25 LINE_ICD_DGNS_CD).

RULEOUT - Flag: Set this to ‘Y’ or ‘R’ to invoke the rule-out algorithm; otherwise set it to ‘N’ or leave it blank.

OUTFILE - Dataset name: Output SAS dataset (if not specified, the default name is Comorbidities). This output dataset contains one record for every subject in the input file who has at least one claim within the window defined by STARTDATE and ENDDATE. Subjects who don’t have any claims within the window are not included in this file. It is up to the user to decide whether these subjects should be treated as having an unknown comorbidity score or as having zero comorbidities.
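Because such subjects are simply absent from OUTFILE, a merge against the full cohort can flag them before deciding how to treat them. The sketch assumes hypothetical datasets COHORT (one row per patient) and COMORB (standing in for OUTFILE), both sorted by PATIENT_ID. It uses only the ID variable, since the names of the score variables on the output are not listed here.

```sas
/* Toy stand-ins (hypothetical): the full cohort and the macro's output */
data cohort;
  input patient_id $;
  datalines;
P1
P2
P3
;
run;

data comorb;
  input patient_id $;
  datalines;
P1
P3
;
run;

/* Both inputs must be sorted by PATIENT_ID (e.g. with PROC SORT) before the merge */
data cohort_flagged;
  merge cohort (in=in_cohort) comorb (in=in_comorb keep=patient_id);
  by patient_id;
  if in_cohort;
  /* 1 = no claims in the window, so no record on the output dataset */
  no_claims_in_window = not in_comorb;
run;
```

Here P2 is flagged. Leaving the flagged subjects' scores missing treats them as unknown; setting them to 0 treats them as having no comorbidities.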

An example call statement for the macro would be:

%COMORB(Claims,patient_id,start_date,end_date,claim_from_date,claim_thru_date,filetype,prncpal_dgns_cd icd_dgns_cd1-icd_dgns_cd25 line_icd_dgns_cd,Y,Comorb);

Last Updated: 24 Sep, 2021