Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error seen running ETLSyntheaBuilder::LoadEventTables Using Microsoft Sql Server #197

Closed
greshje opened this issue Jun 10, 2024 · 1 comment

Comments

@greshje
Copy link

greshje commented Jun 10, 2024

I'm getting the error shown below trying to run ETLSyntheaBuilder::LoadEventTables using Microsoft Sql Server 2019, Synthea v3.0.0, and the latest commit of ETLSyntheaBuilder (all running on Windows).

> ETLSyntheaBuilder::LoadEventTables(
+ 	connectionDetails = cd, 
+ 	cdmSchema = cdmSchema, 
+ 	syntheaSchema = syntheaSchema, 
+ 	cdmVersion = cdmVersion, 
+ 	syntheaVersion = 
+ 	syntheaVersion
+ )
Connecting using SQL Server driver
Running: insert_location.sql
  |                                                                                                          |   0%Error in `.createErrorReport()`:
! Error executing SQL:
com.microsoft.sqlserver.jdbc.SQLServerException: The function 'row_number' must have an OVER clause with ORDER BY.
An error report has been created at  C:/temp/ponos/errorReportSql.txt
Run `rlang::last_trace()` to see where the error occurred.
> 

I also needed to remove a number of columns from the Synthea files to get the uploads of the csv files to work:
PATIENT

  • MIDDLE
  • FIPS
  • INCOME
  • Also needed to remove the NOT NULL for YEAR_OF_BIRTH

PAYER_TRANSITIONS

  • START_DATE
  • END_DATE
  • PLAN_OWNERSHIP
  • OWNER_NAME

PAYERS

  • OWNERSHIP

PROVIDERS

  • ENCOUNTERS
  • PROCEDURES

The entire script is shown below

# ---
#
# This file has been adapted from the documentation for the ETL-Synthea project:
# https://github.com/OHDSI/ETL-Synthea
#
# The SHA code used to specify the version used here is for the 2023-03-10 commit.  
#
# ---

print("Installing ETL-Synthea...")
devtools::install_github("OHDSI/ETL-Synthea@134c7af9f6e840434d705eabfb939b653097b6ff")
print("Done installing.")
print("Setting parameters...")

library(ETLSyntheaBuilder)

cd <- DatabaseConnector::createConnectionDetails(
  dbms     = "sql server", 
  server   = "localhost", 
  user     = "synthea_csv", 
  password = "Sneaker01", 
  port     =  1433, 
  pathToDriver = "C:\\temp\\fhir-to-omop\\drivers\\sqlserver",
  extraSettings = "encrypt=false;trustServerCertificate=true;databaseName=synthea_native_csv"
)

cdmSchema      <- "synthea_omop.dbo"
cdmVersion     <- "5.4"
syntheaVersion <- "3.0.0"
syntheaSchema  <- "synthea_native_csv.dbo"
syntheaFileLoc <- "C:\\temp\\fhir-to-omop\\synthea-csv\\csv"

print("")
print("")
print("# * * *")
print("#")
print("# CREATING TABLES")
print("#")
print("# * * *")
print("")
print("")

ETLSyntheaBuilder::CreateSyntheaTables(
	connectionDetails = cd, 
	syntheaSchema = syntheaSchema, 
	syntheaVersion = syntheaVersion
)

print("")
print("")
print("# * * *")
print("#")
print("# LOADING DATA")
print("#")
print("# * * *")
print("")
print("")

ETLSyntheaBuilder::LoadSyntheaTables(
	connectionDetails = cd, 
	syntheaSchema = syntheaSchema, 
	syntheaFileLoc = syntheaFileLoc
)

print("")
print("")
print("# * * *")
print("#")
print("# CREATING MAP AND ROLLUP TABLES")
print("#")
print("# * * *")
print("")
print("")

ETLSyntheaBuilder::CreateMapAndRollupTables(
	connectionDetails = cd, 
	cdmSchema = cdmSchema, 
	syntheaSchema = syntheaSchema, 
	cdmVersion = cdmVersion, 
	syntheaVersion = syntheaVersion
)

print("")
print("")
print("# * * *")
print("#")
print("# LOADING EVENT TABLES")
print("#")
print("# * * *")
print("")
print("")

ETLSyntheaBuilder::LoadEventTables(
	connectionDetails = cd, 
	cdmSchema = cdmSchema, 
	syntheaSchema = syntheaSchema, 
	cdmVersion = cdmVersion, 
	syntheaVersion = 
	syntheaVersion
)

print("")
print("")
print("# * * *")
print("#")
print("# Done.")
print("#")
print("# * * *")
print("")
print("")

The error report is shown below

DBMS:
sql server

Error:
com.microsoft.sqlserver.jdbc.SQLServerException: The function 'row_number' must have an OVER clause with ORDER BY.

SQL:
insert into synthea_omop.dbo.location (
location_id,
address_1,
address_2,
city,
state,
zip,
county,
location_source_value
)
select
row_number() over () as location_id,
locations.*
from
(select distinct
cast(null as varchar)               address_1,
cast(null as varchar)               address_2,
p.city                              city,
states_map.state_abbreviation       state,
cast(null as varchar)               county,
p.zip                               zip,
p.zip                               location_source_value
from synthea_native_csv.dbo.patients p
left join synthea_omop.dbo.states_map states_map on p.state=states_map.state) locations


R version:
R version 4.2.3 (2023-03-15 ucrt)

Platform:
x86_64-w64-mingw32

Attached base packages:
- stats
- graphics
- grDevices
- utils
- datasets
- methods
- base

Other attached packages:
- ETLSyntheaBuilder (2.0)
burrowse added a commit that referenced this issue Jul 24, 2024
burrowse added a commit that referenced this issue Jul 24, 2024
* Align insert and select clauses

* Add order by for SQL server

#197
@burrowse
Copy link
Collaborator

@greshje Thanks for reaching out!

In terms of the row number() order by error, I have updated the script to avoid this error for SQL server.

In terms of the missing fields from synthea, the latest version of synthea that this package supports is v3.2 which is also the latest official release of synthea. I suspect you may be using the generator from the latest available version https://github.com/synthetichealth/synthea/releases/tag/master-branch-latest which includes patients.middle in the csv output. All other fields should be captured by the v3.2 release. Would you be able to try generating your csv files using the [v3.2] (https://github.com/synthetichealth/synthea/releases/tag/v3.2.0) release from synthea and specifying syntheaVersion <- "3.2.0" in your script? It appears you are also specifying version 3.0.0.

burrowse added a commit that referenced this issue Jul 25, 2024
* Align insert and select clauses

* Add order by for SQL server

#197

* Update insert_location.sql
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants