Attributes for External Sources & Events #1591

JosephVolosin · 2024-10-31T17:46:49Z

Collaborator

What

As an extension of external events & sources, we would like to implement a feature called arguments, which would expand the capabilities for incorporating events & sources into procedural scheduling (and eventually, constraints).

The main goal is to give procedural scheduling & constraints access to more data about specific external sources & events, which in turn enables mission planners to do more complex planning involving external sources & events.

One important note (discussed later in 'Why' and 'Terminology'): although the term arguments is meant to suggest a similarity with an activity directive's arguments, the arguments introduced here refer to existing data from the external source/event (data that is currently not captured), and they are immutable. They are not entered directly by the user, nor can the user edit them after creation (except by altering the actual event/source definition). The reference to activity directive arguments is meant to convey that these arguments are specific to a source/event in the same way that a set of arguments is specific to an activity directive.

Why

With the initial introduction of external events & sources, we are able to 'categorize' our events & sources during procedural scheduling by three fields: Derivation Group, External Source Type, and External Event Type. That is to say: if we are looking for specific events and/or sources, we can 'filter' down our candidates based on those 3 fields.

For example: if we are looking for all events that pertain to contact periods with the DSN, we will probably have a DSN Contact external event type that contains them. We may also have a DSN Contact File external source type and even DSN Contact Baseline or DSN Contact Backup derivation groups. This is helpful for very basic filtering and classification of our events & sources, however mission planning may require more detailed information about the specific sources and/or events in order to fulfil planning conditions (see the example below).

Adding arguments to our external events & sources gives mission planners a powerful interface for filtering and classifying events & sources, one that can directly query events or sources within a shared type that have unique data/values. These arguments represent data that already exists in the original event definition, and they create an immutable interface for accessing that data from within the event/source itself.

These arguments were not needed for a baseline implementation of external events/sources and procedural scheduling, but they are needed for a more powerful implementation that covers what mission planners need access to in order to properly generate activities from external events.

Arguments are also especially helpful when sources or events are used for something generic (ex: a Weather Report source, as seen in the example below, that may refer to the weather report from any of a number of locations). Rather than creating a dozen redundant source or event types just to categorize a single piece of data about the source or event, arguments let us keep the generic source or event type and attach as much additional classifying data to the source or event as we need.

Refer to the example below for more information.

Example/Use Case

Using the 'Weather' example used for external sources & events, the following will depict the use-case of arguments within sources and events.

Arguments in an External Source

Say we have an external source, WEATHER_REPORT_2024_012_019.json, of type Weather Report. This weather report contains the forecasts for January 12th through 19th of 2024 (days 012 through 019 in the day-of-year timestamps). Below is the file's content:

{
    "source": {
        "key": "WEATHER_REPORT_2024_012_019.json",
        "source_type": "Weather Report",
        "valid_at": "2024-012T00:00:00Z",
        "period": {
            "start_time": "2024-012T00:00:00Z",
            "end_time": "2024-019T00:00:00Z"
        },
        "arguments": {
            "location": "Greenbelt, MD",
            "locationElevation": 102,
            "locationLatitude": 38.99,
            "locationLongitude": -76.89,
            "hasPrecipitation": true,
            "readingFrom": "National Weather Service",
            "lastUpdate": "2024-010T22:54:16Z",
            "avgHumidity": 54
        }
    }
}

Along with the usual key, valid_at, and period information, this source has an arguments field that contains additional data about the external source. In this example, the source tells us where the report came from (readingFrom, lastUpdate), which location the reading is for (location, locationElevation, locationLatitude, locationLongitude), and some information about the content (avgHumidity, hasPrecipitation).

This additional information from the external source can help us filter potential sources within procedural scheduling & constraints. Say we want to gather the events from all sources for Greenbelt, MD: we can now use the location argument on each source to find the matching sources, get all of their contained events, and use those events in a goal or constraint.

What if we not only want all the events pertaining to Greenbelt, MD weather reports, but all the events for Greenbelt, MD that have a low of <= 32, and a chance of precipitation? Our source's events may look like the following:

{
    "events": [
        {
            "key": "Forecast_2024_012",
            "event_type": "Weather Forecast",
            "start_time": "2024-012T00:00:00Z",
            "duration": "24:00:00",
            "arguments": {
                "chanceOfPrecipitation": 85,
                "low": 28,
                "high": 52
            }
        }
    ]
}

Again, we get information relevant to the event type: chanceOfPrecipitation, low, and high - and values that are specific to that event instance: 85, 28, 52.

With arguments fully configured on our external source & its encapsulated events, we can describe a complex scheduling goal that can...

Find all sources that are for Greenbelt, MD
Filter those down to all with hasPrecipitation: true
Find all events within those sources that have a low <= 32 and chanceOfPrecipitation >= 80
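The three filtering steps above can be sketched in TypeScript. This is a minimal sketch, not Aerie's procedural scheduling API: the ExternalSource/ExternalEvent shapes and the coldWetGreenbeltEvents function are hypothetical, and it assumes sources have already been loaded into memory with their events nested inside them.

```typescript
// Hypothetical in-memory shapes mirroring the JSON examples above.
type ArgumentValue =
  | string
  | number
  | boolean
  | null
  | ArgumentValue[]
  | { [key: string]: ArgumentValue };

interface ExternalEvent {
  key: string;
  event_type: string;
  start_time: string;
  duration: string;
  arguments: { [name: string]: ArgumentValue };
}

interface ExternalSource {
  key: string;
  source_type: string;
  arguments: { [name: string]: ArgumentValue };
  events: ExternalEvent[];
}

// Hypothetical helper implementing the three steps of the example goal.
function coldWetGreenbeltEvents(sources: ExternalSource[]): ExternalEvent[] {
  const matches: ExternalEvent[] = [];
  for (const source of sources) {
    // Steps 1 and 2: keep Greenbelt, MD sources that report precipitation.
    if (source.arguments["location"] !== "Greenbelt, MD") continue;
    if (source.arguments["hasPrecipitation"] !== true) continue;
    // Step 3: keep events with low <= 32 and chanceOfPrecipitation >= 80.
    for (const event of source.events) {
      const low = event.arguments["low"];
      const chance = event.arguments["chanceOfPrecipitation"];
      // Without a schema, each value's runtime type must be checked before comparing.
      if (typeof low === "number" && low <= 32 && typeof chance === "number" && chance >= 80) {
        matches.push(event);
      }
    }
  }
  return matches;
}
```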

How

Approach 1: No Schema

Explanation

The first approach is to not use schemas and to limit the arguments to a basic key: value mapping on the external source & event. This is presented as the first approach because it is the simplest, lowest-overhead, lowest-friction, and 'loosest' implementation.

This approach is not ideal: it provides no safety for the user and potentially makes interacting with external events/sources more difficult than if the user had submitted a well-defined schema for their arguments. However, it provides the 'quickest' and easiest process for the user to get started with external source/event arguments.

Example: External Source

Below is an example of what the arguments field of an external source would look like with this approach. Note that nothing changes at the external event/source-type level to accommodate this approach.

{
    "arguments": {
        "location": "Greenbelt, MD",
        "high": 75,
        "low": 46
    }
}

Approach 2: Strict Implementation

Explanation

The second approach is referred to as the strict approach. It defines arguments as a schema-like object at the external source/event type level, and then implements arguments at the external source/event level as a key: object mapping that must conform to that schema. Our implementation would use JSON Schema, allowing the user to define their schemas in the JSON Schema language and then upload or otherwise ingest them into Aerie. See below for examples.

Example: External Event Type

"Weather Report": {
    "additionalProperties": false,
    "description": "This is an example event type representing weather reports.",
    "properties": {
        "location": {
            "type": "string",
            "description": "Where this weather reading was taken."
        },
        "precipitationChance": {
            "type": "integer",
            "description": "The chance of precipitation, represented as an integer."
        }
    },
    "required": ["location"],
    "type": "object"
}

Example: External Event

{
    "arguments": {
        "location": "Greenbelt, MD",
        "precipitationChance": 30
    }
}

Problem 1: The user CAN'T make mistakes

One of the most impactful problems with this approach is that there is no room for mistakes, either by the user or, to a greater extent, by the source. If the user translates the original source incorrectly into the uploaded JSON format and includes a field that is not expected for a given source or event type, the source cannot be uploaded. Likewise, if the source itself includes an undefined argument, it cannot be uploaded.
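To illustrate how strict this is, the sketch below hand-rolls a tiny subset of JSON Schema (type, properties, required, additionalProperties) and checks event arguments against the Weather Report schema from the example. MiniSchema and validate are hypothetical names for illustration, not a proposal for the gateway's validator; a real implementation would use a full JSON Schema validator such as AJV.

```typescript
// A deliberately tiny subset of JSON Schema, for illustration only.
interface MiniSchema {
  type: "object" | "string" | "integer" | "number" | "boolean";
  properties?: { [name: string]: MiniSchema };
  required?: string[];
  additionalProperties?: boolean;
  description?: string;
}

// Returns human-readable errors; an empty list means the value is valid.
function validate(schema: MiniSchema, value: unknown, path: string = "$"): string[] {
  if (schema.type !== "object") {
    const ok =
      schema.type === "string" ? typeof value === "string" :
      schema.type === "boolean" ? typeof value === "boolean" :
      schema.type === "number" ? typeof value === "number" :
      typeof value === "number" && isFinite(value) && Math.floor(value) === value; // "integer"
    return ok ? [] : [`${path}: expected ${schema.type}`];
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return [`${path}: expected object`];
  }
  const obj = value as { [name: string]: unknown };
  const properties = schema.properties || {};
  const errors: string[] = [];
  for (const name of schema.required || []) {
    if (!(name in obj)) errors.push(`${path}: missing required "${name}"`);
  }
  for (const name of Object.keys(obj)) {
    const sub = properties[name];
    if (sub) {
      errors.push(...validate(sub, obj[name], `${path}.${name}`));
    } else if (schema.additionalProperties === false) {
      // Strictness: any argument not declared in the schema is an error.
      errors.push(`${path}: unexpected argument "${name}"`);
    }
  }
  return errors;
}

const weatherReportSchema: MiniSchema = {
  type: "object",
  additionalProperties: false,
  properties: {
    location: { type: "string", description: "Where this weather reading was taken." },
    precipitationChance: { type: "integer", description: "The chance of precipitation, represented as an integer." },
  },
  required: ["location"],
};
```

Every undeclared or mistyped argument produces an error, which is the "can't make mistakes" property described above: convenient for whoever consumes the data, unforgiving for whoever uploads it.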

Problem 2: UX Friction

The other heavily impactful problem is that to actually implement this approach, the user will need to pre-define their external source and event types manually so that the arguments schema can be written and then used to validate the to-be-uploaded external source. This means that to upload an external source a user needs to:

Define the external source type, along with a full schema of the expected arguments
Define the external event type, along with a full schema of the expected arguments
(Possibly) Define the derivation group
Upload the external source

Approach 3: Optional Schemas

The third approach is a mixture of the previous two: by default the user would not need to upload any schemas for their arguments, but could opt in to using schemas. This essentially incurs the complexity of implementing both approaches 1 & 2, but would help solve some of the problems (namely UX) by making the simplest option the default. Implementing this approach also raises the question of how a user 'switches' in and out of 'schema' mode and at what level that applies (per-source, per-user, or per-Aerie instance?).

Questions/Discussion Points

Supporting complex arguments

Consider an argument that is a nested JSON structure:

{
    "arguments": {
        "options": {
            "opt1": {
                "code": "abc123"
            },
            "opt2": {
                "code": "abc321"
            },
            "opt3": {
                "code": "asdf1234"
            }
        }
    }
}

We currently believe these sorts of complex structures should be supported by this feature, as it is entirely possible that planners will have source files that make use of complex structures like above. This is also a major factor in why we believe we can implement the 'schemas' through JSON Schema, as it inherently supports the building and 'schema'-tizing of complex objects like this.
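As a sketch of how JSON Schema covers nested structures like this, the following defines a draft-07-style schema for the options argument plus a small recursive checker for just that subset. Because option names (opt1, opt2, ...) vary, they are matched by using additionalProperties as a schema rather than listing them under properties. The Schema type, optionsArgumentSchema, and isValid are hypothetical illustrations, not a full validator.

```typescript
// Minimal schema shape covering only what this example needs.
interface Schema {
  type: "object" | "string";
  properties?: { [name: string]: Schema };
  additionalProperties?: boolean | Schema;
  required?: string[];
}

// Schema for the nested "options" argument shown above.
const optionsArgumentSchema: Schema = {
  type: "object",
  properties: {
    options: {
      type: "object",
      // Every option, whatever its name, must be an object with a string "code".
      additionalProperties: {
        type: "object",
        properties: { code: { type: "string" } },
        required: ["code"],
        additionalProperties: false,
      },
    },
  },
  required: ["options"],
  additionalProperties: false,
};

// Recursive check for this subset: descend into "properties" for declared keys
// and into the "additionalProperties" schema for any other key.
function isValid(schema: Schema, value: unknown): boolean {
  if (schema.type === "string") return typeof value === "string";
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const obj = value as { [name: string]: unknown };
  for (const name of schema.required || []) {
    if (!(name in obj)) return false;
  }
  return Object.keys(obj).every((name) => {
    const declared = schema.properties ? schema.properties[name] : undefined;
    if (declared) return isValid(declared, obj[name]);
    const extra = schema.additionalProperties;
    if (extra === undefined || extra === true) return true; // JSON Schema default: extra keys allowed
    if (extra === false) return false;
    return isValid(extra, obj[name]);
  });
}
```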

Do we need arguments for external sources?

The case for arguments on external events is more direct: we have an event that may be of a shared type but has a completely unique piece of data about itself. This is also true for arguments on external sources; however, when we filter the results of a query on an external event argument, we get exactly the matching external event(s) to plan against. With external sources, we can filter down to specific sources, but those sources may contain external events we don't care about. In this case, it would probably be more efficient to query the external events outright.

Updating Schemas

Should these schemas be updatable? One option is, instead of making the schema updatable, to use a new source type/event type whenever the schema needs to change. Alternatively, we can choose not to implement any form of updating in this first iteration and leave that as a problem to solve later (so schemas are immutable for now).

Terminology: Metadata vs. Arguments vs. Properties

Initially, we were referring to this feature as metadata (and, even earlier, properties for external events specifically); however, after a conversation we concluded that arguments may be a more fitting name, as this feature is closer to the arguments an activity directive has than to metadata. Another option is to use properties again, though we are not sure whether that term is already used elsewhere in Aerie.

An argument against arguments as the name is that due to the user's familiarity with activity directive arguments, they may assume that the arguments on an external source or event are mutable and defined by the user - which is NOT the case.

How do we define the schemas?

The following applies to approaches 2 & 3, which use schemas to validate arguments. We also assume the schemas would be implemented with JSON Schema, as mentioned above.

In the UI

CreateGroupsOrTypesModal already lets the user define external source & event types (though doing so currently isn't required, as the GraphQL mutation already handles it). However, because we expect to support complex objects as 'types', building schema definition into the UI could become a nightmare to develop (as opposed to limiting the options to a simple dropdown of, for example, 'boolean', 'integer', or 'string'). Note that there are tools (1, 2) that could be integrated with Aerie's UI to make this process easier, but they may not be worth the implementation trade-off.

File Input

File input can be implemented for schemas by having the user upload a JSON Schema that defines their new type's arguments. There is more work to be done to figure out exactly how this would work on the back-end, but by using JSON Schema this approach is heavily simplified (i.e., we don't have to do work of defining/re-defining a schema structure).

Conclusion

External event and source arguments would allow for more complex procedural scheduling (and eventually, constraint writing) to be performed as Aerie could capture additional data on a source-specific or event-specific level.

Mythicaeda · 2024-11-04T23:50:02Z

Collaborator

I am strongly in favor of Option 2: enforcing arguments to have a well defined schema.

This matches the precedent of similar fields.
It helps the UI render the fields if it can pull up a reference of what data type it's even trying to render.
The fact that the user can't make mistakes is a boon, especially if these arguments are meant to be used in goals/constraints. Consider the weather example where you want to find all events with a chanceOfPrecipitation lower than 80%. Let's say chanceOfPrecipitation is an int. Then the goal can just say something like events.find(e where e.chanceOfPrecip < 80). If there isn't a schema, then this becomes a massive usability pain: even if all events contain this field (and there's no guarantee of that without a schema), different events can choose different representations of the same field, for example 80 (int), 0.8 (double), or "80%" (string). Asking the users to define an argument schema is 1) less of a headache than asking users to deal with the lack of a schema in their code and 2) a reasonable expectation of the user who decided what the given event type looks like.
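The usability point can be made concrete. Below, precipitationPercent is a hypothetical helper showing the defensive code a goal would need if chanceOfPrecipitation could arrive as 80, 0.8, or "80%"; the rule for treating small numbers as fractions is an assumption and is itself ambiguous. The typed filter at the end is what the same check looks like once a schema guarantees an integer percentage. TypeScript is used only to keep the examples in one language; the goals discussed in this thread would be Java code.

```typescript
// Without a schema: accept every representation seen so far and hope there are no others.
function precipitationPercent(raw: unknown): number | undefined {
  if (typeof raw === "number") {
    // Assumption: values in [0, 1] are fractions (0.8 -> 80). This is a guess,
    // since a value of 1 could mean either 1% or 100%.
    return raw >= 0 && raw <= 1 ? raw * 100 : raw;
  }
  if (typeof raw === "string") {
    const parsed = parseFloat(raw.replace("%", ""));
    return isNaN(parsed) ? undefined : parsed;
  }
  return undefined; // field missing or of an unexpected type
}

function below80Untyped(events: { [name: string]: unknown }[]): { [name: string]: unknown }[] {
  return events.filter((e) => {
    const pct = precipitationPercent(e["chanceOfPrecipitation"]);
    return pct !== undefined && pct < 80;
  });
}

// With a schema declaring chanceOfPrecipitation as an integer percentage.
interface ForecastAttributes {
  chanceOfPrecipitation: number;
}

function below80Typed(events: ForecastAttributes[]): ForecastAttributes[] {
  return events.filter((e) => e.chanceOfPrecipitation < 80);
}
```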


dandelany · 2024-11-05T20:20:35Z

Maintainer

We met today as a group to discuss options for this and make some decisions to allow GSFC to continue prototyping.

Decisions made

Don't name them "arguments" - "attributes", "properties", or "metadata" instead
Don't support freeform attributes - if user wants to attach data, they must provide a schema
We need support for complex/nested data types eg. {thing: [{a: b, c: d}]}
Schemas for attributes are immutable and can't change once defined (for now)
Basic UI - "upload your schema file" button (ie. no fancy "schema builder" form UI)
Should have API support (gateway counts as a part of our API)

Important remaining questions:

Format - JSON Schema or ValueSchema
Where [are we validating]? - Gateway (on the way in) or Hasura Action -> Java code
Ergonomics when consuming events - Java code generation/classes or dynamic typing?
- what does it look like to use events in eg. a procedural goal?

While there are still some big unknowns, some of the important questions have been settled, and the GSFC team will prototype and meet again with the Aerie devs in ~1 week.


JosephVolosin Nov 5, 2024
Collaborator Author

Including my notes below w. some of the conversation details as well!

Implementation (No Schema vs. Explicit vs. Both)

Dan: It’s a lot to ask the users to pre-define schemas for all data types that go in it; it increases the barrier to entry. Should have some support for free-form, non-validated objects
Theresa: For always strict checking - if we optionally allow it, it has a LOT of implications for previously-validated data (or vice-versa)
- Having the two co-exist is not impossible, but the logistics of supporting both is a lot
Pranav: Asking users to define schemas is similar to asking them to make mission models
Theresa: If we want them to be used in procedural scheduling/constraints, this can be a HUGE headache if we don’t have schemas (see GH comment)
- Technically, the user needs to handle a bunch of edge-cases based on their input because they have no reason to be confident on what the source is saying
Joel: If people can insert data manually, it will not be consistent. If we want to make it for machines to use, we need to create a format (from experience with M20)
Consensus: Strict! No free form data
- No schema required if not using attributes (empty JSON as database default)

Naming

Dan: Against arguments, for properties or attributes
Theresa: Metadata can work too - this exists in Activities and represents something similar
- DOES have strict typing and a schema! And ValueSchema can be re-used potentially for our purposes
- Example Metadata Schema
Jonathan: Properties can be confusing because it conflicts potentially w. other keys (which can be referenced as ‘properties’ in JSON/JavaScript). Too much conflating of names for things
- Theresa: It does seem intuitive that ‘properties’ refers to the ‘properties’ key in an event
Consensus: Change to Attributes!

Format

JSON Schema vs. ValueSchema vs. Metadata Schema
Metadata Schema does NOT support complex objects
Dan: For JSON Schema - standard and something that we already have in Aerie, consistent with other schemas that are defined in JSON Schema. Can use an off-the-shelf validator to catch all edge cases
Theresa: For ValueSchema - especially good for Java code implementations
- Matt: Think we would have to do some work to write a validator - where the ValueSchemas come from in activities, the source of truth is the Java type written in the mission model which generates a parser and provides the ValueSchema as an output
  - Theresa: We might want to do that anyway! Worth considering b/c we don’t validate in the database
- Starts with a Java object; ValueSchema is a JSON made from the Java object that defines what the serialized JSON value of something (ex: Activity Arguments) looks like
Theresa: If we have a ValueSchema can we just create the parser to create a Java object out of the input?
- Matt: But what Java object? With the mission model, you define a custom Java type and then the parser is generated for it, and then the ValueSchema is generated for it. We are inverting that - start w. a ValueSchema and work backwards which is not currently done but similar to what is done with EDSL (produce TypeScript from a ValueSchema)
- Theresa: Pretty sure you can hand a JSON Schema to a JSON reader in Java and it can read the object into a class with the correct properties for you. More investigation needed
  - If you’re using this in constraints and goals, you’ll want the object anyway. But where’s the class?
    - What does the Java code that consumes the attributes on Events look like?
Consensus: Undecided for now, wait for post-prototyping tag-up

Validation Schemas

Branch off the Format topic above - where are our schemas validated?
Theresa: Options are API Gateway & Database level
Dan: Biggest argument for anything other than Gateway is language difference - do we want to have ValueSchema/Java? Then we should not put it in Gateway b/c that’s less of a Gateway job
Consensus: Undecided for now, wait for post-prototyping tag-up

API Gateway

Better than UI b/c allows API access
Better for validating that the schema matches up with the inserted event/source attributes
- Joel: Could also be a Hasura Action?
  - Theresa: If we are doing it via Hasura Actions, the action should be on the entire transaction and not on an individual insert (ex: if we are uploading 1000 events, we want 1000 validations but 1 network request). Theresa will check the logistics on this!
  - Dan: Would still need an endpoint somewhere for this to hit
  - Joel’s idea to use Hasura Actions is based on moving this closer to the Java code
  - Theresa: Hasura Action gets a message body that it sends to an endpoint; the message body contains the work you want to do - we don’t need to upload an actual file!
    - Could have a listening endpoint somewhere that waits for Postgres to give it a notification asking it to parse
      - Matt: Postgres sends notifications on commit so this is problematic for rollback
Theresa: Using Gateway to start with is good starting point b/c we need it for file upload and whatever we come up with finally will probably use the Gateway as a starting point
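Theresa's point about validating a whole upload in one request (1000 validations, 1 network request) can be sketched as a single function that checks every event in the payload and reports errors per event key, so the caller can accept or reject the upload as one transaction. The EventUpload shape, the per-type validator map, and validateBatch are hypothetical; how the request reaches this code (Gateway endpoint or Hasura Action) is deliberately left out.

```typescript
// Hypothetical shape of one event in an upload request.
interface EventUpload {
  key: string;
  event_type: string;
  attributes: { [name: string]: unknown };
}

// Hypothetical per-type validator, e.g. compiled once from that event type's schema.
type Validator = (attributes: { [name: string]: unknown }) => string[];

interface BatchResult {
  ok: boolean;
  errors: { [eventKey: string]: string[] }; // only events that failed appear here
}

// Validate every event from one request and report failures per event key,
// so the caller can commit or reject the whole upload as a single transaction.
function validateBatch(
  events: EventUpload[],
  validators: { [eventType: string]: Validator | undefined },
): BatchResult {
  const errors: { [eventKey: string]: string[] } = {};
  for (const event of events) {
    const validator = validators[event.event_type];
    const problems = validator
      ? validator(event.attributes)
      : [`no schema registered for event type "${event.event_type}"`];
    if (problems.length > 0) errors[event.key] = problems;
  }
  return { ok: Object.keys(errors).length === 0, errors };
}
```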

Database

New event type got inserted, let me check the ValueSchema before insertion occurs
- Similar to Permissions schema
Theresa: Does not seem like an unreasonable overhead in the database b/c we don’t think these will change a lot (have not currently discussed updating)
Database seems messy for JSON Schema b/c they are not well-defined like ValueSchema
Possible w. extensions, but hesitant to include new extensions to the database
- Matt: A lot of users run in RDS but would need to validate support with RDS prior to adding extensions

What does the Java code that consumes the attributes on Events look like?

Feeds into the answers for Validation Schemas and Formats
Matt: 2 approaches
- You can treat the data as JSON and not have a custom Java type - dynamically typed approach
- Have Java types that correspond to the event/source types
  - How do we make sure that correspondence is true? Code generation?
    - Do JSON Schema and ValueSchema both work with code generation?
      - ValueSchema = Yes!
      - JSON Schema = Unknown
- Dan: Dynamic typing feels a little silly because we have a schema, if we go through the trouble of defining it we should then be able to use intelligent typing on our attributes
  - Matt: Schema is still valuable - what it’s really saying is once you’ve tested your data on the code there won’t be any surprises - reliability guarantee (w.out ergonomic aid when writing code, and may make the code ugly)
What does a scheduling goal using attributes look like?
- Having an idea of the end-user experience is a separate question but influences dynamic/static typing
Consensus: Undecided for now, wait for post-prototyping tag-up
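The two approaches Matt describes can be contrasted in a few lines. This sketch uses TypeScript rather than Java to keep the examples in one language; WeatherForecastAttributes stands in for a type that, in the static approach, would be generated from the event type's schema (here it is written by hand), and both function names are hypothetical.

```typescript
// Hypothetical JSON value type for untyped attributes.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Dynamic approach: attributes stay as untyped JSON and each access is checked at runtime.
function lowDynamic(attributes: { [key: string]: Json }): number {
  const low = attributes["low"];
  if (typeof low !== "number") {
    throw new Error("attribute 'low' is missing or not a number");
  }
  return low;
}

// Static approach: a type corresponding to the "Weather Forecast" event type's schema.
// In the code-generation scenario this would be generated; here it is hand-written.
interface WeatherForecastAttributes {
  chanceOfPrecipitation: number;
  low: number;
  high: number;
}

function lowStatic(attributes: WeatherForecastAttributes): number {
  return attributes.low; // the compiler guarantees the field exists and is a number
}
```

The dynamic version still benefits from the schema, since validated data will never hit the throw (Matt's reliability point); the static version moves that guarantee to compile time and removes the runtime check (Dan's ergonomics point).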

Can schemas change?

Consensus: Immutable to begin with!

UI

Consensus: Basic, allows user to upload a schema file

API Support

Consensus: Should be included, especially for uploading schema. Gateway is a valid API support for this (Theresa & Matt concur) as opposed to GQL

Path forwards

Figure out which approach(es) we like, and prototype them, discover how much sense they still make and the possible defense for them
Pranav and Joe will prototype
- Theresa: Gateway CAN trigger a Hasura Action from Java code! If we decided to do that
- Re-meet in a week, potentially at the GSFC-Aerie tag-up Monday (11/11) evening (need fwd’d to devs)

JosephVolosin · 2024-11-12T19:30:17Z

Collaborator Author

Architecture Discussion 2 Notes (11/12)

Matt: If there is an operation we think shouldn’t be done, we lock it down via Hasura permissions
- In this case, only gateway service would be able to make this request
  - No serious impact to locking this down as the only user-approved approach is using the gateway and other approaches would be more admin-y/power user-y in nature (ex: bulk uploads of ‘pre-validated’ data)
Types generated from Mission Model - downside is that the sources/events are not mission model specific
How much of code generation is in-scope for our current iteration on the architecture?
- Dan & Matt: Trade-off is that using ValueSchema makes the code generation easier but the validation harder
  - ValueSchema prototype proves validation is possible and not extensively complicated
ValueSchema does not support optionality
- Top-level of event can be a ‘key: value’ pairs of ‘string: ValueSchema’ which lets you ‘define’ optionality
  - ValueSchema would need to be extended to support this in a nested way
JSONSchema-to-Java could be very complicated
Joel: Code generation for procedural scheduling might be a non-starter as we currently avoid it
- To implement generated code for type-safety in procedural scheduling…
  - Upon adding schema to database, use Hasura Action to generate Java code, then compile to .jar, then expose in UI somewhere (ex: query Aerie for the .jar to download), then use as a library in compilation
  - No infrastructure really configured for this currently
Matt: 3rd option - script queries Schema and saves it locally. At compile time, annotation processor generates Java code
- Regenerates code over-and-over but requires no extra infrastructure
Joel: Another option - ProtoBuf could potentially represent our data, and it already has support for code generation (used as an alternative to JsonSchema or ValueSchema)
- If ProtoBuf can generate JsonSchema as well, we can present the dynamic option to users, and if they want a static option we can tell them to use ProtoBuf (optionally). All we present is the dynamic option
- Could be something that’s implemented in the ‘Aerie Extended Universe’ as this is a unique feature that requires extra work

Conclusions

Static typing is out-of-scope but we want to do it in the future. Explore out-of-the-box thinking for event attributes and other procedural scheduling use cases
JsonSchema is the currently agreed-upon approach for defining schemas
UI will remain as a prototype for now until we decide on how to display the schemas and if JsonSchema is OK with the UI
Joel will look into the ProtoBuf-style solution for JsonSchema but this is not a near-future feature


pranav-super · 2024-11-21T15:33:33Z

Collaborator

Some Implementation Notes

In implementing this feature, we have encountered 3 different options. They are each enumerated here, with links to their implementing branch(es) where applicable. These all utilize JSON Schema.

Implementation 1: Separate Schemas

One way to do this is to use entirely separate attribute/type schemas. This means that each external source type and each event type has its own attribute/type schema, each of which needs to be created and uploaded to Aerie separately. The format of these attribute schemas, regardless of whether they are source or event type attribute schemas, would be fundamentally similar, and would look as follows:

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "title": "TypeA",
  "description": "Schema for the attributes of TypeA.",
  "type": "object",
  "properties": {
    "series": {
      "type": "object",
      "properties": {
        "type": { "type": "string" },
        "make": {"type": "string"},
        "iteration": {"type": "number"}
      }
      ...
    }
  }
}

They would be uploaded from a modal like the following:

In terms of the backend, the process for validating a source would be as such:

First, the user uploads the relevant source and event types to the database. The source and event type tables each have a new column to store jsonb attribute schemas. They must upload all types that pertain to a source (all contained event types and the source type) before uploading said source.
Then, they upload the source. This gets sent from the UI (or CLI) to the gateway, which does the following checks:
- First, the gateway verifies against an internal schema that the external source is formatted correctly (i.e. there is a section for external_events and one for the source, and each of those are formatted logically)
- Then, the gateway pulls the attribute schema for the source, and ensures that the source attributes fit it.
- Following that, the gateway compiles a list of all used event types in the source, pulls each event type attribute schema, and verifies all events against it.
- Finally, the gateway uploads the source.
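The gateway checks above can be sketched as a single function. This is a hypothetical outline, not the code in the implementing branch: the UploadedSource shape, the TypeSchemaStore lookup (standing in for the attribute-schema columns in the database), and the validator functions (standing in for compiled JSON Schema validators) are all assumptions, and the base-format check in the first step is assumed to have already passed.

```typescript
// Hypothetical shapes for an uploaded source file.
interface UploadedEvent {
  key: string;
  event_type_name: string;
  attributes: { [name: string]: unknown };
}

interface UploadedSource {
  source: { key: string; source_type: string; attributes: { [name: string]: unknown } };
  external_events: UploadedEvent[];
}

// Stand-in for a compiled JSON Schema validator: returns error messages, empty if valid.
type AttributeValidator = (attributes: { [name: string]: unknown }) => string[];

// Stand-in for the attribute-schema columns on the source and event type tables.
interface TypeSchemaStore {
  sourceTypes: { [sourceType: string]: AttributeValidator | undefined };
  eventTypes: { [eventType: string]: AttributeValidator | undefined };
}

// Returns every problem found; the caller uploads the source only if the list is empty.
function checkSourceUpload(upload: UploadedSource, store: TypeSchemaStore): string[] {
  // Validate the source's attributes against its source type's schema.
  const sourceValidator = store.sourceTypes[upload.source.source_type];
  if (sourceValidator === undefined) {
    return [`unknown source type "${upload.source.source_type}"`];
  }
  const errors = sourceValidator(upload.source.attributes).map((e) => `source: ${e}`);

  // Compile the distinct event types used in the source and make sure each has a schema.
  const usedTypes: { [eventType: string]: true } = {};
  for (const event of upload.external_events) usedTypes[event.event_type_name] = true;
  for (const typeName of Object.keys(usedTypes)) {
    if (store.eventTypes[typeName] === undefined) errors.push(`unknown event type "${typeName}"`);
  }

  // Verify every event against its event type's schema.
  for (const event of upload.external_events) {
    const validator = store.eventTypes[event.event_type_name];
    if (validator !== undefined) {
      errors.push(...validator(event.attributes).map((e) => `${event.key}: ${e}`));
    }
  }
  return errors;
}
```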

In this process, we make several queries (this can be narrowed down to just 2) against the database, fetching from 2 tables.

This is the implementation that can be found here.

Implementation 2: Unified Type Schemas

The second option (and the third option) do away with the separation of type schemas to a degree. This simplification relies on the assumption that event types aren't really reused across source types. While this is a restriction, we believe it is a valid assumption, as we have yet to encounter any overlap between event types and source types. When there is overlap, which should be rare, the shared type could be redefined in each of the two source type schemas under slightly different names to prevent collisions, though that is far from an ideal solution to this potentially rare case.

It is here that we make a fundamental distinction between the second and third options. We mentioned earlier having an internal schema for the external source's formatting. One option is to write that JSON Schema generically enough that it uses $ref to point at an external $defs schema containing definitions such as the type schemas; this will be elaborated on shortly. The other option is to forget the idea of splitting schemas and store the entire internal schema, including the $defs and any other details, as a single document.

We will elaborate on the $defs option first. In JSON Schema, it is possible to define variables, so that definitions for schema components can be reused. For example, consider the following example modified from the JSON Schema documentation:

{
  "$id": "https://example.com/schemas/customer",

  "type": "object",
  "properties": {
    "first_name": { "$ref": "#/$defs/name" },
    "last_name": { "$ref": "#/$defs/name" },
    "shipping_address": { "$ref": "#/$defs/address" },
    "billing_address": { "$ref": "#/$defs/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],

  "$defs": {
    "name": { "type": "string" },
    "address": { ... }
  }
}

Here, we have a schema for name-like properties and one for address-like properties, each of which is reused in the main schema. Rather than expanding those subschemas at every use, they are defined once and referenced wherever they are needed.
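To make the reuse concrete, the sketch below resolves same-document references such as "#/$defs/name" by walking the JSON Pointer after the "#", then inlines every $ref to recover the expanded schema that $defs lets authors avoid writing. resolveLocalRef and inlineRefs are hypothetical helpers for illustration; they handle only local references and assume no recursive definitions (a self-referencing $defs entry would loop forever). Validators such as AJV resolve references themselves, so this is not a step a real pipeline needs.

```typescript
type SchemaNode = { [key: string]: unknown };

// Resolve a same-document reference such as "#/$defs/name" by walking the JSON Pointer.
// Tokens are unescaped per RFC 6901: "~1" becomes "/" first, then "~0" becomes "~".
function resolveLocalRef(root: SchemaNode, ref: string): SchemaNode {
  if (ref.indexOf("#/") !== 0) throw new Error(`only local refs are supported: ${ref}`);
  let node: unknown = root;
  for (const rawToken of ref.slice(2).split("/")) {
    const token = rawToken.replace(/~1/g, "/").replace(/~0/g, "~");
    if (typeof node !== "object" || node === null || !(token in node)) {
      throw new Error(`unresolvable ref: ${ref}`);
    }
    node = (node as SchemaNode)[token];
  }
  return node as SchemaNode;
}

// Replace every {"$ref": "#/..."} with the definition it points to and drop "$defs",
// producing the fully expanded form of the schema.
function inlineRefs(node: unknown, root: SchemaNode): unknown {
  if (Array.isArray(node)) return node.map((item) => inlineRefs(item, root));
  if (typeof node !== "object" || node === null) return node;
  const obj = node as SchemaNode;
  const ref = obj["$ref"];
  if (typeof ref === "string") return inlineRefs(resolveLocalRef(root, ref), root);
  const out: SchemaNode = {};
  for (const key of Object.keys(obj)) {
    if (key !== "$defs") out[key] = inlineRefs(obj[key], root);
  }
  return out;
}
```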

An additional feature of $defs and $refs is they can be used across schemas. This means we can provide a schema that just includes $defs and another schema that $refs said $defs, and combine them to form a complete schema. A good example of this is provided in the AJV documentation (AJV is the JSON Schema validator that we are currently using):

const schema = {
  $id: "http://example.com/schemas/schema.json",
  type: "object",
  properties: {
    foo: {$ref: "defs.json#/definitions/int"},
    bar: {$ref: "defs.json#/definitions/str"},
  },
}

const defsSchema = {
  $id: "http://example.com/schemas/defs.json",
  definitions: {
    int: {type: "integer"},
    str: {type: "string"},
  },
}

A simple step can combine the two, and allow the first schema to reference the $defs of the second schema.

Using this and some conditional logic, we have a way to define a separate $defs schema that just defines source_type and event_types all in one file, and we can combine that and transform a basic external source schema, on the fly (in the gateway), to create a type-specific megaschema for validation purposes. This would function as follows:

First, the user uploads the relevant source types to the database. Only the source type table gets a new column, which stores the jsonb $defs attribute schema; event types need no column of their own.
Then, the user uploads the source. The UI (or CLI) sends it to the gateway, which does the following:
- The gateway checks the external source's type.
- It pulls the $defs schema for that type from the database.
- It transforms an internal base schema with JSON Schema conditional logic and combines it with that $defs schema, producing a type-specific megaschema for validation.
- It validates the external source against this combined schema and uploads the source if validation passes.
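A minimal sketch of the transformation step follows. It assumes the $defs layout used under Implementation 3 below: event type definitions live under definitions.event_types, and the source type's definition lives under definitions.source_type. buildMegaschema is a hypothetical name, and this is not the implementation referenced in this post. One deliberate difference from the example megaschema further down: the innermost else here is false. As a result, an event whose event_type_name matches no defined type fails validation instead of falling through to the last type.

```javascript
// Hypothetical sketch of the gateway transformation (not the referenced implementation).
// Assumed defs layout:
//   { definitions: { event_types: { <EventTypeName>: schema, ... },
//                    source_type: { <SourceTypeName>: schema } } }
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function buildMegaschema(baseSchema, defsSchema, sourceTypeName) {
  const definitions = defsSchema.definitions ?? {};
  const eventTypes = definitions.event_types ?? {};
  const sourceTypes = definitions.source_type ?? {};
  if (!hasOwn(sourceTypes, sourceTypeName)) {
    throw new Error(`No attribute schema for source type ${sourceTypeName}`);
  }

  // Deep copy so the shared base schema is never mutated between uploads.
  const mega = JSON.parse(JSON.stringify(baseSchema));
  mega.title = sourceTypeName;
  mega.$defs = { ...eventTypes, [sourceTypeName]: sourceTypes[sourceTypeName] };

  // Source attributes must match the source type's definition.
  mega.properties.source.properties.attributes = { $ref: `#/$defs/${sourceTypeName}` };

  // Event attributes: one if/then level per event type, built innermost-first.
  // The innermost else is `false`, so an unknown event_type_name fails.
  const chain = Object.keys(eventTypes).reduceRight(
    (elseBranch, name) => ({
      if: { properties: { event_type_name: { const: name } } },
      then: { properties: { attributes: { $ref: `#/$defs/${name}` } } },
      else: elseBranch,
    }),
    false
  );
  // With no event types defined, `{ not: {} }` rejects every event.
  Object.assign(mega.properties.external_events.items, chain === false ? { not: {} } : chain);
  return mega;
}

// Trimmed-down versions of the base and $defs schemas shown under Implementation 3.
const base = {
  type: "object",
  properties: {
    external_events: {
      type: "array",
      items: {
        type: "object",
        properties: { attributes: { type: "object" }, event_type_name: { type: "string" } },
      },
    },
    source: {
      type: "object",
      properties: { attributes: { type: "object" }, source_type_name: { type: "string" } },
    },
  },
};
const defs = {
  $id: "defs",
  definitions: {
    event_types: {
      EventTypeA: { type: "object", required: ["series"] },
      EventTypeB: { type: "object", required: ["projectUser", "tick"] },
    },
    source_type: { SourceTypeA: { type: "object", required: ["version", "wrkcat"] } },
  },
};

const mega = buildMegaschema(base, defs, "SourceTypeA");
```

The resulting object would then be compiled by the validator (with AJV, for example, via ajv.compile(mega)) and used to check the uploaded source.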

This option requires only one call to the database, though it does incur some transformation work in the gateway.

The UI would be the same as in the first option, except that the creation modal would only offer to create external source types, rather than also offering to create external event types.

This implementation can be found here.

Implementation 3: Per-source-type Megaschema

This final option combines the $defs and base schemas up front, rather than in the gateway at upload time. So, if the base schema were something like the following:

{
    $id: "source_schema",
    $schema: "http://json-schema.org/draft-07/schema",
    additionalProperties: false,
    description: "The base schema for external sources. Defs and ifs, for specific source/event type attributes, are integrated later.",
    properties: {
        external_events: {
            items: {
                additionalProperties: false,
                properties: {
                    attributes: {
                        type: "object"
                    },
                    duration: { "type": "string" },
                    event_type_name: { "type": "string" },
                    key: { "type": "string" },
                    start_time: { "type": "string" }
                },
                required: ["duration", "event_type_name", "key", "attributes", "start_time"],
                type: "object"
            },
            type: "array"
        },
        source: {
            additionalProperties: false,
            properties: {
                attributes: {
                    type: "object" // WILL BE REPLACED WITH A $ref
                },
                derivation_group_name: { "type": "string" },
                key: { "type": "string" },
                period: {
                    additionalProperties: false,
                    properties: {
                        end_time: {
                            pattern: "^(\\d){4}-([0-3][0-9])-([0-9][0-9])T([0-1][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(\\+|-)([0-1][0-9]):([0-5][0-9])$",
                            type: "string"
                        },
                        start_time: {
                            pattern: "^(\\d){4}-([0-3][0-9])-([0-9][0-9])T([0-1][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(\\+|-)([0-1][0-9]):([0-5][0-9])$",
                            type: "string"
                        }
                    },
                    required: ["start_time", "end_time"],
                    type: "object"
                },
                source_type_name: { "type": "string" },
                valid_at: {
                    pattern: "^(\\d){4}-([0-3][0-9])-([0-9][0-9])T([0-1][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(\\+|-)([0-1][0-9]):([0-5][0-9])$",
                    type: "string"
                }
            },
            required: ["key", "source_type_name", "valid_at", "period", "attributes"],
            type: "object"
        }
    },
    required: ["source", "external_events"],
    title: "SourceTypeA",
    type: "object"
}

which is combined with a user-provided $defs schema like the following:

{
      "$id": "defs",
      "definitions": {
        "event_types": {
          "EventTypeA": {
            "properties": {
              "series": {
                "properties": {
                  "iteration": { "type": "number" },
                  "make": { "type": "string" },
                  "type": { "type": "string" }
                },
                "required": ["type", "make", "iteration"],
                "type": "object"
              }
            },
            "required": ["series"],
            "type": "object"
          },
          "EventTypeB": {
            "type": "object",
            "required": ["projectUser", "tick"],
            "properties": {
              "projectUser": {
                "type": "string"
              },
              "tick": {
                "type": "number"
              }
            }
          },
          "EventTypeC": {
            "type": "object",
            "required": ["aperture", "subduration"],
            "properties": {
              "aperture": {
                "type": "string"
              },
              "subduration": {
                "type": "string",
                "pattern": "^P(?:\\d+Y)?(?:\\d+M)?(?:\\d+D)?T(?:\\d+H)?(?:\\d+M)?(?:\\d+S)?$"
              }
            }
          }
        },
        "source_type": {
          "SourceTypeA": {
            "type": "object",
            "required": ["version", "wrkcat"],
            "properties": {
              "version": {
                "type": "number"
              },
              "wrkcat": {
                "type": "string"
              }
            }
          }
        }
      }
    }

to produce the following megaschema:

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "title": "SourceTypeA",
  "description": "Schema for the attributes of SourceTypeA and all event types in it (EventTypeA, EventTypeB, and EventTypeC). May define a metaschema rendering a few of these schema fields unnecessary.",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "external_events": {
      "items": {
        "additionalProperties": false,
        "properties": {
          "attributes": {
            "type": "object"
          },
          "duration": { "type": "string" },
          "event_type_name": { "type": "string" },
          "key": { "type": "string" },
          "start_time": { "type": "string" }
        },
        "if": {
          "properties": {
            "event_type_name": {
              "const": "EventTypeA"
            }
          }
        },
        "then": {
          "properties": {
            "attributes": {
              "$ref": "#/$defs/EventTypeA"
            }
          }
        },
        "else": {
          "if": {
            "properties": {
              "event_type_name": {
                "const": "EventTypeB"
              }
            }
          },
          "then": {
            "properties": {
              "attributes": {
                "$ref": "#/$defs/EventTypeB"
              }
            }
          },
          "else": {
            "properties": {
              "attributes": {
                "$ref": "#/$defs/EventTypeC"
              }
            }
          }
        },
        "required": ["duration", "event_type_name", "key", "attributes", "start_time"],
        "type": "object"
      },
      "type": "array"
    },
    "source": {
      "additionalProperties": false,
      "properties": {
        "attributes": {
          "$ref": "#/$defs/SourceTypeA"
        },
        "derivation_group_name": { "type": "string" },
        "key": { "type": "string" },
        "period":  {
          "additionalProperties": false,
          "properties": {
            "end_time": {
              "pattern": "^(\\d){4}-([0-3][0-9])-([0-9][0-9])T([0-1][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(\\+|-)([0-1][0-9]):([0-5][0-9])$",
              "type": "string"
            },
            "start_time": {
              "pattern": "^(\\d){4}-([0-3][0-9])-([0-9][0-9])T([0-1][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(\\+|-)([0-1][0-9]):([0-5][0-9])$",
              "type": "string"
            }
          },
          "required": ["start_time", "end_time"],
          "type": "object"
        },
        "source_type_name": { "type": "string" },
        "valid_at": {
          "pattern": "^(\\d){4}-([0-3][0-9])-([0-9][0-9])T([0-1][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(\\+|-)([0-1][0-9]):([0-5][0-9])$",
          "type": "string"
        }
      },
      "required": ["key", "source_type_name", "valid_at", "period", "attributes"],
      "type": "object"
    }
  },
  "required": ["source", "external_events"],
  "$defs": {
    "EventTypeA": {
      "type": "object",
      "required": ["series"],
      "properties": {
        "series": {
          "type": "object",
          "properties": {
            "type": { "type": "string" },
            "make": {"type": "string"},
            "iteration": {"type": "number"}
          },
          "required": ["type", "make", "iteration"]
        }
      }
    },
    "EventTypeB": {
      "type": "object",
      "required": ["projectUser", "tick"],
      "properties": {
        "projectUser": {
          "type": "string"
        },
        "tick": {
          "type": "number"
        }
      }
    },
    "EventTypeC": {
      "type": "object",
      "required": ["aperture", "subduration"],
      "properties": {
        "aperture": {
          "type": "string"
        },
        "subduration": {
          "type": "string",
          "pattern": "^P(?:\\d+Y)?(?:\\d+M)?(?:\\d+D)?T(?:\\d+H)?(?:\\d+M)?(?:\\d+S)?$"
        }
      }
    },
    "SourceTypeA": {
      "type": "object",
      "required": ["version", "wrkcat"],
      "properties": {
        "version": {
          "type": "number"
        },
        "wrkcat": {
          "type": "string"
        }
      }
    }
  }
}
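One consequence of the nested if/then/else above is worth spelling out. The innermost else has no if, and event_type_name is only constrained to be a string. So any event_type_name other than EventTypeA or EventTypeB, including a typo, is validated against EventTypeC. The sketch below is a hypothetical helper, not anything a validator exposes. It walks the chain the way a validator would for a single event and reports which $ref ends up applied. If the fall-through is not intended, there are two options: make the last level another if/then whose else is false, or restrict event_type_name with an enum.

```javascript
// Hypothetical helper: follow the nested if/then/else on `event_type_name`
// and return the attributes $ref that would be applied to `event`.
// Assumes event_type_name is present (it is listed in "required"); note that a
// JSON Schema "properties" check passes vacuously when the property is absent.
function selectAttributesRef(itemsSchema, event) {
  let node = itemsSchema;
  while (node !== null && typeof node === "object" && "if" in node) {
    const expected = node.if.properties.event_type_name.const;
    node = event.event_type_name === expected ? node.then : node.else;
  }
  // `node` is now a then-branch, the final else-branch, or undefined/false.
  return node?.properties?.attributes?.$ref;
}

// The conditional part of "external_events.items" from the megaschema above.
const items = {
  if: { properties: { event_type_name: { const: "EventTypeA" } } },
  then: { properties: { attributes: { $ref: "#/$defs/EventTypeA" } } },
  else: {
    if: { properties: { event_type_name: { const: "EventTypeB" } } },
    then: { properties: { attributes: { $ref: "#/$defs/EventTypeB" } } },
    else: { properties: { attributes: { $ref: "#/$defs/EventTypeC" } } },
  },
};

console.log(selectAttributesRef(items, { event_type_name: "EventTypeB" })); // → #/$defs/EventTypeB
console.log(selectAttributesRef(items, { event_type_name: "EventTypeZ" })); // → #/$defs/EventTypeC
```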

This option proposes skipping the computation and uploading the final object directly to the database. This requires more work on the user's part, though it is much less opaque about what is going on. That said, validating that a schema like the one above is correct requires a metaschema, which proved quite difficult to write correctly. In doing so, it became clear that this option, while more transparent, requires excessive repetition on the user's part and would likely force users to copy and paste a template when creating new source types, as opposed to just defining a simple $defs schema.

Owing to the above difficulties, there is no implementation available for this option, as it was effectively ruled out as unnecessarily complicated for users and developers. However, should this be the chosen option, we can flesh out the mega-metaschema.


