
[Vertex AI] Add ImageGenerationParameters for input to predict call #14208

Merged 1 commit into vertex-imagen from ah/vertex-imagen-params on Dec 3, 2024

Conversation

andrewheard (Contributor) commented:

Added an encodable type, ImageGenerationParameters, with a nested ImageGenerationOutputOptions type, that is passed as `parameters` in a predict request. The included parameters are from the VisionGenerativeModel schema (reproduced below); see the API reference for more details.
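
For orientation, here is a minimal sketch of what an Encodable type along these lines could look like; the field subset, Swift property names, and CodingKeys mapping below are illustrative assumptions, not the exact shape merged in this PR:

```swift
import Foundation

/// Illustrative sketch only; the merged type's fields and names may differ.
struct ImageGenerationParameters: Encodable {
  let sampleCount: Int?
  let storageURI: String?
  let seed: Int32?
  let negativePrompt: String?
  let aspectRatio: String?
  let outputOptions: ImageGenerationOutputOptions?

  enum CodingKeys: String, CodingKey {
    case sampleCount
    case storageURI = "storageUri"  // Swift-style property name mapped to the wire name
    case seed
    case negativePrompt
    case aspectRatio
    case outputOptions
  }
}

/// Mirrors the `outputOptions` object in the schema below.
struct ImageGenerationOutputOptions: Encodable {
  let mimeType: String?          // "image/png" (default) or "image/jpeg"
  let compressionQuality: Int?   // 0-100; only applies when encoding image/jpeg
}
```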

VisionGenerativeModel Parameters Schema Reference
```yaml
title: VisionGenerativeModel
type: object
properties:
  sampleCount:
    type: number
    format: int32
    minimum: 1
    description: The number of generated images.
  sampleImageSize:
    type: string
    description: >
      The size of generated images. Supported sizes are: 64, 256, 512, 1024, 2048 and 4096.
      Default image size is 1024.
  storageUri:
    type: string
    description: >
      The Google Cloud Storage location for saving the generated images.
    pattern: '^gs:\/\/(.+)\/(.+)$'
  negativePrompt:
    type: string
    description: >
      Negative prompt for helping generate the images.
  seed:
    type: number
    format: int32
    description: The RNG seed.
  mode:
    type: string
    description: Editing mode.
    enum:
    - interactive
    - upscale
    - backgroundEditing
  model:
    type: string
    description: Select the underlying model to do the generation.
    enum:
    - muse
    - imagen
    deprecated: true
  aspectRatio:
    type: string
    description: >
      Optional generation mode parameter that controls aspect ratio. Supported aspect ratios: 1:1, 5:4, 3:2, 7:4, 4:3, 16:9, 9:16.
  guidanceScale:
    type: number
    format: float
    description: >
      Optional editing mode parameter that controls strength of alignment to prompt. 0-9 low strength, 10-20 medium strength, 21+ high strength.
  disablePersonFace:
    type: bool
    description: >
      Whether to disable person/face RAI filtering.
    deprecated: true
  enablePersonFaceFilter:
    type: bool
    default: false
    description: >
      Whether to enable person/face RAI filtering. Default false.
  raiLevel:
    type: number
    format: int32
    description: >
      Level of RAI filtering. A lower level means a stricter filter set. Currently we have:
      Level 0 (block most)
      Level 1 (block few)
  disableChild:
    type: bool
    description: >
      Whether to disable child RAI filtering. Only allowlisted users can set this variable; otherwise, it will be ignored.
    deprecated: true
  enableChildFilter:
    type: bool
    default: true
    description: >
      Whether to enable child RAI filtering. Default true. Only allowlisted users can set this variable; otherwise, it will be ignored.
  sampleImageStyle:
    type: string
    description: Optional style that will be applied to the generated images.
    enum:
    - photograph
    - digital_art
    - landscape
    - sketch
    - watercolor
    - cyberpunk
    - pop_art
  includeRaiReason:
    type: bool
    description: >
      Whether to include the reason why generated images were filtered.
  isProductImage:
    type: bool
    description: >
      Whether to use self background editing for product images.
  controlNetConfig:
    type: object
    description: Optional configurations for ControlNet conditions.
    properties:
      enableControlNet:
        type: bool
        description: Whether ControlNet is enabled.
      conditions:
        type: array
        description: Configurations for each condition.
        items:
          type: object
          properties:
            conditionName:
              type: string
              description: The name of the condition used.
              enum:
              - cannyEdges
              - depth
            conditionMapBytesBase64Encoded:
              type: string
              description: >
                When the condition map is provided by the user, we will not compute the condition map on our side.
            conditionWeight:
              type: number
              format: float
              minimum: 0.0
              maximum: 1.0
              default: 1.0
              description: >
                The guidance weight for the condition signal. The higher the weight, the more the model respects the ControlNet condition.
      originalImageWeight:
        type: number
        format: float
        minimum: 0.0
        maximum: 1.0
        default: 0.0
        description: >
          The weight for the original image.
          When set to 1.0, the output basically copies the input image.
          When set to 0.0, the output does not respect the input image at all.
  outputOptions:
    type: object
    description: Optional output configurations.
    properties:
      mimeType:
        type: string
        description: Output mime type.
        default: image/png
        enum:
        - image/jpeg
        - image/png
      compressionQuality:
        type: number
        description: Optional compression quality if encoding in image/jpeg.
        format: int32
        default: 75
        minimum: 0
        maximum: 100
  upscaleConfig:
    type: object
    description: Optional upscale configurations.
    properties:
      imagePreservationFactor:
        type: number
        description: >
          With a higher image preservation factor, the original image pixels are more respected.
          With a lower image preservation factor, the output image will be more different from
          the input image, but possibly with finer details and less noise.
        format: float
        default: 0.5
        minimum: 0.0
        maximum: 1.0
      enhanceInputImage:
        type: bool
        description: >
          Whether to add an image enhancing step before upscaling.
          It is expected to suppress noise and JPEG compression artifacts in the input image.
        default: false
      enableFasterUpscaling:
        type: bool
        description: >
          NOTE: For experimental use, not production-ready.
          Whether to speed up upscaling. This option can't be used with high QPS since it lowers the
          availability of the upscaling API.
        default: false
      upscaleFactor:
        type: string
        description: >
          The factor to which the image will be upscaled. If not specified, the
          upscale factor will be determined from the longer side of the input image
          and `sampleImageSize`.
        enum:
        - x2
        - x4
  editConfig:
    type: object
    description: Optional editing configuration. Only available in imagegeneration@003.
    properties:
      enableClamping:
        type: bool
        default: false
        description: >
          Whether to enable clamping mode, which:
            * Better preserves unmasked area
            * Skips model internal dilation so client can fully control this
      bufferZones:
        type: array
        items:
          type: object
          properties:
            pixels:
              type: int32
              description: Number of pixels to dilate
              minimum: 0
              maximum: 100
            diffusionT:
              type: float
              description: diffusion_t time to do dilation, 1 is start, 0 is end.
              minimum: 0
              maximum: 1
      baseSteps:
        type: int32
        description: Number of steps to take for base sampling.
        minimum: 1
        maximum: 150
        default: 75
      baseGamma:
        type: float
        description: Gamma, controls noise during base sampling.
        minimum: 0
        maximum: 1
      baseGuidanceScale:
        type: array
        description: List of 4 integers controlling strength of text guidance during base sampling.
        items:
          type: int32
          minimum: 0
          maximum: 200
      sr1Steps:
        type: int32
        description: Number of steps to take for sr1 sampling.
        minimum: 1
        maximum: 4
        default: 4
      sr2Steps:
        type: int32
        description: Number of steps to take for sr2 sampling.
        minimum: 1
        maximum: 2
        default: 2
      semanticFilterConfig:
        type: object
        description: Apply object detection semantic filter on image inpainting area.
        properties:
          enableSemanticFilter:
            type: bool
            default: false
            description: Whether to enable semantic filter feature in image inpainting.
          filterClasses:
            type: array
            description: >
              A list of object class names to apply the semantic filter to. A class name starts with
              a capitalized initial letter, e.g. Person.
            items:
              type: string
          filterEntities:
            type: array
            description: >
              A list of object entity ids to apply semantic filter, e.g. entity id for Person
              is /m/01g317.
            items:
              type: string
          intersectRatioThreshold:
            type: float
            description: >
              A threshold to control semantic filter strength; the lower the threshold, the
              stronger the filter.
            minimum: 0
            maximum: 1
          additionalSampleCount:
            type: int32
            description: >
              Additional samples added to the request to improve the inpainting success rate.
              However, the final number of returned images is the same as sampleCount.
            minimum: 0
            maximum: 4
          semanticFilterMode:
            type: string
            description: >
              A string to specify the semantic filter experimental mode. This allows the semantic
              filter to change the default behavior for filtering generated images.
          detectionScoreThreshold:
            type: float
            description: >
              A detection confidence score threshold to decide which detection boxes
              are considered valid detections for semantic filter checking.
            minimum: 0
            maximum: 1
          filterClassesOutpainting:
            type: array
            description: >
              For outpainting case.
              A list of object class names to apply the semantic filter to. A class name starts with
              a capitalized initial letter, e.g. Person.
            items:
              type: string
          filterEntitiesOutpainting:
            type: array
            description: >
              For outpainting case.
              A list of object entity ids to apply semantic filter, e.g. entity id for Person
              is /m/01g317.
            items:
              type: string
          intersectRatioThresholdOutpainting:
            type: float
            description: >
              For outpainting case.
              A threshold to control semantic filter strength; the lower the threshold, the
              stronger the filter.
            minimum: 0
            maximum: 1
          detectionScoreThresholdOutpainting:
            type: float
            description: >
              For outpainting case.
              A detection confidence score threshold to decide which detection boxes
              are considered valid detections for semantic filter checking.
            minimum: 0
            maximum: 1
          filterClassesSpecialInit:
            type: array
            description: >
              For special_init case.
              A list of object class names to apply the semantic filter to. A class name starts with
              a capitalized initial letter, e.g. Person.
            items:
              type: string
          filterEntitiesSpecialInit:
            type: array
            description: >
              For special_init case.
              A list of object entity ids to apply semantic filter, e.g. entity id for Person
              is /m/01g317.
            items:
              type: string
          intersectRatioThresholdSpecialInit:
            type: float
            description: >
              For special_init case.
              A threshold to control semantic filter strength; the lower the threshold, the
              stronger the filter.
            minimum: 0
            maximum: 1
          detectionScoreThresholdSpecialInit:
            type: float
            description: >
              For special_init case.
              A detection confidence score threshold to decide which detection boxes
              are considered valid detections for semantic filter checking.
            minimum: 0
            maximum: 1
      experimentUseServoBackend:
        type: bool
        description: >
          NOTE: For experimental use, not production-ready.
          Uses self-hosted Servo model servers for requests instead of SUP.
        default: false
      editMode:
        type: string
        description: The editing mode that describes the use case for editing.
        enum:
        - inpainting-remove
        - inpainting-insert
        - outpainting
        default: inpainting-insert
      alternateInitConfig:
        type: object
        description: Enables an alternate init config to add to return candidates.
        properties:
          enabled:
            type: bool
            default: false
            description: Whether to enable this alternate init config.
          maxInpaintingMaskArea:
            type: float
            default: 0.05
            minimum: 0
            maximum: 1
            description: Only use init config if inpainting mask area / total image area is below this threshold.
      experimentalSrVersion:
        type: string
        description: Experimental flag to adjust version of sr to use. Values subject to change.
        default: ""
      experimentalBaseVersion:
        type: string
        description: Experimental flag to adjust base version. Values subject to change.
        default: ""
      embeddingScale:
        type: float
        default: 0.6
        minimum: 0
        maximum: 1
        description: Controls strength of embedding influence on output image.
  language:
    type: string
    description: The language that the prompts and negative prompts are in.
    enum:
    - auto
    - en
    - ja
    - ko
    - hi
  includeSafetyAttributes:
    type: bool
    description: Whether to include the content safety attributes scores in the response.
  modelVariant:
    type: string
    description: The size variant of the model. Only supported in imagegeneration@004 for now.
    enum:
    - large
    - medium
  addWatermark:
    type: bool
    description: Whether to add a SynthID watermark to the generated image.
```
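
For reference, a hedged sketch of how the illustrative type from the earlier sketch would serialize with Foundation's JSONEncoder; in a real predict call this JSON would travel in the request body's `parameters` field, while the surrounding client plumbing is assumed and out of scope here:

```swift
import Foundation

// Encode the illustrative type from the sketch above to see the wire format.
let params = ImageGenerationParameters(
  sampleCount: 2,
  storageURI: nil,
  seed: 42,
  negativePrompt: "blurry, low quality",
  aspectRatio: "16:9",
  outputOptions: ImageGenerationOutputOptions(mimeType: "image/jpeg",
                                              compressionQuality: 80)
)

let encoder = JSONEncoder()
encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
let json = String(data: try! encoder.encode(params), encoding: .utf8)!
print(json)
// Synthesized Encodable uses encodeIfPresent for optionals, so nil fields
// (here, storageUri) are omitted:
// {
//   "aspectRatio" : "16:9",
//   "negativePrompt" : "blurry, low quality",
//   "outputOptions" : { "compressionQuality" : 80, "mimeType" : "image/jpeg" },
//   "sampleCount" : 2,
//   "seed" : 42
// }
```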

#no-changelog

andrewheard marked this pull request as ready for review December 3, 2024 22:28
andrewheard requested a review from paulb777 December 3, 2024 22:28
```swift
case seed
case negativePrompt
case aspectRatio
case safetyFilterLevel = "safetySetting"
```
A Member commented:
why different names?

andrewheard (Contributor, Author) replied:

IMHO safetyFilterLevel better expresses the meaning "Adds a filter level to safety filtering." Also planning to group safetyFilterLevel (safetySetting), personGeneration and includeResponsibleAIFilterReason (includeRaiReason) into a SafetySettings struct in the public API so this will be more distinct. That said, all the naming is still in flux.
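
As a rough sketch of the grouping floated above (naming explicitly still in flux per the comment, so every name below is tentative):

```swift
/// Tentative sketch only; all names are in flux per the discussion above, and how
/// these properties map back onto the flat wire keys ("safetySetting",
/// "includeRaiReason") is an assumption of this sketch, not the merged design.
struct SafetySettings {
  let safetyFilterLevel: String?               // wire key: "safetySetting"
  let personGeneration: String?
  let includeResponsibleAIFilterReason: Bool?  // wire key: "includeRaiReason"
}
```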

andrewheard merged commit 78fe33c into vertex-imagen on Dec 3, 2024 (50 checks passed)
andrewheard deleted the ah/vertex-imagen-params branch December 3, 2024 23:33