StackStorm v3.3.0 Pre-release Testing #52

Closed
28 of 35 tasks
nmaludy opened this issue Oct 9, 2020 · 24 comments

Comments

@nmaludy
Member

nmaludy commented Oct 9, 2020

We're ready to prepare the StackStorm v3.3 release and are starting pre-release testing.

Release Process Preparation

Per the Release Management Schedule, @nmaludy is the Release Manager and @blag is assisting for v3.3. They will freeze master for the major repositories in the StackStorm org and follow the StackStorm Release Process, which is now publicly available, along with the Useful Info for Release Managers. Communication is happening in the #releasemgmt and #development Slack channels.
The first step is pre-release manual user-acceptance testing for v3.3dev.

Why Manual testing?

StackStorm is very serious about testing and has a lot of it: unit tests, integration tests, deployment/integrity checks, smoke tests and, finally, end-to-end tests in which automation spins up a new AWS instance for each OS/flavor we support, installs real st2 like a user would and runs a set of st2tests (for each st2 PR, nightly, periodically, and during release).

See st2ci and st2cd for more examples and workflows about how StackStorm automation is used to test StackStorm (dogfooding).

That's a perfect way to verify what we already know and codify expectations about how StackStorm should function.

However it's not enough.
There are always new unknowns to discover, edge cases to experience and tests to add. Hence, manual Exploratory Testing is an exercise where the entire team gathers and starts trying (or breaking) new features before the new release. Because we're all different, perceive software differently and try different things, we might find new bugs, improper design, oversights, edge cases and more.

This is how StackStorm has previously managed to land fewer major/critical bugs in production.

TL;DR

Install the StackStorm v3.3dev unstable packages, try random things in random environments (different OSes) and report any regressions found compared to v3.2:

curl -sSL https://stackstorm.com/packages/install.sh | bash -s -- --user=st2admin --password=Ch@ngeMe --unstable

Extra points for PR hotfixes and adding new or missing test cases.

Major changes

  • Mistral removal
    • Ensure Mistral and Postgres are no longer installed (st2mistral, mistral, postgres, etc)
  • Removal of EL6 support
  • Deprecation of Python 2
    • Ensure proper warnings are shown
  • MongoDB 4.0
    • Ensure MongoDB 4.0 is installed with latest version
    • Ensure no issues or performance regressions are observed
  • Remove authentication headers St2-Api-Key, X-Auth-Token and Cookie from webhook payloads to prevent them from being stored in the database. (security bug fix) #4983
    • Write a webhook rule and send a request with these headers to the webhook; ensure that the trigger doesn't contain these header values
  • Removal of HipChat support in st2chatops
  • st2web dependencies have been upgraded for security reasons; a thorough walk-through is needed to ensure nothing is broken
  • Deprecation of chef-stackstorm
  • st2-docker revamp based on st2-dockerfiles
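
For the webhook header item above, testers may find it handy to check a captured trigger payload programmatically. The following is a minimal sketch, not StackStorm's actual implementation: the helper name `strip_auth_headers` and the sample header values are hypothetical, and only the three header names come from the changelog entry.

```python
# Hypothetical helper for illustration; this is NOT st2's own code.
# The three header names are taken from the changelog entry (#4983).
SENSITIVE_HEADERS = {"st2-api-key", "x-auth-token", "cookie"}


def strip_auth_headers(headers):
    """Return a copy of headers without the auth headers (case-insensitive match)."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in SENSITIVE_HEADERS
    }


# Made-up request headers, as a webhook caller might send them.
incoming = {
    "Content-Type": "application/json",
    "St2-Api-Key": "secret-key",
    "X-Auth-Token": "secret-token",
    "Cookie": "session=abc",
}

print(strip_auth_headers(incoming))  # {'Content-Type': 'application/json'}
```

A trigger instance recorded on v3.3dev is expected to look like the filtered dictionary, while the same request on v3.2 would still carry the three secret headers.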

Full Changelog

Changes which are recommended to ack, explore, check and try in a random way.

st2

Added

  • Add a make command to autogenerate JSON schema from the models of action, rule, etc. Add a check
    to ensure that updates to the models require the schema to be regenerated. (new feature)

  • Improved the st2sensor service log message shown when a sensor will not be loaded because it is
    assigned to a different partition (@punkrokk) #4991

  • Add support for a configurable connect timeout for SSH connections as requested in #4715
    by adding the new configuration parameter ssh_connect_timeout to the ssh_runner
    group in st2.conf. (new feature) #4914

    This option was requested by Harry Lee (@tclh123) and contributed by Marcel Weinberg (@winem).

  • Added a FAQ for the default user/pass for the tools/launch_dev.sh script and print out the
    default pass to screen when the script completes. (improvement) #5013

    Contributed by @punkrokk

  • Added deprecation warning when attempting to install or download a pack that only supports
    Python 2. (new feature) #5037

    Contributed by @amanda11

  • Added deprecation warning to each StackStorm service log, if service is running with
    Python 2. (new feature) #5043

    Contributed by @amanda11

  • Added deprecation warning to st2ctl, if st2 python version is Python 2. (new feature) #5044

    Contributed by @amanda11
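
To exercise the new `ssh_connect_timeout` option listed above, it has to be set in the `ssh_runner` group of st2.conf. The sketch below only shows what such a fragment could look like and how to read it back with Python's standard `configparser`; the value `30` is an arbitrary example (assumed to be seconds), and the helper name `read_ssh_connect_timeout` is hypothetical.

```python
import configparser

# Example st2.conf fragment. The group and option names come from the changelog;
# the value 30 is an arbitrary illustration, assumed to be in seconds.
SAMPLE_ST2_CONF = """
[ssh_runner]
ssh_connect_timeout = 30
"""


def read_ssh_connect_timeout(conf_text, fallback=None):
    """Parse an INI-style config string and return the timeout as an int."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback is returned when the section or the option is missing.
    return parser.getint("ssh_runner", "ssh_connect_timeout", fallback=fallback)


print(read_ssh_connect_timeout(SAMPLE_ST2_CONF))  # 30
```

One way to test the feature is to point an SSH action at an unreachable host and confirm that it gives up after roughly the configured time.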

Changed

  • Switch to MongoDB 4.0 as the default version for all supported OSes, starting with st2
    v3.3.0 (improvement) #4972

    Contributed by @punkrokk

  • Added an enhancement where ST2api.log no longer reports the entire traceback when trying to get a datastore value
    that does not exist. It now reports a simplified log for cleaner reading. Addresses and Fixes #4979. (improvement) #4981

    Contributed by Justin Sostre (@saucetray)

  • The built-in st2.action.file_writen trigger has been renamed to st2.action.file_written
    to fix the typo (bug fix) #4992

  • Renamed reference to the RBAC backend/plugin from enterprise to default. Updated st2api
    validation to use the new value when checking RBAC configuration. Removed other references to
    enterprise for RBAC related contents. (improvement)

  • Remove authentication headers St2-Api-Key, X-Auth-Token and Cookie from webhook payloads to
    prevent them from being stored in the database. (security bug fix) #4983

    Contributed by @potato and @knagy

Fixed

  • Fixed a bug where type attribute was missing for netstat action in linux pack. Fixes #4946

    Reported by @scguoi and contributed by Sheshagiri (@Sheshagiri)

  • Fixed a bug where persisting Orquesta to the MongoDB database returned an error
    message: key 'myvar.with.period' must not contain '.'. This happened anytime an
    input, output, publish or context var contained a key with a . within
    the name (such as with hostnames and IP addresses). This was a regression introduced by
    trying to improve performance. Fixing this bug means we are sacrificing performance of
    serialization/deserialization in favor of correctness for persisting workflows and
    their state to the MongoDB database. (bug fix) #4932

    Contributed by Nick Maludy (@nmaludy Encore Technologies)

  • Fix a bug where passing an empty list to a with items task in a subworkflow causes
    the parent workflow to be stuck in running status. (bug fix) #4954

  • Fixed a bug where the example nginx HA template declared headers twice (bug fix) #4966

    Contributed by @punkrokk

  • Fixed a bug in the paramiko_ssh runner where SSH sockets were not getting cleaned
    up correctly, specifically when specifying a bastion host / jump box. (bug fix) #4973

    Contributed by Nick Maludy (@nmaludy Encore Technologies)

  • Fixed a bytes/string encoding bug in the linux.dig action so it should work on Python 3
    (bug fix) #4993

  • Fixed a bug where a python3 sensor using ssl needs to be monkey patched earlier. See also #4832, #4975
    and gevent/gevent#1016 (SSLContext infinite recursion in Python 3.6). (bug fix) #4976

    Contributed by @punkrokk

  • Fixed a bug where action information in the RuleDB object was not being parsed properly
    because mongoengine EmbeddedDocument objects were added to JSON_UNFRIENDLY_TYPES and skipped.
    Removed this and added a condition that uses the to_json method so that mongoengine
    EmbeddedDocument objects are parsed properly.

    Contributed by Bradley Bishop (@bishopbm1 Encore Technologies)

  • Fixed a regression where the updated dnspython pip dependency left
    st2 services unable to connect to a remote MongoDB host (bug fix) #4997

  • Fixed a regression in the linux.dig action on Python 3. (bug fix) #4993

    Contributed by @blag

  • Fixed a bug in pack installation logging code where unicode strings were not being
    interpolated properly. (bug fix)

    Contributed by @misterpah

  • Fixed a compatibility issue with the latest version of the logging library API
    where the find_caller() function introduced some new variables. (bug fix) #4923

    Contributed by @Dahfizz9897
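
The Orquesta persistence fix in this list (#4932) is about MongoDB rejecting field names that contain a `.`. One common way to store such keys is to escape them on write and restore them on read. The sketch below illustrates that idea only; it is not necessarily how st2 resolves the bug, and the replacement character and function names are assumptions made for this example.

```python
# Illustrative sketch only; not necessarily the approach taken in st2 for #4932.
DOT = "."
ESCAPED_DOT = "\uff0e"  # FULLWIDTH FULL STOP, chosen here as a stand-in for "."


def escape_keys(value):
    """Recursively replace "." in dict keys so the document can be stored."""
    if isinstance(value, dict):
        return {k.replace(DOT, ESCAPED_DOT): escape_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [escape_keys(item) for item in value]
    return value  # scalar values (including strings with dots) are left untouched


def unescape_keys(value):
    """Reverse escape_keys after reading the document back."""
    if isinstance(value, dict):
        return {k.replace(ESCAPED_DOT, DOT): unescape_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unescape_keys(item) for item in value]
    return value


# Made-up workflow context with dotted keys, like hostnames or IP addresses.
context = {
    "myvar.with.period": {"host": "10.0.0.1"},
    "hosts": [{"web.example.com": "up"}],
}
stored = escape_keys(context)
restored = unescape_keys(stored)
```

A recursive walk like this touches every key on each save and load, which gives a feel for why correctness here costs some serialization performance. The round trip is only lossless if the original keys never contain the replacement character. When testing, try workflow inputs, outputs, published variables and context variables whose keys are hostnames or IP addresses.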

Removed

  • Removed Mistral workflow engine (deprecation) #5011

    Contributed by Amanda McGuinness (@amanda11 Ammeon Solutions)

  • Removed CentOS 6/RHEL 6 support #4984

    Contributed by Amanda McGuinness (@amanda11 Ammeon Solutions)

  • Removed our fork of codecov-python for CI and have switched back to the upstream version (improvement) #5002

orquesta

Fixed

  • Warn users when there is a loop and no start task identified. (bug fix)
  • Lock global variables during initialization to make them thread safe. (bug fix)
  • Workflow stuck in running if one or more items failed in a with items task. (bug fix)

st2chatops

st2web

Conclusion

Please report findings here and bugs/regressions in respective repositories.
Depending on severity and importance bugs might be fixed before the release or postponed to the next release if they're very minor and not a release blocker.

Issues Found During Release

PRs Merged for Release

TODOs

  • Blog post for orquestaconvert
  • Blog post for release
  • Blog post for exchange/community update
@nmaludy nmaludy assigned nmaludy, amanda11 and blag and unassigned nmaludy and amanda11 Oct 9, 2020
@amanda11

amanda11 commented Oct 9, 2020

Raised StackStorm/st2-packages#665 for the one-line installer failing to install the previous stable version with -v (--version works fine though). I don't think it's a blocker, but a fix is in the PR.

@nmaludy
Member Author

nmaludy commented Oct 9, 2020

missing changelog in st2chatops: StackStorm/st2chatops#158
also st2web's changelog appears to be totally unused

@amanda11

amanda11 commented Oct 9, 2020

@nmaludy Some of those st2web PRs aren't merges into master, but into a feature branch, e.g. StackStorm/st2web#807; they are the merges into the workflow composer feature branch.
Not all of them, but quite a few...

@amanda11

amanda11 commented Oct 9, 2020

Raised StackStorm/st2-packages#666 for the fact that stable tries to install 3.2 with the 3.3 scripts.

@nmaludy
Member Author

nmaludy commented Oct 9, 2020

Need to ensure that people migrating from MongoDB 3.4 (the previously supported version) follow the upgrade path:

3.4 -> 3.6 -> 4.0

https://docs.mongodb.com/manual/release-notes/3.6-upgrade-standalone/
https://docs.mongodb.com/manual/release-notes/4.0-upgrade-standalone/

submitted issue: StackStorm/st2docs#1026
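
The stepwise upgrade path above can be sketched as a small helper that lists the hops an existing install has to make. This is only an illustration for upgrade documentation or tooling: the function name `upgrade_steps` is hypothetical, and the version list contains just the three releases named in this comment.

```python
# Ordered MongoDB releases from the comment above: 3.4 -> 3.6 -> 4.0.
SUPPORTED_PATH = ["3.4", "3.6", "4.0"]


def upgrade_steps(current, target="4.0"):
    """Return the (from, to) hops needed to move from current to target."""
    start = SUPPORTED_PATH.index(current)  # raises ValueError for unknown versions
    end = SUPPORTED_PATH.index(target)
    if start > end:
        raise ValueError("downgrades are not covered by this sketch")
    # Pair each version with its successor; no release may be skipped.
    return list(zip(SUPPORTED_PATH[start:end], SUPPORTED_PATH[start + 1:end + 1]))


for source, destination in upgrade_steps("3.4"):
    print(f"upgrade MongoDB {source} -> {destination}")
```

Each hop should follow the matching MongoDB upgrade guide linked above before moving on to the next one.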

@amanda11

amanda11 commented Oct 9, 2020

Testing so far on CentOS 8 after a bash single-line install is good. No problems found with the UI.
Planning to do an upgrade from a CentOS 7 3.2.0 bash single-line install -> 3.3dev

@nmaludy
Member Author

nmaludy commented Oct 9, 2020

I tested CentOS 7 from 3.2.0 -> 3.3.0 and found no issues there (puppet-st2 managed).

@amanda11

CentOS 8 ansible install all good.

@amanda11

CentOS 7 ansible looking good too.

@amanda11

Ubuntu 16.04 ansible install looked good. I found a problem on the web-ui with responding to inquiries and not selecting a field. But have also verified that it is a legacy problem and exists on my 3.2.0 install (StackStorm/st2web#809)

@amanda11

Also verified the dig fix on CentOS 8. Reproduced failure on 3.2.0, and passed on 3.3.0dev (after installing bind-utils which wasn't installed as a dependency...). Not sure if we want to install that by default, but had to install manually.

@nmaludy
Member Author

nmaludy commented Oct 13, 2020

Vagrant commands to test various installs (used on my RHEL box so ignore --provider=libvirt if you want to use VirtualBox):

BOX=centos/7 RELEASE=unstable VERSION=3.3dev vagrant up --provider=libvirt
BOX=centos/8 RELEASE=unstable VERSION=3.3dev vagrant up --provider=libvirt
BOX=generic/ubuntu1604 RELEASE=unstable VERSION=3.3dev vagrant up --provider=libvirt
BOX=generic/ubuntu1804 RELEASE=unstable VERSION=3.3dev vagrant up --provider=libvirt

Testing the boxes after they're up:

$ vagrant ssh
[vagrant@st2vagrant ~]$ sudo su -
[root@st2vagrant ~]# sudo ST2_AUTH_TOKEN=$(st2 auth st2admin -p 'Ch@ngeMe' -t) /opt/stackstorm/st2/bin/st2-self-check

@amanda11

@nmaludy Did you test on ubuntu 20.04?
I thought we had problems with its newer Python 3? (Supported Ubuntus should be 1604 and 1804 for 3.3, I believe.)

@nmaludy
Member Author

nmaludy commented Oct 13, 2020

@amanda11 no i haven't yet, sorry copy/paste fail, i'll change that to 1604

@nmaludy
Member Author

nmaludy commented Oct 13, 2020

Self-check passed on all the various boxes!

SELF CHECK SUCCEEDED!
st2-self-check succeeded.

#############################################################
###################################################   #######
###############################################   /~\   #####
############################################   _- `~~~', ####
##########################################  _-~       )  ####
#######################################  _-~          |  ####
####################################  _-~            ;  #####
##########################  __---___-~              |   #####
#######################   _~   ,,                  ;  `,,  ##
#####################  _-~    ;'                  |  ,'  ; ##
###################  _~      '                    `~'   ; ###
############   __---;                                 ,' ####
########   __~~  ___                                ,' ######
#####  _-~~   -~~ _                               ,' ########
##### `-_         _                              ; ##########
#######  ~~----~~~   ;                          ; ###########
#########  /          ;                        ; ############
#######  /             ;                      ; #############
#####  /                `                    ; ##############
###  /                                      ; ###############
#                                            ################

@amanda11

Ansible install on Ubuntu 18.04 successful, and quick run-through on UI good.

@amanda11

Also did a quick chatops test with slack yesterday on one of the platforms (now forgot which O/S!)

@arm4b
Member

arm4b commented Oct 13, 2020

Did a few manual tests and picked up a couple platforms to try. Here is the report:

@nmaludy
Member Author

nmaludy commented Oct 13, 2020

@armab st2docs PR for the comments above is started here: StackStorm/st2docs#1028

@arms11

arms11 commented Oct 13, 2020

Just deployed using st2-docker and was able to verify a few of the items already, like no St2-Api-Key in the header getting logged! Will definitely consider socializing this to other developers as their templated dev environment before the custom pack is released for the K8s deployment in prod. I am planning to get the latest images for a standalone K8s environment to see how this goes. Our production cluster has been running on 3.3dev for at least 2 months now and so far so good! Appreciate all the efforts!

@nmaludy
Member Author

nmaludy commented Oct 14, 2020

@punkrokk found the issue StackStorm/st2#5057 and i have implemented PRs to fix this:

StackStorm/st2#5058 (into master)
StackStorm/st2#5059 (cherry pick for v3.3)

@m4dcoder

I tested CentOS8. Looks good.

@nmaludy
Member Author

nmaludy commented Oct 20, 2020

Found a bug in st2ci when trying to run the e2e upgrade test for el8. We need to pass -y in order to import the GPG key for the StackStorm/staging-stable repo. PR is here: StackStorm/st2ci#191

@nmaludy
Member Author

nmaludy commented Oct 21, 2020

Found an issue in st2-self-check where it was invoking actions and reporting back "OK" status, but in fact the action failed:

Specifically the action that failed was Attempting Test tests.test_timer_rule...OK! (44s)

The action failed because on the libvirt vagrant boxes generic/ubuntu1604 and generic/ubuntu1804 the timezone is not set, which prevents the st2timersengine service from starting. Easy fix: set the timezone on the vagrant box and restart st2timersengine (this is not a problem on the VirtualBox Ubuntu image).

However, st2-self-check reported success (as seen above), even though the WebUI shows that the action in fact failed.

TODO: write up an issue for this, investigate and fix
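
Until that is investigated, one way to avoid trusting the "OK" line alone is to inspect the status recorded for the execution itself. The sketch below is not part of st2-self-check; it assumes the execution is available as JSON with a `status` field whose success value is `succeeded` (for example from the CLI's JSON output), and the sample documents are made up.

```python
import json


def execution_succeeded(execution_json):
    """Return True only if the execution document reports status "succeeded"."""
    execution = json.loads(execution_json)
    # Assumption: the JSON carries a top-level "status" field.
    return execution.get("status") == "succeeded"


# Made-up sample documents; real output contains many more fields.
passed = '{"id": "example-id-1", "status": "succeeded"}'
failed = '{"id": "example-id-2", "status": "failed"}'

print(execution_succeeded(passed))  # True
print(execution_succeeded(failed))  # False
```

A check of this kind treats a missing or unexpected status as a failure, which is the safer default for a self-check.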


10 participants