Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove SASI indices #1328

Merged
merged 12 commits into from
Feb 14, 2019
Merged

Remove SASI indices #1328

merged 12 commits into from
Feb 14, 2019

Conversation

black-adder
Copy link
Contributor

@black-adder black-adder commented Feb 11, 2019

Signed-off-by: Won Jun Jang [email protected]

A lot of Copy Pasta from #793. Resolves #790.

Migration Path:

  1. Run v001tov002part1.sh which will copy dependencies into a csv, update the dependency UDT, create a new dependenciesv2 table, and write dependencies from the csv into the dependenciesv2 table.
  2. Run the collector and query services with the cassandra flag dependency-sasi-disabled=true which will update jaeger to read from the new dependenciesv2 table.
  3. Update spark job to write to the new dependenciesv2 table (this is a TODO).
  4. Run v001tov002part2.sh which will delete the old dependency table and the SASI index.

Users who wish to continue to use the SASI indices don't have to do anything as the cassandra flag dependency-sasi-disabled will default to false. Ideally, users will migrate on their own timeline and we can remove support for the old table in a major version release.

  • Fix crossdock
  • Actually test

query := s.session.Query(depsSelectStmt, endTs.Add(-1*lookback), endTs)
startTs := endTs.Add(-1 * lookback)
var query cassandra.Query
switch s.indexMode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would GetDependencies be simplified by omitting this switch? Instead we might read from SASIDisabled first, and if that fails we can read from SASIEnabled?
When migrating, this would mean that users don't need to redeploy query after moving over the dependency table

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of adding the extra roundtrip time. Additionally, if the user has opted in to migrating to the new table, redeploying the query service will not be the most strenuous thing.

Port int `yaml:"port"`
Authenticator Authenticator `yaml:"authenticator"`
DisableAutoDiscovery bool `yaml:"disable_auto_discovery"`
DependencySASIDisabled bool `yaml:"dependency_sasi_disabled"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be simplified to SASI disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd rather it be more explicit but I can simplify

Parent: d.Parent,
Child: d.Child,
CallCount: int64(d.CallCount),
}
if s.indexMode == SASIEnabled {
dep.Source = string(d.Source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this assignment need to be conditional? (Also - shouldn't this be set for SASIDisabled?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, if you attempt to add the source field to a schema that doesn't support it, c* will error. And good catch, it should be for disabled

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - but that doesn't answer my question; the query string for SASIDisabled doesn't include this field anyways

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll test it and let you know

suffixCA = ".tls.ca"
suffixServerName = ".tls.server-name"
suffixVerifyHost = ".tls.verify-host"
suffixDependencySASIDisabled = ".dependency-sasi-disabled"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use sasi-disabled instead of dependency-sasi-disabled?

}

// NewDependencyStore returns a DependencyStore
func NewDependencyStore(
session cassandra.Session,
metricsFactory metrics.Factory,
logger *zap.Logger,
indexMode IndexMode,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything preventing someone from passing in an undefined IndexMode, like IndexMode(10) for e.g.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ill validate here

@@ -106,7 +106,14 @@ func (f *Factory) CreateSpanWriter() (spanstore.Writer, error) {

// CreateDependencyReader implements storage.Factory
func (f *Factory) CreateDependencyReader() (dependencystore.Reader, error) {
return cDepStore.NewDependencyStore(f.primarySession, f.primaryMetricsFactory, f.logger), nil
var (
sasiDisabled = f.Options.GetPrimary().DependencySASIDisabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following reads simpler:

indexMode := cDepStore.SASIEnabled
if f.Options.GetPrimary().DependencySASIDisabled {
indexMode = cDepStore.SASIDisabled
}

Parent: "goo",
Child: "gle",
Parent: "bi",
Child: "ng",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quit raising entropy

SASIDisabled

depsInsertStmtSASI = "INSERT INTO dependencies(ts, ts_index, dependencies) VALUES (?, ?, ?)"
depsInsertStmt = "INSERT INTO dependenciesv2(ts, date_bucket, dependencies) VALUES (?, ?, ?)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do dependencies_v2


-- compaction strategy is intentionally different as compared to other tables due to the size of dependencies data
CREATE TABLE IF NOT EXISTS jaeger.dependenciesv2 (
date_bucket bigint,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer that we define it as ts_bucket timestamp field. The code can truncate the timestamp to a Day precision, or we can even make it configurable if someone wants finer control (e.g maybe they are going to save deps every minute).

@codecov
Copy link

codecov bot commented Feb 12, 2019

Codecov Report

Merging #1328 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1328   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files         162     163    +1     
  Lines        7326    7367   +41     
======================================
+ Hits         7326    7367   +41
Impacted Files Coverage Δ
plugin/storage/cassandra/dependencystore/model.go 100% <100%> (ø) ⬆️
plugin/storage/cassandra/options.go 100% <100%> (ø) ⬆️
...lugin/storage/cassandra/dependencystore/storage.go 100% <100%> (ø) ⬆️
plugin/storage/cassandra/factory.go 100% <100%> (ø) ⬆️
model/dependencies.go 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e1ee51...18aa185. Read the comment docs.

CHANGELOG.md Outdated
@@ -8,6 +8,17 @@ Changes by Version

##### Breaking Changes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing this is technically a non-breaking change. It will become breaking once we start defaulting to _v2

depsInsertStmt = "INSERT INTO dependencies(ts, ts_index, dependencies) VALUES (?, ?, ?)"
depsSelectStmt = "SELECT ts, dependencies FROM dependencies WHERE ts_index >= ? AND ts_index < ?"
// SASIEnabled is used when the dependency table is SASI indexed.
SASIEnabled IndexMode = iota
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it not make sense to simply refer to this as v2 throughout, including constants, variables, the CLI flags? SASI then becomes a side effect that the user doesn't really need to know about.

Parent: d.Parent,
Child: d.Child,
CallCount: int64(d.CallCount),
}
if s.indexMode == SASIDisabled {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another argument to refer to this change as "dependencies v2" rather than tying to SASI.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just need to confirm the code works with the old schema

CHANGELOG.md Outdated
Migration Path:

1. Run `plugin/storage/cassandra/schema/migration/v001tov002part1.sh` which will copy dependencies into a csv, update the `dependency UDT`, create a new `dependencies_v2` table, and write dependencies from the csv into the `dependencies_v2` table.
2. Run the collector and query services with the cassandra flag `enable-dependencies-v2=true` which will update jaeger to write and read to and from the new `dependencies_v2` table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cassandra flag enable-dependencies-v2=true

please specify the exact flag name

which will update jaeger to write

which will instruct jaeger to write


1. Run `plugin/storage/cassandra/schema/migration/v001tov002part1.sh` which will copy dependencies into a csv, update the `dependency UDT`, create a new `dependencies_v2` table, and write dependencies from the csv into the `dependencies_v2` table.
2. Run the collector and query services with the cassandra flag `enable-dependencies-v2=true` which will update jaeger to write and read to and from the new `dependencies_v2` table.
3. Update [spark job](https://github.com/jaegertracing/spark-dependencies) to write to the new `dependencies_v2` table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should rather say "update Spark job to version N"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll punt on this until I've actually made the change in spark.

@@ -36,6 +37,8 @@ func (d *Dependency) MarshalUDT(name string, info gocql.TypeInfo) ([]byte, error
return gocql.Marshal(info, d.Child)
case "call_count":
return gocql.Marshal(info, d.CallCount)
case "source":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if you run this code against the schema where UDT was not upgraded? Will gocql simply not invoke this function for "source"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

}

// Sanitize sanitizes the DependencyLink.
func (d DependencyLink) Sanitize() DependencyLink {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ApplyDefaults

fi


while IFS="," read ts dependency; do bucket=`date +"%Y%m%d" -d "$ts"`; echo "$bucket,$ts,$dependency"; done < dependencies.csv > dependencies_datebucket.csv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you write it in normal multi-line syntax?

echo "About to delete $row_count rows."
confirm

cqlsh -e "DROP INDEX IF EXISTS $keyspace.ts_index"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$keyspace.ts_index

naming 🤦‍♂️

Signed-off-by: Won Jun Jang <[email protected]>
Signed-off-by: Won Jun Jang <[email protected]>
Signed-off-by: Won Jun Jang <[email protected]>
Signed-off-by: Won Jun Jang <[email protected]>
Signed-off-by: Won Jun Jang <[email protected]>
Signed-off-by: Won Jun Jang <[email protected]>
Signed-off-by: Won Jun Jang <[email protected]>
Signed-off-by: Won Jun Jang <[email protected]>
Signed-off-by: Won Jun Jang <[email protected]>

// IsValid returns true if the Version is a valid one.
func (i Version) IsValid() bool {
return i < end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a less confusing name than 'end', e.g. 'versionEnumEnd'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, Version(-1) is valid according to this function

Signed-off-by: Won Jun Jang <[email protected]>
@black-adder black-adder merged commit 7e51957 into master Feb 14, 2019
@ghost ghost removed the review label Feb 14, 2019
@black-adder black-adder deleted the remove_sasi_indices branch February 14, 2019 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants