Monitor: add OpenCensus learn.
JamesBonddu committed Mar 13, 2020
1 parent 6a5dfc7 commit 47153d6
Showing 18 changed files with 942 additions and 2 deletions.
4 changes: 4 additions & 0 deletions DevOps/airflow.md
@@ -0,0 +1,4 @@
# Using Airflow for ETL


https://gtoonstra.github.io/etl-with-airflow/principles.html
12 changes: 12 additions & 0 deletions DevOps/敏捷开发/JIRA.md
@@ -0,0 +1,12 @@
# JIRA


https://www.atlassian.com/zh/agile/tutorials/creating-your-agile-board

https://webeye.atlassian.net/secure/ShowConstantsHelp.jspa?decorator=popup#IssueTypes

https://support.atlassian.com/jira-software-cloud/docs/manage-epics-on-the-roadmap/

## Automation

https://support.atlassian.com/jira-software-cloud/docs/automate-your-jira-cloud-processes-and-workflows/
10 changes: 10 additions & 0 deletions Q&A/python-q&a.md
@@ -81,3 +81,13 @@ https://stackoverflow.com/questions/12047847/super-object-not-calling-getattr
## __getattr__
https://www.cnblogs.com/xybaby/p/6280313.html
## 65192 Task was destroyed but it is pending!
https://stackoverflow.com/questions/33505066/python3-asyncio-task-was-destroyed-but-it-is-pending-with-some-specific-condit
https://stackoverflow.com/questions/27402796/multiple-loops-with-asyncio
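
The warning in the heading above is typically emitted when an event loop is closed, or a task object is garbage-collected, while the task is still pending. A minimal sketch of one common remedy follows: cancel the task and await it before the loop shuts down. The `worker` coroutine and its delay are hypothetical stand-ins, not code from the linked answers.

```python
import asyncio


async def worker(delay: float) -> str:
    # Hypothetical long-running job standing in for real work.
    await asyncio.sleep(delay)
    return "done"


async def main() -> list:
    task = asyncio.ensure_future(worker(10))
    await asyncio.sleep(0)  # yield once so the task actually starts
    task.cancel()  # request cancellation instead of abandoning the task
    # Await the cancelled task so it is finished before the loop closes.
    return await asyncio.gather(task, return_exceptions=True)


results = asyncio.run(main())
print(results)
```

With `return_exceptions=True`, the cancellation comes back as a result object instead of propagating, so shutdown code can inspect it without a `try` block.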
GCP: 2 days of data (9, 10)
120 changes: 120 additions & 0 deletions Q&A/全链路监控.md
@@ -0,0 +1,120 @@
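## Distilling a logging standard


Request log standard

date, user_id, logged-in user, department, business type, logged-in user's email, actual client IP, actual server IP, request method, request URL, request parameters, response status code, error traceback, tags for event classification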
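
As a rough illustration, the request log fields listed above could be emitted as one JSON object per line. This is only a sketch: the key names, the `build_request_log` helper, and the sample values are all assumptions made for the example, not part of any agreed standard.

```python
import json
from datetime import datetime, timezone


def build_request_log(user: dict, request: dict, status_code: int,
                      traceback_text: str = "", tags: tuple = ()) -> str:
    """Serialize one request as a single JSON log line."""
    record = {
        "date": datetime.now(timezone.utc).isoformat(),
        "user_id": user["id"],
        "user_name": user["name"],
        "department": user["department"],
        "business_type": request["business_type"],
        "user_email": user["email"],
        "client_ip": request["client_ip"],
        "server_ip": request["server_ip"],
        "method": request["method"],
        "url": request["url"],
        "params": request["params"],
        "status_code": status_code,
        "traceback": traceback_text,
        "tags": list(tags),  # used to classify events
    }
    return json.dumps(record, ensure_ascii=False)


# Made-up sample values, for illustration only.
line = build_request_log(
    user={"id": 42, "name": "alice", "department": "ops",
          "email": "alice@example.com"},
    request={"business_type": "report", "client_ip": "203.0.113.7",
             "server_ip": "10.0.0.5", "method": "GET",
             "url": "/api/reports", "params": {"page": 1}},
    status_code=200,
    tags=("audit",),
)
print(line)
```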


https://mp.weixin.qq.com/s?__biz=MzU0OTE4MzYzMw==&mid=2247488015&idx=2&sn=563ae34159fa560cfcdaee83cfbdb263&chksm=fbb29bf1ccc512e7d5447ffa316358ae8e780fb19f0242ae8b2b77264bd1353a7239a1d830cd&mpshare=1&scene=1&srcid=&sharer_sharetime=1577937494737&sharer_shareid=2a07c120ead5e9a5e0c0fc8b3f05a6dd&key=e941abab93429f934cb25f2a26a2d6f05d55d8c9ceb509ca986152450bffd4677e85a8d338510e479bec763923844e2cfe495e80d3ad7d4861b4f0e344a0167b82baabc013a6ff8407921e6e2c16a89c&ascene=1&uin=MTE5MjQ0MzcwOQ%3D%3D&devicetype=Windows+10&version=62080079&lang=zh_CN&exportkey=A%2BG28iXG5s6DdrKpoSLy%2Fzo%3D&pass_ticket=n3xbx4dv1fXXVF5kGO5UxxxuczfB4o6g7YUqOLOtgRRwmY9p10xNZYgXjc8oCif3

## End-to-end (full-link) tracing

OpenTracing


### Dapper

https://bigbully.github.io/Dapper-translation/

https://research.google/pubs/pub36356/

https://imfox.io/2017/10/14/hunter-1/

## Ant Financial SOFATracer

https://www.lizenghai.com/archives/47135.html

## Alibaba EagleEye

https://www.infoq.cn/article/kMPZTgJqs7VJC5vkVCR2


http://jm.taobao.org/about/

## zipkin

https://www.cnblogs.com/zhangs1986/p/8879744.html

https://carey.akhack.com/2018/10/12/django%E5%88%86%E5%B8%83%E5%BC%8F%E9%93%BE%E8%B7%AF%E7%9B%91%E6%8E%A7%E4%B9%8BZipkin/


https://riboseyim.github.io/2018/05/18/DevOps-OpenTracing/

https://www.sofastack.tech/projects/sofa-tracer/traceid-generated-rule/

https://ixyzero.com/blog/archives/4438.html

https://leancloudblog.com/terms/full-link-tracing/


## Google Stackdriver Trace

https://cloud.google.com/trace/docs/viewing-details



## Intelligent end-to-end business monitoring


https://juejin.im/entry/5ba06ee4f265da0ab87372ee


https://www.sofastack.tech/projects/sofa-tracer/traceid-generated-rule/


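### Standardizing data collection

Log collection specification
- Basic information
- Time
- traceId
- IP
- Application name
- Call information
- Business code
- Interface
- File name
- Method name
- Latency
- Success flag
- Error information
- Error code
- Type
- Summary
- User information
- Device information

Reference: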
https://juejin.im/entry/5ba06ee4f265da0ab87372ee
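
One way to guarantee that the basic fields from the list above (time, traceId, IP, application name) appear on every log line is a `logging.Filter` that enriches each record. The sketch below assumes a fixed trace ID for brevity; in a real service it would normally come from the request context. The class name, logger name, and sample values are hypothetical.

```python
import io
import logging
import uuid


class BaseInfoFilter(logging.Filter):
    """Attach trace ID, host IP, and app name to every log record."""

    def __init__(self, app_name: str, trace_id: str, ip: str) -> None:
        super().__init__()
        self.app_name = app_name
        self.trace_id = trace_id
        self.ip = ip

    def filter(self, record: logging.LogRecord) -> bool:
        record.app_name = self.app_name
        record.trace_id = self.trace_id
        record.ip = self.ip
        return True  # never drop the record, only enrich it


# Write to an in-memory stream so the example has no side effects.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(trace_id)s %(ip)s %(app_name)s %(message)s"))

logger = logging.getLogger("collect-demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
logger.addFilter(BaseInfoFilter("order-service", uuid.uuid4().hex, "10.0.0.5"))

logger.info("order created")
print(stream.getvalue(), end="")
```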


##### traceID

https://twitter.github.io/finagle/docs/com/twitter/finagle/tracing/TraceId.html
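
To make the idea concrete, here is a small sketch of the trace ID, span ID, and parent ID triple that Dapper-style tracers pass between services. It is loosely inspired by that model and is not the Finagle API; `TraceContext`, `new_id`, and the 64-bit random hex format are assumptions chosen for the example.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


def new_id() -> str:
    """Return a random 64-bit identifier as 16 lowercase hex characters."""
    return secrets.token_hex(8)


@dataclass(frozen=True)
class TraceContext:
    trace_id: str             # shared by every span in one request
    span_id: str              # identifies this unit of work
    parent_id: Optional[str]  # None for the root span

    @classmethod
    def new_root(cls) -> "TraceContext":
        return cls(trace_id=new_id(), span_id=new_id(), parent_id=None)

    def child(self) -> "TraceContext":
        # A child keeps the trace ID and records its parent's span ID.
        return TraceContext(self.trace_id, new_id(), self.span_id)


root = TraceContext.new_root()
rpc = root.child()
print(root)
print(rpc)
```

Because every span carries the same `trace_id`, a backend can regroup spans from different services into one call tree by following the `parent_id` links.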


## Jaeger

https://pjw.io/articles/2018/05/18/jaeger-tutorial/

## OpenCensus (Google)

https://opencensus.io/

https://opencensus.io/community/talks/

## Jaeger vs. OpenCensus

https://medium.com/jaegertracing/jaeger-and-opentelemetry-1846f701d9f2

https://www.infoq.cn/article/pb3QOmLVoBN6IZS8hBZy


## Amazon

https://aws.amazon.com/cn/xray/

https://docs.aws.amazon.com/xray-sdk-for-python/latest/reference/

https://www.sofastack.tech/projects/sofa-tracer/traceid-generated-rule/
169 changes: 169 additions & 0 deletions Q&A/告警监控.md
@@ -0,0 +1,169 @@
## Alerting and monitoring

- Collection
- Storage
- Alert rules
- Alert actions
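
The four stages in the list above can be sketched end to end in a few lines. This is a toy, in-memory illustration under stated assumptions: the `Rule` and `Monitor` classes, the moving-average rule, and the sample CPU values are all invented for the example and do not describe any particular monitoring product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    metric: str
    threshold: float
    window: int  # number of most recent samples to average


@dataclass
class Monitor:
    rules: List[Rule]
    action: Callable[[str], None]
    store: Dict[str, List[float]] = field(default_factory=dict)

    def collect(self, metric: str, value: float) -> None:
        # Collection and storage: append the sample to an in-memory series.
        self.store.setdefault(metric, []).append(value)
        self.evaluate(metric)

    def evaluate(self, metric: str) -> None:
        # Alert rule: fire when the recent average exceeds the threshold.
        for rule in self.rules:
            if rule.metric != metric:
                continue
            recent = self.store[metric][-rule.window:]
            if len(recent) < rule.window:
                continue  # not enough samples yet
            average = sum(recent) / rule.window
            if average > rule.threshold:
                # Alert action: hand a message to the configured callback.
                self.action(f"{metric} avg {average:.1f} > {rule.threshold}")


alerts: List[str] = []
monitor = Monitor(rules=[Rule("cpu_percent", 90.0, 3)], action=alerts.append)
for sample in (50.0, 95.0, 96.0, 97.0):
    monitor.collect("cpu_percent", sample)
print(alerts)
```

Averaging over a window rather than alerting on a single sample is one simple way to trade a little latency for fewer false alarms.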

https://zhuanlan.zhihu.com/p/27382099

https://dbaplus.cn/news-141-2038-1.html


http://fex.baidu.com/blog/2014/05/build-performance-monitor-in-7-days/





## Alerting and monitoring system design

https://juejin.im/post/5cf0e202f265da1b8b2b44fa


http://www.yunweipai.com/archives/23202.html

https://juejin.im/entry/5ba06ee4f265da0ab87372ee


## Key articles on monitoring system design



http://download.xuliangwei.com/jiankong.html

https://testerhome.com/topics/19188

https://segmentfault.com/a/1190000015308746


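## Design and implementation

### Design goals

- Fault analysis and localization
When an application system fails, the first step is to use the monitoring system to find the problem among the complex interactions between systems and pinpoint where it lies.
- Early warning of system problems
Warnings generally need to be near real-time, usually with a trade-off between performance and timeliness. The goal is to raise a warning before customers report the problem. A more advanced version handles the warning automatically, replacing the part that used to require manual intervention.
- Historical trend analysis
As historical monitoring data accumulates, simple predictive analysis based on it becomes fairly easy, for example: next month's transaction volume, the concurrency on the next promotion day, the UV the next release can sustain, and derived forecasts such as CPU, memory, and disk usage.
- Visual reports to assist analysis
A visual dashboard gives direct, intuitive information such as system running state, resource usage, and service status.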
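
For the historical trend goal above, even a straight-line fit over past values gives a first forecast. The sketch below uses ordinary least squares on evenly spaced samples; the function names and the monthly figures are made up for illustration, and a real forecast would need to account for seasonality and promotions.

```python
from typing import Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    n = len(ys)  # needs at least two samples
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def forecast_next(ys: Sequence[float]) -> float:
    """Extrapolate the fitted line one step past the last sample."""
    slope, intercept = fit_line(ys)
    return slope * len(ys) + intercept


monthly_transactions = [100.0, 120.0, 140.0, 160.0]  # made-up sample data
print(forecast_next(monthly_transactions))
```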

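### Standardizing data collection

Log collection specification
- Basic information
- Time
- traceId
- IP
- Application name
- Call information
- Business code
- Interface
- File name
- Method name
- Method
- Latency
- Success flag
- Error information
- Error code
- Type
- Summary
- Extended information
- Load test
- Business identity

Reference: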

https://juejin.im/entry/5ba06ee4f265da0ab87372ee
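
Several of the call-level fields in the specification above (interface, method name, latency, success flag, error information) can be captured automatically with a decorator. The following is a sketch under assumptions: `record_call`, the in-memory `call_log` sink, and the `create_order` function are hypothetical names used only for the example.

```python
import functools
import time
from typing import Any, Callable, Dict, List

call_log: List[Dict[str, Any]] = []  # stand-in for a real log sink


def record_call(interface: str) -> Callable:
    """Record method name, latency, success flag, and error per call."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            entry = {"interface": interface, "method_name": func.__name__,
                     "success": True, "error_message": ""}
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                entry["success"] = False
                entry["error_message"] = repr(exc)
                raise  # re-raise so callers still see the failure
            finally:
                # Runs on both success and failure.
                entry["latency_ms"] = (time.perf_counter() - start) * 1000
                call_log.append(entry)
        return wrapper
    return decorator


@record_call("/orders")
def create_order(quantity: int) -> int:
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return quantity * 2


create_order(3)
try:
    create_order(0)
except ValueError:
    pass
print(call_log)
```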

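### Data collection

OpenCensus is a set of libraries for various languages that let you collect application metrics and distributed traces, then transfer the data to a backend of your choice in real time. Developers and administrators can analyze this data to understand the health of the application and debug problems.

How do I use OpenCensus in my project?
Libraries are provided for Go, Java, C#, Node.js, C++, Ruby, Erlang/Elixir, Python, Scala, and PHP.

Supported backends include Azure Monitor, Datadog, Instana, Jaeger, New Relic, SignalFX, Google Cloud Monitoring + Trace, and Zipkin. You can also add support for other backends.

#### Using OpenCensus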

```sh
pipenv install opencensus opencensus-ext-django
```
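
After installation, the Django integration is enabled through settings. The fragment below is a sketch based on the layout I recall from the `opencensus-ext-django` README; the Zipkin exporter additionally assumes the separate `opencensus-ext-zipkin` package is installed, and the service name, host, and port are placeholder values. Check the option names against the installed version before relying on them.

```python
# settings.py (fragment)

MIDDLEWARE = [
    # ... the project's existing middleware goes here ...
    "opencensus.ext.django.middleware.OpencensusMiddleware",
]

OPENCENSUS = {
    "TRACE": {
        # Sample every request; a lower rate is usual in production.
        "SAMPLER": "opencensus.trace.samplers.ProbabilitySampler(rate=1)",
        # Assumption: spans are sent to a Zipkin instance on localhost:9411
        # and opencensus-ext-zipkin is installed.
        "EXPORTER": """opencensus.ext.zipkin.trace_exporter.ZipkinExporter(
            service_name='my-django-app',
            host_name='localhost',
            port=9411,
        )""",
    }
}
```

The sampler and exporter are given as strings, which the middleware evaluates at startup, so the settings module itself does not need to import OpenCensus.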

https://opencensus.io/

https://opentelemetry.io/

https://opencensus.io/quickstart/python/

https://cloud.google.com/monitoring/custom-metrics/open-census?hl=zh-cn

https://docs.microsoft.com/en-us/azure/azure-monitor/app/opencensus-python-request

https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors

https://stackoverflow.com/tags/opencensus/hot?filter=all


### Data storage


### Data analysis

Visualization and analysis with Zipkin


Architecture

https://zipkin.io/pages/architecture.html


```
$ sudo docker run -d -p 9411:9411 openzipkin/zipkin
$ sudo docker logs -f 86861199ee03
(Zipkin ASCII-art startup banner omitted)
:: version 2.20.1 :: commit 7cbe4d0 ::
2020-03-13 03:44:49.589 INFO 1 --- [ main] z.s.ZipkinServer : Starting ZipkinServer on 86861199ee03 with PID 1 (/zipkin/BOOT-INF/classes started by zipkin in /zipkin)
2020-03-13 03:44:49.594 INFO 1 --- [ main] z.s.ZipkinServer : The following profiles are active: shared
2020-03-13 03:44:50.397 WARN 1 --- [ main] i.m.c.i.b.j.JvmGcMetrics : GC notifications will not be available because com.sun.management.GarbageCollectionNotificationInfo is not present
2020-03-13 03:44:50.486 INFO 1 --- [ main] c.l.a.c.Flags : com.linecorp.armeria.verboseExceptions: rate-limit=10 (default)
2020-03-13 03:44:50.494 INFO 1 --- [ main] c.l.a.c.Flags : com.linecorp.armeria.verboseSocketExceptions: false (default)
2020-03-13 03:44:50.494 INFO 1 --- [ main] c.l.a.c.Flags : com.linecorp.armeria.verboseResponses: false (default)
2020-03-13 03:44:50.522 INFO 1 --- [ main] c.l.a.c.Flags : com.linecorp.armeria.useEpoll: true (default)
2020-03-13 03:44:50.523 INFO 1 --- [ main] c.l.a.c.Flags : com.linecorp.armeria.maxNumConnections: 2147483647 (default)
2020-03-13 03:44:50.523 INFO 1 --- [ main] c.l.a.c.Flags : com.linecorp.armeria.numCommonWorkers: 8 (default)
2020-03-13 03:44:50.524 INFO 1 --- [ main] c.l.a.c.Flags : com.linecorp.armeria.numCommonBlockingTaskThreads: 200 (default)
2020-03-13 03:44:50.525 INFO 1 --- [ main] c.l.a.c.Flags : com.linecorp.armeria.defaultMaxRequestLength: 10485760 (default)
2020-03-13 03:44:50.526 INFO 1 --- [ main] c.l.a.c.Flags : com.linecorp.armeria.defaultMaxResponseLength: 10485760 (default)
```
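
Once the container is up, a hand-built span can be pushed to it to confirm the UI shows data. The sketch below builds a span in Zipkin's v2 JSON format and posts it to `/api/v2/spans` with the standard library only. The service name, span name, and duration are made up, and the URL assumes Zipkin is reachable on localhost at the port mapped by the `docker run` command above.

```python
import json
import secrets
import time
import urllib.request

# Assumes the port mapping from the docker run command (9411:9411).
ZIPKIN_URL = "http://localhost:9411/api/v2/spans"


def make_span(service: str, name: str, duration_us: int) -> dict:
    """Build one span in Zipkin's v2 JSON format."""
    return {
        "traceId": secrets.token_hex(16),  # 128-bit trace ID, 32 hex chars
        "id": secrets.token_hex(8),        # 64-bit span ID, 16 hex chars
        "name": name,
        "timestamp": int(time.time() * 1_000_000),  # epoch microseconds
        "duration": duration_us,                    # microseconds
        "localEndpoint": {"serviceName": service},
    }


def send_spans(spans: list, url: str = ZIPKIN_URL) -> int:
    """POST a list of spans to Zipkin and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(spans).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


span = make_span("demo-service", "get /orders", duration_us=1500)
print(json.dumps(span, indent=2))
# send_spans([span])  # uncomment once the Zipkin container is running
```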
18 changes: 18 additions & 0 deletions Q&A/普罗米修斯.md
@@ -0,0 +1,18 @@
# Prometheus


https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/promql/prometheus-promql-best-praticase

https://prometheus.io/docs/prometheus/latest/getting_started/

https://www.qikqiak.com/post/alertmanager-of-prometheus-in-practice/

https://www.infoq.cn/article/ObvBIBflGNgboDBNk4Iy

https://blog.csdn.net/lijiaocn/article/details/81865120

https://aleiwu.com/post/prometheus-bp/
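
As a small aid to reading the material above, this sketch renders a counter in the Prometheus text exposition format, which is what a scrape of a `/metrics` endpoint returns. It is illustrative only: `render_counter` and the sample values are hypothetical, label values are not escaped, and a real service would normally use an official client library instead of formatting the text by hand.

```python
from typing import Dict, Tuple

LabelSet = Tuple[Tuple[str, str], ...]


def render_counter(name: str, help_text: str,
                   samples: Dict[LabelSet, float]) -> str:
    """Render one counter in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        # Note: label values are assumed to need no escaping here.
        label_text = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


text = render_counter(
    "http_requests_total",
    "Total HTTP requests handled.",
    {(("method", "GET"), ("code", "200")): 1027.0,
     (("method", "POST"), ("code", "500")): 3.0},
)
print(text, end="")
```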

## Alertmanager

https://www.jianshu.com/p/fd0b018539cd