From Wikitech

Cirrus Streaming Updater

The Cirrus Streaming Updater (SUP) updates the elasticsearch indexes for each and every mediawiki edit. The updater consists of two applications, a producer and a consumer. One producer runs per datacenter, reading events from various topics and generating a unified stream of updates that need to be applied to the search clusters. One consumer runs per cluster group it writes to (eqiad, codfw, cloudelastic). The producer only reads events from its local datacenter, while the consumer reads events from all producers in all datacenters.
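
The read topology described above can be summarised as the set of topics each application consumes. This is a minimal sketch under two assumptions that are not taken from the SUP configuration: that producers run in eqiad and codfw (the hypothetical `DATACENTERS` list), and that Kafka topic names are prefixed with the datacenter name.

```python
# Sketch of the SUP read topology. DATACENTERS and the "<dc>.<stream>"
# topic naming are assumptions for illustration only.
DATACENTERS = ["eqiad", "codfw"]  # where producers run (assumed)
CLUSTER_GROUPS = ["eqiad", "codfw", "cloudelastic"]


def producer_topics(dc: str, streams: list[str]) -> list[str]:
    """A producer only reads input streams from its local datacenter."""
    return [f"{dc}.{stream}" for stream in streams]


def consumer_topics(update_stream: str) -> list[str]:
    """Every consumer reads the update stream from all datacenters."""
    return [f"{dc}.{update_stream}" for dc in DATACENTERS]


# One consumer per cluster group, all reading the same set of topics.
subscriptions = {
    group: consumer_topics("cirrussearch.update_pipeline.update")
    for group in CLUSTER_GROUPS
}
```

Because every consumer subscribes to the update stream of every datacenter, each cluster group receives updates for edits made in either datacenter.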

The chain of events between a user clicking the 'save page' button and elasticsearch being updated is roughly as follows:

  • MW core approves of the edit and generates an event in the mediawiki.page-change stream.
  • The event is read by the streaming updater producer in the same datacenter, aggregated with related events, and results in a cirrussearch.update_pipeline.update event.
  • The consumer receives the event, fetches the content of the update from the mediawiki api, batches together many updates, and performs a bulk write request to the appropriate elasticsearch cluster (each consumer writes to a cluster group of three elasticsearch clusters).
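
The consumer step can be pictured as fetch, batch, then one bulk write per cluster in the group. This is a simplified illustration, not the SUP implementation: `fetch_content` and `bulk_write` are hypothetical callables standing in for the mediawiki api request and the elasticsearch bulk request, and the batch size is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class UpdateEvent:
    wiki: str
    page_id: int
    rev_id: int


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Group items into lists of at most `size` elements."""
    batch: list = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def consume(
    events: Iterable[UpdateEvent],
    fetch_content: Callable[[UpdateEvent], dict],
    bulk_write: Callable[[str, list], None],
    clusters: list[str],
    batch_size: int = 100,
) -> None:
    # Fetch the document for each update, then batch the documents.
    docs = (fetch_content(ev) for ev in events)
    for batch in batched(docs, batch_size):
        # A cluster group holds several clusters; each gets the same bulk request.
        for cluster in clusters:
            bulk_write(cluster, batch)
```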

In addition to page change events, a significant source of updates is mediawiki.cirrussearch.page_rerender events. These events represent changes that did not change the revision_id but likely changed the rendered result of the page (e.g. propagated template updates). Besides these higher-volume inputs, the updater reads one more set of topics, related to the CirrusSearch Weighted Tags functionality. These arrive over a variety of streams and are generally metadata generated by async processes such as ML models or batch data processing.
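
Turning these several input streams into a single update per page can be sketched as a keyed merge. This is a hypothetical illustration: the `kind` field, the `PendingUpdate` type and the merge rules (keep the highest revision, remember that a rerender happened, merge the weighted tags) are assumptions, and the real producer's windowing is not described here.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class PendingUpdate:
    wiki: str
    page_id: int
    rev_id: Optional[int] = None  # set by page-change events
    rerender: bool = False  # set by page_rerender events
    weighted_tags: dict = field(default_factory=dict)


def aggregate(events: Iterable[dict]) -> list:
    """Collapse all events for the same (wiki, page_id) into one pending update."""
    pending: dict = {}
    for ev in events:
        key = (ev["wiki"], ev["page_id"])
        upd = pending.setdefault(key, PendingUpdate(ev["wiki"], ev["page_id"]))
        if ev["kind"] == "page_change":
            # Keep the newest revision seen so far.
            upd.rev_id = max(upd.rev_id or 0, ev["rev_id"])
        elif ev["kind"] == "rerender":
            upd.rerender = True
        elif ev["kind"] == "weighted_tags":
            upd.weighted_tags.update(ev["tags"])
    return list(pending.values())
```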

There is a DPE Deep Dive from 2025-08-07 "Search Update Pipeline - Feeding search with Flink stream processing" (Recording)

Backfilling/Reindexing

The streaming updater can also backfill existing indices for periods of time where updates, for whatever reason, were not written to the elasticsearch cluster. The backfill uses a custom helm release, which runs a second copy of the standard consumer with specific constraints on kafka offsets and wikis to process. See the "Backfill Batch" section of the cirrus-streaming-updater README.
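
The constraints a backfill run applies can be pictured as a filter over consumed records: only a bounded range of kafka offsets, and only the selected wikis. A minimal sketch, assuming each record carries hypothetical `offset` and `wiki` fields and that records are read from a single partition in offset order; it does not reflect the actual helm values or consumer arguments.

```python
from typing import Iterable, Iterator


def select_for_backfill(
    records: Iterable[dict], start_offset: int, end_offset: int, wikis: Iterable[str]
) -> Iterator[dict]:
    """Yield records with start_offset <= offset < end_offset for the given wikis.

    Assumes records come from a single Kafka partition, in offset order.
    """
    allowed = set(wikis)
    for rec in records:
        if rec["offset"] < start_offset:
            continue
        if rec["offset"] >= end_offset:
            break  # offsets only grow within a partition, so nothing later qualifies
        if rec["wiki"] in allowed:
            yield rec
```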

In-place reindexing is similar, but uses different arguments. The top-level entrypoint (python -m cirrus_reindexer) is used for backfilling, and a second entrypoint (python -m cirrus_reindexer.reindex_all) is used for reindexing. See the Cirrus Reindex Orchestrator repo for more details.

San(e)itizing

San(e)itizing is a process to keep the CirrusSearch indices sane. Its primary purpose is to make sure pages that are out of date or missing in the search index will be (re-)indexed.

This process has a secondary purpose of ensuring all indexed pages have been rendered from wikitext within the last few months. It accomplishes this by indexing every n-th page it visits in such a way that after n loops over the dataset all pages will have been re-indexed.
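
One simple selection rule with the property described above is to partition pages by page_id modulo n. This is a sketch only: the text does not say which rule is actually used, and `pages_for_loop` is a hypothetical name.

```python
def pages_for_loop(page_ids: list[int], loop: int, n: int) -> list[int]:
    """Pick the pages to re-render on this loop: those with page_id % n == loop % n.

    Every page id belongs to exactly one residue class modulo n, so any n
    consecutive loops together select every page exactly once.
    """
    return [pid for pid in page_ids if pid % n == loop % n]
```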

This loop algorithm has been ported to the SUP consumer application as an additional, optional source of cirrussearch.update_pipeline.update events. It runs alongside the regular kafka source, but only produces events locally, at a constant, low rate.
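
Emitting at a constant, low rate amounts to spacing events by a fixed interval. A minimal, hypothetical sketch: `paced` is not part of the SUP code base, and the clock and sleep functions are injectable only so the pacing can be tested.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def paced(
    items: Iterable[T],
    rate_per_sec: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items no faster than rate_per_sec, evenly spaced."""
    interval = 1.0 / rate_per_sec
    next_at = clock()
    for item in items:
        now = clock()
        if now < next_at:
            sleep(next_at - now)
            now = next_at
        yield item
        # Schedule the next item one interval later. If we fell behind,
        # start from "now" instead of bursting to catch up.
        next_at = max(next_at, now) + interval
```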

Troubleshooting the Streaming Updater

This script outputs the kubernetes logs based on datacenter, helmfile release, and kubernetes environment.

Warning: Obsolete Documentation

The update process described below was replaced in early 2024. Nothing in this section is actively used anymore, but we keep the documentation around for historical purposes.

Realtime updates

The CirrusSearch extension updates the elasticsearch indexes for each and every mediawiki edit. The chain of events between a user clicking the 'save page' button and elasticsearch being updated is roughly as follows:

  • MW core approves of the edit and inserts the LinksUpdate object into DeferredUpdates
  • DeferredUpdates runs the LinksUpdate in the web request process, but after closing the connection to the user (so no extra delays).
  • When LinksUpdate completes it runs a LinksUpdateComplete hook which CirrusSearch listens for. In response to this hook CirrusSearch inserts CirrusSearch\Job\LinksUpdate for this page into the job queue (backed by Kafka in wmf prod).
  • The CirrusSearch\Job\LinksUpdate job runs CirrusSearch\Updater::updateFromTitle() to re-build the document that represents this page in elasticsearch. For each wikilink that was added or removed this inserts CirrusSearch\Job\IncomingLinkCount to the job queue.
  • The CirrusSearch\Job\IncomingLinkCount job runs CirrusSearch\Updater::updateLinkedArticles() for the title that was added or removed.
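
The fan-out from LinksUpdate to IncomingLinkCount depends only on which wikilinks changed. CirrusSearch is PHP; this Python sketch only illustrates that rule, with each job represented as a hypothetical `(job_type, title)` tuple rather than a real job queue entry.

```python
def incoming_link_count_jobs(old_links: set[str], new_links: set[str]) -> list[tuple[str, str]]:
    """One IncomingLinkCount job per wikilink that was added or removed."""
    changed = old_links ^ new_links  # symmetric difference: added or removed links
    return [("IncomingLinkCount", title) for title in sorted(changed)]
```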

Other processes that write to elasticsearch (such as page deletion) are similar. All writes to elasticsearch are funneled through the CirrusSearch\Updater class, but this class does not directly perform writes to the elasticsearch database. This class performs all the necessary calculations and then creates the CirrusSearch\Job\ElasticaWrite job to actually make the request to elasticsearch. When the job is run it creates CirrusSearch\DataSender which transforms the job parameters into the full request and issues it. This is done so that any updates that fail (network errors, cluster maintenance, etc) can be re-inserted into the job queue and executed at a later time without having to re-do all the heavy calculations of what actually needs to change.
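
The benefit of separating the calculation from the request is that a failed write can be retried cheaply. A minimal Python sketch of that retry pattern, assuming jobs are plain dicts with hypothetical `cluster`, `payload` and `attempts` keys and a hypothetical `max_attempts` limit; the real job queue's retry behaviour is not described here.

```python
from collections import deque
from typing import Callable


def run_write_jobs(
    queue: deque, send: Callable[[str, str], None], max_attempts: int = 3
) -> tuple:
    """Run queued write jobs; on failure re-queue the job with its payload unchanged.

    The payload was computed when the job was created, so a retry only
    repeats the request, not the calculation of what needs to change.
    """
    done, failed = [], []
    while queue:
        job = queue.popleft()
        try:
            send(job["cluster"], job["payload"])
            done.append(job)
        except ConnectionError:
            job["attempts"] = job.get("attempts", 0) + 1
            if job["attempts"] < max_attempts:
                queue.append(job)  # retry later, after the other queued jobs
            else:
                failed.append(job)
    return done, failed
```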

Batch updates from the database

CirrusSearch indices can also be populated from the database to bootstrap brand new clusters or to backfill existing indices for periods of time where updates, for whatever reason, were not written to the elasticsearch cluster. These updates are performed with the forceSearchIndex.php maintenance script, the usage of which is described in multiple parts of the Administration section.

Batch updates use a custom job type, the CirrusSearch\Job\MassIndex job. The main script iterates over the entire page table and inserts jobs in batches of 10 titles. The MassIndex job kicks off CirrusSearch\Updater::updateFromPages() to perform the actual updates. This is the same process as CirrusSearch\Updater::updateFromTitle(); updateFromTitle() simply performs a couple of extra checks around redirect handling that are unnecessary here.
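
The batching can be sketched as lazily chunking the page titles into groups of 10. The Python below is illustrative only: the real script is PHP and reads the page table from the database, and the job dict shape is an assumption.

```python
from itertools import islice
from typing import Iterable, Iterator


def mass_index_jobs(titles: Iterable[str], batch_size: int = 10) -> Iterator[dict]:
    """Walk the titles lazily and emit one MassIndex-style job per batch."""
    it = iter(titles)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield {"type": "MassIndex", "titles": batch}
```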

Scheduled batch updates from analytics network

Jobs are scheduled in the WMF analytics network by the search platform airflow instance to collect together various information gathered there and ship it back to elasticsearch. The airflow jobs build one or more files per wiki containing elasticsearch bulk update statements, upload them to swift, and send a message over kafka indicating that new information is available to import. The mjolnir-msearch-daemon running on search-loader instances in the production network receives the kafka messages, downloads the bulk updates from swift, and pipes them into the appropriate elasticsearch clusters. This includes information such as page popularity and ML predictions from various wmf projects (link recommendation, ores, more in the future).
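
A per-wiki file of bulk update statements can be sketched for page popularity. This assumes input rows of `(wiki, page_id, score)`, an index name of the form `<wiki>_content` and a `popularity_score` field; those names are assumptions. The action-line/document-line pairs and the trailing newline follow the elasticsearch bulk API format.

```python
import json
from collections import defaultdict
from typing import Iterable


def bulk_bodies_per_wiki(rows: Iterable[tuple]) -> dict:
    """Build one newline-delimited bulk body per wiki of partial-document updates."""
    lines = defaultdict(list)
    for wiki, page_id, score in rows:
        # Action line: which document to update (index name is an assumption).
        action = {"update": {"_index": f"{wiki}_content", "_id": str(page_id)}}
        lines[wiki].append(json.dumps(action))
        # Source line: the fields to merge into the existing document.
        lines[wiki].append(json.dumps({"doc": {"popularity_score": score}}))
    # The bulk API requires the body to end with a newline.
    return {wiki: "\n".join(body) + "\n" for wiki, body in lines.items()}
```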

Saneitizer (background repair process)

The saneitizer is a process to keep the CirrusSearch indices sane. Its primary purpose is to compare the revision_id held in cirrussearch against the primary wiki databases, to verify that cirrus pages are being updated properly. Pages that have a mismatched revision_id in cirrussearch are sent to the indexing pipeline to be reindexed.
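
The revision_id comparison can be sketched as looking up each page's current revision in the revision stored in the index. Both inputs are hypothetical `page_id -> revision_id` mappings; how the saneitizer actually fetches and pages through them is not covered here.

```python
def pages_needing_reindex(db_revs: dict[int, int], index_revs: dict[int, int]) -> list[int]:
    """Page ids whose indexed revision_id is missing or differs from the wiki database."""
    return sorted(
        page_id
        for page_id, rev_id in db_revs.items()
        if index_revs.get(page_id) != rev_id
    )
```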

The saneitizer has a secondary purpose of ensuring all indexed pages have been rendered from wikitext within the last few months. It accomplishes this by indexing every n-th page it visits in such a way that after n loops over the dataset all pages will have been re-indexed.

TODO fill in info on the saneitizer (leaving as stub for now)

Autocomplete indices

Autocomplete indices are built daily via a systemd timer on mwmaint servers. Specifically, mediawiki_job_cirrus_build_completion_indices_eqiad.timer calls mediawiki_job_cirrus_build_completion_indices_eqiad.service, which runs the bash script cirrus_build_completion_indices.sh, which in turn calls the CirrusSearch maintenance script UpdateSuggesterIndex.php.

Job queue

CirrusSearch uses the mediawiki job queue for all operations that write to the indices. The jobs can be roughly split into a few groups, as follows:

Primary update jobs

These are triggered by the actions of either users or administrators.

  • DeletePages - Removes titles from the search index when they have been deleted.
  • LinksUpdate - Updates a page after it has been edited.
  • MassIndex - Used by the forceSearchIndex.php maintenance script to distribute indexing load across the job runners.

Secondary update jobs

These are triggered by primary update jobs to update pages or indices other than the main document.

  • OtherIndex - Updates the commonswiki index with information about file uploads to all wikis to prevent showing the user duplicate uploads.
  • IncomingLinkCount - Triggers against the linked page when a link is added or removed from a page. Updates the list of links coming into a page from other pages on the same wiki. This is an expensive operation, and the live updates are disabled in wmf. Instead the incoming links counts are calculated in a batch by the incoming_links_weekly dag on the search-platform airflow instance and shipped as a batch update from the analytics network.
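
A batch computation of incoming link counts reduces to counting distinct linking pages per target. This sketch assumes the links are available as `(source_title, target_title)` pairs; it is not the incoming_links_weekly dag, whose inputs and details are not described here.

```python
from collections import Counter
from typing import Iterable


def incoming_link_counts(links: Iterable[tuple]) -> Counter:
    """Count, for each target page, how many distinct pages link to it."""
    unique_links = set(links)  # a page linking twice to the same target counts once
    return Counter(target for _source, target in unique_links)
```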

Backend write jobs

These are triggered by primary and secondary update jobs to represent an individual write request to the cluster. One job is inserted for every cluster to write to. In the backend, the job queue is configured to partition the jobs by cluster into separate queues. This partitioning ensures that slow indexing on one cluster does not cause similar slowdowns in the remaining clusters.

  • ElasticaWrite
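
The per-cluster partitioning can be sketched as one queue per cluster, with a job inserted into each. The in-memory `deque` queues, the job dict shape and the cluster names are hypothetical stand-ins for the partitioned job queue backend.

```python
from collections import defaultdict, deque


def enqueue_writes(queues: defaultdict, clusters: list[str], payload: dict) -> None:
    """Insert one ElasticaWrite-style job per cluster, into that cluster's own queue."""
    for cluster in clusters:
        queues[cluster].append({"type": "ElasticaWrite", "cluster": cluster, "payload": payload})


queues = defaultdict(deque)
enqueue_writes(queues, ["eqiad", "codfw"], {"page_id": 1})
enqueue_writes(queues, ["eqiad", "codfw"], {"page_id": 2})

# Draining one queue leaves the other untouched: a slow cluster only
# delays its own backlog.
queues["eqiad"].popleft()
```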