Lightning-fast unified analytics engine

Latest News

Archive

Current Committers

Name Organization
Sameer Agarwal Facebook
Michael Armbrust Databricks
Joseph Bradley Databricks
Matthew Cheah Palantir
Felix Cheung Uber
Mosharaf Chowdhury University of Michigan, Ann Arbor
Bryan Cutler IBM
Jason Dai Intel
Tathagata Das Databricks
Ankur Dave UC Berkeley
Aaron Davidson Databricks
Thomas Dudziak Facebook
Erik Erlandson Red Hat
Robert Evans Oath
Wenchen Fan Databricks
Joseph Gonzalez UC Berkeley
Thomas Graves Oath
Stephen Haberman LinkedIn
Mark Hamstra ClearStory Data
Seth Hendrickson Cloudera
Herman van Hovell Databricks
Yin Huai Databricks
Shane Huang Intel
Holden Karau Google
Cody Koeninger Nexstar Digital
Andy Konwinski Databricks
Hyukjin Kwon Hortonworks
Ryan LeCompte Quantifind
Haoyuan Li Alluxio
Xiao Li Databricks
Davies Liu Juicedata
Cheng Lian Databricks
Yanbo Liang Hortonworks
Sean McNamara Oracle
Xiangrui Meng Databricks
Mridul Muralidharam Hortonworks
Andrew Or Princeton University
Kay Ousterhout LightStep
Sean Owen unaffiliated
Tejas Patil Facebook
Nick Pentreath IBM
Anirudh Ramanathan Google
Imran Rashid Cloudera
Charles Reiss University of Virginia
Josh Rosen Databricks
Sandy Ryza Remix
Kousuke Saruta NTT Data
Saisai Shao Hortonworks
Prashant Sharma IBM
Ram Sriharsha Databricks
DB Tsai Apple
Takuya Ueshin Databricks
Marcelo Vanzin Cloudera
Shivaram Venkataraman University of Wisconsin, Madison
Zhenhua Wang Huawei
Patrick Wendell Databricks
Andrew Xia Alibaba
Reynold Xin Databricks
Burak Yavuz Databricks
Matei Zaharia Databricks, Stanford
Shixiong Zhu Databricks

Becoming a Committer

To get started contributing to Spark, learn how to contribute – anyone can submit patches, documentation and examples to the project.

The PMC regularly adds new committers from the active contributors, based on their contributions to Spark. The qualifications for new committers include:

  1. Sustained contributions to Spark: Committers should have a history of major contributions to Spark. An ideal committer will have contributed broadly throughout the project, and have contributed at least one major component where they have taken an “ownership” role. An ownership role means that existing contributors feel that they should run patches for this component by this person.
  2. Quality of contributions: Committers more than any other community member should submit simple, well-tested, and well-designed patches. In addition, they should show sufficient expertise to be able to review patches, including making sure they fit within Spark’s engineering practices (testability, documentation, API stability, code style, etc). The committership is collectively responsible for the software quality and maintainability of Spark.
  3. Community involvement: Committers should have a constructive and friendly attitude in all community interactions. They should also be active on the dev and user list and help mentor newer contributors and users. In design discussions, committers should maintain a professional and diplomatic approach, even in the face of disagreement.

The type and level of contributions considered may vary by project area – for example, we greatly encourage contributors who want to work on mainly the documentation, or mainly on platform support for specific OSes, storage systems, etc.

Review Process

All contributions should be reviewed before merging as described in Contributing to Spark. In particular, if you are working on an area of the codebase you are unfamiliar with, look at the Git history for that code to see who reviewed patches before. You can do this using git log --format=full <filename>, by examining the “Commit” field to see who committed each patch.
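
This check can be run in any checkout of the Spark repo; the file path below is only an illustrative placeholder, not a specific file called out in the text.

```shell
# List the history for a file. The "Commit:" field (as opposed to
# "Author:") shows who committed each patch, i.e. who has reviewed
# and merged changes in this area before.
git log --format=full -- core/src/main/scala/org/apache/spark/SparkContext.scala
```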

How to Merge a Pull Request

Changes pushed to the master branch on Apache cannot be removed; that is, we can’t force-push to it. So please don’t add any test commits or anything like that, only real patches.

All merges should be done using the dev/merge_spark_pr.py script, which squashes the pull request’s changes into one commit. To use this script, you will need to add a git remote called “apache” at https://git-wip-us.apache.org/repos/asf/spark.git, as well as one called “apache-github” at git://github.com/apache/spark. For the apache repo, you can authenticate using your ASF username and password. Ask Patrick if you have trouble with this or want help doing your first merge.
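
The remote setup described above is a one-time step per checkout; the remote names and URLs below are exactly those given in the text.

```shell
# One-time setup expected by dev/merge_spark_pr.py:
git remote add apache https://git-wip-us.apache.org/repos/asf/spark.git
git remote add apache-github git://github.com/apache/spark
git remote -v   # confirm both remotes are registered
```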

The script is fairly self-explanatory and walks you through steps and options interactively.

If you want to amend a commit before merging – which should be used for trivial touch-ups – then simply let the script wait at the point where it asks you if you want to push to Apache. Then, in a separate window, modify the code and push a commit. Run git rebase -i HEAD~2 and “squash” your new commit. Edit the commit message just after to remove your commit message. You can verify the result is one change with git log. Then resume the script in the other window.
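
A sketch of that touch-up flow, run in the separate window while the script waits at its push prompt (the commit message here is purely illustrative):

```shell
git commit -am "fixup: address a review nit"  # the extra touch-up commit
git rebase -i HEAD~2        # mark the new commit as "squash" in the editor
git log --oneline -2        # verify the result is a single change
```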

Also, please remember to set Assignee on JIRAs where applicable when they are resolved. The script can’t do this automatically. Once a PR is merged please leave a comment on the PR stating which branch(es) it has been merged with.

Policy on Backporting Bug Fixes

From pwendell:

The trade-off when backporting is that you get to deliver the fix to people running older versions (great!), but you risk introducing new or even worse bugs in maintenance releases (bad!). The decision point is when you have a bug fix and it’s not clear whether it is worth backporting.

I think the following facets are important to consider:

  • Backports are an extremely valuable service to the community and should be considered for any bug fix.
  • Introducing a new bug in a maintenance release must be avoided at all costs. It over time would erode confidence in our release process.
  • Distributions or advanced users can always backport risky patches on their own, if they see fit.

For me, the consequence of these is that we should backport in the following situations:

  • Both the bug and the fix are well understood and isolated. Code being modified is well tested.
  • The bug being addressed is high priority to the community.
  • The backported fix does not vary widely from the master branch fix.

We tend to avoid backports in the converse situations:

  • The bug or fix are not well understood. For instance, it relates to interactions between complex components or third party libraries (e.g. Hadoop libraries). The code is not well tested outside of the immediate bug being fixed.
  • The bug is not clearly a high priority for the community.
  • The backported fix is widely different from the master branch fix.