Contributing to Spark

This guide documents the best way to make various types of contribution to Apache Spark, including what is required before submitting a code change.

Contributing to Spark doesn’t just mean writing code. Helping new users on the mailing list, testing releases, and improving documentation are also welcome. In fact, proposing significant code changes usually requires first gaining experience and credibility within the community by helping in other ways. This is also a guide to becoming an effective contributor.

So, this guide organizes contributions in the order in which they should probably be considered by new contributors who intend to get involved long-term. Build a track record of helping others, rather than just opening pull requests.

Contributing by Helping Other Users

A great way to contribute to Spark is to help answer user questions on the user@spark.apache.org mailing list or on StackOverflow. There are always many new Spark users; taking a few minutes to help answer a question is a very valuable community service.

Contributors should subscribe to this list and follow it in order to keep up to date on what’s happening in Spark. Answering questions is an excellent and visible way to help the community, which also demonstrates your expertise.

See the Mailing Lists guide for guidelines about how to effectively participate in discussions on the mailing list, as well as forums like StackOverflow.

Contributing by Testing Releases

Spark’s release process is community-oriented, and members of the community can vote on new releases on the dev@spark.apache.org mailing list. Spark users are invited to subscribe to this list to receive announcements, to test their workloads on newer releases, and to provide feedback on any performance or correctness issues found.

Contributing by Reviewing Changes

Changes to Spark source code are proposed, reviewed and committed via Github pull requests (described later). Anyone can view and comment on active changes at https://github.com/apache/spark/pulls. Reviewing others’ changes is a good way to learn how the change process works and gain exposure to activity in various parts of the code. You can help by reviewing the changes and asking questions or pointing out issues – as simple as typos or small issues of style. See also https://spark-prs.appspot.com/ for a convenient way to view and filter open PRs.

Contributing Documentation Changes

To propose a change to release documentation (that is, docs that appear under https://spark.apache.org/docs/), edit the Markdown source files in Spark’s docs/ directory, whose README file shows how to build the documentation locally to test your changes. The process to propose a doc change is otherwise the same as the process for proposing code changes below.

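For example, a minimal sketch of the local build workflow, assuming the Jekyll-based setup described in docs/README.md (check that README for the current prerequisites and flags):

cd docs
SKIP_API=1 jekyll build   # build the site, skipping the generated API docs
jekyll serve --watch      # or serve locally and rebuild as you edit
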
To propose a change to the rest of the documentation (that is, docs that do not appear under https://spark.apache.org/docs/), similarly, edit the Markdown in the spark-website repository and open a pull request.

Contributing User Libraries to Spark

Just as Java and Scala applications can access a huge selection of libraries and utilities, none of which are part of Java or Scala themselves, Spark aims to support a rich ecosystem of libraries. Many new useful utilities or features belong outside of Spark rather than in the core. For example: language support probably has to be a part of core Spark, but useful machine learning algorithms can happily exist outside of MLlib.

To that end, large and independent new functionality is often rejected for inclusion in Spark itself, but can and should be hosted as a separate project and repository, and included in the spark-packages.org collection.

Contributing Bug Reports

Ideally, bug reports are accompanied by a proposed code change to fix the bug. This isn’t always possible, as those who discover a bug may not have the experience to fix it. A bug may be reported by creating a JIRA but without creating a pull request (see below).

However, bug reports are only useful if they include enough information to understand, isolate and ideally reproduce the bug. Simply encountering an error does not mean a bug should be reported; as below, first search JIRA, and search and inquire on the Spark user / dev mailing lists. Unreproducible bugs, or simple error reports, may be closed.

It is possible to propose new features as well. These are generally not helpful unless accompanied by detail, such as a design document and/or code change. Large new contributions should consider spark-packages.org first (see above), or be discussed on the mailing list first. Feature requests may be rejected, or closed after a long period of inactivity.

Contributing to JIRA Maintenance

Given the sheer volume of issues raised in the Apache Spark JIRA, inevitably some issues are duplicates, or become obsolete and eventually fixed otherwise, or can’t be reproduced, or could benefit from more detail, and so on. It’s useful to help identify these issues and resolve them, either by advancing the discussion or even resolving the JIRA. Most contributors are able to directly resolve JIRAs. Use judgment in determining whether you are quite confident the issue should be resolved, although changes can be easily undone. If in doubt, just leave a comment on the JIRA.

When resolving JIRAs, observe a few useful conventions:

  • Resolve as Fixed if there’s a change you can point to that resolved the issue
    • Set Fix Version(s), if and only if the resolution is Fixed
    • Set Assignee to the person who most contributed to the resolution, which is usually the person who opened the PR that resolved the issue.
    • In case several people contributed, prefer to assign to the more ‘junior’, non-committer contributor
  • For issues that can’t be reproduced against master as reported, resolve as Cannot Reproduce
    • Fixed is reasonable too, if it’s clear what other previous pull request resolved it. Link to it.
  • If the issue is the same as or a subset of another issue, resolve as Duplicate
    • Make sure to link to the JIRA it duplicates
    • Prefer to resolve the issue that has less activity or discussion as the duplicate
  • If the issue seems clearly obsolete and applies to issues or components that have changed radically since it was opened, resolve as Not a Problem
  • If the issue doesn’t make sense – not actionable, for example, a non-Spark issue, resolve as Invalid
  • If it’s a coherent issue, but there is a clear indication that there is not support or interest in acting on it, then resolve as Won’t Fix
  • Umbrellas are frequently marked Done if they are just container issues that don’t correspond to an actionable change of their own

Preparing to Contribute Code Changes

Choosing What to Contribute

Spark is an exceptionally busy project, with a new JIRA or pull request every few hours on average. Review can take hours or days of committer time. Everyone benefits if contributors focus on changes that are useful, clear, easy to evaluate, and already pass basic checks.

Sometimes, a contributor will already have a particular new change or bug in mind. If seeking ideas, consult the list of starter tasks in JIRA, or ask the user@spark.apache.org mailing list.

Before proceeding, contributors should evaluate if the proposed change is likely to be relevant, new and actionable:

  • Is it clear that code must change? Proposing a JIRA and pull request is appropriate only when a clear problem or change has been identified. If simply having trouble using Spark, use the mailing lists first, rather than consider filing a JIRA or proposing a change. When in doubt, email user@spark.apache.org first about the possible change
  • Search the user@spark.apache.org and dev@spark.apache.org mailing list archives for related discussions. Use search-hadoop.com or similar search tools. Often, the problem has been discussed before, with a resolution that doesn’t require a code change, or recording what kinds of changes will not be accepted as a resolution.
  • Search JIRA for existing issues: https://issues.apache.org/jira/browse/SPARK
  • Type spark [search terms] in the top-right search box. If a logically similar issue already exists, then contribute to the discussion on the existing JIRA and pull request first, instead of creating a new one.
  • Is the scope of the change matched to the contributor’s level of experience? Anyone is qualified to suggest a typo fix, but refactoring core scheduling logic requires much more understanding of Spark. Some changes require building up experience first (see above).

MLlib-specific Contribution Guidelines

While a rich set of algorithms is an important goal for MLlib, scaling the project requires that maintainability, consistency, and code quality come first. New algorithms should:

  • Be widely known
  • Be used and accepted (academic citations and concrete use cases can help justify this)
  • Be highly scalable
  • Be well documented
  • Have APIs consistent with other algorithms in MLlib that accomplish the same thing
  • Come with a reasonable expectation of developer support.
  • Have @Since annotations on public classes, methods, and variables (see the sketch after this list).

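For example, a minimal sketch of @Since placement (the algorithm class, parameter, and method here are hypothetical; the annotation itself is org.apache.spark.annotation.Since):

import org.apache.spark.annotation.Since

// Hypothetical algorithm, shown only to illustrate where @Since goes.
@Since("2.0.0")
class MyAlgorithm @Since("2.0.0") (
    @Since("2.0.0") val stepSize: Double) {

  // Record the version in which each public method first appeared.
  @Since("2.1.0")
  def fit(): Unit = {}
}
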
Code Review Criteria

Before considering how to contribute code, it’s useful to understand how code is reviewed, and why changes may be rejected. Simply put, changes that have many or large positives, and few negative effects or risks, are much more likely to be merged, and merged quickly. Risky and less valuable changes are very unlikely to be merged, and may be rejected outright rather than receive iterations of review.

Positives

  • Fixes the root cause of a bug in existing functionality
  • Adds functionality or fixes a problem needed by a large number of users
  • Simple, targeted
  • Maintains or improves consistency across Python, Java, Scala
  • Easily tested; has tests
  • Reduces complexity and lines of code
  • Change has already been discussed and is known to committers

Negatives, Risks

  • Band-aids a symptom of a bug only
  • Introduces complex new functionality, especially an API that needs to be supported
  • Adds complexity that only helps a niche use case
  • Adds user-space functionality that does not need to be maintained in Spark, but could be hosted externally and indexed by spark-packages.org
  • Changes a public API or semantics (rarely allowed)
  • Adds large dependencies
  • Changes versions of existing dependencies
  • Adds a large amount of code
  • Makes lots of modifications in one “big bang” change

Contributing Code Changes

Please review the preceding section before proposing a code change. This section documents how to do so.

When you contribute code, you affirm that the contribution is your original work and that you license the work to the project under the project’s open source license. Whether or not you state this explicitly, by submitting any copyrighted material via pull request, email, or other means you agree to license the material under the project’s open source license and warrant that you have the legal authority to do so.

JIRA

Generally, Spark uses JIRA to track logical issues, including bugs and improvements, and uses Github pull requests to manage the review and merge of specific code changes. That is, JIRAs are used to describe what should be fixed or changed, and high-level approaches, and pull requests describe how to implement that change in the project’s source code. For example, major design decisions are discussed in JIRA.

  1. Find the existing Spark JIRA that the change pertains to.
    1. Do not create a new JIRA if creating a change to address an existing issue in JIRA; add to the existing discussion and work instead
    2. Look for existing pull requests that are linked from the JIRA, to understand if someone is already working on the JIRA
  2. If the change is new, then it usually needs a new JIRA. However, trivial changes, where the what should change is virtually the same as the how it should change, do not require a JIRA. Example: Fix typos in Foo scaladoc
  3. If required, create a new JIRA:
    1. Provide a descriptive Title. “Update web UI” or “Problem in scheduler” is not sufficient. “Kafka Streaming support fails to handle empty queue in YARN cluster mode” is good.
    2. Write a detailed Description. For bug reports, this should ideally include a short reproduction of the problem. For new features, it may include a design document.
    3. Set required fields:
      1. Issue Type. Generally, Bug, Improvement and New Feature are the only types used in Spark.
      2. Priority. Set to Major or below; higher priorities are generally reserved for committers to set. JIRA tends to unfortunately conflate “size” and “importance” in its Priority field values. Their meaning is roughly:
        1. Blocker: pointless to release without this change as the release would be unusable to a large minority of users
        2. Critical: a large minority of users are missing important functionality without this, and/or a workaround is difficult
        3. Major: a small minority of users are missing important functionality without this, and there is a workaround
        4. Minor: a niche use case is missing some support, but it does not affect usage or is easily worked around
        5. Trivial: a nice-to-have change but unlikely to be any problem in practice otherwise
      3. Component
      4. Affects Version. For Bugs, assign at least one version that is known to exhibit the problem or need the change
    4. Do not set the following fields:
      1. Fix Version. This is assigned by committers only when resolved.
      2. Target Version. This is assigned by committers to indicate a PR has been accepted for possible fix by the target version.
    5. Do not include a patch file; pull requests are used to propose the actual change.
  4. If the change is a large change, consider inviting discussion on the issue at dev@spark.apache.org first before proceeding to implement the change.

Pull Request

  1. Fork the Github repository at https://github.com/apache/spark if you haven’t already
  2. Clone your fork, create a new branch, push commits to the branch (see the sketch after this list).
  3. Consider whether documentation or tests need to be added or updated as part of the change, and add them as needed.
  4. Run all tests with ./dev/run-tests to verify that the code still compiles, passes tests, and passes style checks. If style checks fail, review the Code Style Guide below.
  5. Open a pull request against the master branch of apache/spark. (Only in special cases would the PR be opened against other branches.)
    1. The PR title should be of the form [SPARK-xxxx][COMPONENT] Title, where SPARK-xxxx is the relevant JIRA number, COMPONENT is one of the PR categories shown at spark-prs.appspot.com and Title may be the JIRA’s title or a more specific title describing the PR itself.
    2. If the pull request is still a work in progress, and so is not ready to be merged, but needs to be pushed to Github to facilitate review, then add [WIP] after the component.
    3. Consider identifying committers or other contributors who have worked on the code being changed. Find the file(s) in Github and click “Blame” to see a line-by-line annotation of who changed the code last. You can add @username in the PR description to ping them immediately.
    4. Please state that the contribution is your original work and that you license the work to the project under the project’s open source license.
  6. The related JIRA, if any, will be marked as “In Progress” and your pull request will automatically be linked to it. There is no need to be the Assignee of the JIRA to work on it, though you are welcome to comment that you have begun work.
  7. The Jenkins automatic pull request builder will test your changes
    1. If it is your first contribution, Jenkins will wait for confirmation before building your code and post “Can one of the admins verify this patch?”
    2. A committer can authorize testing with a comment like “ok to test”
    3. A committer can automatically allow future pull requests from a contributor to be tested with a comment like “Jenkins, add to whitelist”
  8. After about 2 hours, Jenkins will post the results of the test to the pull request, along with a link to the full results on Jenkins.
  9. Watch for the results, and investigate and fix failures promptly
    1. Fixes can simply be pushed to the same branch from which you opened your pull request
    2. Jenkins will automatically re-test when new commits are pushed
    3. If the tests failed for reasons unrelated to the change (e.g. Jenkins outage), then a committer can request a re-test with “Jenkins, retest this please”. Ask if you need a test restarted. If you were added by “Jenkins, add to whitelist” from a committer before, you can also request the re-test.
  10. If there is a change related to SparkR in your pull request, AppVeyor will be triggered automatically to test SparkR on Windows, which takes roughly an hour. Similarly to the steps above, fix failures and push new commits which will request the re-test in AppVeyor.

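For example, a minimal sketch of the fork-branch-test-push flow from steps 1-5 (the Github username, branch name, JIRA number, and PR title are hypothetical placeholders):

# After forking apache/spark on Github:
git clone https://github.com/YOUR_USERNAME/spark.git
cd spark
git checkout -b fix-scheduler-npe     # hypothetical branch name
# ...make and commit your changes, then verify locally...
./dev/run-tests
git push origin fix-scheduler-npe
# Open a pull request against apache/spark master, titled e.g.
# [SPARK-12345][CORE] Fix NPE in scheduler   (hypothetical JIRA and title)
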
The Review Process

  • Other reviewers, including committers, may comment on the changes and suggest modifications. Changes can be added by simply pushing more commits to the same branch.
  • Lively, polite, rapid technical debate is encouraged from everyone in the community. The outcome may be a rejection of the entire change.
  • Reviewers can indicate that a change looks suitable for merging with a comment such as: “I think this patch looks good”. Spark uses the LGTM convention for indicating the strongest level of technical sign-off on a patch: simply comment with the word “LGTM”. It specifically means: “I’ve looked at this thoroughly and take as much ownership as if I wrote the patch myself”. If you comment LGTM you will be expected to help with bugs or follow-up issues on the patch. Consistent, judicious use of LGTMs is a great way to gain credibility as a reviewer with the broader community.
  • Sometimes, other changes will be merged which conflict with your pull request’s changes. The PR can’t be merged until the conflict is resolved. This can be resolved by, for example, adding a remote to keep up with upstream changes by git remote add upstream https://github.com/apache/spark.git, running git fetch upstream followed by git rebase upstream/master and resolving the conflicts by hand, then pushing the result to your branch (see the sketch after this list).
  • Try to be responsive to the discussion rather than let days pass between replies

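A concrete sequence for the upstream-rebase workflow described above (the branch name is a hypothetical placeholder):

git remote add upstream https://github.com/apache/spark.git
git fetch upstream
git rebase upstream/master
# resolve any conflicts by hand, then push the result to your branch;
# the rebase rewrites history, so the push must be forced:
git push --force-with-lease origin fix-scheduler-npe
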
Closing Your Pull Request / JIRA

  • If a change is accepted, it will be merged and the pull request will automatically be closed, along with the associated JIRA if any
    • Note that in the rare case you are asked to open a pull request against a branch besides master, you will actually have to close the pull request manually
    • The JIRA will be Assigned to the primary contributor to the change as a way of giving credit. If the JIRA isn’t closed and/or Assigned promptly, comment on the JIRA.
  • If your pull request is ultimately rejected, please close it promptly
    • … because committers can’t close PRs directly
    • Pull requests will be automatically closed by an automated process at Apache after about a week if a committer has made a comment like “mind closing this PR?” This means that the committer is specifically requesting that it be closed.
  • If a pull request has gotten little or no attention, consider improving the description or the change itself and ping likely reviewers again after a few days. Consider proposing a change that’s easier to include, like a smaller and/or less invasive change.
  • If it has been reviewed but not taken up after weeks, after soliciting review from the most relevant reviewers, or, has met with neutral reactions, the outcome may be considered a “soft no”. It is helpful to withdraw and close the PR in this case.
  • If a pull request is closed because it is deemed not the right approach to resolve a JIRA, then leave the JIRA open. However if the review makes it clear that the issue identified in the JIRA is not going to be resolved by any pull request (not a problem, won’t fix) then also resolve the JIRA.

Code Style Guide

Please follow the style of the existing codebase.

  • For Python code, Apache Spark follows PEP 8 with one exception: lines can be up to 100 characters in length, not 79.
  • For R code, Apache Spark follows Google’s R Style Guide with three exceptions: lines can be up to 100 characters in length, not 80; there is no limit on function name length, but function names should begin with a lowercase letter; and S4 objects/methods are allowed.
  • For Java code, Apache Spark follows Oracle’s Java code conventions. Many Scala guidelines below also apply to Java.
  • For Scala code, Apache Spark follows the official Scala style guide, with the changes noted below.

Line Length

Limit lines to 100 characters. The only exceptions are import statements (although even for those, try to keep them under 100 chars).

Indentation

Use 2-space indentation in general. For function declarations, use 4-space indentation for the parameters when they don’t fit on a single line. For example:

// Correct:
if (true) {
  println("Wow!")
}
 
// Wrong:
if (true) {
    println("Wow!")
}
 
// Correct:
def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
    path: String,
    fClass: Class[F],
    kClass: Class[K],
    vClass: Class[V],
    conf: Configuration = hadoopConfiguration): RDD[(K, V)] = {
  // function body
}
 
// Wrong
def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
  path: String,
  fClass: Class[F],
  kClass: Class[K],
  vClass: Class[V],
  conf: Configuration = hadoopConfiguration): RDD[(K, V)] = {
  // function body
}

Code documentation style

For Scaladoc / Javadoc comments before classes, objects and methods, use the Javadoc style instead of the Scaladoc style.

/** This is a correct one-liner, short description. */
 
/**
 * This is correct multi-line JavaDoc comment. And
 * this is my second line, and if I keep typing, this would be
 * my third line.
 */
 
/** In Spark, we don't use the ScalaDoc style so this
  * is not correct.
  */

For inline comments within the code, use // and not /* .. */.

// This is a short, single line comment
 
// This is a multi line comment.
// Bla bla bla
 
/*
 * Do not use this style for multi line comments. This
 * style of comment interferes with commenting out
 * blocks of code, and also makes code comments harder
 * to distinguish from Scala doc / Java doc comments.
 */
 
/**
 * Do not use scala doc style for inline comments.
 */

Imports

Always import packages using absolute paths (e.g. scala.util.Random) instead of relative ones (e.g. util.Random). In addition, sort imports in the following order (use alphabetical order within each group):

  • java.* and javax.*
  • scala.*
  • Third-party libraries (org.*, com.*, etc)
  • Project classes (org.apache.spark.*)

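For example, an import block sorted according to these rules (the particular classes are illustrative):

import java.io.File
import javax.annotation.Nullable

import scala.collection.mutable
import scala.util.Random

import com.google.common.io.Files
import org.json4s.JsonAST._

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
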
The IntelliJ import organizer plugin can organize imports for you. Use this configuration for the plugin (configured under Preferences / Editor / Code Style / Scala Imports Organizer):

import java.*
import javax.*
 
import scala.*
 
import *
 
import org.apache.spark.*

Infix Methods

Don’t use infix notation for methods that aren’t operators. For example, instead of list map func, use list.map(func), and instead of string contains "foo", use string.contains("foo"). This makes the code more familiar to developers coming from other languages.

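Restating those examples in the correct/wrong convention used elsewhere in this guide:

// Correct:
list.map(func)
string.contains("foo")

// Wrong:
list map func
string contains "foo"
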
Curly Braces

Put curly braces even around one-line if, else or loop statements. The only exception is if you are using if/else as a one-line ternary operator.

// Correct:
if (true) {
  println("Wow!")
}
 
// Correct:
if (true) statement1 else statement2
 
// Wrong:
if (true)
  println("Wow!")

Return Types

Always specify the return types of methods where possible. If a method returns nothing, explicitly specify Unit as its return type, in accordance with the Scala style guide. Explicit types for variables are not required unless the definition involves huge code blocks with potentially ambiguous return values.

// Correct:
def getSize(partitionId: String): Long = { ... }
def compute(partitionId: String): Unit = { ... }
 
// Wrong:
def getSize(partitionId: String) = { ... }
def compute(partitionId: String) = { ... }
def compute(partitionId: String) { ... }
 
// Correct:
val name = "black-sheep"
val path: Option[String] =
  try {
    Option(names)
      .map { ns => ns.split(",") }
      .flatMap { ns => ns.filter(_.nonEmpty).headOption }
      .map { n => "prefix" + n + "suffix" }
      .flatMap { n => if (n.hashCode % 3 == 0) Some(n + n) else None }
  } catch {
    case e: SomeSpecialException =>
      computePath(names)
  }

If in Doubt

If you’re not sure about the right style for something, try to follow the style of the existing codebase. Look at whether there are other examples in the code that use your feature. Feel free to ask on the dev@spark.apache.org list as well.
